Using LINQ for Data Access


Introduction

Data access is an underlying aspect of almost all applications, whether through external sources such as XML and databases, or information internal to the application. For a long time, despite advancements in object-oriented programming, there was a lack of good facilities for querying and manipulating data.  Then, along came .NET Language Integrated Query (LINQ), providing a host of query-capabilities.

LINQ (pronounced “link” or sometimes “lin-q”) is a set of technologies integrated into .NET programming languages such as C#, which allows for simpler data-querying code, while still experiencing the full benefits of Visual Studio’s coding support tools. Instead of manually looping through objects and properties, one can achieve the same results using LINQ’s standard query operators. Many of these will be familiar to those who have worked with SQL, with Select, Where, and OrderBy being three of the most fundamental operators.

LINQ was introduced by Microsoft in .NET Framework 3.5, and getting started with it is as easy as

using System.Linq;

Anything that implements the IEnumerable<T> interface can be queried using LINQ. Getting technical, the IEnumerable interface exposes the enumerator of the collection, providing a means by which to iterate over it – a core requirement of data-querying. Of course, everything that LINQ can do can be done manually; albeit in a more verbose manner.

At Sinara, we have used LINQ successfully in several applications. This article will introduce its syntax, discuss the reasons for its use, and provide some of our insights into its merits and shortcomings.

Why Use LINQ?

Previously, if a .NET programmer wanted to query a collection of data, they would have needed to write a series of tedious and ugly nested loops to iterate through it. Now, the same results can be achieved through LINQ statements which are less verbose, more coherent, and which read similarly to SQL, the language classically associated with data access. LINQ’s query-expressions allows you to utilise your chosen IDE’s full arsenal of coding support tools, from syntax-checking to intelligent code completion.

Take the following example: you have a list of users, and each user has a number of assets. You want a list of all users in possession of an asset with a value greater than 1000. To do this without LINQ, you would iterate through your list of users (List<User> users), like so:

List<User> matches = new List<User>();
foreach (User user in users)
{
 foreach (Asset asset in user.Assets)
 {
  if (Asset.Value > 1000)
  {
   matches.Add(user);
   break;
  }
 }
}

The above example demonstrates that when objects contain collections, it can be unnecessarily complicated to scan through and acquire data – and it’s easy to lose track of what’s going on in nested loops. By contrast, LINQ (with lambda expressions) can not only condense the code, but also make it more coherent:

IEnumerable<User> matches = users.Where(
 u => u.Assets.Any(a => a.Value > 1000)
 );

In this version of the query, a LINQ Where operator filters the collection of users using a lambda expression. The lambda expression (denoted by ‘=>’) takes each user (u) in users and applies an operation to it. In the above case, each User’s Assets are scanned to see if Any of them fulfil a given condition: having value (a.Value) greater than 1000.

If you are not a fan of lambda expressions, LINQ also supports syntax more similar to SQL:

IEnumerable<User> matches = from user in users
                            from asset in user.Assets
                            where asset.Value > 1000
                            select user;

Notice that the LINQ query returns an IEnumerable. This is another standout feature of LINQ; deferred execution. Often, you may be building a list of objects, passing from method to method, until finally reaching a point where you wish to perform an operation upon your final collection. With LINQ, the choice is yours as to when the list is enumerated: until you call a method like ToList(), the collection remains IQueryable, and none of the query operations are actually performed. It is not until the moment that your code actually requests the data (with ToList() etc.) that LINQ will execute your query. You can think of an IEnumerable collection as like Schrödinger’s Cat: it is only when you choose to enumerate it that it returns data. Just as you cannot know if the cat is alive or dead until you look in the box, you cannot know the data within the IEnumerable until you force it to execute by performing an enumeration of it.

The power of deferred execution is particularly visible when filtering and mapping data: deferring the execution ensures that the collection is iterated through only once, with each item in the iteration having all filters and mappings applied to them at once. Consider the code below:

IEnumerable<User> matches = Users.Where(
 u => u.Type == UserType.Trader
 );
// (...)
matches = matches.Where(
  u => u.Assets.Any(a => a.Value > 1000)
 );
// (...)
List<User> matchesList = matches.ToList();

Even though additional conditions are added to the matches query throughout the code, it is the final line that actually triggers the enumeration of matches. There’s only one interaction with the collection, improving the code’s efficiency.

Amongst the benefits discussed above, LINQ makes your code more readable, understandable, and concise, and boosts developer productivity by making data access code quicker to write and easier to maintain. For agile development, using LINQ in your back-end can help to reduce the headaches caused by changing requirements.

What About Its Performance?

At Sinara, we pride ourselves on high-performance financial software: let’s make that very clear. Performance is an important feature in many applications, particularly those in the finance sector. Though LINQ and high-performance applications are not incompatible, in many cases it is true (but misleading) to say that LINQ is slower than a more “classic” manual iteration.  What needs to be comprehended is the scale of the speeds under comparison.

If you could run 100 m twice as fast as me, that would be impressive. If we were choosing who best to compete in the Olympics, we would probably rule me out. But if I were to tell you that I could blink twice as fast as you, I might be correct, but it would hardly matter. Blinking is incredibly fast anyway, so what does it matter who can do it the fastest? Your blink might take 100% longer than mine, but that’s a matter of 150 ms. We certainly wouldn’t write off your eyes as inferior on this basis alone.

A similar line of reasoning applies to LINQ. Yes, it is technically slower; however, we’re talking about a difference on the scale of 10 μs (depending on list size). In all but a handful of business application components, the productivity and maintenance benefits of using LINQ will far outweigh other considerations. Nevertheless, it should be noted that contrary to the frivolous example above, in certain components of performance-critical systems you obviously will be counting every millisecond. Indeed, at Sinara we have produced components where classic stored procedures have been used instead of LINQ for just this reason.

Following on from the user and asset querying examples given above, below are the results of a study comparing the performance of the two querying methods, using code similar to the examples.

Figure 1: The performance speed of LINQ vs. Manual looping, using the code analogous to that shown in the previous section. Every duration measurement was repeated 100 times, each with a randomly populated list of objects (in each case, the same randomly generated list was queried by both methods).

Figure 2: A condensed set of the data from Figure 1, showing only the smaller list sizes to display a better perspective in the y axis.

It is fair to draw the conclusion that for all list sizes, LINQ is outperformed by manual looping. However, we should disregard the performance of the two querying methods relative to one another and focus our attention on the raw speed of each type of query: with list sizes up to 5000, we’re talking about a fraction of a millisecond. Even at list sizes on the scales of 104, both LINQ and manual looping can operate over a few milliseconds. For all except the most performance-critical components of a typical business application, this is negligible when compared to the overall benefits of using LINQ in your project.

LINQ & Databases

An important use of LINQ’s querying capabilities is for database access. When users request information from a system, the chances are that the answer lies in the database. In addition to LINQ itself, the .NET Framework provides facilities such as LINQ-to-SQL and Entity Framework, which provide a C# interface to the database, called a context, which enables the use of LINQ for operations on the database.

Database operations are often very similar to querying lists. Taking our earlier example, of looking for users with assets of value over 1000, imagine that instead of an in-memory list in our C# code, “Users” is a database table. Using the database context, you can query the database table just as you would a list:

IEnumerable<User> matches = _dbContext.Users.Where(
 u => u.Assets.Any(a => a.Value > 1000)
 );

It’s not just Read actions that are this simple. Creating, Updating, and Deleting can be accessed in similar ways:

// Create a User
User newUser = new User("New_User", "trader123@exchange.com",
 UserType.Trader);
dbContext.Users.Add(newUser);

// Update a User
User userToUpdate = dbContext.Users.Single(
 u => u.Email == "traderA@exchange.com"
 );
userToUpdate.Email = "trader_update@exchange.com";

// Delete a User
User userToDelete = Users.Single(
 u => u.Email == "trader@exchange.com"
);
dbContext.Users.Remove(userToDelete);

What About Stored Procedures?

When compared to stored procedures, a practice that many favour for database access, direct access using LINQ offers a number of benefits. Type safety is an outstanding feature of LINQ, eliminating any fears of unexpected or undefined object responses. Debugging is arguably easier in LINQ, whereby you can utilise the full support of your chosen .NET IDE; debugging stored procedures can be an arduous process, depending on your database management system. For newcomers, using LINQ can be easier than learning T-SQL and implementing an API to call stored procedures.

However, for database access, query performance can be a concern. For complicated queries involving multiple joins, the SQL generated by LINQ can sometimes be overly complex, which can result in execution times significantly slower than that of well optimised manually written code. This can be particularly apparent when dealing with very large databases. This means that even if you are using LINQ for most of your database operations, stored procedures should still be considered a good choice for complex queries–this is the practice we have developed at Sinara. Nevertheless, the quality of generated code keeps improving with successive .NET releases and will be sufficient for most purposes.

Stored procedures also offer a certain amount of flexibility in redeployment, an undertaking in software development that clients rarely enjoy! Stored procedures often contain a good deal of business logic, so in some cases they can be safely altered and redeployed without needing to recompile the application itself. This offers benefits in testing, since only a small area of the system is affected, and also provides an easy means of rolling back in the rare case that an error occurs.

Conclusion

Hopefully, from what has been outlined above, you have gained some perspective on the reason for LINQ’s existence and an insight into its uses. With the wide spectrum of coding facilities available to you, the choice is yours as to whether LINQ is right for you and your application.