4. Filtering data with LINQ
In the previous section, we looked at how to filter queries server-side using the REST API. We'll now look at how LINQ queries map onto the REST API.
As you may have guessed, LINQ queries are eventually resolved to REST API URIs like the ones we looked at in the previous section. This means that although LINQ has a large and rich syntax, only those methods that map directly to the REST API can be supported.
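To make that mapping concrete, here's a minimal Python sketch. Python is used only as a neutral way to show the REST-level strings; this is not the .NET client library. It turns a list of (property, operator, value) conditions, a hypothetical stand-in for a LINQ where clause, into a $filter expression and a query URI. The operator table uses the OData comparison operators; the literal formatting is deliberately simplified, and typed values such as Int64, DateTime, and Guid have their own literal syntax that this sketch ignores. The helper names are hypothetical.

```python
from urllib.parse import quote

# Map the C# comparison operators used in a LINQ where clause to the
# OData operators that appear in the REST API's $filter option.
OPERATORS = {"==": "eq", "!=": "ne", ">": "gt", ">=": "ge", "<": "lt", "<=": "le"}


def literal(value):
    """Render a value as a simplified OData literal."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"  # escape ' by doubling it
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


def build_filter(conditions):
    """AND together (property, operator, value) triples into a $filter expression."""
    return " and ".join(
        f"{prop} {OPERATORS[op]} {literal(val)}" for prop, op, val in conditions
    )


def build_query_uri(account, table, conditions):
    """Build a query URI; spaces are percent-encoded, single quotes are left as-is."""
    encoded = quote(build_filter(conditions), safe="'")
    return f"http://{account}.table.core.windows.net/{table}?$filter={encoded}"
```

For example, the where clause shirt.PartitionKey == "Shirts" corresponds to the conditions [("PartitionKey", "==", "Shirts")], which produces PartitionKey%20eq%20'Shirts' in the URI.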
While you’re debugging a LINQ query in Visual Studio, you can either hover over or put a watch on a context object (such as shirtContext in figure 1) and you’ll be able to see the underlying REST API query. Figure 1 shows the REST API query for a LINQ query that returns all products in the Shirts partition.
Let’s now look at the typical queries that you’ll be able to perform.
Equality Comparisons
As you can see from the list in table 2, only equality, range, and Boolean comparisons can be performed using the Table service. The following queries are typical equality comparisons that can be performed:
where shirt.RowKey == "Red Shirt"
or
where shirt.Description != "A Red Shirt"
or
where shirt.PartitionKey == "Shirts"
&& shirt.Description != "A Red Shirt"
Range Comparisons
The Table service supports the filtering of range data using range queries. For example, the following WHERE clause will return those shirts priced at $50 or more, and less than $70:
where shirt.Price >= 50 && shirt.Price < 70
Because data is stored in the Table service as native types, rather than as string representations, the Table service will perform comparison routines using the native types rather than string comparisons. The following query will return all shirts whose price is greater than or equal to $50.20:
where shirt.Price >= 50.20
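If this query were performed as a string comparison (which you would have to do with Amazon SimpleDB), it would not return shirts priced at $100, because strings are compared character by character and "100" sorts before "50.20" (its first character, 1, is less than 5). To get correct results from a string comparison, every price would have to be zero-padded to the same width (for example, 0100.00 and 0050.20).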
In the Windows Azure Table service, the only time you need to worry about performing equivalent string comparisons is when you store non-string data (such as a number) as a partition or row key. Partition and row keys are always represented as strings in the Table service, so if you need to perform range comparisons on these keys, you'll need to ensure that the stored values are formatted to a consistent length (for example, by zero-padding numbers).
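The difference between native and string ordering is easy to reproduce. The following minimal Python sketch is not Table service code: it compares a few prices numerically and as two-decimal strings, then shows how zero-padding numeric keys to a fixed width restores numeric order. The pad_key helper and its width of 10 characters are assumptions for illustration; you'd have to apply the same padding consistently whenever you write keys.

```python
def pad_key(value, width=10):
    """Format a non-negative integer as a fixed-width, zero-padded string key."""
    if value < 0:
        raise ValueError("this simple scheme only handles non-negative integers")
    return str(value).zfill(width)


prices = [50.20, 60, 100]

# Native comparison: what the Table service does for a numeric property.
numeric_matches = [p for p in prices if p >= 50.20]

# String comparison on two-decimal strings: characters are compared left to
# right, so "100.00" sorts before "50.20" and the $100 shirt is missed.
string_matches = [p for p in prices if f"{p:.2f}" >= "50.20"]

# Unpadded numeric keys sort as text ("1001" < "250" < "7")...
unpadded_keys = sorted(str(n) for n in [1001, 7, 250])
# ...while zero-padded keys sort in numeric order.
padded_keys = sorted(pad_key(n) for n in [1001, 7, 250])
```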
Boolean Logic
As stated earlier, the Table service does respect property types. This means you can perform Boolean logic against entity properties that are defined as bool. For example, the following WHERE clause returns shirts that are marked as genuine Hawaiian shirts and are priced at more than $50:
where shirt.IsMadeInHawaii && shirt.Price > 50
Prefix Queries
Using range comparisons and Boolean logic, you can manipulate your LINQ and REST queries to return all entities whose keys start with a particular string. For example, if you wanted to return all shirts in any of the partitions Partition1, Partition2, Partition3, or Partition4, you could use the following query, which also matches any longer key that begins with one of those names (such as Partition10):
where shirt.PartitionKey.CompareTo("Partition1") >= 0 &&
shirt.PartitionKey.CompareTo("Partition5") < 0
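The upper bound in that query is simply the last prefix with its final character incremented (4 becomes 5). Here's a minimal Python sketch of that calculation. The helper names are hypothetical, and the in-memory matches check uses Python's code-point string ordering as an assumed stand-in for the way the Table service orders key strings.

```python
def increment_last_char(s):
    """Return s with its final character bumped to the next code point ("Partition4" -> "Partition5")."""
    if not s:
        raise ValueError("string must be non-empty")
    if ord(s[-1]) == 0x10FFFF:
        raise ValueError("last character is already the maximum code point")
    return s[:-1] + chr(ord(s[-1]) + 1)


def prefix_bounds(first_prefix, last_prefix=None):
    """Bounds (lower, upper) so that lower <= key < upper matches every key from
    first_prefix up to and including all keys that start with last_prefix."""
    last_prefix = first_prefix if last_prefix is None else last_prefix
    return first_prefix, increment_last_char(last_prefix)


def prefix_filter(prop, first_prefix, last_prefix=None):
    """Build the equivalent (unencoded) $filter expression."""
    lower, upper = prefix_bounds(first_prefix, last_prefix)
    return f"{prop} ge '{lower}' and {prop} lt '{upper}'"


def matches(key, lower, upper):
    # Python compares str by code point; assumed here to mirror the service's ordering.
    return lower <= key < upper
```

With a single argument, prefix_bounds("Red") gives ("Red", "Ree"), which selects every key that starts with Red.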
LINQ to Objects Queries
Even though only a small subset of the LINQ syntax is available to be executed by the Table service, you can still perform in-memory LINQ queries (LINQ to Objects). In-memory LINQ queries do provide full access to the LINQ syntax, but all queries are executed on the client side, so they require the full dataset to be returned by the Table service first. This approach isn't suitable for situations where you're working with a large set of data.
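The trade-off can be sketched in Python with a hypothetical in-memory list standing in for the Products table and a hypothetical fetch_partition function standing in for the server-side query. Only the PartitionKey equality test is treated as server-side; the substring match on Description, which isn't one of the equality, range, or Boolean comparisons the Table service supports, runs on the client after the whole partition has been transferred.

```python
# Hypothetical in-memory stand-in for the Products table.
PRODUCTS = [
    {"PartitionKey": "Shirts", "RowKey": "shirts0", "Description": "A Red Shirt"},
    {"PartitionKey": "Shirts", "RowKey": "shirts1", "Description": "A Blue Shirt"},
    {"PartitionKey": "Shirts", "RowKey": "shirts2", "Description": "A Dark Red Shirt"},
    {"PartitionKey": "Hats", "RowKey": "hats0", "Description": "A Red Hat"},
]


def fetch_partition(partition_key):
    """Stand-in for the server-side query: only an equality filter is applied."""
    return [e for e in PRODUCTS if e["PartitionKey"] == partition_key]


def red_shirts():
    # Stage 1 (server side): PartitionKey eq 'Shirts'. The whole partition is transferred.
    shirts = fetch_partition("Shirts")
    # Stage 2 (client side, the LINQ to Objects equivalent): substring match in memory.
    return [e["RowKey"] for e in shirts if "Red" in e["Description"]]
```

Here all three shirts cross the wire even though only two survive the client-side filter; with a large partition, that transfer cost is what makes this approach unsuitable.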
By now you should have a taste of the types of queries that you can perform against the Table service. Let's now look at how you can shape the data that's returned from your queries.
5. Selecting data using the LINQ syntax
As you’ll have noticed in the supported LINQ syntax list (table 2), there was no mention of the SELECT statement. You can use the SELECT statement to return the entire entity, but you can’t use SELECT to instruct the Table service to only return a subset of the entity properties.
Returning an Entire Entity Using Select
To illustrate the limitations of using SELECT, let’s look again at a LINQ query that returns a product entity in its entirety:
var shirts = from shirt in shirtContext.Products
where shirt.PartitionKey == "Shirts"
select shirt;
This LINQ query was used earlier to return all entities in the Shirts partition of the Products table. The following code is an Atom XML extract of one of the entities returned by the preceding LINQ query:
<content type="application/xml">
<m:properties>
<d:PartitionKey>Shirts</d:PartitionKey>
<d:RowKey>shirts0</d:RowKey>
<d:Timestamp m:type="Edm.DateTime">
2009-07-29T21:14:45.022Z
</d:Timestamp>
<d:Description>A Shirt</d:Description>
<d:Name>shirtshirts0</d:Name>
</m:properties>
</content>
As you can see from the XML for the returned entity, every property of the product entity is returned by the Table service (PartitionKey, RowKey, Timestamp, Description, and Name).
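To confirm which properties come back, you can read them straight out of the payload. The following minimal Python sketch parses the extract above with the standard library's ElementTree. Because the extract omits them, the namespace declarations for the m: and d: prefixes have been added (in a full Atom feed they are declared on an enclosing element); the namespace URIs are the ones used by ADO.NET Data Services payloads. The entity_properties helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the m: (metadata) and d: (data) prefixes.
NS = {
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
}

# The extract from the text, with namespace declarations added so it parses on its own.
CONTENT = """<content type="application/xml"
  xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
  xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">
  <m:properties>
    <d:PartitionKey>Shirts</d:PartitionKey>
    <d:RowKey>shirts0</d:RowKey>
    <d:Timestamp m:type="Edm.DateTime">2009-07-29T21:14:45.022Z</d:Timestamp>
    <d:Description>A Shirt</d:Description>
    <d:Name>shirtshirts0</d:Name>
  </m:properties>
</content>"""


def entity_properties(content_xml):
    """Return {property name: text} for every d:* element under m:properties, in order."""
    root = ET.fromstring(content_xml)
    props = root.find("m:properties", NS)
    prefix = "{" + NS["d"] + "}"
    return {
        child.tag[len(prefix):]: child.text
        for child in props
        if child.tag.startswith(prefix)
    }
```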
If the Products table were held in SQL Server rather than the Table service, and the LINQ statement were executed against the database using LINQ2SQL or LINQ2Entities, the following SQL statement would be generated and executed on the SQL Server database:
SELECT PartitionKey, RowKey, Timestamp, Description, Name
FROM Products
WHERE PartitionKey = 'Shirts'
Shaping the Query
If you're using LINQ2SQL or LINQ2Entities with a SQL Server database, and you don't need to return the entire entity, you might choose to write a more efficient LINQ query that only requests and returns specific columns from the SQL Server database. The following SQL statement requests just the Name and Description properties:
SELECT Name, Description
FROM Products
WHERE PartitionKey = 'Shirts'
The preceding SQL statement is less intensive to execute on the server (as there is less data to read and return), and it will also use less network bandwidth because a smaller dataset is returned to the application.
When you're using LINQ2SQL or LINQ2Entities, you can modify the select clause of your less efficient LINQ statement as follows to generate the more efficient SQL statement:
select new
{
Name = shirt.Name,
Description = shirt.Description
};
With this change, the previous LINQ statement looks like this:
var shirts = from shirt in shirtContext.Products
where shirt.PartitionKey == "Shirts"
select new
{
Name = shirt.Name,
Description = shirt.Description
};
Unfortunately, because the Table service doesn't support data shaping using the SELECT statement, you'd get a nasty exception if you attempted to run the preceding LINQ query. As a result, whenever you execute queries against the Table service, every property of the entity will always be returned as part of the query.
If you really do need to shape the returned data in your application, and you don't mind that the entire entity will be returned from the server, you can always shape it locally using the following code:
var shirts = from newShirt in
(
from shirt in shirtContext.Products
where shirt.PartitionKey == "Shirts"
select shirt
).ToList()
select new
{
Name = newShirt.Name,
Description = newShirt.Description
};
The preceding code uses the same LINQ query as in section 2 to filter the data in the Table service, returning the entire entity. Calling the ToList method on the inner LINQ query forces that query to execute against the Table service and materializes the complete entities, with all their properties, in memory.
Finally, the result of the ToList method is fed into the outer LINQ to Objects query, which performs in-memory shaping of the data, returning a new anonymous type containing the two properties that you want.
You should be aware that although this query returns the entities shaped as you specify, it won't improve server-side or bandwidth efficiency. If you have a very large entity with an infrequently used property that you don't need in a particular query, this unused property will still be returned by the Table service.
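The same materialize-then-project pattern can be sketched in Python. In this minimal sketch the fetched list is hypothetical sample data standing in for the server's response, and the ShirtSummary named tuple plays the role of the anonymous type. Note that the projection happens only after every property of each entity has already been received.

```python
from typing import NamedTuple


class ShirtSummary(NamedTuple):
    """Plays the role of the anonymous type in the C# example."""
    Name: str
    Description: str


# Hypothetical result of the server-side query: full entities, every property included.
fetched = [
    {"PartitionKey": "Shirts", "RowKey": "shirts0", "Timestamp": "2009-07-29T21:14:45.022Z",
     "Description": "A Shirt", "Name": "shirtshirts0"},
    {"PartitionKey": "Shirts", "RowKey": "shirts1", "Timestamp": "2009-07-29T21:14:45.022Z",
     "Description": "A Red Shirt", "Name": "shirtshirts1"},
]


def shape(entities):
    """In-memory projection: keep only Name and Description."""
    return [ShirtSummary(e["Name"], e["Description"]) for e in entities]
```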
6. Paging data
By default, SELECT queries will only return 1,000 items in a single result set. Not only is this the default amount of data returned, but it's also the maximum amount of data returned.
If you wish to return a smaller amount of data, you can set this with the Take method in LINQ, as follows:
(from shirt in shirtContext.Products
where shirt.PartitionKey == "Shirts"
select shirt).Take(100);
The preceding LINQ statement will return the first 100 items in the Shirts partition. The LINQ Take extension method will be resolved to the following query string parameter in the URI for the REST API call:
$top=100
If the query matches more items than are returned in the result set, the Table service provides continuation tokens that allow you to retrieve the next set of data. Using continuation tokens in this way effectively provides a method of paging.
If you wanted to return all items in the Shirts partition of the Products table, and the partition might contain more than 1,000 items, you could run the following REST API query:
http://silverlightukstorage.table.core.windows.net/Products?$filter=PartitionKey%20eq%20'Shirts'
Because more than 1,000 items match the query, you'll receive the following continuation tokens in the response headers:
x-ms-continuation-NextPartitionKey: Shirts
x-ms-continuation-NextRowKey: 1001
If you wanted to return all the items in the Shirts partition that were not returned as part of the original query, you could retrieve the next set of data using the following query:
http://silverlightukstorage.table.core.windows.net/Products?$filter=PartitionKey%20eq%20'Shirts'&NextPartitionKey=Shirts&NextRowKey=1001
The preceding query would return the products in the Shirts partition from RowKey 1001 onward, up to a maximum of the next 1,000 entities.
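The client-side loop that follows these tokens can be sketched in Python against a simulated service. This is a simulation of the paging protocol, not a client for the real endpoint: query_page and query_all are hypothetical names, query_page assumes results are ordered by PartitionKey then RowKey and resumes at the entity identified by the tokens, and a real client would instead read the x-ms-continuation-NextPartitionKey and x-ms-continuation-NextRowKey response headers and pass them back as the NextPartitionKey and NextRowKey query parameters. The sample RowKeys are zero-padded so that string order matches numeric order.

```python
PAGE_LIMIT = 1000  # maximum number of entities returned per response

# Hypothetical table contents: 2,500 shirts plus one hat.
SAMPLE_TABLE = [{"PartitionKey": "Shirts", "RowKey": f"{i:05d}"} for i in range(2500)]
SAMPLE_TABLE.append({"PartitionKey": "Hats", "RowKey": "00000"})


def query_page(table, partition_key, next_pk=None, next_rk=None, top=None):
    """Simulate one REST call. Returns (entities, continuation), where continuation
    is a (NextPartitionKey, NextRowKey) pair, or None when no results remain."""
    rows = sorted(
        (e for e in table if e["PartitionKey"] == partition_key),
        key=lambda e: (e["PartitionKey"], e["RowKey"]),
    )
    if next_pk is not None:
        # Resume at the first entity at or after the continuation position.
        rows = [e for e in rows if (e["PartitionKey"], e["RowKey"]) >= (next_pk, next_rk)]
    limit = PAGE_LIMIT if top is None else min(top, PAGE_LIMIT)  # $top can only lower the limit
    page, rest = rows[:limit], rows[limit:]
    continuation = (rest[0]["PartitionKey"], rest[0]["RowKey"]) if rest else None
    return page, continuation


def query_all(table, partition_key):
    """Keep requesting pages, passing the tokens back, until none are returned."""
    results, token = [], None
    while True:
        next_pk, next_rk = token if token is not None else (None, None)
        page, token = query_page(table, partition_key, next_pk, next_rk)
        results.extend(page)
        if token is None:
            return results
```

With the sample data, query_all makes three calls (1,000, 1,000, and 500 entities) and stops when no continuation is returned.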