Windows Azure provides the following set of storage
services (collectively referred to as Windows Azure Storage), each of
which is suitable for different types of data access requirements:
Tables provide structured storage, as they do in regular databases. Essentially, each table consists of a set of data entities, each of which contains a set of properties.
Queues provide reliable storage and delivery of messages. They are often used by roles to communicate with each other.
Blobs are used to store large binary objects (files). They provide a simple interface for storing named files along with metadata and include support for the CDN (Content Delivery Network).
Windows Azure Drives provide durable NTFS volumes for Windows Azure applications.
Windows
Azure Storage supplies a managed API and a REST API, both of which
provide essentially the same level of functionality. The managed API is
provided through the Microsoft.WindowsAzure.StorageClient
namespace. For tables, you can also use familiar programming
interfaces, such as ADO.NET Data Services (available in the .NET
Framework 3.5 SP1).
Note that access to storage
is regulated through Windows Azure Storage accounts, each of which uses a
256-bit secret key. There are also storage size limitations. For example,
each storage account has a maximum total storage capacity of 100
terabytes.
Tables
Windows Azure Tables (WATs)
are similar to relational tables insofar as they both are used to store
structured data. However, it’s important to understand that WAT storage
is not a relational database management system for the cloud (that’s
what SQL Azure is for). In other words, there is no support for common
database features, such as joins, aggregates, stored procedures, or
secondary indexes.
WATs were built primarily
to provide scalability, availability, and durability of data.
Individual tables can scale to billions of entities (rows) holding
terabytes of data in total. As application traffic and usage grow, WATs
automatically scale out, potentially to tens, hundreds, or even
thousands of servers. With regard to availability, each WAT is
replicated at least three times.
Entities and Properties
Windows Azure Storage introduces some specific terminology and relationships for WATs:
You create storage accounts, each of which can contain multiple tables.
Data stored within a table is organized into entities. A database row is comparable to an entity.
Each entity contains a set of properties. A database column is comparable to a property.
A table consists of a set of entities, each of which consists of a set of properties.
Each entity contains two key properties that together form the unique ID of the entity in that table. The first key is the PartitionKey, which allows you to group entities together. It tells the Windows Azure Storage system not to split this group up when scaling out the table.
In other words, partition
keys are used to group table entities into partitions that provide a
unit of scale that Windows Azure Storage uses to properly load balance
data. Partition keys also allow you to control the physical locality of
the entity data. Everything within a partition will live on a single
server.
The second key is the RowKey,
which provides uniqueness within a partition: the PartitionKey together
with the RowKey uniquely identifies a given entity and also determines
its sort order. You can think of these two keys as a clustered index
for a table.
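To make the role of the two keys concrete, the following minimal in-memory model stores entities under a (PartitionKey, RowKey) pair and returns them in key order. `InMemoryTable` and its methods are hypothetical illustrations, not the StorageClient or REST API. Both keys are strings, so ordering is lexical. That is why the example zero-pads the numeric part of each RowKey.

```python
from collections import defaultdict
from typing import Any


class InMemoryTable:
    """Hypothetical toy model: entities addressed by (PartitionKey, RowKey)."""

    def __init__(self) -> None:
        # partitions[partition_key][row_key] -> the entity's properties
        self.partitions: dict[str, dict[str, dict[str, Any]]] = defaultdict(dict)

    def insert(self, partition_key: str, row_key: str, **properties: Any) -> None:
        partition = self.partitions[partition_key]
        if row_key in partition:
            # The (PartitionKey, RowKey) pair must be unique within the table.
            raise KeyError(f"entity ({partition_key!r}, {row_key!r}) already exists")
        partition[row_key] = dict(properties)

    def get(self, partition_key: str, row_key: str) -> dict[str, Any]:
        # .get() avoids creating an empty partition on a lookup miss.
        return self.partitions.get(partition_key, {})[row_key]

    def scan(self) -> list[tuple[str, str, dict[str, Any]]]:
        """All entities sorted by PartitionKey, then RowKey (clustered-index order)."""
        return [
            (pk, rk, self.partitions[pk][rk])
            for pk in sorted(self.partitions)
            for rk in sorted(self.partitions[pk])
        ]


table = InMemoryTable()
table.insert("customer-42", "order-0002", total=15.0)
table.insert("customer-42", "order-0001", total=99.5)
table.insert("customer-07", "order-0001", total=12.0)  # same RowKey, other partition: allowed
print([(pk, rk) for pk, rk, _ in table.scan()])
# [('customer-07', 'order-0001'), ('customer-42', 'order-0001'), ('customer-42', 'order-0002')]
```

The same RowKey appears in two partitions without conflict. Uniqueness applies only to the combined pair.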
The third required property is the Timestamp,
a read-only value maintained by the server and used to control
optimistic concurrency. That is, if you try to update an entity that
another program has updated since you read it, your update attempt
fails because of the timestamp mismatch.
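The timestamp check can be sketched as a conditional update: a write succeeds only if the caller presents the same timestamp it read. In this hypothetical in-memory model, an integer counter stands in for the server-assigned Timestamp. The real service performs the comparison itself, and `OptimisticStore` is not part of any Azure API.

```python
from dataclasses import dataclass
from typing import Any

Key = tuple[str, str]  # (PartitionKey, RowKey)


class ConcurrencyError(Exception):
    """Raised when the caller's timestamp no longer matches the stored one."""


@dataclass
class StoredEntity:
    properties: dict[str, Any]
    timestamp: int  # hypothetical: a counter stands in for the server-set Timestamp


class OptimisticStore:
    """Toy model of timestamp-checked updates; not the Table service API."""

    def __init__(self) -> None:
        self._entities: dict[Key, StoredEntity] = {}
        self._clock = 0

    def _next_timestamp(self) -> int:
        # Only the "server" assigns timestamps; callers can read but never set them.
        self._clock += 1
        return self._clock

    def insert(self, key: Key, properties: dict[str, Any]) -> None:
        self._entities[key] = StoredEntity(dict(properties), self._next_timestamp())

    def read(self, key: Key) -> tuple[dict[str, Any], int]:
        entity = self._entities[key]
        return dict(entity.properties), entity.timestamp

    def update(self, key: Key, properties: dict[str, Any], expected_timestamp: int) -> None:
        entity = self._entities[key]
        if entity.timestamp != expected_timestamp:
            # Someone else wrote the entity after this caller read it.
            raise ConcurrencyError(f"{key}: expected {expected_timestamp}, found {entity.timestamp}")
        entity.properties = dict(properties)
        entity.timestamp = self._next_timestamp()


store = OptimisticStore()
key = ("customer-42", "profile")
store.insert(key, {"email": "old@example.com"})

props_a, ts_a = store.read(key)  # client A reads version 1
props_b, ts_b = store.read(key)  # client B reads the same version

store.update(key, {**props_a, "email": "a@example.com"}, ts_a)  # succeeds
try:
    store.update(key, {**props_b, "email": "b@example.com"}, ts_b)  # stale timestamp
    outcome = "updated"
except ConcurrencyError:
    outcome = "conflict"
print(outcome)  # conflict
```

Client B's update is rejected instead of silently overwriting A's change. B would typically re-read the entity and retry.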
Data Access
When working with
entities and properties, you have the full range of regular data
access operations (get, insert, update, delete), in addition to special
features, such as the partial update (merge), the entire update
(replace), and the entity group transaction.
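The difference between merge and replace can be shown with plain dictionaries standing in for an entity's properties. `merge` and `replace` here are hypothetical helpers that model the semantics described above. They do not call the Table service.

```python
from typing import Any


def merge(existing: dict[str, Any], changes: dict[str, Any]) -> dict[str, Any]:
    """Partial update: properties not named in `changes` keep their old values."""
    return {**existing, **changes}


def replace(existing: dict[str, Any], new: dict[str, Any]) -> dict[str, Any]:
    """Entire update: the entity becomes exactly `new`; `existing` is deliberately ignored."""
    return dict(new)


stored = {"name": "Contoso", "city": "Seattle", "tier": "gold"}
merged = merge(stored, {"city": "Redmond"})
replaced = replace(stored, {"city": "Redmond"})
print(merged)    # {'name': 'Contoso', 'city': 'Redmond', 'tier': 'gold'}
print(replaced)  # {'city': 'Redmond'}
```

Merge suits changing one or two properties. Replace suits writing a complete new version of the entity, since any property you omit is dropped.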
Entity group
transactions allow you to atomically perform multiple insert, update,
and delete commands over a set of entities in the same partition as part
of a single transaction.
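An entity group transaction can be modeled as applying a batch of operations to a staged copy and committing only if every operation succeeds. This sketch assumes an in-memory dict keyed by (PartitionKey, RowKey). `apply_entity_group` is a hypothetical name, and the real service enforces additional batch limits that are not modeled here.

```python
import copy
from typing import Any, Optional

Key = tuple[str, str]  # (PartitionKey, RowKey)
Operation = tuple[str, Key, Optional[dict[str, Any]]]  # (op, key, properties)


def apply_entity_group(table: dict[Key, dict[str, Any]], operations: list[Operation]) -> None:
    """Hypothetical: apply insert/update/delete all-or-nothing within one partition."""
    partition_keys = {key[0] for _, key, _ in operations}
    if len(partition_keys) != 1:
        raise ValueError("a batch must contain operations for exactly one PartitionKey")

    staged = copy.deepcopy(table)  # work on a copy so a failure leaves `table` untouched
    for op, key, properties in operations:
        if op == "insert":
            if key in staged:
                raise KeyError(f"{key} already exists")
            staged[key] = dict(properties or {})
        elif op in ("update", "delete"):
            if key not in staged:
                raise KeyError(f"{key} does not exist")
            if op == "update":
                staged[key] = dict(properties or {})
            else:
                del staged[key]
        else:
            raise ValueError(f"unknown operation {op!r}")

    # Commit point: reached only if every operation succeeded.
    table.clear()
    table.update(staged)


orders = {("customer-42", "order-0001"): {"status": "open"}}
apply_entity_group(orders, [
    ("update", ("customer-42", "order-0001"), {"status": "shipped"}),
    ("insert", ("customer-42", "order-0002"), {"status": "open"}),
])

try:
    apply_entity_group(orders, [
        ("delete", ("customer-42", "order-0002"), None),
        ("insert", ("customer-42", "order-0001"), {"status": "dup"}),  # fails: key exists
    ])
except KeyError:
    pass  # the whole batch is rejected, including the delete that came first

print(sorted(orders))  # [('customer-42', 'order-0001'), ('customer-42', 'order-0002')]
```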
Queues
As with traditional
messaging queues, the Windows Azure queues provide a reliable
intermediary mechanism for delivering messages. For example, a common
scenario is to set up a queue as the communication proxy between an
application’s Web role (of which there may be one or two instances) and
its worker roles (of which there can be many instances). For this
scenario you would likely set up at least two queues. The first would
allow the Web role to submit messages for the worker roles to process.
The worker roles would poll the queue for new messages until one is
received. The second queue would then be for the worker roles to
communicate back to the Web role. This architecture allows the Web role
to delegate and spread out resource-intensive work to the worker roles.
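The Web role/worker role pattern above can be sketched in a single process. The sketch uses Python's standard `queue.Queue` and threads as stand-ins for the two Windows Azure queues and the role instances. `web_role`, `worker_role`, and the squaring "work" are hypothetical. A real worker role would poll a storage queue over the network rather than block on an in-process queue.

```python
import queue
import threading

# In-process stand-ins for the two storage queues.
work_queue: queue.Queue = queue.Queue()    # Web role -> worker roles
result_queue: queue.Queue = queue.Queue()  # worker roles -> Web role
STOP = None  # sentinel telling a worker to exit (a simplification for this sketch)


def worker_role(worker_id: int) -> None:
    """Take jobs until told to stop; a blocking get() stands in for polling the queue."""
    while True:
        job = work_queue.get()
        if job is STOP:
            break
        # Squaring stands in for resource-intensive work.
        result_queue.put((job, job * job, worker_id))


def web_role(jobs, worker_count: int = 3) -> dict:
    """Delegate jobs through one queue and collect results from the other."""
    workers = [threading.Thread(target=worker_role, args=(i,)) for i in range(worker_count)]
    for w in workers:
        w.start()
    for job in jobs:
        work_queue.put(job)
    for _ in workers:
        work_queue.put(STOP)  # one sentinel per worker, queued after all real jobs
    for w in workers:
        w.join()
    results = {}
    while not result_queue.empty():
        job, value, _worker_id = result_queue.get()
        results[job] = value
    return dict(sorted(results.items()))


print(web_role(range(5)))  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```

Because the Web role only enqueues and collects, you can add worker threads (or, in the real system, worker role instances) without changing its code.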
As with tables,
queues are scoped by the storage account that you create. An account can
have many queues, each of which can hold an unlimited number of messages.
Also, dequeue counts are tracked, allowing you to determine how often a
given message has been dequeued by a worker process.
Queues offer a range of
data access functions, including the ability to create, delete, list,
and get/set queue metadata. Additionally, you can add (enqueue) and
retrieve (dequeue) sets of messages, and delete and “peek” at messages
individually.
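The queue operations above can be modeled in memory to show how the dequeue count evolves. This is a minimal sketch, not the Windows Azure API. `ToyQueue`, its methods, and `release` are hypothetical. In the real service, a dequeued message is hidden rather than removed, and it reappears if it is not deleted within a visibility timeout. Here, `release` stands in for that timeout expiring.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    message_id: int
    body: str
    dequeue_count: int = 0
    visible: bool = True


class ToyQueue:
    """Hypothetical in-memory queue with peek, dequeue counts, and explicit delete."""

    def __init__(self) -> None:
        self._messages: list[Message] = []
        self._next_id = itertools.count(1)

    def enqueue(self, body: str) -> int:
        message = Message(next(self._next_id), body)
        self._messages.append(message)
        return message.message_id

    def peek(self) -> Optional[Message]:
        """Look at the next visible message without dequeuing it (count unchanged)."""
        return next((m for m in self._messages if m.visible), None)

    def dequeue(self) -> Optional[Message]:
        """Hide the next visible message and increment its dequeue count."""
        message = self.peek()
        if message is not None:
            message.visible = False
            message.dequeue_count += 1
        return message

    def release(self, message_id: int) -> None:
        """Hypothetical: make an undeleted message visible again (models a timeout)."""
        for m in self._messages:
            if m.message_id == message_id:
                m.visible = True

    def delete(self, message_id: int) -> None:
        """Permanently remove a message once it has been processed."""
        self._messages = [m for m in self._messages if m.message_id != message_id]


q = ToyQueue()
q.enqueue("resize image 17")
q.enqueue("resize image 18")

first = q.dequeue()          # a worker takes message 1 ...
q.release(first.message_id)  # ... but fails before deleting it
retry = q.dequeue()          # message 1 is handed out again
print(retry.body, retry.dequeue_count)  # resize image 17 2
q.delete(retry.message_id)   # processed successfully this time
print(q.peek().body)         # resize image 18
```

Because the count travels with the message, a worker could, for example, set aside any message whose `dequeue_count` exceeds some threshold instead of retrying it forever. That policy is an application design choice, not something the queue enforces.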
Blobs
Each storage account can
have containers that are used to store blobs. You can create any number
of containers, as long as their contents fit within your storage
account's capacity limit.
Containers have the ability
to set public or private access policies. The private access level will
only allow access to consumers that have been given permission. Public
access allows any consumer to interact with the container’s blobs using a
URL. You can also have container metadata, which, like blob metadata,
is stored in name-value pairs.
You have two choices for
the type of blob that you can use: block and page. Both types have
characteristics that make them applicable to specific requirements.
Block Blobs
A block blob is
primarily geared towards streaming media files. Each blob is organized
into a sequential list of “blocks” that can be created and uploaded out
of order and in parallel for increased performance. Once uploaded, each
block is in an uncommitted state, meaning that you cannot access the
blob until its blocks are committed. To commit the blocks as well as
define the correct block order, you use the PutBlockList command.
Each block is immutable and is
identified by a block ID. After you have successfully uploaded a
block, that block (identified by its block ID) cannot be changed. This
also means that if you have modified a block's data on-premises, you
need to re-upload the entire block with the same block ID and then
commit the block list again.
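The upload-then-commit flow for block blobs can be modeled in memory. `ToyBlockBlob`, `put_block`, and `put_block_list` are hypothetical names that mirror the block upload and PutBlockList steps. They do not talk to the service. One detail does come from the service: block IDs are Base64-encoded strings, and all IDs in a blob must have the same length. For that reason, `block_id` zero-pads an index before encoding it. When committing, the model prefers a newly uploaded block over an already committed one with the same ID, which is a simplification of the service's options.

```python
import base64


def block_id(index: int) -> str:
    """Base64-encode a zero-padded index so every block ID has the same length."""
    return base64.b64encode(f"block-{index:06d}".encode()).decode()


class ToyBlockBlob:
    """Hypothetical in-memory model of block upload and commit."""

    def __init__(self) -> None:
        self._uncommitted: dict[str, bytes] = {}
        self._committed: dict[str, bytes] = {}
        self._order: list[str] = []

    def put_block(self, bid: str, data: bytes) -> None:
        # Blocks may arrive in any order; they are not readable until committed.
        self._uncommitted[bid] = data

    def put_block_list(self, block_ids: list[str]) -> None:
        # The list fixes the final order. For each ID, use a newly uploaded block
        # if there is one, otherwise keep the already committed block.
        new_committed: dict[str, bytes] = {}
        for bid in block_ids:
            if bid in self._uncommitted:
                new_committed[bid] = self._uncommitted[bid]
            elif bid in self._committed:
                new_committed[bid] = self._committed[bid]
            else:
                raise KeyError(f"unknown block ID {bid}")
        self._committed = new_committed
        self._order = list(block_ids)
        self._uncommitted.clear()  # uncommitted leftovers are discarded

    def read(self) -> bytes:
        return b"".join(self._committed[bid] for bid in self._order)


blob = ToyBlockBlob()
chunks = [b"Hello, ", b"block ", b"blobs!"]
ids = [block_id(i) for i in range(len(chunks))]

for i in (2, 0, 1):  # upload out of order (in practice these could run in parallel)
    blob.put_block(ids[i], chunks[i])
print(blob.read())  # b'' (nothing is visible before the commit)

blob.put_block_list(ids)
print(blob.read())  # b'Hello, block blobs!'

# To change a block, re-upload it under the same ID and commit the list again.
blob.put_block(ids[1], b"BLOCK ")
blob.put_block_list(ids)
print(blob.read())  # b'Hello, BLOCK blobs!'
```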
Blobs can be accessed through a
REST API that provides standard data access operations, as well as
special functions, such as CopyBlob, which copies an existing blob to a new blob name.
Page Blobs
Page
blobs are suitable for random I/O operations. With this kind of blob,
you must first pre-allocate space (up to 1TB), and the blob is divided
into 512-byte “pages.” To access or update a page, you address it
using a byte offset aligned to a 512-byte page boundary. Another key
difference from block blobs is that writes to a page blob take effect
immediately; there is no separate commit step.
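The 512-byte page rule reduces to simple arithmetic. The helpers below are hypothetical and not part of any Azure SDK. They check that a write is page-aligned, report which page indexes it touches, and round a size up to a whole number of pages.

```python
PAGE_SIZE = 512  # bytes per page


def page_range(offset: int, length: int) -> tuple[int, int]:
    """Return the (first, last) page indexes touched by `length` bytes at `offset`."""
    if offset < 0 or length <= 0 or offset % PAGE_SIZE or length % PAGE_SIZE:
        raise ValueError("offset and length must be multiples of 512 (length > 0)")
    first = offset // PAGE_SIZE
    last = (offset + length) // PAGE_SIZE - 1
    return first, last


def round_up_to_page(size: int) -> int:
    """Smallest multiple of PAGE_SIZE that can hold `size` bytes (e.g. for pre-allocation)."""
    return -(-size // PAGE_SIZE) * PAGE_SIZE


print(page_range(1024, 1536))  # (2, 4): pages 2, 3, and 4
print(round_up_to_page(1000))  # 1024
print(round_up_to_page(1024))  # 1024
```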
You can expand the blob size
at any point by increasing its maximum size setting. You can also
shrink the blob by truncating pages. You can update pages in one of
two ways: PutPage or ClearPage. With PutPage, you specify the payload
and the range of pages to write, whereas ClearPage zeroes out a page
range, up to the entire blob. Several other commands are also
available for working with page blobs.
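The page operations described above can be modeled with a pre-allocated `bytearray`. `ToyPageBlob` and its methods are hypothetical names chosen to echo PutPage and ClearPage. They form an in-memory sketch, not calls into the storage service.

```python
PAGE_SIZE = 512  # bytes per page


class ToyPageBlob:
    """Hypothetical in-memory page blob: pre-allocated, page-aligned, immediate writes."""

    def __init__(self, max_size: int) -> None:
        self._check_aligned(max_size)
        self.data = bytearray(max_size)  # pre-allocated space, initially all zeros

    @staticmethod
    def _check_aligned(*values: int) -> None:
        if any(v % PAGE_SIZE for v in values):
            raise ValueError("sizes and offsets must be multiples of 512 bytes")

    def _check_bounds(self, offset: int, length: int) -> None:
        if offset < 0 or offset + length > len(self.data):
            raise ValueError("range extends past the blob's current size")

    def put_page(self, offset: int, payload: bytes) -> None:
        """Write a payload over whole pages; the change is visible immediately."""
        self._check_aligned(offset, len(payload))
        self._check_bounds(offset, len(payload))
        self.data[offset:offset + len(payload)] = payload

    def clear_page(self, offset: int, length: int) -> None:
        """Zero out a page range (up to the whole blob)."""
        self._check_aligned(offset, length)
        self._check_bounds(offset, length)
        self.data[offset:offset + length] = bytes(length)

    def resize(self, new_size: int) -> None:
        """Grow by appending zeroed pages, or shrink by truncating trailing pages."""
        self._check_aligned(new_size)
        if new_size >= len(self.data):
            self.data.extend(bytes(new_size - len(self.data)))
        else:
            del self.data[new_size:]


blob = ToyPageBlob(4 * PAGE_SIZE)
blob.put_page(PAGE_SIZE, b"x" * PAGE_SIZE)      # write page 1
blob.put_page(3 * PAGE_SIZE, b"y" * PAGE_SIZE)  # write page 3
blob.clear_page(0, 2 * PAGE_SIZE)               # zero pages 0-1
print(blob.data[PAGE_SIZE:PAGE_SIZE + 3])          # bytearray(b'\x00\x00\x00')
print(blob.data[3 * PAGE_SIZE:3 * PAGE_SIZE + 3])  # bytearray(b'yyy')
blob.resize(8 * PAGE_SIZE)
print(len(blob.data))  # 4096
```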
Windows Azure Drive
Windows Azure Drive is a
storage service that provides a durable NTFS volume for Windows Azure
applications. An application must mount the volume before using it
and unmount it when done. Throughout this period, the volume data is
kept intact, even if the application crashes.
A Windows Azure Drive
volume is actually a page blob. Specifically, it exists as a page blob
that has been formatted as a single-volume NTFS virtual hard drive
(VHD). Because a page blob can be at most 1TB, these drives can also be
up to 1TB in size.