Cloud Services with Windows Azure : Windows Azure Storage

4/16/2011 3:35:51 PM

Windows Azure provides the following set of storage services (collectively referred to as Windows Azure Storage), each of which is suitable for different types of data access requirements:

Tables provide structured storage, as they do in regular databases. Essentially, each table consists of a set of data entities that each contain a set of properties.
Queues provide reliable storage and delivery of messages. Roles often use them to communicate with each other.
Blobs are used to store large binary objects (files). They provide a simple interface for storing named files along with metadata and include support for CDN (Content Delivery Network).
Windows Azure Drives provide durable NTFS volumes for Windows Azure applications.

Windows Azure Storage supplies a managed API and a REST API, which provide essentially the same level of functionality. The managed API is provided through the Microsoft.WindowsAzure.StorageClient namespace. To interact with the storage services, you can also use familiar programming interfaces, such as ADO.NET Data Services (available in the .NET Framework version 3.5 SP1).

Note that access to storage is regulated via Windows Azure Storage accounts that use 256-bit secret keys. There are also some storage size limitations. For example, each storage account has a maximum of 100 terabytes of total storage capacity.

Tables

Windows Azure Tables (WATs) are similar to relational tables insofar as both are used to store structured data. However, it’s important to understand that WAT storage is not a relational database management system for the cloud (that’s what SQL Azure is for). In other words, there is no support for common database features, such as joins, aggregates, stored procedures, or secondary indexes.

WATs were built primarily for scalability, availability, and durability of data. Individual tables can be scaled to billions of entities (rows) with data totaling terabytes. As application traffic and usage grow, WATs automatically scale out across tens, hundreds, or potentially thousands of servers. With regard to availability, each WAT is replicated at least three times.

Entities and Properties

Windows Azure Storage introduces some specific terminology and relationships for WATs:

You create a storage account, each of which can have multiple tables.
Data stored within a table is organized into entities. A database row is comparable to an entity.
Each entity contains a set of properties. A database column is comparable to a property.
A table consists of a set of entities, each of which consists of a set of properties.

Each entity contains two key properties that together form the unique ID of the entity in that table. The first key is the PartitionKey, which allows you to group entities together. This tells the Windows Azure Storage system to not split this group up when scaling out the table.

In other words, partition keys are used to group table entities into partitions that provide a unit of scale that Windows Azure Storage uses to properly load balance data. Partition keys also allow you to control the physical locality of the entity data. Everything within a partition will live on a single server.

The second key is the RowKey, which provides uniqueness within a partition. Together, the PartitionKey and the RowKey uniquely identify a given entity and determine its sort order. You can think of these two keys as a clustered index for a table.
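The key scheme described above can be sketched with a small in-memory model. This is only an illustration in Python, not the Windows Azure Storage API: the Table class, its method names, and the sample keys are all hypothetical.

```python
from bisect import insort


class Table:
    """Toy in-memory table; NOT the Windows Azure Storage API."""

    def __init__(self):
        self._entities = {}  # (PartitionKey, RowKey) -> dict of properties
        self._keys = []      # the same keys, kept sorted like a clustered index

    def insert(self, partition_key, row_key, **properties):
        key = (partition_key, row_key)
        if key in self._entities:
            # The key pair is the entity's unique ID within the table.
            raise KeyError(f"entity {key} already exists")
        self._entities[key] = dict(properties)
        insort(self._keys, key)

    def get(self, partition_key, row_key):
        return self._entities[(partition_key, row_key)]

    def partition(self, partition_key):
        """Return the RowKeys of one partition, in key order."""
        return [rk for pk, rk in self._keys if pk == partition_key]


orders = Table()
orders.insert("customer-42", "order-0002", total=15.0)
orders.insert("customer-42", "order-0001", total=99.5)
orders.insert("customer-07", "order-0001", total=3.25)
```

The same RowKey ("order-0001") appears in two partitions without conflict, because only the pair has to be unique. Entities that share a PartitionKey come back together and in RowKey order.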

The third required attribute is the Timestamp, which is a read-only attribute used to control optimistic concurrency. That is, if you try to update a row that another program has already updated, your update attempt will fail because of the timestamp mismatch.
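The timestamp check can be illustrated with a minimal sketch. It assumes a simple counter in place of a real server-side timestamp, and the VersionedStore class and ConcurrencyError exception are hypothetical names rather than part of any Azure library.

```python
import itertools


class ConcurrencyError(Exception):
    """Raised when the caller's timestamp no longer matches the stored one."""


class VersionedStore:
    """Toy store that rejects updates based on a stale timestamp."""

    def __init__(self):
        self._rows = {}                   # key -> (timestamp, value)
        self._clock = itertools.count(1)  # stand-in for a server timestamp

    def insert(self, key, value):
        self._rows[key] = (next(self._clock), value)

    def read(self, key):
        return self._rows[key]            # (timestamp, value)

    def update(self, key, value, expected_stamp):
        current_stamp, _ = self._rows[key]
        if current_stamp != expected_stamp:
            raise ConcurrencyError(f"{key} was changed by someone else")
        self._rows[key] = (next(self._clock), value)


store = VersionedStore()
store.insert("row-1", "initial")

# Two programs read the same row and remember its timestamp.
stamp_a, _ = store.read("row-1")
stamp_b, _ = store.read("row-1")

store.update("row-1", "from A", stamp_a)      # succeeds, timestamp advances
try:
    store.update("row-1", "from B", stamp_b)  # stale timestamp
    second_update_failed = False
except ConcurrencyError:
    second_update_failed = True
```

The second writer would typically re-read the row, reapply its change, and try again.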

Data Access

When interacting with entities and properties, you are provided the full range of regular data access functions (get, insert, update, delete), in addition to special features, such as the partial update (merge), the entire update (replace), and the entity group transaction.

Entity group transactions allow you to atomically perform multiple insert, update, and delete commands over a set of entities in the same partition as part of a single transaction.
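The all-or-nothing behavior and the single-partition restriction can be sketched as follows. The apply_batch function, the tuple layout of the operations, and the BatchError exception are assumptions made for illustration; the real service defines its own batch format.

```python
import copy


class BatchError(Exception):
    """Raised when a batch cannot be applied; the table is left untouched."""


def apply_batch(table, operations):
    """Apply every operation or none of them.

    table:      dict mapping (partition_key, row_key) -> properties
    operations: list of (action, partition_key, row_key, properties)
    """
    partitions = {pk for _, pk, _, _ in operations}
    if len(partitions) > 1:
        raise BatchError("all entities in a batch must share one PartitionKey")

    working = copy.deepcopy(table)        # changes go to a scratch copy first
    for action, pk, rk, props in operations:
        key = (pk, rk)
        if action == "insert":
            if key in working:
                raise BatchError(f"{key} already exists")
            working[key] = dict(props)
        elif action == "update":
            if key not in working:
                raise BatchError(f"{key} does not exist")
            working[key] = dict(props)
        elif action == "delete":
            if key not in working:
                raise BatchError(f"{key} does not exist")
            del working[key]
        else:
            raise BatchError(f"unknown action {action!r}")

    table.clear()                         # reached only if every step succeeded
    table.update(working)


table = {("p1", "a"): {"n": 1}}
apply_batch(table, [("insert", "p1", "b", {"n": 2}),
                    ("update", "p1", "a", {"n": 10})])

try:
    # The second delete fails, so the first one must not take effect either.
    apply_batch(table, [("delete", "p1", "a", None),
                        ("delete", "p1", "missing", None)])
except BatchError:
    pass
```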

Queues

As with traditional messaging queues, the Windows Azure queues provide a reliable intermediary mechanism for delivering messages. For example, a common scenario is to set up a queue as the communication proxy between an application’s Web role (of which there may be one or two instances) and its worker roles (of which there can be many instances). For this scenario you would likely set up at least two queues. The first would allow the Web role to submit messages for the worker roles to process. The worker roles would poll the queue for new messages until one is received. The second queue would then be for the worker roles to communicate back to the Web role. This architecture allows the Web role to delegate and spread out resource-intensive work to the worker roles.
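The two-queue arrangement can be sketched locally with Python's standard library, using threads in place of role instances and queue.Queue in place of Windows Azure queues. The function names, the STOP sentinel, and the squaring "work" are illustrative assumptions. A blocking get stands in for the polling loop described above.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker to shut down


def worker_role(work_queue, result_queue):
    """Take messages off the work queue until told to stop."""
    while True:
        message = work_queue.get()   # blocks until a message arrives
        if message is STOP:
            break
        # Stand-in for resource-intensive work; reply on the second queue.
        result_queue.put((message, message * message))


def web_role(jobs, worker_count=3):
    """Submit jobs on one queue and collect replies from the other."""
    work_queue = queue.Queue()
    result_queue = queue.Queue()
    workers = [
        threading.Thread(target=worker_role, args=(work_queue, result_queue))
        for _ in range(worker_count)
    ]
    for worker in workers:
        worker.start()
    for job in jobs:
        work_queue.put(job)
    for _ in workers:
        work_queue.put(STOP)         # one sentinel per worker
    for worker in workers:
        worker.join()

    results = {}
    while not result_queue.empty():
        job, answer = result_queue.get()
        results[job] = answer
    return results


squares = web_role([1, 2, 3, 4])
```

The Web role never calls a worker directly, so worker instances can be added or removed without changing the Web role's code.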

As with tables, queues are scoped by the storage account that you create. An account can have many queues, each of which can contain an unlimited number of messages. Dequeue counts are also tracked, allowing you to determine how often a given message has been dequeued by a worker process.

Queues offer a range of data access functions, including the ability to create, delete, and list queues and to get and set queue metadata. Additionally, you can add (enqueue) and retrieve (dequeue) sets of messages, and delete and “peek” at messages individually.
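The message operations listed above, together with dequeue counting, can be modeled in a few lines. ToyQueue and its methods are hypothetical names. The sketch also simplifies one behavior: the real service hides a dequeued message for a visibility timeout before it can be retrieved again, which is omitted here so that repeated dequeues are easy to see.

```python
import itertools
from collections import OrderedDict


class ToyQueue:
    """In-memory model of enqueue, dequeue, peek, and delete."""

    def __init__(self):
        self._messages = OrderedDict()  # message ID -> [body, dequeue count]
        self._ids = itertools.count(1)

    def __len__(self):
        return len(self._messages)

    def enqueue(self, body):
        message_id = next(self._ids)
        self._messages[message_id] = [body, 0]
        return message_id

    def peek(self):
        """Look at the front message without counting it as a dequeue."""
        if not self._messages:
            return None
        message_id, (body, count) = next(iter(self._messages.items()))
        return message_id, body, count

    def dequeue(self):
        """Retrieve the front message; it stays queued until deleted."""
        if not self._messages:
            return None
        message_id, entry = next(iter(self._messages.items()))
        entry[1] += 1
        return message_id, entry[0], entry[1]

    def delete(self, message_id):
        del self._messages[message_id]


q = ToyQueue()
message_id = q.enqueue("resize image 17")

q.dequeue()                           # first attempt, worker crashes
_, _, count = q.dequeue()             # second attempt
_, _, count_after_peek = q.peek()     # peeking does not change the count

if count >= 2:
    q.delete(message_id)              # give up on a message that keeps failing
```

A worker can use the count in this way to spot a message that repeatedly fails processing and set it aside.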

Blobs

Each storage account can have containers that can be used to store blobs. There is no limit to the number of containers that you can have as long as they will fit into your storage account limit.

Containers can be assigned public or private access policies. The private access level only allows access to consumers that have been given permission. Public access allows any consumer to interact with the container’s blobs using a URL. You can also have container metadata, which, like blob metadata, is stored in name-value pairs.

You have two choices for the type of blob that you can use: block and page. Both types have characteristics that make them applicable to specific requirements.

Block Blobs

A block blob is primarily geared towards streaming media files. Each blob is organized into a sequential list of “blocks” that can be created and uploaded out of order and in parallel for increased performance. Once uploaded, each block is in an uncommitted state, meaning that you cannot access the blob until its blocks are committed. To commit the blocks as well as define the correct block order, you use the PutBlockList command.
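The upload-then-commit sequence can be sketched with a toy model. ToyBlockBlob and its method names are assumptions made for illustration and do not reproduce the REST or managed API.

```python
class ToyBlockBlob:
    """In-memory model of staging blocks and committing them in order."""

    def __init__(self):
        self._uncommitted = {}  # block ID -> bytes, in no particular order
        self._committed = None  # blob content after a successful commit

    def put_block(self, block_id, data):
        """Stage one block; the blob itself is not readable yet."""
        self._uncommitted[block_id] = bytes(data)

    def commit_block_list(self, block_ids):
        """Commit staged blocks; the list defines the final block order."""
        missing = [b for b in block_ids if b not in self._uncommitted]
        if missing:
            raise KeyError(f"blocks were never uploaded: {missing}")
        self._committed = b"".join(self._uncommitted[b] for b in block_ids)
        self._uncommitted.clear()

    def read(self):
        if self._committed is None:
            raise RuntimeError("blob has no committed blocks yet")
        return self._committed


blob = ToyBlockBlob()

# Blocks may arrive in any order (or in parallel from several uploaders).
blob.put_block("b3", b"!")
blob.put_block("b1", b"Hello, ")
blob.put_block("b2", b"world")

try:
    blob.read()
    readable_before_commit = True
except RuntimeError:
    readable_before_commit = False

blob.commit_block_list(["b1", "b2", "b3"])
content = blob.read()
```

The upload order does not matter. Only the list passed at commit time decides how the blocks are assembled.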

Each block is immutable and is identified by a block ID. After you have successfully uploaded a block, that block cannot be changed. This also means that if you have updated a block’s data on-premises, you will need to re-upload or copy the entire block with the same block ID.

Blobs can be accessed via a REST API that provides standard data access operations, as well as special functions such as CopyBlob, which allows you to copy an existing blob to a new blob name.

Page Blobs

Page blobs are suitable for random I/O operations. With this kind of blob, you must first pre-allocate space (up to 1TB), wherein the blob is divided into 512-byte “pages.” To access or update a page, you must address it using a byte offset. Another key difference is that changes to page blobs are immediate.

You can expand the blob size at any point by increasing its maximum size setting. You are also allowed to shrink the blob by truncating pages. You can update a page in one of two ways: PutPage or ClearPage. With PutPage, you specify the payload and the range of pages, whereas ClearPage basically zeroes out a page range up to the entire blob. There are several other commands that can be used to work with page blobs.
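Page addressing, clearing, and resizing can be modeled as below. ToyPageBlob and its method names are hypothetical. The sketch also assumes that offsets and lengths must fall on 512-byte page boundaries, and the sizes used are tiny so that the example runs instantly.

```python
PAGE_SIZE = 512


class ToyPageBlob:
    """In-memory model of a pre-allocated blob addressed by byte offset."""

    def __init__(self, size):
        self._require_aligned(size)
        self._data = bytearray(size)          # pre-allocated, zero-filled

    @staticmethod
    def _require_aligned(*values):
        if any(v < 0 or v % PAGE_SIZE for v in values):
            raise ValueError("values must be non-negative multiples of 512")

    def _check_range(self, offset, length):
        self._require_aligned(offset, length)
        if offset + length > len(self._data):
            raise ValueError("range is outside the allocated blob")

    def __len__(self):
        return len(self._data)

    def put_page(self, offset, payload):
        """Write whole pages starting at a byte offset; takes effect at once."""
        self._check_range(offset, len(payload))
        self._data[offset:offset + len(payload)] = payload

    def clear_page(self, offset, length):
        """Zero out a page range."""
        self._check_range(offset, length)
        self._data[offset:offset + length] = bytes(length)

    def read(self, offset, length):
        return bytes(self._data[offset:offset + length])

    def resize(self, new_size):
        """Grow with zeroed pages, or shrink by truncating pages."""
        self._require_aligned(new_size)
        if new_size < len(self._data):
            del self._data[new_size:]
        else:
            self._data.extend(bytes(new_size - len(self._data)))


blob = ToyPageBlob(4 * PAGE_SIZE)
blob.put_page(PAGE_SIZE, b"\x01" * (2 * PAGE_SIZE))  # fill pages 1 and 2
blob.clear_page(2 * PAGE_SIZE, PAGE_SIZE)            # zero page 2 again
blob.resize(2 * PAGE_SIZE)                           # truncate to two pages
```

Unlike the block blob model, there is no commit step: each write is visible as soon as it returns.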

Windows Azure Drive

Windows Azure Drive is a storage service that provides a durable NTFS volume for Windows Azure applications. An application needs to mount the volume prior to using it and, when done, the application then unmounts the same volume. Throughout this period, the volume data is kept intact, even if the application should crash.

A Windows Azure Drive volume is actually a page blob. Specifically, it exists as a page blob that has been formatted as an NTFS single volume virtual hard drive (VHD). As such, these drives can be up to 1TB in size.
