2. The BLOB service approach to file management
As we discovered earlier, the BLOB storage service is the Windows Azure solution for providing file storage. Let’s take a look at how Azure implements this service.
An API-Based Service
Rather than building a native
network-share-based solution, Microsoft has provided a set of REST-based
APIs that allow you to interact with all the storage services over the
HTTP stack, using a standard HTTP request. As mentioned earlier, not
only can you use these APIs inside the data center, but you can also use
them outside the data center.
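To make “a standard HTTP request” concrete, here is a minimal Python sketch that reads a blob with nothing more than an HTTP GET. The account, container, and blob names are invented for illustration, and the container is assumed to allow anonymous (public) read access; authenticated requests carry an additional signed Authorization header.

    import urllib.request

    # Hypothetical account, container, and blob names; the container is assumed
    # to permit anonymous reads, so no Shared Key signature is attached.
    blob_url = "http://myaccount.blob.core.windows.net/photos/logo.png"

    with urllib.request.urlopen(blob_url) as response:
        data = response.read()              # the raw bytes of the blob
        print(response.status, len(data))   # e.g. 200 and the blob's size in bytes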
Note
Although you can upload
and download files outside the data center, you’ll be subject to
internet speed; it might take you a few hours to upload or download
gigabytes of data. Within the data center, you can copy gigabytes of
data between BLOB storage and a worker or web role in seconds. This
massive speed difference is the result of the co-location of the storage
service and the roles.
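To put rough numbers on it: assuming a 10 Mbps internet connection, a 10 GB transfer works out to roughly 80,000 megabits ÷ 10 Mbps ≈ 8,000 seconds, or a little over two hours over the internet, versus seconds over the data center’s internal network.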
Scalability
Using HTTP as the underlying transport layer means that Windows Azure can leverage its own web role infrastructure to host the storage services. Because the Windows Azure storage service runs on that infrastructure (with tens of thousands of instances), you can be confident that your application will be able to scale to that level. Figure 4 shows the abstraction of web instances for the BLOB storage service.
Because BLOB storage is built on the web role infrastructure, it can also harness the advantages of utility computing. As demand for the storage services increases, Microsoft can ramp up the number of instances just as it can for any other web role. You don’t need to worry about the scalability of any of the storage services (unless Microsoft runs out of pennies).
Disk Storage
Just as there are
thousands of racks of machines used to host the web and worker roles,
there are just as many disk arrays storing your data! Microsoft can grow the storage in the data center by adding more disks as and when required. This level of enterprise-class storage means that you
never need to worry about capacity or scale. Think of the BLOB service
as a giant virtual hard disk that will always scale up to meet your
demands and never run out of space.
Data Consistency with Replication
Like the DFS solution, Windows Azure BLOB storage is also a replicated solution (to be honest, it has to be to achieve such massive scale). Although the BLOB service is quite similar to the Amazon Simple Storage Service (Amazon S3), replication is one of the areas in which it differs.
With Amazon S3, there’s no
consistency of data throughout the data center. If you upload a file to
Amazon S3 and then request that same file, it’s likely that a different
server will process that request. As a result of replication latency, the file probably won’t be available on that server yet, because the data won’t have been copied from the original server. Amazon S3
suffers from the same issues seen with DFS.
This issue of replication
latency can never occur in Windows Azure storage services. Windows Azure
guarantees a consistent view of your data across all instances that
might serve your requests. Internally, inside the Windows Azure storage
services, data is replicated throughout the data center as soon as it’s
written to your storage account. Every piece of data must be replicated
at least three times as part of the commit process.
As your data is being
replicated across the various disks in Windows Azure, the fabric controller (FC) keeps track
of which instances can access the latest version of your data. The load
balancer will route requests only to an instance that can access the
latest version, ensuring that stale data is never served.
Even if a disk
failure occurs immediately after the upload, there won’t be any data
loss; other disks are guaranteed to receive a copy of that data.
So far we’ve talked about how
BLOB storage solves the problems of scalability and fault tolerance,
but we haven’t talked about performance. Surely performance is going to
suffer; it’s effectively a REST-based web service, after all.
Performance
Sure, the performance of BLOB storage isn’t all that great in comparison to a SAN or direct-attached storage (DAS). Ultimately, the tradeoff among performance, fault tolerance, and scalability means that some raw performance is sacrificed. Within the data center, however, it’s generally good enough. Because the service is ultimately a load-balanced web server, you can expect 50 to 100 milliseconds of latency between your role and the storage service. Although the latency is poor, the network connection is fast, so overall performance is usually good enough. Sure, you wouldn’t let an application that needs to write to disk very quickly (SQL Server, for example) write directly to BLOB storage, but not all applications need that kind of speed.
If you do need that level of
speed, you can always cache files locally on your role using local
storage. This technique will usually give you more acceptable
performance for your application. In fact, this is exactly what the
Azure Drive (originally called X-Drive) feature uses to ensure
performance.
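Here’s a rough Python sketch of that caching pattern. The cache directory and container URL are placeholders (a real role would resolve the path from its configured local storage resource, and the container is assumed to allow anonymous reads), and it ignores cache expiry and invalidation for simplicity.

    import os
    import urllib.request

    CACHE_DIR = r"C:\Resources\LocalStorage\BlobCache"               # assumed local storage path
    CONTAINER_URL = "http://myaccount.blob.core.windows.net/photos"  # made-up public container

    def get_file(blob_name):
        """Return a local path for the blob, downloading it only on a cache miss."""
        local_path = os.path.join(CACHE_DIR, blob_name)
        if not os.path.exists(local_path):
            os.makedirs(CACHE_DIR, exist_ok=True)
            with urllib.request.urlopen(f"{CONTAINER_URL}/{blob_name}") as response:
                with open(local_path, "wb") as f:
                    f.write(response.read())    # one slow fetch from BLOB storage
        return local_path                        # subsequent reads hit the fast local disk

The first call pays the 50 to 100 millisecond round trip to BLOB storage; every call after that is served from the role’s local disk.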
Although the REST API is
flexible and provides great scale, it’s no substitute for a good old
filesystem. To make life a little easier for those bits of code that are
used to talking to directories and files rather than to a web service,
Microsoft has provided a new feature called Azure Drive. Azure Drive
allows you to mount BLOB storage as a New Technology File System (NTFS)
drive, which lets you access BLOB storage just like any other drive.
Because this feature is implemented using a special OS driver developed specifically for Windows Azure, it’s available only to your roles; you can’t use it outside the data center.
As cool as Azure Drive is,
it allows only one instance of a role to read and write to the Azure
Drive. Multiple role instances can mount the same Azure Drive, but only
in a read-only mode, and only against a snapshot of the drive itself.
Now that we’ve looked at how
BLOB storage handles the issues that arise in traditional on-premises
solutions, it’s worth looking at BLOB storage from a data management
perspective.
Management
One of the most
compelling arguments for using the Windows Azure storage services is
that professional IT management skills aren’t required. In traditional
systems, a large investment in IT management skills is usually needed to
support storage. Management of the storage arrays usually requires
expensive specialists who are capable of supporting the data, such as
SAN experts, network specialists, technicians, administrators, and DBAs.
To plan such a system, these
experts need to be able to design and implement the infrastructure,
taking disk management, fault tolerance, networking, lights-out
operation, and data distribution into consideration. The day-to-day
running of the system includes hardware
replacement, managing backups, optimizing infrastructure, health
monitoring, and data cleansing, among other endless tasks.
With Windows Azure, you can let
Microsoft manage the storage systems and concentrate on using the
system via familiar developer APIs. You can focus on your core skill
set, which is building software.