Many administrators of
information technology (IT) systems are all too familiar with that
famous axiom known as Murphy’s Law, which says, “If anything can go
wrong, it will.” Although it may sound fatalistic, having the
expectation that one day down the road a mishap of one kind or another
will happen to your SharePoint environment is an important perspective
to maintain when designing and creating your organization’s disaster
recovery plan. This isn’t something you should generate for the sake of
crossing an item off your To-Do list or checking a check box in a survey
or audit. An effective disaster recovery plan gives you a resource you
can use in all situations, regardless of scope or importance. By not
losing sight of the fact that this strategy is going to be used and not
just gather dust somewhere, you are drastically improving your chances
for a successful recovery of your business’s crucial SharePoint systems
and data when the chips are down.
Defining Scope
It’s impossible to plan how
you will recover your system in the event of an outage or disaster
without understanding what your system is composed of and what its
critical components are. For many complex environments, it simply isn’t
feasible to attempt to fully restore every server, application, or
database at the same time; trying to do so would add hours, days, or
even weeks to the time it would take
to complete this vital restoration activity. That is why the first step
you must take when developing your disaster recovery plan is to define
its scope and to evaluate and select the essential parts of your system
that must be restored in the event of a disaster.
Note
It’s assumed that you’re
not designing and developing your SharePoint environment’s disaster recovery plan
on your own, or only from an IT perspective. A disaster recovery strategy is simply part of a
larger business continuity plan (BCP) that’s driven primarily by
business stakeholders and the cost that is tied to outages in a
SharePoint environment. Although you, as an administrator, know what
infrastructure components you need to have in place to restore your
environment, your users are the ones who should determine which sites
are business critical, what content should be preserved at all costs,
and what the acceptable levels of downtime are for these items. The
results of a business impact analysis (BIA) serve as the primary guide
when constructing your disaster recovery plan.
What Are Recovery Targets?
Recovery targets are the critical functions and data of your
SharePoint environment that need to be restored following the
declaration of a disaster. Seems pretty straightforward, doesn’t it?
Well, thanks in part to the complex and modular nature of a SharePoint
environment, that is not always the case.
Recovery targets are
important because they not only identify the parts of your system
that need to be acknowledged and addressed in some way as a part of your
disaster recovery plan, but they are also the functions and data that must
be restored or replaced as part of a successful recovery operation. A
set of recovery targets reads like a checklist, and recovery targets are
often used in this fashion during disaster recovery testing to gauge
the success or failure of a recovery strategy following its execution.
How Are Your Recovery Targets Defined?
Recovery targets are
defined through the process of mapping the results of a BIA (that is,
the data and functionality that business stakeholders have identified as
being critical in a SharePoint farm) to elements within the farm that
were identified during the discovery and documentation phase. Each result from the BIA should translate to one
or more technical functions and data elements within the SharePoint
farm.
For example, consider a
BIA that identifies a SharePoint site housing online actuarial
capabilities as being highly critical to daily business operations.
Technical analysis and cross-referencing of the site mentioned in the
BIA might yield numerous recovery targets, including these:
The content database housing the SharePoint site containing Excel spreadsheets
The Excel Services Service Application providing online calculation functionality
The physical server that is dedicated within the farm to carry out the processor-intensive Excel calculations
The unattended service account username and password that Excel Services uses for several trusted data connections
A custom trusted data provider that is defined within the Excel Services Service Application
Several legacy line of business systems that are accessed through trusted data connections to supply data for the actuarial spreadsheets
As you can see, a
seemingly straightforward business function could lead to a cascading
list of technical requirements during the definition of recovery
targets.
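One way to keep this kind of mapping manageable is to record it as structured data. The following Python sketch is illustrative only: the class names, fields, and the "high" criticality label are assumptions made for this example and are not part of SharePoint or of any particular BIA format. It shows a single BIA entry fanning out into the recovery targets from the actuarial example, along with a helper that treats those targets as a checklist.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryTarget:
    name: str               # technical element within the farm
    kind: str               # hypothetical category label, e.g. "database"
    verified: bool = False  # set to True once restored and checked


@dataclass
class BiaItem:
    business_function: str
    criticality: str  # label taken from the BIA (assumed values)
    targets: list[RecoveryTarget] = field(default_factory=list)


# Hypothetical entry modeled on the actuarial example in the text.
actuarial = BiaItem(
    business_function="Online actuarial calculations",
    criticality="high",
    targets=[
        RecoveryTarget("Content database for the actuarial site", "database"),
        RecoveryTarget("Excel Services Service Application", "service application"),
        RecoveryTarget("Dedicated Excel calculation server", "server"),
        RecoveryTarget("Unattended service account credentials", "credential"),
        RecoveryTarget("Custom trusted data provider", "configuration"),
        RecoveryTarget("Legacy line-of-business systems", "external system"),
    ],
)


def outstanding(item: BiaItem) -> list[str]:
    """Return the names of targets not yet verified, in checklist order."""
    return [t.name for t in item.targets if not t.verified]
```

During a recovery test, each target would be marked as verified when it has been restored and checked; an empty result from `outstanding` then indicates that the business function's targets have all been accounted for.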
For large SharePoint farms,
the recovery targets that are ultimately selected may comprise only a
subset of the farm’s total functionality. This is especially true if the
recovery time objective (RTO) for the functions and data specified is
extremely aggressive and the disaster recovery plan involves a
substantial manual effort to carry out.
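The relationship between an aggressive RTO and a reduced set of recovery targets can be sketched with some simple arithmetic. The example below is a rough illustration under stated assumptions: the target names, priority numbers, and restore-time estimates are invented, and it assumes that restores are carried out one after another by hand. Once a target no longer fits within the RTO, it and every lower-priority target are deferred.

```python
def select_within_rto(targets, rto_hours):
    """Split targets into those that fit the RTO and those that are deferred.

    `targets` is a list of (name, priority, estimated_hours) tuples, where a
    lower priority number means more important. Assumes sequential restores.
    """
    selected, deferred, used = [], [], 0.0
    for name, priority, hours in sorted(targets, key=lambda t: t[1]):
        # Stop selecting as soon as one target fails to fit the budget.
        if not deferred and used + hours <= rto_hours:
            selected.append(name)
            used += hours
        else:
            deferred.append(name)
    return selected, deferred, used


# Hypothetical estimates, in hours of manual effort.
example_targets = [
    ("Actuarial content database", 1, 2.0),
    ("Excel Services Service Application", 1, 1.5),
    ("My Site content databases", 3, 6.0),
    ("Team site content databases", 2, 3.0),
]

selected, deferred, used = select_within_rto(example_targets, rto_hours=8)
# selected holds the three higher-priority targets (6.5 hours in total);
# deferred holds "My Site content databases".
```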
What Should Be Restored?
As the results of the BIA are
mapped to recovery targets, you may begin to see that some technical
functions or data within your farm have a higher priority than others
and that some pieces of key technical functionality or data are required
to make their associated business functions available in SharePoint.
It’s also perfectly normal for some technical functions to be identified
as low-priority components that can be restored once your farm’s core
content and technical functionality have been fully restored and
verified. This kind of triage activity can be beneficial, because it
helps you to focus your activities and energy on the most important
aspects of your environment without getting distracted by targets of
lower priority.
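Because some targets must be in place before the business functions that rely on them can be made available, it can help to write the dependencies down and derive a restore order from them. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the dependency map itself is hypothetical and only loosely modeled on the actuarial example.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each key lists what must be restored before it.
depends_on = {
    "Actuarial site": {"Content database", "Excel Services Service Application"},
    "Excel Services Service Application": {
        "Calculation server",
        "Unattended service account",
    },
    "Content database": {"Database server"},
}

# static_order() yields every item after all of the items it depends on.
restore_order = list(TopologicalSorter(depends_on).static_order())
```

The exact position of independent items can vary, but every target appears after the targets it depends on, so the business-facing site always comes last in this example.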
Often this exercise can help
you understand that it isn’t a good idea to attempt a full restore of your entire
production environment immediately after an outage. Another benefit of
this analysis is the impact it can have on the architecture,
configuration, and governance policies of your SharePoint farm to better
position or partition key elements for recoverability based on business
value and associated disaster recovery priority. Following are a few
other factors that you should keep in mind as you analyze the BIA
results and consider the recovery targets that result:
Content database
distribution. How are sites and
site collections in your farm distributed across content databases?
Consider storing high-priority sites in specific or unique content
databases to allow more frequent backups to be made on those databases
and prevent lesser sites from using resources. Carefully distributing
your sites across databases, and even database instances, can make your
backup and restore processes much easier to manage and complete.
Content. What types of content or data do users store in different
types of sites in your farm? Is the content that users store in their My
Sites given the same recovery priority by the BIA as what they store in
collaborative team sites? Your organization may already have usage and
retention policies that can help to answer these questions about the
contents of different types of sites and determine when they should be
backed up and restored in the absence of specific directives by the BIA.
Service Applications. SharePoint Foundation uses a number of
Service Applications, and SharePoint Server 2010 includes an even
greater number. If your recovery strategy involves some form of manual
rebuild or reconfiguration, it is important to understand the usage
patterns for the Service Applications in your SharePoint farm. In the
actuarial example that was mentioned earlier, Excel Services is
critical to the restoration of business functionality and would likely
receive a high priority for recovery. Excel Services could be run
locally within the farm, or the service could be consumed from another
farm entirely. Recognizing both the importance of the Service
Application and the actual origin of services provided is key in the
proper definition of recovery targets.
Dependent systems and interfaces. What applications or configuration items
have been identified as recovery targets on your production servers to
support the various functions of your SharePoint farm? Some applications
provide crucial data or functionality to the users of your SharePoint
farm and must be reconnected or restored as part of your farm’s restore
effort. Other applications are not identified by the BIA as mission
critical and are therefore not a priority.
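The content database distribution factor lends itself to a small planning sketch. The example below groups site collections by their BIA priority so that each tier lands in its own content database with its own backup interval. Everything specific in it is an assumption made for illustration: the site URLs, the priority labels, the backup intervals, and the database naming pattern.

```python
from collections import defaultdict

# Hypothetical backup intervals (in hours) for each BIA priority tier.
BACKUP_INTERVAL_HOURS = {"high": 4, "medium": 24, "low": 168}


def plan_databases(sites):
    """Group site collections by priority so each tier gets its own database.

    `sites` maps a site collection URL to its BIA priority label.
    Returns {database_name: {"sites": [...], "backup_every_hours": int}}.
    """
    by_tier = defaultdict(list)
    for url, priority in sites.items():
        by_tier[priority].append(url)
    return {
        f"WSS_Content_{tier.capitalize()}": {
            "sites": sorted(urls),
            "backup_every_hours": BACKUP_INTERVAL_HOURS[tier],
        }
        for tier, urls in by_tier.items()
    }


# Hypothetical site collections and priorities.
example_sites = {
    "/sites/actuarial": "high",
    "/sites/hr": "medium",
    "/personal/jsmith": "low",
    "/sites/finance": "high",
}
plan = plan_databases(example_sites)
```

A plan like this is only a starting point for discussion; the real distribution also depends on database size limits, growth, and the retention policies your organization already has in place.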
What’s Out of Scope
It’s just as important
to establish what’s out of scope for your disaster recovery plan as it
is to identify what’s in scope. This isn’t a simple exercise of listing
what platforms, applications, systems, or components are not included in
your disaster recovery plan. Yes, such a list is definitely part of
the scope definition process, but it’s also important to determine which
items other groups are expected to support and to identify those items
as out of scope for your plan. For example, if database
administrators (DBAs) external to your group manage your SharePoint
databases, it may be possible to declare the disaster recovery of those
databases out of scope for your plan because those DBAs will handle it.
Tip
Establishing external
dependencies within a disaster recovery plan introduces risk, and it is not
a decision that SharePoint technical owners have the “right” to make on their own. Before portions of a plan
become dependent on external systems or personnel, discussions with
business owners and stakeholders must take place. Although SharePoint
technical owners and personnel are ultimately responsible for meeting
the recovery objectives identified through the BIA, business
stakeholders are the ones assuming the risk and realizing the ultimate
impact of a system outage.
What Are the Costs?
As
professors of economics are often fond of stating, “There’s no such
thing as a free lunch.” Every choice and decision you make around your
disaster recovery plan has a direct impact on how much it will cost to
implement that plan. Frequent backups can require extensive storage
resources, as well as more time to configure, test, and maintain. Opting
to restore every aspect of a farm as quickly as possible is certainly
an option, but the cost of the hardware, software, and workforce resources necessary
to pull off such a plan can prove prohibitively high for all but the
largest of enterprises. It’s essential to understand the costs inherent
in each aspect of a disaster recovery plan so that you can balance and
consider them as part of the plan. You may find that the technically best solution
is not always the right solution for your organization once you
introduce costs and expenses into the equation.
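To see how quickly backup frequency drives storage requirements, consider a back-of-the-envelope estimate. The function below is a deliberately crude sketch: it counts only retained full backups and ignores compression and differential backups, both of which would lower the real figure. The 200 GB database size and four-week retention period are assumed values chosen for illustration.

```python
def backup_storage_gb(full_size_gb, fulls_per_week, retention_weeks):
    """Estimate storage needed to retain full backups.

    Ignores compression and differential backups, so it overstates
    what a tuned backup strategy would actually consume.
    """
    return full_size_gb * fulls_per_week * retention_weeks


# Hypothetical 200 GB content database, backups kept for four weeks.
daily = backup_storage_gb(200, fulls_per_week=7, retention_weeks=4)   # 5600 GB
weekly = backup_storage_gb(200, fulls_per_week=1, retention_weeks=4)  # 800 GB
```

Under these assumptions, moving from weekly to daily full backups multiplies the storage requirement by seven, which is the kind of trade-off that needs to be weighed against the recovery objectives set by the business.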