SharePoint Disaster Recovery Design and Implementation : Defining Scope

3/22/2011 9:10:13 PM

Many administrators of information technology (IT) systems are all too familiar with that famous axiom known as Murphy’s Law, which says, “If anything can go wrong, it will.” Although it may sound fatalistic, having the expectation that one day down the road a mishap of one kind or another will happen to your SharePoint environment is an important perspective to maintain when designing and creating your organization’s disaster recovery plan. This isn’t something you should generate for the sake of crossing an item off your To-Do list or checking a check box in a survey or audit. An effective disaster recovery plan gives you a resource you can use in all situations, regardless of scope or importance. By not losing sight of the fact that this strategy is going to be used and not just gather dust somewhere, you are drastically improving your chances for a successful recovery of your business’s crucial SharePoint systems and data when the chips are down.

Defining Scope

It’s impossible to plan how you will recover your system in the event of an outage or disaster without understanding what your system is composed of and what its critical components are. For many complex environments, it simply isn’t feasible to attempt to fully restore every server, application, or database at the same time; trying to do so would add hours, days, or even weeks to the time it would take to complete this vital restoration activity. That is why the first step you must take when developing your disaster recovery plan is to define its scope and to evaluate and select the essential parts of your system that must be restored in the event of a disaster.

Note

It’s assumed that you’re not designing and developing your SharePoint environment’s disaster recovery plan on your own, or only from an IT perspective. A disaster recovery strategy is just one part of a larger business continuity plan (BCP) that’s driven primarily by business stakeholders and by the cost of outages in a SharePoint environment. Although you, as an administrator, know what infrastructure components you need to have in place to restore your environment, your users are the ones who should determine which sites are business critical, what content should be preserved at all costs, and what the acceptable levels of downtime are for these items. The results of a business impact analysis (BIA) serve as the primary guide when constructing your disaster recovery plan.

What Are Recovery Targets?

Recovery targets are the critical functions and data of your SharePoint environment that need to be restored following the declaration of a disaster. Seems pretty straightforward, doesn’t it? Well, thanks in part to the complex and modular nature of a SharePoint environment, that is not always the case.

Recovery targets are important because they not only identify the parts of your system that need to be acknowledged and addressed in some way as part of your disaster recovery plan, but are also the functions and data that must be restored or replaced as part of a successful recovery operation. A set of recovery targets reads like a checklist, and recovery targets are often used in this fashion during disaster recovery testing to gauge the success or failure of a recovery strategy following its execution.

How Are Your Recovery Targets Defined?

Recovery targets are defined by mapping the results of a BIA (that is, the data and functionality that business stakeholders have identified as being critical in a SharePoint farm) to elements within the farm that were identified during the discovery and documentation phase. Each result from the BIA should translate to one or more technical functions and data elements within the SharePoint farm.

For example, consider a BIA that identifies a SharePoint site housing online actuarial capabilities as being highly critical to daily business operations. Technical analysis and cross-referencing of the site mentioned in the BIA might yield numerous recovery targets, including these:

The content database housing the SharePoint site containing Excel spreadsheets
The Excel Services Service Application providing online calculation functionality
The physical server that is dedicated within the farm to carry out the processor-intensive Excel calculations
The unattended service account username and password that Excel Services uses for several trusted data connections
A custom trusted data provider that is defined within the Excel Services Service Application
Several legacy line of business systems that are accessed through trusted data connections to supply data for the actuarial spreadsheets

As you can see, a seemingly straightforward business function could lead to a cascading list of technical requirements during the definition of recovery targets.
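The mapping from a single BIA result to its recovery targets can be sketched as a small data structure. This is only an illustration of the actuarial example above: the class names, the "kind" labels, and the checklist format are hypothetical and are not part of SharePoint or of any BIA standard.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryTarget:
    name: str
    kind: str  # hypothetical label, e.g. "content database" or "server"


@dataclass
class BiaResult:
    business_function: str
    criticality: str
    targets: list[RecoveryTarget] = field(default_factory=list)


# The actuarial site from the example, mapped to its technical targets.
actuarial = BiaResult(
    business_function="Online actuarial calculations",
    criticality="high",
    targets=[
        RecoveryTarget("Content database for the actuarial site", "content database"),
        RecoveryTarget("Excel Services Service Application", "service application"),
        RecoveryTarget("Dedicated Excel calculation server", "server"),
        RecoveryTarget("Unattended service account credentials", "credential"),
        RecoveryTarget("Custom trusted data provider", "configuration"),
        RecoveryTarget("Legacy line of business systems", "external system"),
    ],
)


def checklist(result: BiaResult) -> list[str]:
    """Render the targets as the kind of checklist used in recovery testing."""
    return [f"[ ] {t.name} ({t.kind})" for t in result.targets]


for line in checklist(actuarial):
    print(line)
```

Keeping the business function and its targets in one record makes it easier to see how many technical items a single BIA entry pulls into scope.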

For large SharePoint farms, the recovery targets that are ultimately selected may comprise only a subset of the farm’s total functionality. This is especially true if the recovery time objective (RTO) for the functions and data specified is extremely aggressive and the disaster recovery plan involves a substantial manual effort to carry out.

What Should Be Restored?

As the results of the BIA are mapped to recovery targets, you may begin to see that some technical functions or data within your farm have a higher priority than others and that some pieces of key technical functionality or data are required to make their associated business functions available in SharePoint. It’s also perfectly normal for some technical functions to be identified as low-priority components that can be restored once your farm’s core content and technical functionality have been fully restored and verified. This kind of triage activity can be beneficial, because it helps you to focus your activities and energy on the most important aspects of your environment without getting distracted by targets of lower priority.
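One way to picture this triage is to order the targets by priority and see which ones fit inside the recovery time objective. The sketch below assumes targets are restored one after another and that each has a rough restore estimate; the priority scale, the hour figures, and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    priority: int         # 1 = most critical (hypothetical scale)
    restore_hours: float  # estimated effort, restored one after another


def plan_restore(targets, rto_hours):
    """Split targets into those restorable within the RTO and those deferred."""
    in_window, deferred = [], []
    elapsed = 0.0
    # sorted() is stable, so equal priorities keep their listed order
    for target in sorted(targets, key=lambda t: t.priority):
        fits = elapsed + target.restore_hours <= rto_hours
        if fits and not deferred:
            elapsed += target.restore_hours
            in_window.append(target)
        else:
            # once one target misses the window, everything after it waits
            deferred.append(target)
    return in_window, deferred


targets = [
    Target("My Sites content", 3, 4.0),
    Target("Actuarial content database", 1, 2.0),
    Target("Excel Services Service Application", 1, 1.5),
    Target("Line of business data connections", 2, 3.0),
]

in_window, deferred = plan_restore(targets, rto_hours=8.0)
print("Restore first:", [t.name for t in in_window])
print("Restore later:", [t.name for t in deferred])
```

With these made-up figures, the three higher-priority targets fit within the eight-hour window and the My Sites content is deferred until the core functionality has been restored and verified.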

Often this exercise shows that attempting to fully restore your production environment immediately after an outage isn’t a good idea. Another benefit of this analysis is that it can inform the architecture, configuration, and governance policies of your SharePoint farm, so that key elements are better positioned or partitioned for recoverability based on business value and associated disaster recovery priority. Following are a few other factors that you should keep in mind as you analyze the BIA results and consider the recovery targets that result:

Content database distribution. How are sites and site collections in your farm distributed across content databases? Consider storing high-priority sites in their own dedicated content databases so that those databases can be backed up more frequently without lower-priority sites consuming backup time and storage. Carefully distributing your sites across databases, and even database instances, can make your backup and restore processes much easier to manage and complete.
Content. What types of content or data do users store in different types of sites in your farm? Is the content that users store in their My Sites given the same recovery priority by the BIA as what they store in collaborative team sites? Your organization may already have usage and retention policies that can help to answer these questions about the contents of different types of sites and determine when they should be backed up and restored in the absence of specific directives by the BIA.
Service Applications. SharePoint Foundation uses a number of Service Applications, and SharePoint Server 2010 includes an even greater number. If your recovery strategy involves some form of manual rebuild or reconfiguration, it is important to understand the usage patterns for the Service Applications in your SharePoint farm. In the actuarial example that was mentioned earlier, Excel Services is critical to the restoration of business functionality and would likely receive a high priority for recovery. Excel Services could be run locally within the farm, or the service could be consumed from another farm entirely. Recognizing both the importance of the Service Application and the actual origin of the services it provides is key to the proper definition of recovery targets.
Dependent systems and interfaces. What applications or configuration items have been identified as recovery targets on your production servers to support the various functions of your SharePoint farm? Some applications provide crucial data or functionality to the users of your SharePoint farm and must be reconnected or restored as part of your farm’s restore effort. Other applications are not identified by the BIA as mission critical and are therefore not a priority.
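The content database distribution factor above can be illustrated with a small grouping exercise. This is a sketch only: the site URLs, the priority labels, the database naming pattern, and the backup intervals are invented for the example, and nothing here calls SharePoint itself.

```python
from collections import defaultdict

# Hypothetical sites with the priority each received from the BIA.
sites = [
    ("/sites/actuarial", "high"),
    ("/sites/finance", "high"),
    ("/sites/team-a", "medium"),
    ("/my/personal", "low"),
]

# Hypothetical backup intervals (hours) per priority tier.
backup_interval_hours = {"high": 4, "medium": 24, "low": 168}


def assign_databases(sites):
    """Group sites into one content database per priority tier."""
    plan = defaultdict(list)
    for url, priority in sites:
        plan[f"WSS_Content_{priority.capitalize()}"].append(url)
    return dict(plan)


for database, urls in assign_databases(sites).items():
    tier = database.rsplit("_", 1)[1].lower()
    print(f"{database}: back up every {backup_interval_hours[tier]}h -> {urls}")
```

Grouping by tier means the frequent backup schedule applies only to the database that holds high-priority sites, which is the effect the guidance above is aiming for.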

What’s Out of Scope

It’s just as important to establish what’s out of scope for your disaster recovery plan as it is to identify what’s in scope. This isn’t simply an exercise in listing the platforms, applications, systems, or components that your disaster recovery plan does not cover. Such a list is definitely part of the scope definition process, but it’s also important to identify which other groups are expected to support the items deemed out of scope for your plan. For example, if database administrators (DBAs) external to your group manage your SharePoint databases, it may be possible to declare the disaster recovery of those databases out of scope for your plan because those DBAs will handle it.

Tip

Establishing external dependencies within a disaster recovery plan introduces risk, and deciding to do so is not the “right” of SharePoint technical owners alone. Before portions of a plan become dependent on external systems or personnel, discussions with business owners and stakeholders must take place. Although SharePoint technical owners and personnel are ultimately responsible for meeting the recovery objectives identified through the BIA, business stakeholders are the ones assuming the risk and realizing the ultimate impact of a system outage.

What Are the Costs?

As professors of economics are often fond of stating, “There’s no such thing as a free lunch.” Every choice and decision you make around your disaster recovery plan has a direct impact on how much it will cost to implement that plan. Frequent backups can require extensive storage resources, as well as more time to configure, test, and maintain. Restoring every aspect of a farm as quickly as possible is certainly achievable, but the cost of the hardware, software, and workforce resources necessary to pull off such a plan can prove prohibitive for all but the largest of enterprises. It’s essential to understand the costs inherent in each aspect of a disaster recovery plan so that you can weigh them as part of the plan. You may find that the best solution is not always the right solution for your organization once you introduce costs and expenses into the equation.
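The storage side of this trade-off can be roughed out with simple arithmetic. The sketch below assumes full backups only, with no compression or differential backups, and uses invented figures for database size, backup frequency, and retention; treat it as a way to compare options rather than as a sizing method.

```python
def backup_storage_gb(full_size_gb, fulls_per_week, retention_weeks):
    """Storage needed to retain every full backup for the retention period."""
    return full_size_gb * fulls_per_week * retention_weeks


# Hypothetical 200 GB content database, backups kept for four weeks.
daily = backup_storage_gb(full_size_gb=200, fulls_per_week=7, retention_weeks=4)
weekly = backup_storage_gb(full_size_gb=200, fulls_per_week=1, retention_weeks=4)

print(f"Daily full backups:  {daily} GB retained")
print(f"Weekly full backups: {weekly} GB retained")
print(f"Daily backups need {daily // weekly}x the storage of weekly backups")
```

Even this crude comparison shows how quickly a more aggressive backup schedule multiplies storage needs, before counting the time required to configure, test, and maintain it.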
