SharePoint Disaster Recovery Planning and Key Concepts : Key Concepts and Terms

3/22/2011 9:05:58 PM

Business continuity planning has its own set of concepts, terms, and processes. To build on the concepts and drivers associated with disaster recovery planning, Figure 1 zooms out to show the larger process of business continuity planning and where SharePoint disaster recovery planning fits into it.

Figure 1. The stages of business continuity planning.

As illustrated in Figure 1, business continuity planning involves three distinct stages:

The risk assessment. The risk assessment is where disaster recovery planning begins. It entails the analysis of a SharePoint farm and the business processes tied to it from the perspective of vulnerabilities, threats, and general exposures that are introduced simply by having the farm in production and in use by business users. The identifiable risks typically equate to one or more SharePoint functions or usage scenarios. “Collaboration on XYZ project,” “business intelligence functions leveraged by executives,” and “workflow that is used to approve public communications in the ABC document library” are examples of such functions and scenarios.
The business impact analysis (BIA). The results of the risk assessment serve as the input to the BIA. The BIA attempts to equate the loss of a particular SharePoint capability or function (such as the loss of business intelligence functions leveraged by executives) with the projected magnitude or expected monetary impact associated with the loss (for example, $10,000 per day in investments). Equating outages to exact losses is difficult at this stage due to all the variables that are typically in play, but the results of the analysis serve as a valuable prioritization tool in the next stage of the business continuity planning process.
The business continuity plan (BCP). Armed with the results of the BIA, business continuity planners possess the data they need to prioritize and address the risk areas identified during the risk assessment. Risk areas or regions that the BIA identifies as carrying the largest potential for loss or adverse business exposure are addressed more urgently, whereas those with lesser potential impact are addressed when the opportunity arises or is most cost effective. As described earlier, the BCP that results from this process addresses both the technological areas included in the disaster recovery plan (such as “restore the system and associated databases from backup”) and associated business processes (for example, “have the accounts payable team begin using the new repository at URL http://DRAccountsPayable instead of the standard production URL”). A BCP typically includes other prescriptive advice and workarounds to minimize or mitigate the impact of an outage.
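The way BIA results feed prioritization in the BCP can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the `RiskArea` type and the `prioritize` function are hypothetical names, and the loss figures are invented for the example except for the $10,000 per day taken from the BIA description above.

```python
from dataclasses import dataclass


@dataclass
class RiskArea:
    """One SharePoint function or usage scenario from the risk assessment."""
    name: str
    estimated_daily_loss: float  # projected monetary impact per day of outage


def prioritize(risk_areas):
    """Order risk areas from largest to smallest projected daily loss."""
    return sorted(risk_areas, key=lambda r: r.estimated_daily_loss, reverse=True)


# Hypothetical BIA estimates; only the 10,000 figure comes from the text.
risk_areas = [
    RiskArea("Collaboration on XYZ project", 2_500.0),
    RiskArea("Business intelligence functions leveraged by executives", 10_000.0),
    RiskArea("Approval workflow in the ABC document library", 4_000.0),
]

for rank, area in enumerate(prioritize(risk_areas), start=1):
    print(f"{rank}. {area.name}: ${area.estimated_daily_loss:,.0f} per day")
```

In practice the estimates are rough, so a ranking like this is best read as a coarse ordering rather than a precise measurement.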

As shown in Figure 1, a disaster recovery plan is one component of the ultimate business continuity plan that results from both the risk assessment and BIA of identified risks. Of course, the disaster recovery plan does not simply arise from a determination regarding the potential impact of an outage.

The purposes for which a SharePoint farm is used, along with acceptable outage windows in the event of a disaster, ultimately drive the technological aspects of the disaster recovery plan that an organization crafts and implements. Two key concepts determine what constitutes an “acceptable” outage window:

Recovery time objective (RTO). The RTO of a disaster recovery plan defines the amount of time that can elapse between the occurrence of a disaster and the affected system being returned to an agreed-upon level of operational readiness. Put simply, an RTO defines the time you have to get a system back up and running after a disaster. It is typically during this period that the steps of a disaster recovery plan are executed. A highly critical SharePoint system may have a real-time RTO (that is, the failure of a production system immediately results in a backup system taking over). At the other extreme, a farm that handles tertiary business functions may have an RTO that is measured in weeks to support the acquisition of new hardware and the ultimate rebuild of the farm from scratch.
Recovery point objective (RPO). Whereas RTOs are forward-looking, an RPO defines a period of time prior to any disaster where data loss may (and likely will) occur. Put another way, an RPO defines the maximum amount of data loss that’s deemed acceptable in a disaster. Data that existed prior to the point in time defined by the RPO can be restored or recovered, whereas data created or changed after that point may not be. As you might expect, a highly critical SharePoint system may have a disaster recovery plan with a near-zero RPO that tolerates almost no data loss. Tertiary systems, on the other hand, may have RPOs that are measured in hours or days.

To illustrate the concepts of RTO and RPO, consider the disaster recovery plan profile shown in Figure 2. The requirements in this plan are typical of less-critical systems, where some amount of data loss and downtime is deemed acceptable in the event of a disaster.

Figure 2. RPO and RTO for a SharePoint farm of lesser business significance.

In this disaster recovery plan, a disaster occurs and is declared at 7 a.m. The disaster recovery plan mandates an RPO of 12 hours and an RTO of 24 hours. To satisfy the RPO requirement of this plan, a backup or some other capture of relevant data and state must have been performed in the 12 hours leading up to the declaration of the disaster (that is, no earlier than 7 p.m. the previous evening). At the same time, the RTO requirement states that the system must be restored to a functional state (qualified within the disaster recovery plan) within 24 hours of the disaster’s occurrence, or by 7 a.m. the following day.
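The arithmetic behind these two windows can be worked through with a short Python sketch. The function name `recovery_windows` is hypothetical, and the calendar date is arbitrary; only the 7 a.m. declaration time, the 12-hour RPO, and the 24-hour RTO come from the plan described above.

```python
from datetime import datetime, timedelta


def recovery_windows(disaster_declared, rpo, rto):
    """Return (oldest acceptable recovery point, restore deadline)."""
    oldest_recovery_point = disaster_declared - rpo  # RPO looks backward
    restore_deadline = disaster_declared + rto       # RTO looks forward
    return oldest_recovery_point, restore_deadline


# Arbitrary date; the disaster occurs and is declared at 7 a.m.
declared = datetime(2011, 3, 22, 7, 0)
oldest, deadline = recovery_windows(
    declared, rpo=timedelta(hours=12), rto=timedelta(hours=24)
)

print("Oldest acceptable recovery point:", oldest)  # 2011-03-21 19:00:00
print("Restore deadline:", deadline)                # 2011-03-23 07:00:00
```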

Figure 3 presents a different set of requirements for recovery when the disaster is declared at 7 a.m. The RTO and RPO shown are more typical of a SharePoint farm that is of greater importance to the organization that uses it. With an RPO window of one hour and an RTO window of 30 minutes, the potential overall outage window is significantly smaller than the one illustrated in Figure 2.

Figure 3. RPO and RTO for a SharePoint farm of greater business importance.
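One way to make the tighter targets concrete is to check a recovery, real or rehearsed, against them. The Python sketch below assumes the one-hour RPO and 30-minute RTO from this plan; the function name `meets_objectives` and the drill timestamps are hypothetical and chosen only to show one objective being met and the other missed.

```python
from datetime import datetime, timedelta


def meets_objectives(disaster_declared, last_recovery_point, service_restored, rpo, rto):
    """Check a recovery against its objectives.

    Returns a (rpo_met, rto_met) pair of booleans.
    """
    data_loss_window = disaster_declared - last_recovery_point
    downtime = service_restored - disaster_declared
    return data_loss_window <= rpo, downtime <= rto


declared = datetime(2011, 3, 22, 7, 0)  # arbitrary date, 7 a.m. declaration
rpo, rto = timedelta(hours=1), timedelta(minutes=30)

# Hypothetical drill: last data capture at 6:15 a.m., service back at 7:40 a.m.
rpo_met, rto_met = meets_objectives(
    declared,
    last_recovery_point=datetime(2011, 3, 22, 6, 15),
    service_restored=datetime(2011, 3, 22, 7, 40),
    rpo=rpo,
    rto=rto,
)
print("RPO met:", rpo_met)  # True: 45 minutes of data at risk
print("RTO met:", rto_met)  # False: 40 minutes of downtime exceeds 30
```

The same drill would satisfy both objectives under a 12-hour RPO and 24-hour RTO, which is why the two plans call for different technical strategies.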

As you might imagine, implementing a disaster recovery solution to address the RTO and RPO requirements illustrated by the plan shown in Figure 3 carries a different set of challenges than meeting the requirements for the plan shown in Figure 2. Technical strategies and supplemental equipment requirements vary significantly between the two.

In a perfect world, all disaster recovery strategies would involve no loss of data (that is, have a zero RPO window) and provide instant failover (zero RTO). Unfortunately, the cost of such strategies for SharePoint farms is prohibitive for all but the most critical of business uses. As part of their disaster recovery planning, most organizations discover that as RPO and RTO target windows shrink, the cost of an associated disaster recovery strategy goes up. The challenge then becomes balancing data loss and downtime against the total cost of implementing an appropriate and effective disaster recovery strategy.
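The balancing act can be pictured as picking the least expensive strategy that still meets both targets. The Python sketch below is purely illustrative: the `Strategy` type, the `cheapest_adequate` function, the three strategy names, their achievable windows, and the unitless cost values are all hypothetical placeholders, not measurements or product recommendations. Only the two sets of RPO and RTO targets come from the plans discussed earlier.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Strategy:
    name: str
    achievable_rpo: timedelta
    achievable_rto: timedelta
    relative_cost: int  # unitless placeholder, not a real price


def cheapest_adequate(strategies, required_rpo, required_rto):
    """Return the lowest-cost strategy meeting both objectives, or None."""
    adequate = [
        s for s in strategies
        if s.achievable_rpo <= required_rpo and s.achievable_rto <= required_rto
    ]
    return min(adequate, key=lambda s: s.relative_cost, default=None)


# Hypothetical options; every value here is an illustrative assumption.
strategies = [
    Strategy("Twice-daily backups, rebuild on replacement hardware",
             timedelta(hours=12), timedelta(hours=24), 1),
    Strategy("Hourly backups, warm standby farm",
             timedelta(hours=1), timedelta(hours=4), 5),
    Strategy("Continuous replication, automatic failover",
             timedelta(0), timedelta(0), 20),
]

# Targets of the less-critical plan: RPO 12 hours, RTO 24 hours.
relaxed = cheapest_adequate(strategies, timedelta(hours=12), timedelta(hours=24))
# Targets of the more critical plan: RPO 1 hour, RTO 30 minutes.
strict = cheapest_adequate(strategies, timedelta(hours=1), timedelta(minutes=30))

print("Relaxed targets:", relaxed.name, "cost", relaxed.relative_cost)
print("Strict targets:", strict.name, "cost", strict.relative_cost)
```

With these placeholder values, shrinking the windows pushes the selection to the most expensive option, which mirrors the cost trend described above.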
