It is never a happy day for the IT
group when an online service goes down, and this includes SharePoint.
As fantastic as SharePoint is, it is inevitable that at some point in
the life cycle, your SharePoint solution will suffer from downtime. Of
course, downtime may occur for any number of reasons: human error,
underlying hardware failure, power outage, faulty customizations, and
so on. Since failure cannot be entirely averted, your role as a
SharePoint administrator is to account for such downtime and restore
service to the users of the platform in a timely manner. Planning for
and recovering from loss of service is what I refer to as planning
for disaster recovery.
Minimizing downtime and
averting loss in a disaster involves proactive processes and planning.
Those unfortunate readers who have experienced loss of data are likely
all too familiar with data backup, which is one aspect of disaster
recovery. Another important aspect of disaster recovery includes
techniques to minimize service downtime.
Minimizing downtime of a service factors in both the total time to recover the service and the point in time
from which recovery resumes. For example, if recovery consists of
restoring data in a SharePoint site collection because of database
corruption, then the time taken to restore the database from backup and the
time at which the last backup took place are both important factors in the
success of the restoration. A speedy
restore is one thing, but if the backup data is already three months old then,
depending on how frequently the live data changes, the restoration
is not necessarily a success.
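The two factors described above can be made concrete with some simple arithmetic. The following is a minimal PowerShell sketch, not a SharePoint feature: the function name Get-RecoveryEstimate and all of the timestamps are hypothetical, chosen only to show how the age of the last backup and the duration of the restore combine.

```powershell
# Hypothetical helper: none of these names come from SharePoint itself.
function Get-RecoveryEstimate {
    param(
        [datetime]$LastBackup,      # when the most recent usable backup completed
        [datetime]$FailureTime,     # when the service went down
        [timespan]$RestoreDuration  # how long a restore from that backup takes
    )
    [pscustomobject]@{
        # Changes made after the last backup are lost on restore.
        DataLossWindow  = $FailureTime - $LastBackup
        # Users are without service at least until the restore completes.
        ServiceRestored = $FailureTime + $RestoreDuration
    }
}

# Invented example values.
$params = @{
    LastBackup      = [datetime]'2013-06-01T02:00:00'
    FailureTime     = [datetime]'2013-06-01T14:30:00'
    RestoreDuration = [timespan]::FromHours(3)
}
$estimate = Get-RecoveryEstimate @params

$estimate.DataLossWindow    # 12 hours 30 minutes of changes at risk
$estimate.ServiceRestored   # service back at 17:30 on the same day
```

A restore that completes quickly still leaves the DataLossWindow untouched, which is why backup frequency matters as much as restore speed.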
Data/content recovery is
one piece of a good disaster recovery plan; restoration of system
hardware, the underlying operating system, system software, and
configuration are all part of the plan as well.
Warm recovery is the quickest form of recovery
in the event of a disaster and typically involves a level of hardware
and software redundancy. Conversely, cold recovery refers to the
restoration of service from scratch, starting from a completely inoperable state.
Cold recovery typically involves restoration of data from an offline
backup store. A good disaster recovery strategy involves both warm and
cold recovery methods.
Load-Balanced Service
Load balancing involves either a hardware or
a software load balancer, which intercepts all incoming web traffic on
a specific IP address and redirects it to one of at least two web
servers to service the request. The load balancer directs traffic
either to the server with the least load (intelligent load balancing)
or in turn, based on which server served the previous request
(round-robin).
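To illustrate the difference between the two strategies, here is a minimal PowerShell sketch of the selection logic only. It is not how any particular load balancer is implemented; the server names, the load figures, and the function names Select-RoundRobin and Select-LeastLoaded are all invented for illustration.

```powershell
# Hypothetical pool; server names and load figures are invented.
$pool = @(
    [pscustomobject]@{ Name = 'WFE01'; ActiveRequests = 12 }
    [pscustomobject]@{ Name = 'WFE02'; ActiveRequests = 4 }
    [pscustomobject]@{ Name = 'WFE03'; ActiveRequests = 9 }
)

# Round-robin: hand out servers in turn, wrapping at the end of the pool.
$script:next = 0
function Select-RoundRobin {
    $server = $pool[$script:next]
    $script:next = ($script:next + 1) % $pool.Count
    $server
}

# Least load: pick whichever server currently has the fewest active requests.
function Select-LeastLoaded {
    $pool | Sort-Object ActiveRequests | Select-Object -First 1
}

(1..4 | ForEach-Object { (Select-RoundRobin).Name }) -join ', '   # WFE01, WFE02, WFE03, WFE01
(Select-LeastLoaded).Name                                         # WFE02
```

Round-robin needs no knowledge of the servers beyond their order, whereas the least-load approach depends on the balancer having a current measure of each server's load.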
Load balancing serves two purposes:
distributing the load of user requests and providing warm redundancy in
the event that a server serving requests fails. SharePoint 2013 includes a
new Request Manager service to intelligently manage which servers in a
multiple-server farm handle which requests.
Load balancing SharePoint consists of pointing
a configured load balancer to multiple front-end SharePoint servers in
the farm that serve pages. A SharePoint farm may include as many
front-end web and application servers as the infrastructure can
provide; thus, scaling out to handle more traffic is simply a case of
adding a new web server to the farm and registering the IP with the
load balancer.
As well as providing for distributed load, most
load balancers can detect if one of the servers in the pool is not
responding and then redirect all traffic to the other responding
servers. Large enterprise organizations that have the capability to
host different servers in multiple geographic locations may redirect
traffic to passive SharePoint servers, or completely mirrored
SharePoint farms, to achieve redundancy and rapid recovery if a primary
site hosting the main SharePoint infrastructure fails.
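The detection of unresponsive servers described above can be approximated from PowerShell as a rough health probe. This is a sketch under stated assumptions, not what a load balancer runs internally: the function name Get-RespondingServer, the server names, and the URL pattern are hypothetical, and the probe is passed in as a script block so that you can substitute whatever check suits your environment.

```powershell
# Returns the servers for which the probe succeeds.
# $Probe takes a server name and returns $true or $false; the default issues
# an HTTP request against a hypothetical URL pattern (http://<server>/).
function Get-RespondingServer {
    param(
        [string[]]$Servers,
        [scriptblock]$Probe = {
            param($server)
            try {
                $response = Invoke-WebRequest -Uri "http://$server/" `
                    -UseDefaultCredentials -UseBasicParsing -TimeoutSec 5 -ErrorAction Stop
                $response.StatusCode -eq 200
            } catch {
                $false   # timeouts and HTTP errors both count as "not responding"
            }
        }
    )
    $Servers | Where-Object { & $Probe $_ }
}

# Example with a stand-in probe that treats 'WFE02' as down.
Get-RespondingServer -Servers 'WFE01', 'WFE02', 'WFE03' -Probe { param($s) $s -ne 'WFE02' }
```

Calling the function without -Probe uses the HTTP check, which assumes the front-end servers answer on port 80 at their own host names.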
SQL Server Failover Clustering
SQL Server clustering consists of multiple
SQL Server nodes, managed by a root cluster that provides redundancy at
the SQL Server application level.
A cluster typically consists of an active node
and at least one passive node, although you can have multiple passive nodes.
The nodes share storage, so data written through the active node is
available to the passive nodes, but only the active node
handles incoming requests. In the event that the active
node fails, the Windows Failover Cluster Service switches over to
use one of the passive nodes (running on different hardware). I should
highlight some important points about SQL clustering:
- SQL clustering does not help performance, since only one node of the cluster is active at any one time
- Recovery in the event of failure of the active node is dependent on
the time it takes to bring a passive node online. This is not always an
immediate process and depends on when the Windows Failover Cluster
Service detects a down node
- SQL clustering uses shared storage, so the passive nodes have access to the same timely and accurate data as the active node
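To see which node is currently active and whether any node is down, you can query the cluster from PowerShell. The following is a short sketch that assumes the FailoverClusters module (part of the Windows Server failover clustering tools) is installed and that you run it on a cluster node with sufficient rights; the output depends entirely on your own cluster.

```powershell
# Assumes the FailoverClusters module is available on this machine.
Import-Module FailoverClusters

# List every node and whether the cluster considers it Up, Down, or Paused.
Get-ClusterNode | Select-Object Name, State

# Show which node currently owns each clustered role, that is, the active node for that role.
Get-ClusterGroup | Select-Object Name, OwnerNode, State
```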
I recommend the use of SQL clustering in any
large organization or enterprise where SharePoint data is critical and
exceeds 100GB, and the organization must limit the downtime in the
event of failure. Traditionally, large-scale organizations using
SharePoint with SQL clustering would host the actual data on a Storage
Area Network, attached to the cluster, to provide an extra level of
data redundancy and hot swap capability with inexpensive disk storage.
SQL Server Database Mirroring
SQL Server mirroring also provides data
redundancy at the SQL Server, but unlike clustering, where the cluster
is the data repository in its entirety, mirroring consists of a warm backup
SQL Server, separate from the main live server.
Clustering involves multiple storage nodes,
connected by network links to a root SQL instance. Mirroring consists
of two completely independent SQL Servers with either synchronous or asynchronous
copy, managed by each SQL Server instance. Synchronous mode provides
hot standby because SQL Server ensures no data discrepancy between the
principal and the mirror, whereas asynchronous provides warm backup and
operates in a more passive copy mode.
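To check which mode a mirrored database is using, you can query the sys.database_mirroring catalog view. The sketch below assumes the Invoke-Sqlcmd cmdlet is available (it ships with the SQL Server PowerShell module) and uses a hypothetical principal server name, SQL01. A safety level of FULL corresponds to synchronous mode and OFF to asynchronous mode.

```powershell
# 'SQL01' is a hypothetical server name; replace it with your principal instance.
$query = @'
SELECT DB_NAME(database_id)        AS DatabaseName,
       mirroring_role_desc          AS Role,
       mirroring_state_desc         AS State,
       mirroring_safety_level_desc  AS SafetyLevel,
       mirroring_witness_state_desc AS WitnessState
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;   -- only databases that take part in mirroring
'@

Invoke-Sqlcmd -ServerInstance 'SQL01' -Query $query
```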
Administrators may provide high availability
for SharePoint when using SQL Server mirroring in synchronous mode and
using the database failover capabilities built into the SharePoint
platform. SQL Server requires a witness server to manage automatic
failover in the event that the principal fails.
The following steps consist of PowerShell
commands. Launch the SharePoint Management Shell to begin, where you
will enter the following commands:
- Enter the following command into the PowerShell console to configure mirroring for the SharePoint configuration database:
# Replace "mirror server name" with the name of your mirror SQL Server.
$database = Get-SPDatabase | where {$_.Name -match "SharePoint_Config"}
$database.AddFailoverServiceInstance("mirror server name")
$database.Update()
- Enter the following command into the PowerShell console to configure mirroring for your content database:
# Replace "mirror server name" with the name of your mirror SQL Server.
$database = Get-SPDatabase | where {$_.Name -match "WSS_Content"}
$database.AddFailoverServiceInstance("mirror server name")
$database.Update()
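Note: Both of the preceding commands assume that your configuration database has the name SharePoint_Config and that you have named your content database WSS_Content. Because -match performs a pattern match, it selects every database whose name contains the given text. Change the names in the script to match your database names.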
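If the farm has several content databases, the same calls can be applied in a loop. The following is a sketch rather than a definitive script: it assumes every content database in the farm is mirrored to the same server, and SQLMIRROR01 is a hypothetical server name. The final line reads back the Server and FailoverServer properties so that you can confirm what SharePoint has recorded.

```powershell
# Load the SharePoint snap-in if this is not the SharePoint Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$mirror = 'SQLMIRROR01'   # hypothetical mirror server name; replace with your own

# Register the mirror as the failover server for every content database in the farm.
foreach ($db in Get-SPContentDatabase) {
    $db.AddFailoverServiceInstance($mirror)
    $db.Update()
}

# Read back what SharePoint has recorded for each content database.
Get-SPContentDatabase | Select-Object Name, Server, FailoverServer
```

Test a loop like this against a non-production farm first, since it changes the configuration of every content database it finds.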
If you prefer to configure database mirroring via Central Administration, follow these steps:
- Open Central Administration.
- Click the Application Management heading link.
- Click the Manage Content Databases link.
- Choose the relevant web application from the drop-down list.
- Select the relevant content database.
- On the settings page for the selected database, populate the Failover Database Server field with the mirrored server.
- Click the OK button.