Failure of a Node
As you can see in Figure 17, one of the nodes in the SQL Server cluster (CLUSTER1) has failed, and MSCS is in the middle of failing over to the other node in the cluster (CLUSTER2). As you can also see, the CLUSTER2 node item group has an hourglass on it, indicating that an MSCS operation is under way. The states of the resources on CLUSTER2
are mostly Online Pending. In other words, these resources are in the
middle of failing over to this node. As they come up successfully,
Online Pending turns to Online.
In addition, the failure of a node (for any reason) is also written to the System event log.
This example showed an intentional failure of the SQL Server instance via the Cluster Administrator. SQL Server Failover Clustering does the right thing by failing over to the other node. This serves to verify that SQL Server Clustering is working properly. The next section illustrates the effect of a failover from a typical client application's point of view, using a custom client test program called Connection Test Program.
Congratulations! You are now up and running with your SQL Server Failover Cluster intact, and you should now be able to start achieving significantly higher availability for your end users. You can easily register this new virtual SQL Server (VSQLSERVER2008) within SQL Server Management Studio (SSMS) and manage it as you would any other SQL Server instance.
The Connection Test Program for a SQL Server Cluster
To help visualize exactly what effect a SQL Server failure and subsequent failover may have on an end-user application, we have created a small test program using Visual Studio 2008. This small C# test program accesses the AdventureWorks2008 database available for SQL Server 2008, and it was created in about 10 minutes. It displays a few columns of data, along with a couple of system variables that show connection information, including the following:
ProductID, Name, and ProductNumber— This is a simple three-column display of data from the Product table in the AdventureWorks2008 database.
SHOWDATETIME— This shows the date and time (to the millisecond) of the data access being executed.
SERVERNAME— This is the SQL Server name that the client is connected to.
SPID— This is the SQL Server process ID (SPID) that reflects the connection ID to SQL Server itself by the client application.
This type of small
program is useful because it always connects to the virtual SQL Server.
This enables you to see what effect a failover would have with your
client applications.
To populate this display grid, you execute the following SQL statement:
SELECT ProductID, Name, ProductNumber,
CONVERT (varchar(32), GETDATE(), 9) AS SHOWDATETIME,
@@SERVERNAME AS SERVERNAME,
@@SPID AS SPID
FROM Production.Product WHERE (ProductID LIKE '32%')
You use Visual Studio 2008 to set up a simple Windows form like the one shown in Figure 18.
You build a simple button that will retrieve the data from the SQL
Server database on the virtual server and also show the date, time,
server name, and SPID information for each access invocation.
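As a rough illustration of what each click of the button does, the following is a minimal console sketch that runs the same query against the virtual server. It is not the book's program: the original uses a designer-generated dataset adapter bound to a grid, whereas this sketch swaps in a plain SqlDataReader. The class name ConnectionTestSketch is hypothetical, and the connection string assumes the VSQLSERVER2008 virtual server and Windows authentication.

```csharp
using System;
using System.Data.SqlClient;

// Hypothetical console stand-in for the Retrieve button's data access.
public static class ConnectionTestSketch
{
    // Assumption: only the virtual server name is referenced, never a node name.
    private const string ConnectionString =
        "Data Source=VSQLSERVER2008;Initial Catalog=AdventureWorks2008;" +
        "Integrated Security=True";

    // The same statement that populates the display grid.
    private const string Query =
        "SELECT ProductID, Name, ProductNumber, " +
        "CONVERT(varchar(32), GETDATE(), 9) AS SHOWDATETIME, " +
        "@@SERVERNAME AS SERVERNAME, " +
        "@@SPID AS SPID " +
        "FROM Production.Product WHERE (ProductID LIKE '32%')";

    public static void Main()
    {
        // A new connection per run, so each run reports its own SPID.
        using (SqlConnection connection = new SqlConnection(ConnectionString))
        using (SqlCommand command = new SqlCommand(Query, connection))
        {
            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Print one tab-separated line per product row.
                    Console.WriteLine("{0}\t{1}\t{2}\t{3}\t{4}\t{5}",
                        reader["ProductID"], reader["Name"],
                        reader["ProductNumber"], reader["SHOWDATETIME"],
                        reader["SERVERNAME"], reader["SPID"]);
                }
            }
        }
    }
}
```

Running this before and after a failover should show the same SERVERNAME value with a different SPID, because the second run is a new connection to the instance restarted on the other node.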
The program is a Visual Studio 2008 project (SQLClientTest4, with the solution file WindowsApplication4.sln) that is zipped up in a file named SQL Client SQL Clustering test program.zip. If you want to install this program, you just unzip the SQLClientTest.zip file and locate the WindowsApplication4.sln solution file. You open this from your Visual Studio 2008 start page. Then you modify the connection string of the dataset adapter, and rebuild and deploy the program.
After deploying this simple test program, you simply execute it from anywhere on your network. As you can see in the App.config XML file for this application, shown here, the connection string references the VSQLSERVER2008 virtual server name only:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
<configSections>
</configSections>
<connectionStrings>
<add name="WindowsApplication4.Properties.Settings.AdventureWorksConnectionString"
connectionString="Data Source=VSQLSERVER2008;Initial Catalog=AdventureWorks2008;Integrated Security=True"
providerName="System.Data.SqlClient" />
</connectionStrings>
</configuration>
Figure 19
shows the first execution of the Connection Test Program. If you click
the Retrieve button, the program updates the data grid with a new data
access to the virtual SQL Server machine, shows the name of the server
that the client program is connecting to (SERVERNAME), shows the date and time information of the data access (in the SHOWDATETIME column), and displays the SQL SPID that it is using for the data access (in the SPID column). You are now executing a typical C# program against the virtual SQL Server. Note that the SPID value is 55; this represents the SQL connection to the virtual SQL Server machine servicing the data request.
Now let’s look at how this
high-availability approach works from the client application point of
view. To simulate the failure of the active node, you simply turn off
the machine (CLUSTER1 in this case). This
is the best (and most severe) test case of all. Or, if you like, you can
use the Cluster Administrator Move group approach shown earlier.
After you simulate this
failure, you click the Retrieve button in the Connection Test Program
again, and an unhandled exception occurs (see Figure 20).
You can view the details of the error message, choose to quit the
application, or choose to continue. You should click Continue for now.
What has happened is that the application can no longer connect to the failed SQL Server (because you turned off CLUSTER1), and the SQL Server instance is still in the middle of failing over to CLUSTER2 in the two-node cluster.
A failover occurs in a short
amount of time; the actual amount of time varies, depending on the power
and speed of the servers implemented and the number of in-flight
transactions that need to be rolled back or forward at the time of the
failure. (A complete SQL failover often occurs in about 15 to 45
seconds. This is very minor and well within most service-level
agreements and high-availability goals.) You then simply click the
Retrieve button again in the Connection Test Program, and you are
talking to SQL Server again, but now to CLUSTER2.
As you can see in Figure 21, the data connection has returned the product data, SHOWDATETIME has been updated, and SERVERNAME still shows the same virtual SQL Server name that the application needs to connect to, but the SPID has changed from 55 to 52.
This is due to the new connection of the Connection Test Program to the
newly owned (failed-over) SQL Server machine. The Connection Test
Program has simply connected to the newly started SQL Server instance on
CLUSTER2. The unhandled exception
(error) goes away, and the end user never knows a complete failover
occurred; the user simply keeps processing as usual.
Note
You could program better
error handling that would not show the “unhandled exception” error. You
might want to display a simple error message, such as “database
momentarily unavailable—please try again,” which would be much more user
friendly.
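One way to approach this kind of error handling is to retry the data access a few times before telling the user anything. The following is a minimal sketch, not code from the Connection Test Program. The FailoverRetry class and its parameters are hypothetical names, and the helper is deliberately generic so that it does not depend on any database library.

```csharp
using System;
using System.Threading;

// Hypothetical helper; not part of the original Connection Test Program.
public static class FailoverRetry
{
    // Runs 'action'. If it throws an exception that 'isTransient' accepts,
    // waits 'delay' and tries again, up to 'maxAttempts' attempts in total.
    public static T Execute<T>(Func<T> action,
                               Func<Exception, bool> isTransient,
                               int maxAttempts,
                               TimeSpan delay)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException("maxAttempts");

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception ex)
            {
                // Give up on the last attempt, or on errors that a failover
                // would not explain; 'throw;' keeps the original stack trace.
                if (attempt >= maxAttempts || !isTransient(ex))
                    throw;
            }

            // Pause before the next attempt to give the failover time to finish.
            Thread.Sleep(delay);
        }
    }
}
```

In the Retrieve button's click handler, you might wrap the data access in a call such as FailoverRetry.Execute(() => adapter.Fill(table), ex => ex is SqlException, 10, TimeSpan.FromSeconds(5)), where adapter and table stand for whatever data objects the form uses. Ten attempts five seconds apart would cover the 15 to 45 seconds that a failover often takes. If every attempt fails, the handler can catch the final exception and display the friendlier message instead of letting it go unhandled.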
Potential Problems to Watch Out for with SQL Server Clustering
Many potential
problems can arise during setup and configuration of SQL Server
Clustering. Following are some items you should watch out for:
SQL Server service accounts and passwords should be kept the same on all nodes, or a node will not be able to restart a SQL Server service. You can use administrator or a designated account (for example, Cluster or ClusterAdmin) that has administrator rights within the domain and on each server.
Drive letters for the cluster disks must be the same on all nodes (servers). Otherwise, you might not be able to access a clustered disk.
You might have to create an alternative method to connect to SQL Server if the network name is offline and you cannot connect using TCP/IP. You can use named pipes, specified as \\.\pipe\$$\SQLA\sql\query.
It is likely that you will run into trouble getting MSCS to install due to hardware incompatibility. Be sure to check Microsoft’s Hardware Compatibility List before you venture into this installation.