Microsoft SQL Server 2008 R2 : Installing SQL Server Clustering (part 3) - Failure of a Node, The Connection Test Program for a SQL Server Cluster

7/25/2012 4:51:29 PM

Failure of a Node

As you can see in Figure 17, one of the nodes in the SQL Server cluster (CLUSTER1) has failed, and MSCS is in the middle of failing over to the other node in the cluster (CLUSTER2). As you can also see, the CLUSTER2 node item group has an hourglass on it, indicating that an MSCS operation is under way. The states of the resources on CLUSTER2 are mostly Online Pending. In other words, these resources are in the middle of failing over to this node. As they come up successfully, Online Pending turns to Online.

Figure 17. Failing over from CLUSTER1 to CLUSTER2, Online Pending state.

The failure of a node (for any reason) is also written to the System event log.

This example showed an intentional failure of the SQL Server instance via the Cluster Administrator. SQL Server Failover Clustering does the right thing by failing over to the other node, which verifies that SQL Server Clustering is working properly. The next section illustrates the effect this has from a typical client application's point of view, using a custom client test program called the Connection Test Program.

Congratulations! You are now up and running with your SQL Server Failover Cluster intact, and you should now be able to start achieving significantly higher availability for your end users. You can easily register this new virtual SQL Server (VSQLSERVER2008) within SQL Server Management Studio (SSMS) and manage it as you would any other SQL Server instance.

The Connection Test Program for a SQL Server Cluster

To help visualize exactly what effect a SQL Server failure and subsequent failover may have on an end-user application, we have created a small test program using Visual Studio 2008. This small C# test program accesses the AdventureWorks2008 database available for SQL Server 2008, and it was created in about 10 minutes. It displays a few columns of data, along with a couple of system variables that show connection information, including the following:

ProductID, Name, and ProductNumber— This is a simple three-column display of data from the Product table in the AdventureWorks2008 database.
SHOWDATETIME— This shows the date and time (to the millisecond) of the data access being executed.
SERVERNAME— This is the SQL Server name that the client is connected to.
SPID— This is the SQL Server process ID (SPID) that reflects the connection ID to SQL Server itself by the client application.

This type of small program is useful because it always connects to the virtual SQL Server. This enables you to see what effect a failover would have on your client applications.

To populate this display grid, you execute the following SQL statement:

SELECT ProductID, Name, ProductNumber,
CONVERT (varchar(32), GETDATE(), 9) AS SHOWDATETIME,
@@SERVERNAME AS SERVERNAME,
@@SPID AS SPID
FROM Production.Product WHERE (ProductID LIKE '32%')

You use Visual Studio 2008 to set up a simple Windows form like the one shown in Figure 18. You build a simple button that retrieves the data from the SQL Server database on the virtual server and also shows the date, time, server name, and SPID information for each access invocation.

Figure 18. Visual Studio 2008 Windows form and data adapters needed for the test client C# program.

The program (the WindowsApplication4.sln solution for the SQLClientTest4 Visual Studio 2008 project) is zipped up in a file named SQL Client SQL Clustering test program.zip. If you want to install this program, you just unzip the SQLClientTest.zip file and locate the WindowsApplication4.sln solution file. You open this from your Visual Studio 2008 start page. Then you rebuild and deploy it after you have modified the connection string of the dataset adapter.

After deploying this simple test program, you simply execute it from anywhere on your network. As you can see in the App.config XML file for this application, shown here, the connection string references the VSQLSERVER2008 virtual server name only:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <configSections>
    </configSections>
    <connectionStrings>
        <add name="WindowsApplication4.Properties.Settings.AdventureWorksConnectionString"
             connectionString="Data Source=VSQLSERVER2008;Initial Catalog=AdventureWorks2008;Integrated Security=True"
             providerName="System.Data.SqlClient" />
    </connectionStrings>
</configuration>
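
To double-check that a deployed copy of the configuration points only at the virtual server name and never at a physical node such as CLUSTER1 or CLUSTER2, you can extract the Data Source value from the file. The following is a minimal Python sketch, not part of the original test program. The inline APP_CONFIG string and the helper names parse_connection_string and data_sources are assumptions made for illustration, and the parser deliberately ignores quoted values.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of the App.config above (XML declaration omitted,
# attribute values joined onto single lines).
APP_CONFIG = """<configuration>
    <connectionStrings>
        <add name="WindowsApplication4.Properties.Settings.AdventureWorksConnectionString"
             connectionString="Data Source=VSQLSERVER2008;Initial Catalog=AdventureWorks2008;Integrated Security=True"
             providerName="System.Data.SqlClient" />
    </connectionStrings>
</configuration>
"""


def parse_connection_string(conn_str):
    """Split 'key=value;key=value' into a dict (simplified: no quoted values)."""
    parts = {}
    for item in conn_str.split(";"):
        if item.strip():
            key, _, value = item.partition("=")
            parts[key.strip()] = value.strip()
    return parts


def data_sources(config_xml):
    """Map each connection string name to the server named in its Data Source."""
    root = ET.fromstring(config_xml)
    return {
        add.get("name"): parse_connection_string(add.get("connectionString"))["Data Source"]
        for add in root.iter("add")
    }


if __name__ == "__main__":
    for name, server in data_sources(APP_CONFIG).items():
        print(name, "->", server)
```

If any entry reported a physical node name instead of VSQLSERVER2008, the client would stop working as soon as that node failed, because it would bypass the virtual server that moves between nodes.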

Figure 19 shows the first execution of the Connection Test Program. If you click the Retrieve button, the program updates the data grid with a new data access to the virtual SQL Server machine, shows the name of the server that the client program is connecting to (SERVERNAME), shows the date and time information of the data access (in the SHOWDATETIME column), and displays the SQL SPID that it is using for the data access (in the SPID column). You are now executing a typical C# program against the virtual SQL Server. Note that the SPID value is 55; this represents the SQL connection to the virtual SQL Server machine servicing the data request.

Figure 19. Executing the Connection Test Program with current connection information.

Now let’s look at how this high-availability approach works from the client application point of view. To simulate the failure of the active node, you simply turn off the machine (CLUSTER1 in this case). This is the best (and most severe) test case of all. Or, if you like, you can use the Cluster Administrator Move group approach shown earlier.

After you simulate this failure, you click the Retrieve button in the Connection Test Program again, and an unhandled exception occurs (see Figure 20). You can view the details of the error message, choose to quit the application, or choose to continue. You should click Continue for now.

Figure 20. An unhandled exception has occurred; it is a transport-level error (that is, a TCP provider error).

What has happened is that the application can no longer connect to the failed SQL Server (because you turned off CLUSTER1), and the cluster is still in the middle of failing over to CLUSTER2, the other node in the two-node cluster.

A failover occurs in a short amount of time; the actual amount of time varies, depending on the power and speed of the servers implemented and the number of in-flight transactions that need to be rolled back or forward at the time of the failure. (A complete SQL failover often occurs in about 15 to 45 seconds. This is very minor and well within most service-level agreements and high-availability goals.) You then simply click the Retrieve button again in the Connection Test Program, and you are talking to SQL Server again, but now to CLUSTER2.

As you can see in Figure 21, the data connection has returned the product data, SHOWDATETIME has been updated, and SERVERNAME still shows the same virtual SQL Server name that the application needs to connect to, but the SPID has changed from 55 to 52. This is due to the new connection of the Connection Test Program to the newly owned (failed-over) SQL Server machine. The Connection Test Program has simply connected to the newly started SQL Server instance on CLUSTER2. The unhandled exception (error) goes away, and apart from that one failed request the end user never knows a complete failover occurred; the user simply keeps processing as usual.

Figure 21. Executing the Connection Test Program again against the failed-over cluster node.

Note

You could program better error handling that would not show the “unhandled exception” error. You might want to display a simple error message, such as “database momentarily unavailable—please try again,” which would be much more user friendly.
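
One way to approach the error handling suggested in this note is to retry the data access a few times before showing a friendly message. The following is a minimal Python sketch of that pattern, not the C# code of the Connection Test Program. TransientConnectionError, retrieve_with_retry, and make_flaky_fetch are hypothetical names, the simulated fetch stands in for the real database call, and the default of five attempts ten seconds apart is an assumption chosen to cover the 15 to 45 second failover window mentioned earlier.

```python
import time


class TransientConnectionError(Exception):
    """Stand-in for the transport-level error the client sees during failover."""


def retrieve_with_retry(fetch, attempts=5, delay_seconds=10.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    Returns (rows, None) on success, or (None, message) if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(), None
        except TransientConnectionError:
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts:
                sleep(delay_seconds)
    return None, "Database momentarily unavailable. Please try again."


def make_flaky_fetch(failures):
    """Hypothetical fetch that fails `failures` times, then returns one row."""
    state = {"remaining": failures}

    def fetch():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TransientConnectionError("transport-level error")
        return [("VSQLSERVER2008", 52)]

    return fetch


if __name__ == "__main__":
    # Simulate two failed attempts while the cluster fails over (no real waiting).
    rows, message = retrieve_with_retry(make_flaky_fetch(2), sleep=lambda s: None)
    print(rows if message is None else message)
```

In a real client, each retry has to open a fresh connection to the virtual server name, because the connection that existed before the failover is gone.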

Potential Problems to Watch Out for with SQL Server Clustering

Many potential problems can arise during setup and configuration of SQL Server Clustering. Following are some items you should watch out for:

SQL Server service accounts and passwords should be kept the same on all nodes, or a node will not be able to restart a SQL Server service. You can use administrator or a designated account (for example, Cluster or ClusterAdmin) that has administrator rights within the domain and on each server.
Drive letters for the cluster disks must be the same on all nodes (servers). Otherwise, you might not be able to access a clustered disk.
You might have to create an alternative method to connect to SQL Server if the network name is offline and you cannot connect using TCP/IP. You can use named pipes, specified as \\.\pipe\$$\SQLA\sql\query.
It is likely that you will run into trouble getting MSCS to install due to hardware incompatibility. Be sure to check Microsoft’s Hardware Compatibility List before you venture into this installation.
