A methodology
is a process or procedure, designed for repeatability, that
has been proven to work for a certain paradigm. In our case, it is a
repeatable and thorough SQL Server–focused performance and tuning
process. We generalized this focused process into a repeatable
methodology and identified two possible paths that can be taken through
it: one for a new
SQL Server implementation that will have performance and tuning
designed into it from the start and another for performance and tuning
of an existing SQL Server implementation (one that needs to be scaled out or rescued—or in other words, “optimized”).
Figure 4
illustrates this overall performance and tuning methodology within a
traditional waterfall development methodology context. But, as you will
see, it is very iterative in nature and can be followed within any
organization’s formal development methodology.
Notice the two distinct paths
labeled “New” and “Existing” indicated by the dashed arrowed lines. As
mentioned earlier, one path is for new implementations, and the other is
for existing implementations. The following sections describe each of
these distinct paths through the methodology.
Designing In Performance and Tuning from the Start
If you are just starting to
design and develop a new SQL Server–based implementation, it would be
great to factor in all possible performance and tuning considerations
from the beginning. In real life, this is rarely done, primarily because
much is unknown about data access, the number of users, and table design.
However, you are not precluded from designing in certain
common performance and tuning considerations, nor are you precluded from
incorporating a performance and tuning “prototyping” step in your
methodology so that you have known and predictable results of what you
are building “as you go” and not “at the end.” As you have no doubt
experienced (or heard many times), changing something after it is built
is more expensive than if you had considered it much earlier
in the development process. In fact, such changes are often 10 or more
times more expensive in both money and time.
As you can see in Figure 4,
each path begins with an assessment step. For new development, this
step covers the traditional “initiation” type of activities, such as
project sizing, scheduling, scope refinement, and resourcing. As you
identify project sizing, scheduling, and resourcing, you should add in
5% to each subsequent phase of your traditional development life cycle
for performance and tuning activities. We outline these activities
shortly. In general, they include capturing performance goals, holding
performance and tuning design reviews, building prototypes optimized to
meet these performance goals, and setting final checkpoints that
demonstrate the performance goals are fully met under full loads.
Our performance and tuning methodology for new implementations focuses
on six major steps along the way to deployment. We don’t list all the
tasks or activities around your programming or user interface
activities. Instead, we just focus on the SQL Server–oriented items that
pertain to optimal performance and tuning.
The six development methodology steps are
Assessment
During this project
initiation phase, a complete picture of the effort is usually identified
and assessed at a high level. This includes
Project sizing— Determines if this is a small, medium, or large project.
Project scope clarifications— Describe the intent and scope of the application to be created.
Deliverables identified—
Identifies deliverables for all subsequent phases of the project. This
should also include new tasks for performance and tuning activities.
Schedules/milestones— Reflect what is needed to build this application, when it must be delivered, and everything along the way.
Resources identified and committed to project— Includes some specialized resources that enable your performance and tuning tasks and reviews.
All organizations have their
own version of this step/phase, but you should note that you need to
plan in time for performance and tuning from the beginning. We suggest
adding at least 5% more hours for performance and tuning tasks in each
subsequent step of the development life cycle. That 5% will save you
enormous heartache and grief later and will help ensure the success of
your implementation.
Identify and Design
The
identification and design step is centered around the clear
identification of what must be built, what service-level agreements are
needed, and what performance goals must be met. The performance and
tuning-oriented tasks are as follows:
Identify the primary service-level agreements (SLAs)/performance goals—
This task is critical; the goals must be stated clearly and
realistically. Sub-second response rates are likely not realistic for
every element of an application. You will be measured against these goals.
Estimate workload/use profiles—
These profiles are the general volumes of major activity this
application must support, such as 68% online activity, 32% reporting
activity, and availability requirements such as 24×7×365. These profiles
should also include any known growth estimates for different increments
of time, such as 6-month growth numbers, 12-month growth numbers, and so on.
Generalize to major access patterns—
This task quantifies the major data access patterns that must be
supported by the application. Knowing these patterns is essential in
order to design in performance from the start. Examples of data access
patterns are a shopping cart access pattern and an ATM access pattern;
they would be vastly different in their table, indexing, and
transactional designs (a brief sketch follows this list).
Design for all layers of the architecture—
Based on the performance goals, workloads, and major data access
patterns, your initial designs should not only cover the SQL Server
objects (such as table designs, index designs, and so on) but must also
extend to the other architectural layers previously identified.
Now is the time to decide on the correct partitioning scheme,
transactional model, file placement approach, disk subsystem, server
configuration, and memory management needs of your application. You
should include a task in your project plan for a complete performance
and design review.
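To make the pattern-driven design point concrete, here is a minimal T-SQL sketch assuming a hypothetical shopping cart access pattern; the table, column, and index names are invented for illustration only, not prescribed by the methodology.

-- Hypothetical shopping cart pattern: reads and writes are always scoped to a
-- single cart, so the table is clustered on (CartID, ProductID) to keep each
-- cart's rows physically together.
CREATE TABLE dbo.ShoppingCartItem
(
    CartID     INT          NOT NULL,
    ProductID  INT          NOT NULL,
    Quantity   SMALLINT     NOT NULL,
    UnitPrice  MONEY        NOT NULL,
    AddedAt    DATETIME2(0) NOT NULL DEFAULT SYSUTCDATETIME(),
    CONSTRAINT PK_ShoppingCartItem PRIMARY KEY CLUSTERED (CartID, ProductID)
);

-- A reporting-style pattern over the same data ("units added per product per
-- day") calls for a different supporting index; an ATM-style pattern would
-- differ again in its transactional design.
CREATE NONCLUSTERED INDEX IX_ShoppingCartItem_Product_AddedAt
    ON dbo.ShoppingCartItem (ProductID, AddedAt)
    INCLUDE (Quantity);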
Prototype
We added a formal prototyping
step into this methodology to stress how important it is to fully
evaluate and understand the performance expectations of the application
you are building. This very iterative process can help greatly in
refining the designs being considered. Go ahead and build a prototyping
environment where you can thoroughly try out what you have designed in
the preceding “Identify and Design” step. Your prototype should
illustrate (and implement) the major data access patterns you must
support, show how they achieve the planned performance goals, and
produce results that must be signed off on before you pass to the next
development step. You may have to iterate back through this and
the preceding step several times to narrow in on a more optimal design
solution. Use this time now to ensure success later.
In a prototype environment, prototype possible solutions—
This is for a generalized design and does not have to include every
column of data or functionality of the application. There may be one or
more possible solutions that you must prototype. The prototype is at the
generic “pattern” level. Often we use tables containing only the minimal
known columns needed to implement a data access pattern and table design,
plus artificial columns that act as placeholders for known unusual data
types (a sketch appears at the end of this step).
Dissect trouble areas—
During the prototyping, you will have issues around your design and the
performance results you are getting. Now is the right time to solve
these types of issues. Not later.
Ramp up for load/stress testing—
Your prototypes should include some level of volume or peak stress
testing results so you can have a good understanding of what to expect
at peak times. In addition, it is beneficial to prototype and test with
data volumes that are representative of your expected production data
volumes. This helps to identify ahead of time any performance issues
that may arise when querying and processing larger data volumes.
Note
Very large volumes of data may
require modifications to the database design to provide good
performance. The earlier this need is determined in the design/testing
process, the less expensive it will be to implement these changes in the
system while code is still being developed rather than having to make
changes after the system has been put into production.
Iterate back to the “Identify and Design” step (as needed)—
Design and redesign as much as is needed. Prototyping helps you narrow
in on an optimized solution quickly. In general, shoot for a 90%
solution (one that meets performance within 10% of stated goals).
You should also have
the right tools available to you during this prototyping step. This
includes monitoring tools, performance and tuning tools, and other
instrumentation needed to better understand and resolve any early issues
that surface.
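As a rough illustration of the placeholder-column and data-volume ideas above, the following T-SQL sketch builds a hypothetical prototype table and loads it with a production-like number of rows; every object name and volume figure here is an assumption chosen for the example.

-- Hypothetical prototype table: only the columns needed to exercise the access
-- pattern, plus artificial placeholder columns that stand in for known wide or
-- unusual data types so the row size is realistic.
CREATE TABLE dbo.OrderPrototype
(
    OrderID    INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    CustomerID INT               NOT NULL,
    OrderDate  DATETIME2(0)      NOT NULL,
    Filler1    CHAR(200)         NOT NULL DEFAULT 'x',  -- placeholder for wide description columns
    Filler2    VARBINARY(400)    NULL                   -- placeholder for a document/image column
);

-- Load a representative volume (here, an assumed 1 million rows) so optimizer
-- behavior and I/O resemble what the real system will see.
;WITH n AS
(
    SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo.OrderPrototype (CustomerID, OrderDate)
SELECT rn % 50000, DATEADD(DAY, -(rn % 730), SYSUTCDATETIME())
FROM n;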
Code and Test
In the “Code and Test” stage,
you fully code all elements of your application. This includes all
table designs, index designs, application coding, and complete testing
prior to the system test and user acceptance.
Perform full coding and unit/load testing—
At this point, perform complete coding of your application and
database. Each unit test must also include a step to determine how it is
meeting performance goals/SLAs (a sketch of such a checkpoint appears at
the end of this list). Most programmers are not accustomed to this
checkpoint. The code does not go forward unless the goals/SLAs are fully
met. No exceptions. Again, whenever possible, test with data volumes and
values that are representative of the expected production data after the
system has been running for an extended period of time. This is
important because queries that run quickly against small data volumes
may optimize differently and run slowly against larger volumes. If you
optimize the database design and queries to perform well with large data
volumes, performance should be just as good or better with smaller
volumes as well.
Perform regression testing if desired—
Utilize full regression testing if you desire. This step usually
requires complex testing harnesses and dedicated quality assurance
resources.
Iterate back to the “Prototype” step (as needed)—
Don’t be afraid to iterate back into the prototyping mode to flesh out a
serious issue. You need to solve issues now, not later in production or
in acceptance testing.
Assess if service levels have been met—
Create a formal checkpoint that verifies the service levels and
performance goals have been met. You cannot proceed to the next step
unless they are met!
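A unit-test performance checkpoint can be as simple as the following hedged T-SQL sketch, which times a call and raises an error when the stated SLA is missed; the procedure name and the 500 ms threshold are assumptions for illustration.

-- Hypothetical unit-test checkpoint: fail loudly if the call misses its SLA.
DECLARE @sla_ms  INT = 500,
        @start   DATETIME2(3) = SYSUTCDATETIME(),
        @elapsed INT;

EXEC dbo.usp_GetOpenOrdersForCustomer @CustomerID = 42;  -- procedure under test (assumed name)

SET @elapsed = DATEDIFF(MILLISECOND, @start, SYSUTCDATETIME());
IF @elapsed > @sla_ms
    RAISERROR('SLA missed: %d ms elapsed, %d ms allowed.', 16, 1, @elapsed, @sla_ms);

SET STATISTICS TIME ON and SET STATISTICS IO ON are also useful when reviewing CPU time and logical reads interactively during unit testing.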
System Test and Acceptance
The “System Test and
Acceptance” stage is the full-blown integrated system test in a
production-like environment. You do final user acceptance and full
system-level stress tests here. All your performance goals must be fully
met! Again, you need to have a formal checkpoint identified here that
requires signoff on the SLAs/performance goals and full documentation of
the system stress test results.
In the full system/stress
test/user acceptance, your fully loaded application and database are
thoroughly tested by your users, and a full stress test is done to
reflect peak system usage. The users should not sign off on this step
unless the performance goals are met, and the application’s functional
test is successful.
If possible, this stage
should reflect what the complete production hardware and software stack
will look like. By now, you should also be able to set expectations for
both current and future scalability of this application.
Implementation
By design, the
implementation should be merely a formality. Certainly, all performance
concerns should have been met, documented, and verified even under peak
processing scenarios. It is also recommended that any production
implementation include a certain amount of performance and tuning
instrumentation and monitoring. This should be a standard part of any
production implementation environment now.
Production build/implementation— You should perform a complete buildout of your application in production.
Production performance and monitoring—
You need to have complete performance and tuning instrumentation and
monitoring in place and tied to your system monitoring services (SMS)
environment. There is usually a period of extended proactive monitoring
following a new implementation (a brief sketch follows this list). Don’t
just implement and walk away. There is likely something that must be
adjusted.
Final documentation/results— In
this step, you create a set of documents that reflect what you built
and also the current performance levels being achieved. These documents
will be valuable later as the application changes and workloads
increase.
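As one possible starting point for that production monitoring, the following T-SQL sketch polls a few dynamic management views; the specific counters and the 10-second threshold are assumptions, and a real environment would feed the results into whatever monitoring framework is in place.

-- A few headline counters worth trending after go-live.
SELECT pc.[object_name], pc.counter_name, pc.cntr_value
FROM sys.dm_os_performance_counters AS pc
WHERE pc.counter_name IN (N'Page life expectancy', N'Batch Requests/sec', N'User Connections');

-- Requests currently running longer than an assumed 10-second threshold.
SELECT r.session_id, r.start_time, r.status, r.wait_type,
       DATEDIFF(SECOND, r.start_time, GETDATE()) AS elapsed_s,
       t.text AS statement_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id > 50
  AND DATEDIFF(SECOND, r.start_time, GETDATE()) > 10;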
Figure 5
shows a holistic picture of the different layers that you have been
building on and how these different layers depend on each other and are
built on each other. Make sure you have checked off and considered each
of them in your pursuit of optimizing your SQL Server–based
implementation.
From the hardware footprint, to the operating system that sits on top of
the hardware, to the database and middleware implemented on top of the
OS, to the application itself (however many application tiers it has),
to the network supporting communication with the users—all of these
layers have a part in delivering an optimal implementation.
The next section describes our formal performance and tuning methodology applied to an existing implementation.
Performance and Tuning for an Existing Implementation
Many of you may have just
skipped down to this section because you have already built something
and are trying to get to some serious performance and tuning for your
existing implementation. Regardless of your situation, this section
describes a methodology of isolation, identification, and migration for
getting your existing implementation to a well-performing, tuned state. We use isolation in the sense that you must isolate the major performance issues quickly, identification in the sense of locating the exact issues to focus on, and migration
in the sense of having to get from the current issues to a new
issue-free implementation. This last part is often incredibly hard to
do—like changing tires on a car while it is still moving. With this in
mind, a different path through the performance and tuning methodology is
needed—one that starts with an assessment, but an assessment of what
issues or shortcomings exist, and then follows a separate branch
(path) of rapid isolation, monitoring, and identification of the
issues. This is then followed by rapid prototyping, further isolation
and monitoring, eventual full system testing, and rollout of the changes
into your production implementation.
One big advantage of doing
performance and tuning on an existing implementation is that you have
live transactional information, live data, and other production
execution history to work from. From all of these, you should be able to
piece together a good execution profile, all the major data access
patterns, and other major characteristics critical to tuning what you
have. You can also include new requirements or scalability needs in this
effort.
Another initial decision you
need to make is whether you want only transparent changes (ones that
have no application changes whatsoever, such as index changes, server
instance changes, file placement changes, some stored procedure changes,
and so on) or if you can tolerate making nontransparent changes (ones
that force you to also make schema, structure, and SQL statement
changes, and even application changes). Of course, your decision depends
on how much trouble you are in.
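For example, a purely transparent change might be nothing more than a new index created online, requiring no schema or application changes; the table, column, and index names below are assumptions, and online index operations are only available in certain editions.

-- Hypothetical transparent change: a covering index added with no schema,
-- SQL statement, or application modification required.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
    ON dbo.Orders (CustomerID, OrderDate)
    WITH (ONLINE = ON);  -- online index build support varies by edition

A nontransparent change, by contrast, might split that same table into new structures and therefore ripple into SQL statements and application code.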
Now, let’s look back at Figure 4
and focus on the rightmost path through this methodology. We start
by recasting the traditional “initiation” type of activities such as
project sizing, scheduling, scope refinement, and resourcing to focus on
just the performance and tuning tasks at hand. Again, we don’t list all
the tasks or activities around your programming or user interface
activities; we just focus on the SQL Server–oriented items that pertain
to optimal performance and tuning.
Performance and tuning methodology steps for an existing implementation are as follows:
Assessment
For this first step, assessment, a complete picture of the performance issues or expectations must be outlined. This includes
Project sizing—
Determines if the performance and tuning effort you are about to engage
in will be a small, medium, or large project. Although it is not a
full-blown development project, it could certainly be bigger than you
realize.
Project scope clarifications—
Identifies mostly whether you can tolerate only transparent changes or
whether nontransparent changes are also acceptable to resolve your
performance issues.
Deliverables identified— Focuses
mostly on performance and tuning activities but may be expanded to
application activities if the changes needed must extend into the
application itself.
Schedules/milestones— Reflect what is needed to monitor, isolate, identify, and roll out the changes needed.
Resources identified and committed to project—
Identifies resources centered around your best performance and tuning
folks. Don’t scrimp here. Even if you have to hire some experts to help
with this effort, this is money well invested.
Isolate and Monitor
Now comes the specialized
path down the performance and tuning methodology for existing
implementations. It is time to monitor, isolate, and identify exactly
where your problems are. In addition, you must not lose sight of the
complete stack and all the layers that will potentially be a part of
your performance issues.
You also need to revisit (or
define for the first time) what service-level agreements are needed and
what performance goals must be met. The performance and tuning-oriented
tasks are as follows:
Identify the primary service-level agreements (SLAs)/performance goals—
If you haven’t performed this task yet, the time is now. If you did it
previously, it is time to revisit what these SLAs should be
realistically. Again, subsecond response rates are likely not realistic
for every element of an application. You should also add new or emerging
scalability and growth needs to your goals now. These may have a
significant impact on design decisions or on performance and tuning
approaches, which would be different if all you needed to do was tune
for an existing, unchanging workload.
Set up your execution capture and monitoring capabilities—
You should perform this task at each of the layers in your architecture
if you can. Remember, performance issues are often spread across
multiple areas. Our preference is to monitor during peak production
periods. This includes basic monitoring using Perfmon counters, SQL
Server Profiler tracing, and even third-party monitoring tools that can
help you see major issues graphically as they are happening.
Isolate hottest issues—
We like to use the 90/10 rule such that 10% of your execution
transactions or implementation configurations are likely causing 90% of
your problems. If you solve that 10%, you have a well-performing
platform. These issues must be isolated layer by layer (network,
hardware, OS, application server/application, SQL Server instance,
database, SQL statements, and so on).
Create workloads/use profiles—
Utilizing the tracing/capture tools, you need to capture and organize
where the problem issues are located at each layer. For SQL statements,
this means capturing the worst-performing SQL into a top 100 list,
usually ordered by I/O and CPU consumption (see the sketch following
this list). You also need to identify concurrency issues
(locking/deadlocks), disk/file utilization queues, cache utilization, and
many other indicators. It is from these issues that you must create
repeatable execution patterns so that you can tune to support them. You
must also execute activities, such as batch processes, that might not
normally come into play until scheduled events have occurred.
For new growth or scalability increases, add the appropriate workloads,
use profiles, and SQL statements to the mix.
Identify issues and possible solutions for all layers of the architecture—
Based on the performance goals, workloads, and major data access
patterns, you need to come up with a series of performance changes
prioritized by the biggest impact. These changes may include SQL Server
object changes (such as table changes, new or different indexes, and so
on) but could also extend to the other architectural layers previously
identified. They can also include adding or changing a partitioning
scheme, fixing a transactional model, correcting file placements,
changing the disk subsystem or server configurations, and adjusting the
memory requirements of your application. Before you make any changes,
you should hold a comprehensive performance review that outlines all
issues identified and proposes solutions to each.
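The following hedged T-SQL sketch shows two common starting points for this isolation and profiling work: an instance-wide wait profile and the worst cached statements by I/O and CPU. The excluded wait types and the TOP values are illustrative assumptions; extend the second query toward the "top 100" list described above.

-- 1) Where the instance spends its wait time (filter out benign idle waits).
SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'BROKER_TASK_STOP',
                        N'XE_TIMER_EVENT', N'REQUEST_FOR_DEADLOCK_SEARCH')
ORDER BY wait_time_ms DESC;

-- 2) Worst cached statements by logical reads (swap the ORDER BY to
--    total_worker_time to rank by CPU instead).
SELECT TOP (20)
       qs.total_logical_reads,
       qs.total_worker_time AS total_cpu_microseconds,
       qs.execution_count,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                       WHEN -1 THEN DATALENGTH(st.text)
                       ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;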
Prototype
You now enter an
iterative cycle of isolate, monitor, and prototype. This cycle may
repeat for as many iterations as you deem necessary. You should set your
overall goal at achieving at least 95% of your performance goals. It
isn’t too hard to get to that point; squeezing out the last 5%, however,
is hard, and often getting those last few yards isn’t going to buy you
that much more relief.
When you want to get good
performance and tuning results, it’s always best to utilize a
prototyping environment that is very much like your production
environment (or at least as close to the same as possible in all
aspects—memory, CPU, disk subsystems, data volumes, and so on).
Optimizer decisions and I/O times, for example, vary greatly from one
configuration to another. If your prototyping system is completely
different from your production system or configuration, you will likely
be able to achieve only 80% of your performance goals and SLAs. Often,
we are able to use a company’s QA/User Acceptance Testing environment,
which is a mirror image of the production system, including a very
current and complete production database backup image used for system
testing. You need to negotiate time on this type of environment. Some
prototyping can be accomplished on smaller environments (such as testing
new partitioning schemes, new SQL query statements, and so on), but you
need to run it all on a full-blown, production-like environment before
all is said and done.
Again, your prototype
should illustrate (and implement) the major data access patterns you
must support, show how they achieve the planned performance goals, and
produce results that must be signed off on before you pass to the next
development step. You may have to iterate back through this and
the preceding step several times to narrow in on a more optimal design
solution. You need to have full execution monitoring and tracing set up
in this prototyping environment.
Create a complete prototype environment— Remember that you have to test existing and new solutions, not just one or the other. A new change may
adversely affect something that performed fine before but becomes slow
under the new solution (a sketch appears at the end of this step). There
may be one or more possible solutions that you must prototype to get the
best results. You will be using “live” patterns based on production
queries for the most part now. Any new functionality or solutions will
be added to this mix.
Ramp up for load/stress testing—
Your prototypes should include full-volume or peak stress testing
results so you can have a good understanding of what to expect at peak
times and of the potential impact that performance changes to one aspect
of the system may have on another, such as increased locking contention.
Iterate back to the “Isolate and Monitor” step (as needed)—
Change, enhance, and try out new solutions in this iterative circle.
Using a prototyping approach allows you to narrow down to what works
best rather quickly.
Remember that you should
have the right tools available to you during this prototyping step. They
include monitoring tools, performance and tuning tools, and other
instrumentation needed to better understand and resolve any early issues
that surface.
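One simple way to honor the "test existing and new solutions together" advice above is to measure both side by side in the prototype environment; the procedure names in this sketch are assumptions, and the STATISTICS IO/TIME output would be compared against the documented baseline.

-- Measure the existing hot pattern and the candidate change together so a fix
-- for one pattern does not quietly regress another.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

EXEC dbo.usp_ExistingHotQuery  @CustomerID = 42;  -- pattern that performed fine before (assumed name)
EXEC dbo.usp_CandidateNewQuery @CustomerID = 42;  -- pattern being introduced or tuned (assumed name)

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;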
Code and Test
The “Code and Test” step takes
on a more formal process after the prototyping of all possible
performance and tuning enhancements has been completed. In this step,
you fully code all elements of your application with any changes that
have resulted from the prototyping effort. This includes all table
designs, index designs, and application coding prior to the final system
test and user acceptance. Remember, if you are doing only the
transparent changes, this part should be fairly easy; but if your
changes are nontransparent, this will be a full-blown code and test
phase of your entire application.
Perform full coding and unit/load testing—
Perform complete coding of the application and database. Each unit test
must also include a step determining how it is meeting its performance
goals/SLAs. Most programmers are not accustomed to this checkpoint. The
code does not go forward unless the goals/SLAs are fully met. No
exceptions.
Perform regression testing if desired—
Utilize full regression testing if you desire. This step usually
requires complex testing harnesses and dedicated quality assurance
resources.
Iterate back to the “Prototype” step (as needed)—
Don’t be afraid to iterate back into the prototyping mode to flesh out a
serious issue. You need to solve issues now, not later in production or
in acceptance testing.
Assess if service levels have been met—
Create a formal checkpoint that verifies the service levels and
performance goals have been met. You cannot proceed to the next step
unless these goals are met!
System Test and Acceptance
The
next step is the full-blown integrated system test in a production-like
environment. You do final user acceptance and full system-level stress
tests here. All your performance goals must be fully met! Again, you
need to have a formal checkpoint identified here that requires signoff
on the SLAs/performance goals and full documentation of the system
stress test results.
In the full system/stress
test/user acceptance, your fully loaded application is thoroughly tested
by your users, and a full stress test is done to reflect peak system
usage. The users should not sign off on this step unless the performance
goals are met, and the application’s functional test is successful.
If possible, this
stage should reflect what the complete production hardware and software
stack will look like. By now, you should also be able to set
expectations for both current and future scalability of this
application.
Implementation
The biggest issue now is
migrating your new changes into the existing production implementation.
This may include complex data structure changes and data migrations from
old structures to new structures, or may simply involve dropping and
re-creating indexes. You should plan all steps of your changes to the nth
degree and test the upgrades on your QA platform a few times. You may
have to schedule downtime for extensive nontransparent changes. You
should make full backups at all layers prior to your live upgrade in
production. Be sure to have a set of performance benchmark operations
ready to run in production to verify the results you intended.
Production build/implementation— You should perform a complete buildout of your application in production, including full migration/upgrade scripts and data conversions.
Production performance and monitoring—
You need to have complete performance and tuning instrumentation and
monitoring in place and running. Run a series of performance testing
scripts/queries as part of your production upgrade process (a brief
sketch follows this list). Don’t just implement and walk away. There is
likely something that must be adjusted even at this late stage.
Final documentation/results— In this step, you create a set of documents that reflect what you built and also the current performance levels being achieved.
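A minimal sketch of such post-upgrade benchmark verification follows; the baseline table, procedure name, and single timed operation are assumptions, and a real rollout would cover each critical access pattern.

-- Hypothetical benchmark harness run immediately after the production upgrade.
CREATE TABLE dbo.PerfBaseline
(
    RunAt     DATETIME2(0) NOT NULL DEFAULT SYSUTCDATETIME(),
    TestName  SYSNAME      NOT NULL,
    ElapsedMs INT          NOT NULL
);

DECLARE @t DATETIME2(3) = SYSUTCDATETIME();
EXEC dbo.usp_GetOpenOrdersForCustomer @CustomerID = 42;  -- representative benchmark operation (assumed name)
INSERT INTO dbo.PerfBaseline (TestName, ElapsedMs)
VALUES (N'usp_GetOpenOrdersForCustomer', DATEDIFF(MILLISECOND, @t, SYSUTCDATETIME()));

-- Compare against the documented pre-change numbers and the SLAs before
-- declaring the rollout complete.
SELECT TestName, ElapsedMs, RunAt
FROM dbo.PerfBaseline
ORDER BY RunAt DESC;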
In theory, you
should now have an optimized implementation. Be prepared to monitor,
monitor, and monitor some more. We traditionally keep a close eye on
execution for about five days and then switch back to a normal amount of
proactive monitoring after that.
The
following sections highlight some common design techniques, approaches,
and guidelines you should consider or utilize as you tune your SQL
Server implementation.