1. Component overview
The data collection platform comprises three key components: the Data Collector, collection sets, and the management data warehouse.
These components work together to enable information to be
automatically collected and stored for later analysis and reporting.
Before we focus on the setup, administration, and benefits of this
platform, let's walk through an overview of the key components.
1.1. Data Collector
The Data
Collector component controls the collection and upload of information
from a SQL Server instance to the management data warehouse using a
combination of SQL Server Agent jobs and SQL Server Integration Services
(SSIS) packages. The information collected is defined in the data
collection sets.
1.2. Data collection sets
A data collection set
consists of information relating to a particular aspect of a SQL
Server instance. Three system data collection sets are included with SQL
Server 2008: Disk Usage, Query Statistics, and Server Activity.
Disk usage
This collection set
records disk usage statistics for each database on the instance where
the data collection set is located. By collecting this information on a
regular basis, you can report on information such as the average data
file growth per day.
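As a rough illustration of the kind of figure such regular collection makes possible, the following Python sketch derives an average daily growth rate from a series of size snapshots. The function name, the snapshot layout, and the sample values are all hypothetical; they stand in for whatever the collected disk usage data actually looks like in the MDW.

```python
from datetime import date


def average_daily_growth_mb(snapshots):
    """Return average data file growth in MB per day.

    snapshots: list of (snapshot_date, data_file_size_mb) tuples
    (a hypothetical layout, not the MDW schema).
    """
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots")
    ordered = sorted(snapshots)
    (first_day, first_size), (last_day, last_size) = ordered[0], ordered[-1]
    elapsed_days = (last_day - first_day).days
    if elapsed_days <= 0:
        raise ValueError("snapshots must span at least one day")
    # Growth between the earliest and latest snapshot, spread over the period.
    return (last_size - first_size) / elapsed_days


# Hypothetical sample data: three weekly snapshots of one database.
samples = [
    (date(2009, 1, 1), 1000.0),
    (date(2009, 1, 8), 1070.0),
    (date(2009, 1, 15), 1210.0),
]
growth = average_daily_growth_mb(samples)
print(f"{growth:.1f} MB/day")  # prints "15.0 MB/day"
```

The more snapshots are retained, the longer the period over which a figure like this can be calculated.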
Query statistics
One of the limitations of
DMVs such as sys.dm_exec_query_stats is that the information they
contain is lost each time SQL Server restarts. The Query Statistics
collection set overcomes this limitation by regularly collecting and
storing this information, enabling retrospective analysis of query
information such as Top N queries by CPU from any period of time in
which the collection data is available.
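To make the idea concrete, here is a minimal Python sketch of how stored snapshots of cumulative counters could be turned into a Top N list that survives a restart. It is not how the Query Statistics collection set is implemented; the snapshot layout, the query identifiers, and the rule that a falling counter signals a reset are assumptions made for illustration.

```python
from collections import defaultdict


def top_n_by_cpu(snapshots, n):
    """Return the n queries with the most CPU time across all snapshots.

    snapshots: time-ordered list of dicts mapping a query identifier to its
    cumulative CPU time in milliseconds (a hypothetical layout).
    """
    totals = defaultdict(int)
    previous = {}
    for current in snapshots:
        for query_id, cpu in current.items():
            before = previous.get(query_id, 0)
            # Assumption: a counter lower than its previous value means the
            # counters were reset (for example by a restart), so the whole
            # new value counts as fresh work.
            delta = cpu - before if cpu >= before else cpu
            totals[query_id] += delta
        previous = current
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical snapshots; the third one follows a restart, so counters drop.
history = [
    {"q1": 100, "q2": 400},
    {"q1": 250, "q2": 450},
    {"q1": 50, "q2": 20},
    {"q1": 300, "q2": 30},
]
print(top_n_by_cpu(history, 1))  # prints [('q1', 550)]
```

Looking only at the final snapshot would rank the queries by work done since the restart; the stored history is what makes the full-period ranking possible.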
1.3. Management data warehouse
The setup wizard creates the management data warehouse, commonly known as the MDW,
with all of the data structures needed to store data uploaded
from the data collection sets of participating servers. Each server
is individually configured for data collection, and one of the steps in
this process is choosing the MDW location where the collected data will
be loaded.
Once the data collection
sets begin loading, the MDW becomes the source for a number of included
(and very powerful) reports, which we'll cover shortly. Figure 1 shows the major components of the data collection platform working together.
With this overview in mind, let's proceed by looking at the initial setup and configuration steps.
2. Setup and configuration
We use SQL Server Management
Studio to begin the process of configuring the data collection platform
for a SQL Server 2008 instance, with the first step being the selection (or creation) of the MDW.
2.1. MDW selection or creation
Right-clicking
Data Collection under the Management node in SQL Server Management
Studio allows you to select the Configure Management Data Warehouse menu
option, which starts the Configure Management Data Warehouse Wizard.
The first wizard step presents two options, as shown in figure 2:
Create or Upgrade a Management Data Warehouse and Set Up Data
Collection. If an MDW database does not yet exist, select the first
option to create one for subsequent server instances to use. The second
option sets up data collection on the instance against an existing MDW.
Although it's possible
for each instance to host its own local MDW database, choosing to create
it on a centralized, dedicated server
provides a number of advantages, particularly in large environments
containing many server instances that will be configured for data
collection. Among others, the major benefits of a single centralized MDW
database include the following:
Centralized administration—A
single MDW database enables simpler administration for actions such as
backups and disk space monitoring. Depending on the configured
collection sets and upload frequencies, the volume of uploaded data can
grow very quickly, so having a single administration point is very
beneficial in this regard.
Single report source—A centralized MDW enables a custom enterprise reporting solution to be configured against a single-source MDW database.
Minimal performance impact—Offloading
the MDW overhead from each of the uploading servers minimizes (and
centralizes) the performance overhead of the data collection platform.
For the purposes of this example, let's imagine our MDW is yet to be created. Thus, we'll select the first option shown in figure 2. The next screen, as shown in figure 3,
permits the creation of a new MDW by clicking the New button and
entering the name, location, and size details. As with the creation of
any database, we must consider the initial size and growth factors; an
MDW can grow very quickly, so avoiding frequent autogrow operations is a vital performance consideration.
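The effect of the initial size and growth increment can be estimated with simple arithmetic. The Python sketch below counts how many autogrow operations a database would go through on its way to a target size, assuming a fixed-size growth increment; the function name and all of the figures are hypothetical and are not sizing recommendations.

```python
import math


def autogrow_events(initial_mb, target_mb, increment_mb):
    """Count fixed-increment autogrow operations needed to reach target_mb."""
    if increment_mb <= 0:
        raise ValueError("increment_mb must be positive")
    if target_mb <= initial_mb:
        return 0  # Already large enough; no growth needed.
    return math.ceil((target_mb - initial_mb) / increment_mb)


# Hypothetical figures: a small initial size with a small increment ...
print(autogrow_events(100, 10_000, 10))       # prints 990
# ... versus a larger initial size with a larger increment.
print(autogrow_events(2_048, 10_240, 1_024))  # prints 8
```

Under these assumptions, sizing the MDW generously up front replaces hundreds of growth operations with a handful.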
The next and final step in
the initial setup wizard is configuring the MDW security, which
involves mapping logins to one of three MDW database roles: mdw_admin,
mdw_reader, and mdw_writer. These roles enable control over the two main
uses of the MDW: loading data into it and reporting on the data within.
When a central MDW database is shared by multiple uploading
instances, each uploading instance's SQL Server Agent account must be
a member of the mdw_writer role, unless the account already
has sysadmin membership on the MDW instance. The mdw_reader role is used
for accounts requiring reporting access, and the mdw_admin role is a
superset of both the reader and writer roles.
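The relationship between the three roles can be modeled as a small lookup. The Python sketch below picks the least-privileged role that covers what an account needs to do; the "upload" and "report" labels and the function name are hypothetical simplifications, not actual SQL Server permissions, and the sysadmin shortcut simply generalizes the exemption described above.

```python
# Simplified model: each role mapped to the uses of the MDW it covers.
ROLE_CAPABILITIES = {
    "mdw_reader": {"report"},
    "mdw_writer": {"upload"},
    "mdw_admin": {"report", "upload"},
}


def minimal_role(needs, is_sysadmin=False):
    """Return the least-privileged MDW role covering needs, or None.

    None means no role mapping is required: either the account needs
    nothing, or it already has sysadmin membership on the MDW instance.
    """
    needs = set(needs)
    if is_sysadmin or not needs:
        return None
    candidates = [
        role for role, caps in ROLE_CAPABILITIES.items() if needs <= caps
    ]
    if not candidates:
        raise ValueError(f"no role covers {sorted(needs)}")
    # Prefer the role that grants the fewest capabilities.
    return min(candidates, key=lambda role: len(ROLE_CAPABILITIES[role]))


print(minimal_role({"upload"}))                    # prints mdw_writer
print(minimal_role({"upload", "report"}))          # prints mdw_admin
print(minimal_role({"upload"}, is_sysadmin=True))  # prints None
```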
Once the MDW database
has been created, the next step is to set up data collection. In a
central MDW installation, the data collection setup process is repeated
on each SQL Server instance, each of which would be configured to upload
to the recently created MDW database.
2.2. Data collection setup
Setting up data
collection involves accessing the same wizard we used to create the MDW
database. Right-click Data Collection and choose Configure Management
Data Warehouse; this time, however, select Set Up Data Collection from the
step shown earlier in figure 2.
The next step, as shown in figure 4,
allows you to select the MDW database you've just created, along with a
cache directory. One of the properties of each data collection set is
whether its data is cached before being uploaded to the MDW. As we'll
discuss shortly, caching collected data before upload reduces the cost
of the collection process, particularly for large and/or frequently
collected data sets.
If the cache directory is not specified, cached data collection sets will use the directory specified in the %TEMP% or %TMP%
environment variables. For more control over disk usage, specifying a custom
directory is recommended, ideally on a disk separate from the
data and transaction log files. The SQL Server Agent service account also
needs read/write permissions to the specified cache directory.
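The fallback behavior can be sketched as a short lookup. The Python function below is an illustration only: its name is hypothetical, the paths are made up, and checking TEMP before TMP is an assumption, since the text names both variables without stating an order of precedence.

```python
def resolve_cache_directory(configured, environ):
    """Return the directory that cached collection sets would use.

    configured: the cache directory entered in the wizard, or None/"".
    environ: mapping of environment variable names to values.
    """
    if configured:
        return configured
    # Assumed order: TEMP first, then TMP.
    for name in ("TEMP", "TMP"):
        value = environ.get(name)
        if value:
            return value
    return None  # Neither a custom directory nor a fallback is available.


# Hypothetical environment and paths.
env = {"TEMP": r"C:\Temp", "TMP": r"C:\Tmp"}
print(resolve_cache_directory(r"E:\DCCache", env))  # prints E:\DCCache
print(resolve_cache_directory(None, env))           # prints C:\Temp
```

Passing the environment in as a parameter keeps the sketch independent of the machine it runs on; in practice the values would come from the environment of the SQL Server Agent service account.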
Once you enter these
settings, the wizard completes and creates the necessary components to
begin collecting and uploading to the selected MDW. Without any further
action, the system data collection sets will remain configured with
default settings determining the upload frequency, cache method, and
data retention period.
To gain a
deeper understanding of the data collection platform, let's look at the
properties of the default data collection sets and how they can be
customized.