It is a good idea to install
BIDS Helper, an award-winning free community-developed tool that adds a
lot of useful functionality to BIDS. You can download it from http://www.codeplex.com/bidshelper.
Creating data sources
Once we've created a new
project and configured it appropriately, the next step is to create a
data source object. Even though you can create multiple data sources in a
project, you probably shouldn't.
You are then faced with the
choice of which data provider to use, since there are often several
different options for any given relational database. For SQL Server data
sources, you have the option of using the SQLClient .NET data provider,
the Microsoft OLE DB provider for SQL Server and the SQL Server Native
Client (often referred to as SNAC). You should always choose the SQL
Server Native Client since it offers the best performance. For Oracle
data sources, the choice is more complicated since, even though Oracle
is a supported data source for Analysis Services, there is a long list
of bugs and issues. Some are addressed in the white paper at http://tinyurl.com/asdatasources,
but if you do run into problems, the best approach is to try using
Microsoft's Oracle OLE DB Provider, Oracle's own OLE DB Provider, the
.NET Provider for Oracle or any of the third-party OLE DB Providers on
the market to see which one works. Access, DB2, Teradata and Sybase are
the other officially supported relational data sources, and if you need
to load data from another source, you can always use SQL Server
Integration Services to push data into the cube by using the Dimension
Processing and Partition Processing destinations in a Data Flow.
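For the SQL Server case described above, the connection string behind the data source might look roughly like the one assembled in this sketch. The provider identifier SQLNCLI10.1 is our assumption for the SQL Server 2008 Native Client, and the server and database names are hypothetical; check the string that BIDS generates in your own project before relying on it.

```python
def build_connection_string(server, database, provider="SQLNCLI10.1"):
    """Assemble an OLE DB connection string that uses Windows authentication.

    The default provider name is an assumption (SQL Server 2008 Native Client).
    """
    parts = [
        ("Provider", provider),
        ("Data Source", server),
        ("Initial Catalog", database),
        ("Integrated Security", "SSPI"),  # Windows authentication
    ]
    return ";".join(f"{key}={value}" for key, value in parts)


# Hypothetical server and database names, for illustration only.
print(build_connection_string("localhost", "AdventureWorksDW"))
```

Keeping the provider as a parameter makes it easy to try an alternative provider if the first one you choose gives you problems.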
Remember to install
the same version of any OLE DB provider you're using on all of your
development, test and production machines. Also, while BIDS is a 32-bit
application and needs a 32-bit version of the driver to connect to a
relational database, if your Analysis Services instance is 64-bit, it
will need the 64-bit version of the same driver to process cubes
successfully.
Analysis Services
must also be given permission to access the data source, and how it does
so depends on the type of data source you're using and how its security
is set up. If you're using Windows authentication to connect to SQL
Server, as Microsoft recommends, then you should set up a new
Windows domain account specifically for Analysis Services, and then use
the SQL Server Configuration Manager tool to set the Analysis Services
service to run under that account. You should then give that account any
permissions it needs in SQL Server on the tables and views you'll be
using. Most of the time 'Read' permissions will be sufficient. However,
some tasks, such as creating Writeback fact tables, will need more.
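As a minimal sketch of the read-only case, the following helper generates the T-SQL you might run to give the Analysis Services service account access to a set of views. The account name and view names are hypothetical, and the script assumes that SELECT permission is all the account needs.

```python
def grant_read_statements(account, objects):
    """Return T-SQL statements that create a login and a database user for
    a Windows account and grant it SELECT on each (schema, name) pair."""
    statements = [
        # Run once at the server level.
        f"CREATE LOGIN [{account}] FROM WINDOWS;",
        # Run in the data warehouse database.
        f"CREATE USER [{account}] FOR LOGIN [{account}];",
    ]
    for schema, name in objects:
        statements.append(
            f"GRANT SELECT ON OBJECT::[{schema}].[{name}] TO [{account}];"
        )
    return statements


# Hypothetical account and view names, for illustration only.
views = [("dbo", "vwFactSales"), ("dbo", "vwDimCustomer")]
for statement in grant_read_statements(r"MYDOMAIN\svc_ssas", views):
    print(statement)
```

Granting permissions view by view, rather than on the whole database, keeps the account's access limited to the objects the cube actually reads.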
You'll notice on the Impersonation Information
tab in the Data Source Designer dialog in BI Development Studio there
are some other options for use with Windows authentication, such as the
ability to enter the username and password of a specific user. However,
we recommend that you use the Use Service Account option so that Analysis Services tries to connect to the relational database under the account you've created.
If you need to connect to
your data source using a username and a password (for example, when
you're using SQL Server authentication or Oracle), then Analysis
Services will keep all sensitive information, such as passwords, in an
encrypted format on the server after deployment. If you try to script
the data source object out, you'll find that the password is not
returned. Since opening an Analysis Services project in online mode
essentially involves scripting out the entire database, you'll find
yourself re-entering the password in your data source every time you
want to reprocess anything when working this way. This is
another good reason to use project mode rather than online mode for
development and to use Windows authentication where possible.
Creating Data Source Views
In an ideal world, if you've
followed all of our recommendations so far, then you should need to do
very little work in your project's Data Source View—nothing more than
selecting the views representing the dimension and fact tables and
setting up any joins between the tables that weren't detected
automatically. Of course, in the real world, you have to compromise your
design sometimes and that's where a lot of the functionality available
in Data Source Views comes in useful.
When you first create a new Data Source View (DSV),
the easiest thing to do is to go through all of the steps of the
wizard, but not to select any tables yet. You can then set some useful
properties on the DSV, which will make the process of adding new tables
and relationships much easier. In order to find them, right-click on
some blank space in the diagram pane and click on Properties. They are:
RetrieveRelationships: by default, this is set to True,
which means that BIDS will add relationships between tables based on
various criteria. It will always look for foreign key relationships
between tables and add those. Depending on the value of the NameMatchingCriteria property, it may also use other criteria.
SchemaRestriction: this
property allows you to enter a comma-delimited list of schema names to
restrict the list of tables that appear in the Add/Remove Tables
dialog. This is very useful if your data warehouse contains a large
number of tables and you use schemas to separate them into logical
groups.
NameMatchingCriteria: if the RetrieveRelationships property is set to True,
then BIDS will try to guess relationships between tables by looking at
column names. There are three different ways it can do this:
1. by looking for identical column names in the source and destination tables (for example, FactTable.CustomerID to Customer.CustomerID)
2. by matching column names to table names (for example, FactTable.Customer to Customer.CustomerID)
3. by matching column names to a combination of column and table names (for example, FactTable.CustomerID to Customer.ID).
This
is extremely useful if the tables you're using don't actually contain
foreign key relationships. You'll also see an extra step in the New Data Source View wizard allowing you to set these options if no foreign keys are found in the Data Source you're using.
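The three name-matching rules listed above can be illustrated with a short sketch. This is only our reading of the rules, not the logic BIDS actually runs, and the criterion names used here are hypothetical labels rather than the real property values.

```python
def columns_match(source_column, dest_table, dest_key, criterion):
    """Return True if a source column would be related to the destination
    table's key column under the given (hypothetically named) criterion."""
    src = source_column.lower()
    table = dest_table.lower()
    key = dest_key.lower()
    if criterion == "same_name_as_key":
        # FactTable.CustomerID to Customer.CustomerID
        return src == key
    if criterion == "same_name_as_table":
        # FactTable.Customer to Customer.CustomerID
        return src == table
    if criterion == "table_name_plus_key":
        # FactTable.CustomerID to Customer.ID
        return src == table + key
    raise ValueError(f"unknown criterion: {criterion}")


print(columns_match("CustomerID", "Customer", "CustomerID", "same_name_as_key"))
print(columns_match("Customer", "Customer", "CustomerID", "same_name_as_table"))
print(columns_match("CustomerID", "Customer", "ID", "table_name_plus_key"))
```

Each call corresponds to one of the three examples in the list, and each reports a match. Only one criterion applies at a time, so a naming convention that mixes styles will leave some relationships for you to add by hand.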
Now, you can go ahead and right-click on the DSV design area and select the Add/Remove Tables
option and select any tables or views you need to use. It might be a
good idea not to select everything you need initially, but to select
just one fact table and a few dimension tables so you can check the
relationships and arrange the tables clearly, then add more. It's all
too easy to end up with a DSV that looks like a plate of spaghetti and
is completely unreadable. Even though you don't actually need to add
every single relationship at this stage in order to build a cube, we
recommend that you do so, as the effort will pay off later when BIDS
uses these relationships to automatically populate properties such as
dimension-to-measure group relationships.
Creating multiple
diagrams within the DSV, maybe one for every fact table, will also help
you organize your tables more effectively. The Arrange Tables
right-click menu option is also invaluable.
Named Queries and Named
Calculations allow you to add the equivalent of views and derived
columns to your DSV, and this functionality was added to help cube
developers who needed to manipulate data in the relational database, but
didn't have the appropriate permissions to do so. However, if you have
the choice between, say, altering a table and a SQL Server Integration Services (SSIS)
package to fix a modeling problem or creating a Named Query, then we
recommend that you always choose the former: only do work in the DSV
if you have no other choice. As we've already said several times, it
makes much more sense to keep all of your ETL work in your ETL tool, and
your relational modeling work in the relational database where it can
be shared, managed and tuned more effectively. Resist the temptation to
be lazy and don't just hack something in the DSV! One of the reasons why
we advocate the use of views on top of dimension and fact tables is
that they are as easy to alter as named queries and much easier to tune.
The SQL that Analysis Services generates during processing is
influenced heavily by what goes on in the DSV, and many processing
performance problems are the result of cube designers taking the easy
option early on in development.
If you make changes in your relational data source, those changes won't be reflected in your DSV until you click the Refresh Data Source View button or choose Refresh on the right-click menu.
Problems with TinyInt
Unfortunately, there's a bug in Analysis Services 2008 that causes a problem in the DSV when you use key columns of type TinyInt. Since Analysis Services doesn't support this type natively, the DSV attempts to convert it to something else: a System.Byte for foreign keys to dimensions on the fact table and a System.Int32
for primary keys on dimension tables which have Identity set to true.
Because the two types no longer match, you can no longer create joins
between your fact table and dimension table. To work around this, you
need to create a named query on top of your dimension table containing an expression that
explicitly casts your TinyInt column to a TinyInt (for example, using an expression like cast(mytinyintcol as tinyint)), which will make the DSV show the column as a System.Byte. It sounds crazy, but for some reason it works.
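As a rough sketch of the workaround, the helper below builds the SELECT statement you could paste into the named query. The table and column names are hypothetical, and the function simply wraps the chosen key column in the cast described above while passing the other columns through unchanged.

```python
def tinyint_named_query(table, columns, tinyint_column):
    """Build the SQL for a named query that explicitly casts one column
    to tinyint and selects the remaining columns as they are."""
    select_list = []
    for col in columns:
        if col == tinyint_column:
            # The explicit cast makes the DSV show the column as System.Byte.
            select_list.append(f"CAST([{col}] AS tinyint) AS [{col}]")
        else:
            select_list.append(f"[{col}]")
    return f"SELECT {', '.join(select_list)} FROM {table}"


# Hypothetical dimension table and columns, for illustration only.
print(tinyint_named_query("dbo.DimRegion", ["RegionKey", "RegionName"], "RegionKey"))
```

Keeping the original column name as the alias means existing references to the key column continue to work once the named query replaces the table in the DSV.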