Expert Cube Development with SSAS Multidimensional Models
上QQ阅读APP看书,第一时间看更新

Designing simple dimensions

Next, let's build some dimensions. As this is one of the more complicated steps in the cube design process, it's a topic we'll return to again in the future chapters when we need to deal with more advanced modeling scenarios. Right now, we'll concentrate on the fundamentals of dimension design.

Using the New Dimension wizard

Running the New Dimension wizard will give you the first draft of your dimension, something you'll then be able to tweak and tidy up in the Dimension Editor afterwards. The first question you'll be asked, in the Select Creation Method step is how you want to create the new dimension and there are effectively the following two choices:

  • Create the dimension from an existing table or view in your data source (the Use an existing table option)
  • Have SSDT create a dimension automatically for you and optionally fill it with data (the other three options)

The Select Creation Method step of the New Dimension wizard is shown in the following screenshot:

Using the New Dimension wizard

In keeping with our line of argument that all relational modeling work should be done outside SSDT, we recommend you to use the Use an existing table option. If you're trying to build a cube before you have prepared any dimension tables or indeed done any dimensional modeling, you're probably running ahead of yourself and you need to go back and think about your data modeling in more detail.

In the next step of the wizard, you have to choose the main table for your dimension—the table or view that contains the lowest level of granularity of data for your dimension, and which joins directly to a fact table in your DSV. You also need to define the key attribute for the dimension, the attribute that represents the lowest level of granularity in the dimension and to which all other attributes have a one-to-many relationship. For example, on a Product dimension, the key attribute could represent the Stock Keeping Unit (SKU), and on a Customer dimension, the key attribute could represent the customer itself. To define the key attribute, as with any attribute, you have to specify two extremely important properties:

  • Key: This is the column or collection of columns in your dimension table that uniquely identifies each individual member on the attribute, and for the key attribute, this is usually the dimension's surrogate key column. This will set the value of the KeyColumns property of the attribute.
  • Name: This is the column in your dimension table that contains the name or description that the end user expects to see on screen when they browse the dimension. This will set the NameColumn property of the attribute, as shown in the following screenshot. The name may or may not uniquely identify each member on the attribute. For example, on your Customer dimension there may be many members on the key attribute—many individual customers—with the name John Smith. Member uniqueness is determined by the value of the key property of the attribute, but the end user may well be happy seeing multiple members with the same name, so long as they can distinguish between them using other dimension attributes such as Address.
    Using the New Dimension wizard

Click on Next and if you are building your dimension from a set of snowflaked tables, you get to the Select Related Tables set, where you have the option of selecting other tables you want to use to build this dimension. After this, you move onto the Select Dimension Attributes step, where you have the option of creating other new attributes on the dimension by checking any columns that you'd like to turn into attributes. You can also rename the attribute, uncheck the Enable Browsing option (which controls the value of the AttributeHierarchyEnabled property, as shown in the following screenshot), and set the Attribute Type—a property that has a bewildering array of possible values, most of which are pointless, and which can be ignored unless you're designing a Time or an Account dimension.

Using the New Dimension wizard

Now that we are into the final step of the wizard, all there is to do is to confirm the name of the dimension and click on Finish.

Note

A note about naming

At various points in the New Dimension wizard, you have the chance to rename the dimension and attributes from whatever the wizard has guessed at as the default. It's a good idea to take naming seriously even at this early stage and discuss with your end users what the names of dimensions and attributes should be. Changing object names later in the development cycle can break any calculations or queries that you've already defined. Also, when you create an object for the first time, its Object ID property is set to its name, and even though you can subsequently change the name, the ID (which you'll see if you script the object out to XMLA, for example) can never change, which can be confusing if you ever need to work directly with XMLA code. Object names should also be as end user friendly as possible, as they're going to be the names that appear in the reports the end users want to build. The corollary of this is that object names are unlikely to be in a format that anyone with database design experience would choose. You might be happy with a dimension called DimProduct or an attribute called Usr_Addr_FrstLn, but when the CEO gets his sales report, he's not going to want to see these names. Think of designing a cube as designing a user interface for data access by non-technical business people.

Using the Dimension Editor

Once you've completed the wizard, you'll arrive in the Dimension Editor for your new dimension.

Adding new attributes

It's likely that you'll want to add some more attributes to your dimension, and to do so, you can simply drag columns from the tables displayed in the Data Source View pane on the right-hand side of the screen, into the Attributes pane on the left-hand side. Once again, there are a couple of important properties you'll want to set on each of your attributes once you've created them:

  • KeyColumns, NameColumn: They are the column or columns that represent the key and the name of this attribute. It's common for non-key attributes to be based on just one column that represents both the key and the name. For example, a Year attribute on a Time dimension might have values such as 2001 and 2002. In this situation, it's sufficient to set the KeyColumns property and not set the NameColumn property.

    Analysis Services will display the key of the member as its name. For attributes with a very large number of members, you should always try to use a column of the smallest possible numeric data type as the key to an attribute and set the name separately. Using strings as keys can have a performance overhead, as can using composite keys (that is, when you use more than one column in the KeyColumns collection).

  • AttributeHierarchyEnabled: This property controls whether a hierarchy is built for this attribute, that is, whether the user can actually see this attribute when they browse the cube and use it in their queries. By default, it is True. If you set it to False, the attribute still exists, but is only visible in the dimension as a property of another attribute. A good example of when you would do this would be if you were building a Customer dimension. You'd never want to be able to see a hierarchy containing phone numbers and e-mail addresses. However, if you were looking at a specific customer, it might be good to see this information as a property of that customer. Setting this property to False also has the benefit of reducing the amount of time needed to process the dimension.
  • AttributeHierarchyOptimizedState: For attributes that have their AttributeHierarchyEnabled property set to True, this property controls whether indexes are built for the resulting attribute hierarchy. Setting this property to False can improve dimension processing times, since building an index for an attribute can take some time. However, the penalty you pay in terms of query performance can be quite significant, so you should only consider setting this to False for attributes that are rarely used in queries and that are large enough to have an impact on dimension processing times.
  • OrderBy, OrderByAttribute: The order in which members appear in an attribute hierarchy can be important. For example, you would expect days of the week to appear in the order of Sunday, Monday, Tuesday, and so on rather than in alphabetical order. The OrderBy attribute allows you to order the members on an attribute either alphabetically by name, by the value of the key of the attribute, by the value of the name, or by the key of another attribute that has a relationship with the attribute being ordered (which attribute is used is controlled by the value of the OrderByAttribute property). It is only possible to sort in an ascending order. As a result, you may end up creating new numeric columns in your dimension table to use as the key of an attribute in order to ensure the type of sorting you want.
  • AttributeHierarchyOrdered: In some situations, where the order of members on a very large attribute hierarchy doesn't matter much, disabling sorting by setting the AttributeHierarchyOrdered property to False can save a significant amount of time during dimension processing. This is shown in the following blog entry: http://tinyurl.com/attributeordered.
  • IsAggregatable, DefaultMember: For attributes that have their AttributeHierarchyEnabled property set to True, the IsAggregatable property controls whether an attribute's hierarchy contains an All Member as the root of the hierarchy. In general, you should not set this property to False, even for something like a Year attribute on a Time dimension where you might think it makes no sense to display an aggregated value, because it can have all kinds of confusing consequences for the user. Without an All Member, another member on the hierarchy will be automatically selected every time a query is run, and the user may not know which member this is. You can control which member is selected automatically by setting the DefaultMember property, for example, making the default selection on the Year attribute the last year on the hierarchy. However, in our experience it's better to let the user make a selection themselves. The only time you should consider setting IsAggregatable to False is when DefaultMember is used and you have members in an attribute that should never be aggregated together. For instance, in a Budget Scenario attribute, you would get meaningless values if you show the value of actuals and several different budget scenarios aggregated together. Usually, this situation arises when you only have one attribute on the dimension with AttributeHierarchyEnabled set to True.

Note

The 4 GB string store limit

If your dimension contains attributes with several million members, you need to be aware of a limitation with the default format that Analysis Services uses to store string values. If a single attribute needs to store more than 4 GB of string data, processing will fail unless you have changed the StringStoreCompatibilityLevel property on the dimension from the default value of 1050 to the value 1100. More detail on this issue and the StringStoreCompatibilityLevel property can be found at http://tinyurl.com/4GBStringStore.

Even if you don't run into this error when you are in development, you should set this property if you think you have an attribute that may grow over time to exceed this limit. There is a detailed explanation of how to calculate the amount of string storage space needed for an attribute at http://tinyurl.com/StringStoreCalculator.

Configuring a Time dimension

Building a Time dimension is really just the same as building a regular dimension, although there are a few extra properties that need to be set. Going back to the Select Creation Method step of the Dimension wizard, you'll see that several of the available options are specifically for building Time dimensions, but we still recommend that you do not use them and construct a Time dimension yourself; they're only useful for proof-of-concepts. Getting SSDT to build and populate a Time dimension table in your data source can be useful as a quick way to get started on building your own dimension. However, it's very likely that you'll want to make changes to what the wizard creates, not just because it uses a DateTime column as a primary key rather than an integer. The option to generate a Time dimension on the server—so that there is no relational data source needed—suffers from the same problems, but is even more inflexible since there's no way to customize things such as member name formats beyond the options the wizard gives you. There are a lot of resources on the Web that can help you with building your own Time dimension table, such as the one shown at http://tinyurl.com/BuildTimeDim.

The important thing to do with a Time dimension is to tell Analysis Services that it is in fact a Time dimension, so you can access some of the special functionality that Analysis Services has in this area. You can do this by setting the dimension's Type property to Time and then setting attributes' Type property to the nearest approximation of what they represent, as shown in the following screenshot. So, for a Year attribute you would set Type to Years, a Month attribute to Months, and so on.

Configuring a Time dimension

The benefits of doing this are:

  • Certain MDX functions such as YTD() are time aware, and will not need you to specify certain parameters if you have a dimension marked as type Time
  • Semi-additive measures, that is, those with their AggegationFunction set to AverageOfChildren, FirstChild, LastChild, FirstNonEmpty ,or LastNonEmpty, will perform their special types of aggregation on the first Time dimension related to a measure group

Creating user hierarchies

User hierarchies, the multilevel hierarchies that you create in the central Hierarchies pane in the Dimension Editor, can best be described as being something like views on top of attribute hierarchies. They allow you to take two or more attribute hierarchies and combine them into a more complex drillpath. For example, you could take your Year, Month, and Date attributes and combine them into a single hierarchy with Year, Month, and Date levels, so that the user could then easily drill from a year down to see all the months in that year and from a month down to all the dates in that month.

When Analysis Services 2005 was first released and developers started coming to grips with the concept of attribute hierarchies and user hierarchies, one question that came up quite frequently was, "Should I expose attribute hierarchies or user hierarchies to my users?". As usual, the answer is that it depends. There are pros and cons to each and you'll probably end up using a mixture of both types of hierarchy. User hierarchies are more user-friendly because drilling down is just a matter of a double-click. On the other hand, rigidly defined drillpaths might be too limiting, and users might prefer the flexibility of being able to arrange attribute hierarchies whichever way they want. Certainly there are some things you can't do with user hierarchies, such as putting different levels of the same hierarchy on different axes in a query. For you as a developer, user hierarchies have some benefits. They make writing the MDX for certain types of calculation easier (members in a multilevel user hierarchy have meaningful "parents" and "children"), and "natural" user hierarchies, which we'll talk about next, are materialized on disk and so offer query performance benefits and also make the aggregation design process much easier.

If you do create a user hierarchy, you should definitely set the Visible property of its constituent attribute hierarchies to False. If you have too many hierarchies available for a user to choose from in a dimension, you run the risk of confusing the user, and having the same members appear more than once in attribute hierarchies and user hierarchies will make matters worse. Simplicity is a virtue when it comes to dimension design.

Configuring attribute relationships

Attribute relationships allow you to model one-to-many relationships between attributes. For example, you might have Year, Month, and Date attributes on your Time dimension, and you know that there are one-to-many relationships between Year and Month, and Month and Date, so you should build the attribute relationships between these three attributes to reflect this. Why? The short answer is performance. Setting attribute relationships optimally can make a very big difference to query performance, and the importance of this cannot be overstated—they drastically improve Analysis Services' ability to build efficient indexes and make use of aggregations. However, attribute relationships do not have anything to do with how the dimension gets displayed to the end user, so don't get them confused with user hierarchies.

It's very likely that there will be many different ways to model attribute relationships between the same attributes in a dimension, so the important thing is to find the best way of modeling the relationships from a performance point of view. When you first build a dimension in SSDT, you will get a set of attributes relationships that while correct, are not necessarily optimal. This default set of relationships can be described as looking a bit like a bush. Every non-key attribute has a relationship defined either with the key attribute, or with the attribute built on the primary key of the source table in a snowflaked dimension. You can visualize these relationships on the Attribute Relationships tab of the Dimension Editor, as shown in the following screenshot:

Configuring attribute relationships

If you have BIDS Helper installed, right-clicking on the dimension in the Solution Explorer window and selecting Visualize Attribute Lattice will display something like what's shown in the following screenshot. The built-in functionality, in our opinion, sacrifices clarity in favor of saving space, and the BIDS Helper visualization is easier to understand. BIDS Helper displays each attribute as a separate node in its relationship diagram, whereas on the Attribute Relationships tab, multiple attributes are sometimes displayed in the same node.

Configuring attribute relationships

You can see that in our example using Year, Quarter, Month, and Date attributes on a Time dimension, Analysis Services knows that a year is made up of many dates, a quarter is made up of many dates, and a month is made up of many dates, but not that years are made up of quarters or that quarters are made up of months. Changing the attribute relationships to reflect these facts gives you something resembling a long chain, and in general the more long chains you see in your attribute relationships the better. As Analysis Services understands transitory relationships, there's no need to define a relationship between Year and Date, for example, because it can see that there is an indirect relationship via Quarter and Month. In fact, defining such redundant relationships can be a bad thing from a performance point of view.

Here's what the optimized set of relationships looks like in the Dimension Editor:

Configuring attribute relationships

Here's what the same set of relationships looks like in BIDS Helper:

Configuring attribute relationships

You will also notice a change in how some user hierarchies are displayed once you've optimized your attribute relationships in this way. User hierarchies that have one-to-many relationships defined between the attributes that make up each level are called natural user hierarchies. User hierarchies that don't are called unnatural user hierarchies and display an amber warning triangle in their top left-hand corner. This warning triangle does not signify that the hierarchy is broken or incorrectly configured, just that you should check your attribute relationships to see whether they can be optimized. You may actually want to build an unnatural user hierarchy, and it may not be possible to set up the attribute relationships to make it into a natural user hierarchy, and so long as you are aware that unnatural user hierarchies may not perform as well as natural user hierarchies, you should not be put off by the warning.

Configuring attribute relationships

It is very important to understand your data before you set up any attribute relationships, because if you set them up incorrectly, then it could result in Analysis Services returning incorrect data. The Dimension Health Check functionality in BIDS Helper, available when you right-click on a dimension in the Solution Explorer, checks whether the attribute relationships you've defined actually reflect the data in your dimension table. Here's an example of a common mistake that is made: your Year attribute has members 2001, 2002, 2003, and 2004; your Quarter attribute has four members from Quarter 1 to Quarter 4. Is there a one-to-many relationship between Year and Quarter? The answer is in fact no, despite what common sense tells you, because, for example, a quarter called Quarter 1 appears in every Year. What you can and should do here is modify your data so that you can build an attribute relationship: a quarter called Quarter 1 2001 only exists in the year 2001. There are two ways to do this:

  • By modifying the data in your dimension table (the best option)
  • By using a composite key in the KeyColumns property of the Quarter attribute that is made up of both the keys for the quarter and the year

You can read a more detailed discussion of these techniques at the end of the following article: http://tinyurl.com/attributerelationships. You should not make changes like this at the expense of the users' requirements, but in most cases they will not mind minor modifications such as these and will appreciate the increased query performance.

The RelationshipType property of an attribute relationship indicates whether the relationship between any two members on two related attributes is ever likely to change or not. For example, you would hope that the date January 1st 2001 would always appear under the month January 2001 in your dimension. If it moved to the month March 2002, something would be very wrong with your source data. In this case, you should set the RelationshipType property of the Month-Date attribute relationship to Rigid. On the other hand, it may be true that the relationship between a Customer attribute and a City attribute may change over time if a customer changes his/her residence. In that case, the RelationshipType property should be set to Flexible. Setting RelationshipType to Rigid where possible increases the chances that aggregations will not need to be rebuilt when a dimension undergoes a ProcessUpdate, something that will be discussed in more detail in Chapter 10, Going in Production. Our advice is never to set the RelationshipType property to Rigid unless you are absolutely sure that no UPDATE or DELETE statement is ever run on the underlying dimension table. If a rigid relationship does change, dimension processing will fail.

Finally, attribute relationships also have a second function, and that is to act as member properties. If a Month has many dates, you can think of the relationship in reverse as a Date having a property which is its month. A better example might be a Customer having an Address, and if the Address attribute has its AttributeHierarchyEnabled property set to False then it may only be visible as a property of Customer. Every attribute relationship is visible by default as a member property in most client tools. If you want to hide the member property, you can set the relationship's Visible property to False.