Oracle® Database 2 Day + Data Warehousing Guide 11g Release 2 (11.2) Part Number E25555-03 |
|
|
PDF · Mobi · ePub |
Using Oracle Warehouse Builder (OWB), you can design a data warehouse that is either relational or dimensional.
Warehouse Builder explicitly separates dimensional design from physical implementation. You can choose either a relational implementation or a multidimensional implementation for the dimensional objects using a simple click operation. Therefore, you can implement the same dimensional object as a relational target warehouse or a multidimensional warehouse.
This chapter shows you how to design a dimensional model implemented as a relational target warehouse. You model a small data warehouse consisting of a cube and two dimensions. Although you can use Warehouse Builder to model complex snowflake schemas, for the purposes of this demonstration, you model a simple star schema consisting of a cube with foreign key references to the two dimensions.
This chapter contains the following topics:
This section provides a procedure for defining a relational target schema.
To define a relational target warehouse:
Designate a schema as the warehouse target schema as described in "Identifying the Warehouse Target Schema".
Define or import source and target objects into the warehouse target module.
In general, you can right-click any node in the warehouse target module and select either New or Import. Warehouse Builder starts the appropriate wizard to guide you. Click Help for additional information.
The types of objects you add to the warehouse target module depend on the type of your source data and the purpose of the data warehouse.
To continue with the exercises presented in this guide, see "Exercise: Adding External Tables to the Target Module" and "Exercise: Understanding Dimensions".
Configure the source and target objects.
Some objects require additional configuration. After you import or define an object in the warehouse module, right-click and select Configure to review the settings and make changes as necessary.
In a traditional data warehousing implementation, there is typically only one target schema, which is the data warehouse target.
To designate a schema as the data warehouse target schema:
Register the schema in Warehouse Builder.
In the Globals Navigator panel, expand the Security node. Right-click the Users node and select New User.
In the Select DB User to Register page of the Create User Wizard, select Create DB User and follow the prompts. Click Help or the F1 key if you need more information.
For the purposes of the demonstration, create a new schema and call it EXPENSE_WH
.
Specify the location information for the new schema.
In the Locations Navigator, right-click and select New Oracle Location from Locations under the Oracle node.
Create a location named EXPENSE_WH_LOCATION
. Select the option to test the connection.
In the Projects Navigator, associate a module with the schema location.
Recall that in "Example: Importing Metadata from Flat Files", you created a module to correspond to a location from which you import metadata. In a similar way, you must create a module to correspond to the location for the target schema.
In the OWB_DEMO
project, expand the Databases node, right-click the Oracle node, and select New Oracle Module. Follow the prompts in the Create Module Wizard. Ensure that you designate the module status as Warehouse Target.
For the purposes of the demonstration, name the module EXPENSE_WH.
Familiarize yourself with the new data warehouse target schema.
In the Projects Navigator, expand the node for the newly defined warehouse target module. Notice the various types of objects listed under the node. These are the types of objects that you can either define in or import into the module.
The types of objects you add to the target module have implications on the ETL logic you subsequently design. If your source data originates from flat files, you can choose to generate either SQL*Loader code or SQL code. Each type of code has its own advantages.
To utilize SQL*Loader in Warehouse Builder, import the flat files as described in "Example: Importing Metadata from Flat Files". To utilize SQL, however, you must define an external table in the warehouse module as described in "Exercise: Adding External Tables to the Target Module".
External tables are tables that represent data from flat files in a relational format. They are read-only tables that act like regular source tables in Warehouse Builder. Each external table you create corresponds to a single record type in an existing flat file.
The objective of this exercise is to create the necessary external tables for the two flat files that were previously imported. Because both files have a single record type, you must create only one external table for each file.
To add external tables to the target warehouse module:
In the Projects Navigator, expand the Databases node and then the Oracle node.
Expand the target module where you want to create the external table.
Expand the EXPENSE_WH
module.
Right-click the External Tables node and select New External Table.
Warehouse Builder displays the Create External Table wizard. Follow the prompts.
Name the external table EXPENSE_CATEGORIES.
When prompted to select a flat file, select EXPENSE_CATEGORIES_CSV.
Repeat the previous step to create an external table called EXPENSE_DATA
to represent EXPORT_CSV.
Configure the physical file system details for the two external tables.
Right-click an external table from the module and select Configure. On the DataFiles node, right-click and select Create. Accept the default name, NEW_DATAFILE_1.
Enter the name of the flat file from which the external table inherits data. Therefore, specify the data file name as expense_categories.csv
for one external table and export.csv
for the other.
A dimension is a structure that organizes data. Examples of commonly used dimensions are Customers, Time, and Products.
For relational dimensions, using dimensions improves query performance because users often analyze data by drilling down on known hierarchies. An example of a hierarchy is the Time hierarchy of year, quarter, month, day. Oracle Database uses these defined hierarchies by rewriting queries that retrieve data from materialized views rather than detail tables.
Typical relational dimension tables have the following characteristics:
A single-column primary key populated with values called warehouse keys.
Warehouse keys provide administrative control over the dimension, support techniques that preserve dimension history, and reduce the size of cubes.
One or more hierarchies that are explicitly defined as dimension objects. Hierarchies maximize the number of query rewrites performed by the Oracle server.
Dimensions are the primary organizational unit of data in a star schema. Examples of some commonly used dimensions are Customer, Product, and Time.
A dimension consists of a set of levels and a set of hierarchies defined over these levels. When you create a dimension, you define the following:
Dimension Attributes: A descriptive characteristic of a dimension member. It has a name and a data type.
Levels: Defines the level of aggregation of data. For example, the Products dimension can have the following levels: Total, Groups, and Product.
Level attributes: A descriptive characteristic of a level member. Each level in the dimension has a set of level attributes.
Hierarchies: A logical structure that uses ordered levels or a set of data values (for a value-based hierarchy) as a means of organizing data. A hierarchy describes parent-child relationships among a set of levels.
To understand the basic concepts and design of a dimension, you will examine a predefined dimension.
To become familiar with the dimensions:
Open the PRODUCTS
dimension in the Dimension Editor.
In the Projects Navigator, navigate to OWB_DEMO,
Databases, Oracle, SALES_WH,
and then expand Dimensions. Double-click PRODUCTS.
Warehouse Builder starts the Dimension Editor. The Dimension Editor is the single interface where you can design, create, and manage a variety of database or dimensional objects.
Observe the dimension attributes.
A dimension attribute is a descriptive characteristic of a dimension member. It has a name and a data type. A dimension attribute is applicable to one or more levels in the dimension. They are implemented as level attributes to store data.
The list of dimension attributes must include all the attributes that you may need for any of the levels in the dimension.
For example, the Products dimension has a dimension attribute called Description. This attribute is applicable to all the levels (Total, Groups, and Products) and stores the description for each of the members of these levels.
Observe the levels.
The levels in a dimension represent the level of aggregation of data. A dimension must contain at least one level, except when a dimension contains a value-based hierarchy. Every level must have level attributes and a level identifier.
For example, the dimension Products can have the following levels: Total, Groups, and Product.
Every level must have two identifiers: a surrogate identifier and a business identifier. When you create a dimension, each level must implement the dimension attributes marked as the surrogate identifier and business identifier (attributes, in the case of a composite business identifier) of the dimension.
A surrogate identifier uniquely identifies each level record across all the levels of the dimension. It must be composed of a single attribute. Surrogate identifiers enable you to hook facts to any dimension level as opposed to the lowest dimension level only.
For a dimension that has a relational implementation, the surrogate identifier must be of the data type NUMBER
. Because the value of the surrogate identifier must be unique across all dimension levels, you use the same sequence to generate the surrogate identifier of all the dimension levels.
For a relational implementation, the surrogate identifier serves the following purposes:
If a child level is stored in a different table from the parent level, each child level record stores the surrogate identifier of the parent record.
In a fact table, each cube record stores only the surrogate identifier of the dimension record to which it refers. By storing the surrogate identifier, the size of the fact table that implements the cube is reduced.
A business identifier consists of a user-selected list of attributes. The business identifier must be unique across the level and is always derived from the natural key of the data source. The business identifier uniquely identifies the member. For example, the business identifier of a Product level can be its Universal Product Code (UPC), which is a unique code for each product.
The business identifier does the following:
Identifies a record in business terms
Provides a logical link between the fact and the dimension or between two levels
Enables the lookup of a surrogate key
When you populate a child level in a dimension, you must specify the business identifier of its parent level. When you populate a cube, you must specify the business identifier of the dimension level to which the cube refers.
A parent identifier is used to annotate the parent reference in a value-based hierarchy.
For example, an EMPLOYEE
dimension with a value-based hierarchy, has the following dimension attributes: ID
, FIRST_NAME
, LAST_NAME
, EMAIL
, PHONE
, JOB_ID
, HIRE_DATE
, and MANAGER_ID
. In this dimension, ID
is the surrogate identifier and MANAGER_ID
is the parent identifier.
A level attribute is a descriptive characteristic of a level member. Each level in the dimension has a set of level attributes. To define level attributes, you select the dimension attributes that the level will implement. A level attribute has a distinct name and a data type. The data type is inherited from the dimension attribute that the level attribute implements. The name of the level attribute can be modified to be different from that of the dimension attribute that it implements.
Every level must implement the attribute marked as the surrogate identifier and the business identifier in the set of the dimension attributes.
A dimension hierarchy is a logical structure that uses ordered levels or a set of data values (for a value-based hierarchy) as a means of organizing data. A hierarchy describes parent-child relationships among a set of levels. A level-based hierarchy must have at least one level. A level can be part of more than one hierarchy.
For example, the Time dimension can have the following two hierarchies:
Fiscal Hierarchy: Fiscal Year > Fiscal Quarter > Fiscal Month > Fiscal Week > Day
Calendar Hierarchy: Calendar Year > Calendar Quarter > Calendar Month > Day
All hierarchies must be strict 1:n relationships. One record in a parent level corresponds to multiple records in a child level, but one record in a child level corresponds to only one parent record within a hierarchy.
A dimension role is an alias for a dimension. In a data warehouse, a cube can refer to the same dimension multiple times, without requiring the dimension to be stored multiple times. Multiple references to the same dimension may cause confusion. To avoid confusion, you create an alias for each reference to the dimension, thus allowing the joins to be instantly understandable. In such cases, the same dimension performs different dimension roles in the cube.
For example, a sales record can have the following three time values:
Time the order is booked
Time the order is shipped
Time the order is fulfilled
Instead of creating three time dimensions and populating them with data, you can use dimension roles. Model one time dimension and create the following three roles for the time dimension: order booked time, order shipped time, and order fulfillment time. The sales cube can refer to the order time, ship time, and fulfillment time dimensions.
When the dimension is stored in the database, only one dimension is created, and each dimension role references this dimension. When the dimension is stored in the OLAP catalog, Warehouse Builder creates a dimension for each dimension role. Thus, if a time dimension has three roles, three dimensions are created in the OLAP catalog. However, all three dimensions are mapped to the same underlying table. This is a workaround because the OLAP catalog does not support dimension roles.
Note:
Dimension roles can be created for dimensions that have a relational implementation only.A level relationship is an association between levels in a dimension hierarchy. Level relationships are implemented using level attributes that store the reference to the parent level in the hierarchy.
For example, the Products dimension has the following hierarchy: Total > Groups > Product. Warehouse Builder creates two level relationships: Product to Groups and Groups to Total. Two new attributes implement this level relationship: one in the Product level and one in the Groups level. These attributes store the surrogate ID of the parent level.
An example of a dimension is the Products dimension that you use to organize product data. Table 4-1 lists the levels in the Products dimension and the surrogate identifier and business identifier for each of the levels in the dimension.
Table 4-1 Products Dimension Level Details
Level | Attribute Name | Identifier |
---|---|---|
Total |
ID |
Surrogate |
Name |
Business |
|
Description |
||
Groups |
ID |
Surrogate |
Name |
Business |
|
Description |
||
Product |
ID |
Surrogate |
UPC |
Business |
|
Name |
||
Description |
||
Package Type |
||
Package Size |
The Products dimension contains the following hierarchy:
Hierarchy 1: Total > Groups > Product
Warehouse Builder creates control rows that enable you to link fact data to a dimension at any level. For example, you may want to reuse a Time dimension in two different cubes to record the budget data at the month level and the actual data at the day level. Because of the way dimensions are loaded with control rows, you can perform this action without any additional definitions. Each member in a dimension hierarchy is represented using a single record.
All control rows have negative dimension key values starting from -2. For each level value of higher levels, a row is generated that can act as a unique linking row to the fact table. All the lower levels in this linking or control rows are nulled out.
Consider the Products dimension described in "Dimension Example". You load data into this dimension from a table that contains four categories of products. Warehouse Builder inserts control rows in the dimension as shown in Table 4-2. These rows enable you to link to a cube at any dimension level. Note that the table does not contain all the dimension attribute values.
Table 4-2 Control Rows Created for the Products Dimension
Dimension Key | Total Name | Groups Name | Product Name |
---|---|---|---|
-3 |
TOTAL |
||
-9 |
TOTAL |
Hardware |
|
-10 |
TOTAL |
Software |
|
-11 |
TOTAL |
Electronics |
|
-12 |
TOTAL |
Peripherals |
To obtain the real number of rows in a dimension, count the number of rows by including a WHERE
clause that excludes the NULL
rows. For example, to obtain a count on Products, count the number of rows including a WHERE
clause to exclude NULL
rows in Product.
Implementing a dimension consists of specifying how the dimension and its data are physically stored. Warehouse Builder enables several types of implementations for dimensional objects, including multi-dimensional implementations. However, this guide describes a relational implementation only.
In a star schema implementation, Warehouse Builder stores the dimension data in a single table. Because the same table or view stores data for more than one dimension level, you must specify a dimension key column in the table. The dimension key column is the primary key for the dimension. This column also forms the foreign key reference to the cube.
Each level implements a subset of dimension attributes. By default, the level attribute name is the same as the dimension attribute name. To avoid name conflicts caused by all level data being stored in the same table, Warehouse Builder uses the following guidelines for naming in a star table:
If the level attribute name is not unique, Warehouse Builder prefixes it with the name of the level.
If the level attribute name is unique, Warehouse Builder does not use any prefix.
Note:
To ensure that no prefixes are used, you must explicitly change the level attribute name in the Create Dimension wizard or the Data Object Editor.For example, if you implement the Products dimension using a star schema, Warehouse Builder uses a single table to implement all the levels in the dimension.
Figure 4-1 Star Schema Implementation of Products Dimension
When you perform binding, you specify the database columns that will store the data of each attribute and level relationship in the dimension. You can perform either auto binding or manual binding for a dimension.
Auto Binding When you perform auto binding, Warehouse Builder binds the dimension object attributes to the database columns that store their data. When you perform auto binding for the first time, Warehouse Builder also creates the tables that are used to store the dimension data.
When you perform auto binding on a dimension that is already bound, Warehouse Builder uses the following rules:
If the implementation method of the dimension remains the same, Warehouse Builder rebinds the dimensional object to the existing implementation objects.
For example, you create a Products dimension using the star schema implementation method and perform auto binding. The dimension data is stored in a table called PRODUCTS
. You modify the dimension definition at a later date but retain the implementation method as star schema. When you now auto bind the Products dimension, Warehouse Builder rebinds the Products dimension attributes to the same implementation tables.
If the implementation method of a dimension is changed, Warehouse Builder deletes the old implementation objects and creates a new set of implementation tables. If you want to retain the old implementation objects, you must first unbind the dimensional object and then perform auto binding. For more information about implementation methods, see "Star Schema".
For example, you create a Products dimension using the star schema implementation method and bind it to the implementation table. You now edit this dimension and change its implementation method to snowflake schema. When you now perform auto binding for the modified Products dimension, Warehouse Builder deletes the table that stores the dimension data, creates new implementation tables, and binds the dimension attributes and relationships to the new implementation tables.
In the Projects Navigator, select the dimension.
From the File menu, choose Bind.
If the Bind option is not enabled, then check if the dimension is a relational dimension and that the Manual options is not set in the Implementation section of the Storage tab.
Auto binding uses the implementation settings described in "Star Schema".
Manual Binding You would typically use manual binding to bind existing tables to a dimension. Use manual binding if no auto binding or rebinding is required.
To perform manual binding for a dimension:
Create the implementation objects (tables or views) that you will use to store the dimension data.
In the case of relational dimensions, create the sequence used to load the surrogate identifier of the dimension. You can also choose to use an existing sequence.
In the Projects Navigator, right-click the dimension and select Open.
The Dimension Editor is opened.
Go to the Physical Bindings tab.
From the Component Palette, drag and drop the operator that represents the implementation object onto the canvas.
Warehouse Builder displays the Add a New or Existing Object dialog box. For example, if the dimension data is stored in a table, drag a Table operator from the Component Palette and drop it onto the canvas. The Add a New or Existing Table dialog box is displayed.
Choose the Select an existing Object option and then select the data object from the list of objects displayed in the selection tree.
Click OK.
A node representing the object that you just added is displayed on the canvas.
If more than one data object is used to store the dimension data, perform steps 4 to 6 for each data object.
Map the attributes in each level of the dimension to the columns that store their data. Hold down your mouse on the dimension attribute, drag, and then drop on the column that stores the attribute value.
Also map the level relationships to the database column that store their data.
For example, for the Products dimension described in "Dimension Example", the attribute Name
in the Groups
level of the Products
dimension is stored in the Group_name
attribute of the Products_tab
table. Hold down the mouse on the Name
attribute, drag, and drop on the Group_name
attribute of the Products_tab
table.
Cubes contain measures and link to one or more dimensions. The axes of a cube contain dimension members and the body of the cube contains measure values. Most measures are additive. For example, sales data can be organized into a cube whose edges contain values for Time, Products, and Customers dimensions and whose body contains values from the measures Value sales, and Dollar sales.
A cube is linked to dimension tables over foreign key constraints. Because data integrity is vital, these constraints are critical in a data warehousing environment. The constraints enforce referential integrity during the daily operations of the data warehouse.
Data analysis applications typically aggregate data across many dimensions. This enables them to look for anomalies or unusual patterns in the data. Using cubes is the most efficient way of performing these type of operations. In a relational implementation, when you design dimensions with warehouse keys, the cube row length is usually reduced. This is because warehouse keys are shorter than their natural counterparts. This results is a smaller amount of storage space needed for the cube data.
A typical cube contains:
A primary key defined on a set of foreign key reference columns or, in the case of a data list, on an artificial key or a set of warehouse key columns. When the cube is a data list, the foreign key reference columns do not uniquely identify each row in the cube.
A set of foreign key reference columns that link the table with its dimensions.
A cube consists of the set of measures defined over a set of dimensions. To create a cube, you must define the following:
A measure is data, usually numeric and additive, that can be examined and analyzed. Examples of measures include sales, cost, and profit. A cube must have one or more measures. You can also perform aggregation of measures. Only numeric measures can be aggregated.
A cube is defined by a set of dimensions. A cube can refer to a level that is not the lowest level in a dimension.
For cubes that use a pure relational implementation, you can reuse the same dimension multiple times with the help of dimension roles. For more information about dimension roles, see "Dimension Roles".
Before you validate a cube, ensure that all the dimensions that the cube references are valid.
To define a dimension reference, specify the following:
The dimension and the level within the dimension to which the cube refers.
For a cube that uses a relational implementation, you can refer to intermediate levels in a dimension. Warehouse Builder supports a reference to the non surrogate identifier of a level, for example, the business keys.
For dimensions that use a relational implementation, a dimension role for each dimension to indicate what role the dimension reference is performing in the cube. Specifying the dimension role is optional.
The Sales cube stores aggregated sales data. It contains the following two measures: Value_sales
and Dollar_sales
.
Value_sales
stores the amount of the sale in terms of the quantity sold.
Dollar_sales
stores the amount of the sale.
Table 4-3 describes the dimensionality of the Sales cube. It lists the name of the dimension and the dimension level that the cube references.
When you implement a cube, you specify the physical storage details for the cube. As with dimensions, Warehouse Builder enables you to implement cubes in relational or multidimensional forms. The relational implementation is described in this guide.
The database object used to store the cube data is called a fact table. A cube must be implemented using only one fact table. The fact table contains columns for the cube measures and dimension references.
Select a table or materialized view that will store the cube data.
For each measure, select a column that will store the measure data.
For each dimension reference, select a column that will store the dimension reference.
Each dimension reference corresponds to a column on the fact table and optionally a foreign key from the fact table to the dimension table. The 1:n relationships from the fact tables to the dimension tables must be enforced.
Figure 4-2 Implementation of the Sales Cube
When you perform binding, you specify the database columns that will store the data of each measure and dimension reference of the cube. You can perform auto binding or manual binding for a cube.
Auto Binding When you perform auto binding, Warehouse Builder creates the table that stores the cube data and then binds the cube measures and references to the database columns. For detailed steps on performing auto binding, see "Auto Binding".
When you perform auto binding for a cube, ensure that you auto bind the dimensions that a cube references before you auto bind the cube. You will not be able to deploy the cube if any dimension that the cube references was auto bound after the cube was last auto bound.
For example, you create the SALES
cube that references the TIMES
and PRODUCTS
dimensions and perform auto binding for the cube. You later modify the definition of the PRODUCTS
dimension. If you now attempt to auto bind the SALES cube again, Warehouse Builder generates an error. You must first auto bind the PRODUCTS dimensions and then auto bind the cube.
Manual Binding In manual binding, you must first create the table or view that stores the cube data and then map the cube references and measures to the database columns that store their data. Alternatively, you can use an existing database table or view to store the cube data.
To perform manual binding for a cube:
Create the table or view that stores the cube data.
In the Projects Navigator, right-click the cube and select Open.
The Cube Editor is opened.
Go to the Physical Bindings tab.
From the Component Palette, drag and drop the operator that represents the implementation object onto the canvas.
Warehouse Builder displays the Add a New or Existing Object dialog box. For example, if the cube data is stored in a table, drag a Table operator from the Component Palette and drop it onto the canvas. The Add a New or Existing Table dialog box is displayed.
Choose Select an existing object and then select the data object from the list of objects displayed in the selection tree.
Click OK.
A node representing the object that you just added is displayed on the canvas.
Map the measures and dimension references of the cube to the columns that store the cube data. Hold down your mouse on the measure or dimension reference, drag, and then drop on the data object attribute that stores the measure or dimension reference.