This chapter describes how to initially configure your data warehouse environment. It contains the following topics:
The procedures in this section describe how to configure Oracle Database for use as a data warehouse. Subsequently, you configure Oracle Warehouse Builder (OWB), which leverages Oracle Database and provides graphical user interfaces to design data management strategies.
To set up a data warehouse system:
Size and configure your hardware as described in "Preparing the Environment".
Install the Oracle Database software. See the installation instructions in Oracle Database 2 Day DBA or the installation guide for your platform, such as Oracle Database Installation Guide for Linux.
Optimize the Database for use as a data warehouse as described in "Setting Up a Database for a Data Warehouse".
Access the Oracle Warehouse Builder software.
Follow the instructions in "Accessing Oracle Warehouse Builder". Subsequently, you can install a demonstration to help you learn how to complete common data warehousing tasks using Warehouse Builder.
The basic components for a data warehousing architecture are similar to an online transaction processing (OLTP) system. However, because of the size and volume of data, the hardware configuration and data throughput requirements for a data warehouse are unique. The starting point for sizing a data warehouse is the throughput that you require from the system. When sizing, use one or more of the following criteria:
The amount of data accessed by queries during peak time and the acceptable response time.
The amount of data that is loaded within a window of time.
In general, you must estimate the highest throughput required at any given point.
Hardware vendors can recommend balanced configurations for a data warehousing application and can help you with the sizing. Contact your preferred hardware vendor for more details.
A properly sized and balanced hardware configuration is required to maximize data warehouse performance. The following sections describe important considerations in achieving this balance:
Central processing units (CPUs) provide the calculation capabilities in a data warehouse. You must have sufficient CPU power to perform the data warehouse operations. Parallel operations are more CPU-intensive than the equivalent number of serial operations.
Use the estimated highest throughput as a guideline for the number of CPUs required. As a rough estimate, use the following formula:
<number of CPUs> = <maximum throughput in MB/s> / 200
When you use this formula, you assume that a CPU can sustain up to about 200 MB per second. For example, if you require a maximum throughput of 1200 MB per second, then the system needs <number of CPUs> = 1200/200 = 6 CPUs. A configuration with one server with 6 CPUs can service this system. A 2-node clustered system could be configured with 3 CPUs in each node.
Memory in a data warehouse is particularly important for processing memory-intensive operations such as large sorts. Access to the data cache is less important in a data warehouse because most of the queries access vast amounts of data. Data warehouses do not have the same memory requirements as mission-critical OLTP applications.
The number of CPUs is a good guideline for the amount of memory you need. Use the following simplified formula to derive the amount of memory you need from the CPUs that you select:
<amount of memory in GB> = 2 * <number of CPUs>
For example, a system with 6 CPUs needs 2 * 6 = 12 GB
of memory. Most standard servers fulfill this requirement.
A common mistake in data warehouse environments is to size the storage based on the maximum capacity needed. Sizing that is based exclusively on storage requirements will likely create a throughput bottleneck.
Use the maximum throughput you require to determine how many disk arrays you need. Use the storage provider's specifications to find out how much throughput a disk array can sustain. Note that storage providers measure throughput in Gbit per second, while your initial throughput estimate is in MB per second. An average disk controller has a maximum throughput of 2 Gbit per second, which equals a sustainable throughput of about (70% * 2 Gbit/s) / 8 = 180 MB/s.
Use the following formula to determine the number of disk arrays you need:
<number of disk controllers> = <throughput in MB/s> / <individual controller throughput in MB/s>
For example, a system with 1200 MB per second throughput requires at least 1200 / 180 = 7 disk arrays.
Ensure you have enough physical disks to sustain the throughput you require. Ask your disk vendor for the throughput numbers of the disks.
The end-to-end I/O system consists of more components than just the CPUs and disks. A well-balanced I/O system must provide approximately the same bandwidth across all components in the I/O system. These components include:
Host bus adapters (HBAs), the connectors between the server and the storage.
Switches, in between the servers and a storage area network (SAN) or network attached storage (NAS).
Ethernet adapters for network connectivity (GigE NIC or Infiniband). In an Oracle Real Application Clusters (Oracle RAC) environment, you need an additional private port for the interconnect between the nodes that you should not include when sizing the system for I/O throughput. The interconnect must be sized separately, taking into account factors such as internode parallel execution.
Wires that connect the individual components.
Each of the components must provide sufficient I/O bandwidth to ensure a well-balanced I/O system. The initial throughput you estimated and the hardware specifications from the vendors are the basis to determine the quantities of the individual components you need. Use the conversion in Table 2-1 to convert the vendors' maximum throughput numbers in bits into sustainable throughput in bytes.
Table 2-1 Throughput Performance Conversion
Component | Bits | Bytes Per Second
---|---|---
HBA | 2 Gbit | 200 MB
16 Port Switch | 8 * 2 Gbit | 1200 MB
Fibre Channel | 2 Gbit | 200 MB
GigE NIC | 1 Gbit | 80 MB
Inf-2 Gbit | 2 Gbit | 160 MB
In addition to having sufficient components to ensure enough I/O bandwidth, the layout of data on the disks is key to success or failure. Even if you configured the system for sufficient throughput across all disk arrays, a query whose data resides on a single disk is limited to the throughput of that one disk, which becomes the bottleneck. To avoid such a situation, stripe data across as many disks as possible, ideally all disks. A stripe size of 256 KB to 1 MB provides a good balance between multiblock read operations and data spread across multiple disks.
Before you install Oracle Database, verify your setup on the hardware and operating-system level. The key point to understand is that if the operating system cannot deliver the performance and throughput you need, Oracle Database will not perform according to your requirements. Two tools for verifying throughput are the dd
utility and Orion, an Oracle-supplied tool.
A very basic way to validate the operating system throughput on UNIX or Linux systems is to use the dd
utility. The dd
utility is a common Unix program whose primary purpose is the low-level copying and conversion of raw data. Because there is almost no overhead involved with the dd utility, the output provides a reliable calibration. Oracle Database can reach a maximum throughput of approximately 90 percent of what the dd
utility can achieve.
The most important options for using dd are:

bs=BYTES: read BYTES bytes at a time; use 1 MB
count=BLOCKS: copy only BLOCKS input blocks
if=FILE: read from FILE; set to your device
of=FILE: write to FILE; set to /dev/null to evaluate read performance (writing to a disk device instead would erase all existing data)
skip=BLOCKS: skip BLOCKS BYTES-sized blocks at the start of input
To estimate the maximum throughput Oracle Database will be able to achieve, you can mimic a workload of a typical data warehouse application, which consists of large, random sequential disk access.
The following dd
command performs random sequential disk access across two devices reading a total of 2 GB. The throughput is 2 GB divided by the time it takes to finish the following command:
dd bs=1048576 count=200 if=/raw/data_1 of=/dev/null &
dd bs=1048576 count=200 skip=200 if=/raw/data_1 of=/dev/null &
dd bs=1048576 count=200 skip=400 if=/raw/data_1 of=/dev/null &
dd bs=1048576 count=200 skip=600 if=/raw/data_1 of=/dev/null &
dd bs=1048576 count=200 skip=800 if=/raw/data_1 of=/dev/null &
dd bs=1048576 count=200 if=/raw/data_2 of=/dev/null &
dd bs=1048576 count=200 skip=200 if=/raw/data_2 of=/dev/null &
dd bs=1048576 count=200 skip=400 if=/raw/data_2 of=/dev/null &
dd bs=1048576 count=200 skip=600 if=/raw/data_2 of=/dev/null &
dd bs=1048576 count=200 skip=800 if=/raw/data_2 of=/dev/null &
In your test, include all the storage devices that you plan to include for your database storage. When you configure a clustered environment, you run dd
commands from every node.
Orion is a tool that Oracle provides to mimic a typical workload on a database system to calibrate the throughput. Compared to the dd
utility, Orion provides the following advantages:
Orion's simulation is closer to the workload the database will produce.
Orion enables you to perform reliable write and read simulations within one simulation.
Oracle recommends you use Orion to verify the maximum achievable throughput, even if a database has already been installed.
The types of supported I/O workloads are as follows:
Small and random
Large and sequential
Large and random
Mixed workloads
For each type of workload, Orion can run tests at different levels of I/O load to measure performance metrics such as MB per second, I/O per second, and I/O latency. A data warehouse workload is typically characterized by sequential I/O throughput, issued by multiple processes. You can run different I/O simulations depending upon which type of system you plan to build. Examples are the following:
Daily workloads when users or applications query the system
The data load when users may or may not access the system
Index and materialized view builds
Backup operations
After you set up your environment and install Oracle Database software, ensure that you have the database parameters set correctly. Note that there are not many database parameters that must be set.
As a general guideline, avoid changing a database parameter unless you have good reason to do so. You can use Oracle Enterprise Manager to set up your data warehouse. To view various parameter settings, go to the Database page, then click Server. Under Database Configuration, click Memory Parameters or All Initialization Parameters.
Oracle Database memory has the following components:
Shared memory: Also called the system global area (SGA), this is the memory used by the Oracle instance.
Session-based memory: Also called program global area (PGA), this is the memory that is occupied by sessions in the database. It is used to perform database operations, such as sorts and aggregations.
Oracle Database can automatically tune the distribution of the memory components in two memory areas. You have a choice between two mutually exclusive options:
Set MEMORY_TARGET
and MEMORY_MAX_TARGET
Set SGA_TARGET
and PGA_AGGREGATE_TARGET
If you choose the first option, then you need not set other parameters. The database manages all memory for you. If you choose the second option, then you must specify a size for the SGA and a size for the PGA. The database does the rest.
The PGA_AGGREGATE_TARGET
parameter is the target amount of memory that you want the total PGA across all sessions to use. As a starting point, you can use the following formula to define the PGA_AGGREGATE_TARGET
value:
PGA_AGGREGATE_TARGET
= 3 * SGA_TARGET
.
If you do not have enough physical memory for the PGA_AGGREGATE_TARGET
to fit in memory, then reduce PGA_AGGREGATE_TARGET
.
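For example, on the 12 GB system sized earlier, one illustrative starting point that follows this guideline is to give roughly a quarter of memory to the SGA and three quarters to the PGA. This is only a sketch with assumed sizes; adjust them to your own memory budget, and note that SGA_TARGET cannot exceed SGA_MAX_SIZE:

ALTER SYSTEM SET SGA_TARGET = 3G SCOPE=BOTH;
ALTER SYSTEM SET PGA_AGGREGATE_TARGET = 9G SCOPE=BOTH;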
MEMORY_TARGET
and MEMORY_MAX_TARGET
The MEMORY_TARGET
parameter enables you to set a target memory size and the related initialization parameter, MEMORY_MAX_TARGET
, sets a maximum target memory size. The database then tunes to the target memory size, redistributing memory as needed between the system global area (SGA) and aggregate program global area (PGA). Because the target memory initialization parameter is dynamic, you can change the target memory size at any time without restarting the database. The maximum memory size acts as an upper limit so that you cannot accidentally set the target memory size too high. Because certain SGA components either cannot easily shrink or must remain at a minimum size, the database also prevents you from setting the target memory size too low.
You can set an initialization parameter by issuing an ALTER
SYSTEM
statement, as follows:
ALTER SYSTEM SET SGA_TARGET = 1024M;
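If you choose the MEMORY_TARGET approach instead, a minimal sketch looks as follows. The sizes are illustrative assumptions, and changing MEMORY_MAX_TARGET requires an instance restart:

ALTER SYSTEM SET MEMORY_MAX_TARGET = 16G SCOPE=SPFILE;
ALTER SYSTEM SET MEMORY_TARGET = 12G SCOPE=SPFILE;
-- After the restart, MEMORY_TARGET can be resized dynamically up to MEMORY_MAX_TARGET:
ALTER SYSTEM SET MEMORY_TARGET = 14G;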
A good starting point for a data warehouse is the data warehouse template database that you can select when you run the Database Configuration Assistant (DBCA). However, any database will be acceptable as long as you ensure you take the following initialization parameters into account:
The COMPATIBLE
parameter identifies the level of compatibility that the database has with earlier releases. To benefit from the latest features, set the COMPATIBLE
parameter to your database release number.
To benefit from advanced cost-based optimizer features such as query rewrite, ensure that the OPTIMIZER_FEATURES_ENABLE
parameter is set to the value of the current database version.
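For example, on an Oracle Database 11g Release 2 system you might set these two parameters as shown below. The release values are illustrative, so substitute your actual release number; changing COMPATIBLE takes effect only after a restart:

ALTER SYSTEM SET COMPATIBLE = '11.2.0' SCOPE=SPFILE;
ALTER SYSTEM SET OPTIMIZER_FEATURES_ENABLE = '11.2.0.2';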
The default value of the DB_BLOCK_SIZE parameter is 8 KB, which is appropriate for most data warehousing needs. If you intend to use table compression, then consider a larger block size.
The DB_FILE_MULTIBLOCK_READ_COUNT
parameter enables reading several database blocks in a single operating-system read call. Because a typical workload on a data warehouse consists of many sequential I/Os, ensure you can take advantage of fewer large I/Os as opposed to many small I/Os. When setting this parameter, take into account the block size and the maximum I/O size of the operating system, and use the following formula:
DB_FILE_MULTIBLOCK_READ_COUNT * DB_BLOCK_SIZE = <maximum operating system I/O size>
Maximum operating-system I/O sizes vary between 64 KB and 1 MB.
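For example, with the default 8 KB block size and a 1 MB maximum operating-system I/O size, the formula gives 1 MB / 8 KB = 128. A minimal sketch, assuming those values apply to your platform:

ALTER SYSTEM SET DB_FILE_MULTIBLOCK_READ_COUNT = 128;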
The PARALLEL_MAX_SERVERS
parameter sets a resource limit on the maximum number of processes available for parallel execution. Parallel operations need at most twice the number of query server processes as the maximum degree of parallelism (DOP) attributed to any table in the operation.
Oracle Database sets the PARALLEL_MAX_SERVERS
parameter to a default value that is sufficient for most systems. The default value for the PARALLEL_MAX_SERVERS
parameter is as follows:
(CPU_COUNT x PARALLEL_THREADS_PER_CPU x (2 if PGA_AGGREGATE_TARGET > 0; otherwise 1) x 5)
This value might not be enough for parallel queries on tables with higher DOP attributes. Oracle recommends that users who expect to run queries of higher DOP set PARALLEL_MAX_SERVERS
as follows:
2 x DOP x <number_of_concurrent_users>
For example, setting the PARALLEL_MAX_SERVERS
parameter to 64 will allow you to run four parallel queries simultaneously, assuming that each query is using two slave sets with a DOP of eight for each set.
If the hardware system is neither CPU-bound nor I/O bound, then you can increase the number of concurrent parallel execution users on the system by adding more query server processes. When the system becomes CPU-bound or I/O-bound, however, adding more concurrent users becomes detrimental to the overall performance. Careful setting of the PARALLEL_MAX_SERVERS
parameter is an effective method of restricting the number of concurrent parallel operations.
The PARALLEL_ADAPTIVE_MULTI_USER
parameter, which can be TRUE
or FALSE
, defines whether the server uses an algorithm to dynamically determine the degree of parallelism for a particular statement, depending on the current workload. To use this feature, set PARALLEL_ADAPTIVE_MULTI_USER
to TRUE
.
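Continuing the earlier example of four concurrent queries at a DOP of eight, the following statements show one possible configuration of these two parameters; the values are illustrative only:

ALTER SYSTEM SET PARALLEL_MAX_SERVERS = 64;
ALTER SYSTEM SET PARALLEL_ADAPTIVE_MULTI_USER = TRUE;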
To take advantage of query rewrite against materialized views, you must set the QUERY_REWRITE_ENABLED
parameter to TRUE
. This parameter defaults to TRUE
.
The default for the QUERY_REWRITE_INTEGRITY
parameter is ENFORCED
. In this mode, the database rewrites queries using only up-to-date materialized views, and only when it can rely on enabled and validated primary key, unique key, and foreign key constraints.
In TRUSTED
mode, the optimizer trusts that the data in the materialized views is current, and the hierarchical relationships declared in dimensions and RELY
constraints are correct.
To take advantage of highly optimized star transformations, set the STAR_TRANSFORMATION_ENABLED
parameter to TRUE
.
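The following statements show how these three parameters can be set explicitly. QUERY_REWRITE_ENABLED and QUERY_REWRITE_INTEGRITY are shown at their default values; switch to TRUSTED only if you accept the assumptions described above:

ALTER SYSTEM SET QUERY_REWRITE_ENABLED = TRUE;
ALTER SYSTEM SET QUERY_REWRITE_INTEGRITY = ENFORCED;
ALTER SYSTEM SET STAR_TRANSFORMATION_ENABLED = TRUE;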
Oracle Warehouse Builder (OWB) enables you to design and deploy various types of data management strategies, including traditional data warehouses.
To enable OWB:
Ensure that you have access to either Oracle Database Enterprise Edition or Standard Edition.
Oracle Database 11g comes with Warehouse Builder server components preinstalled. This includes a schema for the Warehouse Builder repository.
To use the default Warehouse Builder schema installed in Oracle Database, first unlock the schema as follows:
Connect to SQL*Plus as the SYS
or SYSDBA
user and enter the following commands:
SQL> ALTER USER OWBSYS ACCOUNT UNLOCK;
SQL> ALTER USER OWBSYS IDENTIFIED BY owbsys_passwd;
Start the Warehouse Builder Design Center.
For Windows, select Start, Programs, Oracle, Warehouse Builder, and then Design Center.
For UNIX and Linux, locate owb_home/owb/bin/unix and then run owbclient.sh
Define a workspace and assign a user to the workspace.
In the single Warehouse Builder repository, you can define multiple workspaces with each workspace corresponding to a set of users working on related projects. For instance, you could create a workspace for each of the following environments: development, test, and production.
For simplicity, create one workspace called MY_WORKSPACE and assign a user.
In the Design Center dialog box, click Show Details and then Workspace Management.
The Repository Assistant appears.
Follow the prompts and accept the default settings in the Repository Assistant to create a workspace and assign a user as the workspace owner.
Log in to the Design Center with the user name and password you created.
In subsequent topics, this guide uses exercises from Oracle By Example (OBE) series for Oracle Warehouse Builder to show how to consolidate data from multiple flat file sources, transform the data, and load it into a new relational target.
The exercises and examples are available on Oracle Technology Network (OTN) at http://www.oracle.com/webfolder/technetwork/tutorials/obe/db/10g/r2/owb/owb10gr2_obe_series/owb10g.htm
. To help you learn the product, the demonstration provides you with flat file data and scripts that create various Warehouse Builder objects. The OBE pages provide additional information about OWB and the latest information about the exercises.
To perform the Warehouse Builder exercises presented in this guide:
Download the demonstration.
Go to the location for OWB examples on OTN, listed at the beginning of this section.
Click the link for the Oracle By Example (OBE) set for the latest release.
The demonstration is a set of files in a ZIP archive called owbdemo_files.zip
.
The ZIP archive includes a SQL script, two files in comma-separated values format, and scripts written in Tcl.
(Optional) Download the xsales.zip
file from the same link, which includes XSALES table data.
Edit the script owbdemoinit.tcl.
The script owbdemoinit.tcl
defines and sets variables used by the other tcl scripts. Edit the following variables to match the values in your computer environment:
set tempspace TEMP
set owbclientpwd workspace_owner
set sysuser sys
set syspwd pwd
set host hostname
set port portnumber
set service servicename
set project owb_project_name
set owbclient workspace_owner
set sourcedir drive:/newowbdemo
set indexspace USERS
set dataspace USERS
set snapspace USERS
set sqlpath drive:/oracle/11.1.0/db_1/BIN
set sid servicename
Execute the Tcl scripts from the Warehouse Builder scripting utility, OMB Plus.
For Windows, select Start, Programs, Oracle, Warehouse Builder, and then OMB*Plus.
For UNIX, locate owb_home/owb/bin/unix and then execute OMBPlus.sh
At the OMB+> prompt, enter the following command to change to the directory that contains the scripts:
cd drive:\\newowbdemo
Run all of the Tcl scripts in the desired sequence by entering the following command:
source loadall.tcl
Start the Design Center and log in to it as the workspace owner, using the credentials you specified in the script owbdemoinit.tcl
.
Verify that you successfully set up the Warehouse Builder client to follow the demonstration.
In the Design Center, expand the Locations node in the Locations Navigator. Expand Databases and then Oracle. The Oracle node includes the following locations:
OWB_REPOSITORY
SALES_WH_LOCATION
When you successfully install the Warehouse Builder demonstration, the Design Center displays with an Oracle module named EXPENSE_WH
.
After you create and import data object definitions in Oracle Warehouse Builder, you can design extraction, transformation, and loading (ETL) operations that move data from sources to targets. In Warehouse Builder, you design these operations in a mapping.
This chapter contains the following topics:
Mappings describe a series of operations that extract data from sources, transform it, and load it into targets. Mappings provide a visual representation of the flow of the data and the operations performed on the data. When you design a mapping in Warehouse Builder, you use the Mapping Editor interface.
The basic design element for a mapping is the operator. Use operators to represent sources and targets in the data flow. Also use operators to define how to transform the data from source to target. The operators you select as sources have an impact on how you design the mapping. Based on the operators you select, Warehouse Builder assigns the mapping to one of the following Mapping Generation Languages:
PL/SQL
SQL*Loader
ABAP
Each of these code languages requires you to adhere to certain rules when designing a mapping.
This guide illustrates how to define a PL/SQL mapping. To define the other types of mappings, see Oracle Warehouse Builder Data Modeling, ETL, and Data Quality Guide. A basic rule for defining a PL/SQL mapping is that it can contain any type of source operator other than a Flat File operator or a SAP/R3 source.
To define a mapping, refer to the following sections:
Configuring Mappings Reference in the Warehouse Builder Online Help
When you are satisfied with the mapping design, generate the code by selecting the Generate icon in the toolbar.
Subsequent Steps
After you design a mapping and generate its code, you can next create a process flow or proceed directly with deployment followed by execution.
Use process flows to interrelate mappings. For example, you can design a process flow such that the completion of one mapping triggers an email notification and launches another mapping.
Deploy the mapping, and any associated process flows you created, and then execute the mapping.
To create a mapping:
Go to the Mappings node in the Projects Navigator. This node is located under a warehouse target module, under the Databases folder, under the Oracle folder.
Figure 5-1 Mappings Node on the Projects Navigator
Right-click Mappings and then select New Mapping.
Warehouse Builder opens the Create Mapping dialog box.
Enter a name and an optional description for the new mapping.
Select Help to review the rules on naming and describing mappings.
Click OK.
Warehouse Builder stores the definition for the mapping and inserts its name in the Projects Navigator. Warehouse Builder opens a mapping editor for the mapping and displays the name of the mapping in the title bar.
To open a previously created mapping:
From the Projects Navigator, locate a warehouse target module under the Databases folder and then under the Oracle Database folder.
Expand the Mappings node.
Open the Mapping Editor in one of the following ways:
Double-click a mapping.
Select a mapping and then from the File menu, select Open.
Select a mapping and press Ctrl + O.
Right-click a mapping and select Open.
Warehouse Builder displays the Mapping Editor.
As you design a mapping, you select operators from the Mapping Editor palette and drag them onto the canvas.
Oracle source and target operators: Use these operators to represent Oracle Database objects such as tables, views, and materialized views.
Remote and non-Oracle source and target operators: The use of these operators has special requirements.
Data flow operators: Use data flow operators to transform data.
Pre/Post Processing operators: Use these operators to perform processing before or after executing a mapping. The Mapping parameter operator is used to provide values to and from a mapping.
Pluggable mapping operators: A pluggable mapping is a reusable grouping of mapping operators that acts as a single operator.
The steps you take to add an operator to a mapping depend on the type of operator you select. This is because some operators are bound to workspace objects while others are not. As a general rule, when you add a data source or target operator, Warehouse Builder creates and maintains a version of that object in the Warehouse Builder workspace and a separate version for the Mapping Editor. For example, when you add a table operator to a mapping, Warehouse Builder maintains a separate copy of the table in the workspace. The separate versions are said to be bound together. That is, the version in the mapping is bound to the version in the workspace.
To distinguish between the two versions, this section refers to objects in the workspace either generically as workspace objects or specifically as workspace tables, workspace views, and so on. And this section refers to operators in the mapping as table operators, view operators, and so on. Therefore, when you add a dimension to a mapping, refer to the dimension in the mapping as the dimension operator and refer to the dimension in the workspace as the workspace dimension.
Warehouse Builder maintains separate workspace objects for some operators so that you can synchronize changing definitions of these objects. For example, when you re-import a new metadata definition for the workspace table, you may want to propagate those changes to the table operator in the mapping. Conversely, as you make changes to a table operator in a mapping, you may want to propagate those changes back to its associated workspace table. You can accomplish these tasks by a process known as synchronizing. In Warehouse Builder, you can synchronize automatically. Alternatively, synchronize manually from within the Mapping Editor.
To add an operator to a mapping:
Open the Mapping Editor.
From the Graph menu, select Add and select an operator. Alternatively, you can drag an operator icon from the Component Palette and drop it onto the Mapping Editor canvas.
If you select an operator that you can bind to a workspace object, the Mapping Editor displays the Add Mapping operator_name dialog box. For information about how to use this dialog box, click Help.
If you select an operator that you cannot bind to a workspace object, Warehouse Builder may display a wizard or dialog box to assist you in creating the operator.
Follow any prompts Warehouse Builder displays and click OK.
The Mapping Editor displays the operator maximized on the canvas. The operator name appears in the upper left corner. You can view each attribute name and data type.
If you want to minimize the operator, click the arrow in the upper right corner and the Mapping Editor displays the operator as an icon on the canvas.
Figure 5-2 Mapping Editor Showing a Table Operator Source
When you add an operator that you can bind to a workspace object, the Mapping Editor displays the Add Mapping operator_name dialog box. Select one of the following options:
Use this option when you want to use the Mapping Editor to define a new workspace object such as a new staging area table or a new target table.
After you select Create Unbound Operator with No Attributes, type a name for the new object. Warehouse Builder displays the operator on the canvas without any attributes.
You can now add and define attributes for the operator as described in "Editing Operators". Next, to create the new workspace object in a target module, right-click the operator and select Create and Bind.
For an example about how to use this option in a mapping design, see "Example: Using the Mapping Editor to Create Staging Area Tables".
Use this option when you want to add an operator based on an object you previously defined or imported into the workspace.
Either type the prefix to search for the object or select from the displayed list of objects within the selected module.
To select multiple items, press the Control key as you click each item. To select a group of items located in a series, click the first object in your selection range, press the Shift key, and then click the last object.
You can add operators based on workspace objects within the same module as the mapping or from other modules. If you select a workspace object from another module, the Mapping Editor creates a connector if one does not already exist. The connector establishes a path for moving data between the mapping location and the location of the workspace object.
Each operator has an editor associated with it. Use the operator editor to specify general and structural information for operators, groups, and attributes. In the operator editor you can add, remove, or rename groups and attributes. You can also rename an operator.
Editing operators is different from assigning loading properties and conditional behaviors. To specify loading properties and conditional behaviors, use the properties windows as described in "Setting Operator, Group, and Attribute Properties".
To edit an operator, group, or attribute:
Select an operator from the Mapping Editor canvas or select any group or attribute within an operator.
Right-click and select Open Details.
The Mapping Editor displays the operator editor with the Name Tab, Groups Tab, and Input and Output Tabs for each type of group in the operator.
Some operators include additional tabs. For example, the Match Merge operator includes tabs for defining Match rules and Merge rules.
Follow the prompts on each tab and click OK when you are finished.
Select Help if you need additional information for completing a tab.
After you select mapping source operators, operators that transform data, and target operators, you are ready to connect them. Data flow connections graphically represent how the data flows from a source, through operators, and to a target.
You can connect operators by one of the following methods:
Connecting Operators: Define criteria for connecting groups between two operators.
Connecting Groups: Define criteria for connecting all the attributes between two groups.
Connecting Attributes: Connect individual operator attributes to each other one at a time.
Using an Operator Wizard: For operators such as the Pivot operator and Name-Address operator, you can use the wizard to define data flow connections.
You can connect one operator to another if there is no existing connection between the operators. Both of the operators that you want to connect must be displayed in their icon form. You can also connect from a group to an operator. Hold down the left mouse button on the group, drag, and then drop on the title of the operator.
To connect one operator to another:
Select the operator from which you want to establish a connection.
Click and hold down the left mouse button while the pointer is positioned over the operator icon.
Drag the mouse away from the operator and toward the operator icon to which you want to establish a connection.
Release the mouse button over the target operator.
The Mapping Connection dialog box is displayed.
In the Attribute Group to Connect section, select values for the following:
Source Group: Select the group, from the source operator, which must be connected to the target operator.
Target Group: Select the group, from the target operator, to which the source group must be mapped.
In the Connection Options section, select the method to be used to connect the source attributes to the target attributes and click Preview.
Click OK to close the Mapping Connection Dialog box.
When you connect groups, the Mapping Editor assists you by either automatically copying the attributes or displaying the Mapping Connection Dialog box.
To connect one group to another:
Select the group from which you want to establish a connection.
Click and hold down the left mouse button while the pointer is positioned over the group.
Drag the mouse away from the group and towards the group to which you want to establish a connection.
Release the mouse button over the target group.
If you connect from an operator group to a target group containing attributes, the Mapping Connection Dialog Box is displayed.
In the Connection Options section, select the method used to connect the source attributes to the target attributes and click Preview.
Click OK to close the Mapping Connection Dialog box.
If you connect from one operator group to a target group with no existing attributes, the Mapping Editor automatically copies the attributes and connects them. This is useful for designing mappings such as the one shown in "Example: Using the Mapping Editor to Create Staging Area Tables".
You can use the Mapping Editor with an unbound table operator to quickly create staging area tables.
The following instructions describe how to create a staging table based on an existing source table. You can also use these instructions to create views, materialized views, flat files, and transformations.
To map a source table to a staging table:
In the Mapping Editor, add a source table.
From the menu bar, select Mapping, select Add, then select Data Sources/Targets. In the Data Sources/Targets menu, select Table Operator.
Use the Add Table Operator dialog box to select and bind the source table operator in the mapping. From the Add Table Operator dialog box, select Create unbound operator with no attributes.
Figure 5-3 Unbound Staging Table without Attributes and Source Table
With the mouse button positioned over the group in the source operator, click and hold down the mouse button.
Drag the mouse to the staging area table group.
Warehouse Builder copies the source attributes to the staging area table and connects the two operators.
In the Mapping Editor, select the unbound table you added to the mapping. Right-click and select Create and Bind.
In Create in, specify the target module in which to create the table.
Warehouse Builder creates the new table in the target module you specify.
You can draw a line from a single output attribute of one operator to a single input attribute of another operator.
To connect attributes:
Click and hold down the mouse button while the pointer is positioned over an output attribute.
Drag the mouse away from the output attribute and toward the input attribute to which you want data to flow.
As you drag the mouse, a line appears on the Mapping Editor canvas to indicate a connection.
Release the mouse over the input attribute.
Repeat steps 1 through 3 until you create all the required data flow connections.
Figure 5-5 Connected Operators in a Mapping
When connecting attributes, keep the following rules in mind:
You cannot connect to the same input attribute twice.
You cannot connect attributes within the same operator.
You cannot connect out of an input-only attribute, nor can you connect into an output-only attribute.
You cannot connect operators in such a way as to contradict an established cardinality. Instead, use a Joiner operator.
When you select an object on the canvas, the editor displays its associated properties in the property inspector along the left side.
Figure 5-6 Property Inspector for a Table Operator
You can view and set the following types of properties:
Operator properties: Properties that affect the entire operator. The properties you can set depend upon the operator type.
Group properties: Properties that affect a group of attributes. Most operators do not have properties for their groups. Examples of operators that do have group properties include the splitter operator and the deduplicator.
Attribute properties: Properties that pertain to attributes in source and target operators. Examples of attribute properties are data type, precision, and scale.
Many of the operators you use in a mapping have corresponding definitions in the Warehouse Builder workspace. This is true of source and target operators such as table and view operators. This is also true of other operators such as sequence and transformation operators whose definitions you may want to use across multiple mappings. As you make changes to these operators, you may want to propagate those changes back to the workspace object.
You have the following choices in deciding the direction in which you propagate changes:
Synchronizing from a Workspace Object to an Operator: After you begin using mappings in a production environment, there may be changes to the sources or targets that impact your ETL designs. Typically, the best way to manage these changes is through the Warehouse Builder Dependency Manager described in the Warehouse Builder Online Help. Use the Dependency Manager to automatically evaluate the impact of changes and to synchronize all affected mappings at one time. Alternatively, in the Mapping Editor, you can manually synchronize objects as described in "Synchronizing from a Workspace Object to an Operator".
Synchronizing from an Operator to a Workspace Object: When you make changes to an operator in a mapping, you may want to propagate those changes to its corresponding workspace definition. For example, the sources you imported and used in a mapping may have complex physical names for their attributes.
Note that synchronizing is different from refreshing. The refresh command ensures that you are up-to-date with changes made by other users in a multiuser environment. Synchronizing matches operators with their corresponding workspace objects.
To synchronize, select a single operator and synchronize it with the definition of a specified workspace object.
To synchronize an operator:
Select an operator on the Mapping Editor canvas.
From the Edit menu, select Synchronize or right-click the header of the operator, and select Synchronize.
By default, Warehouse Builder selects the option for you to synchronize your selected operator with its associated object in the workspace. You can accept the default or select another workspace object from the list box.
In this step, you also specify the direction of synchronization: either from a workspace object to an operator, or from an operator to a workspace object.
As an optional step, click Advanced to set the Matching Strategies.
Select Help for instruction on how to use the Matching Strategies.
Click OK.
In the Mapping Editor, you can synchronize from a workspace object for any of the following reasons:
Manually propagate changes: Propagate changes you made in a workspace object to its associated operator. Changes to the workspace object can include structural changes, attribute name changes, and attribute data type changes. To automatically propagate changes in a workspace object across multiple mappings, see the Warehouse Builder Online Help.
Synchronize an operator with a new workspace object: You can associate an operator with a new workspace object if, for example, you migrate mappings from one version of a data warehouse to a later version and maintain different object definitions for each version.
Figure 5-8 Synchronizing from a Different Workspace Object
Prototype mappings using tables: When working in the design environment, you could choose to design the ETL logic using tables. However, for production, you may want the mappings to source other workspace object types such as views, materialized views, or cubes.
Table 5-1 lists operators and the types of workspace objects from which you can synchronize.
Table 5-1 Operators Synchronized with Workspace Objects
To: Operator | From: Workspace Object Type
---|---
Cube | Tables, Views, Materialized Views, Flat Files, Dimensions and Cubes
Dimension | Tables, External Tables, Views, Materialized Views, Flat Files, Dimensions and Cubes
External Table | Tables, External Tables, Views, Materialized Views, Flat Files, Dimensions and Cubes
Flat File | Tables, External Tables, Views, Materialized Views, Flat Files, Dimensions and Cubes
Key Lookup | Tables only
Materialized View | Tables, External Tables, Views, Materialized Views, Files, Dimensions and Cubes
Post Mapping Process | Transformations only
Pre Mapping Process | Transformations only
Sequence | Sequences only
Table | Tables, External Tables, Views, Materialized Views, Flat Files, Dimensions and Cubes
Transformation | Transformations only
View | Tables, External Tables, Views, Materialized Views, Files, Dimensions and Cubes
Note that when you synchronize from an external table operator, Warehouse Builder updates the operator based on the workspace external table only and not its associated flat file.
As you make changes to operators in a mapping, you may want to propagate those changes back to a workspace object. By synchronizing, you can propagate changes from the following operators: tables, views, materialized views, transformations, and flat file operators.
Synchronize from the operator to a workspace object for any of the following reasons:
Propagate changes: Propagate changes you made in an operator to its associated workspace object. When you rename the business name for an operator or attribute, Warehouse Builder propagates the first 30 characters of the business name as the bound name.
Replace workspace objects: Synchronize to replace an existing workspace object.
Synchronizing from an operator has no impact on the dependent relationship between other operators and the workspace objects.
This chapter describes how to use Oracle Warehouse Builder to import metadata.
This chapter contains the following topics:
The source systems for a data warehouse are typically transaction processing applications. A sales analysis data warehouse, for instance, extracts data from an order entry system that records current order activities. Designing the data extraction process can be problematic. If the source system is complex and poorly documented, then determining which data to extract can be difficult. Moreover, it is usually not possible to modify the source system, nor to adjust its performance or availability. To address these problems, first import the metadata.
Metadata is the data that describes the contents of a given object in a data set. The metadata for a table, for instance, indicates the data type for each column.
For Oracle Database customers, the recommended tool of choice for importing metadata is Oracle Warehouse Builder (OWB). After you import the metadata into Warehouse Builder, you can annotate the metadata and design an extraction strategy independently from the transaction processing application.
To import metadata:
Complete the instructions for "Accessing Oracle Warehouse Builder".
Download and install the Oracle Warehouse Builder demonstration as described in "Installing the Oracle Warehouse Builder Demonstration".
Identify the Warehouse Builder project.
See "About Workspaces, Projects, and Other Devices in Warehouse Builder".
Follow along with the "Example: Importing Metadata from Flat Files".
The example explains how to specify where the source files are located and how to start the Metadata Import Wizard. The process for importing data objects such as files, tables, and views is the same. Therefore, after you complete this example, you will have a general understanding of how to import all data objects into Warehouse Builder.
After you install the Warehouse Builder demonstration and start the Design Center, you log in to a workspace. The user name and workspace name are displayed along the top of the Design Center.
Recall that a workspace includes a set of users working on related projects. Security is an important consideration for determining how many workspaces to create. A common model is to create separate workspaces for development, testing, and production. Using this model, you can allow users such as your developers access to the development and testing workspaces but restrict them from the production workspace.
You can optionally divide a workspace into projects. In practice, however, workspaces typically contain only one active project. This is because a project is simply a container and not suitable for implementing security or establishing subject-oriented groupings. Security is implemented through workspaces. Establishing subject-oriented groupings can be accomplished through modules, as discussed later.
A project contains the sets of metadata related to an initiative. For data warehousing, therefore, include all the metadata for sources and targets in the same project. Also include all the functions, procedures, transformations, mappings, and other objects required to implement your initiative. The project contains nodes for each type of object that you can either create or import into Warehouse Builder. Expand the different nodes to gain a general understanding of the types of objects you can create or import.
In the demonstration, the Projects Navigator is displayed on the left side and includes two projects. MY_PROJECT
is a default, pre-seeded project. You can use MY_PROJECT
as your single active project in the workspace. For the purposes of the demonstration, the OWB_DEMO
project is used.
This example describes how to import metadata from flat files. Specifically, our objective is to import the metadata into the OWB_DEMO
project such that the two files, export.csv
and expense_categories.csv
, display in the Projects Navigator under the Files node.
To import metadata from flat files:
Indicate where the flat files are located as described in "Specifying Locations for the Flat Files".
Organize OWB_DEMO to receive the incoming flat file metadata as described in "Creating Modules in the Project".
Indicate which files to import as described in "Starting the Import Metadata Wizard".
Specify the metadata structure as described in "Using the Flat File Sample Wizard".
Import the metadata for both flat files as described in "Importing the Flat File Data".
Indicate where the flat files are located.
In the Design Center, on the left side is a tab called Locations that contains a node called Locations. Use the Locations node to indicate where your source data resides.
Expand the Locations node and the nodes within it to gain a general understanding of the types of sources and targets you can access from Warehouse Builder.
For this example, right-click the Files node and select New File System Location to define a location for the flat files.
Follow the prompts in the Create File System Location dialog box. Each location you define corresponds to a specific directory on your computer file system. Therefore, consider naming the location based on the drive and directory. For the purposes of this demonstration, name the location C_NEWOWBDEMO_SOURCEFILES.
In the Projects Navigator, organize OWB_DEMO
to receive the incoming flat file metadata.
In a data warehousing implementation, you are likely to have numerous source and target objects. As a means of organizing these various objects, Warehouse Builder requires you to create modules. Modules enable you to establish subject-oriented groupings. Furthermore, each module corresponds to a location that you create in the Locations Navigator.
In this example, you create a module to contain company sales data. Because you have only one location for the two flat files, you create one module in the Projects Navigator. Right-click the Files node under OWB_DEMO
and select New Flat File Module. Name the new module SALES_EXPENSES
. For its location, specify the location you defined in the previous step, C_NEWOWBDEMO_SOURCEFILES.
Start the Import Metadata Wizard.
Right-click the module SALES_EXPENSES, select New, and follow the prompts in the Import Metadata Wizard. The prompts in the wizard vary according to the type of module you selected and therefore the type of data object you are importing.
In this example, you selected to import two flat files. On the summary page of the Import Metadata Wizard, select one of the files and then select Sample to launch the Flat File Sample Wizard.
In the next steps, you sample each file in turn and then select Finish on this page to import the metadata.
Follow the prompts in the Flat File Sample Wizard to specify the metadata structure.
Based on the number of characters you specify to be sampled, the wizard reads the flat file data and provides you with suggestions as to the structure of the metadata. If the sample size is too small, the wizard may misinterpret the data and make invalid suggestions. Accordingly, you can modify and refine the settings in the wizard.
For the purposes of this example, the wizard correctly determines that the file is delimited, that it includes a single record type, and that the character set is WE8MSWIN1252. Accept all the default settings presented in the Flat File Wizard.
To become familiar with the various types of files the wizard can sample, notice the options on the wizard pages and also select Help for additional insights.
After sampling the first flat file, return to the Summary and Import page of the Import Metadata Wizard to sample the second file.
Accept the default setting in the Flat File Wizard as you did for the previous file.
Import the metadata for both flat files.
Return again to the Summary and Import page and select Finish.
When you select Finish, the wizard imports the data based on the selections you made when sampling the data. The two comma-separated files now display under the SALES_EXPENSES module, which is under the Files node in the OWB_DEMO project.
This chapter describes how to identify and reduce performance issues and contains the following topics:
An important aspect of ensuring that your system performs well is to eliminate performance problems. This section describes some methods of finding and eliminating these bottlenecks, and contains the following topics:
Optimizer statistics are a collection of data that describe the database and the objects in the database in more detail. These statistics are stored in the data dictionary and are used by the query optimizer to choose the best execution plan for each SQL statement. Optimizer statistics include the following:
Table statistics (number of rows, blocks, and the average row length)
Column statistics (number of distinct values in a column, number of null values in a column, and data distribution)
Index statistics (number of leaf blocks, levels, and clustering factor)
The optimizer statistics are stored in the data dictionary. They can be viewed using data dictionary views similar to the following:
SELECT * FROM DBA_TAB_STATISTICS;
Because the objects in a database can constantly change, statistics must be regularly updated so that they accurately describe these database objects. Statistics are maintained automatically by Oracle Database, or you can maintain the optimizer statistics manually using the DBMS_STATS
package.
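For example, the following is a minimal sketch of gathering statistics manually with the DBMS_STATS package; the SH schema and SALES table names are only illustrative:

BEGIN
  -- Gather table, column, and index statistics for one table
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SH', tabname => 'SALES');
END;
/

To gather statistics for every object in a schema, you can call DBMS_STATS.GATHER_SCHEMA_STATS('SH') in the same way.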
To execute a SQL statement, Oracle Database may perform many steps. Each of these steps either retrieves rows of data physically from the database or prepares them in some way for the user issuing the statement. The combination of the steps Oracle Database uses to execute a statement is called an execution plan. An execution plan includes an access path for each table that the statement accesses and an ordering of the tables (the join order) with the appropriate join method.
You can examine the execution plan chosen by the optimizer for a SQL statement by using the EXPLAIN
PLAN
statement. When the statement is issued, the optimizer chooses an execution plan and then inserts data describing the plan into a database table. Issue the EXPLAIN
PLAN
statement and then query the output table.
General guidelines for using the EXPLAIN
PLAN
statement are:
Use the SQL script UTLXPLAN.SQL to create a sample output table called PLAN_TABLE in your schema.
Include the EXPLAIN PLAN FOR clause before the SQL statement.
After issuing the EXPLAIN PLAN statement, use one of the scripts or packages provided by Oracle Database to display the most recent plan table output.
The execution order in EXPLAIN
PLAN
output begins with the line that is indented farthest to the right. If two lines are indented equally, then the top line is usually executed first.
The following statement shows the output of two EXPLAIN
PLAN
statements, one with dynamic pruning and one with static pruning.
To analyze EXPLAIN PLAN output:
EXPLAIN PLAN FOR
SELECT p.prod_name
     , c.channel_desc
     , SUM(s.amount_sold) revenue
FROM   products p
     , channels c
     , sales s
WHERE  s.prod_id = p.prod_id
  AND  s.channel_id = c.channel_id
  AND  s.time_id BETWEEN '01-12-2001' AND '31-12-2001'
GROUP BY p.prod_name
     , c.channel_desc;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
WITHOUT TO_DATE
---------------------------------------------------------------------------------------------------
| Id| Operation                              | Name         |Rows|Bytes|Cost (%CPU)| Time   |Pstart|Pstop|
---------------------------------------------------------------------------------------------------
|  0|SELECT STATEMENT                        |              | 252|15876|   305 (1) |00:00:06|      |     |
|  1| HASH GROUP BY                          |              | 252|15876|   305 (1) |00:00:06|      |     |
| *2|  FILTER                                |              |    |     |           |        |      |     |
| *3|   HASH JOIN                            |              |2255| 138K|   304 (1) |00:00:06|      |     |
|  4|    TABLE ACCESS FULL                   | PRODUCTS     |  72| 2160|     2 (0) |00:00:01|      |     |
|  5|    MERGE JOIN                          |              |2286|75438|   302 (1) |00:00:06|      |     |
|  6|     TABLE ACCESS BY INDEX ROWID        | CHANNELS     |   5|   65|     2 (0) |00:00:01|      |     |
|  7|      INDEX FULL SCAN                   | CHANNELS_PK  |   5|     |     1 (0) |00:00:01|      |     |
| *8|     SORT JOIN                          |              |2286|45720|   299 (1) |00:00:06|      |     |
|  9|      PARTITION RANGE ITERATOR          |              |2286|45720|   298 (0) |00:00:06|  KEY | KEY |
| 10|       TABLE ACCESS BY LOCAL INDEX ROWID| SALES        |2286|45720|   298 (0) |00:00:06|  KEY | KEY |
| 11|        BITMAP CONVERSION TO ROWIDS     |              |    |     |           |        |      |     |
|*12|         BITMAP INDEX RANGE SCAN        |SALES_TIME_BIX|    |     |           |        |  KEY | KEY |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
  2 - filter(TO_DATE('01-12-2001')<=TO_DATE('31-12-2001'))
  3 - access("S"."PROD_ID"="P"."PROD_ID")
  8 - access("S"."CHANNEL_ID"="C"."CHANNEL_ID")
      filter("S"."CHANNEL_ID"="C"."CHANNEL_ID")
 12 - access("S"."TIME_ID">='01-12-2001' AND "S"."TIME_ID"<='31-12-2001')

Note the values of KEY KEY for Pstart and Pstop.

WITH TO_DATE
--------------------------------------------------------------------------------------------------
|Id| Operation                 | Name   | Rows  | Bytes |Cost(%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
| 0| SELECT STATEMENT          |        |   252 | 15876 |  31 (20) | 00:00:01 |       |       |
| 1|  HASH GROUP BY            |        |   252 | 15876 |  31 (20) | 00:00:01 |       |       |
|*2|   HASH JOIN               |        | 21717 |  1336K|  28 (11) | 00:00:01 |       |       |
| 3|    TABLE ACCESS FULL      |PRODUCTS|    72 |  2160 |   2  (0) | 00:00:01 |       |       |
|*4|    HASH JOIN              |        | 21717 |   699K|  26 (12) | 00:00:01 |       |       |
| 5|     TABLE ACCESS FULL     |CHANNELS|     5 |    65 |   3  (0) | 00:00:01 |       |       |
| 6|     PARTITION RANGE SINGLE|        | 21717 |   424K|  22 (10) | 00:00:01 |    20 |    20 |
|*7|      TABLE ACCESS FULL    |SALES   | 21717 |   424K|  22 (10) | 00:00:01 |    20 |    20 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
  2 - access("S"."PROD_ID"="P"."PROD_ID")
  4 - access("S"."CHANNEL_ID"="C"."CHANNEL_ID")
  7 - filter("S"."TIME_ID">=TO_DATE('2001-12-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND
      "S"."TIME_ID"<=TO_DATE('2001-12-31 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

Note the values of 20 20 for Pstart and Pstop.
The first execution plan shows dynamic pruning using the KEY values for Pstart and Pstop respectively. Dynamic pruning means that the database has to determine at execution time which partition or partitions to access. With static pruning, the database knows at parse time which partition or partitions to access, which leads to more efficient execution.
You can frequently improve the execution plan by using explicit date conversions, which are a best practice for optimal partition pruning and index usage.
Hints enable you to make decisions usually made by the optimizer. As an application developer, you might have information about your data that the optimizer does not know. Hints provide a mechanism to instruct the optimizer to choose a certain query execution plan based on specific criteria.
For example, you might know that a certain index is more selective for certain queries. Based on this information, you might be able to choose a more efficient execution plan than the optimizer. In this case, use hints to instruct the optimizer to use the optimal execution plan.
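For example, if you know that a particular index is the most selective access path for a query, you could direct the optimizer to it with an INDEX hint similar to the following. The index name SALES_CHANNEL_BIX and the literal value are hypothetical.

SELECT /*+ INDEX(s sales_channel_bix) */ SUM(s.amount_sold)
FROM   sales s
WHERE  s.channel_id = 3;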
By default, Oracle Warehouse Builder includes hints to optimize a typical data load.
Suppose you want to quickly run a summary across the sales table for last year while the system is otherwise idle. In this case, you could issue the following statement.
To use a hint to improve data warehouse performance:
SELECT /*+ PARALLEL(s,16) */ SUM(amount_sold)
FROM   sales s
WHERE  s.time_id BETWEEN TO_DATE('01-JAN-2005','DD-MON-YYYY')
                     AND TO_DATE('31-DEC-2005','DD-MON-YYYY');
Another common use for hints in data warehouses is to ensure that records are efficiently loaded using compression. For this, you use the APPEND hint, as shown in the following SQL:
... INSERT /*+ APPEND */ INTO my_materialized_view ...
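A slightly fuller sketch of a direct-path load follows. The target table SALES_HISTORY is hypothetical; the APPEND hint requests a direct-path insert so that, when the target table is defined with compression, the loaded data is compressed as it is written.

INSERT /*+ APPEND */ INTO sales_history
  SELECT *
  FROM   sales
  WHERE  time_id < TO_DATE('01-JAN-2005','DD-MON-YYYY');
COMMIT;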
Using the SQL Tuning Advisor and SQL Access Advisor, you can invoke the query optimizer in advisory mode to examine a SQL statement or set of SQL statements, and provide recommendations to improve their efficiency. The SQL Tuning Advisor and SQL Access Advisor can make various types of recommendations, such as creating SQL profiles, restructuring SQL statements, creating additional indexes or materialized views, and refreshing optimizer statistics. Additionally, Oracle Enterprise Manager enables you to accept and implement many of these recommendations in very few steps.
The SQL Access Advisor is primarily responsible for making schema modification recommendations, such as adding or dropping indexes and materialized views. It also recommends a partitioning strategy. The SQL Tuning Advisor makes other types of recommendations, such as creating SQL profiles and restructuring SQL statements. In some cases where significant performance improvements can be gained by creating a new index, the SQL Tuning Advisor may recommend doing so. However, these recommendations must be verified by running the SQL Access Advisor with a SQL workload that contains a set of representative SQL statements.
Example: Using the SQL Tuning Advisor to Verify SQL Performance
You can use the SQL Tuning Advisor to tune a single SQL statement or multiple SQL statements. When tuning multiple SQL statements, remember the SQL Tuning Advisor does not recognize interdependencies between the SQL statements. Instead, it is just meant to be a convenient way for you to run the SQL Tuning Advisor for a large number of SQL statements.
To run the SQL Tuning Advisor to verify SQL performance:
Go to the Advisor Central page, then click SQL Advisors.
The SQL Advisors page is displayed.
Click Schedule SQL Tuning Advisor.
The Schedule SQL Tuning Advisor page is displayed. A suggested name appears in the Name field, which you can modify. Select Comprehensive to have a comprehensive analysis performed, select Immediately for the Schedule, select an appropriate SQL Tuning Set, and then click OK.
The Processing page is displayed. Then the Recommendations page shows the recommendations for improving performance. Click View Recommendations.
The Recommendations page is displayed.
The recommendation is to create an index, which you can implement by clicking Implement. You may also want to run the SQL Access Advisor.
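If you prefer a command-line approach to Enterprise Manager, you can create and run a comparable tuning task with the DBMS_SQLTUNE package. The following is a minimal sketch: the statement text and the task name tune_sales_query are placeholders, and the task runs with default scope and time limit.

DECLARE
  l_task VARCHAR2(64);
BEGIN
  -- Create a tuning task for a single statement and execute it.
  l_task := DBMS_SQLTUNE.CREATE_TUNING_TASK(
              sql_text  => 'SELECT SUM(amount_sold) FROM sales WHERE prod_id = 123',
              task_name => 'tune_sales_query');
  DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => l_task);
END;
/

-- Display the advisor's findings and recommendations.
SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('tune_sales_query') FROM dual;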
You can minimize resource use and improve your data warehouse's performance by using the following capabilities:
Data warehouses often contain large tables and require techniques both for managing these large tables and for providing good query performance across these large tables. This section describes partitioning, a key method for addressing these requirements. Two capabilities relevant for query performance in a data warehouse are partition pruning and partitionwise joins.
Partition pruning is an essential performance feature for data warehouses. In partition pruning, the optimizer analyzes FROM and WHERE clauses in SQL statements to eliminate unneeded partitions when building the partition access list. This enables Oracle Database to perform operations only on those partitions that are relevant to the SQL statement. Oracle Database prunes partitions when you use range, LIKE, equality, and IN-list predicates on the range or list partitioning columns, and when you use equality and IN-list predicates on the hash partitioning columns.
Partition pruning dramatically reduces the amount of data retrieved from disk and shortens processing time, which improves query performance and resource use. If you partition the index and table on different columns (with a global partitioned index), partition pruning also eliminates index partitions even when the partitions of the underlying table cannot be eliminated.
Depending upon the actual SQL statement, Oracle Database may use static or dynamic pruning. Static pruning occurs at compile time, with the information about the partitions accessed beforehand. Dynamic pruning occurs at run time, when the partitions accessed by a statement are not known beforehand. A sample scenario for static pruning is a SQL statement that contains a WHERE clause with a constant literal on the partition key column. An example of dynamic pruning is the use of operators or functions in the WHERE clause.
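The following illustrative queries, assuming a sales table range-partitioned on time_id, contrast the two cases.

-- Static pruning: a constant literal on the partition key allows the
-- optimizer to identify the relevant partition at parse time.
SELECT SUM(amount_sold)
FROM   sales
WHERE  time_id = TO_DATE('01-DEC-2005','DD-MON-YYYY');

-- Dynamic pruning: the partitions to scan depend on a value that is
-- known only at execution time.
SELECT SUM(amount_sold)
FROM   sales
WHERE  time_id = TRUNC(SYSDATE) - 30;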
Partition pruning affects the statistics of the objects where pruning will occur and will affect the execution plan of a statement.
Partitionwise joins reduce query response time by minimizing the amount of data exchanged among parallel execution servers when joins execute in parallel. This significantly reduces response time and improves the use of both CPU and memory resources. In Oracle Real Application Clusters environments, partitionwise joins also avoid or at least limit the data traffic over the interconnect, which is key to achieving good scalability for massive join operations.
Partitionwise joins can be full or partial. Oracle Database decides which type of join to use.
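As a sketch, a full partitionwise join becomes possible when both tables are partitioned the same way on the join key. The table and column names below are illustrative only.

-- Both tables are hash partitioned on the join key (cust_id) with the same
-- number of partitions, so the join can be divided into smaller joins
-- between matching pairs of partitions.
CREATE TABLE customers_part (
  cust_id    NUMBER,
  cust_name  VARCHAR2(100)
)
PARTITION BY HASH (cust_id) PARTITIONS 16;

CREATE TABLE sales_part (
  cust_id     NUMBER,
  time_id     DATE,
  amount_sold NUMBER
)
PARTITION BY HASH (cust_id) PARTITIONS 16;

SELECT /*+ PARALLEL(s,8) PARALLEL(c,8) */ c.cust_name, SUM(s.amount_sold)
FROM   sales_part s, customers_part c
WHERE  s.cust_id = c.cust_id
GROUP BY c.cust_name;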
You should always consider partitioning in data warehousing environments.
In the Advisor Central page, click SQL Advisors.
The SQL Advisors page is displayed.
Click SQL Access Advisor.
The SQL Access Advisor page is displayed.
From the Initial Options menu, select Use Default Options and click Continue.
From the Workload Sources, select Current and Recent SQL Activity and click Next.
The Recommendation Options page is displayed.
Select Partitioning and then Comprehensive Mode, then click Next.
The Schedule page is displayed.
Enter SQLACCESStest1 into the Task Name field and click Next.
The Review page is displayed. Click Submit.
The Confirmation page is displayed.
Select your task and click View Result. The Results for Task page is displayed, showing possible improvements as a result of partitioning.
In data warehouses, you can use materialized views to compute and store aggregated data such as the sum of sales. You can also use materialized views to compute joins with or without aggregations, and they are very useful for frequently executed expensive joins between large tables and expensive calculations. A materialized view eliminates the overhead associated with expensive joins and aggregations for a large or important class of queries because it computes and stores summarized data before processing large joins or queries. Materialized views in these environments are often referred to as summaries.
One of the major benefits of creating and maintaining materialized views is the ability to use the query rewrite feature, which transforms a SQL statement expressed in terms of tables or views into a statement accessing one or more materialized views that are defined on the detail tables. The transformation is transparent to the user or application, requiring no intervention and no reference to the materialized view in the SQL statement. Because the query rewrite feature is transparent, materialized views can be added or dropped just like indexes without invalidating the SQL in the application code.
When underlying tables contain large amounts of data, it is a resource intensive and time-consuming process to compute the required aggregates or to compute joins between these tables. In these cases, queries can take minutes or even hours to return the answer. Because materialized views contain already computed aggregates and joins, Oracle Database uses the powerful query rewrite process to quickly answer the query using materialized views.
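For example, the following sketch creates a summary of sales revenue by product that the optimizer can use transparently through query rewrite. The materialized view name and refresh options are illustrative; fast refresh on commit would additionally require materialized view logs.

CREATE MATERIALIZED VIEW sales_by_prod_mv
  BUILD IMMEDIATE
  REFRESH FORCE ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT p.prod_name, SUM(s.amount_sold) AS revenue
FROM   sales s, products p
WHERE  s.prod_id = p.prod_id
GROUP BY p.prod_name;

-- A query written against the detail tables can now be rewritten to use
-- the materialized view, with no change to the application SQL.
SELECT p.prod_name, SUM(s.amount_sold)
FROM   sales s, products p
WHERE  s.prod_id = p.prod_id
GROUP BY p.prod_name;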
Bitmap indexes are widely used in data warehousing environments. The environments typically have large amounts of data and ad hoc queries, but a low level of concurrent DML transactions. Fully indexing a large table with a traditional B-tree index can be prohibitively expensive in terms of disk space because the indexes can be several times larger than the data in the table. Bitmap indexes are typically only a fraction of the size of the indexed data in the table. For such applications, bitmap indexing provides the following:
Reduced response time for large classes of ad hoc queries
Reduced storage requirements compared to other indexing techniques
Dramatic performance gains even on hardware with a relatively small number of CPUs or a small amount of memory
Efficient maintenance during parallel DML and loads
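A bitmap index is created with the CREATE BITMAP INDEX statement and is most effective on low-cardinality columns. The index name below is hypothetical, and the LOCAL keyword assumes the sales table is partitioned.

-- A low-cardinality column such as a channel code is a good candidate
-- for a bitmap index in a query-mostly environment.
CREATE BITMAP INDEX sales_channel_bix
  ON sales (channel_id)
  LOCAL NOLOGGING;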
During bulk-load operations, Oracle Database can compress the data being loaded. Oracle Database handles data transformation and compression internally and requires no application changes to use compression. Compression can help improve performance for queries that scan large amounts of data, by reducing the amount of I/O required to scan that data.
No special installation is required to configure this feature. However, to use this feature, the database compatibility parameter must be set to 11.2.0 or higher.
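You can verify the current setting with a query such as the following:

SELECT name, value
FROM   v$parameter
WHERE  name = 'compatible';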
Note: Hybrid Columnar Compression is a feature of certain Oracle storage systems. See Oracle Database Concepts for more information.
The PL/SQL package DBMS_COMPRESSION provides a compression advisor interface to help choose the correct compression level for an application. The compression advisor analyzes the objects in the database and estimates the possible compression ratios that could be achieved.
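The following sketch calls the advisor for one table. The parameter order follows the GET_COMPRESSION_RATIO procedure as documented for Oracle Database 11g Release 2; the scratch tablespace, schema, and table names are placeholders.

DECLARE
  l_blkcnt_cmp    PLS_INTEGER;
  l_blkcnt_uncmp  PLS_INTEGER;
  l_row_cmp       PLS_INTEGER;
  l_row_uncmp     PLS_INTEGER;
  l_cmp_ratio     NUMBER;
  l_comptype_str  VARCHAR2(100);
BEGIN
  -- Estimate the compression ratio for SH.SALES using OLTP compression.
  DBMS_COMPRESSION.GET_COMPRESSION_RATIO(
    'USERS',                          -- scratch tablespace used for sampling
    'SH', 'SALES', NULL,              -- owner, table, partition
    DBMS_COMPRESSION.COMP_FOR_OLTP,   -- compression type to evaluate
    l_blkcnt_cmp, l_blkcnt_uncmp,
    l_row_cmp, l_row_uncmp,
    l_cmp_ratio, l_comptype_str);
  DBMS_OUTPUT.PUT_LINE('Estimated compression ratio: ' || l_cmp_ratio);
END;
/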
The table_compress clause of the CREATE TABLE and ALTER TABLE statements provides COMPRESS, which takes a parameter for compression level. Use COMPRESS to instruct the database whether to compress data segments to reduce disk use. All forms of table compression are generally useful in OLAP environments and data warehouses, where the number of insert and update operations is small; some forms are also useful in OLTP environments.
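For example, the following hypothetical statements enable basic compression on a new table created from existing data and OLTP compression on an existing table:

-- Basic compression, appropriate for data loaded in bulk in a data warehouse.
CREATE TABLE sales_archive
  COMPRESS BASIC
  AS SELECT * FROM sales;

-- OLTP compression on an existing table; it applies to subsequently inserted data.
ALTER TABLE sales_archive COMPRESS FOR OLTP;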
Note: For compression to be enabled on a table, it must be turned on at table creation time, or the table must be changed to enable it.
You can maximize how resources are used in your system by ensuring that operations run in parallel whenever possible. Database operations run faster if they are not constrained by resources. The operation may be constrained by CPU resources, I/O capacity, memory, or interconnect traffic (in a cluster). To improve the performance of database operations, you focus on the current performance bottleneck and try to eliminate it (after which the bottleneck might shift to another resource). Oracle Database provides functions to optimize the use of available resources and to avoid using unnecessary resources.
Parallel execution dramatically reduces response time for data-intensive operations on large databases typically associated with a decision support system (DSS) and data warehouses. You can also implement parallel execution on certain types of online transaction processing (OLTP) and hybrid systems. Parallel execution is sometimes called parallelism. Parallelism is breaking down a task so that, instead of one process doing all the work in a query, many processes do part of the work at the same time. An example of this is when four processes handle four different quarters in a year instead of one process handling all four quarters by itself. The improvement in performance can be quite high. Parallel execution improves processing for the following:
Queries requiring large table scans, joins, or partitioned index scans
Creation of large indexes
Creation of large tables (including materialized views)
Bulk insert, update, merge, and delete operations
You can also use parallel execution to access object types within an Oracle database. For example, you can use parallel execution to access large objects (LOBs).
Parallel execution benefits systems with all of the following characteristics:
Symmetric multiprocessors (SMPs), clusters, or massively parallel systems
Sufficient I/O bandwidth
Underutilized or intermittently used CPUs (for example, systems where CPU usage is typically less than 30 percent)
Sufficient memory to support additional memory-intensive processes, such as sorts, hashing, and I/O buffers
If your system lacks any of these characteristics, then parallel execution might not significantly improve performance. In fact, parallel execution might reduce system performance on overutilized systems or systems with small I/O bandwidth.
Parallel execution divides the task of running a SQL statement into multiple small units, each of which is executed by a separate process. Also, the incoming data (tables, indexes, partitions) can be divided into parts called granules. The user shadow process takes on the role of parallel execution coordinator or query coordinator. The query coordinator performs the following tasks:
Parses the query and determines the degree of parallelism
Allocates one or two sets of slaves (threads or processes)
Controls the query and sends instructions to the parallel query slaves
Determines which tables or indexes must be scanned by the parallel query slaves
Produces the final output to the user
The parallel execution coordinator may enlist two or more of the instance's parallel execution servers to process a SQL statement. The number of parallel execution servers associated with a single operation is known as the degree of parallelism or DOP.
A single operation is a part of a SQL statement, such as an ORDER BY operation or a full table scan to perform a join on a nonindexed column table.
The degree of parallelism is specified in the following ways (see the examples after this list):
At the statement level with PARALLEL hints
At the session level by issuing the ALTER SESSION FORCE PARALLEL statement
At the table level in the table's definition
At the index level in the index's definition
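The following illustrative statements show each approach; the table name, index name, and degree values are placeholders.

-- Statement level: request a degree of parallelism of 8 for this query only.
SELECT /*+ PARALLEL(s, 8) */ COUNT(*) FROM sales s;

-- Session level: force parallel query for subsequent statements in this session.
ALTER SESSION FORCE PARALLEL QUERY PARALLEL 8;

-- Table level: store a default degree of parallelism in the table definition.
ALTER TABLE sales PARALLEL 8;

-- Index level: store a default degree of parallelism in the index definition.
ALTER INDEX sales_time_bix PARALLEL 8;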
Wait events are statistics that are incremented by a server process to indicate that the server process had to wait for an event to complete before being able to continue processing. A session could wait for a variety of reasons, including waiting for more input, waiting for the operating system to complete a service such as a disk write operation, or it could wait for a lock or latch.
When a session is waiting for resources, it is not doing any useful work. A large number of wait events is a source of concern. Wait event data reveals various symptoms of problems that might be affecting performance, such as latch contention, buffer contention, and I/O contention.
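For example, you can list the most significant non-idle waits accumulated since instance startup with a query such as the following (TIME_WAITED is in hundredths of a second):

SELECT event, total_waits, time_waited
FROM   v$system_event
WHERE  wait_class <> 'Idle'
ORDER BY time_waited DESC;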
This preface contains these topics:
Oracle Database 2 Day + Data Warehousing Guide is for anyone who wants to perform common day-to-day warehousing tasks with Oracle Database. The main prerequisites are to have read through Oracle Database 2 Day DBA and to have a basic knowledge of computers.
In particular, this guide is targeted toward the following groups of users:
Oracle DBAs wanting to acquire data warehouse administrative skills
DBAs who have some data warehouse background but are new to Oracle Database
For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.
Access to Oracle Support
Oracle customers have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.
For more information, see these Oracle resources:
The following text conventions are used in this document:
Convention | Meaning |
---|---|
boldface | Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary. |
italic | Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values. |
monospace | Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter. |