Oracle® Database 2 Day + Data Warehousing Guide 11g Release 2 (11.2) Part Number E25555-03 |
|
|
PDF · Mobi · ePub |
You must update your data warehouse on a regular basis to ensure that the information derived from it is current. The process of updating the data is called the refresh process, and this chapter describes the following topics:
Extraction, transformation and loading (ETL) is done on a schedule to reflect changes made to the original source system. During this step, you physically insert the new, updated data into the production data warehouse schema and take all the other steps necessary (such as building indexes, validating constraints, making backup copies) to make this new data available to the users. After this data has been loaded into the data warehouse, the materialized views must be updated to reflect the latest data.
The partitioning scheme of the data warehouse is often crucial in determining the efficiency of refresh operations in the data warehouse loading process. The loading process is often considered when choosing the partitioning scheme of data warehouse tables.
Most data warehouses are loaded with new data on a regular schedule. For example, every night, week, or month, new data is brought into the data warehouse. The data being loaded at the end of the week or month typically corresponds to the transactions for the week or month. In this common scenario, the data warehouse is being loaded by time. This suggests that the data warehouse tables be partitioned on a date column. In the data warehouse example, suppose the new data is loaded into the sales table every month. Furthermore, the sales
table has been partitioned by month. These steps show how the load process will proceed to add the data for a new month (Q1 2006) to the table sales
.
Many queries request few columns from the products
, customers
, and sales
tables, restricting the query by date. A materialized view will speed up the majority of the queries against the three tables. Use a prebuilt table on top of which the materialized view will be created. Choose the partitioning strategy of the materialized view in synchronization with the sales
table's partitioning strategy.
The following example shows the refreshing of a materialized view. It uses a partition exchange loading operation. The example is based on the sales
table in the sh
schema.
To refresh a materialized view:
Create a table that will be the basis for the materialized view.
CREATE TABLE sales_prod_cust_mv ( time_id DATE , prod_id NUMBER , prod_name VARCHAR2(50) , cust_id NUMBER , cust_first_name VARCHAR2(20) , cust_last_name VARCHAR2(40) , amount_sold NUMBER , quantity_sold NUMBER ) PARTITION BY RANGE (time_id) ( PARTITION p1999 VALUES LESS THAN (TO_DATE('01-JAN-2000','DD-MON-YYYY')) , PARTITION p2000 VALUES LESS THAN (TO_DATE('01-JAN-2001','DD-MON-YYYY')) , PARTITION p2001h1 VALUES LESS THAN (TO_DATE('01-JUL-2001','DD-MON-YYYY')) , PARTITION p2001h2 VALUES LESS THAN (TO_DATE('01-JAN-2002','DD-MON-YYYY')) , PARTITION p2001q1 VALUES LESS THAN (TO_DATE('01-APR-2002','DD-MON-YYYY')) , PARTITION p2002q2 VALUES LESS THAN (TO_DATE('01-JUL-2002','DD-MON-YYYY')) , PARTITION p2002q3 VALUES LESS THAN (TO_DATE('01-OCT-2002','DD-MON-YYYY')) , PARTITION p2002q4 VALUES LESS THAN (TO_DATE('01-JAN-2003','DD-MON-YYYY')) , PARTITION p2003q1 VALUES LESS THAN (TO_DATE('01-APR-2003','DD-MON-YYYY')) , PARTITION p2003q2 VALUES LESS THAN (TO_DATE('01-JUL-2003','DD-MON-YYYY')) , PARTITION p2003q3 VALUES LESS THAN (TO_DATE('01-OCT-2003','DD-MON-YYYY')) , PARTITION p2003q4 VALUES LESS THAN (TO_DATE('01-JAN-2004','DD-MON-YYYY')) , PARTITION p2004q1 VALUES LESS THAN (TO_DATE('01-APR-2004','DD-MON-YYYY')) , PARTITION p2004q2 VALUES LESS THAN (TO_DATE('01-JUL-2004','DD-MON-YYYY')) , PARTITION p2004q3 VALUES LESS THAN (TO_DATE('01-OCT-2004','DD-MON-YYYY')) , PARTITION p2004q4 VALUES LESS THAN (TO_DATE('01-JAN-2005','DD-MON-YYYY')) , PARTITION p2005q1 VALUES LESS THAN (TO_DATE('01-APR-2005','DD-MON-YYYY')) , PARTITION p2005q2 VALUES LESS THAN (TO_DATE('01-JUL-2005','DD-MON-YYYY')) , PARTITION p2005q3 VALUES LESS THAN (TO_DATE('01-OCT-2005','DD-MON-YYYY')) , PARTITION p2005q4 VALUES LESS THAN (TO_DATE('01-JAN-2006','DD-MON-YYYY')) , PARTITION p2006q1 VALUES LESS THAN (TO_DATE('01-APR-2006','DD-MON-YYYY')) ) PARALLEL COMPRESS;
Load the initial table from the sales
table.
ALTER SESSION ENABLE PARALLEL DML; INSERT /*+ PARALLEL smv */ INTO sales_prod_cust_mv smv SELECT /*+ PARALLEL s PARALLEL c */ s.time_id , s.prod_id , p.prod_name , s.cust_id , cust_first_name , c.cust_last_name , SUM(s.amount_sold) , SUM(s.quantity_sold) FROM sales s , products p , customers c WHERE s.cust_id = c.cust_id AND s.prod_id = p.prod_id GROUP BY s.time_id , s.prod_id , p.prod_name , s.cust_id , c.cust_first_name , c.cust_last_name; COMMIT;
Create a materialized view.
CREATE MATERIALIZED VIEW sales_prod_cust_mv ON PREBUILT TABLE ENABLE QUERY REWRITE AS SELECT s.time_id , s.prod_id , p.prod_name , s.cust_id , c.cust_first_name , c.cust_last_name , SUM(s.amount_sold) amount_sold , SUM(s.quantity_sold) quantity_sold FROM sales s , products p , customers c WHERE s.cust_id = c.cust_id AND s.prod_id = p.prod_id GROUP BY s.time_id , s.prod_id , p.prod_name , s.cust_id , c.cust_first_name , c.cust_last_name;
Load a separate table to be exchanged with the new partition.
CREATE TABLE sales_q1_2006 PARALLEL COMPRESS AS SELECT * FROM sales WHERE 0 = 1; /* This would be the regular ETL job */ ALTER SESSION ENABLE PARALLEL DML; INSERT /* PARALLEL qs */ INTO sales_q1_2006 qs SELECT /* PARALLEL s */ prod_id , cust_id , add_months(time_id,3) , channel_id , promo_id , quantity_sold , amount_sold FROM sales PARTITION(sales_q4_2005) s; COMMIT; CREATE BITMAP INDEX bmp_indx_prod_id ON sales_q1_2006 (prod_id); CREATE BITMAP INDEX bmp_indx_cust_id ON sales_q1_2006 (cust_id); CREATE BITMAP INDEX bmp_indx_time_id ON sales_q1_2006 (time_id); CREATE BITMAP INDEX bmp_indx_channel_id ON sales_q1_2006 (channel_id); CREATE BITMAP INDEX bmp_indx_promo_id ON sales_q1_2006 (promo_id); ALTER TABLE sales_q1_2006 ADD CONSTRAINT sales_q_prod_fk FOREIGN KEY (prod_id) REFERENCES products(prod_id) ENABLE NOVALIDATE; ALTER TABLE sales_q1_2006 ADD CONSTRAINT sales_q_cust_fk FOREIGN KEY (cust_id) REFERENCES customers(cust_id) ENABLE NOVALIDATE; ALTER TABLE sales_q1_2006 ADD CONSTRAINT sales_q_time_fk FOREIGN KEY (time_id) REFERENCES times(time_id) ENABLE NOVALIDATE; ALTER table sales_q1_2006 ADD CONSTRAINT sales_q_channel_fk FOREIGN KEY (channel_id) REFERENCES channels(channel_id) ENABLE NOVALIDATE; ALTER table sales_q1_2006 ADD CONSTRAINT sales_q_promo_fk FOREIGN KEY (promo_id) REFERENCES promotions(promo_id) ENABLE NOVALIDATE; BEGIN DBMS_STATS.GATHER_TABLE_STATS('SH','SALES_Q1_2006'); END; /
Create and load a separate table to be exchanged with a partition in the materialized view.
CREATE TABLE sales_mv_q1_2006 PARALLEL COMPRESS AS SELECT * FROM sales_prod_cust_mv WHERE 1 = 0; ALTER SESSION ENABLE PARALLEL DML; INSERT /*+ PARALLEL smv */ INTO sales_mv_q1_2006 smv SELECT /*+ PARALLEL s PARALLEL c */ s.time_id , s.prod_id , p.prod_name , s.cust_id , cust_first_name , c.cust_last_name , SUM(s.amount_sold) , SUM(s.quantity_sold) FROM sales_q1_2006 s , products p , customers c WHERE s.cust_id = c.cust_id AND s.prod_id = p.prod_id GROUP BY s.time_id , s.prod_id , p.prod_name , s.cust_id , c.cust_first_name , c.cust_last_name; COMMIT;
Gather statistics.
BEGIN DBMS_STATS.GATHER_TABLE_STATS('SH','SALES_MV_Q1_2006'); END;
Exchange the partitions.
ALTER TABLE sales EXCHANGE PARTITION sales_q1_2006 WITH TABLE sales_q1_2006 INCLUDING INDEXES WITHOUT VALIDATION; ALTER TABLE sales_prod_cust_mv EXCHANGE PARTITION p2006q1 WITH TABLE sales_mv_q1_2006 INCLUDING INDEXES WITHOUT VALIDATION;
Inform the database that the materialized view is fresh again.
ALTER MATERIALIZED VIEW sales_prod_cust_mv CONSIDER FRESH;
Note that because this scenario uses a prebuilt table and, because the constraints are not RELY
constraints, the query rewrite feature will work only with the query_rewrite_integrity
parameter set to STALE_TOLERATED
.
A particularly effective way of removing and archiving your data is through the use of a rolling window. An example of using a rolling window is when the data warehouse stores the most recent 36 months of sales data. A new partition can be added to the sales
table for each new month, and an old partition can be removed from the sales
table. This way, you will always maintain 36 months of data in the warehouse.
The following example shows a rolling window for the sales
table in the sh
schema.
Add the sales for December 2005.
ALTER TABLE sales ADD PARTITION sales_12_2005 VALUES LESS THAN ('01-JAN-2006');
Note that you must rebuild any existing indexes.
Drop the partition for 1999.
ALTER TABLE sales DROP PARTITION sales_1999;