Oracle® Warehouse Builder Data Modeling, ETL, and Data Quality Guide 11g Release 2 (11.2) Part Number E10935-05 |
|
|
PDF · Mobi · ePub |
This chapter describes the use of data auditors to monitor data quality.
This chapter contains the following topics:
Data auditors are processes that validate data against a set of data rules to determine which records comply and which do not. Data auditors gather statistical metrics on how well the data in a system complies with a rule by auditing and marking how many errors are occurring against the audited data. Data auditors ensure that your data complies with the data rules you defined, and thus let you track compliance with your business rules.
Data auditors are an important tool in ensuring, on an ongoing basis, that data quality levels meet business requirements. Identifying sudden increases in bad data also enables you to associate the increase with specific events and then identify possible root causes, such as addition of new data sources or changes to a data source or application that impacts your system.
To monitor data using Oracle Warehouse Builder, you must first discover or design data rules, as described in "Performing Data Profiling", then define data auditors, and schedule them or incorporate them into larger process flows.
Data auditors can be deployed and run ad hoc if necessary. More often, they are scheduled for regular execution as part of a process flow, to monitor the quality of the data in an operational environment such as a data warehouse or ERP system, either immediately after updates like data loads, or at regular intervals.
Data auditors have thresholds that enable you to create logic because too many noncompliant records can divert the process flow into an error or notification stream. You can specify a threshold value for each data rule that the data auditor audits. This value is used to determine if the data in the data object is within the limits that you defined. Based on this threshold, the process can choose actions. In a process flow, you can test this value and branch based upon the result.
For example, you create a data auditor to audit the data in the Employees
table. This table contains two data rules, emp_email_unique_rule
and emp_sal_min_rule
. You specify that the threshold value for both rules is 80%. If less than 80% of the data in the Employees
table does not follow the data rules, then the auditing for the table fails.
See Also:
"Specifying Actions for Data That Violates Defined Data Rules" for information about specifying threshold value.Audit Results for Data Auditors
In addition to setting thresholds for noncompliant records, you can capture audit results and store them for analysis. When run, the data auditor sets several output values. One of these values is the audit result. Audit results provide information about the extent of data rule violations that occurred while running the data auditor.
See Also:
"Data Auditor Execution Results"Data auditors are objects that you can use to continuously monitor your source schema to ensure that the data adheres to the defined data rules. You can monitor an object only if you have defined data rules for the object.
A data auditor can monitor data quality for multiple data object. You can create data auditors for tables, views, materialized views, and external tables.
See Also:
"Overview of Data Auditors" for more information about data auditors.To monitor data quality, perform the following steps:
Create a data auditor containing the data objects to monitor.
(Optional) Configure the data auditor and set physical deployment parameters for the data auditor. These parameters are used while running the data auditor.
Deploy the data auditor to create the data auditor in the target schema.
See "Deploying Objects".
Run the data auditor to identify records that do not follow the data rules defined on the data objects. You can either run data auditors manually or schedule them to run at specified times.
See "Auditing Data Objects Using Data Auditors" for information about running data auditors.
View the records that were identified as not complying with the defined data rules for the objects that are part of the data auditor.
Note:
You cannot import metadata for data auditors in Merge mode. For more information about import mode options, see Oracle Warehouse Builder Installation and Administration Guide.Use the Create Data Auditor Wizard to create data auditors. Data auditors are part of an Oracle module in a project.
To create a data auditor:
Expand the Oracle module in which you want to create the data auditor.
Right-click Data Auditors and select New Data Auditor.
The Create Data Auditor Wizard is displayed.
On the Name and Description page, provide the following details and click Next.
Name: Enter the name of the data auditor.
Description: Enter an optional description for the data auditor.
On the Select Objects page, select the data objects to audit and click Next.
The Available section lists the objects available for auditing. The Selected section contains the objects that are selected for auditing. Use the buttons to move objects to the Selected section. To select multiple objects, hold down the Ctrl key while selecting objects.
On the Choose Actions page, specify the action to be taken for records that do not follow the data rules bound to the selected objects and click Next.
See "Specifying Actions for Data That Violates Defined Data Rules".
On the Summary page, review the selections that you made. Click Back to modify any selected values or click Finish to create the data auditor.
The new data auditor is added to the Data Auditors node. At this stage, only the metadata for the data auditor is stored in your workspace. To use this data auditor to monitor the quality of data in your data objects, you must run the data auditor as described in "Auditing Data Objects Using Data Auditors".
Use the Choose Actions page of the Create Data Auditor Wizard or the Choose Action tab of the Edit Data Auditor dialog box to specify how records that violate data rules defined on the data objects are handled. You can also specify the level of non-compliance that is permitted for the data.
This page contains two sections: Error threshold mode and Data Rules.
Error threshold mode
Use the Error threshold mode to specify the method used to determine compliance of data to data rules.
Select one of the following methods:
Percent: The data auditor sets the audit result based on the percentage of records that do not follow the data rule. This percentage is specified in the rule's Defect Threshold value.
Six Sigma: The data auditor sets the audit result based on the Six Sigma values for the data rules. If the calculated Six Sigma value for any rule is less than the specified Sigma Threshold value, then the data auditor sets the AUDIT RESULT to 2.
Data Rules
The Data Rules section lists the data rules applied to the objects selected on the Select Object page. For each rule, specify the following:
Action: The action to be performed if data in the source object does not follow the data rule. Select Report to ensure that the data rule is audited. Select Ignore if you want the data rule to be ignored.
Defect Threshold: The percent of records that should follow the data rules to ensure successful auditing. Specify a value between 1 and 100. This value is ignored if you select Six Sigma in the Error threshold mode section.
Sigma Threshold: The required success rate. Specify a number between 0 and 7. If you set the value to 7, then no failures are enabled. This value is ignored if you select Percent in the Error threshold mode section.
After you create a data auditor, you can edit it and modify any of its properties using the following steps.
In the Projects Navigator, right-click the data auditor and select Open.
The Edit Data Auditor dialog box is displayed.
On the Name tab, enter a new name or description for the data auditor.
On the Select Objects tab, use the buttons to add or remove objects that is audited as part of the data auditor.
On the Choose Actions tab, edit the data correction actions that you specified.
See "Specifying Actions for Data That Violates Defined Data Rules".
On the Reconcile Objects tab, select the check box to the left of an object to reconcile its definition with the latest repository definition. Click Reconcile.
Click OK to close the Edit Data Auditor dialog box.
During the configuration phase, you assign physical deployment properties to the data auditor that you created by setting the configuration parameters. The Configuration tab for the data auditor enables you to configure the physical properties of the data auditor.
To configure a data auditor:
From the Projects Navigator, expand the Databases node and then the Oracle node.
Right-click the name of the data auditor to configure and select Configure.
The Configuration tab for the data auditor is displayed.
Based on your requirement, configure the parameters listed in "Run Time Parameters", "Data Auditor Parameters", and "Code Generation Options".
Table 20-1 lists the Run Time configuration parameters.
Table 20-1 Run Time Configuration Parameters for Data Auditors
Configuration Parameter Name | Description |
---|---|
Default Purge Group |
Used when executing the package. Each audit record in the run time schema is assigned to the purge group specified. |
Bulk Size |
Number of rows to be fetched as a batch while processing cursors |
Analyze table sample percentage |
Percentage of rows to be sampled when the target tables are analyzed. You analyze target tables to gather statistics that you can use to improve performance while loading data into the target tables. |
Commit frequency |
Number of rows processed before a commit is issued |
Maximum number of errors |
Maximum number of errors enabled before the execution of this step is terminated |
Default Operating mode |
Represents the operating mode used The options that you can select are Row based, Row based (target only), Set based, Set based fail over to row based, Set based fail over to row based (target only). |
Default Audit Level |
Indicates the audit level used when executing the package. When the package is run, the amount of audit information captured in the run time schema depends on the value set for this parameter. The options that you can select are as follows:
|
This category uses the same name as the data auditor. Table 20-2 describes the generic data auditor configuration parameters.
Table 20-2 Data Auditor Configuration Parameters for Data Auditors
Configuration Parameter Name | Description |
---|---|
Generation Comments |
Specify additional comments for the generated code. |
Threshold Mode |
Specify the mode that is necessary to measure failure thresholds. The options are PERCENTAGE and SIX SIGMA. |
Language |
Language used to define the generated code. The options are PL/SQL (default) and UNDEFINED. Ensure that PL/SQL (default) is selected. |
Deployable |
Select this option to indicate to deploy this data auditor. Oracle Warehouse Builder generates code only if the data auditor is marked as deployable. |
Referred Calendar |
Specify the schedule to associate with the data auditor. The schedule defines when the data auditor runs. |
Table 20-3 describes the code generation options that you can set for data auditors.
Table 20-3 Code Generation Options for Data Auditors
Configuration Parameter Name | Description |
---|---|
ANSI SQL Syntax |
Select this option to use ANSI SQL code in the generated code. If this option is not selected, then Oracle SQL syntax is generated. |
Commit Control |
Specifies how commit is performed. The options available for this parameter are: Automatic, Automatic Correlated, and Manual. Ensure that this parameter is set to Automatic. |
Enable Parallel DML |
Select this option to enable parallel DML at run time. |
Analyze table statements |
Set this option to True to generate the statement used to collect statistics for the data auditor. If the target table is not in the same schema as the mapping and you want to analyze the table, then you must grant |
Optimized Code |
Select this option to indicate that optimized code should be generated. |
Generation Mode |
Select the mode in which optimized code should be generated. The options that you can select are: All Operating Modes, Row based, Row based (target only), Set based, Set based fail over to row based, and Set based fail over to row based (target only). |
Use Target Load Ordering |
Select this option to generate code for target load ordering. |
Error Trigger |
Specify the name of the error trigger procedure. |
Bulk Processing Code |
Select this option to generate bulk processing code. |
After you create a data auditor, you can use it to monitor the data in your data objects. This ensures that the data rule violations for the objects are detected. When you run a data auditor, any records that violate the data rules defined on the data objects are written to the error tables.
There are two ways of using data auditors:
To check if the data in the data object adheres to the data rules defined for the object, you must run the data auditor. You can run data auditors from the Design Center or the Control Center Manager. To run a data auditor from the Design Center, right-click the data auditor and select Start. In the Control Center Manager, select the data auditor, and from the File menu, select Start. The results are displayed in the Job Details window as described in "Data Auditor Execution Results".
You can schedule the execution of a data auditor using the following steps:
Create a process flow that contains a Data Auditor Monitor activity that represents the data auditor.
Schedule this process flow to run at a predefined time.
For more information about scheduling objects, see "Defining Schedules".
Figure 20-1 displays a process flow that contains a Data Auditor Monitor activity. In this process flow, LOAD_EMP_MAP
is a mapping that loads data into the EMP
table. If the data load is successful, then the data auditor EMP_DATA_AUDIT
is run. The data auditor monitors the data in the EMP
table based on the data rules defined for the table.
Figure 20-1 Data Auditor Monitor Activity in a Process Flow
After you run a data auditor, the Job Details window displays the details of the execution. The Job Details window contains two tabs: Input Parameters and Execution Results. The Job Details window is displayed only when you set the deployment preference Show Monitor to true.
See Also:
Oracle Warehouse Builder Concepts for more information about deployment preferences.Figure 20-2 displays the Job Details window containing the "Input Parameters Tab" and the "Execution Results Tab".
Figure 20-2 Data Auditor Execution Results
The Input Parameters tab contains the values of input parameters used to run the data auditor.
The Execution Results tab displays the results of running the data auditor. This tab contains two sections: Row Activity and Output Parameters.
The Row Activity section contains details about the inserts into the error table for each step. When multiple data rule is specified, multitable insert may be used in the data auditor. In this case, the count of the number of rows is not accurate.
For example, in the data auditor execution result in Figure 20-2, the data rule called E_NOT_NULL inserted one record into the error table.
The Output Parameters section contains the following three parameters:
AUDIT_RESULT: Indicates the result of running the data auditor. The possible values for this parameter are as follows:
0: No data rule violations occurred.
1: At least one data rule violation occurred, but no data rule failed to meet the minimum quality threshold as defined in the data auditor.
2: At least one data rule failed to meet the minimum quality threshold.
For more information about setting the threshold, see the step on choosing actions in "Creating Data Auditors".
EO_<data_rule_name>: Represents the calculated error quality for the specified data rule. Zero (0) indicates all errors and 100 indicates no errors.
SO_<data_rule_name>: Represents the Six Sigma quality calculated for the specified data rule.
When you run a data auditor, either manually or as part of the process flow, Oracle Warehouse Builder writes records that do not follow defined data rules for the objects contained in the data auditor to error tables. Each object contained in the data auditor has a corresponding error table that stores noncompliant records for that object.
You view all noncompliant records that are written to error tables by using the Repository Browser.
To view error tables created because of data auditor execution:
Grant privileges on the error tables as described in "Granting Privileges on Error Tables".
Use the Repository Browser to view the error tables. Perform the following steps:
Open the Repository Browser as described in "Opening the Repository Browser".
View error tables using the Repository Browser as described in "Viewing Error Tables Created as a Result of Data Auditor Execution".
Before you view data stored in error tables using the Repository Browser, you must grant privileges on the error tables to the OWBSYS
user. It enables the Repository Browser to access error table data.
To grant privileges on error tables:
In SQL*Plus, log in to the schema containing the error tables.
The error table for an object is stored in the same schema as the object.
Run the SQL script OWB_HOME
\owb\rtp\sql\grant_error_table_privileges.sql
.
When prompted, enter the name of the error table for which you want to grant privileges.
If you did not specify a name for the error table of an object using the Error Table Name property, Oracle Warehouse Builder provides a default name. For objects that use error tables, the default error table name is the object name suffixed by "_ERR".
Repeat Steps 2 and 3 for each error table to which you want to grant privileges.