Oracle® Data Mining Application Developer's Guide 11g Release 2 (11.2) Part Number E12218-07 |
|
|
PDF · Mobi · ePub |
This chapter explains how to create data mining models and retrieve model details.
Note:
This chapter assumes a basic understanding of mining functions and algorithms, as described in Oracle Data Mining Concepts.This chapter contains the following topics:
The general steps involved in creating a data mining model are summarized as follows:
Prepare the data. (See Chapter 3.)
Optionally, specify model settings. (See "Model Settings".)
Create the model. (See "Creating a Model".)
View model details. (See "Model Details".)
Test the model. (See Oracle Data Mining Concepts for information about test metrics for classification and regression.)
Evaluate the model with questions like: How accurate is the model? If there are rules associated with the model, what is their confidence and support? How well does the model address the business question?
Note:
To better understand this process, you can look at the source code of the sample data mining programs provided with Oracle Database. See "Sample Mining Models".See Also:
Oracle Data Mining Concepts for more information about the process of building a modelNumerous settings are available for configuring data mining models at build time. To specify settings, create a settings table and pass it to the model creation process. A settings table is optional, because all settings have default values.
Model settings are documented in Oracle Database PL/SQL Packages and Types Reference. If you are using the HTML version of Oracle Data Mining Application Developer's Guide, the links in Table 5-1 will take you directly to the documentation for each type of model setting.
Settings | Documentation |
---|---|
Modify the mining function |
|
Modify the algorithm |
|
Set global model characteristics |
|
Enable or disable Automatic Data Preparation |
A settings table must have the columns shown in Table 5-2.
Table 5-2 Settings Table Required Columns
Column Name | Data Type |
---|---|
|
|
|
|
The values inserted into the setting_name
column are one or more of several constants defined in the DBMS_DATA_MINING
package. Depending on what the setting name denotes, the value for the setting_value
column can be a predefined constant or the actual numerical value corresponding to the setting itself. The setting_value
column is defined to be VARCHAR2
. You can explicitly cast numerical inputs to string using the TO_CHAR()
function, or you can rely on the implicit type conversion provided by the Database.
Example 5-1 creates a settings table for an SVM classification model. Since SVM is not the default classifier, the ALGO_NAME
setting is used to specify the algorithm. Setting the SVMS_KERNEL_FUNCTION
to SVMS_LINEAR
causes the model to be built with a linear kernel. If you do not specify the kernel function, the algorithm chooses the kernel based on the number of attributes in the data.
Example 5-1 Create a Settings Table
CREATE TABLE svmc_sh_sample_settings ( setting_name VARCHAR2(30), setting_value VARCHAR2(4000)); BEGIN INSERT INTO svmc_sh_sample_settings (setting_name, setting_value) VALUES (dbms_data_mining.algo_name, dbms_data_mining.algo_support_vector_machines); INSERT INTO svmc_sh_sample_settings (setting_name, setting_value) VALUES (dbms_data_mining.svms_kernel_function, dbms_data_mining.svms_linear); COMMIT; END; /
The ALGO_NAME
setting specifies the algorithm for a model. If you wish to use the default algorithm for a particular mining function, or if there is only one algorithm available for the mining function, you do not need to specify the ALGO_NAME
setting.
Table 5-3 Data Mining Algorithms
The CLAS_COST_TABLE_NAME
setting specifies the name of a cost matrix table to be used in building a Decision Tree model. A cost matrix biases a classification model to minimize costly misclassifications. The cost matrix table must have the columns shown in Table 5-4.
Table 5-4 Cost Matrix Table Required Columns
Column Name | Data Type |
---|---|
|
|
|
|
|
|
Decision Tree is the only algorithm that supports a cost matrix at build time. However, you can create a cost matrix and associate it with any classification model for scoring.
If you want to use costs for scoring, create a table with the columns shown in Table 5-4, and use the DBMS_DATA_MINING.ADD_COST_MATRIX
procedure to add the cost matrix table to the model. You can also specify a cost matrix inline when invoking a PREDICTION
function.
See Also:
Oracle Data Mining Concepts for information about costsThe CLAS_PRIORS_TABLE_NAME
setting specifies the name of a table of prior probabilities to be used in building a Naive Bayes model. Prior probabilities can be used to offset differences in distribution between the build data and the actual population. The priors table must have the columns shown in Table 5-5.
Table 5-5 Priors Table Required Columns
Column Name | Data Type |
---|---|
|
|
|
|
See Also:
Oracle Data Mining Concepts for information about priorsThe CLAS_WEIGHTS_TABLE_NAME
setting specifies the name of a table of class weights to be used to bias a logistic regression (GLM classification) or SVM classification model to favor higher weighted classes. The weights table must have the columns shown in Table 5-6.
Table 5-6 Class Weights Table Required Columns
Column Name | Data Type |
---|---|
|
|
|
|
See Also:
Oracle Data Mining Concepts for information about class weightsInformation about mining model settings can be obtained from the data dictionary view ALL/USER/DBA_MINING_MODEL_SETTINGS
. When used with the ALL
prefix, this view returns information about the settings for the models accessible to the current user. When used with the USER
prefix, it returns information about the settings for the models in the user's schema. The DBA
prefix is only available for DBAs.
The columns of ALL_MINING_MODEL_SETTINGS
are described as follows and explained in Table 5-7.
SQL> describe all_mining_model_settings Name Null? Type ----------------------------------------- -------- ---------------------------- OWNER NOT NULL VARCHAR2(30) MODEL_NAME NOT NULL VARCHAR2(30) SETTING_NAME NOT NULL VARCHAR2(30) SETTING_VALUE VARCHAR2(4000) SETTING_TYPE VARCHAR2(7)
Table 5-7 ALL_MINING_MODEL_SETTINGS
Column | Description |
---|---|
|
Owner of the mining model |
|
Name of the mining model |
|
Name of the setting |
|
Value of the setting |
|
'INPUT' if the value is specified by a user; 'DEFAULT' if the value is system-generated |
The following query lists the settings for the SVM classification model SVMC_SH_CLAS_SAMPLE
. The ALGO_NAME
, CLAS_PRIORS_TABLE_NAME
, and SVMS_KERNEL_FUNCTION
settings are user-specified. These settings have been specified in a settings table for the model.
Example 5-2 ALL_MINING_MODEL_SETTINGS
SQL> COLUMN setting_value FORMAT A35 SQL> SELECT setting_name, setting_value, setting_type FROM all_mining_model_settings WHERE model_name in 'SVMC_SH_CLAS_SAMPLE'; SETTING_NAME SETTING_VALUE SETTING ------------------------------ ----------------------------------- ------- SVMS_ACTIVE_LEARNING SVMS_AL_ENABLE DEFAULT PREP_AUTO OFF DEFAULT SVMS_COMPLEXITY_FACTOR 0.244212 DEFAULT SVMS_KERNEL_FUNCTION SVMS_LINEAR INPUT CLAS_WEIGHTS_TABLE_NAME svmc_sh_sample_class_wt INPUT SVMS_CONV_TOLERANCE .001 DEFAULT ALGO_NAME ALGO_SUPPORT_VECTOR_MACHINES INPUT 7 rows selected. SQL> COLUMN setting_value FORMAT A25 SQL> SELECT setting_name, setting_value, setting_type FROM all_mining_model_settings WHERE model_name in 'SVMC_SH_CLAS_SAMPLE'; SETTING_NAME SETTING_VALUE SETTING_TYPE ------------------------------ ------------------------- ------------ ALGO_NAME ALGO_SUPPORT_VECTOR_MACHINES INPUT SVMS_ACTIVE_LEARNING SVMS_AL_ENABLE DEFAULT CLAS_PRIORS_TABLE_NAME svmc_sh_sample_priors INPUT PREP_AUTO OFF DEFAULT SVMS_COMPLEXITY_FACTOR 0.244212 DEFAULT SVMS_KERNEL_FUNCTION SVMS_LINEAR INPUT SVMS_CONV_TOLERANCE .001 DEFAULT
Note:
Some model settings are determined by the algorithm if not specified in a settings table. You can find the system-generated setting values by querying theALL_MINING_MODEL_SETTINGS
view.See Also:
Oracle Database PL/SQL Packages and Types Reference for details about model settingsThe CREATE_MODEL
procedure in the DBMS_DATA_MINING
package creates a mining model with the specified name, mining function, and case table (build data).
DBMS_DATA_MINING.CREATE_MODEL ( model_name IN VARCHAR2, mining_function IN VARCHAR2, data_table_name IN VARCHAR2, case_id_column_name IN VARCHAR2, target_column_name IN VARCHAR2 DEFAULT NULL, settings_table_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL, settings_schema_name IN VARCHAR2 DEFAULT NULL, transform_list IN DM_TRANSFORMS DEFAULT NULL;)
The mining function is a required argument to the CREATE_MODEL
procedure. A data mining function specifies a class of problems that can be modeled and solved.
Data mining functions implement either supervised or unsupervised learning. Supervised learning uses a set of independent attributes to predict the value of a dependent attribute or target. Unsupervised learning does not distinguish between dependent and independent attributes.
Supervised functions are predictive. Unsupervised functions are descriptive.
You can specify any of the values in Table 5-8 for the mining_function parameter to CREATE_MODEL
.
Table 5-8 Mining Model Functions
See Also:
Oracle Data Mining Concepts for an introduction to mining functionsYou can optionally specify a list of transformations to be applied to the build data before it is acted on by the algorithm. You can use the STACK
interface in DBMS_DATA_MINING_TRANSFORM
to build a list of transformation expressions for different attributes, you can specify a single transformation using the XFORM
interface in DBMS_DATA_MINING_TRANSFORM
, or you can write your own SQL expressions.
The transformation list argument to CREATE_MODEL
interacts with the PREP_AUTO
setting, which controls Automatic Data Preparation (ADP):
When ADP is on and you specify a transformation list, your transformations are applied with the automatic transformations and embedded in the model.
When ADP is off and you specify a transformation list, your transformations are applied and embedded in the model, but no system-generated transformations are performed.
When ADP is on and you do not specify a transformation list, the system-generated transformations are applied and embedded in the model.
When ADP is off and you do not specify a transformation list, no transformations are embedded in the model; you must separately prepare the data sets you use for building, testing, and scoring the model. This is the pre-release 11 behavior; it is the default behavior in 11g.
See Also:
Oracle Data Mining Concepts for information about Automatic Data PreparationModel details describe model attributes, rules, and other information about the model. You can invoke a GET_MODEL_DETAILS
function to retrieve model details. A separate GET_MODEL_DETAILS
function exists for each algorithm.
Model details reverse the transformations applied to the attributes, thus enabling the information to be easily understood by a user. You can obtain the transformations embedded in the model by invoking the GET_MODEL_TRANSFORMATIONS
function.
Model details, summarized in Table 5-9, support model transparency.
Mining models are database schema objects. Several system and object privileges, described in "Users and Privileges" in Oracle Data Mining Administrator's Guide, govern data mining activities. Mining models also support SQL AUDIT and SQL COMMENT, as described in "Mining Model Schema Objects" in Oracle Data Mining Administrator's Guide.
Information about mining model objects can be obtained from the data dictionary view ALL/USER/DBA_MINING_MODELS
. When used with the ALL
prefix, this view returns information about the mining models accessible to the current user. When used with the USER
prefix, it returns information about the mining models in the user's schema. The DBA
prefix is only available for DBAs.
The columns of ALL_MINING_MODELS
are described as follows and explained in Table 5-10.
SQL> describe all_mining_models Name Null? Type ----------------------------------------- -------- ---------------------------- OWNER NOT NULL VARCHAR2(30) MODEL_NAME NOT NULL VARCHAR2(30) MINING_FUNCTION VARCHAR2(30) ALGORITHM VARCHAR2(30) CREATION_DATE NOT NULL DATE BUILD_DURATION NUMBER MODEL_SIZE NUMBER COMMENTS VARCHAR2(4000)
Column | Description |
---|---|
|
Owner of the mining model. |
|
Name of the mining model. |
|
The mining model function. See "Mining Functions". |
|
The algorithm used by the mining model. See "Specifying the Algorithm". |
|
The date on which the mining model was created. |
|
The duration of the mining model build process in seconds. |
|
The size of the mining model in megabytes. |
|
Results of a SQL |
The query in Example 5-3 returns information about the mining models in the schema DMUSER
.
SQL> select model_name, mining_function, algorithm, creation_date, build_duration FROM all_mining_models WHERE owner IN 'DMUSER'; MODEL_NAME MINING_FUNCTION ALGORITHM CREATION_DATE BUILD_DURA --------------------- --------------------- --------------------------- ----------------------- AI_SH_SAMPLE ATTRIBUTE_IMPORTANCE MINIMUM_DESCRIPTION_LENGTH 13-JUN-07 1 AR_SH_SAMPLE ASSOCIATION_RULES APRIORI_ASSOCIATION_RULES 13-JUN-07 5 DT_SH_CLAS_SAMPLE CLASSIFICATION DECISION_TREE 13-JUN-07 4 KM_SH_CLUS_SAMPLE CLUSTERING KMEANS 13-JUN-07 7 NB_SH_CLAS_SAMPLE CLASSIFICATION NAIVE_BAYES 13-JUN-07 3 OC_SH_CLUS_SAMPLE CLUSTERING O_CLUSTER 13-JUN-07 14 NMF_SH_SAMPLE FEATURE_EXTRACTION NONNEGATIVE_MATRIX_FACTOR 13-JUN-07 2 SVMC_SH_CLAS_SAMPLE CLASSIFICATION SUPPORT_VECTOR_MACHINES 13-JUN-07 4 GLMR_SH_REGR_SAMPLE REGRESSION GENERALIZED_LINEAR_MODEL 13-JUN-07 3 GLMC_SH_CLAS_SAMPLE CLASSIFICATION GENERALIZED_LINEAR_MODEL 13-JUN-07 3 SVMR_SH_REGR_SAMPLE REGRESSION SUPPORT_VECTOR_MACHINES 13-JUN-07 7 SVMO_SH_CLAS_SAMPLE CLASSIFICATION SUPPORT_VECTOR_MACHINES 13-JUN-07 3 T_SVM_CLAS_SAMPLE CLASSIFICATION SUPPORT_VECTOR_MACHINES 13-JUN-07 8 T_NMF_SAMPLE FEATURE_EXTRACTION NONNEGATIVE_MATRIX_FACTOR 13-JUN-07 7
You need the CREATE MINING MODEL
privilege to create models in your own schema. You can perform any operation on models that you own. This includes applying the model, adding a cost matrix, renaming the model, and dropping the model.
You can perform specific operations on mining models in other schemas if you have the appropriate system privileges. For example, CREATE ANY MINING MODEL
enables you to create models in other schemas. SELECT ANY MINING MODEL
enables you to apply models that reside in other schemas. You can add comments to models if you have the COMMENT ANY MINING MODEL
privilege.
See Also:
Oracle Data Mining Administrator's Guide for detailsThe models listed in Example 5-3 are created by the Oracle Data Mining sample programs provided with Oracle Database. The sample programs, in PL/SQL and in Java, create mining models that illustrate each of the algorithms supported by Oracle Data Mining.
The sample programs are installed using Oracle Database Companion. Once installed, you can locate them in the rdbms/demo
subdirectory under Oracle Home. You can list the sample PL/SQL data mining programs on a Linux system with commands like these.
> cd $ORACLE_HOME/rdbms/demo > ls dm*.sql
Likewise, you can list the sample Java data mining programs with commands like the following:
> cd $ORACLE_HOME/rdbms/demo > ls dm*.java
See Also:
Oracle Data Mining Administrator's Guide to learn how to install, configure, and execute the Data Mining sample programs.