Oracle® Data Mining Concepts 11g Release 2 (11.2) Part Number E16808-06
This section describes new features in Oracle Data Mining.
The Oracle Data Mining Java API is deprecated in this release.
Note:
Oracle recommends that you not use deprecated features in new applications. Support for deprecated features is for backward compatibility only.
Oracle Data Mining supports a new release of Oracle Data Miner. The earlier release, Oracle Data Miner Classic, is still available for download on OTN, but it is no longer under active development.
To download Oracle Data Miner 11g Release 2 (11.2.0.2), go to:
http://www.oracle.com/technetwork/database/options/odm/dataminerworkflow-168677.html
To download Oracle Data Miner Classic, go to:
http://www.oracle.com/technetwork/database/options/odm/downloads/odminer-097463.html
In Oracle Data Mining 11g Release 2 (11.2.0.2), you can import externally-created GLM models when they are presented as valid PMML documents. PMML is an XML-based standard for representing data mining models.
The IMPORT_MODEL procedure in the DBMS_DATA_MINING package is overloaded with syntax that supports PMML import. When invoked with this syntax, the IMPORT_MODEL procedure will accept a PMML document and translate the information into an Oracle Data Mining model. This includes creating and populating model tables as well as SYS model metadata.
External models imported in this way will be automatically enabled for Exadata scoring offload.
See Also:
Oracle Database PL/SQL Packages and Types Reference for details about DBMS_DATA_MINING.IMPORT_MODEL
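The PMML import described above might be invoked along the following lines. This is a minimal sketch, not a definitive recipe: the directory object PMMLDIR, the file glm_model.xml, and the model name IMPORTED_GLM are hypothetical, and the PMML file is assumed to describe a GLM model and to be readable by the current user.

```sql
-- Sketch only: PMMLDIR, glm_model.xml, and IMPORTED_GLM are hypothetical names.
-- PMMLDIR is assumed to be a directory object the current user can read.
BEGIN
  DBMS_DATA_MINING.IMPORT_MODEL(
    'IMPORTED_GLM',                                -- name for the new mining model
    XMLTYPE(BFILENAME('PMMLDIR', 'glm_model.xml'), -- PMML document read from a file
            NLS_CHARSET_ID('AL32UTF8')));          -- character set of the file
END;
/
```

Once imported, the model can be scored with the SQL Data Mining functions in the same way as a natively built model.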
In Oracle 11g, Data Mining models are implemented as data dictionary objects in the SYS schema. A set of new data dictionary views presents mining models and their properties. New system and object privileges control access to mining model objects.
In previous releases, Data Mining models were implemented as a collection of tables and metadata within the DMSYS schema. In Oracle 11g, the DMSYS schema no longer exists.
See Also:
Oracle Data Mining Administrator's Guide for information on privileges for accessing mining models
Oracle Data Mining Application Developer's Guide for information on Oracle Data Mining data dictionary views
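As a rough illustration of the dictionary views and privileges mentioned above, the following sketch lists the current user's models and grants access to one of them. The model name MY_MODEL and the users DMUSER and ANALYST are hypothetical, and the GRANT statements assume they are run by a suitably privileged user.

```sql
-- List the mining models owned by the current user.
SELECT model_name, mining_function, algorithm, creation_date
  FROM user_mining_models
 ORDER BY model_name;

-- Show the settings of one model (MY_MODEL is a hypothetical name).
SELECT setting_name, setting_value
  FROM user_mining_model_settings
 WHERE model_name = 'MY_MODEL';

-- System privilege: allow DMUSER (hypothetical) to create models in its own schema.
GRANT CREATE MINING MODEL TO dmuser;

-- Object privilege: allow ANALYST (hypothetical) to score with MY_MODEL.
GRANT SELECT ON MINING MODEL my_model TO analyst;
```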
Automatic Data Preparation (ADP)
In most cases, data must be transformed using techniques such as binning, normalization, or missing value treatment before it can be mined. Data for build, test, and apply must undergo the exact same transformations.
In previous releases, data transformation was the responsibility of the user. In Oracle Database 11g, the data preparation process can be automated. Algorithm-appropriate transformation instructions are embedded in the model and automatically applied to the build data and scoring data. The automatic transformations can be complemented by or replaced with user-specified transformations.
Because they contain the instructions for their own data preparation, mining models are known as supermodels.
See Also:
Chapter 19 for information on automatic and custom data transformation for Data Mining
Oracle Database PL/SQL Packages and Types Reference for information on DBMS_DATA_MINING_TRANSFORM
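Automatic Data Preparation is requested through a model setting at build time. The sketch below shows one way this might look; the table ADP_SETTINGS, the model ADP_DEMO_MODEL, the build table BUILD_DATA, and its columns CUST_ID and AFFINITY_CARD are all hypothetical.

```sql
-- Sketch only: every table, column, and model name here is hypothetical.
CREATE TABLE adp_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

BEGIN
  -- Turn on Automatic Data Preparation for the model to be built.
  INSERT INTO adp_settings (setting_name, setting_value)
  VALUES (DBMS_DATA_MINING.PREP_AUTO, DBMS_DATA_MINING.PREP_AUTO_ON);
  COMMIT;

  -- Build a classification model; the transformations chosen by ADP
  -- are embedded in the model and reapplied when the model is scored.
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'ADP_DEMO_MODEL',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'BUILD_DATA',
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'AFFINITY_CARD',
    settings_table_name => 'ADP_SETTINGS');
END;
/
```

Because no algorithm is named in the settings table, the default algorithm for the mining function is used.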
Scoping of Nested Data and Enhanced Handling of Sparse Data
Oracle Data Mining supports nested data types for both categorical and numerical data. Multi-record case data must be transformed to nested columns for mining.
In Oracle Data Mining 10gR2, nested columns were processed as top-level attributes; the user was burdened with the task of ensuring that two nested columns did not contain an attribute with the same name. In Oracle Data Mining 11g, nested attributes are scoped with the column name, which relieves the user of this burden.
Handling of sparse data and missing values has been standardized across algorithms in Oracle Data Mining 11g. Data is sparse when a high percentage of the cells are empty but all the values are assumed to be known. This is the case in market basket data. When some cells are empty, and their values are not known, they are assumed to be missing at random. Oracle Data Mining treats missing data in a nested column as a sparse representation and missing data in a non-nested column as missing at random.
In Oracle Data Mining 11g, Decision Tree and O-Cluster algorithms do not support nested data.
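Multi-record case data is typically converted to a nested column with a view. The following is a sketch under assumed names: the transactional table SALES_TRANS and its columns CUST_ID, PROD_NAME, and QUANTITY are hypothetical.

```sql
-- Sketch only: SALES_TRANS and its columns are hypothetical.
-- Each customer (case) gets one row; the products purchased become
-- attributes inside the nested column PRODUCTS.
CREATE VIEW sales_nested_v AS
SELECT cust_id,
       CAST(COLLECT(DM_NESTED_NUMERICAL(prod_name, quantity))
            AS DM_NESTED_NUMERICALS) AS products
  FROM sales_trans
 GROUP BY cust_id;
```

Products a customer never bought simply have no entry in PRODUCTS, which is the sparse representation described above. Because nested attributes are scoped with the column name, a second nested column in the same view could reuse the same product names without conflict.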
Generalized Linear Models
A new algorithm, Generalized Linear Models, is introduced in Oracle 11g. It supports two mining functions: classification (logistic regression) and regression (linear regression).
See Also:
Chapter 12, "Generalized Linear Models"
New SQL Data Mining Function
A new SQL Data Mining function, PREDICTION_BOUNDS, has been introduced for use with Generalized Linear Models. PREDICTION_BOUNDS returns the confidence bounds on predicted values (regression models) or predicted probabilities (classification).
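A query using PREDICTION_BOUNDS might look like the sketch below. The GLM regression model GLM_REG_MODEL, the table APPLY_DATA, and the column CUST_ID are hypothetical; the confidence level of 0.95 is passed explicitly for clarity.

```sql
-- Sketch only: GLM_REG_MODEL, APPLY_DATA, and CUST_ID are hypothetical names.
-- PREDICTION_BOUNDS returns an object with LOWER and UPPER attributes.
SELECT cust_id,
       PREDICTION(glm_reg_model USING *)                    AS predicted_value,
       PREDICTION_BOUNDS(glm_reg_model, 0.95 USING *).LOWER AS lower_bound,
       PREDICTION_BOUNDS(glm_reg_model, 0.95 USING *).UPPER AS upper_bound
  FROM apply_data;
```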
Enhanced Support for Cost-Sensitive Decision Making
Cost matrix support is significantly enhanced in Oracle 11g. A cost matrix can be added to or removed from any classification model using the new procedures, DBMS_DATA_MINING.ADD_COST_MATRIX and DBMS_DATA_MINING.REMOVE_COST_MATRIX.
The SQL Data Mining functions support new syntax for specifying an in-line cost matrix. With this new feature, cost-sensitive model results can be returned within a SQL statement even if the model does not have an associated cost matrix for scoring.
Only Decision Tree models can be built with a cost matrix.
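The two approaches, a stored cost matrix and an in-line cost matrix, can be sketched as follows. The names DEMO_COSTS, DEMO_CLAS_MODEL, APPLY_DATA, and CUST_ID are hypothetical, the target is assumed to be a numeric 0/1 column, and the cost values are arbitrary. In the in-line form, each inner list is assumed to be the row for one actual class, with one cost per predicted class.

```sql
-- Sketch only: all object names and cost values below are hypothetical.
-- A cost matrix table has one row per (actual, predicted) pair.
CREATE TABLE demo_costs (
  actual_target_value    NUMBER,
  predicted_target_value NUMBER,
  cost                   NUMBER);

INSERT INTO demo_costs VALUES (0, 0, 0);
INSERT INTO demo_costs VALUES (0, 1, 1);  -- false positive: cost 1
INSERT INTO demo_costs VALUES (1, 0, 5);  -- false negative: cost 5
INSERT INTO demo_costs VALUES (1, 1, 0);
COMMIT;

-- Attach the cost matrix to an existing classification model.
BEGIN
  DBMS_DATA_MINING.ADD_COST_MATRIX('DEMO_CLAS_MODEL', 'DEMO_COSTS');
END;
/

-- Score using the cost matrix stored with the model.
SELECT cust_id,
       PREDICTION(demo_clas_model COST MODEL USING *) AS pred
  FROM apply_data;

-- Score using an in-line cost matrix; no stored matrix is required.
SELECT cust_id,
       PREDICTION(demo_clas_model
                  COST (0, 1) VALUES ((0, 1), (5, 0))
                  USING *) AS pred
  FROM apply_data;

-- Detach the stored cost matrix when it is no longer wanted.
BEGIN
  DBMS_DATA_MINING.REMOVE_COST_MATRIX('DEMO_CLAS_MODEL');
END;
/
```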
Features Not Available in 11g Release 1 (11.1)
DMSYS schema
Oracle Data Mining Scoring Engine
In Oracle 10.2, you could use Database Configuration Assistant (DBCA) to configure the Data Mining option. In Oracle 11g, you do not need to use DBCA to configure the Data Mining option.
Basic Local Alignment Search Tool (BLAST)
Features Deprecated in 11g Release 1 (11.1)
Adaptive Bayes Network classification algorithm (replaced with Decision Tree)
DM_USER_MODELS view and functions that provide information about models, model signature, and model settings (for example, GET_MODEL_SETTINGS, GET_DEFAULT_SETTINGS, and GET_MODEL_SIGNATURE) are replaced by data dictionary views. See Oracle Data Mining Application Developer's Guide.