PK
-Aoa, mimetypeapplication/epub+zipPK -A iTunesMetadata.plistV
This section describes new features in the Oracle Data Mining APIs. It includes the following sections:
Oracle Data Mining 11g Release 2 (11.2.0.3) API New Features
Oracle Data Mining 11g Release 2 (11.2.0.1) API New Features
See Also: Oracle Data Mining Concepts for additional information about new features and for information about features that were new or deprecated in Oracle Data Mining 11g Release 1 |
The Oracle Data Mining Java API is deprecated in this release.
Note: Oracle recommends that you not use deprecated features in new applications. Support for deprecated features is for backward compatibility only |
This section lists the changes that have been introduced in the Oracle Data Mining 11.2 PL/SQL API:
Support for Native Transactional Data with Association Rules
In Oracle Data Mining 11g Release 2 (11.2), you can build association rules models without first transforming the transactional data.
SVM Class Weights Specified with CLAS_WEIGHTS_TABLE_NAME
Previously SVM class weights were specified in the priors table (CLAS_PRIORS_TABLE_NAME
setting). Now SVM class weights and GLM class weights are both specified in a class weights table (CLAS_WEIGHTS_TABLE_NAME
setting)
See: DBMS_DATA_MINING setting CLAS_WEIGHTS_TABLE_NAME in Oracle Database PL/SQL Packages and Types Reference |
FORCE argument to DROP_MODEL
You can now force a drop model operation even if a serious system error has interrupted the model build process.
GET_MODEL_DETAILS_SVM has new REVERSE_COEF parameter
To preserve model transparency, the GET_MODEL_DETAILS
functions automatically reverse the transformations generated by ADP during the model build. You can obtain the transformed attribute coefficients used internally by an SVM model by setting the new reverse_coef
parameter to 1. This causes the coefficients and bias to be returned with the normalization shifts and scales applied by ADP.
Application Developer's Guide
11g Release 2 (11.2)
E12218-07
July 2011
Oracle Data Mining Application Developer's Guide, 11g Release 2 (11.2)
E12218-07
Copyright © 2005, 2011, Oracle and/or its affiliates. All rights reserved.
Primary Author: Kathy L. Taylor
This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.
If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable:
U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, duplication, disclosure, modification, and adaptation shall be subject to the restrictions and license terms set forth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License (December 2007). Oracle America, Inc., 500 Oracle Parkway, Redwood City, CA 94065.
This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.
This software or hardware and documentation may provide access to or information on content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services.
Oracle Data Mining supports the mining of data sets that have one or more text columns. These columns must undergo a special preprocessing step whereby text tokens known as terms are extracted and stored in a nested column. The transformed text can then be used as any other attribute in the building, testing, and scoring of models. Any algorithm that supports nested data can be used for text mining. (See Table 3-2 in Chapter 3.)
This chapter explains how to use PL/SQL to prepare a column of text for mining.
Note: Oracle Data Mining includes sample programs that illustrate text transformation and text mining using the PL/SQL API. Refer to Oracle Data Mining Administrator's Guide for information about the Oracle Data Mining sample programs. |
This chapter contains the following sections.
Oracle Data Mining uses specialized Oracle Text routines to preprocess text data. Oracle Text is a technology within the database for building text querying and classification applications.
Oracle Text provides the following facilities for the Oracle Data Mining term extraction process:
SVM_CLASSIFIER
, defined in the CTX_DLL
Oracle Text PL/SQL package, specifies an index preference for Oracle Data Mining term extraction.
The CTXSYS.DRVODM
Oracle Text PL/SQL package defines the table functions, FEATURE_PREP
and FEATURE_EXPLAIN
, which generate intermediate and final tables of text terms for Oracle Data Mining.
In PL/SQL, the term extraction process requires the use of these Oracle Text facilities. (See "Using Text Transformation".)
See Also: Oracle Text Application Developer's Guide and Oracle Text Reference for information on Oracle Text |
Note: The Oracle Text facilities for Oracle Data Mining are documented in this chapter. They are not documented in the Oracle Text manuals. |
Note on Terminology: Text terms are also known as features. In text mining, a feature is a word or group of words extracted from a text document. Features (terms) are the fundamental unit of text that can be manipulated and analyzed.The feature extraction mining function (NMF algorithm in Oracle Data Mining) and text mining transformation both perform a kind of feature extraction.
|
A good place to start in learning the term extraction process is with the sample programs. The following sample programs contain term extraction code for text mining:
dmsh.sql
— Prepares the build, test, and scoring data for the sample programs, including the text mining programs. dmsh.sql
creates views for data mining and tables and indexes for text mining.
dmtxtfe.sql
— Using the indexed text column created by dmsh.sql
, creates a nested column suitable for text mining.
The dmtxtfe.sql
program is a sample term extractor. It contains extensive comments that explain the code in a step-by-step fashion. You can expand this program into a complete term extraction solution by adding index creation and the preparation of test and scoring data (as in dmsh.sql
).
Two sample PL/SQL programs use the data prepared by dmsh.sql
to mine text.
dmtxtnmf.sql
creates and applies a model that uses Non-Negative Matrix Factorization.
dmtxtsvm.sql
creates and applies a model that uses SVM classification.
Both these programs mine a table of customer data that includes a nested column of text data called COMMENTS
. The COMMENTS
column has been pre-processed by dmsh.sql
. The models created by these programs are referenced in the following example from a Linux system.
-- Run the programs to create the models SQL> @ $ORACLE_HOME/rdbms/demo/dmtxtnmf.sql SQL> @ $ORACLE_HOME/rdbms/demo/dmtxtsvm.sql -- List the models created by the programs SQL> SELECT model_name, mining_function, algorithm FROM user_mining_models; MODEL_NAME MINING_FUNCTION ALGORITHM ------------------------ --------------------------- ------------------------ T_SVM_CLAS_SAMPLE CLASSIFICATION SUPPORT_VECTOR_MACHINES T_NMF_SAMPLE FEATURE_EXTRACTION NONNEGATIVE_MATRIX_FACTOR
See Also: Oracle Data Mining Administrator's Guide. This manual provides instructions for obtaining and running the sample programs. It includes information about the build, training, and scoring data used by these programs. |
The pre-processing steps for text mining create nested table columns of type DM_NESTED_NUMERICALS
from columns of type VARCHAR2
or CLOB
. Each row of the nested table specifies an attribute name and a value. The DM_NESTED_NUMERICALS
type defines the following columns.
attribute_name VARCHAR2(4000) value NUMBER)
The term extraction process treats the text in each row of the original table as a separate document. Each document is transformed to a set of terms that have a numeric value and a text label. Within the nested table column, the attribute_name
column holds the text and the value
column holds the numeric value of the term, which is derived using the term frequency in the document and in the document collection (other rows).
For example, the following query returns various attributes of customer 102998, including a text column of comments. The text column has not been transformed.
SQL> select cust_id, cust_gender, cust_income_level, affinity_card, comments from mining_build_text where cust_id = 102998; CUST_ID C CUST_INCOME_LEVEL AFFINITY_CARD COMMENTS ------- -- -------------------- ------------- -------------------------------- 102998 M J: 190,000 - 249,999 1 I wanted to write you to let you know that I've purchased several items at your store recently and have been very satisfied with my purchases. Keep up the good work.
The following query returns the same attributes of customer 102998, but the text in the comments column has been transformed. The query extracts the ATTRIBUTE_NAME
and VALUE
columns from the nested table that holds the transformed text.
SQL> select b.cust_id, b.cust_gender, b.cust_income_level, b.affinity_card, n.* from mining_build_nested_text b, table(b.comments) n where b.cust_id = 102998 order by n.attribute_name; CUST_ID C CUST_INCOME_LEVEL AFFINITY_CARD ATTRIBUTE_NAME VALUE ------- -- ------------------- ------------- -------------- -------- 102998 M J: 190,000 - 249,999 1 GOOD .26894 102998 M J: 190,000 - 249,999 1 ITEMS 158062 102998 M J: 190,000 - 249,999 1 KEEP 238765 102998 M J: 190,000 - 249,999 1 KNOW .2006 102998 M J: 190,000 - 249,999 1 LET 299856 102998 M J: 190,000 - 249,999 1 PURCHASED 142743 102998 M J: 190,000 - 249,999 1 PURCHASES 173146 102998 M J: 190,000 - 249,999 1 RECENTLY .195223 102998 M J: 190,000 - 249,999 1 SATISFIED .355851 102998 M J: 190,000 - 249,999 1 SEVERAL .355851 102998 M J: 190,000 - 249,999 1 STORE .0712537 102998 M J: 190,000 - 249,999 1 UP .159838 102998 M J: 190,000 - 249,999 1 WANTED .355851 102998 M J: 190,000 - 249,999 1 WORK .299856 102998 M J: 190,000 - 249,999 1 WRITE .355851
The ATTRIBUTE_NAME
column holds an item of text from the original comments column. The VALUE
column holds the term value. Note that not all words from the original comments column are extracted as terms. For example, the articles the
and to
are not included.
The steps in the term extraction process are summarized in this section. Further details and specific syntax requirements are explained later in this chapter.
First transform the text in the build data. During this process you will generate the text term definitions, which you will reuse for the test and apply data. Perform the following steps:
Create an index on the text column in the build table.
Define a table to hold the categories specified by the SVM_CLASSIFIER
index.
Use the FEATURE_PREP
table function to create the term definitions and populate an intermediate terms table.
Use the FEATURE_EXPLAIN
table function to populate the final terms table.
Replicate the columns of the original build table (using a view or another table), replacing the text column with a nested table column. Load the terms from the final terms table into the nested table column.
The test and apply data must undergo the same pre-processing as the build data. To transform the test and apply data, you will reuse the term definitions generated for the build data. Perform the following steps:
Create an index on the text column in the test or apply table.
Use the FEATURE_PREP
table function to populate an intermediate terms table. Use the term definitions previously generated for the build data.
Use the FEATURE_EXPLAIN
table function to populate the final terms table.
Replicate the columns of the original test or apply table, replacing the text column with a nested table column. Load the terms from the final terms table into the nested table column.
Oracle Text processing requires a text index. Oracle Text supports several types of indexes for querying, cataloging, and classifying text documents. The Oracle Data Mining term extraction process requires a CONTEXT
index for text querying.
You must create an index for each text column to be transformed. Use the following syntax to create the index.
SQL>CREATE INDEXindex_name
ONtable_name
(column_name
) INDEXTYPE IS ctxsys.context PARAMETERS ('nopopulate');
Note: This statement creates a basicCONTEXT index. You can further define the characteristics of the index by specifying additional arguments to the CREATE INDEX statement. Refer to Oracle Text Reference for details. |
Oracle Text supports index preferences for overriding the default characteristics of an index. The CREATE_PREFERENCE
procedure in the Oracle Text package CTX_DDL
creates a preference with the name and type that you specify. The SVM_CLASSIFIER
preference type defines the characteristics of an index for Oracle Data Mining.
You must create an index preference when you prepare the build data. It will be reused when you prepare the test and apply data. Use the following syntax to create the index preference.
SQL>EXECUTE ctx_ddl.create_preference('preference_name
', 'SVM_CLASSIFIER');
The SVM_CLASSIFIER
index preference uses a predefined table with two numeric columns: ID
and CAT
. ID
holds the case ID; CAT
holds the category. The category table is used for internal processing. You must create the category table using the following syntax.
SQL>CREATE TABLE category_table_name
(id NUMBER, cat NUMBER);
The FEATURE_PREP
table function in the CTXSYS.DRVODM
Oracle Text package extracts terms from a text column using an index preference of type SVM_CLASSIFIER
. FEATURE_PREP
creates a table of term definitions from the build data and reuses these definitions for the test and apply data.
FEATURE_PREP
returns an intermediate terms table.
FEATURE_PREP
is an over-loaded function that accepts two different sets of arguments. You will specify one set of arguments for the build data and another set for the test and apply data.
--- syntax for build data --- CTXSYS.DRVODM.FEATURE_PREP ( text_index IN VARCHAR2, case_id IN VARCHAR2, category_tbl IN VARCHAR2, category_tbl_id_col IN VARCHAR2, category_tbl_cat_col IN VARCHAR2, feature_definition_tbl IN VARCHAR2, index_preference IN VARCHAR2) RETURN DRVODM; --- syntax for test/apply data --- CTXSYS.DRVODM.FEATURE_PREP ( text_index IN VARCHAR2, case_id IN VARCHAR2, feature_definition_tbl IN VARCHAR2, RETURN DRVODM;
FEATURE_PREP
returns the following columns. The SEQUENCE_ID
column holds the case ID; the ATTRIBUTE_ID
column holds the term ID.
Name NULL? Type ---------------------- ------- ------ SEQUENCE_ID NUMBER ATTRIBUTE_ID NUMBER VALUE NUMBER
FEATURE_PREP
accepts the arguments described in Table 4-1.
Table 4-1 FEATURE_PREP Table Function Arguments
Argument Name | Data Type | |
---|---|---|
|
|
Name of the index on the text column in the build, test, or apply table. |
|
|
Name of the case ID column in the build, test, or apply table. |
|
|
Name of the table used by the Specify this argument only for build data. |
|
|
Specify 'id'. This is the name of the Specify this argument only for build data. |
|
|
Specify 'cat'. This is the name of the Specify this argument only for build data. |
|
|
Name of the term definition table created by Name Null? Type --------------------------------- CAT_ID NUMBER TYPE NUMBER RULE BLOB |
|
|
Name of the Specify this argument only for build data. |
The following example creates an intermediate terms table called txt_term_out
. The FEATURE_PREP
table function extracts terms from a text column with an index called build_text_idx
. The text column is in a build table with a case ID column called cust_id
. The index preference txt_pref
is applied to the index using the id
and cat
columns in the table cat_tbl
. FEATURE_PREP
creates a table of term definitions called txt_pref_terms
.
CREATE TABLE txt_term_out AS SELECT * FROM TABLE(ctxsys.drvodm.feature_prep ( 'build_text_idx', 'cust_id', 'cat_tbl', 'id', 'cat', 'txt_pref_terms', 'txt_pref'));
The FEATURE_EXPLAIN
table function in the CTXSYS.DRVODM
Oracle Text package extracts the term values from the definitions created by FEATURE_PREP
and appends the associated word to each value.
FEATURE_EXPLAIN
returns the final terms table.
The calling syntax of FEATURE_EXPLAIN
is described as follows.
CTXSYS.DRVODM.FEATURE_EXPLAIN ( feature_definition_tbl IN VARCHAR2, RETURN DRVODM;
FEATURE_EXPLAIN
returns the following columns.
Name Type --------------- --------------- text VARCHAR2(160) type NUMBER(3) ID NUMBER score NUMBER
FEATURE_EXPLAIN
accepts a single argument: the terms definition table created by FEATURE_PREP
.
The following example creates a final terms table called txt_final_terms
using the intermediate terms table txt_term_out
. The FEATURE_EXPLAIN
table function returns the terms specified in the terms definition table txt_pref_terms
.
SQL> create table txt_final_terms as select A.sequence_id, B.text, A.value FROM txt_term_out A, TABLE(ctxsys.drvodm.feature_explain( 'txt_pref_terms')) B WHERE A.attribute_id = B.id;
Use the final terms table to populate a nested table column of type DM_NESTED_NUMERICALS
.
The following example creates the table mining_build_nested_text
. (Alternatively, you could create a view.) The table has a case ID column of customer IDs and three customer attribute columns: age, education, and occupation. It also includes a comments column of type DM_NESTED_NUMERICALS
created from the terms table txt_final_terms
.
SQL> CREATE TABLE mining_build_nested_text NESTED TABLE comments store AS build_comments AS SELECT non_text.cust_id, non_text.age, non_text.education, non_text.occupation, txt.comments FROM mining_build_text non_text, ( SELECT features.sequence_id, cast(COLLECT(dm_nested_numerical(features.text,features.value)) as dm_nested_numericals) comments FROM txt_final_terms features group by features.sequence_id) txt WHERE non_text.cust_id = txt.sequence_id(+);
In the following example, a text column in MINING_BUILD_TEXT
is transformed to a nested table column in MINING_BUILD_NESTED_TEXT
. The same text column in MINING_APPLY_TEXT
is transformed to a nested table column in MINING_APPLY_NESTED_TEXT
.
Both MINING_BUILD_TEXT
and MINING_APPLY_TEXT
have the following columns.
Name Null? Type --------------------------------- -------- --------------------------- CUST_ID NOT NULL NUMBER AGE NUMBER EDUCATION VARCHAR2(21) OCCUPATION VARCHAR2(21) COMMENTS VARCHAR2(4000)
The following statements create the indexes.
SQL> create index build_text_idx on mining_build_text (comments) indextype is ctxsys.context paramyeters ('nopopulate'); SQL> create index apply_text_idx ON mining_apply_text (comments) indextype is ctxsys.context parameters ('nopopulate');
The following statements create the index preference and its table.
SQL> execute ctx_ddl.create_preference('idx_pref', 'SVM_CLASSIFIER'); SQL> create table idx_pref_cat (id number, cat number);
The following statement returns the intermediate terms in the table BUILD_TERMS_OUT
. It also creates the table FEATURE_DEFS
and populates it with the term definitions.
SQL> create table build_terms_out as select * from table (ctxsys.drvodm.feature_prep ('build_text_idx', 'cust_id', 'idx_pref_cat', 'id', 'cat', 'feature_defs', 'idx_pref'));
The following statement returns the final terms in the table BUILD_EXPLAIN_OUT
.
SQL> create table build_explain_out as select a.sequence_id, b.text, a.value from build_terms_out a, table (ctxsys.drvodm.feature_explain('feature_defs')) b where a.attribute_id = b.id;
The following statement creates the table MINING_BUILD_NESTED_TEXT
. This table contains the non-text attributes from the original build table and a nested table of comments. This table can be used to build a model.
SQL> create table mining_build_nested_text nested table comments store as build_comments as select non_text.cust_id, non_text.age, non_text.education, non_text.occupation, txt.comments from mining_build_text non_text, (select features.sequence_id, cast(collect(dm_nested_numerical(features.text,features.value)) as dm_nested_numericals) comments from build_explain_out features group by features.sequence_id) txt where non_text.cust_id = txt.sequence_id(+);
The following statement creates the intermediate terms table for the comments column in the apply table, MINING_APPLY_TEXT
. It uses the term definitions in the FEATURE_DEFS
table, which was created during the pre-processing of the comments column in MINING_BUILD_TEXT
.
SQL> create table apply_terms_out as select * from table (ctxsys.drvodm.feature_prep ('apply_text_idx', 'cust_id', 'feature_defs'));
The following statement creates the final terms table for apply.
SQL> create table apply_explain_out as select a.sequence_id, b.text, a.value from apply_terms_out a, table (ctxsys.drvodm.feature_explain('feature_defs')) b where a.attribute_id = b.id;
The following statement creates the table MINING_APPLY_NESTED_TEXT
. This table contains the non-text attributes from the original apply table and a nested table of comments. This table can be used to apply the model.
SQL> create table mining_apply_nested_text nested table comments store as apply_comments as select non_text.cust_id, non_text.age, non_text.education, non_text.occupation, txt.comments from mining_apply_text non_text, (select features.sequence_id, cast(collect(dm_nested_numerical(features.text,features.value)) as dm_nested_numericals) comments from apply_explain_out features group by features.sequence_id) txt where non_text.cust_id = txt.sequence_id(+);
This manual describes the programmatic interfaces to Oracle Data Mining. You can use the PL/SQL and Java interfaces to create data mining applications or add data mining features to existing applications. You can use the data mining SQL operators in applications or in ad hoc queries.
Note: The Oracle Data Mining Java API is deprecated in this release.Oracle recommends that you not use deprecated features in new applications. Support for deprecated features is for backward compatibility only |
This manual should be used along with the demo applications and the related reference documentation. (See "Related Documentation".)
The preface contains these topics:
This manual is intended for database programmers who are familiar with Oracle Data Mining.
For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc
.
Access to Oracle Support
Oracle customers have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info
or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs
if you are hearing impaired.
The documentation set for Oracle Data Mining is part of the Oracle Database 11g Release 2 (11.2) Online Documentation Library. The Oracle Data Mining documentation set consists of the following documents:
Oracle Data Mining Java API Reference (javadoc)
Oracle Data Mining Administrator's Guide
Note: Information to assist you in installing and using the Data Mining demo programs is provided in Oracle Data Mining Administrator's Guide. |
Oracle Database PL/SQL Packages and Types Reference (DBMS_DATA_MINING
, DBMS_DATA_MINING_TRANSFORM
, and DBMS_PREDICTIVE_ANALYTICS
)
Oracle Database SQL Language Reference (Data Mining functions)
The following text conventions are used in this document:
Convention | Meaning |
---|---|
boldface | Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary. |
italic | Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values. |
monospace | Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter. |