Oracle® R Enterprise User's Guide Release 1.3 for Linux and Windows Part Number E36761-04 |
|
|
PDF · Mobi · ePub |
This chapter describes these topics:
Both R scripts and SQL scripts allow access to the database server. For this reason, creation of scripts must be controlled. The RQADMIN Role is required for those users who create and drop scripts.
Oracle R Enterprise creates the RQADMIN role.
The RQADMIN role must be explicitly granted to a user.
The RQADMIN role is required in these instances:
Calling ore.doEval()
with FUN
argument
Creating and dropping scripts with ore.scriptCreate
and ore.scriptDrop
The RQADMIN role is not required when calling ore.rowApply
, ore.groupApply
, ore.tableApply
, ore.indexApply
, and ore.doEval
with the FUN.NAME
argument. I
To grant RQADMIN to RQUSER, start SQL*Plus as sysdba
and type
grant rqadmin to RQUSER
Note:
You should grant RQADMIN only to those users who need it.Parallel processing is not restricted to Oracle R Enterprise functions only; it can be enabled for Open Source R packages that are not part of Oracle R Enterprise. For such packages data-parallelism can be leveraged through Oracle R Enterprise embedded R execution.
On the R side, Oracle R Enterprise provides ore.groupApply()
, ore.rowApply(), and ore.indexApply() for data-parallel processing. Data-parallel processing consists of dividing a dataset into multiple subsets that can be processed in parallel (independently). Oracle R Enterprise also provides SQL-equivalent functionality for group apply and row apply as described in Oracle R Enterprise Embedded SQL Scripts.
Open Source packages (CRAN packages) can generally not leverage the Oracle R Enterprise transparency layer (because they are not written using base R exclusively or include callouts to functionality such as C functions) and execute on data in the R address space. This means that their use is subject to memory and parallelism constraints of R and the way the CRAN package was written. Oracle R Enterprise does not automatically parallelize the internal code of CRAN packages.
Embedded R execution enables leveraging what is likely a larger server (a Database server, such as Oracle Exadata) in terms of memory and number of processors to expand what a typical R client may be able to achieve. In addition, embedded R execution provides for more efficient transfer of data between the database and the R engine (since they are on the same machine). Embedded R execution also allows for data parallel execution of user R functions that may leverage CRAN packages, both from Oracle R Enterprise R and SQL APIs.
These Oracle R Enterprise functions permit R-based applications to embed Oracle R Enterprise functionality in the scripts. For example, they allow R scripts to perform operations on database objects.
An R script contains a single function definition. R scripts reside in the Oracle R Enterprise in-database R script archive.
Embedded R scripts provide several advantages:
You can execute R scripts in the database where the data resides; you do not have to move data out of the database. The scripts may contain custom techniques or include functions from CRAN packages.
You can run existing R scripts within R-based applications and operational SQL-based applications.
You can leverage distributed data flow parallelism in the Oracle database; the parallelism is user controlled, but Database managed.
You can also make use of the security provided by the Oracle Database. See Security Issues for Embedded R Scripts for information about how to register scripts so that they are secure.
Oracle R Enterprise provides these functions that support running R scripts in the database:
There are example scripts in ore.doEval() and ore.indexApply().
All of these scripts require an argument FUN
or FUN.NAME
. For security reasons, use of the argument FUN
requires the RQADMIN role, a collection of Oracle Database privileges. Since creation of the script represented by argument FUN.NAME
must be published by a user with RQADMIN credentials, it can be used by anyone authorized to use Oracle R Enterprise.
The functions ore.tableApply()
, ore.groupApply()
, ore.rowApply(),and
ore.indexApply()
and ore.doEval()
all take either a FUN.NAME
parameter (for a function that has been loaded into the R script repository) or FUN
, which is an R function (closure).
All functions can return anything. However, when you specify the FUN.VALUE argument the output should be a matching data.frame
.
All functions take parameters that are passed as optional arguments (... arguments
). They can named or not.
All functions take the FUN.NAME
parameter, which is the name of a function in the R script repository in the database, or an actual R function in the FUN
parameter.
ore.doEval()
invokes a stand-alone R script in the database without input data; parameters are allowed. It returns an ore.frame
object or serialized R objects.
Input for ore.doEval()
is internally generated data. You can load data from a file or a table using ore.pull()
.
Input data is one of the following:
Internally generated
Loaded from a file or pulled from the database by using ore.pull()
Made available through the Transparency Layer
ore.doEval()
takes the FUN.NAME
parameter, which is the name of a function in the R script repository in the database, or an actual R function in the FUN
parameter.
The following additional arguments to the FUN
parameter starting with ore.
are special control arguments. They are not passed to the function specified by the FUN
or FUN.NAME
arguments, but instead control what happens before or after the execution of the closure. The following control arguments are supported:
ore.drop
controls the input data. If TRUE, one column data.frame
will be converted to a vector. The default value is TRUE.
ore.connect
controls whether to automatically connect to Oracle R Enterprise inside the closure. This is equivalent to doing an ore.connect
call with the same credentials as the client session. The default value is FALSE.
ore.graphics
controls whether to start a graphical driver and look for images. The default value is TRUE.
ore.png.*
specifies additional parameters for the png
graphics driver if ore.graphics
is TRUE. The naming convention for these arguments is to add an ore.png. prefix
to the arguments of the png
function. For example, if ore.png.height
is supplied, argument height will be passed to the png
function. If not set, the standard default values for the png
function are used.
This example scales the first n integers by value provided. The result is a serialized R object (data.frame
):
Oracle R Enterprise comes with a number of predefined graphical scripts. All predefined scripts have a reserved name that start with RQG$
followed by a function name from the graphics package that the script wraps. Depending on the function it either takes the first, the first and second or all columns of the input data.frame. Thus, predefined scripts can only be used with ore.tableApply
, ore.groupApply
, or ore.rowApply
. Each function also has ...
so that it can pass any parameter to the function that it wraps.
res <- ore.doEval(function (num = 10, scale = 100) { ID <- seq(num) data.frame(ID = ID, RES = ID / scale) }) class(res) res local_res <- ore.pull(res) class(local_res) local_res
For more examples, see the R help for ore.doEval()
.
ore.tableApply()
invokes an R script with an entire table (ore.frame
) as input. The input is provided all at once to the function. As with ore.doEval()
, it can return an ore.frame
object or serialized R objects.
Input data is an ore.frame
object.
Returns a Data Frame signature as an ore.frame
object.
Takes NULL or <variable>=<value>
as argument.
ore.groupApply()
partitions the data according to a specified column's values and invokes the R script on each partition in parallel, when possible. The return value is a list of each group's execution results.
Input data is an ore.frame
object.
Returns either a NULL value as an ore.object
or a Data Frame signature as an ore.frame
object.
Takes NULL or <variable>=<value>
as argument.
You must specify the partition column for ore.groupApply()
.
Takes the FUN.NAME
parameter, which is the name of a function in the R script repository in the database, or an actual R function in the FUN
parameter
ore.rowApply()
enables you to specify a chunk size, the number of rows that the function should act upon. The function is invoked multiple times in parallel, if multiple R engines can be invoked at the database server, until all data is processed. The return value is a list of each chunk's execution results.
Input data is an ore.frame
object.
Returns either a NULL value as an ore.object
or a Data Frame signature as an ore.frame
object.
Takes NULL or <variable>=<value>
as argument.
You can specify the chunk size for ore.rowApply()
.
Takes the FUN.NAME
parameter, which is the name of a function in the R script repository in the database, or an actual R function in the FUN
parameter
ore.indexApply()
invokes an R script n times, where n is a positive integer. The return value is a list of each execution's results.
Input data is one of the following:
Internally generated
Loaded from a file or pulled from the database by using ore.pull()
Made available through the Transparency Layer
ore.indexApply()
can take NULL or <variable>=<value>
as arguments.
You must specify n, the number of times to invoke the R function.
Takes the FUN.NAME
parameter, which is the name of a function in the R script repository in the database, or an actual R function in the FUN
parameter
For example, this code applies the function 10 times:
res<-ore.indexApply(10,function (x, scale = 100) x / scale)
ore.scriptCreate()
creates an R script in the database. The script can be used by name in other embedded R script functions.
Note:
ore.scriptCreate()
requires the RQADMIN role.ore.scriptCreate()
has this syntax:
ore.scriptCreate(name, FUN)
where
name
is a character string specifying the name of the R script in the Oracle Database.
FUN
is a function definition to be used with functions ore.doEval(,
ore.groupApply()
, ore.indexApply(), ore.rowApply()
, or ore.tableApply()
.
The function returns an invisible NULL value if it succeeds; if it does not succeed in creating the script, it returns an error.
This example creates a script and then drops it:
ore.scriptCreate("MYLM",function(data, formula, ...) lm(formula, data, ...)) IRIS <- ore.push(iris) ore.tableApply(IRIS[1:4], FUN.NAME = "MYLM" formula = Sepal.Length ~ .) ore.scriptDrop("MYLM")
ore.scriptDrop()
drops a named R script from the database repository. Requires the RQADMIN role.
Note:
ore.scriptDrop()
requires the RQADMIN role.ore.scriptDrop()
has this syntax:
ore.scriptDrop(name)
where
name
is a character string specifying the name of the R script in the Oracle Database.
The function returns an invisible NULL value if it succeeds; if it does not succeed in dropping the script, it returns an error.
For an example, see ore.scriptCreate() Example.
This feature automatically enables database connectivity inside embedded R scripts.
Auto Connect provides this functionality
Embedded R scripts are automatically connected to the database.
The automatic connection has the same credentials as the session that invokes the embedded R SQL functions.
The script runs in an autonomous transaction.
ROracle queries work with the automatic connection.
Oracle R Enterprise transparency is enabled in the embedded script.
User and site-wide R profile loading is disabled in embedded R.
Profile loading was supported in earlier Oracle R Enterprise releases. Auto Connect provides a more secure connection.
Auto-connect is off by default. It can be controlled by ore.connect
control argument. Control arguments are documented in R help for ore.doEval()
.
To enable Auto Connect, ROracle was extended by adding a new driver ExtDriver
with constructor Extproc
that is initialized by passing an external pointer wrapping extproc
context. Similarly to OraDriver, ExtDriver is a singleton. Both drivers can exist simultaneously in a session since these are represented by two distinct singletons. This setup allows working with extproc
and explicit OraDriver
connections in the same R script as shown by the following example.
ore.doEval(function() { ore.disconnect() con1 <- dbConnect(Extproc()) res1 <- dbGetQuery(con1, "select * from grade order by name") con2 <- dbConnect(Oracle(), "scott", "tiger") res2 <- dbGetQuery(con2, "select * from emp order by empno") dbDisconnect(con1) dbDisconnect(con2) cbind(head(res1)[,1:3], head(res2)[,1:3]) } }, ore.connect = TRUE)
For a detailed example of an embedded R script, see the Oracle R Enterprise Blog "Introduction to ORE Embedded R Script Execution" at https://blogs.oracle.com/R/entry/analyzing_big_data_using_the1
.
Part 6: ORE Embedded R Scripts: R Interface in the free Oracle R Enterprise Tutorial Series describes embedded R scripts and contains several examples. See Oracle R Enterprise Training for information about the Tutorial Series.
Several of the Oracle R Enterprise Demos illustrated embedded execution.
The SQL interface allows you to embed R script execution in production database applications.
The functions associated with SQL interface must be stored in the database R repository, and referenced by name in SQL API functions. See Registering and Managing SQL Scripts for a description of how to add scripts to the repository, remove scripts from the repository, and list and use scripts in the repository.
For descriptions of the SQL functions, see Oracle R Enterprise SQL Functions.
For security purposes, you must first register the R script under some system unique name and use new name instead of the actual script in the call to rq*Eval
table functions.
There are two administrative functions that create and drop scripts and a view that lists scripts:
sys.rqScriptCreate()
sys.rqScriptDrop()
view allows you to list and use scripts that were created
The scripts require the RQADMIN role described in RQADMIN Role.
When using sys.rqScriptCreate()
, you must specify a corresponding R Closure of the function string.
Here is an example of registering the scripts and using the registered scripts:
begin sys.rqScriptCreate('tmrqfun2', 'function() { ID <- 1:10 res <- data.frame(ID = ID, RES = ID / 100) res }'); end; / select * from table(rqEval( NULL, 'select 1 id, 1 res from dual', 'tmrqfun2')); begin sys.rqScriptDrop('tmrqfun2'); end; /
The rq*Eval
functions result in one or more new R engines being started at the database depending on database parallelism settings.
To enable execution of an R script in the database (lights-out processing), Oracle R Enterprise provides variants of ore.doEval()
, ore.tableApply()
, ore.groupApply()
, and ore.rowApply()
in SQL. (ore.doEval()
, ore.tableApply()
, ore.groupApply()
, and ore.rowApply()
are described inR Interface for Embedded Oracle R Enterprise Scripts.
The SQL functions are
rqTableEval()
rqEval()
rqRowEval()
rqGroupEval()
rqGroupEval()
requires additional SQL specification and is provided here as a virtual function, which partitions the data according to a specified column's values and invokes the R script on each partition. For more information, see rqGroupEval() Function.
You can also use these functions with objects in a datastore, as described in rq*Eval() and Objects in a Datastore
The rq*:Eval()
functions (rqEval()
, rqTableEval()
, rqGroupEval()
, and rqRowEval()
) have similar syntax:
rq*Eval( cursor(select * from table-1), cursor(select * from table-2), 'select <column list> from table-3 t', <grouping col-name from table-1 or num_rows>, <R closure name of registered-R-code> )
where
The first cursor is the input cursor: Input is passed as a whole table, group, or N rows at a time to the R closure described in the fourth parameter.
rqEval()
does not have this cursor argument.
The second cursor is the parameters cursor: One row of scalar values (string and/or numeric) can be passed; for example, the name of the model and several numeric scalar values for model setting.
The query specifies the output table definition; output can be 'SELECT statement', 'XML', or 'PNG'.
grouping col-name
applies to rqGroupEval()
; it provides the name of the grouping column
num_rows
applies to rqRowEval()
; it provides then number of rows to provide to the functions at one time.
<R closure name of registered-R-code> is a registered version of the R function to execute. See Registering and Managing SQL Scripts for details.
The return values for all of the SQL functions specify one of these values:
A table signature that is specified in a SELECT statement, which returns results as a table from the rq
function
XML, returned as a CLOB which returns both structured and graph images in an XML string. The structured components are provided first, followed by the base 64 encoding of the png
representation of the image.
PNG, returned as a BLOB which returns graph images in png format.
rqEval(),
rqTableEval()
, rqGroupEval()
, and rqRowEval()
must specify an R script by the name that is stored in the R script repository. See Registering and Managing SQL Scripts for information about the sys.rq_scripts
view provides a list of registered scripts.
The following examples illustrate using these functions:
This example uses all rows from the table fish
as input to the R function that takes no other parameters and produces output
that contains all input data plus the ROWSUM of values.
Note that parameters (param
) to the R function is optional.
begin sys.rqScriptCreate('tmrqfun2', 'function(x, param) { dat <- data.frame(x, stringsAsFactors=F) cbind(dat, ROWSUM = apply(dat,1,sum)+10) }'); end; / select * from table(rqTableEval( cursor(select * from fish), NULL, 'select t.*, 1 rowsum from fish t', 'tmrqfun2' )); begin sys.rqScriptDrop('tmrqfun2'); end; /
This example illustrates passing n=1 (4th parameter) row at a time from the table fish
to the R function. No parameters are required by the function. The function generates ROWSUM which is added as an extra column to fish
in the output.
begin sys.rqScriptCreate('tmrqfun2', 'function(x, param) { dat <- data.frame(x, stringsAsFactors=F) cbind(dat, ROWSUM = apply(dat,1,sum)+10) }'); end; / select * from table(rqRowEval( cursor(select * from fish), NULL, 'select t.*, 1 rowsum from fish t', 1, 'tmrqfun2' )); begin sys.rqScriptDrop('tmrqfun2'); end; /
rqGroupEval()
invokes an R script on data that is partitioned by a grouping column:
rqGroupEval()
requires the creation of two PL/SQL objects, a package and a pipelined table function:
First create a PL/SQL package that specifies the types of result to be returned.
Next create a function that takes the return value of the package and uses the return value with PIPELINED_PARALLEL_ENABLE set to indicate the column on which to partition data.
Suppose that ONTIME_S is a table that stores information about arrival of airplanes. The data cursor uses all data, but you could also define cursors that use some columns using PL/SQL records. Then you must define as many PL?SQL table functions as the number of grouping columns that you are interested in using for a particular data cursor.
CREATE PACKAGE ontimePkg AS TYPE cur IS REF CURSOR RETURN ontime_s%ROWTYPE; END ontimePkg; / CREATE FUNCTION ontimeGroupEval( inp_cur ontimePkg.cur, par_cur SYS_REFCURSOR, out_qry VARCHAR2, grp_col VARCHAR2, exp_txt CLOB) RETURN SYS.AnyDataSet PIPELINED PARALLEL_ENABLE (PARTITION inp_cur BY HASH (month)) CLUSTER inp_cur BY (month) USING rqGroupEvalImpl; /
At this time, only one grouping column is supported. If you have multiple columns combine the columns into one column and use the new column as a grouping column. PARALLEL_ENABLE clause is optional but CLUSTER BY is not.
rq*Eval()
and related functions allow you to use serialized R objects saved in a datastore using a parameter cursor. You can specify the association of object and datastore names of the serialized R objects with the R function parameter names in that parameter cursor.
Here is an example of how to use rq*Eval()
this way. Suppose that user scott
has saved a model in the datastore ontime_model
as the object lm.mod
,. Suppose scott
wants to use this model in SQL for embedded Oracle R Enterprise scoring. This code shows how to use the model for embedded scoring. See Embedded Execution Auto Connect for R Scripts for the configuration parameters for ore.connect()
.
begin sys.rqScriptCreate('tmrqmodelscore', 'function(dat, in.dsname, in.objname) { ore.load(name=in.dsname, list=in.objname) mod <- get(in.objname) prd <- predict(mod, newdata=dat) prd[as.integer(rownames(prd))] <- prd res <- cbind(dat, PRED = prd) res }'); end; / -- score model select * from table(rqTableEval( cursor(select ARRDELAY, DISTANCE, DEPDELAY from ontime_s where year = 2003 and month = 5 and dayofmonth = 2), cursor(select 'ontime_model' as "in.dsname", 'lm.mod' as "in.objname", 1 as "ore.connect" from dual), 'select ARRDELAY, DISTANCE, DEPDELAY, 1 PRED from ontime_s', 'tmrqmodelscore')) order by 1, 2, 3;
Oracle R Enterprise provides basic management for datastores in SQL. Basic datastore management includes show, search, and drop. The following functions and views are provided:
rqDropDataStore()
deletes a datastore and all of the objects in the datastore.
Syntax: rqDropDataStore('<ds_name>')
, where <ds_name>
is the name of the datastore to delete.
The following example deletes the datastore ds_model from current user schema:
rqDropDataStore('ds_model')
rquser_DataStoreList
is a view containing datastore-level information for all datastores in the current user schema. The information consists of datastore name, number of objects, size, creation date, and description.
These examples illustrate using the view:
select * from rquser_DataStoreList select dsname, nobj, size from rquser_datastorelist where dsname = 'ds_1'
rquser_DataStoreContents
is a view containing object-level information about all datastores in the current user schema. The information consists of object name, size, class, length, number of rows and columns.
This example lists the datastore contents for datastore ds_1
:
select * from rquser_DataStoreContents where dsname = 'ds_1';