| Oracle® R Enterprise User's Guide Release 1.3 for Linux and Windows Part Number E36761-04 |
|
|
PDF · Mobi · ePub |
This chapter explains how to use Oracle R Enterprise to analyze data stored in tables or views in an Oracle Database.
This chapter discusses these topics:
We assume familiarity with R in the remainder of this section.
For additional examples of using Oracle R Enterprise functionality, see Oracle R Enterprise Statistical Functions. For examples of building statistical models, including models created using Oracle Data Mining algorithm, see In-Database Predictive Models in Oracle R Enterprise.
Before you can use Oracle R Enterprise to analyze data stored in database tables, you must install Oracle R Enterprise, start a client, and connect to the database, as described in Oracle R Enterprise Administrator's Guide.
By convention, most of the functions and methods defined in Oracle R Enterprise begin with the prefix ore. This is done to avoid name collisions with other R software. However, the objects created by those functions and methods can be anything the end user wants them to be. The end user has complete control over object naming.
Pick any object returned by ore.ls() and type either class(OBJECTNAME) or class(OBJECTNAME$COLUMN_NAME).
For an example, see Example: Load Data. This example loads a data frame into the database, creating the table DF_TABLE. DF_TABLE has class ore.frame:
R> class(DF_TABLE) [1] "ore.frame"
The prefix ore indicates that the object is an Oracle R Enterprise created object that holds metadata for the corresponding object in Oracle Database.
ore.frame is the Oracle R Enterprise metadata object that maps to a database table. The ore.frame object is the counterpart to an R data.frame.
ore.frame or can be returned by the class() function. For an example of creating ore.frame data, see Load an R Data Frame into the Database.
Use this command to view the Oracle R Enterprise documentation library:
R> OREShowDoc()
Oracle R Enterprise supports this functionality:
Oracle R Enterprise handles R naming conventions for ore.frame columns, instead of a more restrictive Database names. ore.frame column names can be longer than 30 bytes, contain double quotes, and be non-unique.
Follow these steps to load data from R data frames on your system to the Oracle Database:
Load contents of the file to an R data frame using read.table() or read.csv() functions documented in R online help.
Then use ore.create()to load a data frame to a table:
ore.create(data_frame, table="TABLE_NAME")
loads data_frame into the database table TABLE_NAME.
For an example, see Example: Load Data.
This example creates an R data frame df consisting of pairs of numbers and letters and then loads the data frame into the Oracle table DF_TABLE. The example shows that the data frame and the table have the same dimensions and the same first few elements, but different values for class. The class for DF_TABLE is ore.frame. At the end of the example is a check that DF_TABLE exists in the current schema.
R> df <- data.frame(A=1:26, B=letters[1:26])
R> dim(df)
[1] 26 2
R> class(df)
[1] "data.frame"
R> head(df)
A B
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 6 f
R> ore.create(df, table="DF_TABLE")
R> ore.ls()
[1] "DF_TABLE"
R> class(DF_TABLE)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
R> dim(DF_TABLE)
[1] 26 2
R> head(DF_TABLE)
A B
0 1 a
1 2 b
2 3 c
3 4 d
4 5 e
5 6 f
R> exists("DF_TABLE")[1] TRUE
If you connect to the database using a tool such as SQL Developer, you can view DF_TABLE directly in the database.
ore.push(data.frame) stores an R object in the database as a temporary object, and returns a handle to that object. It converts data frame, matrix, and vector to a table, and list, model, and others to a serialized object.
The object that you create exists during the R session; to store the data in a permanent way, see Persist and Manage R Objects in the Database
This example pushes the numerical vector created by the R command c(1,2,3,4,5) to v, an Oracle R Enterprise object:
R> v <- ore.push(c(1,2,3,4,5)) R> class(v) [1] "ore.numeric" attr(,"package") [1] "OREbase" R> head(v) [1] 1 2 3 4 5
ore.exists() checks for the existence of an ore.frame object in the ORE schema environment. For ore.exists()to find an ore.frame object the object must have been synchronized with ore.sync() first.
The objects available in the ORE environment are not necessarily the same as the database objects. One should not use ore.exists() to check for table existence.
For an example, see Example: Load Data.
ore.exists(name, schema)has these arguments:
name: A character string specifying the name of the ore.frame object
schema: A character string specifying the name of database schema to check
ore.exists() returns TRUE if the object exists in the ORE schema and FALSE, if it does not exist.
To drop a table in the database use
ore.drop(table="NAMEOFTABLE")
For example, these commands drop the table v and verifies that it does not exist:
R> ore.drop(table="v")
R> ore.exists("v")
[1] FALSE
If you drop a table that does not exist, there is no error message.
To pull the contents of an Oracle Database table or view into an in-memory R data frame use ore.pull(OBJECT_NAME)for the name of an object returned by ore.ls().
Note:
You can pull a table or view to an R frame only if the data can fit into R's memory.Suppose that your Oracle Database contains the table NARROW. Then ore.pull() creates the data frame df_narrow from the table NARROW. When you verify that df_narrow is a data frame. The warning message appears because the table NARROW is not indexed:
R> df_narrow <- ore.pull(NARROW) Warning message: ORE object has no unique key - using random order R> class(df_narrow) [1] "data.frame"
Almost all data in R is a vector or is based on vectors (vectors themselves, lists, matrices, data frames, and so forth). The elements of a vector have an explicit order. Each element has an index. R code actively uses this order of elements.
However, database-backed relational data (tables and views) does not define any order of rows and thus cannot be directly mapped to R data structures. You can define an explicit order on database tables and views via an ORDER BY clause. The order is usually achieved by having a unique identifier (single- or multi- column key). Ordering in this way can be inefficient and slow for some operations that lead to unnecessary sorting.
row.names<- defines ordering but doesn't actually index a table. The assignment option provides a way to specify a unique column. Initially it supports at least one column but may support multi-column specifications as well. When row.names<- is applied to unordered frames, it returns an error.
You can use the integer indexing created by the ordering infrastructure to perform sampling and partitioning, as described in Sampling and Partitioning.
Suppose that the table NARROW is not indexed. The following example illustrates using row.names to create an indexed table:
R> row.names(head(NARROW))
Error: ORE object has no unique key
In addition: Warning message:
ORE object has no unique key - using random order
R>
R> row.names(NARROW) <- NARROW$ID
R>
R> row.names(head(NARROW[,1:3]))
[1] "101501" "101502" "101503" "101504" "101505" "101506"
R>
R> head(NARROW[,1:3])
ID GENDER AGE
101501 101501 <NA> 41
101502 101502 <NA> 27
101503 101503 <NA> 20
101504 101504 <NA> 45
101505 101505 <NA> 34
101506 101506 <NA> 38
The ordering (indexing) for tables described in Order in Tables can be used to perform sampling and partitioning.
This section provides examples of
R supports powerful constructions using vectors as indices. Oracle R Enterprise supports similar functionality with these differences:
Integer indexing is not supported for ore.vector objects.
Negative integer indexes are not supported.
Row order is not preserved.
This example illustrates indexing:
R> tmp <- ASTHMA
R> tmp[c(1L, 2L, 1L),]
Error: ORE object has no unique key
R> rownames(tmp) <- tmp
R> tmp[c(1L, 2L, 1L),]
CITY ASTHMA COUNT
1|0|65 1 0 65
1|0|65.1 1 0 65
1|1|35 1 1 35
R> tmp[c(1L, 2L, 1L),]@dataQry
This code illustrates several sampling techniques:
# Generate random data
set.seed(123)
N <- 1000000
mydata <- data.frame(x = rnorm(N, mean = 20, sd = 2),
group =
sample(letters, N, replace = TRUE,
prob = (26:1)/sum(26:1)))
mydata$y <-
rbinom(N, 1,
1/(1+exp(-(.5 - 0.25 * mydata$x + .1 * as.integer(mydata$group)))))
MYDATA <- ore.push(mydata)
rm(mydata)
# Create a function that creates random row indices from large tables
mysampler <- function(n, size, replace = FALSE)
{
#' Random Whole Number Sampler
#' @param n number of observations in sample
#' @param size total number of observations
#' @param replace indicator for sampling with replacement
#' @return numeric vector containing the sample indices
n <- round(n)
size <- round(size)
if (n < 0) stop("'n' must be a non-negative number")
if (size < 1) stop("'size' must be a positive number")
if (!replace && (n > size))
stop("'n' cannot exceed 'size' when 'replace = FALSE'")
if (n == 0)
numeric()
else if (replace)
round(runif(n, min = 0.5, max = size + 0.5))
else
{
maxsamp <- seq(size + 0.5, by = -1, length.out = n)
samp <- round(runif(n, min = 0.5, max = maxsamp))
while(length(bump1 <- which(duplicated(samp))))
samp[bump1] <- samp[bump1] + 1
samp
}
}
# Data set and sample size
N <- nrow(MYDATA)
sampleSize <- 500
# 1. Simple random sampling
srs <- mysampler(sampleSize, N)
simpleRandomSample <- ore.pull(MYDATA[srs, , drop = FALSE])
# 2. Systematic sampling
systematic <- round(seq(1, N, length.out = sampleSize))
systematicSample <- ore.pull(MYDATA[systematic, , drop = FALSE])
# 3. Stratified sampling
stratifiedSample <-
do.call(rbind,
lapply(split(MYDATA, MYDATA$group),
function(y)
{
ny <- nrow(y)
y[mysampler(sampleSize * ny/N, ny), , drop = FALSE]
}))
# 4. Cluster sampling
clusterSample <- do.call(rbind, sample(split(MYDATA, MYDATA$group), 2))
# 5a. Accidental/Convenience sampling (via row order access)
convenientSample1 <- head(MYDATA, sampleSize)
# 5b. Accidental/Convenience sampling (via hashing)
maxHash <- 2^32 # maximum allowed in ore.hash
convenient2 <- (ore.hash(rownames(MYDATA), maxHash)/maxHash) <= (sampleSize/N)
convenientSample2 <- ore.pull(MYDATA[convenient2, , drop = FALSE])
Random
For Oracle R Enterprise random partitions can be generated in the transparency layer by adding a partition or group column to an ore.frame object in the following manner:
nrowX <- nrow(x) x$partition <- sample(rep(1:k, each = nrowX/k, length.out = nrowX), replace = TRUE)
After these partitions have been joined to the original data set, the ore.groupApply function can be used to perform the little bootstraps:
results <- ore.groupApply(x, x$partition, function(y) {...}, parallel = TRUE)
R objects exist for the duration of the current session, unless they are explicitly saved. For example, if you build a model in a particular R session, the model is not available when the session is closed, unless the model was explicitly saved.
Oracle R Enterprise supports persistence for R objects onto the database.
Persistence provides these advantages:
You can access the same R and Oracle R Enterprise object (for example, a model) among different R sessions.
You can build a model in R and use it for prediction and scoring in embedded Oracle R Enterprise.
Oracle R Enterprise creates datastores to contain persisted objects.
Persisted objects reside in a datastore. The following Oracle R Enterprise functionality allows you manage persistence:
ore.save() saves an R object or a list of R objects to the specified datastore in the connected database in the current user's schema:
ore.save({...}, list = character(0), name, envir = parent.frame(), overwrite = FALSE, append = FALSE, description = character(0)))
The parameters for ore.save() are as follows:
{...} is the list of R objects to save; the names of the objects to be saved (as symbols or character strings)
list is a character vector containing the names of objects to be saved
envir is the environment to search for objects to be saved
overwrite is a logical value specifying whether to overwrite the datastore if already exists; the default is FALSE (do not overwrite)
name is the name of the datastore; name must be specified
description is a comment describing the datastore
append is a logical value specifying whether to append objects to the datastore if already exists; the default is FALSE (do not append)
Save all objects in the current workspace environment to the datastore ds_1 in the user's current schema:
ore.save(list=ls(), name="ds_1", description = "example datastore")
Overwrite existing datastore ds_2 with objects x, y, and z in the current workspace environment:
ore.save(x, y, z, name="ds_2", overwrite=TRUE)
Add objects x, y, and z in the current workspace environment to the existing datastore ds_3 (that is append the objects to the datastore):
ore.save(x, y, z, name="ds_3", append=TRUE)
ore.load() loads all of the R objects stored in a specified datastore in the current user schema in the connected database to R:
ore.load(name, list = character(0), envir = parent.frame())
The parameters for ore.load() are
name is a character string specifying the name of datastore to load the objects from; you must specify a name
list is a character vector containing the names of objects to be loaded
envir is the R environment that objects are loaded to
ore.load() returns a character vector containing the names of objects loaded from the datastore.
ore.delete() deletes the specified datastore (and all of the R objects in it) from the current user schema in the connected database:
ore.delete(name)
The parameter for ore.delete() is
name is a character string specifying the name of datastore to delete; you must specify a name
Use ore.datastore() to list the datastores that exist in the user's Oracle Database schema.
ore.datastore() lists the datastores and basic information about each datastore in the current schema:
ore.datastore(name, pattern)
The parameters for ore.datastore() are
name is a character string specifying the name of datastore to list
pattern is a regular expression character string specifying the names of the datastores to list.
ore.datastore() lists information about the datastore with name specified in name or information about the datastores whose names match the regular expression specified in pattern.
If neither name nor pattern is provided, ore.datastore() returns information about all datastores in user's schema.
Either name or pattern can be specified but not both.
ore.datastore() returns a data.frame object with these columns:
datastore.name name of the datastore
object.count number of objects in the datastore identified by datastore.name
size size of the datastore in bytes
creation.date date of datastore creation
description comment for datastore (comment is specified in the description parameter of ore.save)
Each row of the data.frame lists one datastore. Rows are sorted by column datastore.name in alphabetical order.
ore.datastoreSummary() returns a data.frame that lists the names and summary information for the R objects saved in the specified datastore in the schema in the connected database:
ore.datastoreSummary(name)
The parameter for ore.datastoreSummary() is
name is a character string specifying the name of datastore to summarize; you must specify a name
If the specified datastore does not exist, an error is returned.
ore.datastoreSummary() returns a data.frame object with these columns:
object.name is the name of the R object
class.name is the class name of the R object
size is the size of the R object in bytes
length is the length of the R object
row.count is the number of rows for the R object
col.count is number of columns of the R object
Each row of the data.frame lists one R object. Rows are sorted by column datastore.name in alphabetical order.
The following examples illustrate using R with Oracle R Enterprise data types:
Simple column and row selection in R:
# Push built-in R data set iris to database
R> ore.create(iris, table="IRIS")
R> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
R> iris_projected = IRIS[, c("PETAL_LENGTH", "SPECIES")]
R> head (iris_projected)
PETAL_LENGTH SPECIES
0 1.4 setosa
1 1.4 setosa
2 1.3 setosa
3 1.5 setosa
4 1.4 setosa
5 1.7 setosa
df1 <- data.frame(x1=1:5, y1=letters[1:5]) df2 <- data.frame(x2=5:1, y2=letters[11:15]) merge (df1, df2, by.x="x1", by.y="x2") x1 y1 y2 1 1 a o 2 2 b n 3 3 c m 4 4 d l 5 5 e k # Create database objects to correspond to in-memory R objects df1 and df2 ore.df1 <- ore.create(df1, table="DF1") ore.df2 <- ore.create(df2, table="DF2") # Compare results R> merge (DF1, DF2, by.x="X1", by.y="X2") X1 Y1 Y2 0 1 a o 1 2 b n 2 3 c m 3 4 d l 4 5 e k
# Push built-in data set iris to database ore.create(iris, table="IRIS") aggdata <- aggregate(IRIS, by = list(IRIS$SPECIES), FUN = summary) class(aggdata) head(aggdata)
Data formatting and creating derived columns in R
Note that adding derived columns does not change the database table. See Derived Columns in Oracle R Enterprise.
diverted_fmt <- function (x) {
ifelse(x==0, 'Not Diverted',
ifelse(x==1, 'Diverted',''))
}
cancellationCode_fmt <- function(x) {
ifelse(x=='A', 'A CODE',
ifelse(x=='B', 'B CODE',
ifelse(x=='C', 'C CODE',
ifelse(x=='D', 'D CODE', 'NOT CANCELLED'))))
}
delayCategory_fmt <- function(x) {
ifelse(x>200,'LARGE',
ifelse(x>=30,'MEDIUM','SMALL'))
}
zscore <- function(x) {
(x-mean(x,na.rm=TRUE))/sd(x,na.rm=TRUE)
# ONTIME_S is a database table
ONTIME_S$DIVERTED <- diverted_fmt(DIVERTED)
ONTIME_S$CANCELLATIONCODE <- cancellationCode_fmt(CANCELLATIONCODE)
ONTIME_S$ARRDELAY <- delayCategory_fmt(ARRDELAY)
ONTIME_S$DEPDELAY <- delayCategory_fmt(DEPDELAY)
ONTIME_S$DISTANCE_ZSCORE <- zscore(DISTANCE)
When you add derived columns using Oracle R Enterprise, the derived columns do not affect the underlying table in the database. A SQL query is generated that has the additional derived columns in the select list, but the table is not changed.
This example illustrates using Oracle R Enterprise with a standard R package downloaded from CRAN:
This example illustrates building a regression model using a CRAN package. You can prepare the data used for training in the database (filtering out observations that are not of interest, selecting attributes, imputing missing values, and so forth). Suppose that the preprocessed data is in the table ONTIME_S_PREPROCESSED_SUBSET. Then pull the prepared training set (which is usually small enough to fit in desktop R memory) into the R client to execute the model build.
You can use the resulting model to score (predict) large numbers of rows, in parallel, in Oracle Database. The data are stored in ONTIME_S_FINAL_DATA_TO_BE_SCORED.
Note that scoring is a trivially parallelizable operation because one row can be scored independent of and in parallel with another row. The model built on the desktop is shipped to the database to perform scoring on vast numbers of rows in the database.
The computations are divided into these steps:
dat <- ore.pull(ONTIME_S_PREPROCESSED_SUBSET) mod <- glm(ARRDELAY ~ DISTANCE + DEPDELAY, dat) mod summary(mod)
Score in-parallel in the database using embedded R:
prd <- predict(mod, newdata=ONTIME_S_FINAL_DATA_TO_BE_SCORED) class(prd) # Add predictions as a new column res <- cbind(newdat, PRED = prd) head(res)
R provides many other ways to build regression models, such as lm().
For other ways to build regression models, see Oracle R Enterprise Versions of R Models and In-Database Predictive Models in Oracle R Enterprise.
The embedded R engine in the Oracle Database allows R users to off load desktop calculations that may require either more resources such as those available to Oracle Database or database-driven data parallelism. The embedded R engine also executes R scripts embedded in SQL or PL/SQL programs (lights-out processing).
These examples illustrate using Oracle R Enterprise embedded R engine with standard R packages downloaded from CRAN:
This example illustrates off loading R computation to execute in the embedded R engine. To off load an R computation, simply include the R code within a closure (that is, function() {}) and invoke ore.doEval(). ore.doEval() schedules execution of the R code with the database-embedded R engine and returns the results back to the desktop for continued analysis:
library(biglm)
mod <- ore.doEval(
function() {
library(biglm)
dat <- ore.pull(ONTIME_S)
mod <- biglm(ARRDELAY ~ DISTANCE + DEPDELAY, dat)
mod
}, ore.connect = TRUE);
print(mod)
mod=ore.pull(mod)
print(mod)
This example illustrates database-driven data parallelism at work in building a series of regression models using a CRAN package. One model is built per unique value of a factor. The database orchestrates the parallel and concurrent building of the models, one per factor and brings the list of all models built to the user desktop for further analysis:
modList <- ore.groupApply(
# Organize input to the R script – This is always an Oracle R Enterprise
# data frame
X=ONTIME_S,
# Specify the grouping column. Here we request one model per unique value of
# ONTIME_S$DEST
INDEX=ONTIME_S$DEST,
# Model building code goes inside the closure. Input and grouping
# conditions can be referenced as parameters to the function
function(x) {
library(biglm)
biglm(ARRDELAY ~ DISTANCE + DEPDELAY, x)
});
modList_local <- ore.pull(modList)
# Print the model for just one destination - BOSTON
summary(modList_local$BOS)
Oracle R Enterprise is shipped with a collection of demos, examples that illustrate how to use Oracle R Enterprise. These examples are a collection of self-contained R scripts.
Most of the sample programs use the data frame iris, which is included in the R distribution. iris is loaded into a table as described in Load Data Frame to a Table.
The rest of this section describes two examples in detail and includes a list of all of the examples:
Start R, load the ORE packages via library(ORE), and then connect to the database.
Follow these steps to load an R data frame to a database table:
This example uses the R data set iris.
The iris data set is located in the datasets package that is part of the R distribution:
R> find("iris")
[1] "package:datasets"
Use the R command class to verify that iris is an R data frame:
R> class(iris) [1] "data.frame"
iris consist of measurements of parts of iris flowers. Use the R command head to see a small sample of the data in iris.
R> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
Now load the data frame iris into the database that you are connected to.
Suppose that the database table version of iris is named IRIS_TABLE. Drop IRIS_TABLE to make sure that no table of this name exists in the connected schema:
ore.drop(table = "IRIS_TABLE")
If IRIS_TABLE doesn't exist, you do not get a message.
Now create a database table with the data contained in iris:
ore.create(iris, table = "IRIS_TABLE")
Use ore.ls() to verify that the table was created:
R> ore.ls() [1] "IRIS_TABLE" "NARROW" "ONTIME_S"
IRIS_TABLE is a database-resident table with just metadata on the R side:
R> class(IRIS_TABLE) [1] "ore.frame" attr(,"package") [1] "OREbase"
Use head to see the column names and the first few values in IRIS_TABLE:
R> head(IRIS_TABLE) SEPAL_LENGTH SEPAL_WIDTH PETAL_LENGTH PETAL_WIDTH SPECIES 0 5.1 3.5 1.4 0.2 setosa 1 4.9 3.0 1.4 0.2 setosa 2 4.7 3.2 1.3 0.2 setosa 3 4.6 3.1 1.5 0.2 setosa 4 5.0 3.6 1.4 0.2 setosa 5 5.4 3.9 1.7 0.4 setosa
Use class to see the data type of the column SPECIES.
R > class(IRIS_TABLE$SPECIES) [1] "raw"
You can use R functions to analyze the data in the table. Here are some simple examples taken from the example basic.R:
Use unique to get a list of the unique entries in a column. This example finds the unique SPECIES:
R> unique(IRIS_TABLE$SPECIES) [1] setosa versicolor virginica Levels: setosa versicolor virginica
Find the minimum, maximum, and mean of PETAL_LENGTH:
R> min(IRIS_TABLE$PETAL_LENGTH) [1] 1 R> max(IRIS_TABLE$PETAL_LENGTH) [1] 6.9 R> mean(IRIS_TABLE$PETAL_LENGTH) [1] 3.758
If you need information about an R function, use the command help(function-name).
Oracle Database has logical values: TRUE, FALSE, NULL. There is a 3x3 table that defines truth values for propositions with AND and OR. NULL is treated as unknown value. For some operations the result is either deterministic (for example TRUE OR NULL) or unknown (TRUE AND NULL). If logical values are used in a WHERE clause, only rows with the condition TRUE are selected; FALSE and NULL are ignored. R, on the other hand, keeps TRUE and NA. Rows with NA are selected with value NA.
The option ore.na.extract controls if NAs are selected or not. The default is to treat NA as SQL treats FALSE.
The demo nulls.R is the only sample that does not use iris as data. nulls.R compares the handling of NULLs in SQL with the handling of NAs in R.
In R, NA is a logical constant of length 1 which contains a missing value indicator. In the database, null refers to the absence of a value in a column of a row. Nulls indicate missing, unknown, or inapplicable data.
Follow these steps to understand the demo nulls.R:
This demo uses the data frame airquality. Verify that the data set is a data frame and look at the few rows of the data frame:
R> class(airquality) [1] "data.frame" R> head(airquality) Ozone Solar.R Wind Temp Month Day 1 41 190 7.4 67 5 1 2 36 118 8.0 72 5 2 3 12 149 12.6 74 5 3 4 18 313 11.5 62 5 4 5 NA NA 14.3 56 5 5 6 28 NA 14.9 66 5 6
Load airquality into the database as "AIRQUALITY":
ore.drop(table = "AIRQUALITY") ore.create(airquality, table = "AIRQUALITY")
Use ore.ls() to verify that the table was created. If you wish, use class(AIRQUALITY) to verify that AIRQUALITY is a database-resident table with just metadata on the R side.
Examine how R handles NAs. Return all observations where ozone < 30 is TRUE:
R> nrow(airquality[airquality$Ozone < 30,])[1] 92
Compare this with the results when NAs are explicitly excluded:
R> nrow(airquality[airquality$Ozone < 30 & !is.na(airquality$Ozone),]) [1] 55
The default behavior for SQL tables is to exclude NULLS in output:
nrow(AIRQUALITY[AIRQUALITY$OZONE < 30,]) [1] 55
To handle NULLs the same way that R handles NA, request the behavior explicitly:
options(ore.na.extract = TRUE) nrow(AIRQUALITY[AIRQUALITY$OZONE < 30,]) [1] 92
These scripts have been added as demos to the ORE package.
To list all of the demos included with Oracle R Enterprise, type
R> demo(package = "ORE")
To run one of these scripts, specify the name of the demo in a demo function call. For example, to run aggregate.R, type
R> demo("aggregate", package = "ORE")
These demos are shipped with Oracle R Enterprise:
aggregate Aggregation analysis Basic analysis & data processing operations basic Basic connectivity to database binning Binning logic columnfns Column functions cor Correlation matrix crosstab Frequency cross tabulations datastore DataStore operations datetime Date/Time operations derived Handling of derived columns distributions Distribution, density, and quantile functions do_eval Embedded R processing freqanalysis Frequency cross tabulations graphics Demonstrates visual analysis group_apply Embedded R processing by group hypothesis Hyphothesis testing functions matrix Matrix related operations nulls Handling of NULL in SQL vs. NA in R odm_ai Oracle Data Mining: attribute importance odm_dt Oracle Data Mining: decision trees odm_glm Oracle Data Mining: generalized linear models odm_kmeans Oracle Data Mining: enhanced k-means clustering odm_nb Oracle Data Mining: naive Bayes classification odm_svm Oracle Data Mining: support vector machines push_pull RDBMS <-> R data transfer rank Attributed-based ranking of observations reg Ordinary least squares linear regression row_apply Embedded R processing by row chunks sampling Random row sampling and partitioning of an ore.frame sql_like Mapping of R to SQL commands stepwise Stepwise OLS linear regression summary Summary functionality table_apply Embedded R processing of entire table