This chapter describes how to administer your Oracle Clusterware environment, including how to administer the voting disks and the Oracle Cluster Registry (OCR).
This chapter contains the following sections:
Oracle Real Application Clusters (Oracle RAC) uses Oracle Clusterware as the infrastructure that binds multiple nodes that then operate as a single server. In an Oracle RAC environment, Oracle Clusterware monitors all Oracle components (such as instances and listeners). If a failure occurs, then Oracle Clusterware automatically attempts to restart the failed component and also redirects operations to a surviving component.
Oracle Clusterware includes a high availability framework for managing any application that runs on your cluster. Oracle Clusterware manages applications to ensure they start when the system starts. Oracle Clusterware also monitors the applications to make sure that they are always available. For example, if an application process fails, then Oracle Clusterware attempts to restart the process based on scripts that you customize. If a node in the cluster fails, then you can program application processes that typically run on the failed node to restart on another node in the cluster.
The voting disk records node membership information. A node must be able to access more than half the voting disks at any time. To avoid simultaneous loss of multiple voting disks, each voting disk should be on a storage device that does not share any components (controller, interconnect, and so on) with the storage devices used for the other voting disks.
For example, if you have five voting disks configured, then a node must be able to access at least three of the voting disks at any time. If a node cannot access the minimum required number of voting disks, then it is evicted, or removed, from the cluster. After the cause of the failure has been corrected and access to the voting disks has been restored, you can instruct Oracle Clusterware to recover the failed node and restore it to the cluster.
Oracle Cluster Registry (OCR) is a file that contains information about the cluster node list and instance-to-node mapping information. OCR also contains information about Oracle Clusterware resource profiles for resources that you have customized. The voting disk data is also backed up in OCR.
Each node in a cluster also has a local copy of the OCR, called an Oracle Local Registry (OLR), that is created when Oracle Clusterware is installed. Multiple processes on each node have simultaneous read and write access to the OLR particular to the node on which they reside, whether or not Oracle Clusterware is fully functional. By default, the OLR is located at Grid_home/cdata/$HOSTNAME.olr.
High availability configurations have redundant hardware and software that maintain operations by avoiding single points of failure. When a component is down, Oracle Clusterware redirects its managed resources to a redundant component. However, if a disaster strikes, or a massive hardware failure occurs, then having redundant components might not be enough. To fully protect your system it is important to have backups of your critical files.
The Oracle Clusterware installation process creates the voting disk and the OCR on shared storage. If you select the option for normal redundant copies during the installation process, then Oracle Clusterware automatically maintains redundant copies of these files to prevent the files from becoming single points of failure. The normal redundancy feature also eliminates the need for third-party storage redundancy solutions. When you use normal redundancy, Oracle Clusterware automatically maintains two copies of the OCR file and three copies of the voting disk file.
See Also:
Oracle Clusterware Administration and Deployment Guide for more information about managing voting disks
By default, Oracle Clusterware is configured to restart whenever the server it resides on is restarted. During certain maintenance operations, you may be required to stop or start the Oracle Clusterware stack manually.
Note:
Do not use Oracle Clusterware Control (CRSCTL) commands on Oracle entities (such as resources, resource types, and server pools) that have names beginning with ora unless you are directed to do so by Oracle Support. The Server Control utility (SRVCTL) is the correct utility to use on Oracle entities.
You use the CRSCTL utility to manage Oracle Clusterware. If the Oracle High Availability Services daemon (OHASD) is running on all the cluster nodes, then you can start the entire Oracle Clusterware stack (all the processes and resources managed by Oracle Clusterware) on all nodes in the cluster by executing the following command on any node:
crsctl start cluster -all
You can start the Oracle Clusterware stack on specific nodes by using the -n option followed by a space-delimited list of node names, for example:
crsctl start cluster -n racnode1 racnode4
To use the previous command, the OHASD process must be running on the specified nodes.
To start the entire Oracle Clusterware stack on a node, including the OHASD process, run the following command on that node:
crsctl start crs
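After you start the stack, you can optionally verify that Oracle Clusterware is running; these checks are not part of the procedure above. The crsctl check crs command reports the status of the Oracle Clusterware daemons on the local node, and crsctl check cluster -all reports the stack status on every node:
crsctl check crs
crsctl check cluster -all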
To stop Oracle Clusterware on all nodes in the cluster, execute the following command on any node:
crsctl stop cluster -all
The previous command stops the resources managed by Oracle Clusterware, the Oracle ASM instance, and all the Oracle Clusterware processes (except for OHASD and its dependent processes).
To stop Oracle Clusterware and Oracle ASM on select nodes, include the -n option followed by a space-delimited list of node names, for example:
crsctl stop cluster -n racnode1 racnode3
If you do not include either the -all or the -n option in the stop cluster command, then Oracle Clusterware and its managed resources are stopped only on the node where you execute the command.
To completely shut down the entire Oracle Clusterware stack on a node, including the OHASD process, use the crsctl stop crs command on that node. CRSCTL attempts to gracefully stop the resources managed by Oracle Clusterware during the shutdown of the Oracle Clusterware stack. If any resources that Oracle Clusterware manages are still running after you execute the crsctl stop crs command, then the command fails. You must then use the -f option to unconditionally stop all resources and stop the Oracle Clusterware stack, for example:
crsctl stop crs -f
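If you are not sure which resources are preventing a graceful shutdown, you can optionally list the resources that are still online before deciding to use the -f option. One way to do this, shown here only as a suggestion, is the tabular resource status report:
crsctl status resource -t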
Note:
When you shut down the Oracle Clusterware stack, you also shut down the Oracle Automatic Storage Management (Oracle ASM) instances. If the Oracle Clusterware files (voting disk and OCR) are stored in an Oracle ASM disk group, then the only way to shut down the Oracle ASM instances is to shut down the Oracle Clusterware stack.
This section describes how to perform the following tasks:
If you choose to store Oracle Clusterware files on Oracle ASM and use redundancy for the disk group, then Oracle ASM automatically maintains the ideal number of voting files based on the redundancy of the disk group.
If you use a different form of shared storage to store the voting disks, then you can dynamically add and remove voting disks after installing Oracle RAC. Do this using the following commands, where path is the fully qualified path for the additional voting disk.
To add or remove a voting disk stored on shared storage:
Run the following command as the root user to add a voting disk:
crsctl add css votedisk path
Run the following command as the root user to remove a voting disk:
crsctl delete css votedisk path
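To confirm the change, you can optionally list the voting disks that are currently configured. The crsctl query css votedisk command displays each voting disk along with its File Universal Id (FUID), which can be passed to crsctl delete css votedisk in place of the path:
crsctl query css votedisk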
This section contains the following topics:
In Oracle Clusterware 11g Release 2 (11.2), you no longer have to back up the voting disk. The voting disk data is automatically backed up in OCR as part of any configuration change. The voting disk files are backed up automatically by Oracle Clusterware if the contents of the files have changed in the following ways:
Configuration parameters, for example misscount, have been added or modified
After performing voting disk add or delete operations
If a voting disk is damaged, and no longer usable by Oracle Clusterware, then you can replace or re-create the voting disk. You replace a voting disk by deleting the unusable voting disk and then adding a new voting disk to your configuration. The voting disk contents are restored from a backup when a new voting file is added; this occurs regardless of whether the voting disk file is stored in Oracle Automatic Storage Management (Oracle ASM).
To replace a corrupt, damaged, or missing voting disk that is not stored in Oracle ASM:
Use CRSCTL to remove the damaged voting disk. For example, if the damaged voting disk is stored in the disk location /dev/sda3, then execute the command:
crsctl delete css votedisk /dev/sda3
Use CRSCTL to create a new voting disk in the same location, for example:
crsctl add css votedisk /dev/sda3
If all voting disks are corrupted, then you can restore them as described in Oracle Clusterware Administration and Deployment Guide.
Starting with Oracle Clusterware 11g release 2, you can store the Oracle Clusterware voting disk files in an Oracle ASM disk group. If you choose to store your voting disks in Oracle ASM, then Oracle ASM stores all the voting disks for the cluster in the disk group you choose. You cannot combine voting disks stored in Oracle ASM and voting disks not stored in Oracle ASM in the same cluster.
The number of voting files you can store in a particular Oracle ASM disk group depends upon the redundancy of the disk group. By default, Oracle ASM puts each voting disk in its own failure group within the disk group. A normal redundancy disk group must contain at least two failure groups, but if you are storing your voting disks on Oracle ASM, then a normal redundancy disk group must contain at least three failure groups. A high redundancy disk group must contain at least three failure groups.
Once you configure voting disks on Oracle ASM, you can only make changes to the voting disk configuration using the crsctl replace votedisk command. This is true even in cases where there are no working voting disks. Even though the crsctl query css votedisk command reports zero voting disks in use, Oracle Clusterware remembers that Oracle ASM was in use and requires the replace verb. Only after you use the replace verb to move the voting disks back to non-Oracle ASM storage are the CRSCTL commands add css votedisk and delete css votedisk usable again.
To move voting disks from shared storage to an Oracle ASM disk group:
Use the Oracle ASM Configuration Assistant (ASMCA) to create an Oracle ASM disk group.
Verify that the ASM Compatibility attribute for the disk group is set to 11.2.0.0 or higher.
Use CRSCTL to create a voting disk in the Oracle ASM disk group by specifying the disk group name in the following command:
crsctl replace votedisk +ASM_disk_group
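As an illustration, assume the disk group created in the first step is named +VOTE (the name is only an example). The migration command and a follow-up check might look like the following:
crsctl replace votedisk +VOTE
crsctl query css votedisk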
See Also:
Oracle Automatic Storage Management Administrator's Guide for more information about disk group compatibility attributes
Oracle Clusterware Administration and Deployment Guide for more information about managing voting disks
Oracle Clusterware automatically creates OCR backups every four hours. At any one time, Oracle Clusterware always retains the latest three backup copies of the OCR that are four hours old, one day old, and one week old.
You cannot customize the backup frequencies or the number of files that Oracle Clusterware retains. You can use any backup software to copy the automatically generated backup files at least once daily to a different device from where the primary OCR file resides.
This section contains the following topics:
Use the ocrconfig utility to view the backups generated automatically by Oracle Clusterware.
To find the most recent backup of the OCR:
Run the following command on any node in the cluster:
ocrconfig -showbackup
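If you want to narrow the listing, the showbackup option also accepts the auto and manual keywords, which restrict the output to the automatically generated backups or to the manually created backups, respectively:
ocrconfig -showbackup auto
ocrconfig -showbackup manual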
Use the ocrconfig utility to force Oracle Clusterware to perform a backup of OCR at any time, rather than wait for the automatic backup that occurs at four-hour intervals. This option is especially useful when you want to obtain a binary backup on demand, such as before you make changes to OCR.
To manually back up the contents of the OCR:
Log in as the root user.
Use the following command to force Oracle Clusterware to perform an immediate backup of the OCR:
ocrconfig -manualbackup
The date and identifier of the recently generated OCR backup is displayed.
(Optional) If you must change the location for the OCR backup files, then use the following command, where directory_name is the new location for the backups:
ocrconfig -backuploc directory_name
The default location for generating backups on Oracle Linux systems is Grid_home/cdata/cluster_name, where cluster_name is the name of your cluster and Grid_home is the home directory of your Oracle Grid Infrastructure for a cluster installation. Because the default backup location is on a local file system, Oracle recommends that you include the backup file created with the ocrconfig command as part of your operating system backup using standard operating system or third-party tools.
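As a simple sketch of that recommendation, you could copy the backup files to a different device with standard operating system tools. The destination directory below is a placeholder; substitute your own Grid home, cluster name, and backup destination:
[root]# cp -r Grid_home/cdata/cluster_name /backup/ocr_backups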
There are two methods for recovering the OCR. The first method uses automatically generated OCR file copies and the second method uses manually created OCR export files.
This section contains the following topics:
In the event of a failure, before you attempt to restore the OCR, ensure that the OCR is unavailable.
To check the status of the OCR:
Run the following command:
ocrcheck
If this command does not display the message 'Device/File integrity check succeeded'
for at least one copy of the OCR, then all copies of the OCR have failed. You must restore the OCR from a backup or OCR export.
If there is at least one copy of the OCR available, then you can use that copy to restore the other copies of the OCR.
When restoring the OCR from automatically generated backups, you first have to determine which backup file to use for the recovery.
To restore the OCR from an automatically generated backup on an Oracle Linux system:
Log in as the root user.
Identify the available OCR backups using the ocrconfig command:
[root]# ocrconfig -showbackup
Review the contents of the backup using the following ocrdump command, where file_name is the name of the OCR backup file whose contents are written to the output file ocr_dump_output_file:
[root]# ocrdump ocr_dump_output_file -backupfile file_name
If you do not specify an output file name, then the OCR contents are written to a file named OCRDUMPFILE in the current directory.
As the root user, stop Oracle Clusterware on all the nodes in your cluster by executing the following command:
[root]# crsctl stop cluster -all
As the root user, restore the OCR by applying the OCR backup file that you identified in Step 1, using the following command, where file_name is the name of the OCR backup file to restore. Ensure that the OCR devices that you specify in the OCR configuration exist, and that these OCR devices are valid before running this command.
[root]# ocrconfig -restore file_name
As the root user, restart Oracle Clusterware on all the nodes in your cluster by running the following command:
[root]# crsctl start cluster -all
Use the Cluster Verification Utility (CVU) to verify the OCR integrity. Exit the root user account, and, as the software owner of the Oracle Grid Infrastructure for a cluster installation, run the following command, where the -n all argument retrieves a list of all the cluster nodes that are configured as part of your cluster:
cluvfy comp ocr -n all [-verbose]
This section describes how to administer the OCR. The OCR contains information about the cluster node list, which instances run on which nodes, and information about Oracle Clusterware resource profiles for applications that have been modified to be managed by Oracle Clusterware.
This section contains the following topics:
Note:
The operations in this section affect the OCR for the entire cluster. However, the ocrconfig command cannot modify OCR configuration information for nodes that are shut down or for nodes on which Oracle Clusterware is not running. Avoid shutting down nodes while modifying the OCR using the ocrconfig command.
Oracle Clusterware supports up to five OCR copies. You can add an OCR location after an upgrade or after completing the Oracle RAC installation. Additional OCR copies provide greater fault tolerance.
As the root user, enter the following command to add a new OCR file:
[root]# ocrconfig -add new_ocr_file_name
This command updates the OCR configuration on all the nodes on which Oracle Clusterware is running.
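For example, to add a mirror on a shared cluster file system, you might specify a path such as the following (the path is hypothetical) and then confirm the new location with OCRCHECK:
[root]# ocrconfig -add /u02/oradata/ocr_mirror.dbf
[root]# ocrcheck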
Starting with Oracle Clusterware 11g release 2, you can store the OCR in an Oracle ASM disk group. By default, the OCR is configured to use Oracle ASM when you perform a new installation of Oracle Clusterware. However, if you upgrade from a previous release, then you can migrate OCR to reside on Oracle ASM, and take advantage of the improvements in managing Oracle Clusterware storage.
The OCR inherits the redundancy of the disk group. If you want high redundancy for the OCR, then you must create an Oracle ASM disk group with high redundancy. You should use a disk group with at least normal redundancy, unless you have an external mirroring solution. If you store the OCR in an Oracle ASM disk group, and the Oracle ASM instance fails on a node, then the OCR becomes unavailable only on that node. The failure of one Oracle ASM instance does not affect the availability of the entire cluster.
Oracle does not support storing the OCR on different storage types simultaneously, such as storing the OCR on both Oracle ASM and a shared file system, except during a migration. After you have migrated the OCR to Oracle ASM storage, you must delete the existing OCR files.
To move the OCR from shared storage to an Oracle ASM disk group:
Use the Oracle ASM Configuration Assistant (ASMCA) to create an Oracle ASM disk group that is at least the same size as the existing OCR and has at least normal redundancy.
Verify that the ASM Compatibility attribute for the disk group is set to 11.2.0.0 or higher.
Run the following OCRCONFIG command as the root user, specifying the Oracle ASM disk group name:
# ocrconfig -add +ASM_disk_group
You can run this command more than once if you add multiple OCR locations. You can have up to five OCR locations. However, each successive run must point to a different disk group.
Remove the non-Oracle ASM storage locations by running the following command as the root user:
# ocrconfig -delete old_storage_location
You must run this command once for every shared storage location for the OCR that is not using Oracle ASM.
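Putting these steps together, a migration from a shared raw device to a disk group named +OCR_DATA (both names are illustrative only) might look like this:
[root]# ocrconfig -add +OCR_DATA
[root]# ocrconfig -delete /dev/raw/raw1
[root]# ocrcheck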
See Also:
Oracle Automatic Storage Management Administrator's Guide for more information about disk group compatibility attributes
Oracle Clusterware Administration and Deployment Guide for more information about migrating the OCR to Oracle ASM
If you must change the location of an existing OCR, or change the location of a failed OCR to the location of a working one, then you can use the following procedure if one OCR file remains online.
To change the location of an OCR or replace an OCR file:
Use the OCRCHECK utility to verify that a copy of the OCR other than the one you are going to replace is online, using the following command:
ocrcheck
Note:
The OCR that you are replacing can be either online or offline.
Use the following command to verify that Oracle Clusterware is running on the node on which you are going to perform the replace operation:
crsctl check cluster -all
As the root user, enter the following command to designate a new location for the specified OCR file:
[root]# ocrconfig -replace source_ocr_file -replacement destination_ocr_file
This command updates the OCR configuration on all the nodes on which Oracle Clusterware is running.
Use the OCRCHECK utility to verify that the OCR replacement file is online:
ocrcheck
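For example, to move one OCR copy from a shared file system location into an Oracle ASM disk group (the file name and disk group name below are placeholders), you could run:
[root]# ocrconfig -replace /u02/oradata/ocr1.dbf -replacement +OCR_DATA
[root]# ocrcheck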
To remove an OCR file, at least one copy of the OCR must be online. You can remove an OCR location to reduce OCR-related overhead or to stop mirroring your OCR because you moved the OCR to a redundant storage system, such as a redundant array of independent disks (RAID).
To remove an OCR location from your cluster:
Use the OCRCHECK utility to ensure that at least one OCR other than the OCR that you are removing is online:
ocrcheck
Note:
Do not perform this OCR removal procedure unless there is at least one active OCR online.
As the root user, run the following command on any node in the cluster to remove a specific OCR file:
[root]# ocrconfig -delete ocr_file_name
This command updates the OCR configuration on all the nodes on which Oracle Clusterware is running.
If a node in your cluster was not available when you modified the OCR configuration, then you must repair the OCR configuration on that node before it is restarted.
To repair an OCR configuration:
As the root user, run one or more of the following commands on the node on which Oracle Clusterware is stopped, depending on the number and type of changes that were made to the OCR configuration:
[root]# ocrconfig -repair -add new_ocr_file_name
[root]# ocrconfig -repair -delete ocr_file_name
[root]# ocrconfig -repair -replace source_ocr_file -replacement dest_ocr_file
These commands update the OCR configuration only on the node from which you run the command.
Note:
You cannot perform these operations on a node on which the Oracle Clusterware daemon is running.
Restart Oracle Clusterware on the node you have just repaired.
As the root user, check the OCR configuration integrity of your cluster using the following command:
[root]# ocrcheck
This section includes the following topics about troubleshooting the Oracle Cluster Registry (OCR):
The OCRCHECK utility displays the data block format version used by the OCR, the available space and used space in the OCR, the ID used for the OCR, and the locations you have configured for the OCR. The OCRCHECK utility calculates a checksum for all the data blocks in all the OCRs that you have configured to verify the integrity of each block. It also returns an individual status for each OCR file and a result for the overall OCR integrity check. The following is a sample of the OCRCHECK output:
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262144
         Used space (kbytes)      :      16256
         Available space (kbytes) :     245888
         ID                       :  570929253
         Device/File Name         :  +CRS_DATA
                                    Device/File integrity check succeeded
         ...
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
The OCRCHECK utility creates a log file in the following directory, where Grid_home is the location of the Oracle Grid Infrastructure for a cluster installation, and hostname is the name of the local node:
Grid_home/log/hostname/client
The log files have names of the form ocrcheck_nnnnn.log, where nnnnn is the process ID of the operating system session that issued the ocrcheck command.
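To find the log written by the most recent check on a node, you can sort the client log directory by modification time, using the same Grid_home and hostname placeholders as above:
ls -lt Grid_home/log/hostname/client/ocrcheck_*.log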
Table 5-1 describes common OCR problems and their corresponding solutions.
Table 5-1 Common OCR Problems and Solutions
| Problem | Solution |
|---|---|
| The OCR is not mirrored. | Run the ocrconfig command with the -add option to add an OCR location. |
| A copy of the OCR has failed and you must replace it. Error messages are being reported in Enterprise Manager or the OCR log file. | Run the ocrconfig command with the -replace option to replace the failed OCR location. |
| OCRCHECK does not find a valid OCR, or all copies of the OCR are corrupted. | Run the ocrconfig command with the -restore option to restore the OCR from an automatically generated backup. |
| The OCR configuration was updated incorrectly. | Run the ocrconfig command with the -repair option to repair the OCR configuration on the affected node. |
| You are experiencing a severe performance effect from updating multiple OCR files, or you want to remove an OCR file for other reasons. | Run the ocrconfig command with the -delete option to remove the OCR location. |
| You want to change the location or storage type currently used by the OCR. | Run the ocrconfig command with the -replace option to designate a new location for the OCR. |