HEMESH ORAWORLD: December 2009

Tuesday 22 December 2009

Memory parameters in 11g - MEMORY_TARGET and MEMORY_MAX_TARGET

Few days back, I was checking metalink and came across few good notes on 11g memory management.
Though we are not using 11g in our development environment, i have downloaded a copy of 11g rel1 on my windows machine.

Testing out these new paarmeters gave me an insight on how Oracle has improvised its memory management.

Oracle has introduced 2 new parameters (along with many others!) - MEMORY_TARGET and MEMORY_MAX_TARGET

Using these parameters, you can manage SGA and PGA together rather than managing them separately (using SGA_TARGET, SGA_MAX_SIZE , PGA_AGGREGATE_TARGET and WORKAREA_SIZE_POLICY in 10g)

If you set SGA_TARGET, SGA_MAX_SIZE and PGA_AGGREGATE_SIZE to 0 and set MEMORY_TARGET (and optionally MEMORY_MAX_TARGET) to non zero value, Oracle will manage both SGA components and PGA together within the limit specified by you.

For instance :
If MEMORY_TARGET is set to 1024MB, Oracle will manage SGA and PGA components within itself.

If MEMORY_TARGET is set to non zero value:

SGA_TARGET, SGA_MAX_SIZE and PGA_AGGREGATE_TARGET are set to 0, 60% of memory mentioned in MEMORY_TARGET is allocated to SGA and rest 40% is kept for PGA.
SGA_TARGET and PGA_AGGREGATE_TARGET are set to non-zero values, these values will be considered minimum values.
SGA_TARGET is set to non zero value and PGA_AGGREGATE_TARGET is not set. Still these values will be autotuned and PGA_AGGREGATE_TARGET will be initialized with value of (MEMORY_TARGET-SGA_TARGET).
PGA_AGGREGATE_TARGET is set and SGA_TARGET is not set. Still both parameters will be autotunes. SGA_TARGET will be initialized to a value of (MEMORY_TARGET-PGA_AGGREGATE_TARGET).

With this version, oracle has become smart as in exchanging memory between SGA and PGAs. This is a huge achievement.
When starting up, Oracle takes up memory equal to MEMORY_TARGET (or MEMORY_MAX_TARGET if mentioned) from Operating System RAM and manage its reqources within itself.
This feature helps DBA to allocate chunk of memory to a particular instance without worrying about the subcateogary allocations of different components.

I tested out few of these experiments mentioned above. Here are the results:

Test 1: When Memory_target is 1G

init.ora parameter
> memory_target=1073741824
> sga_max_size=0

SQL> startup
ORACLE instance started.

Total System Global Area 640303104 bytes
Fixed Size 1335024 bytes
Variable Size 226492688 bytes
Database Buffers 406847488 bytes
Redo Buffers 5627904 bytes
Database mounted.
Database opened.

>> It seems around 600M is used for SGA. The decesion of using 600 MB of SGA is of Oracle itself.

SQL> sho parameter sga

NAME TYPE VALUE
------------------------------------ ----------- -------------------
lock_sga boolean FALSE
pre_page_sga boolean FALSE
sga_max_size big integer 612M
sga_target big integer 0

Test 2: When MEMORY_TARGET is 2g (with no change in any other parameter)

SQL> alter system set memory_target=2048m scope=spfile;

System altered.

SQL> startup force
ORACLE instance started.

Total System Global Area 1288978432 bytes
Fixed Size 1339344 bytes
Variable Size 226492464 bytes
Database Buffers 1052770304 bytes
Redo Buffers 8376320 bytes
Database mounted.
Database opened.

SQL> sho parameter sga

NAME TYPE VALUE
------------------------------------ ----------- ---------------------
lock_sga boolean FALSE
pre_page_sga boolean FALSE
sga_max_size big integer 1232M
sga_target big integer 0

Test 3: When sga_max_size/sga_target=700M and memory_target=1G

SQL> startup
ORACLE instance started.

Total System Global Area 732352512 bytes
Fixed Size 1335696 bytes
Variable Size 192941680 bytes
Database Buffers 532676608 bytes
Redo Buffers 5398528 bytes
Database mounted.
Database opened.
SQL> sho parameter sga

NAME TYPE VALUE
------------------------------------ ----------- ---------------------
lock_sga boolean FALSE
pre_page_sga boolean FALSE
sga_max_size big integer 700M
sga_target big integer 700M
SQL>
SQL> sho parameter memory

NAME TYPE VALUE
------------------------------------ ----------- ---------------------
hi_shared_memory_address integer 0
memory_max_target big integer 1G
memory_target big integer 1G
shared_memory_address integer 0

This time, oracle has acknowledged the SGA_MAX_SIZE parameter.

Thursday 17 December 2009

Oracle RAC History

Oracle RAC 11g Overview

Before introducing the details for building a RAC cluster, it might be helpful to first clarify what a cluster is. A cluster is a group of two or more interconnected computers or servers that appear as if they are one server to end users and applications and generally share the same set of physical disks. The key benefit of clustering is to provide a highly available framework where the failure of one node (for example a database server running an instance of Oracle) does not bring down an entire application. In the case of failure with one of the servers, the other surviving server (or servers) can take over the workload from the failed server and the application continues to function normally as if nothing has happened.

The concept of clustering computers actually started several decades ago. The first successful cluster product was developed by DataPoint in 1977 named ARCnet. The ARCnet product enjoyed much success by academia types in research labs, but didn't really take off in the commercial market. It wasn't until the 1980's when Digital Equipment Corporation (DEC) released its VAX cluster product for the VAX/VMS operating system.

With the release of Oracle 6 for the Digital VAX cluster product, Oracle was the first commercial database to support clustering at the database level. It wasn't long, however, before Oracle realized the need for a more efficient and scalable distributed lock manager (DLM) as the one included with the VAX/VMS cluster product was not well suited for database applications. Oracle decided to design and write their own DLM for the VAX/VMS cluster product which provided the fine-grain block level locking required by the database. Oracle's own DLM was included in Oracle 6.2 which gave birth to Oracle Parallel Server (OPS) - the first database to run the parallel server.

By Oracle 7, OPS was extended to included support for not only the VAX/VMS cluster product but also with most flavors of UNIX. This framework required vendor-supplied clusterware which worked well, but made for a complex environment to setup and manage given the multiple layers involved. By Oracle8, Oracle introduced a generic lock manager that was integrated into the Oracle kernel. In later releases of Oracle, this became known as the Integrated Distributed Lock Manager (IDLM) and relied on an additional layer known as the Operating System Dependant (OSD) layer. This new model paved the way for Oracle to not only have their own DLM, but to also create their own clusterware product in future releases.

Oracle Real Application Clusters (RAC), introduced with Oracle9i, is the successor to Oracle Parallel Server. Using the same IDLM, Oracle 9i could still rely on external clusterware but was the first release to include their own clusterware product named Cluster Ready Services (CRS). With Oracle 9i, CRS was only available for Windows and Linux. By Oracle 10g release 1, Oracle's clusterware product was available for all operating systems and was the required cluster technology for Oracle RAC. With the release of Oracle Database 10g Release 2 (10.2), Cluster Ready Services was renamed to Oracle Clusterware. When using Oracle 10g or higher, Oracle Clusterware is the only clusterware that you need for most platforms on which Oracle RAC operates (except for Tru cluster, in which case you need vendor clusterware). You can still use clusterware from other vendors if the clusterware is certified, but keep in mind that Oracle RAC still requires Oracle Clusterware as it is fully integrated with the database software. This guide uses Oracle Clusterware which as of 11g Release 2 (11.2), is now a component of Oracle grid infrastructure.

Like OPS, Oracle RAC allows multiple instances to access the same database (storage) simultaneously. RAC provides fault tolerance, load balancing, and performance benefits by allowing the system to scale out, and at the same time since all instances access the same database, the failure of one node will not cause the loss of access to the database.

At the heart of Oracle RAC is a shared disk subsystem. Each instance in the cluster must be able to access all of the data, redo log files, control files and parameter file for all other instances in the cluster. The data disks must be globally available in order to allow all instances to access the database. Each instance has its own redo log files and UNDO tablespace that are locally read-writeable. The other instances in the cluster must be able to access them (read-only) in order to recover that instance in the event of a system failure. The redo log files for an instance are only writeable by that instance and will only be read from another instance during system failure. The UNDO, on the other hand, is read all the time during normal database operation (e.g. for CR fabrication).

A big difference between Oracle RAC and OPS is the addition of Cache Fusion. With OPS a request for data from one instance to another required the data to be written to disk first, then the requesting instance can read that data (after acquiring the required locks). This process was called disk pinging. With cache fusion, data is passed along a high-speed interconnect using a sophisticated locking algorithm.

Not all database clustering solutions use shared storage. Some vendors use an approach known as a Federated Cluster, in which data is spread across several machines rather than shared by all. With Oracle RAC, however, multiple instances use the same set of disks for storing data. Oracle's approach to clustering leverages the collective processing power of all the nodes in the cluster and at the same time provides failover security.

Pre-configured Oracle RAC solutions are available from vendors such as Dell, IBM and HP for production environments. This article, however, focuses on putting together your own Oracle RAC 11g environment for development and testing by using Linux servers and a low cost shared disk solution; iSCSI.

Shared-Storage Overview

Today, fibre channel is one of the most popular solutions for shared storage. As mentioned earlier, fibre channel is a high-speed serial-transfer interface that is used to connect systems and storage devices in either point-to-point (FC-P2P), arbitrated loop (FC-AL), or switched topologies (FC-SW). Protocols supported by Fibre Channel include SCSI and IP. Fibre channel configurations can support as many as 127 nodes and have a throughput of up to 2.12 Gigabits per second in each direction, and 4.25 Gbps is expected.

Fibre channel, however, is very expensive. Just the fibre channel switch alone can start at around US$1,000. This does not even include the fibre channel storage array and high-end drives, which can reach prices of about US$300 for a single 36GB drive. A typical fibre channel setup which includes fibre channel cards for the servers is roughly US$10,000, which does not include the cost of the servers that make up the cluster.

A less expensive alternative to fibre channel is SCSI. SCSI technology provides acceptable performance for shared storage, but for administrators and developers who are used to GPL-based Linux prices, even SCSI can come in over budget, at around US$2,000 to US$5,000 for a two-node cluster.

Another popular solution is the Sun NFS (Network File System) found on a NAS. It can be used for shared storage but only if you are using a network appliance or something similar. Specifically, you need servers that guarantee direct I/O over NFS, TCP as the transport protocol, and read/write block sizes of 32K. See the Certify page on Oracle Metalink for supported Network Attached Storage (NAS) devices that can be used with Oracle RAC. One of the key drawbacks that has limited the benefits of using NFS and NAS for database storage has been performance degradation and complex configuration requirements. Standard NFS client software (client systems that use the operating system provided NFS driver) is not optimized for Oracle database file I/O access patterns. With the introduction of Oracle 11g, a new feature known as Direct NFS Client integrates the NFS client functionality directly in the Oracle software. Through this integration, Oracle is able to optimize the I/O path between the Oracle software and the NFS server resulting in significant performance gains. Direct NFS Client can simplify, and in many cases automate, the performance optimization of the NFS client configuration for database workloads. To learn more about Direct NFS Client, see the Oracle White Paper entitled "Oracle Database 11g Direct NFS Client".

The shared storage that will be used for this article is based on iSCSI technology using a network storage server installed with Openfiler. This solution offers a low-cost alternative to fibre channel for testing and educational purposes, but given the low-end hardware being used, it should not be used in a production environment.

iSCSI Technology

For many years, the only technology that existed for building a network based storage solution was a Fibre Channel Storage Area Network (FC SAN). Based on an earlier set of ANSI protocols called Fiber Distributed Data Interface (FDDI), Fibre Channel was developed to move SCSI commands over a storage network.

Several of the advantages to FC SAN include greater performance, increased disk utilization, improved availability, better scalability, and most important to us — support for server clustering! Still today, however, FC SANs suffer from three major disadvantages. The first is price. While the costs involved in building a FC SAN have come down in recent years, the cost of entry still remains prohibitive for small companies with limited IT budgets. The second is incompatible hardware components. Since its adoption, many product manufacturers have interpreted the Fibre Channel specifications differently from each other which has resulted in scores of interconnect problems. When purchasing Fibre Channel components from a common manufacturer, this is usually not a problem. The third disadvantage is the fact that a Fibre Channel network is not Ethernet! It requires a separate network technology along with a second set of skill sets that need to exist with the data center staff.

Ratified on February 11, 2003 by the Internet Engineering Task Force (IETF), the Internet Small Computer System Interface, better known as iSCSI, is an Internet Protocol (IP)-based storage networking standard for establishing and managing connections between IP-based storage devices, hosts, and clients. iSCSI is a data transport protocol defined in the SCSI-3 specifications framework and is similar to Fibre Channel in that it is responsible for carrying block-level data over a storage network. Block-level communication means that data is transferred between the host and the client in chunks called blocks. Database servers depend on this type of communication (as opposed to the file level communication used by most NAS systems) in order to work properly. Like a FC SAN, an iSCSI SAN should be a separate physical network devoted entirely to storage, however, its components can be much the same as in a typical IP network (LAN).

While iSCSI has a promising future, many of its early critics were quick to point out some of its inherent shortcomings with regards to performance. The beauty of iSCSI is its ability to utilize an already familiar IP network as its transport mechanism. The TCP/IP protocol, however, is very complex and CPU intensive. With iSCSI, most of the processing of the data (both TCP and iSCSI) is handled in software and is much slower than Fibre Channel which is handled completely in hardware. The overhead incurred in mapping every SCSI command onto an equivalent iSCSI transaction is excessive. For many the solution is to do away with iSCSI software initiators and invest in specialized cards that can offload TCP/IP and iSCSI processing from a server's CPU. These specialized cards are sometimes referred to as an iSCSI Host Bus Adaptor (HBA) or a TCP Offload Engine (TOE) card. Also consider that 10-Gigabit Ethernet is a reality today!

As with any new technology, iSCSI comes with its own set of acronyms and terminology. For the purpose of this article, it is only important to understand the difference between an iSCSI initiator and an iSCSI target.

iSCSI Initiator

Basically, an iSCSI initiator is a client device that connects and initiates requests to some service offered by a server (in this case an iSCSI target). The iSCSI initiator software will need to exist on each of the Oracle RAC nodes (racnode1 and racnode2).

An iSCSI initiator can be implemented using either software or hardware. Software iSCSI initiators are available for most major operating system platforms. For this article, we will be using the free Linux Open-iSCSI software driver found in the iscsi-initiator-utils RPM. The iSCSI software initiator is generally used with a standard network interface card (NIC) — a Gigabit Ethernet card in most cases. A hardware initiator is an iSCSI HBA (or a TCP Offload Engine (TOE) card), which is basically just a specialized Ethernet card with a SCSI ASIC on-board to offload all the work (TCP and SCSI commands) from the system CPU. iSCSI HBAs are available from a number of vendors, including Adaptec, Alacritech, Intel, and QLogic.

iSCSI Target

An iSCSI target is the "server" component of an iSCSI network. This is typically the storage device that contains the information you want and answers requests from the initiator(s). For the purpose of this article, the node openfiler1 will be the iSCSI target.

So with all of this talk about iSCSI, does this mean the death of Fibre Channel anytime soon? Probably not. Fibre Channel has clearly demonstrated its capabilities over the years with its capacity for extremely high speeds, flexibility, and robust reliability. Customers who have strict requirements for high performance storage, large complex connectivity, and mission critical reliability will undoubtedly continue to choose Fibre Channel.

Before closing out this section, I thought it would be appropriate to present the following chart that shows speed comparisons of the various types of disk interfaces and network technologies. For each interface, I provide the maximum transfer rates in kilobits (kb), kilobytes (KB), megabits (Mb), megabytes (MB), gigabits (Gb), and gigabytes (GB) per second with some of the more common ones highlighted in grey.

Disk Interface / Network / BUS	Speed
Disk Interface / Network / BUS	Kb	KB	Mb	MB	Gb	GB
Serial	115	14.375	0.115	0.014
Parallel (standard)	920	115	0.92	0.115
10Base-T Ethernet			10	1.25
IEEE 802.11b wireless Wi-Fi (2.4 GHz band)			11	1.375
USB 1.1			12	1.5
Parallel (ECP/EPP)			24	3
SCSI-1			40	5
IEEE 802.11g wireless WLAN (2.4 GHz band)			54	6.75
SCSI-2 (Fast SCSI / Fast Narrow SCSI)			80	10
100Base-T Ethernet (Fast Ethernet)			100	12.5
ATA/100 (parallel)			100	12.5
IDE			133.6	16.7
Fast Wide SCSI (Wide SCSI)			160	20
Ultra SCSI (SCSI-3 / Fast-20 / Ultra Narrow)			160	20
Ultra IDE			264	33
Wide Ultra SCSI (Fast Wide 20)			320	40
Ultra2 SCSI			320	40
FireWire 400 - (IEEE1394a)			400	50
USB 2.0			480	60
Wide Ultra2 SCSI			640	80
Ultra3 SCSI			640	80
FireWire 800 - (IEEE1394b)			800	100
Gigabit Ethernet			1000	125	1
PCI - (33 MHz / 32-bit)			1064	133	1.064
Serial ATA I - (SATA I)			1200	150	1.2
Wide Ultra3 SCSI			1280	160	1.28
Ultra160 SCSI			1280	160	1.28
PCI - (33 MHz / 64-bit)			2128	266	2.128
PCI - (66 MHz / 32-bit)			2128	266	2.128
AGP 1x - (66 MHz / 32-bit)			2128	266	2.128
Serial ATA II - (SATA II)			2400	300	2.4
Ultra320 SCSI			2560	320	2.56
FC-AL Fibre Channel			3200	400	3.2
PCI-Express x1 - (bidirectional)			4000	500	4
PCI - (66 MHz / 64-bit)			4256	532	4.256
AGP 2x - (133 MHz / 32-bit)			4264	533	4.264
Serial ATA III - (SATA III)			4800	600	4.8
PCI-X - (100 MHz / 64-bit)			6400	800	6.4
PCI-X - (133 MHz / 64-bit)				1064	8.512	1
AGP 4x - (266 MHz / 32-bit)				1066	8.528	1
10G Ethernet - (IEEE 802.3ae)				1250	10	1.25
PCI-Express x4 - (bidirectional)				2000	16	2
AGP 8x - (533 MHz / 32-bit)				2133	17.064	2.1
PCI-Express x8 - (bidirectional)				4000	32	4
PCI-Express x16 - (bidirectional)				8000	64	8

Wednesday 16 December 2009

Oracle RAC and TAF to Guarantee availability

One of the most exciting new features in Oracle Database is Real Application Clusters (RAC). The Oracle RAC solution delivers 24/7 database availability, performance, and scalability. Cache Fusion is the key memory feature that enables Oracle RAC performance, and the new Transparent Application Failover (TAF) is what applications use to sync up with Oracle RAC availability. This article explores the cooperation between Oracle RAC, Cache Fusion, and TAF and offers insights into the architecture and use of these tools for continuous availability and infinite scalability.

Oracle RAC Architecture

Oracle has long recognized that a clustered environment is the best protection against hardware and software failure. In a clustered environment, many Oracle instances exist on separate servers, each with direct connectivity to a single Oracle database. Should any single server or instance fail, processing continues on the surviving servers.

Cache Fusion and Oracle RAC

The introduction of the Cache Fusion shared RAM cache for multiple Oracle instances is a breakthrough in clustered solutions. Oracle RAC fully implements Cache Fusion, which both provides high performance and enables continuous cluster availability. The high-availability capability of Oracle RAC is almost unfathomable. It's estimated that in a 12-computer configuration, any application running on Oracle RAC will not experience a catastrophic failure for well over 100,000 years.

Cache Fusion technology changes the internal configuration of the Oracle system global area (SGA). Cache Fusion moves the RAM data buffers from local RAM storage into a shared RAM area accessible by all Oracle instances.

Beyond high performance and high availability, Oracle RAC offers significant benefits as a scalability tool. Whenever the processing load becomes excessive in an existing Oracle RAC cluster, you can add additional processors—each with its own Oracle instance—to the Oracle RAC configuration. This allows companies to start small and scale infinitely as processing demands increase.

Oracle RAC and Hardware Failover

To detect a node failure, the Cluster Manager uses a background process—Global Enqueue Service Monitor (LMON)—to monitor the health of the cluster. When a node fails, the Cluster Manager reports the change in the cluster's membership to Global Cache Services (GCS) and Global Enqueue Service (GES). These services are then remastered based on the current membership of the cluster.

To successfully remaster the cluster services, Oracle RAC keeps track of all resources and resource states on each node and then uses this information to restart these resources on a backup node.

These processes also manage the state of in-flight transactions and work with TAF to either restart or resume the transactions on the new node. Now let's see how Oracle RAC and TAF work together to ensure that a server failure does not cause an unplanned service interruption.

Using Transparent Application Failover

After an Oracle RAC node crashes—usually from a hardware failure—all new application transactions are automatically rerouted to a specified backup node. The challenge in rerouting is to not lose transactions that were "in flight" at the exact moment of the crash. One of the requirements of continuous availability is the ability to restart in-flight application transactions, allowing a failed node to resume processing on another server without interruption. Oracle's answer to application failover is a new Oracle Net mechanism dubbed Transparent Application Failover. TAF allows the DBA to configure the type and method of failover for each Oracle Net client.

For an application to use TAF, it must use failover-aware API calls from the Oracle Call Interface (OCI). Inside OCI are TAF callback routines that can be used to make any application failover-aware.

While the concept of failover is simple, providing an apparent instant failover can be extremely complex, because there are many ways to restart in-flight transactions. The TAF architecture offers the ability to restart transactions at either the transaction (SELECT) or session level:

SELECT failover. With SELECT failover, Oracle Net keeps track of all SELECTstatements issued during the transaction, tracking how many rows have been fetched back to the client for each cursor associated with a SELECT statement. If the connection to the instance is lost, Oracle Net establishes a connection to another Oracle RAC node and re-executes the SELECT statements, repositioning the cursors so the client can continue fetching rows as if nothing has happened. The SELECT failover approach is best for data warehouse systems that perform complex and time-consuming transactions.
SESSION failover. When the connection to an instance is lost, SESSION failover results only in the establishment of a new connection to another Oracle RAC node; any work in progress is lost. SESSION failover is ideal for online transaction processing (OLTP) systems, where transactions are small.

Oracle TAF also offers choices on how to restart a failed transaction. The Oracle DBA may choose one of the following failover methods:

BASIC failover. In this approach, the application connects to a backup node only after the primary connection fails. This approach has low overhead, but the end user experiences a delay while the new connection is created.
PRECONNECT failover. In this approach, the application simultaneously connects to both a primary and a backup node. This offers faster failover, because a pre-spawned connection is ready to use. But the extra connection adds everyday overhead by duplicating connections.

Currently, TAF will fail over standard SQL SELECT statements that have been caught during a node crash in an in-flight transaction failure. In the current release of TAF, however, TAF must restart some types of transactions from the beginning of the transaction.

The following types of transactions do not automatically fail over and must be restarted by TAF:

Transactional statements. Transactions involving INSERT, UPDATE, or DELETEstatements are not supported by TAF.
ALTER SESSION statements. ALTER SESSION and SQL*Plus SETstatements do not fail over.
The following do not fail over and cannot be restarted:
Temporary objects. Transactions using temporary segments in the TEMP tablespace and global temporary tables do not fail over.
PL/SQL package states. PL/SQL package states are lost during failover.

Using Oracle RAC and TAF Together

The continuous availability features of Oracle RAC and TAF come together when these products cooperate in restarting failed transactions. Let's take a closer look at how this works.

Within each connected Oracle Net client, tnsnames.ora file parameters define the failover types and methods for that client. The parameters direct Oracle RAC and TAF on how to restart any transactions that may be in-flight during a hardware failure on the node.

It is important to note that TAF failover control is external to the Oracle RAC cluster, and each Oracle Net client may have unique failover types and methods, depending on processing requirements. The following is a client tnsnames.ora file entry for a node, including its current TAF failover parameters:

bubba.world =

(DESCRIPTION_LIST =

(FAILOVER = true)

(LOAD_BALANCE = true)

(DESCRIPTION =

(ADDRESS =

(PROTOCOL = TCP)

(HOST = redneck)(PORT = 1521))

(CONNECT_DATA =

(SERVICE_NAME = bubba)

(SERVER = dedicated)

(FAILOVER_MODE =

(BACKUP=cletus)

(TYPE=select)

(METHOD=preconnect)

(RETRIES=20)

(DELAY=3)

)

The failover_mode section of the tnsnames.ora file lists the parameters and their values:

BACKUP=cletus. This names the backup node that will take over failed connections when a node crashes. In this example, the primary server is bubba, and TAF will reconnect failed transactions to the cletus instance in case of server failure.

TYPE=select. This tells TAF to restart all in-flight transactions from the beginning of the transaction (and not to track cursor states within each transaction).

METHOD=preconnect. This directs TAF to create two connections at transaction startup time: one to the primary bubba database and a backup connection to the cletus database. In case of instance failure, the cletus database will be ready to resume the failed transaction.

RETRIES=20. This directs TAF to retry a failover connection up to 20 times.

DELAY=3. This tells TAF to wait three seconds between connection retries.

Remember, you must set these TAF parameters in every tnsnames.ora file on every Oracle Net client that needs transparent failover.

Putting It All Together

An Oracle Net client can be a single PC or a huge application server. In the architectures of giant Oracle RAC systems, each application server has a customized tnsnames.ora file that governs the failover method for all connections that are routed to that application server.

Watching TAF in Action

The transparency of TAF operation is a tremendous advantage to application users, but DBAs need to quickly see what has happened and where failover traffic is going, and they need to be able to get the status of failover transactions. To provide this capability, the Oracle data dictionary has several new columns in the V$SESSION view that give the current status of failover transactions.

The following query calls the new FAILOVER_TYPE, FAILOVER_METHOD, and FAILED_OVER columns of the V$SESSION view. Be sure to note that the query is restricted to nonsystem sessions, because Oracle data definition language (DDL) and data manipulation language (DML) are not recoverable with TAF.

select

username,

sid,

serial#,

failover_type,

failover_method,

failed_over

from

v$session

where

username not in ('SYS','SYSTEM',

'PERFSTAT')

and

failed_over = 'YES';

You can run this script against the backup node after an instance failure to see those transactions that have been reconnected with TAF. Remember, TAF will quickly redirect transactions, so you'll only see entries for a short period of time immediately after the failover. A backup node can have a variety of concurrent failover transactions, because the tnsnames.ora file on each Oracle Net client specifies the backup node, the failover type, and the failover method.

Conclusion

Oracle RAC, TAF, and Cache Fusion work together to guarantee continuous availability and infinite scalability. To summarize, here's a short description of each component:

Oracle RAC. The clustering component of Oracle that allows the creation of multiple, independent Oracle instances, all sharing a single database.

Cache Fusion. The shared RAM component of Oracle RAC that provides fast interchange of Oracle data blocks between SGA regions.

TAF. The failover method implemented on the Oracle Net client to restart in-flight transactions when a node crashes.

Monday 14 December 2009

Oracle Enterprise Linux 5

Specifications	Oracle Enterprise Linux 4	Oracle Enterprise Linux 5
Architectures	X86, X86-64, Itanium *	X86, X86-64, Itanium **
Versions	Update 4, Update 5, Update 6, Update 7, Update 8	General Availability (Release 1), Update 1, Update 2, Update 3, Update 4
Minimum Memory Requirements	256MB X86, X86-64 512MB IA64	512MB
Minimum Disk Space	800MB	1GB
Max Memory	64GB x86 1TB Intel64 256GB AMD64	16 GB x86 1TB Intel64 256GB AMD64
Max Number of CPUs	Unlimited	Unlimited
Kernel Base Version	2.6.9	2.6.18
Compiler	Gcc 3.4	Gcc 4.1
Libraries	Glibc 2.3.4	Glibc 2.5
Cluster Administration	OCFS2 available, GFS and Cluster Suite not in base OS	OCFS2 available, GFS and Cluster Suite included in base OS
Supported Filesystems
Ext3 Max File Size	2TB	2TB
Ext3 Max filesystem size	8TB	8TB/16TB (in 5.1)
OCFS2 Max File Size	16TB	16TB
OCFS2 Max filesystem size	16TB	16TB
GFS Max File Size	N/A	16TB/8EB
GFS Max filesystem size	N/A	16TB/8EB



* For the Itanium architecture, Oracle provides rpms for download from ULN starting with OEL4U6. Oracle provides ISOs for free download from Oracle E-Delivery starting with OEL4U8. ** For the Itanium architecture, starting with OEL5 Update 4, ISOs are available on Oracle E-Delivery and rpms are available on ULN