
Wednesday, December 28, 2011

TOP 5 Timed Events - ARCH wait on SENDREQ

ARCH wait on SENDREQ is a network-related wait event seen on the primary database side when a standby database is configured. It monitors the amount of time spent by all archiver processes to write the received redo to disk, as well as to open and close the remote archived redo logs.

There are three things you can do if you see high wait times against this wait event.

1) Optimize the Network -

Overall, the goal for all Data Guard configurations is to ship redo data to the remote disaster recovery site fast enough to meet recovery time and recovery point objectives. If there is insufficient bandwidth available to handle the required volume, however, no amount of tuning can achieve this goal. To figure out how much bandwidth is needed, determine the volume of redo data generated by the production database. Ideally, this is measured on an existing production database or on a database set up in a test environment.
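As a rough, hypothetical illustration of the sizing arithmetic: suppose AWR shows a peak redo generation rate of 5 MB/sec. Allowing roughly 30% for network protocol overhead, the required bandwidth comes to about (5 / 0.7) x 8 ≈ 57 Mbps. Both the 5 MB/sec rate and the 30% overhead factor are assumptions for this example only; measure your own peak redo rate (for example, from the AWR Load Profile or the 'redo size' statistic in v$sysstat) before sizing the link.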

SDU (SESSION DATA UNIT)
Before sending data across the network, Oracle Net buffers it into the Session Data Unit (SDU). When large amounts of data are being transmitted, or when the message size is consistent, increasing the size of the SDU buffer can improve performance and network utilization. You can configure the SDU size within an Oracle Net connect descriptor or globally in the sqlnet.ora file.
For Data Guard broker configurations, set the DEFAULT_SDU_SIZE parameter in the sqlnet.ora file:
DEFAULT_SDU_SIZE=32767

For non-broker Data Guard configurations that use a connect descriptor, you can override the current settings in the primary database sqlnet.ora file. If you are setting the SDU in a connect descriptor, you must use a static SID; with dynamic service registration, the default SDU size defined by DEFAULT_SDU_SIZE is used instead. This example uses a connect descriptor with the SDU parameter in the description:
sales.us.acme.com=
  (DESCRIPTION=
    (SDU=32767)
    (ADDRESS=
      (PROTOCOL=tcp)
      (HOST=sales-server)
      (PORT=1521))
    (CONNECT_DATA=
      (SID=sales.us.acme.com)))
On the standby database, set SDU in the SID_LIST of the listener.ora file:
SID_LIST_listener_name=
  (SID_LIST=
    (SID_DESC=
      (SDU=32767)
      (GLOBAL_DBNAME=sales.us.acme.com)
      (SID_NAME=sales)
      (ORACLE_HOME=/usr/oracle)))
2) Use multiple archiver processes - Multiple archiver processes can transmit a redo log in parallel to the standby database, reducing the redo transmission time to the secondary. The MAX_CONNECTIONS attribute of LOG_ARCHIVE_DEST_n controls the number of these processes. This can be very beneficial during batch loads.
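A minimal sketch of setting this attribute (the destination number, the service name standby_tns, and the DB_UNIQUE_NAME standby are assumptions for illustration; MAX_CONNECTIONS accepts values from 1 to 5):

ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=standby_tns ARCH MAX_CONNECTIONS=4
   VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
   DB_UNIQUE_NAME=standby'
  SCOPE=BOTH;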

3) Tune the network queue sizes - A system's network queue sizes can also be adjusted to optimize performance. You can regulate the size of the queue between the kernel network subsystems and the driver for the network interface card. Any queue should be sized so that losses do not occur due to local buffer overflows. Therefore, careful tuning is required to ensure that the queue sizes are optimal for your network connection, particularly for high-bandwidth networks.

These settings are especially important for TCP because losses on local queues cause TCP to fall into congestion control, which limits the TCP sending rates.
For Linux there are two queues to consider: the interface transmit queue and the network receive queue. The transmit queue size is configured with the network interface option txqueuelen. The network receive queue size is configured with the kernel parameter netdev_max_backlog.
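A hedged sketch of adjusting these on Linux (the interface name eth0 and both sizes are placeholders; suitable values depend on your NIC and the bandwidth-delay product of the link):

# Increase the transmit queue length on the network interface
ifconfig eth0 txqueuelen 10000
# Increase the kernel's receive backlog queue
sysctl -w net.core.netdev_max_backlog=20000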
----- Find below more wait events related to a standby database, on both the primary and standby sides. -----

WAIT EVENTS ON THE PRIMARY

On the primary database the wait events fall into two categories: those related to the ARCn process and those related to the LGWR process (asynchronous LGWR transport appears as LNS events). The descriptions of these events are below.
1. ARCH PROCESS WAIT EVENTS
ARCH wait on ATTACH monitors the amount of time spent by all archiver processes to spawn an RFS connection.
ARCH wait on SENDREQ monitors the amount of time spent by all archiver processes to write the received redo to disk as well as open and close the remote archived redo logs.
ARCH wait on DETACH monitors the amount of time spent by all archiver processes to delete an RFS connection.

2. LGWR SYNC WAIT EVENTS
LGWR wait on ATTACH monitors the amount of time spent by all log writer processes to spawn an RFS connection.
LGWR wait on SENDREQ monitors the amount of time spent by all log writer processes to write the received redo to disk as well as open and close the remote archived redo logs.
LGWR wait on DETACH monitors the amount of time spent by all log writer processes to delete an RFS connection.

3. LGWR ASYNC WAIT EVENTS
LNS wait on ATTACH monitors the amount of time spent by all network servers to spawn an RFS connection. 
LNS wait on SENDREQ monitors the amount of time spent by all network servers to write the received redo to disk as well as open and close the remote archived redo logs.
LNS wait on DETACH monitors the amount of time spent by all network servers to delete an RFS connection.

LGWR wait on full LNS buffer monitors the amount of time spent by the log writer (LGWR) process waiting for the network server (LNS) to free up ASYNC buffer space. If buffer space has not been freed in a reasonable amount of time, availability of the primary database is not compromised by allowing the archiver process (ARCn) to transmit the redo log data. This wait event is not relevant for destinations configured with the LGWR SYNC=PARALLEL attributes.

WAIT EVENTS ON THE SECONDARY

RFS Write is the elapsed time for the write to a standby redo log or archive log to occur, as well as non-I/O work such as redo block checksum validation.

RFS Random I/O is the elapsed time for the write to a standby redo log to occur.

RFS Sequential I/O is the elapsed time for the write to an archive log to occur.
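To see how much time these events are actually consuming on your system, you can query v$system_event on the primary (and on the standby for the RFS events); a simple sketch:

SELECT event, total_waits,
       ROUND(time_waited / 100) AS seconds_waited  -- time_waited is in centiseconds
FROM   v$system_event
WHERE  event LIKE 'ARCH wait%'
OR     event LIKE 'LGWR wait%'
OR     event LIKE 'LNS wait%'
OR     event LIKE 'RFS%'
ORDER  BY time_waited DESC;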

Tuesday, December 20, 2011

SQL ordered by Version Count

This time I want to discuss the SQL ordered by Version Count section, found under SQL Statistics in the AWR report; unlike the familiar top wait events, it is rarely written about. It needs a little explanation, as most of us wonder what it means when our queries fall into this section.



In the image above we see that the query has a version count of 204 and more than 165 executions. The first thing to do is look at the SQL text and check whether bind variables are being used. If they are, then we need to look for the cause of the many child cursors for that query.

By definition, the SQL Ordered by Version Count section of AWR reports statements with more child cursors than a threshold value, which defaults to 20.
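You can also find such statements outside of AWR by querying v$sqlarea directly; a small sketch, using the same threshold of 20 as the AWR default:

SELECT sql_id, version_count, executions
FROM   v$sqlarea
WHERE  version_count > 20
ORDER  BY version_count DESC;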

There are many reasons why Oracle does not share the same statement; Oracle records the reason in the v$sql_shared_cursor view. In a 10.2.0.1 database, v$sql_shared_cursor exposes 53 reasons for unshareability.
V$SQL_SHARED_CURSOR explains why a particular child cursor is not shared with existing child cursors. Each column identifies a specific reason why the cursor cannot be shared.
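A sketch of checking one statement (the &sql_id substitution is a placeholder; only a few of the Y/N reason columns are shown here, out of the 53 available in 10.2.0.1):

SELECT child_number, bind_mismatch,
       optimizer_mode_mismatch, language_mismatch
FROM   v$sql_shared_cursor
WHERE  sql_id = '&sql_id'
ORDER  BY child_number;

Any column showing 'Y' for a child indicates a reason why that child could not share an existing cursor.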

I did the same and found that the reason for the SQL shown in the image above is a Bind Mismatch. I went through many blogs, including the one below from Jonathan, which beautifully explains the reasons behind bind mismatches for VARCHAR columns.


Unfortunately, the column in my query (in the image above) is a NUMBER field, so the explanation at the above link does not suit my situation. But I found that this is one of the busiest queries hit by my application, and the reason for the Bind Mismatch here is heavy activity in the shared pool: when a session attempts to validate a cursor and finds it has to wait for a pin, it may simply generate a new child rather than waiting, and that is what creates the new child cursors.

Other reasons might be: if the optimizer environments are different, you get different child cursors even when the final plan is the same (check the relevant column on v$sql and the contents of v$sql_optimizer_env); or one of the various bugs, present in versions from 10.2.0.1 onwards, that cause child cursors to be created unnecessarily.

Find below a short explanation of parent and child cursors from Steve Adams, which is very interesting in this context.


"For each SQL statement the library cache contains a "parent" cursor for the text of the SQL statement.

The parent cursor is comprised of a "handle" that can be looked up by hash value via the library cache hash table, and an "object" that contains pointers to each of its "child" cursors. Each child cursor is also comprised of a handle and an object. The child object is comprised of two heaps numbered 0 and 6. Heap 0 contains all the identifying information for a particular version of the SQL statement and heap 6 contains the execution plan. This distinction between parent and child cursors is maintained even when there is only one version of each SQL statement.

For parent cursors the convention used in the X$ tables is that the parent address is the same as the handle address. V$OPEN_CURSORS, V$SQL and so on are only interested in child cursors and so they exclude parent cursors by requiring that the two addresses are different. The columns that you mentioned are of course the two addresses."
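You can observe this convention yourself; a quick sketch (the &sql_id substitution is a placeholder) listing the parent handle alongside each child:

SELECT address AS parent_handle,
       child_address, child_number
FROM   v$sql
WHERE  sql_id = '&sql_id'
ORDER  BY child_number;

Every row returns the same parent_handle while each child has its own child_address, which is exactly the two-address distinction Steve describes.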


Friday, December 9, 2011

Top 5 Timed Foreground Events - direct path read temp

Direct Path Read Temp

When a session reads buffers from disk directly into the PGA, the wait is on direct path read temp. This is closely related to the direct path read wait. If the I/O subsystem doesn’t support asynchronous I/Os, then each wait corresponds to a physical read request. If the I/O subsystem supports asynchronous I/O, then the process overlaps read requests with processing the blocks already in the PGA. When the process attempts to access a block in the PGA that has not yet been read from disk, it issues a wait call and updates the statistics for this event. So, the number of waits is not always the same as the number of read requests.
Similarly, the direct path write temp wait event and direct path write wait event can occur when a process is writing buffers directly from PGA. The process will wait on this event for the write call to complete. Operations that could perform direct path writes include when a sort goes to disk, parallel DML operations, direct-path INSERTs, parallel create table as select, and some LOB operations. Like direct path reads, the number of waits is not the same as number of write calls issued if the I/O subsystem supports asynchronous writes. The session will wait if it has processed the buffers in the PGA and can’t continue work until an I/O request completes.
Causes of the direct path read temp wait
The following situations can lead to the direct path read wait event and the direct path read temp wait event:
- High disk sorts – If the sorts are too large to fit in memory and get sent to disk, this wait can occur.
- Parallel slaves – Parallel slaves are used for scanning data, or parallel DML may be used to create and populate objects. These may lead to the direct path read wait and direct path write wait respectively.
- Direct path loads – The direct path API is used to pass data to the load engine in the server and can cause the related direct path write wait.
- Server process ahead of I/O – The server process is processing buffers faster than the I/O system can return them. This can indicate an overloaded I/O system.
- Data warehouse – Sorts in a data warehouse environment may always go to disk, leading to high waits on direct path read temp and/or direct path write temp.
- Hash area size – For query plans that call for a hash join, excessive I/O can result from HASH_AREA_SIZE being set too small.
One can go down the AWR report to the IO Stats section to see what is causing these wait events. Under IO Stats, in the IO Stat by File Type Summary, we see that Temp is at the top, with the major volume of data writes and reads.


- High disk sorts – The disk sort distribution can be checked. It is recommended to use pga_aggregate_target; this area is used both for sorts and for the hash join area. If possible, size the PGA larger to fit more in-memory sorts, reducing the disk sorts. memory_target can be increased if required.
- Full table scans – If tables are defined with a high degree of parallelism, the optimizer leans towards using full table scans with parallel slaves. For a direct path read, locate the object being loaded. Consider using disk striping or Automatic Storage Management (ASM), which can stripe for you.
- Parallel DML – Check the I/O distribution across the disks and make sure your I/O is configured for the parallelism being used. Verify that the parameter DISK_ASYNCH_IO is set to true.
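To check whether sorts are actually spilling to disk, a quick sketch against v$sysstat (the 2G value in the ALTER SYSTEM is a placeholder, not a recommendation):

SELECT name, value
FROM   v$sysstat
WHERE  name IN ('sorts (memory)', 'sorts (disk)');

-- If 'sorts (disk)' is high relative to 'sorts (memory)', consider raising the PGA target:
ALTER SYSTEM SET pga_aggregate_target = 2G SCOPE=BOTH;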
The direct path read temp wait is most often encountered when the PGA cannot support the size of the sorts. The closely related wait events of direct path read, direct path write temp, and direct path write can occur due to parallel operations, direct-path inserts, and overloaded I/O. But tuning the PGA, the I/O placement, and the SQL itself can reduce or eliminate these waits.