
Tuesday, April 17, 2012

Top 5 Timed Foreground Events - Buffer Exterminate

Buffer Exterminate - The buffer exterminate wait event occurs under Oracle's Automatic Memory Management (AMM) when the MMON process shrinks the data buffer cache to re-allocate that RAM to another SGA region. Experience indicates that AMM resize operations can hurt overall database performance, especially in OLTP environments, so you may want to consider turning off AMM, which relieves the buffer exterminate waits, and sizing your SGA regions manually.


If you see this event in the Top 5 Timed Events, look into v$sga_resize_ops and v$memory_resize_ops to see how often resize operations are occurring and whether they are affecting the performance of your database. If you see many of these operations, especially during your database's peak hours, turn off the feature and adjust the corresponding SGA and PGA sizes manually.
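If you do decide to disable AMM, a minimal sketch of the change might look like the following (the sizes here are placeholders, not recommendations; choose values appropriate to your workload):

-- Sketch only: disable AMM and size the SGA and PGA manually.
ALTER SYSTEM SET memory_target = 0 SCOPE = SPFILE;
ALTER SYSTEM SET sga_target = 8G SCOPE = SPFILE;
ALTER SYSTEM SET pga_aggregate_target = 2G SCOPE = SPFILE;
-- Restart the instance for the SPFILE changes to take effect.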



If you want to analyze Oracle's use of memory and look at the various memory resizing operations, you can use the V$MEMORY_RESIZE_OPS view. This view contains a list of the last 800 memory resize requests handled by Oracle. Here is an example:

SELECT parameter, initial_size, target_size, start_time
  FROM v$memory_resize_ops
 WHERE initial_size >= 0
   AND final_size >= 0
 ORDER BY parameter, start_time;

This shows that Oracle has made a number of changes to the database cache and the shared pool over a fairly short period of time. These changes often decrease as the database stays up longer, and you will often see changes as the load profile of the database shifts, say from report-heavy to OLTP-heavy.

Please find below the various views and their descriptions for checking this information in the database.


V$MEMORY_DYNAMIC_COMPONENTS -- Displays information on the current size of all automatically tuned and static memory components, with the last operation (for example, grow or shrink) that occurred on each.

V$SGA_DYNAMIC_COMPONENTS -- Displays the current sizes of all SGA components, and the last operation for each component.

V$SGA_DYNAMIC_FREE_MEMORY -- Displays information about the amount of SGA memory available for future dynamic SGA resize operations.

V$MEMORY_CURRENT_RESIZE_OPS -- Displays information about memory resize operations that are currently in progress. A resize operation is an enlargement or reduction of the SGA, the instance PGA, or a dynamic SGA component.

V$SGA_CURRENT_RESIZE_OPS -- Displays information about dynamic SGA component resize operations that are currently in progress.

V$MEMORY_RESIZE_OPS -- Displays information about the last 800 completed memory component resize operations, including automatic grow and shrink operations for SGA_TARGET and PGA_AGGREGATE_TARGET.

V$SGA_RESIZE_OPS -- Displays information about the last 800 completed SGA component resize operations.
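For a quick snapshot of the current component sizes and the free SGA memory using the views above, a simple sketch like this can be run:

select component, current_size/1024/1024 as current_mb,
       min_size/1024/1024 as min_mb, last_oper_type
  from v$memory_dynamic_components
 where current_size > 0
 order by current_size desc;

select current_size/1024/1024 as free_sga_mb
  from v$sga_dynamic_free_memory;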

Tuesday, February 28, 2012

Top 5 Timed Events - gc cr failure

gc cr failure - This wait event is triggered when a CR (Consistent Read) block is requested from the holder of the block and a failure status message is received. This happens when there are unforeseen events such as a lost block, a checksum error, or an invalid block request, or when the holder cannot process the request. One will usually see multiple timeouts on the placeholder wait event gc cr request before receiving the gc cr failure event. One can query the system statistics view v$sysstat for the 'gc blocks lost' and 'gc claim blocks lost' statistics.
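A quick check of those statistics (a simple sketch; run it on each instance) looks like this:

select name, value
  from v$sysstat
 where name in ('gc blocks lost', 'gc claim blocks lost');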

Failure is not an option in cluster communications, because lost messages or blocks may potentially trigger node evictions.

 
In our case this wait event was due to gc buffer busy: the node holding the requested block was busy and could not process the request.

Let us look at how Consistent Read (CR) requests are handled in RAC, to clarify why the nodes get busy fulfilling these requests. When an instance needs to generate a CR version of a current block, the block can be either in the local or a remote cache. If it is remote, the LMS (Lock Manager Server) process on the other instance will try to create the CR block; if it is local, the foreground process executing the query performs the CR block generation. When a CR version is created, the instance needs to read the transaction table and undo blocks from the rollback/undo segments referenced in the active transaction table of the block. Sometimes this cleanout/rollback process causes several lookups of remote undo headers and undo blocks. These remote undo header and undo block lookups result in gc cr request waits, and since undo headers are frequently accessed, buffer waits may also occur.
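To get a rough feel for how much CR fabrication an instance is doing, one can look at CR-related statistics in v$sysstat; a sketch (statistic names assumed from common releases) is:

select name, value
  from v$sysstat
 where name in ('CR blocks created',
                'data blocks consistent reads - undo records applied');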

We got rid of these wait events by reducing the traffic between the nodes: the applications that depend on specific tables were pointed to specific nodes.

Friday, February 10, 2012

Top 5 Timed Foreground Events - Library Cache Lock & Library Cache Pin



A library cache lock is obtained on database objects referenced during parsing or compilation of SQL or PL/SQL statements (table, view, procedure, function, package, package body, trigger, index, cluster, or synonym). The lock is released at the end of the parse or compilation.

I am not discussing the theory of this wait event further here, as I covered it in detail in a previous post. You can find a much more detailed explanation of the possible reasons and solutions at the link below:

 http://orakhoj.blogspot.com/2011/10/top-5-timed-foreground-events-library_17.html

Here I am just discussing a recent issue we faced in our load test environment. When we ran the load with X amount of SGA, we saw DB Time increase to a very high value and found Library Cache Lock and Library Cache Pin as the top two wait events.

Solution: We increased the SGA to Y amount and re-ran the same load, which gave wonderful results. The DB Time dropped considerably and Library Cache Lock disappeared from the top wait events. So this was the practical implementation of the solutions discussed in the link above: increasing the Shared Pool, via a larger SGA, got rid of this wait event.
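For anyone hitting the same symptom, a quick sketch for spotting the sessions currently waiting on these events (using only v$session) is:

select sid, serial#, sql_id, event, seconds_in_wait
  from v$session
 where event in ('library cache lock', 'library cache pin');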


Monday, January 30, 2012


TOP 5 Timed Events - gc cr block lost / gc current block lost


Global cache lost block statistics ("gc cr block lost" and/or "gc current block lost"), for each node in the cluster as well as aggregated across the cluster, represent a problem or inefficiency in packet processing for the interconnect traffic. These statistics should be monitored and evaluated regularly to guarantee efficient interconnect Global Cache and Enqueue Service (GCS/GES) and cluster processing. Any block loss indicates a problem in network packet processing and should be investigated.
The vast majority of escalations attributed to RDBMS global cache lost blocks can be directly related to faulty or misconfigured interconnects; "lost blocks" at the RDBMS level are responsible for 64% of such escalations.
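To monitor these statistics across the cluster, a simple sketch against gv$system_event (checking the lost-block wait events per instance) is:

select inst_id, event, total_waits, time_waited
  from gv$system_event
 where event in ('gc cr block lost', 'gc current block lost')
 order by inst_id, event;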



Misconfigured or Faulty Interconnect Can Cause: 
          Dropped packets/fragments
          Buffer overflows
          Packet reassembly failures or timeouts
          Ethernet flow control kicking in
          TX/RX errors
 “Lost Blocks”: NIC Receive Errors

Db_block_size = 8K
ifconfig -a:
eth0 Link encap:Ethernet  HWaddr 00:0B:DB:4B:A2:04
            inet addr:130.35.25.110  Bcast:130.35.27.255  Mask:255.255.252.0
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:21721236 errors:135 dropped:0 overruns:0 frame:95
            TX packets:273120 errors:0 dropped:0 overruns:0 carrier:0

“Lost Blocks”: IP Packet Reassembly Failures

            netstat -s
Ip:
   84884742 total packets received
   …
  1201 fragments dropped after timeout
   …
   3384 packet reassembles failed

A detailed explanation of this wait event can be found in the Metalink note "gc block lost diagnostics" [ID 563566.1].

Sunday, January 29, 2012


Top 5 Wait Events (RAC) - gc current block busy & gc cr block busy

gc current block busy - When a session needs a block in current mode, it sends a request to the master instance. The requestor eventually gets the block via cache fusion transfer. Sometimes, however, the block transfer is delayed, either because the block was being used by a session on another instance or because the holding instance could not write the corresponding redo records to the online redo log immediately.

One can use the session-level dynamic performance views v$session and v$session_event to find the programs or sessions causing the most waits on this event:

select a.sid, a.time_waited, b.program, b.module
  from v$session_event a, v$session b
 where a.sid = b.sid
   and a.event = 'gc current block busy'
 order by a.time_waited;

gc cr block busy - When a session needs a block in CR mode, it sends a request to the master instance. The requestor eventually gets the block via cache fusion transfer. Sometimes, however, the block transfer is delayed, either because the block was being used by a session on another instance or because the holding instance could not write the corresponding redo records to the online redo log immediately.

One can use the session-level dynamic performance views v$session and v$session_event to find the programs or sessions causing the most waits on this event:

select a.sid, a.time_waited, b.program, b.module
  from v$session_event a, v$session b
 where a.sid = b.sid
   and a.event = 'gc cr block busy'
 order by a.time_waited;




This event indicates significant write/write contention. If it appears in the Top 5 list of an AWR report, ensure that the log writer (LGWR) is tuned.
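A quick sketch for checking whether redo writes are slow (average waits from v$system_event; acceptable thresholds depend on your storage) is:

select event, total_waits,
       round(time_waited_micro / greatest(total_waits, 1)) as avg_wait_us
  from v$system_event
 where event in ('log file parallel write', 'log file sync');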

In our situation, planning appropriate application partitioning across instances avoided the contention. We will see more of the RAC related wait events in other posts.

Friday, January 13, 2012

gc current block 2 way - gc current block 3 way


Block-oriented waits are the most common of the cluster wait events. These wait event statistics indicate that the requested block was served from another instance. In a two-node cluster environment, a message is sent to the current holder of the block and the holder ships the block to the requestor. In a cluster environment with more than two nodes, the request for the block is routed to the holder through the resource master, which adds an additional message.
 
The average wait time and the total wait time should be considered when these particular waits are having a high impact on performance. Usually the root cause is either the interconnect or load issues from SQL executing against a large shared working set. The following are the most common block-oriented waits:

gc current block 2-way (2-node RAC environment)

gc current block 3-way (3 or more node RAC environment)
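To see how much time these events are costing, a simple sketch against v$system_event (covering the current-block events from this post; the analogous gc cr block 2-way/3-way events can be added the same way) is:

select event, total_waits, time_waited, average_wait
  from v$system_event
 where event in ('gc current block 2-way', 'gc current block 3-way')
 order by time_waited desc;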

Below is a pictorial description of the steps involved in the above wait events, in exclusive mode and shared mode.

gc current block 2-way (Exclusive Mode)

gc current block 2-way (Shared Mode)

gc current block 3-way (Exclusive Mode)

gc current block 3-way (Shared Mode)


More Events Shortly :)