
Tuesday, May 31, 2011

AWR Top 5 Timed Events - TCP Socket (KGAS)


TCP Socket (KGAS)


A session is waiting for an external host to provide requested data over a network socket. The time that this wait event tracks does not indicate a problem, and even a long wait time is not a reason to contact Oracle Support. It naturally takes time for data to flow between hosts over a network, and for the remote aspect of an application to process any request made to it. An application that communicates with a remote host must wait until the data it will read has arrived.


"The db session cannot proceed to do anything else until the external host provides the requested data over the network socket. "

KGAS => a component in the server (wait event) that handles TCP/IP sockets, present on Oracle 10.2+; it is used by packages such as UTL_TCP, UTL_SMTP, UTL_HTTP, ...
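To see which sessions are currently stuck on this event, a quick check against V$SESSION can help (a simple sketch; add or drop columns as needed):

-- Sessions currently waiting on 'TCP Socket (KGAS)'
select sid, username, state, seconds_in_wait
  from v$session
 where event = 'TCP Socket (KGAS)';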

We mostly see this event when there is a network delay between the application server and the database server. In our case we found that a report in an application that generates a huge amount of data was the cause of this wait event.

This wait event sometimes occurs when the mail server is down and a procedure tries to open a new connection to the SMTP server; the connection attempt takes 75 or 20 seconds (depending on the operating system) to time out.
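One way to keep such a session from hanging for the full OS-level TCP timeout is to pass a transfer timeout when the SMTP connection is opened. The sketch below is only an illustration: the host and domain names are made up, and the tx_timeout argument of UTL_SMTP.OPEN_CONNECTION mainly governs read/write waits, so depending on the version it may not shorten the initial connect phase.

declare
  l_conn utl_smtp.connection;
begin
  -- Open the SMTP connection with a 10 second timeout instead of
  -- waiting for the operating system's TCP timeout.
  l_conn := utl_smtp.open_connection(
              host       => 'mail.example.com',   -- hypothetical mail host
              port       => 25,
              tx_timeout => 10);
  utl_smtp.helo(l_conn, 'db.example.com');          -- hypothetical domain
  utl_smtp.quit(l_conn);
exception
  when utl_smtp.transient_error or utl_smtp.permanent_error then
    null;  -- log the mail server being unavailable instead of hanging
end;
/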

On 10g Release 2 there is a bug reported by Oracle Support: BUG:5490208 - TIME WAIT TCP SOCKET KGAS.

Tuesday, May 24, 2011

AWR Top 5 Timed Events - resmgr: cpu quantum

resmgr: cpu quantum -- This event shows that sessions are waiting for CPU allocation.



This event occurs when Resource Manager is enabled and is throttling CPU consumption. To reduce the occurrence of this wait event, increase the CPU allocation for the session's current consumer group. There are many bugs in 10g related to Resource Manager.
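For example, raising the level-1 CPU share of a consumer group could look roughly like the sketch below. The plan and group names (MYPLAN, OLTP_GROUP) and the new percentage are made up; on 10g the parameter is NEW_CPU_P1, while on 11g NEW_MGMT_P1 is used instead.

begin
  dbms_resource_manager.create_pending_area();
  -- Give the throttled consumer group a larger level-1 CPU share.
  dbms_resource_manager.update_plan_directive(
    plan             => 'MYPLAN',        -- hypothetical plan name
    group_or_subplan => 'OLTP_GROUP',    -- hypothetical consumer group
    new_cpu_p1       => 80);             -- new level-1 CPU percentage
  dbms_resource_manager.validate_pending_area();
  dbms_resource_manager.submit_pending_area();
end;
/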

Wait event 'resmgr: cpu quantum' is appearing in the top 5 wait events.

First of all it is important to understand that 'resmgr:cpu quantum' is a normal wait event used by Resource Manager to control CPU distribution. Whenever Resource Manager is in use this event will be seen in the database. It becomes a problem when it is one of the top wait events in the database.

I will try to illustrate it with an example. Let us assume the following multi-level plan.

SYS_GROUP 100% CPU_Level 1 - user TEST_SYS
OPER_GROUP 100% CPU_Level 2 - user TEST_OPER
LOW_GROUP 100% CPU_Level 3 - user TEST_LOW
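Such a plan could be created roughly as shown below. This is only a sketch: the plan name is made up, the consumer groups and the mapping of the TEST_* users to them are assumed to exist already, and Resource Manager also requires a directive for OTHER_GROUPS, which is placed at level 4 here.

begin
  dbms_resource_manager.create_pending_area();
  dbms_resource_manager.create_plan(
    plan    => 'MULTI_LEVEL_PLAN',                -- hypothetical plan name
    comment => 'Example multi-level CPU plan');
  -- 100% of CPU at level 1 for SYS_GROUP
  dbms_resource_manager.create_plan_directive(
    plan => 'MULTI_LEVEL_PLAN', group_or_subplan => 'SYS_GROUP',
    comment => 'Level 1', cpu_p1 => 100);
  -- 100% of the remaining CPU at level 2 for OPER_GROUP
  dbms_resource_manager.create_plan_directive(
    plan => 'MULTI_LEVEL_PLAN', group_or_subplan => 'OPER_GROUP',
    comment => 'Level 2', cpu_p2 => 100);
  -- 100% of the remaining CPU at level 3 for LOW_GROUP
  dbms_resource_manager.create_plan_directive(
    plan => 'MULTI_LEVEL_PLAN', group_or_subplan => 'LOW_GROUP',
    comment => 'Level 3', cpu_p3 => 100);
  -- Mandatory catch-all directive
  dbms_resource_manager.create_plan_directive(
    plan => 'MULTI_LEVEL_PLAN', group_or_subplan => 'OTHER_GROUPS',
    comment => 'Level 4', cpu_p4 => 100);
  dbms_resource_manager.validate_pending_area();
  dbms_resource_manager.submit_pending_area();
end;
/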

The following query will give you the timings for the wait event 'resmgr:cpu quantum' on a per-user basis:

select s.username,
       se.event,
       sum(total_waits)  total_waits,
       sum(time_waited)  total_time_waited,
       avg(average_wait) avg_wait
  from v$session_event se, v$session s
 where se.event = 'resmgr:cpu quantum'
   and se.sid = s.sid
 group by s.username, se.event;

USERNAME   EVENT                TOTAL_WAITS TOTAL_TIME_WAITED AVG_WAIT
---------- -------------------- ----------- ----------------- --------
TEST_SYS   resmgr:cpu quantum            13                11     0.88
TEST_OPER  resmgr:cpu quantum            26               108     4.17
TEST_LOW   resmgr:cpu quantum             8             62019  7752.33

This output clearly illustrates that although user TEST_SYS also records waits on 'resmgr:cpu quantum', its total time waited is very low. The next group, TEST_OPER, which receives 100% CPU at level 2, gets more waits and the waits last longer. And the third group, TEST_LOW, is not getting that many 'resmgr:cpu quantum' wait calls, basically because there is an intelligent system behind the polling, but as the output clearly shows its wait time is much, much higher.

So, when you receive waits for 'resmgr:cpu quantum':

1. Check the AWR/Statspack reports to see whether it appears as one of the top wait events.

2. Verify the wait event on a user basis to see which user has the most wait timings and compare this to the consumer group directives. There may be a very logical explanation.

3. If the above is verified, check the following

3.1 NUMA architecture ?

Starting with 10.2.0.4, NUMA is enabled by default in the database. So when a NUMA-enabled machine is detected, this will automatically be enabled at the Oracle level as well; however, if the system has not specifically been set up and tuned to make use of this architecture, Resource Manager may cause unnecessary waits for 'resmgr:cpu quantum'.

Download and apply the patch for BUG:8199533 to disable NUMA optimization at the database layer.

As a short-term solution, or as a test, you can set the following initialization parameter:
_enable_NUMA_optimization=FALSE
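Since this is an underscore (hidden) parameter, it has to be quoted when set; something along these lines, followed by an instance restart:

alter system set "_enable_NUMA_optimization" = FALSE scope = spfile;
-- restart the instance for the change to take effect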

However, Oracle Customer Support recommends using the fix for BUG:8199533 to disable NUMA. The patch is rolling upgradeable.

3.2 Version pre-11gR2 ?

BUG:7510766: Resource Manager is over throttling

This fix is practically a prerequisite for most versions. It is fixed in 11g Release 2 and planned to be included in patchset 10.2.0.5; all lower versions require patching.

3.3 Version pre-11gR1 ?

Bug:8221960: Waits for "RESMGR CPU QUANTUM"

Make sure to obtain a merge fix with the bug mentioned in 3.2 above (#7510766).

3.4 Is the wait event observed during the time RMAN is in use ?

Bug:6874858: Poor performance with Resource Manager when RMAN is running

Fixed in 11g Release 2 and planned to be included in patchsets 10.2.0.5 and 11.1.0.8.

3.5 Is the wait event observed during the time a datafile is added ?

Bug:4602661: Adding a datafile hangs other sessions and new connections with resource manager

Affects all versions pre-10.2.0.3
Typically this is associated with another wait event: 'ksfd: async disk IO'

Friday, May 6, 2011

AWR TOP 5 Timed Event Analysis - enq: TX – index contention

enq: TX – index contention

Waits for TX in mode 4 also occur when a transaction inserting a row in an index has to wait for the end of an index block split being done by another transaction.

This type of TX enqueue wait corresponds to the wait event enq: TX – index contention. This wait event shows one process waiting for another to complete an index block split.

If there is an index on a column that is being populated from a call to sequence.nextval on insert then you will most likely see many of these waits if your system inserts into the table very frequently from a number of concurrent sessions.
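One way to see which index (or its table) is involved is to look at the object the waiting sessions were working on. A sketch using V$ACTIVE_SESSION_HISTORY (which requires the Diagnostics Pack) joined to DBA_OBJECTS could look like this; note that the actual event name uses a plain hyphen:

-- Objects most frequently involved in index contention waits
select o.owner, o.object_name, o.object_type, count(*) waits
  from v$active_session_history ash, dba_objects o
 where ash.event = 'enq: TX - index contention'
   and o.object_id = ash.current_obj#
 group by o.owner, o.object_name, o.object_type
 order by waits desc;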

One solution for this problem is to identify which index is being waited on and, on a single instance, try to recreate it as a reverse key index. In RAC, increase the sequence cache value to a few hundred or a few thousand so that it reduces contention for the same index leaf blocks across the participating instances.
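As an illustration only (the owner, index, table, column and sequence names below are made up), the two options would look roughly like this:

-- Single instance: recreate the hot index as a reverse key index
drop index app.orders_id_idx;
create index app.orders_id_idx on app.orders (order_id) reverse;

-- RAC: increase the sequence cache so each instance inserts into a
-- different range of key values and therefore different leaf blocks
alter sequence app.orders_seq cache 1000;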


In our case we were able to get rid of this wait by increasing the cache value for the sequences that populate indexed columns.