ora_java: ITL lock analysis

This is something I wrote years back. It was wrote for oracle8i. But the principle should still apply to 10g.

Introduction
To take advantage of the processing power of multiple CPUs in SMP machine, today's applications should parallelize application, especially for batch job. The batch application needs to spawn multiple threads/processes, each process/thread use same logic and code to deals with different subset of work set. This architecture fully utilizes the cpus in the hardware box. In the meantime, it introduces various contentions, which should be considered and monitored during design phase and implementation phase. Among them, how to avoid lock is one of top issues designer/architect should think about. Oracle is known for its superb row level transaction lock mechanism, which minimizes the locking scope and makes developer's life much easier. However, given RDBMS is such a complicated software, it is understandable that it has lots of locking mechanism other than row level transaction lock. Unfortunately, some of the locks are in block level rather than row level, which means they lock more rows, hence application gets more chances to be blocked and deadlock is more likely to occur. ITL lock is such a block level lock. It is common in multi-threaded application, though we rarely see it in other applications. This article will discuss the cause of ITL lock, how to identify it and how to avoid it.

1. What is ITL?
In Oracle's data block, there is a data structure called transaction header. The transaction header consists of 24 bytes of control information followed by an array of one or more transaction slots.
This array is called the interested transaction list (ITL), and each slot may contain an interested transaction entry (ITE). The initial number of ITL slots is set by the INITRANS parameter for the segment
According to Steve Adams' excellent website http://www.ixora.com.au, this is what the ITL looks like in a block dump.

Itl Xid Uba Flag Lck Scn/Fsc
0x01 xid: 0x0000.00f.00000005 uba: 0x0040000b.0000.12 C--- 0 scn 0x0000.000000ac
0x02 xid: 0x0002.003.00000027 uba: 0x00c00004.0019.01 --U- 1 fsc 0x0000.00021b7c
Each ITL entry/slot contains the following fields.

transaction id 8 bytes This is comprised of the rollback segment number,
the slot number in the transaction table of that rollback segment,
and the number of times use of that transaction table has wrapped .

undo block address 8 bytes The location of the undo for the most recent change to this block by this transaction.
This is comprised of the DBA of the rollback segment block ,
the sequence number, and the record number for the change in that undo block

flags 1 nibble ---- = transaction is active, or committed pending cleanout
C--- = transaction has been committed and locks cleaned out
-B-- = this undo record contains the undo for this ITL entry
--U- = transaction committed (maybe long ago); SCN is an upper bound
both fast commit and delayed cleanout can set this flag
---T = transaction was still active at block cleanout SCN

locks 3 nibbles The number of row-level locks held in the block by this transaction.

SCN or free
space credit 6 bytes if the transaction has been cleaned out, this is the commit SCN or an upper bound thereof.
Otherwise the leading two bytes contain the free space credit for the transaction -
that is, the number of bytes freed in the block by the transaction.

Every transaction that doing INSERT/DELETE/UPDATE rows on this block needs and only needs one ITL slot in this block. When the transaction is committed, the ITL slot is released and is able to serve other transactions. When the transaction is rolled back, if the ITL slot was newly allocated for this transaction, the space (24 bytes) it takes will be returned as free space, otherwise, just as committed transaction, this ITL slot will be able to serve other transaction. If all existing ITL slots are being used and there is a transaction which hasn't have a corresponding ITL slot in this block wants to INSERT/DELETE/UPDATE the rows in this block, Oracle will try to find free space to use as this transaction's ITL. If it fails to find 24 bytes free space, lock occurs. The session that owns the transaction will wait for a one of the transactions that are holding ITL slots in this block. To make things worse, even when some of the transactions holding ITL slots commit/rollback since, the blocked session has to wait as long as the transaction he is waiting for hasn't committed/rollback. Obviously, this lock can lead to deadlock. And some of the facts are:
a) Index won't cause ITL shortage. Due to the structure of index, when there is no space, the block will get split. Interestingly, the default initrans for index is 2, don't be fooled to think it indicates index is easier to cause ITL contention. It is because it has to keep an extra slot for recursive transaction to split the block
b) Insert is unlikely to cause contention. If the block doesn't have free space, it will be moved out of the freelist. Logically, Oracle can insert a record into any block, whilst Oracle can't choose block when the dml is delete or update no matter if there isn't enough free space for a new ITL slot.
c) Delete and update can cause ITL lock. Among them, update is by far the most likely culprit. Frequently update can eat up the free space (which is 10% of block by default), and eventually cause Oracle can't find place for ITL.

This is a simple example of how to produce an ITL lock/deadlock.
a) create table test_itl(a number) pctfree 0;
b) insert into test_itl select rownum from all_objects where rownum<2000;
commit;
c) select min(a) from test_itl where dbms_rowid.rowid_block_number(rowid)<>
(select dbms_rowid.rowid_block_number(rowid) from test_itl where a=1);
d) open a session session1 in sqlplus;
update test_itl
set a=1
where a=1;
e) open another session session2 in sqlplus:
update test_itl
set a=
where a=;
f) go back to session 1,
update test_itl
set a=+1
where a=+1;
You'll see the session hang
g) go back to session 2,
update test_itl
set a=2
where a=2;
You'll see the session hang.
Then after a while(3 seconds), you'll find a new file is generated under $ORACLE_UDUMP, indicating a deadlock has occured.

2. How to identify ITL lock problem:
ITL lock is an enqueue wait. Every time it occurs, it will increment the enqueue statistics in Oracle's v$ views.
However, there are lots of reasons can cause enqueue wait, for instance, row level lock. We don't know if the enqueues we are seeing in v$ view are really caused by ITL lock. Fortunately, deadlock, which is worst case of lock, can help us identify the problem. It makes sense that if you do have severe ITL contention, you'll see some deadlock. Deadlock ORA-00060 will trigger Oracle to dump a trace file under $ORACLE_UDUMP, at the beginning of the trace file, you can see the offending sql statement. like:
*** SESSION ID:(10.20) 2004-05-21 10:37:57.716
DEADLOCK DETECTED
Current SQL statement for this session:
UPDATE TEST_ITL SET A=:b1 WHERE A = :b1

As we discussed earlier, ITL lock tends to be caused by update statement

Then below offending sql statement, there is deadlock graph, like:

Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-0010001c-000067a1 33 129 X 153 212 S
TX-003a0048-0000616e 153 212 X 94 170 S
TX-003c0038-00007f15 94 170 X 33 129 S

Now here, you should pay attention to couple of things:
a) Resource Name: it usually is either started with TX or TM. Normally, the deadlock caused by no index on foreign key waits for TM lock. So here we see it is TX, we can rule out foreign key problem.
b) Blocker(s) holds and Waiter(s) waits. We know for deadlock caused by row level lock, both holds and waits are exclusive mode, which is 'X'. Here holds mode is exclusive ('X') and waits mode is share ('S'). There for, row level lock caused deadlock is ruled out.

According Metalink Note:62354.1, there are 3 scenarios can cause this TX share mode waits on TX exclusive mode.
a) ITL shortage,
b) Waits due to Unique or Primary Key Constraint enforcement
For example, if a session deleted a parent record but hasn't commit, then the other session trying to insert a new record in child table referencing to the deleted row by the first session in parent table has to wait for first session to commit or rollback.
c) Waits due to rows being covered by the same BITMAP index fragment.

Actually, I believe there could be other reasons cause this lock scenario. For example if more than one sessions are inserting into a table with a primary key on a sequence generated id, when a session is in the process to split the index branch lock (which should be a right most branch block), other sessions want to write that index branch block have to wait in TX shared mode. But obviously this situation won't lead to deadlock. To ensure this is an ITL shortage caused deadlock, we have to look further into the trace file.

If you search "type: 17" in the trace file, sometimes the search fails, but sometimes you'll find the block dump on the block the lock occurred, like:
SO: 63d9cb508, type: 17, owner: 640f6f0d0, pt: 0, flag: INIT/-/-/0x00
(buffer) (CR) PR: 0x62bd55878 FLG: 0x100400
lock rls: 0x0, class bit: 0x0
kcbbfbp: [BH: 0x63bdc7210, LINK: 0x63d9cb540]
where: kdswh05: kdsgrp, why: 0
buffer tsn: 18 rdba: 0x044074bc (17/29884)
scn: 0x0000.119d2bc2 seq: 0x00 flg: 0x00 tail: 0x2bc20600
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data

Block header dump: 0x044074bc
Object id on Block? Y
seg/obj: 0xcb61 csc: 0x00.119d2bc2 itc: 11 flg: - typ: 1 - DATA
fsl: 0 fnx: 0x0 ver: 0x01

Itl Xid Uba Flag Lck Scn/Fsc
0x01 xid: 0x003c.045.00007dc8 uba: 0x0a888023.00e9.17 C--- 0 scn 0x0000.11920ffc
0x02 xid: 0x000f.04b.000067aa uba: 0x0104b523.0078.08 C--- 0 scn 0x0000.11920f9a
.....
0x0a xid: 0x000f.000.000067ed uba: 0x0104b6f9.0078.12 C--- 0 scn 0x0000.11921023
0x0b xid: 0x003c.054.00007dac uba: 0x0a8882ab.00e9.22 C--- 0 scn 0x0000.11921084

data_block_dump
===============
tsiz: 0x1ec8
hsiz: 0x64
pbl: 0x5e72a0134
bdba: 0x044074bc
flag=-----------
ntab=1
nrow=41
frre=-1
fsbo=0x64
fseo=0x6b
avsp=0x7
tosp=0x7
0xe:pti[0] nrow=41 offs=0
0x12:pri[0] offs=0x1e2f
0x14:pri[1] offs=0x1d89
...
0x5e:pri[38] offs=0x249
0x60:pri[39] offs=0x142
0x62:pri[40] offs=0x6b
block_row_dump:
tab 0, row 0, @0x1e2f
tl: 153 fb: --H-FL-- lb: 0x0 cc: 12
col 0: [ 5] c4 08 25 0d 3c
col 1: [ 4] 49 4e 53 54
col 2: [ 5] c4 06 32 5c 55
col 3: [ 1] 45
col 4: [27]
58 45 5f 49 4e 53 54 52 55 4d 45 4e 54 2e 4c 4f 4e 47 5f 44 45 53 43 5f 54
58 54
col 5: [23]
4d 41 4e 45 4c 45 20 42 41 59 20 56 45 4e 54 55 52 45 53 20 49 4e 43
col 6: [23]
4d 41 4e 45 4c 45 20 42 41 59 20 56 45 4e 54 55 52 45 53 20 49 4e 43
col 7: [ 7] 78 66 0c 0e 0c 21 27
col 8: [13] 42 45 20 49 4e 53 54 52 55 4d 45 4e 54
col 9: [10] 32 30 33 30 33 33 35 38 35 35
col 10: [13] 42 45 20 49 4e 53 54 52 55 4d 45 4e 54
col 11: [ 7] 78 67 06 0c 18 33 04
tab 0, row 1, @0x1d89
tl: 166 fb: --H-FL-- lb: 0x0 cc: 12
col 0: [ 5] c4 08 25 0d 45
col 1: [ 4] 49 4e 53 54
col 2: [ 5] c4 06 32 5c 32
col 3: [ 1] 45
col 4: [27]
58 45 5f 49 4e 53 54 52 55 4d 45 4e 54 2e 53 48 52 54 5f 44 45 53 43 5f 54
58 54
col 5: [30]
43 41 4c 4c 20 4e 41 53 44 41 51 20 31 30 30 20 20 20 4a 41 4e 20 30 32 37
20 2a 2a 2a 2a
col 6: [30]
43 41 4c 4c 20 4e 41 53 44 41 51 20 31 30 30 20 20 20 4a 41 4e 20 30 32 37
20 2a 2a 2a 2a
col 7: [ 7] 78 66 0c 0e 0c 21 27
col 8: [13] 42 45 20 49 4e 53 54 52 55 4d 45 4e 54
col 9: [ 9] 33 36 32 31 33 36 30 38 37
col 10: [13] 42 45 20 49 4e 53 54 52 55 4d 45 4e 54
col 11: [ 7] 78 67 03 04 09 2a 2e
tab 0, row 2, @0x1cfa
...
end_of_block_dump

Here it tells us there are 11 ITL slots (itc:11) in this block and free space is 7 (tosp=0x7). Now if there is 12th transaction try to insert/update/delete in this block, there is no space to allocate a new ITL slot. Therefore, it is very likely an ITL shortage caused deadlock. On the other hand, because this is a CR read block image from the session that detect the deadlock, means the changes on the block made by other transactions yet to commit when the dump occurred aren't reflected in this block dump. It is possible that you can see lots of free space in block dump, but actually there is no available space in current mode block, mainly caused by update on rows increase the row size greatly. That is why the mode of each ITL slot is 'C', means they are committed and reusable. As we mentioned earlier, sometimes Oracle doesn't dump the offending block at all. If that is the case, we can dump the block by our own. If the offending sql is an insert statement, we can almost rule out ITL shortage. If it is update or delete, we are able to know which row it intended to change even if the sql is using bind variable, because you can find the value of bind variable in cursor section (search for Cursor Dump)of the dump file. By knowing the record, we can know the file number block number by select dbms_rowid.rowid_block_number(rowid) and dbms_rowid.rowid_relative_fno(rowid) on that row. Then we dump the block to see if there is plenty of free space. Like: alter system dump datafile block min block max .

Another way to identify ITL lock/deadlock is to look at the wait event history in the dump file which is generated by deadlock, if it previously has waited for ITL enqueue, then likely the deadlock is still on ITL.

The above talked about how to identify if the deadlock is caused by ITL shortage. A DBA should be proactive to get the information how likely the ITL problem could occur. If something is near broken, then DBA should take action immediately rather than waiting for lock/deadlock to occur. The idea is to get the average number ITL slots in each block and average free space, compare them with the initrans and initial free space (which is *pctfree):

select (-66-(2+avg_row_len)*(num_rows/blocks)-avg_space)/24 avg_itl_slots,
avg_space,
ini_trans,
*pct_free/100 ini_free_space
from user_tables order by avg_space

avg_itl_slots is actually the concurrent transactions high watermark on the block. If you find a table with little average space and avg_itl_slots is much bigger than ini_trans, the table tends to cause or already caused ITL contention. If you don't set freelists correctly and you populate a table in parallel, you'd see a very high avg_itl_slots, it is fairly safe and you shouldn't worry about it. Also be noted that the formula to calculate avg_itl_slots only can be used when the table's statistics was generated by analyze utility. If the statistics was generated by dbms_stats package, the formula should be modified to (-66-(2+3+avg_row_len)*(num_rows/blocks)-avg_space)/24, since dbms_stats doesn't include 3-byte row header when calculate average row length.

3. How to prevent ITL lock?

In previous section, we discussed for a DBA how to identify ITL lock. Here we'll talk about how to fix the problem and how to prevent the problem in first place, i.e. in design and implementation phase.

a) To be perfectly safe, you can set initrans to be the small one of the number of possible concurrent processes accessing each block and the number of rows each block could possibly hold. For example, you have a pretty fat table with 400 bytes for each row (you can get it by querying select avg_row_len from user_tables), and you block size is 8192, you don't need set initrans to be anything more than 20. Since 8i, you can change initrans on the fly. However, the setting will only be effective for the new allocated blocks, i.e., your existing blocks of the table will continue to suffer ITL shortage. So, you should:
alter table initrans
alter table move
rebuild all the indices of the table
analyze table/index appropriately.

The problem of this solution is you may waste some space and if clients decides to increase application's parallel degree, they have to remember alter initrans and rebuild table/indices.

b) Increase pctfree. Oracle would have more chances to succeed in getting free space in block when is asked for new ITL by transaction. Same as bouncing initrans, you need to rebuild table and indices after increasing pctfree. This solution has the same problem as increasing initrans.

c) Commit more frequently. Let's say your application spawn 20 threads/processes to do some work. The table has 100K blocks, each block hold average 100 rows. If the each session commit after updating 100 rows and The chance of 20 sessions updating a block at the same time will be way lower than each session commit every 1000 rows. There has already been a lot of discussion of how often the commit should be, in perspective of performance and data integrity. To me, I think it is up to your application. You should benchmark your application. If your benchmark shows 1000 rows/commit doesn't buy you anything than 100 rows/commit (which I think would be the case in most applications), barring the test result is affected by ITL lock, why would you insist on 1000 rows/commit? The drawback of this solution is there is no guarantee there would be no ITL shortage.

d) Reduce the parallel degree of application. It is obvious that 30 concurrent sessions are less likely to mess up with each other in on block than 60 concurrent sessions. Again same as in the discussion of commit frequency, you should do test in implementation phase. If the result turns out to be using 30 concurrent sessions isn't slower than 60 concurrent sessions, and you are sure ITL lock is a non-factor in the test, why would you insist on using 60 threads/processes?

e) Make each thread/process deal with different blocks. Normally, when assign work set, we only need to be sure it won't cause row level lock. Ideally, Application should try to make each thread/process update on different block. Thus, ITL lock will be eliminated, and the application would be more OPS/RAC ready. Of course, an application based on Oracle's proprietary block structure(rowid) is too extreme from software design perspective. On the other hand, depends on application, sometimes it is just impossible. For instance, it is apparent that you can't do it if your application needs to update a parent table and its child table as well. Since if you somehow manage to partition your application to make sure each parent table block is only updated by one thread, how can you make sure child rows of this parent rows in this block, which surely are scattered in lots of different blocks, are updated by one thread in the meantime? Another idea is to try to partition the related the tables. It is too complex and more importantly, just the same reason as threading by block, it isn't feasible in many occasions.

4. Miscellaneous.
Does Index Organized Table have ITL shortage problem? The answer is yes. You might be confused because we just claimed that index doesn't have this problem due to block split. IOT is a table organized like an index, it will do block split when there is no free space, how can it have ITL problem? Well, if the DML on IOT change the primary key, it'll incur block split should there is no free space. But considering the situation that you have a IOT which has five columns, the first four of them are primary key, if you only update the fifth column, it is just the do a normal update on a normal table, thus could cause wait on ITL, hence could could deadlock due to ITL shortage.

5. Conclusion.
The article shows how ITL shortage can cause lock, even deadlock, and how to eliminate it in design and implemention and how to identify it in production.

8 comments:

Unknown said...: Can we apply the same formula also for tables located in ASSM tablespaces?

Your post was very analytical and well explained

Thank you
khair; August 27, 2009 at 10:13 PM
Anonymous said...: I am final, I am sorry, but it does not approach me. There are other variants?; October 13, 2010 at 12:15 PM
Anonymous said...: You are not right. I suggest it to discuss.; October 15, 2010 at 7:41 AM
Anonymous said...: [url=http://dcxvssh.com]HpNCnI[/url] , cepvbStc - http://yuxeflk.com; December 12, 2012 at 4:52 AM
Anonymous said...: It's enormous that you are getting thoughts from this article as well as from our dialogue made at this place.

My website bucket truck for sale
Stop by my web blog :: bucket truck; February 3, 2013 at 11:26 AM
Anonymous said...: [url=http://aluejxfttk.com]oMYAmiaVGuPNKdHfg[/url] , ShjyKlbnZISQxUgV , http://pyfnknfrtw.com; February 13, 2013 at 7:14 AM
Anonymous said...: This ԁesign іѕ spectacular! You most сertainly knοw how to kеeρ a reаder
аmused. Between youг wit and your vidеos,
I was almoѕt moved tο start my own blog (well,
аlmost...HaHа!) Fаntаstіc jοb.
I reаlly enjoyed what you had to saу, and more than that, hοw you presented it.
Tοo cool!

Here iѕ my web pagе ... http://www.tensunitsforpain.com/
my web site: tens therapy units; February 23, 2013 at 1:51 AM
Anonymous said...: What's up, just wanted to mention, I enjoyed this post. It was inspiring. Keep on posting!

Check out my blog: muscle stimulator
My website :: tens; February 24, 2013 at 3:49 AM

ora_java

Wednesday, August 22, 2007

ITL lock analysis

8 comments:

Blog Archive

About Me