PX Deq Credit: send blkd caused by IDE (SQL Developer, Toad, PL/SQL Developer)
The popular IDE’s used by developers and DBA’s today have default settings to only return a subset of a queries results. This is a handy feature to get quick results on your screen before you have the IDE “complete” your query by asking for all rows, export the data, or something of that nature. In SQL Developer, the option is “Sql Array Fetch Size” and I am seeing mine is currently set to 50. This means SQL Developer will only get up to 50 rows at a time until I cause it to go fetch more. When a parallel query is executed through the IDE, the “main” session is turned into a QC (Query Coordinator) and it gets a specific number of PX slaves. Under the hood, the IDE is only fetching the first 50 rows, and waiting until you tell it to go get more. At this point, the PX slaves just sit there, waiting on the QC to say “gimme some more rows please”. So the problem happens when the user for some reason is satisfied with the first set of data on their screen and don’t need anymore, so they don’t cause the IDE to go get anymore data. In the background, the QC session is just waiting on the user, and the PX slaves are just waiting on their QC. I see on the database a user session that is INACTIVE with a wait event of “SQL*Net message from client” acting as a QC, and its slave sessions ACTIVE with a wait event of “PX Deq Credit: send blkd” or “PX Deq Credit: need buffer”.
I recently noticed this wait event as the top event on my DSS database. A quick search of what exactly this is will lead you to believe that it is an idle event, and shouldn’t warrant any attention. Oracle’s Metalink document 271767.1 “WAITEVENT: “PX Deq Credit: send blkd”" states this is an idle event, and a simple example they provide to reproduce it is to run “select /*+ parallel(sales, 10) +/ * from sales)”. (That typo in the hint is Oracle’s, not mine, I had to leave it for your entertainment purposes). This will cause the QC to become overloaded with messages from the slaves, theoretically 10 would be too many in this case, and the slaves begin waiting on the QC. In this situation, it might be possible to better tune the parallel settings in the database so this doesn’t happen as often.
In this typical case, the worst thing that seems to happen is that you get too many parallel slaves for a query waiting for a time, and once the query is done processing, so the slaves stop waiting, and move on to other sessions/work. So I assume the reason for the nonchalant behavior towards this event is justified in that case, BUT I have found this other case is affecting my database in a far greater fashion, because PX slaves are being occupied for hours at a time, causing other queries degree of parallelism to be downgraded or removed altogether. In a very busy database, this can make a big difference on performance.
Now I am going to show you how to reproduce this, so you can understand better what is happening. First put the below query into a script called pq.sql to monitor the parallel query sessions
col username for a12
col “QC SID” for A6
col “SID” for A6
col “QC/Slave” for A8
col “Req. DOP” for 9999
col “Actual DOP” for 9999
col “Slaveset” for A8
col “Slave INST” for A9
col “QC INST” for A6
set pages 300 lines 300
col wait_event format a30select
decode(px.qcinst_id,NULL,username,
‘ – ‘||lower(substr(pp.SERVER_NAME,
length(pp.SERVER_NAME)-4,4) ) )”Username”,
decode(px.qcinst_id,NULL, ‘QC’, ‘(Slave)’) “QC/Slave” ,
to_char( px.server_set) “SlaveSet”,
to_char(s.sid) “SID”,
to_char(px.inst_id) “Slave INST”,
decode(sw.state,’WAITING’, ‘WAIT’, ‘NOT WAIT’ ) as STATE,
case sw.state WHEN ‘WAITING’ THEN substr(sw.event,1,30) ELSE NULL end as wait_event ,
decode(px.qcinst_id, NULL ,to_char(s.sid) ,px.qcsid) “QC SID”,
to_char(px.qcinst_id) “QC INST”,
px.req_degree “Req. DOP”,
px.degree “Actual DOP”
from gv$px_session px,
gv$session s ,
gv$px_process pp,
gv$session_wait sw
where px.sid=s.sid (+)
and px.serial#=s.serial#(+)
and px.inst_id = s.inst_id(+)
and px.sid = pp.sid (+)
and px.serial#=pp.serial#(+)
and sw.sid = s.sid
and sw.inst_id = s.inst_id
order by
decode(px.QCINST_ID, NULL, px.INST_ID, px.QCINST_ID),
px.QCSID,
decode(px.SERVER_GROUP, NULL, 0, px.SERVER_GROUP),
px.SERVER_SET,
px.INST_ID
/
Now, using your IDE, ( I am using SQL Developer with the Sql Array Fetch Size set to 50 ) run a sample parallel query, and allow the first results to show up:
select /*+ parallel(bigtable, 4) +*/ * from bigtable;
Now, in another database session, run the pq.sql script to see the issue in action:
SQL> @pq
Username QC/Slave SlaveSet SID Slave INS STATE WAIT_EVENT QC SID QC INS Req. DOP Actual DOP
———— ——– ——– —— ——— ——– —————————— —— —— ——– ———-
USER111 QC 515 1 WAIT SQL*Net message from client 515
- p000 (Slave) 1 510 1 WAIT PX Deq Credit: send blkd 515 1 4 4
- p001 (Slave) 1 500 1 WAIT PX Deq Credit: send blkd 515 1 4 4
- p003 (Slave) 1 494 1 WAIT PX Deq Credit: send blkd 515 1 4 4
- p002 (Slave) 1 507 1 WAIT PX Deq Credit: send blkd 515 1 4 4
Nice post! :)
karlarao
May 7, 2010 at 3:54 am
This is great information and we saw this in a production DSS DB yesterday where someone sat in a TOAD session for 5 hours holding these idle parallel slaves in a wait state. So, the obvious next question is what can we do about it? Turning off parallelization in DSS environment is not a solution. I did not find a configurable option in TOAD to alleviate the problem and even if there was it would be difficult to get every client to do that. So, I’m thinking we have to do something at the DB server level. Session timeouts would work but are political footballs. Is there anyway to tell Oracle to timeout the slave processes and release them?
Scott Harden
May 21, 2010 at 2:32 pm
Thanks Scott. What I have implemented to help with this problem since this post is to monitor this situation myself. It is pretty easy to identify (A, qc is waiting on SQL*Net message from client. B, qc has at least one slave waiting on PX Deq Credit: send blkd. C, all qc slaves only waiting on PX Deq Credit: send blkd) I have put the below cursor into a PL/SQL package with various business rules that we agreed upon to actually kill the sessions after they meet certain time and specific user requirements.
The thing to remember here is that you don’t necessarily have a problem when you observe this event happening. You may have a problem if the number of parallel slaves is a default number and is therefore high in relation to the number of cores you have AND you are coming up on your maximum number of parallel slaves, then you may indeed have performance issues.
select * from (
select osuser, inst_id, sid, serial#, event, seconds_in_wait,
sum(case slave_event
when ‘PX Deq Credit: send blkd’ then 1
else 0 end) cnt_dq,
sum(case slave_event
when ‘PX Deq Credit: send blkd’ then 0
else 1 end) cnt_other
from (
select lower(gvs_qc.osuser) osuser, pxs_qc.inst_id, pxs_qc.sid, pxs_qc.serial#, qcw.event, qcw.seconds_in_wait,
pxs_slave.inst_id slave_inst_id, pxs_slave.sid slave_sid, pxs_slave.serial# slave_serial#, slavew.event slave_event
from gv$px_session pxs_qc
join gv$session_wait qcw on pxs_qc.inst_id = qcw.inst_id and pxs_qc.sid = qcw.sid
join gv$session gvs_qc on pxs_qc.inst_id = gvs_qc.inst_id and pxs_qc.sid = gvs_qc.sid
join gv$px_session pxs_slave on pxs_qc.inst_id = pxs_slave.qcinst_id and pxs_qc.sid = pxs_slave.qcsid and pxs_qc.serial# = pxs_slave.qcserial#
join gv$session_wait slavew on pxs_slave.inst_id = slavew.inst_id and pxs_slave.sid = slavew.sid
where pxs_qc.sid=pxs_qc.qcsid
and qcw.state = ‘WAITING’
and qcw.event = ‘SQL*Net message from client’
)
group by osuser, inst_id, sid, serial#, event, seconds_in_wait
)
where cnt_dq > 0 and cnt_other = 0;
sys
May 21, 2010 at 2:40 pm
Great post . Thats defintely an issue . I have seen alot of parallel queries running where QC and QS doing nothing .
Hassaan Yar Khan
July 4, 2011 at 5:48 am