In part 1 of this mini-series we looked at the effects of costing a tablescan serially and then in parallel when the maxthr and slavethr statistics had not been set.
In part 2 we looked at the effect of setting just maxthr, which can happen if you don’t do any parallel execution while the stats collection is going on.
In part 3 we’re going to look at the two variations the optimizer displays when both statistics have been set. So here are the starting system stats:
begin
        dbms_stats.delete_system_stats;
        dbms_stats.set_system_stats('MBRC',         64);
        dbms_stats.set_system_stats('MREADTIM',     10);
        dbms_stats.set_system_stats('SREADTIM',      5);
        dbms_stats.set_system_stats('CPUSPEED',   2000);
        dbms_stats.set_system_stats('MAXTHR',   262144);
        dbms_stats.set_system_stats('SLAVETHR',  65536);
        dbms_stats.set_system_stats('SLAVETHR',  47000);
        dbms_stats.set_system_stats('SLAVETHR',  16384);
end;
/
You’ll notice that I’ve shown three options for slavethr so, when running the tests, I will be commenting out two of them. The middle value is the important one as I’ve set it just below a critical breakpoint. You’ll recall that the optimizer is programmed to behave as if a parallel slave will operate at 90% of the speed of a serial process. If we take the 64 block read, at 8KB per block, completed in 10 ms, this represents 52428.8 bytes per ms. 90% of that is 47,186 bytes per ms – hence the choice for slavethr in the second of the tests.
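The breakpoint arithmetic above can be checked with a few lines of Python. This is only a sketch of the sums in the text: the variable names are mine, and the 8KB block size is taken to mean 8,192 bytes.

```python
# Figures taken from the system stats and block size quoted in the article.
MBRC = 64            # blocks per multiblock read
BLOCK_SIZE = 8192    # bytes per block (8KB, assumed to mean 8192)
MREADTIM_MS = 10     # milliseconds per multiblock read

# Serial throughput implied by the multiblock read statistics, in bytes per ms.
serial_rate = MBRC * BLOCK_SIZE / MREADTIM_MS

# The optimizer's "90% of serial" assumption for a single parallel slave.
slave_rate = 0.9 * serial_rate

print(f"serial rate : {serial_rate:.1f} bytes/ms")
print(f"90% of that : {slave_rate:.2f} bytes/ms")
print(f"47000 is below the breakpoint: {47000 < slave_rate}")
```

The middle test value of 47,000 sits just under the 90% figure, which is why it was chosen.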
Remember that a serial tablescan of my data had an I/O cost of 1,251 (or 1,250 if you ignore the “tablescan cost plus 1” effect) and that we could investigate the parallel costs by reference to the original serial cost compared to the degree of parallelism. We’re going to do that again, but in this case I’m going to run my tablescan just once (at parallel degree 5) for each of the three values of slavethr (lowest to highest) in turn.
Here are the resulting execution plans:
slavethr=16384

----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 |     5 |   800   (0)| 00:00:05 |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     5 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     5 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |     5 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          | 40000 |   195K|   800   (0)| 00:00:05 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| T1       | 40000 |   195K|   800   (0)| 00:00:05 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------

   IO_COST   CPU_COST       COST
---------- ---------- ----------
       800    1333333        800

slavethr=47000

----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 |     5 |   279   (0)| 00:00:02 |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     5 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     5 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |     5 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          | 40000 |   195K|   279   (0)| 00:00:02 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| T1       | 40000 |   195K|   279   (0)| 00:00:02 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------

   IO_COST   CPU_COST       COST
---------- ---------- ----------
       279    1333333        279

slavethr=65536

----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 |     5 |   278   (0)| 00:00:02 |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     5 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     5 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |     5 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          | 40000 |   195K|   278   (0)| 00:00:02 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| T1       | 40000 |   195K|   278   (0)| 00:00:02 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------

13 rows selected.

   IO_COST   CPU_COST       COST
---------- ---------- ----------
       278    1333333        278
As a starting point, we can say that the modified cost is always going to be:

        1250 * serial throughput rate / parallel throughput rate

where, in this test suite, the serial throughput rate in bytes per ms is 64 * 8K / 10 = 52428.8
Working from the top down:
When slavethr = 16384 the aggregate throughput rate is 5 * 16384 = 81920, so the I/O cost should be 1250 * 52428.8/81920 = 800 (Q.E.D)
When slavethr = 47000 the aggregate throughput rate is 5 * 47000 = 235,000, so the I/O cost should be 1250 * 52428.8/235000 = 279 (Q.E.D). You’ll notice that this is very close to the figure I had from the first test when I didn’t have maxthr or slavethr set and the optimizer used its “90% of serial” trick.
When slavethr = 65536, something odd has happened: instead of a significant change in I/O cost, the result actually matches the figure we got when slavethr wasn’t set (1250 / (5 * 0.9) = 278). The rule is simple: if slavethr is larger than the throughput implied by mbrc (etc.) the optimizer ignores it and falls back to the “90% of serial” model.
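The three cases can be reproduced with a small Python sketch. It is a reconstruction of the arithmetic described here, not Oracle's actual code. The function name `parallel_io_cost` is hypothetical, and the fallback test simply encodes the rule as stated above. It ignores maxthr on the assumption that it is not the limiting factor in these tests, and it uses ordinary rounding on the assumption that this matches the plan output.

```python
# Values from the test setup in the article.
MBRC = 64
BLOCK_SIZE = 8192        # 8KB blocks, assumed to mean 8192 bytes
MREADTIM_MS = 10
SERIAL_IO_COST = 1250    # serial tablescan I/O cost, ignoring the "+1"
DEGREE = 5

# Serial throughput in bytes per ms implied by mbrc and mreadtim.
SERIAL_RATE = MBRC * BLOCK_SIZE / MREADTIM_MS


def parallel_io_cost(slavethr, degree=DEGREE):
    """Estimate the parallel I/O cost for a given slavethr (bytes per ms).

    Hypothetical helper: if slavethr is unset (None) or larger than the
    serial rate, fall back to the "90% of serial" per-slave model.
    maxthr is deliberately not modelled here.
    """
    if slavethr is None or slavethr > SERIAL_RATE:
        per_slave = 0.9 * SERIAL_RATE
    else:
        per_slave = slavethr
    aggregate = degree * per_slave
    return SERIAL_IO_COST * SERIAL_RATE / aggregate


for slavethr in (16384, 47000, 65536):
    print(slavethr, round(parallel_io_cost(slavethr)))
```

With the three slavethr values from the tests this gives 800, 279 and 278, matching the IO_COST column in the plans.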
Reminder.
I’ve been showing you how Oracle does the arithmetic with the statistics it has. It’s very important to remember that this is just arithmetic: it’s Oracle trying to work out the best (likely) execution plan given some assumptions about what ought to be the limiting factors when the query runs. The arithmetic can have the effect of saying: “if we assume (based on the statistics) that we can’t do better than parallel 6 then the best plan is P”; but if the hint actually says /*+ parallel(t1 42) */ then at run time Oracle will take the plan that’s appropriate for running parallel 6 and try to run it at parallel 42, and that may be a big mistake.
Warning: The manuals say that maxthr and slavethr are stored as bytes per second; it seems that they’re really bytes per millisecond in (at least) 10g and 11g, but change to bytes per second in 12c. If you upgrade to 12c, make sure you check your system statistics before and after the upgrade to make sure that you have allowed for this change otherwise you may find that Oracle becomes very unenthusiastic about running parallel queries.