Actually, there hasn’t been a “maxthr – 1″, I called the first part of this series“System Stats”. If you look back at it you’ll see that I set up some system statistics, excluding the maxthr and slavethr values, and described how the optimizer would calculate the cost of a serial tablescan, then I followed this up with a brief description of how the calculations changed if I hinted the optimizer into a parallel tablescan.
The worrying thing about the note was the fact that it looked as if I could get the optimizer to believe it was worth generating a plan that required more parallel execution slaves than were allowed by my setting of the parameter parallel_max_servers. In my example I had the parameter set to 40 but still found the optimizer calculating costs for a plan as if it could run parallel 42 – and I could have hinted parallel 200 and the optimizer would have created a plan appropriate for that degree of parallelism.
So how do you make sure that the optimizer doesn’t do anything quite so silly ? The answer lies in the system statistics slavethr and maxthr (and the latter most significantly). Let’s repeat the tests from the previous posting, but set (just) the maxthr. and see how this affects the total cost, I/O cost, and CPU cost of the queries. Here are my the settings for the system stats:
begin
dbms_stats.delete_system_stats;
dbms_stats.set_system_stats('MBRC',64);
dbms_stats.set_system_stats('MREADTIM',10);
dbms_stats.set_system_stats('SREADTIM',5);
dbms_stats.set_system_stats('CPUSPEED',2000);
dbms_stats.set_system_stats('MAXTHR',262144); -- 256KB
end;
/
Now the three execution plans, with the results of my query to pick out all the costs from one line of the plan table:
Serial
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 1280 (3)| 00:00:07 |
| 1 | SORT AGGREGATE | | 1 | 5 | | |
| 2 | TABLE ACCESS FULL| T1 | 40000 | 195K| 1280 (3)| 00:00:07 |
---------------------------------------------------------------------------
IO_COST CPU_COST COST
---------- ---------- ----------
1251 290857600 1280
Parallel 5
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 278 (0)| 00:00:02 | | | |
| 1 | SORT AGGREGATE | | 1 | 5 | | | | | |
| 2 | PX COORDINATOR | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 5 | | | Q1,00 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | 1 | 5 | | | Q1,00 | PCWP | |
| 5 | PX BLOCK ITERATOR | | 40000 | 195K| 278 (0)| 00:00:02 | Q1,00 | PCWC | |
| 6 | TABLE ACCESS FULL| T1 | 40000 | 195K| 278 (0)| 00:00:02 | Q1,00 | PCWP | |
----------------------------------------------------------------------------------------------------------------
IO_COST CPU_COST COST
---------- ---------- ----------
278 1333333 278
Parallel 42
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 250 (0)| 00:00:02 | | | |
| 1 | SORT AGGREGATE | | 1 | 5 | | | | | |
| 2 | PX COORDINATOR | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 5 | | | Q1,00 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | 1 | 5 | | | Q1,00 | PCWP | |
| 5 | PX BLOCK ITERATOR | | 40000 | 195K| 250 (0)| 00:00:02 | Q1,00 | PCWC | |
| 6 | TABLE ACCESS FULL| T1 | 40000 | 195K| 250 (0)| 00:00:02 | Q1,00 | PCWP | |
----------------------------------------------------------------------------------------------------------------
IO_COST CPU_COST COST
---------- ---------- ----------
250 158730 250
As before, we can see that when we hint /*+ parallel(t1 5) */ the CPU cost drops dramatically as the optimizer discards the cost of buffer cache activity; and the I/O cost drops, as before, to 278. So far the figures show no change from the previous experiment, so we can infer that the optimizer is again applying the fudge factor of 90% – a parallel slave can scan at 90% of the rate of the scattered read or to put it another way the optimizer is, in effect, deriving the value for slavethr as 0.9 * mbrc * 8KB / mreadtim (bytes per ms) – which in our case comes to: 47,186 bytes per ms. In simple terms, if you have a serial I/O cost for a tablescan then the parallel I/O cost will be approximately “serial I/O cost / (0.9 * selected degree)” – up to a point.
When we hint /*+ parallel(t1 42) */ the CPU cost again drops to the same value it did in the previous test – the factor of 42 has been applied as far as the CPU is concerned; but the I/O cost has only dropped to 250. Based on my comment in the previous paragraph, it should have dropped to something like: 1250 / (0.9 * 42) = 33, so we have a difference to explain – and that difference is due to the value of maxthr.
The serial cost comes from a calculation that says a single process can scan mbrc * blocksize / mreadtim bytes per millisecond (in our case that would be 52,428.8 bytes per ms); but we have the flat declaration from maxthr that across all parallel slaves we can handle 262,144 bytes per ms. We don’t have to worry about what notional degree of parallelism the optimizer is using (or even if it derives something to represent such a value) – we merely need to point out that if the serial cost at 52,428.8 bytes per ms is 1250, then the best we can do in parallel is 1,250 / (262144/52,428.8) = 250 (Q.E.D)
Simplifying the formula, the lowest I/O cost for parallel execution will be: (serial I/O cost * mbrc * block size) / (mreadtim * maxthr). Don’t forget, though, that the cost reported in the execution plan is the I/O cost plus the CPU cost (then for a quick approximation you can use the CPU % report from the execution plan to work backwards to the I/O cost).
Warning: although the optimizer derives some limiting factors when doing the calculations to choose an execution plan, at run-time it will still try to use the degree of parallelism specified in the hint – so you can end up optimising for one degree of parallelism and running at another. If you gather system stats on any sort of regular (or even irregular) basis, and expect to run parallel queries, you may find that execution plans may change randomly for no apparent reason simply because the value for maxthr doesn’t always get collected.
Next episode: what happens when both the slavethr and the maxthr are set.
Footnote:
The manuals state that maxthr and slavethr are measured in bytes per second; however all my experimentation showed that (at least on my laptop) they were measured in bytes per millisecond. That was true until I finally got around to repeating my test suite on 12c – and found that it assumed the values really were in bytes per second! Warning – if you upgrade from 11g to 12c and have system stats gathered, check what they are and gather them again to see what they look like, otherwise you find Oracle strangely reluctant to run any parallel queries.