Know Your Numbers. A Quick Analysis with AWR and Characteristic Numbers. Part I.

troype, 10/13/2014

SQL statements fired by ODI for ELT jobs can sometimes be extremely complicated. There are huge tables, dozens of joins, sorts, complex predicates with CASE expressions, aggregate and analytic functions, and all that jazz. The same applies (occasionally to an even larger extent) to the physical queries generated by a BI server.

Such queries and statements can present real performance challenges. Very often these challenges arrive at the least convenient times, so they have to be dealt with quickly and efficiently. And (speaking of Murphy's Law) an AWR report is often all the evidence we happen to have. The good news is that we can still get useful information out of it by applying characteristic time analysis methods.

Simplistically speaking (and with precision acceptable for a quick-and-dirty zero-order estimate), a statement's run time depends on the total number of logical buffer gets and physical block reads and writes needed to produce its result set. Each operation takes time to finish. That time can fluctuate around some average value; however, it cannot fall short of some absolute minimum. That absolute minimum is what we call the operation's characteristic time. The term characteristic means that the value depends on the hardware characteristics of a particular computer sub-system. For logical IO this would be CPU speed as well as memory bus frequency and memory activation + read/write times. It also depends, to some extent, on CPU cache efficiency; however, for volume data movement operations (such as ETL jobs or a report with heavy aggregations) the CPU cache is not of much help (unless a cache-aware algorithm of some kind was used). For the physical input/output system it depends on many factors. A good and thoughtful DBA should find a characteristic IO time per data block that is specific to her storage sub-system.
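The zero-order estimate above boils down to simple arithmetic: multiply the operation counts by their characteristic times and sum. A minimal sketch, with the two per-operation times picked purely for illustration (your own environment's values will differ):

```python
# Back-of-envelope run-time floor for a statement from its AWR-style
# operation counts. Both characteristic times below are ASSUMED,
# illustrative values, not measurements from any real system.
T_LIO = 5e-6  # seconds per logical buffer get (assumed)
T_PIO = 5e-3  # seconds per physical block read/write (assumed)

def runtime_floor(buffer_gets, physical_ios):
    """Lower bound on elapsed time implied by the operation counts."""
    return buffer_gets * T_LIO + physical_ios * T_PIO

# A statement doing 10M buffer gets and 200k physical block IOs
# cannot finish faster than ~1050 seconds under these assumed times.
floor = runtime_floor(10_000_000, 200_000)
```

If the observed elapsed time is close to this floor, the statement is essentially limited by its operation counts, and only a better execution plan (fewer gets and reads) will make it faster.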

In an ideal world with no concurrency from competing processes, an operation's response time will not exceed its characteristic time. With some degree of concurrency, a read or write system call may not be served immediately and the calling process has to wait. This results in an Oracle wait event. The number and accumulated duration of wait events (measured over some predefined snapshot interval) characterize the workload and the level of concurrency that a particular sub-system (data storage in our case) has been experiencing. On the other hand, knowing the IO characteristic times, the number of physical block reads and writes, and the iowait time, we can figure out whether our IO sub-system was stressed without even looking at wait events (checking them, however, is always a good idea).

When concurrency is taken into consideration, it becomes clear that characteristic times get a bit fuzzy and turn into characteristic time ranges. An acceptable characteristic time range accounts for the reasonable level of workload and concurrency that is always present in a database instance. The rule of thumb: when times per single operation fall within the acceptable range, we say things are normal. Whatever performance problems we have observed in a particular query are most likely due to an inefficient execution plan (or an unrealistic customer expectation of the query's run time).

So the question of the day is: what are those characteristic time ranges?

As already mentioned, the answer is as fuzzy as the ranges themselves. We believe a DBA who really likes her job should acquire them while tuning the database and debugging poorly performing queries in her own environment. In our experience, the majority of average modern database environments will demonstrate the following characteristic time ranges:

In Part II we'll take a look at how characteristic times can help in performance analysis.