You are currently on IBM Systems Media’s archival website. Click here to view our new website.


Intelligent Queries

DB2 for i Learning Statistics Engine works smarter


In V7R1M0, as part of the Adaptive Query Processing (AQP) function, IBM made enhancements to the DB2 for i Statistics Engine that allow it to learn and continually improve its statistics estimates. These estimates are a primary input into the DB2 for i Query Optimizer's determination of the best query access plan for a given query. Thus, as the statistics become more accurate, the Query Optimizer has more knowledge, which improves its chances of finding the optimal performing query access plan for each query.

A DB2 for i statistic gives a projection of the number of rows a predicate of the query will return. Predicates range in complexity from simply comparing a literal to a column, to joining two columns, to complex joins of multiple tables, to combinations of simpler predicates connected with ANDs and ORs, to complex grouping, and ultimately to any predicate of any clause that SQL supports.

The DB2 for i Statistics Engine is a function in the SQL Query Engine (SQE) that was originally added in V5R2M0. Therefore, only queries that run in SQE are able to take advantage of the new Learning Statistics Engine.

Causes of Inaccurate Row Count Estimates

The Statistics Engine has hundreds of combinations of techniques to try to find an accurate row count estimate for an arbitrary predicate, and in general, does an excellent job at giving reasonable estimates. However, there are some reasons why an estimate may be inaccurate, and the Statistics Engine will never be smart enough to get them all correct the first time. Thus, the next best thing is to learn from the queries that are running and then give more informed estimates over time. For the interested reader, the common reasons for inaccurate estimates are:

  • Missing column statistics or indexes—on IBM i, the DB2 Statistics Engine automatically generates columns statistics as long as the system value QDBFSTCCOL allows it.
  • Data skew—where one or more values are present much more or less than an average value.
  • Stale column statistics—column statistics are automatically updated on IBM i, but there’s still the possibility of looking at a stat just before it’s updated. Note that index statistics are live and never stale on IBM i.
  • Complex derivations—for example, math or date formulas or complex string processing in the SQL predicate.
  • User Defined Functions (UDFs)—calls off to application code that may select any number of rows
  • Independent versus dependent data—for example, Phone Number and SSN would be independent whereas Model and Maker may be dependent.
  • Changing underlying data—the tables are being updated by a second query as the first query is trying to optimize or run.


Brian Muras, a software engineer in Rochester, Minn,, has worked on DB2 for i in various components including the Query Optimizer, Statistics Engine and DB2 WebQuery.



2019 Solutions Edition

A Comprehensive Online Buyer's Guide to Solutions, Services and Education.

Untangling Web Query

How metadata can reduce query and report complexity

Protection From Slow Queries

Introducing DB2 for i Adaptive Query Processing

Intelligent Queries

DB2 for i Learning Statistics Engine works smarter

IBM Systems Magazine Subscribe Box Read Now Link Subscribe Now Link iPad App Google Play Store
IBMi News Sign Up Today! Past News Letters