Statistics

Teradata optimizer comes up with an execution strategy for each SQL query. This execution strategy is based on the statistics collected on the tables used within the SQL query. Statistics on the table is collected using COLLECT STATISTICS command. Optimizer requires environment information and data demographics to come up with optimal execution strategy. The Purpose of  COLLECT STATISTICS is to gather and store demographic data for one or more columns or indices of a table or join index.In this process it collects data and stores the summary in the Data Dictionary (DD) inside USER DBC.
The optimizer uses this synopsis data to generate efficient table access and join plans.

Environment Information

  • Number of Nodes, AMPs and CPUs
  • Amount of memory

Data Demographics

  • The number of rows in the table
  • The average row size
  • The range of values for the column(s) in which statistics were collected
  • The number of rows per value for the column(s) in which statistics were collected
  • The number of NULLs for the column(s) in which statistics were collected
  • Information on all Indexes in which statistics were collected

There are three approaches to collect statistics on the table.

  • Random AMP Sampling
  • Full statistics collection (explained below)
  • Using SAMPLE option

Commands to Collect STATISTICS and HELP STATISTICS

COLLECT [SUMMARY] STATISTICS
INDEX (indexname) or COLUMN (columnname)
ON <tablename>;

HELP STATISTICS <tablename>;

Examples for a better Understanding

COLLECT STATISTICS on Emp_Table ;
COLLECT STATISTICS on Emp_Table COLUMN Dept_no ;
COLLECT STATISTICS on Emp_Table COLUMN(Emp_no, Dept_no);
COLLECT STATISTICS on Emp_Table INDEX Emp_no ;
COLLECT STATISTICS on Emp_Table INDEX (First_name, Last_name);

V14 and onwards:

COLLECT STATISTICS COLUMN(First_Name, Last_Name)
                                         ,COLUMN(First_Name)
,COLUMN(Dept_No) ON Employee_Table;
Teradata keeps all Collect Statistics information in DBC in the Data Dictionary tables. Those are
  1. DBC.Indexes (for multi-column indexes only)
  2. DBC.TVFields (for all columns and single column indexes)
  3. DBC.StatsTbl (Teradata V14 and beyond)

Leave a Reply

Your email address will not be published. Required fields are marked *