Teradata optimizer comes up with an execution strategy for each SQL query. This execution strategy is based on the statistics collected on the tables used within the SQL query. Statistics on the table is collected using COLLECT STATISTICS command. Optimizer requires environment information and data demographics to come up with optimal execution strategy. The Purpose of COLLECT STATISTICS is to gather and store demographic data for one or more columns or indices of a table or join index.In this process it collects data and stores the summary in the Data Dictionary (DD) inside USER DBC.
The optimizer uses this synopsis data to generate efficient table access and join plans.
Environment Information
- Number of Nodes, AMPs and CPUs
- Amount of memory
Data Demographics
- The number of rows in the table
- The average row size
- The range of values for the column(s) in which statistics were collected
- The number of rows per value for the column(s) in which statistics were collected
- The number of NULLs for the column(s) in which statistics were collected
- Information on all Indexes in which statistics were collected
There are three approaches to collect statistics on the table.
- Random AMP Sampling
- Full statistics collection (explained below)
- Using SAMPLE option
Commands to Collect STATISTICS and HELP STATISTICS
COLLECT [SUMMARY] STATISTICS
INDEX (indexname) or COLUMN (columnname)
ON <tablename>;HELP STATISTICS <tablename>;
Examples for a better Understanding
COLLECT STATISTICS on Emp_Table ;
COLLECT STATISTICS on Emp_Table COLUMN Dept_no ;
COLLECT STATISTICS on Emp_Table COLUMN(Emp_no, Dept_no);
COLLECT STATISTICS on Emp_Table INDEX Emp_no ;
COLLECT STATISTICS on Emp_Table INDEX (First_name, Last_name);
V14 and onwards:
,COLUMN(Dept_No) ON Employee_Table;
- DBC.Indexes (for multi-column indexes only)
- DBC.TVFields (for all columns and single column indexes)
- DBC.StatsTbl (Teradata V14 and beyond)