Statistics

Teradata optimizer comes up with an execution strategy for each SQL query. This execution strategy is based on the statistics collected on the tables used within the SQL query. Statistics on the table is collected using COLLECT STATISTICS command. Optimizer requires environment information and data demographics to come up with optimal execution strategy. The Purpose of  COLLECT STATISTICS is to gather and store demographic data for one or more columns or indices of a table or join index.In this process it collects data and stores the summary in the Data Dictionary (DD) inside USER DBC.
The optimizer uses this synopsis data to generate efficient table access and join plans.

Environment Information

  • Number of Nodes, AMPs and CPUs
  • Amount of memory

Data Demographics

  • The number of rows in the table
  • The average row size
  • The range of values for the column(s) in which statistics were collected
  • The number of rows per value for the column(s) in which statistics were collected
  • The number of NULLs for the column(s) in which statistics were collected
  • Information on all Indexes in which statistics were collected

There are three approaches to collect statistics on the table.

  • Random AMP Sampling
  • Full statistics collection (explained below)
  • Using SAMPLE option

Commands to Collect STATISTICS and HELP STATISTICS

COLLECT [SUMMARY] STATISTICS
INDEX (indexname) or COLUMN (columnname)
ON <tablename>;

HELP STATISTICS <tablename>;

Examples for a better Understanding

COLLECT STATISTICS on Emp_Table ;
COLLECT STATISTICS on Emp_Table COLUMN Dept_no ;
COLLECT STATISTICS on Emp_Table COLUMN(Emp_no, Dept_no);
COLLECT STATISTICS on Emp_Table INDEX Emp_no ;
COLLECT STATISTICS on Emp_Table INDEX (First_name, Last_name);

V14 and onwards:

COLLECT STATISTICS COLUMN(First_Name, Last_Name)
                                         ,COLUMN(First_Name)
,COLUMN(Dept_No) ON Employee_Table;
Teradata keeps all Collect Statistics information in DBC in the Data Dictionary tables. Those are
  1. DBC.Indexes (for multi-column indexes only)
  2. DBC.TVFields (for all columns and single column indexes)
  3. DBC.StatsTbl (Teradata V14 and beyond)

Importance of SQL performance tuning

SQL performance is vital for driving value from a data warehouse. With today’s growing query complexity, optimizing SQL can be a daunting task. This series provides a fundamental method for SQL tuning that has been used and refined through thousands of tuning exercises. The structure and process in this method generates a context that minimizes the complexities and gets to the root of the performance issues, which enables faster remediation. Further, the method provides a consistent, repeatable process that can be leveraged across the organization from the end user to database administrator. In addition to the structured method, the presenters will share insight on typically seen performance problems and provide sample remedies. This provides the reader with insight to commence their tuning efforts.

SQL tuning can be a daunting task. This method leads an analyst to diagnose & see the root cause of a performance issue in the process of developing a solution.
The reader will receive a proven tuning method that has been used to systematically diagnose & fix 1000s of SQL tuning issues.
SQL tuning diagnostics with typical problem types & sample remedies are provided in this series.

Performance – Teradata Manager

You can configure the Teradata Manager data collection service to collect performance data
with respect to:
• AMPUsage
• DBQL
• Te r a d a t a DWM

• Heartbeat queries
• Priority Scheduler
• ResUsage
• Spool space
• Table s pa ce

Teradata Manager data collection resources help in:
• Analyzing workload trends
• Analyzing historical resource utilization

Collecting historical data for performance

ResUsage History
Teradata Manager can summarize key ResUsage data up to 1 row per system per time
period specified. Teradata recommends retaining 3 to 6 months of detail to accommodate
various analysis tools, such as Teradata Manager itself, Visual Edge, and so on.

AMPUsage History
Teradata Manager can summarize to 1 row per system per account per time period
specified. Moreover, it can retain 1 day of detail. Teradata recommends deleting excess
detail to keep ongoing summary collection efficient.

DBQL History
Teradata Manager summaries key DBQL data to 1 row per user / account / application ID
/ client ID per time period. Teradata recommends retaining 13 months of copied detail.
That is, Teradata recommends copying detail to another table daily. You should delete the
source of copied detail daily to keep online summary collection efficient.

COLLECT STATISTICS statement

Using this statement you can collect demographic information about a specific column or index.
There is a huge difference between using COLLECT STATISTICS and random AMP sampling. You
must explicitly run the COLLECT STATISTICS statement, it will not happen automatically.

This is the most precise way of gathering statistics, but it comes at with a rather large performance price tag.

The statements that we are going to look at can be issued in Teradata BTEQ or Teradata SQL
Assistant.
Here is an example COLLECT STATISTICS statement:

COLLECT STATISTICS ON myTable COLUMN (myColumn);

You can even run an EXPLAIN on the COLLECT STATISTICS statement:

EXPLAIN COLLECT STATISTICS ON myTable COLUMN (myColumn);

You can collect statistics on indexes:

COLLECT STATISTICS ON myTable INDEX (myKey);

Teradata knows when a column is unique. The optimizer will assume an equality predicate
(myUniqueCol = „A Unique Value‟)  only returns one row. This means that statistics collected on
these columns have little impact on the optimizer. On the other hand, it is very important to collect
statistics on non-unique indexes and columns commonly included in WHERE clauses.

It is sometimes useful to collect statistics on multiple columns and treat them as a group. This can
help make queries with multiple predicates run much faster.  For example:

select count(*)
  from myTable
where color = ‘Blue’
   and material = ‘Burlap’;

This query may benefit from this COLLECT STATISTICS statement.

COLLECT STATISTICS ON myTable COLUMN (color, material);

If you have statistics that no longer reflect the contents of your table, you can drop them. Here is an
example of dropping the statistics collected with the previous COLLECT STATISTICS statement.

DROP STATISTICS ON myTable COLUMN (color, material);

If you would like to review which statistics have been collected for a table, you can use this
command.

HELP STATS myTable;

The HELP STATS statement will display the column names; the date and time the statistics were
collected; and the number of unique values contained in the columns.
Once you have collected the statistics for a table you can easily refresh them. I suggest that you issue
the HELP STATS statement to see which columns have statistics before relying on this. Here is the
command that will recollect statistics on the myTable table.

COLLECT STATISTICS myTable;

DQBL tables – Performance data collection

You can use DBQL tables to collect and evaluate:
• System throughput
• Response time
• Query details, such a query step level detail, request source, SQL, answer size, rejected
queries, resource consumption
• Objects accessed by the query

In addition to being able to capture the entire SQL statement, regardless of the length of the
SQL, DBQL also provides key insights into other aspects of a query such as whether it was
aborted, delayed by Teradata DWM, the start and end time, and so on.

DBQL operates asynchronously. As a result, the logging activity has a much lower impact on
the overall response time of given transactions.