There is a huge difference between using COLLECT STATISTICS and random AMP sampling. You
must explicitly run the COLLECT STATISTICS statement, it will not happen automatically.
This is the most precise way of gathering statistics, but it comes at with a rather large performance price tag.
The statements that we are going to look at can be issued in Teradata BTEQ or Teradata SQL
Assistant.
Here is an example COLLECT STATISTICS statement:
COLLECT STATISTICS ON myTable COLUMN (myColumn);
You can even run an EXPLAIN on the COLLECT STATISTICS statement:
EXPLAIN COLLECT STATISTICS ON myTable COLUMN (myColumn);
You can collect statistics on indexes:
COLLECT STATISTICS ON myTable INDEX (myKey);
Teradata knows when a column is unique. The optimizer will assume an equality predicate
(myUniqueCol = „A Unique Value‟) only returns one row. This means that statistics collected on
these columns have little impact on the optimizer. On the other hand, it is very important to collect
statistics on non-unique indexes and columns commonly included in WHERE clauses.
It is sometimes useful to collect statistics on multiple columns and treat them as a group. This can
help make queries with multiple predicates run much faster. For example:
select count(*)
from myTable
where color = ‘Blue’
and material = ‘Burlap’;
This query may benefit from this COLLECT STATISTICS statement.
COLLECT STATISTICS ON myTable COLUMN (color, material);
If you have statistics that no longer reflect the contents of your table, you can drop them. Here is an
example of dropping the statistics collected with the previous COLLECT STATISTICS statement.
DROP STATISTICS ON myTable COLUMN (color, material);
If you would like to review which statistics have been collected for a table, you can use this
command.
HELP STATS myTable;
The HELP STATS statement will display the column names; the date and time the statistics were
collected; and the number of unique values contained in the columns.
Once you have collected the statistics for a table you can easily refresh them. I suggest that you issue
the HELP STATS statement to see which columns have statistics before relying on this. Here is the
command that will recollect statistics on the myTable table.
COLLECT STATISTICS myTable;