View

Views are database objects that are built by the query. Views can be built using a single table or multiple tables by way of join. Their definition is stored permanently in data dictionary but they don’t store copy of the data. Data for the view is built dynamically. A view may contain a subset of rows of the table or a subset of columns of the table. You can use regular SELECT statement to retrieve data from Views. An existing view can be modified using REPLACE VIEW statement. An existing view can be dropped using DROP VIEW statement.

Create/Replace a View

CREATE/REPLACE VIEW <viewname>
AS
<select query>;

Drop a view

DROP VIEW <view name>;

Advantages of Views

  • Views provide additional level of security by restricting the rows or columns of a table.
  • Users can be given access only to views instead of base tables.
  • Simplifies the use of multiple tables by pre-joining them using Views.

 

EXPLAIN

EXPLAIN command returns the execution plan of parsing engine in English. It can be used with any SQL statement except on another EXPLAIN command. When a query is preceded with EXPLAIN command, the execution plan of the Parsing Engine is returned to the user instead of AMPs

Some Explain plans are:

  • Full Table Scan (FTS) – each and every row of the table is accessed (Example when there is no where condition)
  • Unique Primary Index – When the rows are accessed using Unique Primary Index, then it is one AMP operation
  • Unique Secondary Index – When the rows are accessed using Unique Secondary Index, it’s a two amp operation.

For more information refer primary vs secondary index

Command example:

EXPLAIN (provide select statement here with where condition, etc)

More details from the Explain plan

  • (Last Use) – A spool file is no longer needed and will be released when this step completes.
  • with no residual conditions – All applicable conditions have been applied to the rows.
  • END TRANSACTION – Transaction locks are released, and changes are committed.
  • eliminating duplicate rows -Duplicate rows only exist in spool files, not set tables. Doing a DISTINCT operation.
  • by way of a traversal of index #n extracting row ids only – A spool file is built containing the Row IDs found in a secondary index (index #n)
  • we do a SMS (set manipulation step) – Combining rows using a UNION, MINUS, or INTERSECT operator.
  • which is redistributed by hash code to all AMPs. –  Redistributing data in preparation for a join.
  • which is duplicated on all AMPs. – Duplicating data from the smaller table (in terms of SPOOL) in preparation for a join.
  • (one_AMP) or (group_AMPs) – Indicates one AMP or subset of AMPs will be used instead of all AMPs.

Compression

Compression is used to reduce the storage used by the tables. In Teradata, compression can compress up to 255 distinct values including NULL. Since the storage is reduced, Teradata can store more records in a block. This results in improved query response time since any I/O operation can process more rows per block. Compression can be added at table creation using CREATE TABLE or after table creation using ALTER TABLE command.

Limitations

  • Only 255 values can be compressed per column.
  • Primary Index column cannot be compressed.
  • Volatile tables cannot be compressed.

Multi-Value Compression (MVC)

The following table compresses the field DepatmentNo for values 1, 2 and 3. When compression is applied on a column, the values for this column is not stored with the row. Instead the values are stored in the Table header in each AMP and only presence bits are added to the row to indicate the value. Multi-Value compression can be used when you have a column in a large table with finite values.

CREATE SET TABLE student

(

StudentNo integer,

FirstName CHAR(30),

LastName CHAR(30),

BirthDate DATE FORMAT ‘YYYY-MM-DD-‘,

JoinedDate DATE FORMAT ‘YYYY-MM-DD-‘,

gender CHAR(1) ,

DepartmentNo CHAR(02) COMPRESS(1,2,3)

)

UNIQUE PRIMARY INDEX(EmployeeNo);

 

Statistics

Teradata optimizer comes up with an execution strategy for each SQL query. This execution strategy is based on the statistics collected on the tables used within the SQL query. Statistics on the table is collected using COLLECT STATISTICS command. Optimizer requires environment information and data demographics to come up with optimal execution strategy. The Purpose of  COLLECT STATISTICS is to gather and store demographic data for one or more columns or indices of a table or join index.In this process it collects data and stores the summary in the Data Dictionary (DD) inside USER DBC.
The optimizer uses this synopsis data to generate efficient table access and join plans.

Environment Information

  • Number of Nodes, AMPs and CPUs
  • Amount of memory

Data Demographics

  • The number of rows in the table
  • The average row size
  • The range of values for the column(s) in which statistics were collected
  • The number of rows per value for the column(s) in which statistics were collected
  • The number of NULLs for the column(s) in which statistics were collected
  • Information on all Indexes in which statistics were collected

There are three approaches to collect statistics on the table.

  • Random AMP Sampling
  • Full statistics collection (explained below)
  • Using SAMPLE option

Commands to Collect STATISTICS and HELP STATISTICS

COLLECT [SUMMARY] STATISTICS
INDEX (indexname) or COLUMN (columnname)
ON <tablename>;

HELP STATISTICS <tablename>;

Examples for a better Understanding

COLLECT STATISTICS on Emp_Table ;
COLLECT STATISTICS on Emp_Table COLUMN Dept_no ;
COLLECT STATISTICS on Emp_Table COLUMN(Emp_no, Dept_no);
COLLECT STATISTICS on Emp_Table INDEX Emp_no ;
COLLECT STATISTICS on Emp_Table INDEX (First_name, Last_name);

V14 and onwards:

COLLECT STATISTICS COLUMN(First_Name, Last_Name)
                                         ,COLUMN(First_Name)
,COLUMN(Dept_No) ON Employee_Table;
Teradata keeps all Collect Statistics information in DBC in the Data Dictionary tables. Those are
  1. DBC.Indexes (for multi-column indexes only)
  2. DBC.TVFields (for all columns and single column indexes)
  3. DBC.StatsTbl (Teradata V14 and beyond)

Secondary Index

Think of scenarios where table contains columns other than the index, using which the data is frequently accessed. Teradata will perform full table scan for those queries. Secondary indexes are needed to resolve this issue.

Secondary indexes are an alternate path to access the data. There are some differences between the primary index and the secondary index.

  • Secondary index is not involved in data distribution.
  • Secondary index values are stored in sub tables. These tables are built in all AMPs.
  • Secondary indexes are optional.
  • They can be created during table creation or after a table is created.
  • They occupy additional space since they build sub-table and they also require maintenance since the sub-tables need to be updated for each new row.

There are two types of secondary indexes:

 

Unique Secondary Index (USI) Non-Unique Secondary Index (NUSI)
A Unique Secondary Index allows only unique values for the columns defined as USI. Accessing the row by USI is a two amp operation. A Non-Unique Secondary Index allows duplicate values for the columns defined as NUSI. Accessing the row by NUSI is all-amp operation.
CREATE UNIQUE INDEX(CustNo) on Customer; CREATE INDEX(FirstName) on Customer;

HELP/SHOW TABLE command

For folks who ask is there something similar to a command “describe table” in Teradata:

use the HELP or SHOW TABLE commands as needed.

HELP TABLE

HELP TABLE DatabaseName.TableName returns column definition

SHOW TABLE

Once the table is created, you can use SHOW TABLE command to view the FULL Definition of the table.

SHOW TABLE Employee;

*** Text of DDL statement returned.

*** Total elapsed time was 1 second.

————————————————————————

CREATE SET TABLE EMPLOYEE ,FALLBACK , etc… shows the full statement here

 

SELECT * FROM dbc.Tables

HELP COLUMN x.*

Table Attributes

Table attributes can be categorized into these below. See color for reference:

  • Table Options – Specifies the physical attributes of the table such as Journal and Fallback.
  • Column Definition – Specifies the list of columns, data types and their attributes.
  • Index Definition – Additional indexing options such as Primary Index, Secondary Index and Partitioned Primary Index
CREATE [SET/MULTISET] TABLE DATABASENAME.TABLENAME,
[NO] FALLBACK,
[NO] BEFORE JOURNAL,
[NO] AFTER JOURNAL,
(
COLUMN1 DATATYPE,
COLUMN2 DATATYPE,
.
.
.
COLUMNZ DATATYPE
)
[UNIQUE] PRIMARY INDEX [INDEXNAME] (COLUMN1,COLUMN2...COLUMNZ)
[UNIQUE] INDEX [INDEXNAME] (COLUMN1.COLUMN2...COLUMNG)


[SET/MULTISET]: SET tables discard completely duplicate records however MULTISET allows multiple instance of rows. So it depends on the situation. If you know that you will be getting distinct rows , it is advisable to go with MULTISET table as it will not check for duplicate rows. Hence, saving time and enhancing performance. However if you know that you would be getting duplicate records however you need to process only one of the duplicates rows, Go with SET tables. TERADATA default is SET however ANSI default is MULTISET.

FALLBACK: FALLBACK is TERADATA mechanism to save table data in case of AMP Failure. If you define any table with FALLBACK option , a duplicate copy of table data is maintained in some other AMP. SO in case of AMP Failure, the FALLBACK AMP is used. For critical tables, it is recommended to use FALLBACK option. However it also comes with more Storage Space utilisation disadvantage. In TERADATA, default is NO FALLBACK.

PERMANENT JOURNALS:There are four types of Permanent Journals you can use: BEFORE, AFTER, DUAL BEFORE, DUAL AFTER. The BEFORE JOURNAL holds the image of impacted rows before any changes are made. AFTER JOURNAL holds the image of affected rows after changes are done. In DUAL BEFORE/AFTER Journal, two images are taken and are stored in two different AMP’s. As it was also discussed earlier that PERMANENT JOURNALS use PERMANENT SPACE so if these JOURNALS are not required any more, it is advisable to drop JOURNALS to save some space.

Types of Tables

Teradata supports different types of tables.

Permanent Table: This is the default table and it contains data inserted by the user and stores the data permanently.

Refer to temporary tables in Teradata for the 2 below

Volatile Table: The data inserted into a volatile table is retained only during the user session. The table and data is dropped at the end of the session. These tables are mainly used to hold the intermediate data during data transformation.

Global Temporary Table: The definition of Global Temporary table are persistent but the data in the table is deleted at the end of user session.

Derived Table: Derived table holds the intermediate results in a query. Their lifetime is within the query in which they are created, used and dropped.

Set Versus Multiset

Teradata also classifies the tables as SET or MULTISET tables based on how the duplicate records are handled. A table defined as SET table doesn’t store the duplicate records, whereas the MULTISET table can store duplicate records.

Data types

Every column in a table is associated with a name and data type. The data type tells Teradata how much physical storage to set aside for the column, as well as the form in which to store the data.

 

Category Data Type Length(bytes) Range of Values
Binary String BYTE fixed Max 64,000
Binary String VARBYTE Variable Max 64,000
Character String CHAR Fixed Format 1-64,000
Character String VARCHAR Variable 1-64,000
Character String LONG VARCHAR Variable VARCHAR(32000) or VARCHAR(64000) depending on Teradata version
Date/Time DATE default format YYYY-MM-DD Values for this type can range from 0001-01-01 through 9999-12-31.  Stored internally as a numeric value and is displayed in a site-defined format
Date/Time TIME 6 or 8 HHMMSS.nnnnnn or HHMMSS.nnnnnn+HHMM
Date/Time TIMESTAMP 10 or 12 YYMMDDHHMMSS.nnnnnn or YYMMDDHHMMSS.nnnnnn +HHMM
Numeric BYTEINT 1 -128 to +127
Numeric SMALLINT 2 -32768 to +32767
Numeric INTEGER 4 -2,147,483,648 to
+2147,483,647
Numeric BIGINT 8
9,233,372,036,854,775,808 to
+9,233,372,036,854,775,807
Numeric DECIMAL(n,m) The range for precision is 1 through 18.  n is the total number of digits (precision). m is the number of digits to the right of the decimal point (scale).
Numeric NUMERIC 1-16
Numeric FLOAT 64-bit IEEE format Values can range from approximately 2.226 x 10 -308 to 1.797 x 10308
BINARY & LARGE OBJECT BLOB  Binary BLOB(n) where n <= 2097088000 (default)
BINARY & LARGE OBJECT BYTE  Binary BYTE(n) where n <= 64000, default is 1
BINARY & LARGE OBJECT VARBYTE  Binary VARBYTE(n) where n <= 64000, where n needs to be specified
BINARY & LARGE OBJECT GRAPHIC CHARACTER
BINARY & LARGE OBJECT VARGRAPHIC  CHARACTER

Most of the column data types are based on ANSI SQL. The examples of ones that are Teradata Extension
to ANSI SQL are:

  • Byte, Varbyte
  • Byteint
  • Date
  • LONG VARCHAR
  • GRAPHIC, VARGRAPHIC and LONG VARGRAPHIC
  • PERIOD

Performance Monitoring

There are numerous applications to monitor Teradata performance:

 

System Summary
 Account String Expansion (ASE)  Account String Expansion (ASE) is a mechanism that enables AMP usage and I/O statistics to be collected. ASE supports performance monitoring for an account string.
 TDPTMON  Teradata Director Program (TDP) User Transaction Monitor (TDPTMON) is a client routine that enables a system programmer to write code to track TDP elapsed time statistics.
 System Management Facility  The System Management Facility (SMF) is available in the Multiple Virtual Storage (MVS) environment only. This facility collects data about Teradata Database performance, accounting, and usage.
Performance Monitor/Application Programming Interface  The PM/API provides hooks into the Performance Monitor and Production Control (PM and PC) functions resident within the Teradata Database. PM and PC data is available through a log-on partition called MONITOR using a specialized PM/API subset of the Call-Level Interface version 2 (CLIv2) routines.