Ibiz expertify, Author at Teradata

December 6, 2016

View

Views are database objects that are built by the query. Views can be built using a single table or multiple tables by way of join. Their definition is stored permanently in data dictionary but they don’t store copy of the data. Data for the view is built dynamically. A view may contain a subset of rows of the table or a subset of columns of the table. You can use regular SELECT statement to retrieve data from Views. An existing view can be modified using REPLACE VIEW statement. An existing view can be dropped using DROP VIEW statement.

Create/Replace a View

CREATE/REPLACE VIEW <viewname>
AS
<select query>;

Drop a view

DROP VIEW <view name>;

Advantages of Views

Views provide additional level of security by restricting the rows or columns of a table.
Users can be given access only to views instead of base tables.
Simplifies the use of multiple tables by pre-joining them using Views.

December 6, 2016

EXPLAIN

EXPLAIN command returns the execution plan of parsing engine in English. It can be used with any SQL statement except on another EXPLAIN command. When a query is preceded with EXPLAIN command, the execution plan of the Parsing Engine is returned to the user instead of AMPs

Some Explain plans are:

Full Table Scan (FTS) – each and every row of the table is accessed (Example when there is no where condition)
Unique Primary Index – When the rows are accessed using Unique Primary Index, then it is one AMP operation
Unique Secondary Index – When the rows are accessed using Unique Secondary Index, it’s a two amp operation.

For more information refer primary vs secondary index

Command example:

EXPLAIN (provide select statement here with where condition, etc)

More details from the Explain plan

(Last Use) – A spool file is no longer needed and will be released when this step completes.
with no residual conditions – All applicable conditions have been applied to the rows.
END TRANSACTION – Transaction locks are released, and changes are committed.
eliminating duplicate rows -Duplicate rows only exist in spool files, not set tables. Doing a DISTINCT operation.
by way of a traversal of index #n extracting row ids only – A spool file is built containing the Row IDs found in a secondary index (index #n)
we do a SMS (set manipulation step) – Combining rows using a UNION, MINUS, or INTERSECT operator.
which is redistributed by hash code to all AMPs. – Redistributing data in preparation for a join.
which is duplicated on all AMPs. – Duplicating data from the smaller table (in terms of SPOOL) in preparation for a join.
(one_AMP) or (group_AMPs) – Indicates one AMP or subset of AMPs will be used instead of all AMPs.

December 6, 2016

Compression

Compression is used to reduce the storage used by the tables. In Teradata, compression can compress up to 255 distinct values including NULL. Since the storage is reduced, Teradata can store more records in a block. This results in improved query response time since any I/O operation can process more rows per block. Compression can be added at table creation using CREATE TABLE or after table creation using ALTER TABLE command.

Limitations

Only 255 values can be compressed per column.
Primary Index column cannot be compressed.
Volatile tables cannot be compressed.

Multi-Value Compression (MVC)

The following table compresses the field DepatmentNo for values 1, 2 and 3. When compression is applied on a column, the values for this column is not stored with the row. Instead the values are stored in the Table header in each AMP and only presence bits are added to the row to indicate the value. Multi-Value compression can be used when you have a column in a large table with finite values.

CREATE SET TABLE student

(

StudentNo integer,

FirstName CHAR(30),

LastName CHAR(30),

BirthDate DATE FORMAT ‘YYYY-MM-DD-‘,

JoinedDate DATE FORMAT ‘YYYY-MM-DD-‘,

gender CHAR(1) ,

DepartmentNo CHAR(02) COMPRESS(1,2,3)

)

UNIQUE PRIMARY INDEX(EmployeeNo);

December 6, 2016

Statistics

Teradata optimizer comes up with an execution strategy for each SQL query. This execution strategy is based on the statistics collected on the tables used within the SQL query. Statistics on the table is collected using COLLECT STATISTICS command. Optimizer requires environment information and data demographics to come up with optimal execution strategy. The Purpose of COLLECT STATISTICS is to gather and store demographic data for one or more columns or indices of a table or join index.In this process it collects data and stores the summary in the Data Dictionary (DD) inside USER DBC.
The optimizer uses this synopsis data to generate efficient table access and join plans.

Environment Information

Number of Nodes, AMPs and CPUs
Amount of memory

Data Demographics

The number of rows in the table
The average row size
The range of values for the column(s) in which statistics were collected
The number of rows per value for the column(s) in which statistics were collected
The number of NULLs for the column(s) in which statistics were collected
Information on all Indexes in which statistics were collected

There are three approaches to collect statistics on the table.

Random AMP Sampling
Full statistics collection (explained below)
Using SAMPLE option

Commands to Collect STATISTICS and HELP STATISTICS

COLLECT [SUMMARY] STATISTICS
INDEX (indexname) or COLUMN (columnname)
ON <tablename>;

HELP STATISTICS <tablename>;

Examples for a better Understanding

COLLECT STATISTICS on Emp_Table ;
COLLECT STATISTICS on Emp_Table COLUMN Dept_no ;
COLLECT STATISTICS on Emp_Table COLUMN(Emp_no, Dept_no);
COLLECT STATISTICS on Emp_Table INDEX Emp_no ;
COLLECT STATISTICS on Emp_Table INDEX (First_name, Last_name);

V14 and onwards:

COLLECT STATISTICS COLUMN(First_Name, Last_Name)

,COLUMN(First_Name)
,COLUMN(Dept_No) ON Employee_Table;

Teradata keeps all Collect Statistics information in DBC in the Data Dictionary tables. Those are

DBC.Indexes (for multi-column indexes only)
DBC.TVFields (for all columns and single column indexes)
DBC.StatsTbl (Teradata V14 and beyond)

December 6, 2016

Secondary Index

Think of scenarios where table contains columns other than the index, using which the data is frequently accessed. Teradata will perform full table scan for those queries. Secondary indexes are needed to resolve this issue.

Secondary indexes are an alternate path to access the data. There are some differences between the primary index and the secondary index.

Secondary index is not involved in data distribution.
Secondary index values are stored in sub tables. These tables are built in all AMPs.
Secondary indexes are optional.
They can be created during table creation or after a table is created.
They occupy additional space since they build sub-table and they also require maintenance since the sub-tables need to be updated for each new row.

There are two types of secondary indexes:

Unique Secondary Index (USI)	Non-Unique Secondary Index (NUSI)
A Unique Secondary Index allows only unique values for the columns defined as USI. Accessing the row by USI is a two amp operation.	A Non-Unique Secondary Index allows duplicate values for the columns defined as NUSI. Accessing the row by NUSI is all-amp operation.
CREATE UNIQUE INDEX(CustNo) on Customer;	CREATE INDEX(FirstName) on Customer;

December 5, 2016

HELP/SHOW TABLE command

For folks who ask is there something similar to a command “describe table” in Teradata:

use the HELP or SHOW TABLE commands as needed.

HELP TABLE

HELP TABLE DatabaseName.TableName returns column definition

SHOW TABLE

Once the table is created, you can use SHOW TABLE command to view the FULL Definition of the table.

SHOW TABLE Employee;

*** Text of DDL statement returned.

*** Total elapsed time was 1 second.

————————————————————————

CREATE SET TABLE EMPLOYEE ,FALLBACK , etc… shows the full statement here

SELECT * FROM dbc.Tables

HELP COLUMN x.*

December 5, 2016

Table Attributes

Table attributes can be categorized into these below. See color for reference:

Table Options – Specifies the physical attributes of the table such as Journal and Fallback.
Column Definition – Specifies the list of columns, data types and their attributes.
Index Definition – Additional indexing options such as Primary Index, Secondary Index and Partitioned Primary Index

CREATE [SET/MULTISET] TABLE DATABASENAME.TABLENAME,
[NO] FALLBACK,
[NO] BEFORE JOURNAL,
[NO] AFTER JOURNAL,
(
COLUMN1 DATATYPE,
COLUMN2 DATATYPE,
.
.
.
COLUMNZ DATATYPE
)
[UNIQUE] PRIMARY INDEX [INDEXNAME] (COLUMN1,COLUMN2...COLUMNZ)
[UNIQUE] INDEX [INDEXNAME] (COLUMN1.COLUMN2...COLUMNG)

[SET/MULTISET]: SET tables discard completely duplicate records however MULTISET allows multiple instance of rows. So it depends on the situation. If you know that you will be getting distinct rows , it is advisable to go with MULTISET table as it will not check for duplicate rows. Hence, saving time and enhancing performance. However if you know that you would be getting duplicate records however you need to process only one of the duplicates rows, Go with SET tables. TERADATA default is SET however ANSI default is MULTISET.

FALLBACK: FALLBACK is TERADATA mechanism to save table data in case of AMP Failure. If you define any table with FALLBACK option , a duplicate copy of table data is maintained in some other AMP. SO in case of AMP Failure, the FALLBACK AMP is used. For critical tables, it is recommended to use FALLBACK option. However it also comes with more Storage Space utilisation disadvantage. In TERADATA, default is NO FALLBACK.

PERMANENT JOURNALS:There are four types of Permanent Journals you can use: BEFORE, AFTER, DUAL BEFORE, DUAL AFTER. The BEFORE JOURNAL holds the image of impacted rows before any changes are made. AFTER JOURNAL holds the image of affected rows after changes are done. In DUAL BEFORE/AFTER Journal, two images are taken and are stored in two different AMP’s. As it was also discussed earlier that PERMANENT JOURNALS use PERMANENT SPACE so if these JOURNALS are not required any more, it is advisable to drop JOURNALS to save some space.

December 4, 2016

Types of Tables

Teradata supports different types of tables.

 Permanent Table: This is the default table and it contains data inserted by the user and stores the data permanently.

Refer to temporary tables in Teradata for the 2 below

 Volatile Table: The data inserted into a volatile table is retained only during the user session. The table and data is dropped at the end of the session. These tables are mainly used to hold the intermediate data during data transformation.

 Global Temporary Table: The definition of Global Temporary table are persistent but the data in the table is deleted at the end of user session.

 Derived Table: Derived table holds the intermediate results in a query. Their lifetime is within the query in which they are created, used and dropped.

Set Versus Multiset

Teradata also classifies the tables as SET or MULTISET tables based on how the duplicate records are handled. A table defined as SET table doesn’t store the duplicate records, whereas the MULTISET table can store duplicate records.

December 4, 2016

Data types

Every column in a table is associated with a name and data type. The data type tells Teradata how much physical storage to set aside for the column, as well as the form in which to store the data.

Category	Data Type	Length(bytes)	Range of Values
Binary String	BYTE	fixed	Max 64,000
Binary String	VARBYTE	Variable	Max 64,000
Character String	CHAR	Fixed Format	1-64,000
Character String	VARCHAR	Variable	1-64,000
Character String	LONG VARCHAR	Variable	VARCHAR(32000) or VARCHAR(64000) depending on Teradata version
Date/Time	DATE	default format YYYY-MM-DD	Values for this type can range from 0001-01-01 through 9999-12-31. Stored internally as a numeric value and is displayed in a site-defined format
Date/Time	TIME	6 or 8	HHMMSS.nnnnnn or HHMMSS.nnnnnn+HHMM
Date/Time	TIMESTAMP	10 or 12	YYMMDDHHMMSS.nnnnnn or YYMMDDHHMMSS.nnnnnn +HHMM
Numeric	BYTEINT	1	-128 to +127
Numeric	SMALLINT	2	-32768 to +32767
Numeric	INTEGER	4	-2,147,483,648 to +2147,483,647
Numeric	BIGINT	8	– 9,233,372,036,854,775,808 to +9,233,372,036,854,775,807
Numeric	DECIMAL(n,m)	The range for precision is 1 through 18.	n is the total number of digits (precision). m is the number of digits to the right of the decimal point (scale).
Numeric	NUMERIC	1-16
Numeric	FLOAT	64-bit IEEE format	Values can range from approximately 2.226 x 10 ^-308 to 1.797 x 10³⁰⁸
BINARY & LARGE OBJECT	BLOB	Binary	BLOB(n) where n <= 2097088000 (default)
BINARY & LARGE OBJECT	BYTE	Binary	BYTE(n) where n <= 64000, default is 1
BINARY & LARGE OBJECT	VARBYTE	Binary	VARBYTE(n) where n <= 64000, where n needs to be specified
BINARY & LARGE OBJECT	GRAPHIC	CHARACTER
BINARY & LARGE OBJECT	VARGRAPHIC	CHARACTER

Most of the column data types are based on ANSI SQL. The examples of ones that are Teradata Extension
to ANSI SQL are:

Byte, Varbyte
Byteint
Date
LONG VARCHAR
GRAPHIC, VARGRAPHIC and LONG VARGRAPHIC
PERIOD

November 27, 2016

Performance Monitoring

There are numerous applications to monitor Teradata performance:

System	Summary
Account String Expansion (ASE)	Account String Expansion (ASE) is a mechanism that enables AMP usage and I/O statistics to be collected. ASE supports performance monitoring for an account string.
TDPTMON	Teradata Director Program (TDP) User Transaction Monitor (TDPTMON) is a client routine that enables a system programmer to write code to track TDP elapsed time statistics.
System Management Facility	The System Management Facility (SMF) is available in the Multiple Virtual Storage (MVS) environment only. This facility collects data about Teradata Database performance, accounting, and usage.
Performance Monitor/Application Programming Interface	The PM/API provides hooks into the Performance Monitor and Production Control (PM and PC) functions resident within the Teradata Database. PM and PC data is available through a log-on partition called MONITOR using a specialized PM/API subset of the Call-Level Interface version 2 (CLIv2) routines.