Today’s data warehouses relies on very fast, reliable tools that help in retrieving and managing data. And fast mean you should be able to load very large volume of data from any sources at any given time to updated with you business demands. Day by day demands from more data sources, growing data volumes, dynamically changing business requirements, and user demands for fresher data will increase and it will put your data loading into your data warehouse a big challenge. But no need to worry Teradata offers a powerful, comprehensive suite of data load and unload utilities for managing your changing data load requirements.
Teradata is so advanced in the data-loading department that other database vendors can’t hold a candle to it. A Teradata data warehouse brings enormous amounts of data into the system. This is an area that most companies overlook when purchasing a data warehouse. Most company officials think loading of data is simply that – just loading data, but in reality its more than that. Teradata provides several load and unload utilities for your specific requirements. I am going to explain one by one about Teradata Data Loading Utilities Features, Uses and Best Practices.
1. BTEQ(Basic Teradata Query)
BTEQ(Basic Teradata Query) or pronounced as Bee-Tek is a general-purpose, command-based tool which provides an interactive or batch interface that allows you to submit SQL statements, import and export data, and generate reports.It imports and exports data at row level. It provides report formatting and queried data returns to screen, file or printer. Basically there are four types of BTEQ commands and they are:-
- Session control commands – Begins and ends BTEQ sessions, and controls session characteristics
- File control commands – Specifies input and output formats and identifies information sources and destinations
- Sequence control commands – Controls the sequence in which other BTEQ commands and SQL statements are executed
- Format control commands – Controls the format of screen and printer output.
also there are four different BTEQ modes and they are :-
Export Data – BTEQ allows for multiple techniques to export data. We usually think of an export as moving data off of Teradata to a normal flat file. That is example number one and that is called RECORD Mode.
Export Report – BTEQ can actually take your SQL output report and include the Headers and export all together. It looks like an electronic report. That is EXPORT REPORT mode. This is also called Field Mode.
Export INDICDATA – When there are NULL’s in your data and you export them to a mainframe the actual mainframe application could run into problems so INDICDATA warns of NULL values.
Export DIF – The last mode is DIF and this is used when you want certain flat files to be able to be used by PC applications that utilize the Data Interchange Format.
2. Teradata FastLoad
FastLoad as name suggest can load vast amounts of data from flat files from a host into empty tables in Teradata with lightning-like speed. And this lightening speed is possible because it does not use transient journal. FastLoad was developed to loads millions of record into Empty teradata tables. FastLoad loads data into empty tables in from of 64K blocks and its only use is INSERT.
FastLoad divides its job into two phases, Phase 1 or Acquisition Phase and Phase 2 or Application Phase.
In Phase 1 data is retrieved from the mainframe or server and move it over the network inside Teradata. The data moves in 64 K blocks and is stored in worktables on the AMPs. When all of the data has been moved from the server or mainframe flat file then in Phase 2 each AMP will hash its worktable rows so each row transfers to the worktables on the proper destination AMP.
Rules before using TeraData FastLoad :-
- The target tables must be empty
- No Secondary Indexes are allowed on the Target Table
- No Referential Integrity is allowed
- No Triggers are allowed at load time
- No Join Indexes are allowed on the Target Table
- No AMPs may go down while FastLoad is processing.
- No more than one data type conversion is allowed per column during a FastLoad.
As FastLoad does not support indexes, triggers so its better practice to drop these before doing FastLoad , after loading the table you can again enable them.
3. Teradata MultiLoad
FastLoad is used to only load the data in tables where as TeraData Multiload Utility can Load, update and delete large tables in Teradata in a bulk mode . MultiLoad has the capability to load multiple tables at one time from either a LAN or
Channel environment. This feature rich utility can perform multiple types of DML tasks, including INSERT, UPDATE, DELETE and UPSERT on up to five empty or populated target tables at a time. These DML functions may be run either solo or in combinations, against one or more tables. For these reasons, MultiLoad is the utility of choice when it comes to loading populated tables in the batch environment. As the volume of data being loaded or updated in a single block, the performance of MultiLoad improves. MultiLoad runs on a variety of client platforms, operates in a fail-safe mode and is fully recoverable.
Rules before using TeraData MultiLoad :-
- Unique Secondary Indexes are not supported on a Target Table
- Referential Integrity is not supported
- No Triggers at load time
- No concatenation of input files is allowed
- No Join Indexes
So as MultiLoad does not support indexes, triggers so its better practice to drop these before doing MultiLoad , after loading the table you can again enable them.
4. TPump ( Teradata Parallel Data Pump)
Both FastLoad and MultiLoad assemble massive volumes of data rows into 64K blocks and then moves those blocks. TPump does NOT move data in the large blocks. Instead, it loads data one row at a time, using row hash locks. Because it locks at this level, and
not at the table level like MultiLoad, TPump can make many simultaneous, or concurrent, updates on a table.
TPump is used to perform Insert, Updates, Deletes and Upserts from Flat Files to populated Teradata Tables at Row Level. There are benefits certain benefits of using TPump Utility. TPump Utility can have Secondary Indexes, Referential Integrity, Triggers, Join Indexes, and also Pump data at varying rates.
But TPump also have certain limitations and they are:-
- No concatenation of input data files is allowed
- TPump will not process aggregates, arithmetic functions or exponentiation
- The use of the SELECT function is not allowed
- No more than four IMPORT commands may be used in a single load task
- Dates before 1900 or after 1999 must be represented by the yyyy format for the year portion of the date, not the default format of yy
- On some network attached systems, the maximum file size when using TPump is 2GB
- TPump performance will be diminished if Access Logging is used
5. TPT ( Teradata Parallel Transporter)
The Teradata Parallel Transporter (TPT) utility combines BTEQ, FastLoad, MultiLoad, Tpump, and FastExport utilities into one comprehensive language utility. This allows TPT to insert data to tables, export data from tables, and update tables.
TPT works around the concept of Operators and Data Streams. There will be an Operator to read Source data, pass the contents of that Source to a data stream where another operator will be responsible for taking the Data Stream and loading it to disk. Below picture represents Teradata Parallel Transporter Infrastructure.
In other utilities multiple data sources are usually processed in a serial manner, Teradata Parallel Transporter can access multiple data sources in parallel. Teradata Parallel Transporter also allows different specifications for different data sources and, if their data is UNION-compatible, merges them together.
Below picture represents the Teradata Parallel Transporter Infrastructure.
There are four main components of Teradata Parallel Transporter and they are :-
Load Operator :- It uses the FastLoad protocol and is a parallel load utility designed to move large volumes of data collected from data sources on channel- and network-attached clients into empty tables in the Teradata Database.
Update Operator :- It uses the MultiLoad protocol and it can updates, inserts and upserts large volumes of data into empty or populated tables and/or deletes data from the tables.
Export Operator :- It uses the FastExport protocol and it can exports large data sets from Teradata tables or views and imports the data to a client’s system for processing or generating large reports or for loading data into a smaller database.
Stream Operator :- It uses the TPump protocol and it can loads near real-time data into the data warehouse. TPump can be used to insert, update, upsert and delete data in the Teradata Database, particularly for environments where data warehouse maintenance overlaps normal working hours. Because the TPump protocol uses row hash locks, users can run queries even while they are updating the Teradata system.
SO these are the Teradata Data loading utilities which we can use to efficiently load the data in big Teradata tables. Below table summarizes the Teradata Data Loading Utilities.
I tried my best to explain Teradata Data Loading Utilities Features, Uses and Best Practices. Please add important points from your side in the comments. Also if you are interested to submit any article, you can do so by visiting here Join CodingByte