Sunday, 28 August 2011

ETL Overview.......

ETL is very important process in Data Warehousing which involves following three major steps:-

Extraction :-
This is the first step in the ETL Process which involves extracting the data from the source systems. Source systems here refer to Relational Databases , Flat Files and many other systems like ERP, SAP, CRM etc. Thus extraction involves fetching the data from different sources and converting it to single consistent format so that it is ready for next stage which is transform.

This is second step in the ETL in which number of Business rules are applied to the extracted data so that data can be loaded in the required format in the target database. Transform is the most time consuming stage in ETL which requires a lot of processing and calculations need to be done. Some of the transformations normally done in transform stage are listed below:
• Joining data from multiple sources
• Aggregation
• Calculating new values from the source data values.
• Splitting a column in multiple columns
• Normalization and De- Normalization
• Cleaning the source data
• Translating source data as per business rules and requirements

Load step takes the data from the transform step and loads it into the target which is mostly the Data Warehouse. Depending upon the business requirements this process may load overwrite the existing data or may maintain the historic data for analysis purpose. Load period also varies from daily ,weekly to monthly as per the requirements.

No comments:

Post a Comment