ETL Interview Questions and Answers
1) What is ETL?
Ans: ETL stands for Extract, Transform and Load — the three basic steps of the data integration process, which moves data from outside systems into a required target system. Extracting means locating the data and reading it out of the source system; transforming is the process of cleansing and converting the data into the format the target requires; and loading writes the transformed data into the target system.
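The three steps can be sketched end to end in a few lines. This is a minimal illustration only, using a hypothetical CSV export and an invented `sales` table, with SQLite standing in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an external system's CSV export.
source_csv = io.StringIO("id,name,amount\n1,alice,10.5\n2,bob,20.0\n")

# Extract: locate and read the rows out of the source file.
rows = list(csv.DictReader(source_csv))

# Transform: cleanse and convert to the target format
# (here: uppercase names, cast ids and amounts to numeric types).
transformed = [(int(r["id"]), r["name"].upper(), float(r["amount"]))
               for r in rows]

# Load: write the cleaned rows into the target database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", transformed)
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # 2
```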
2) Why is ETL testing required?
Ans: To keep a check on the data being transferred from one system to another, and to keep track of the efficiency and speed of the process.
To be well acquainted with the ETL process before it gets implemented into your business and production.
3) What are an ETL tester's responsibilities?
Ans:
- Requires in-depth knowledge of the ETL tools and processes.
- Needs to write SQL queries for the various scenarios given during the testing phase.
- Should be able to carry out different types of tests, such as primary-key and default checks, and keep a check on the other functionality of the ETL process.
- Quality checks on the loaded data.
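Two of the checks named above can be expressed as SQL queries a tester would run against the target. This sketch uses sqlite3 with a hypothetical `target` table and columns; the same queries apply to any SQL database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (cust_id INTEGER, email TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)",
                 [(1, "a@x.com"), (2, None), (2, "b@x.com")])

# Primary-key check: cust_id values that appear more than once.
dupes = conn.execute(
    "SELECT cust_id FROM target GROUP BY cust_id HAVING COUNT(*) > 1"
).fetchall()

# Null check: rows where a mandatory column is missing.
nulls = conn.execute(
    "SELECT COUNT(*) FROM target WHERE email IS NULL"
).fetchone()[0]

print(dupes, nulls)  # [(2,)] 1
```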
4) What are Dimensions?
Ans: Dimensions are the groups or categories through which the summarized data are sorted.
5) What does the Staging area refer to?
Ans: The staging area is where data is stored temporarily during the data integration process. Here, the data is cleansed and checked for any duplication.
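The duplication check done in staging can be as simple as a `SELECT DISTINCT` before the final load. A minimal sketch, assuming a hypothetical `staging` table with one order accidentally loaded twice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [(100, 9.99), (101, 5.00), (100, 9.99)])  # 100 loaded twice

# Cleanse: keep exactly one copy of each duplicated row before the final load.
conn.execute("""CREATE TABLE cleaned AS
                SELECT DISTINCT order_id, amount FROM staging""")
print(conn.execute("SELECT COUNT(*) FROM cleaned").fetchone()[0])  # 2
```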
6) Explain ETL mapping sheets.
Ans: ETL mapping sheets contain all the required information about the source file, including all the rows and columns. These sheets help the experts write the SQL queries used for ETL tool testing.
7) Mention a few test cases and explain them.
Ans: Mapping Doc Validation – verifying that the ETL information is provided in the mapping document.
Data Check – every aspect of the data, such as number checks and null checks, is tested in this case.
Correctness Issues – misspelled data, inaccurate data and null data are tested.
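The data-check cases above are straightforward to automate. A small sketch with invented records and check names, where each check flags the rows that fail it:

```python
# Hypothetical records to validate; ids '2' and '3' carry bad qty values.
rows = [
    {"id": "1", "qty": "3"},
    {"id": "2", "qty": None},      # fails the null check
    {"id": "3", "qty": "many"},    # fails the number check
]

def null_check(row):
    """A mandatory field must be present."""
    return row["qty"] is not None

def number_check(row):
    """A numeric field must actually hold digits."""
    return row["qty"] is not None and row["qty"].isdigit()

failures = [r["id"] for r in rows if not (null_check(r) and number_check(r))]
print(failures)  # ['2', '3']
```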
8) List few ETL bugs.
Ans: Calculation bug, user-interface bug, source bug, load-condition bug, Equivalence Class Partitioning (ECP) related bug.
In addition to the above ETL testing questions, you may be asked other vital questions, such as which ETL tools you have used earlier, any debugging issues you have faced in your previous roles, or about your real-time experience.
9) What is Data warehouse?
Ans: Data warehouse is a database used for reporting and data analysis.
10) What are the characteristics of a Data Warehouse?
Ans: Subject Oriented, Integrated, Time-variant and Non-volatile
11) What is the difference between Data Mining and Data Warehousing?
Ans: Data mining – analyzing data from different perspectives and summarizing it into useful information for decision making. It can be used to increase revenue, cut costs, improve productivity or improve any business process. There are many tools available in the market for various industries to do data mining. Basically, it is all about finding correlations or patterns in large relational databases.
Data warehousing – data warehousing comes before data mining. It is the process of compiling and organizing data into one database from various source systems, whereas data mining is the process of extracting meaningful data from that database (the data warehouse).
12) What are the main stages of Business Intelligence?
Ans: Data Sourcing –> Data Analysis –> Situation Awareness –> Risk Assessment –> Decision Support
13) What tools have you used for ETL testing?
Ans:
1. Data access tools, e.g. TOAD, WinSQL, AQT (used to analyze the content of tables)
2. ETL tools, e.g. Informatica, DataStage
3. Test management tools, e.g. Test Director, Quality Center (used to maintain requirements, test cases, defects and the traceability matrix)
14) What is OLAP?
Ans: OLAP stands for Online Analytical Processing. It uses database tables (fact and dimension tables) to enable multidimensional viewing, analysis and querying of large amounts of data.
15) What is OLTP?
Ans: OLTP stands for Online Transaction Processing. Databases other than data warehouse databases are OLTPs. These OLTP databases use a normalized schema structure and are designed for recording the daily operations and transactions of a business.
16) What are Dimensions?
Ans: Dimensions are categories by which summarized data can be viewed. For example, a profit fact table can be viewed by a time dimension.
17) What are Conformed Dimensions?
Ans: Conformed dimensions are dimensions that are reusable and fixed in nature, e.g. customer, time and geography dimensions.
18) What are Fact Tables?
Ans: A fact table is a table that contains summarized numerical data (facts) and historical data. A fact table has a foreign key to primary key relationship with the dimension tables, and maintains its information in third normal form.
19) What are the types of Facts?
Ans: The types of Facts are as follows.
- Additive Facts: facts that can be summed up across any of the dimensions available in the fact table.
- Semi-Additive Facts: facts that can be summed up across some, but not all, of the dimensions available in the fact table.
- Non-Additive Facts: facts that cannot be summed up across any of the dimensions available in the fact table.
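The additive vs. semi-additive distinction is easiest to see with numbers. A sketch using invented fact rows of daily account balances, where deposits are additive but balances are the classic semi-additive example (they sum across accounts, not across time):

```python
# Hypothetical fact rows: one row per account per day.
facts = [
    {"day": "mon", "account": "A", "balance": 100, "deposits": 10},
    {"day": "mon", "account": "B", "balance": 50,  "deposits": 5},
    {"day": "tue", "account": "A", "balance": 110, "deposits": 0},
    {"day": "tue", "account": "B", "balance": 55,  "deposits": 0},
]

# Additive: deposits can be summed across every dimension, including time.
total_deposits = sum(f["deposits"] for f in facts)  # 15

# Semi-additive: balances sum meaningfully across accounts for ONE day...
mon_balance = sum(f["balance"] for f in facts if f["day"] == "mon")  # 150

# ...but summing balances across days (100 + 110 + 50 + 55) is meaningless:
# it would double-count money that was simply still in the account.
```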
20) What are the types of Fact Tables?
Ans: The types of Fact Tables are:
- Cumulative Fact Table: this type of fact table generally describes what has happened over a period of time. It contains additive facts.
- Snapshot Fact Table: this type of fact table describes the state of things at a particular point in time. It contains non-additive and semi-additive facts.
21) What is Grain of Fact?
Ans: The grain of fact is the level at which fact information is stored in a fact table. It is also called fact granularity or fact event level.
22) What is Factless Fact table?
Ans: A fact table that does not contain facts is called a factless fact table. Generally, when we need to combine two data marts, one data mart will have a factless fact table and the other a common fact table.
23) What are Measures?
Ans: Measures are numeric data based on columns in a fact table.
24) What are Cubes?
Ans: Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multidimensional analysis.
25) What are Virtual Cubes?
Ans: These are combinations of one or more real cubes and require no disk space to store them. They store only the definition, not the data.
26) What is a Star schema design?
Ans: A star schema is a logical database design in which a centrally located fact table is surrounded by one or more dimension tables. This design is best suited for a data warehouse or data mart.
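A star schema and the kind of query it enables can be sketched in a few lines of SQL. The table and column names here (`fact_sales`, `dim_time`, `dim_product`) are invented for illustration, with sqlite3 as the database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One central fact table surrounded by two dimension tables.
conn.executescript("""
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales  (time_id INTEGER, product_id INTEGER, amount REAL);
INSERT INTO dim_time VALUES (1, 2023), (2, 2024);
INSERT INTO dim_product VALUES (1, 'widget');
INSERT INTO fact_sales VALUES (1, 1, 100.0), (2, 1, 250.0), (2, 1, 50.0);
""")

# Viewing the sales facts through the time dimension.
result = conn.execute("""
    SELECT t.year, SUM(f.amount)
    FROM fact_sales f JOIN dim_time t ON f.time_id = t.time_id
    GROUP BY t.year
    ORDER BY t.year""").fetchall()
print(result)  # [(2023, 100.0), (2024, 300.0)]
```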
27) What is Snow Flake schema Design?
Ans: In a snowflake design, the dimension table (a denormalized table) is further divided into one or more normalized dimension tables to organize the information in a better structural format. To design a snowflake schema, we should first design the star schema.
28) What is an Operational Data Store (ODS)?
Ans: An ODS is a collection of integrated databases designed to support operational monitoring. Unlike OLTP databases, the data in an ODS is integrated, subject-oriented and enterprise-wide.
29) What is Denormalization?
Ans: Denormalization means intentionally allowing redundancy in a table, e.g. a table with duplicated keys. Dimension tables follow the denormalization method, together with the surrogate-key technique.
30) What is Surrogate Key?
Ans: A surrogate key is a sequence-generated key which is assigned to be the primary key in the system (table).
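A surrogate key is commonly implemented as an auto-incrementing column, kept separate from the natural (business) key that came from the source system. A sketch with an invented `dim_customer` table in sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The surrogate key (cust_sk) is a system-generated sequence, independent
# of the natural/business key (cust_code) carried over from the source.
conn.execute("""CREATE TABLE dim_customer (
    cust_sk   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    cust_code TEXT,                               -- natural key
    name      TEXT)""")
conn.executemany(
    "INSERT INTO dim_customer (cust_code, name) VALUES (?, ?)",
    [("C001", "alice"), ("C002", "bob")])
keys = [r[0] for r in conn.execute("SELECT cust_sk FROM dim_customer")]
print(keys)  # [1, 2]
```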