Monday, July 11, 2016

Data Warehouse Interview Questions

DataWarehousing Questions

What is a Data Warehouse?

A data warehouse is a storage of historical data for Data Analytics, such as reporting, analysis, Integration and data management. A data warehouse is a repository of data.

Benefits of Data Warehouse?

It intergrates data and stores the data historicaly so the data can be analysis for performance, trends, predictions over a period of time

What is Business Intelligence?

Business Intelligence is presenting historical data in a meaningful way (often by using different data visualization techniques). Raw data stored in databases can turn into valuable information through the ETL Process.

What is the difference between OLTP and OLAP?

OLTP is the transaction system that collects business data. Whereas OLAP is the reporting and analysis system on that data.

OLTP systems are optimized for INSERT, UPDATE operations and therefore highly normalized. OLAP systems are deliberately denormalized for fast data retrieval through SELECT operations.



What is data mart?

Data marts are generally designed for a single subject area. Stored in data warehouse and each department may have separate data marts. These marts can be built on top of a data warehouse.

Difference between a datawarehouse and a data mart 

A data warehouse is a central repository for all an organization's data. 
A data mart,  is to meet the particular demands of a specific group of users within the organization, such as human resource management.

What is ER model?

ER model or entity-relationship model is a particular methodology of data modeling wherein the goal of modeling is to normalize the data by reducing redundancy. This is different than dimensional modeling where the main goal is to improve the data retrieval mechanism.

What is Dimensional Modeling?

Dimensional model consists of dimension and fact tables. Fact tables store different transactional measurements and the foreign keys from dimension tables that qualifies the data. The goal of Dimensional model is not to achieve high degree of normalization but to facilitate easy and faster data retrieval.

What is Dimension?

Dimension tables describe the quantified data on the fact tables, giving context on its fields. They contain descriptive attributes which provide more information related to the fact table.

What is Fact?

A fact is something that is quantifiable (Or measurable). Facts are typically (but not always) numerical values that can be aggregated.

What are additive, semi-additive and non-additive measures?

  • Non-additive Measures
Non-additive measures are those which can not be used inside any numeric aggregation function (e.g. SUM(), AVG() etc.). One example of non-additive fact is any kind of ratio or percentage. Example, 5% profit margin, revenue to asset ratio etc. A non-numerical data can also be a non-additive measure when that data is stored in fact tables, e.g. some kind of varchar flags in the fact table.

  • Semi Additive Measures

Semi-additive measures are those where only a subset of aggregation function can be applied. Let’s say account balance. A sum() function on balance does not give a useful result but max() or min() balance might be useful. 

  • Additive Measures

Additive measures can be used with any aggregation function like Sum(), Avg() etc. Example is Sales Quantity etc.

What is Star-Schema?

This schema is used in data warehouse models where one centralized fact table references number of dimension tables so as the keys (primary key) from all the dimension tables flow into the fact table (as foreign key) where measures are stored.

What is Snow-Flake Schema?

This is another logical arrangement of tables in dimensional modeling where a centralized fact table references number of other dimension tables; however, those dimension tables are further normalized into multiple related tables.

Snow-flake increases degree of normalization in the design.

What are the different types of Dimension?

In a data warehouse model, dimension can be of following types,
  • Conformed Dimension
  • Junk Dimension
  • Degenerated Dimension
  • Role Playing Dimension
Based on how frequently the data inside a dimension changes, we can further classify dimension as
  • Unchanging or static dimension (UCD)
  • Slowly changing dimension (SCD)
  • Rapidly changing Dimension (RCD)

What is a 'Conformed Dimension'?

A conformed dimension is the dimension that is shared across multiple subject area. Consider 'Customer' dimension. Both marketing and sales department may use the same customer dimension table in their reports. Similarly, a 'Time' or 'Date' dimension will be shared by different subject areas.

What is degenerated dimension?

A degenerated dimension is a dimension that is derived from fact table and does not have its own dimension table, EG transaction number, receipt number, Invoice number etc. does not have any more associated attributes and hence can not be designed as a dimension table.

What is junk dimension?

A junk dimension is a grouping of typically low-cardinality attributes (flags, indicators etc.) so that those can be removed from other tables and can be junked into an abstract dimension table. The only purpose of this table is to store all the combinations of the dimensional attributes which you could not fit into the different dimension tables otherwise.

What is a role-playing dimension?

Dimensions are often reused for multiple applications within the same database with different contextual meaning. For instance, a "Date" dimension can be used for "Date of Sale", as well as "Date of Delivery", or "Date of Hire". This is often referred to as a 'role-playing dimension'

What is SCD?

SCD stands for slowly changing dimension, i.e. the dimensions where data is slowly changing. These can be of many types, e.g. Type 0, Type 1, Type 2, Type 3 and Type 6, although Type 1, 2 and 3 are most common.

Describe different types of slowly changing Dimension (SCD)

  • Type 0: - Always stays with old record. Values are not updated

A Type 0 dimension is where dimensional changes are not considered. It just means that, even if the value of the attributes change, history is not kept and the table holds all the previous data.

  • Type 1: - Record is updated with new data, history is overridden

A type 1 dimension is where history is not maintained and the table always shows the recent data. This effectively means that such dimension table is always updated with recent data whenever there is a change, and because of this update, we lose the previous values.

  • Type 2: - A separate row is created to keep history. NULL end date for new record

A type 2 dimension table tracks the historical changes by creating separate rows in the table with different surrogate keys. Note that separate surrogate keys are generated for the two records. NULL end date in the second row denotes that the record is the current record.

  • Type 3: - History stored in separate column instead of separate row. Good when date of change is not required.

A type 3 dimension stored the history in a separate column instead of separate rows. So unlike a type 2 dimension which is vertically growing, a type 3 dimension is horizontally growing. This is only good when you need not store many consecutive histories and when date of change is not required to be stored.

  • Type 6: - Just adds column to Type 3 stating if record is current or not

A type 6 dimension is a hybrid of type 1, 2 and 3 (1+2+3) which acts very similar to type 2, but only you add one extra column to denote which record is the current record.

What is a fact-less-fact?

A fact table that does not contain any measure is called a fact-less fact. This table will only contain keys from different dimension tables. This is often used to resolve a many-to-many cardinality issue.

A fact-less-fact table can only answer 'optimistic' queries (positive query) but can not answer a negative query.

Questions from : https://dwbi.org/data-modelling/dimensional-model/58-top-50-dwbi-interview-questions-with-answers