Components and Analysis of Star Schema Design
What are the main components of a star schema?
A star schema has four main components. These are listed below:
- Fact table
- Dimension tables
- Attribute hierarchies.
Let us talk about them one by one:
- These are numerical values.
- Represents the performance measures of the business activity.
- Examples: Productivity, expenses, sales, profit, prices, quantity.
- Facts are also called measures. They are stored in the fact table.
Numerical values are stored in fact table whereas texts are stored in dimension tables. Numerical values give the statistical measure of the business performance. Sales figures are easily understandable than the text, also it is difficult to describe sales in text. Since, facts are used for measuring the performance of the business, they are often called as measures.
What is a fact table?
- A fact table is also called a detail table.
- The fact table is the mid-point of the star schema.
- Fact tables contain the primary key and facts or measures.
Fact table consists of facts and keys. These are main details of the business. Fact table shows the summary of the business performance, so called detail table. It always lie in the centre or middle of the star schema. It is surrounded by dimension tables. Primary key of fact table is also part of it.
Properties of Fact table:
- The fact table is updated periodically by inserting aggregated data from operational databases.
- Facts calculated during the query execution are called metrics.
- Every fact table has dimension tables.
- Fact table supports the summarization of data.
- Contains at least one fact or measure.
- Primary key – the union of primary keys of all dimension tables.
- A fact table is always in BCNF, but not a dimension table.
- A row in fact table contains at least one fact and primary keys of its dimension tables.
- Contains different types of measures: Additive, semi-additive and non-additive.
- Have few columns and many rows – relatively long and narrowly shaped table.
- The level of detail of a single record in a fact table is called the granularity of the fact table.
- Most useful facts are numeric, continuously valued and additive.
Sales figure is one of the typical attribute of fact table. So, this figure needs to be updated after a period of time, like monthly or quarterly. In star schema, dimension tables surround the central fact table. Fact table always has atleast one fact otherwise it will not exist. It does not have a surrogate key, instead the union of primary keys of all dimension tables will make the primary key of fact table. Fact table is always in normalized form of BCNF. Since, fact table has its primary key and few facts, it has less number of attributes but large number of rows. If the sales figures gets updated daily, then granularity of fact table is a day.
What is a Dimension table?
- Dimension tables contain a primary key and only those attributes that are used in the decision-making process.
- Example: Product dimension, Location dimension, Time dimension.
- Dimension table is connected to fact table by primary key to the foreign key relationship.
- Filtering and grouping are supported by dimension tables.
- A dimension is usually descriptive information that qualifies a fact.
As the name indicates, dimension table contains the supportive attributes for business entity. Every dimension table consist of a primary key and necessary attributes. Customer table is one example of dimension table. Every dimension table is joined to fact table by foreign key relationship.
Properties of dimension tables:
- Dimension tables are called look-up or reference tables.
- Dimension table is not normalised.
- Contains one primary key, which makes a part of the primary key of the fact table.
- Dimensions don’t change or change slowly over time.
- They contain few rows and many columns.
- Most star schemas include a time dimension.
- Dimension tables are not joined to each other rather each dimension table is joined to fact table using PK-FK join.
- The primary key is often the surrogate key.
These are called reference table, because they support facts stored in fact table. The summary of business entity is displayed in fact table, while dimension tables support the fact table. Dimension table is not normalized because it will lead to break down of dimension tables into multiple tables. This will introduce more number of joins in the schema, the star query will become more complex and execution will take longer time. Dimension tables are joined to fact table but not with each other, because there is no need of this. Surrogate key is added to each dimension table for the purpose of identifying each record uniquely.
What are the attributes?
- These are the columns in the dimension table.
- Examples of Customer Dimension table: Name of customer, age, gender, marital status etc.
- These are mainly descriptive values.
Attributes are the column names. For a customer table, the details which will describe the customer accurately are called customer attributes. Name, gender, age and marital status of customer are few examples of customer attributes. Most of the time, these are descriptive values like name and address.
What is an attribute hierarchy?
- Attributes can be arranged in a hierarchy structure.
- Relationship between hierarchy levels is N:1.
- Determines a sequence of functional dependencies.
- Example: Product -> Product_type, Product_type -> Industry
- Time dimension hierarchy: Date -> Week -> Month -> Quarter -> Year.
- Location hierarchy: Shop -> City -> District -> State -> Country
- Attribute hierarchy is used to analyse the data at various aggregation levels, usually starting from a higher one.
- Not all columns of dimension tables are used in attribute hierarchy, some columns are just for mere dimension description.
Hierarchy of attribute is useful when we want the information on a finer or coarser granularity. We can find the total sales occurred in a particular quarter. If time hierarchy is present then, we can find the sales occurred in particular month of the same quarter. If still the hierarchy goes further down then we can find the total sales occurred in particular week of that month. Similarly, we can go up the hierarchy. It is not a necessity for all dimension tables to have an attribute hierarchy.
What are various data aggregation ways?
- An attribute hierarchy defines the way of data hierarchy.
- Roll-Up – getting the data of coarser granularity or higher level in the hierarchy.
- Drill-Down – getting the data of finer granularity.
What is a Star Query?
- A star query is a join between the fact table and a number of dimension tables.
- These are very complex queries.
- Take a long time to execute.
A SQL query to execute upon a star schema is called star query. The name star query is used because we are using the query on a star schema. Star schema contains many joins between fact and dimension tables, so star query is a complex query. Due to many join relationships in the star query, the execution takes very long time i.e. hours.