Dimensional Data Model
Dimensional modeling (Data Model) is a design technique that puts the data in a standard framework and provides easy access. You must create a database model in order to provide quick access and to get the information you need for reporting. Two kinds of schemas are used when designing data models, either a star schema or a snowflake schema.
Conceptual Model Design
A conceptual data model identifies the highest-level relationships between the different entities. Features of conceptual data model include:
- Includes the important entities and the relationships among them
- No attribute is specified
- No primary key is specified
- Includes the important entities and the relationships among them
- No attribute is specified
- No primary key is specified
Logical Model Design
The Logical Model is a complete model and has all the information for building the database entities.
The Logical Model include:
- Includes all entities and relationships among them
- All attributes for each entity are specified
- The primary key for each entity is specified
- Foreign keys (keys identifying the relationship between different entities) are specified
- All attributes for each entity are specified
- The primary key for each entity is specified
- Foreign keys (keys identifying the relationship between different entities) are specified
In this model we don't have informations related to a specific database (Oracle, DB2, SQL Server, MySQL, etc. ). For instance, a VARCHAR2 column can have a generic name "String(30)".
Physical Model Design
The Physical Model Design is the Logical Model adapted to a specific database.
For instance, instead a generic "String(30)" we have VARCHAR2(30).
Data Integrity
Data integrity is a term used to refer to the accuracy and reliability of data.
OLTP ( On Line Transaction Processing )
We can divide IT systems into transactional (OLTP) and analytical (OLAP). In general we can assume that OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it.
OLAP ( On Line Analytical Processing )
We can divide IT systems into transactional (OLTP) and analytical (OLAP). In general we can assume that OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it.
MOLAP ( Multidimensional OLAP )
In MOLAP, data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats.
Advantages:
- Excellent performance: MOLAP cubes are built for fast data retrieval
- Good data compression techniques
- Can perform complex calculations: All calculations have been pre-generated (results returned quickly) when the cube is created.
Disadvantages:
- Limited in the amount of data a cube can handle
- A cube rebuilt could be very long
- Requires additional investment: to buy the proprietary format + investments in human resources
Examples of commercial products that use MOLAP are Cognos Powerplay, Oracle Database OLAP Option, Microsoft Analysis Services, Essbase, TM1, Lilith Hicare and Daptech Keystone. There is also an open source MOLAP server Palo.
ROLAP ( Relational OLAP )
In ROLAP, data is stored in the relational database.
Advantages:
- Can handle large amounts of data: limited to the database storage limit
- Can leverage functionalities inherent in the relational database
- Cost less than the MOLAP
Disadvantages:
- Performance can be slow: the measures are not pre-generated
- Limited by SQL functionalities
HOLAP ( Hybrid OLAP )
HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For summary-type information, HOLAP leverages cube technology for faster performance.
Cube
Cubes are logical representation of multidimensional data. The name of "cube" is visually related to a 3 dimensional model, but we can have more than 3 dimensions.
Here is a 3 dimensional cube:
You can see that a cube has some dimensions. In this case we have 3 dimensions: product, location, time. In real life we can have more than 3 dimensions. In that case, inside a cube we can have another cubes.
A 3 dimensional cube is made of cubes. Each cube can be imagines as a sum of cubes.
Dimension
In the pink cube you can see, there are 3 dimensions: product, location, time. A dimension is a structure that categorizes data in order to enable end users to answer business questions.
Measure
Each cube store a value at the intersection of each dimension and that value is named measure. In that cube, 200, 100 are measures.
Hierarchy
A hierarchy defines a set of parentage relationships between all or some of a dimension's members.
Fact table
The star shows how the data is stored. The middle is the central table (the fact table) containing the basic information and the points are dimension tables that show different views of the data. The central table has foreign keys to the dimension tables.
Dimension tables
The star shows how the data is stored. The middle is the central table (the fact table) containing the basic information and the points aredimension tables that show different views of the data. The central table has foreign keys to the dimension tables.
Star schema (= a form of dimensional model )
The star shows how the data is stored. The middle is the central table (the fact table) containing the basic information and the points are dimension tables that show different views of the data. The central table has foreign keys to the dimension tables. There are no foreign keys on the dimension tables.
Snowflake schema (= a form of dimensional model )
The snowflake shows how the data is stored. The middle is the central table (the fact table) containing the basic information and the points are dimension tables that show different views of the data. The central table has foreign keys to the dimension tables. There are also foreign keyson the dimension tables. In snowflake schema, as opposed to its counter part star schema, relational keys are present inside dimensions also.