Data Modeling Fundamentals
One of the first steps in any project that handles a large amount of data is to decide how that data will be structured and how it will flow. A good data model gives a project a solid foundation, whereas a badly structured one can lead to a lot of chaos. Data modeling is the blueprint of a database: one doesn’t build a house before finalizing the blueprint. A data model doesn’t store any actual data. Rather, it captures the data requirements of the project while abstracting away from the database implementation.
History & Need for Data Modeling
In 1976, Peter Chen published his paper “The Entity-Relationship Model: Toward a Unified View of Data”, which is widely regarded as the starting point of data modeling. The paper appeared before commercial Relational Database Management Systems (RDBMS) were available, which explains why it was needed. Most data at the time was stored in mainframe and minicomputer file systems, in cryptic formats. Many files still lived on tapes and punch cards, and data definitions were buried deep under layers of programs.
The Building Blocks of Data Modeling
There are four pillars on which the data models are defined:
- Major Data Subjects
- Attributes of the Data Subjects
- Relationship among Data Subjects
- Business Rules for Data
Although these blocks are largely self-explanatory, let’s go over each of them.
Major Data Subjects
Synonymous with “entities”, “objects” or “classes”, data subjects are the main characters of the database. For instance, “Student”, “Teacher” and “Staff” are the data subjects of a school management data model.
Attributes of the Data Subjects
Attributes are associated with a data subject and describe its qualities. To illustrate, “Student Id”, “Name” and “Class” are some attributes that can be associated with a student data subject. There is also the multi-valued attribute, which comes into the picture when a single entity needs to store more than one value for an attribute. For example, a student can have multiple email ids that need to be stored. In an ER diagram, a multi-valued attribute is generally drawn as a double oval.
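As a rough illustration of a data subject and its attributes, here is a minimal Python sketch. The `Student` class, its field names and the sample values are made up for this example; `class_name` stands in for the “Class” attribute only because `class` is a reserved word in Python.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Student:
    """Hypothetical data subject (entity) from the school example."""
    student_id: int   # single-valued attribute, used here as the identifier
    name: str         # single-valued attribute
    class_name: str   # the "Class" attribute (`class` is a Python keyword)
    # Multi-valued attribute: one student may have several email ids.
    emails: List[str] = field(default_factory=list)


# Sample values are invented for illustration.
student = Student(student_id=1, name="Asha", class_name="10-A")
student.emails.append("asha@example.com")
student.emails.append("asha.home@example.com")
print(student)
```

In a relational implementation, a multi-valued attribute like `emails` would usually end up in a separate table, but at the modeling stage it is enough to note that the attribute can hold more than one value.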
Relationship among Data Subjects
As the name suggests, a relationship defines how two data subjects are connected. For example, a teacher “teaches” a student. The interesting thing here is that there are multiple kinds of relationships. A relationship can be unidirectional like “teaches”, bidirectional like “classmate of”, or hierarchical like the one between the principal, HOD, professors and assistant professors. Two data subjects can also share more than one relationship. For example, a teacher can have both a “teaches” and an “is HOD for” relationship with a student.
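The kinds of relationships described above can be sketched in a few lines of Python. This is only one possible way to represent them, under the assumption that a relationship is a named link between two instances; the `Relationship` class, the helper functions and the people named here are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """Hypothetical named link between two data subject instances."""
    name: str
    source: str
    target: str
    symmetric: bool = False  # True for bidirectional links such as "classmate of"


def holds(rel, a, b):
    """Return True if `rel` connects a to b, respecting its direction."""
    if rel.source == a and rel.target == b:
        return True
    return rel.symmetric and rel.source == b and rel.target == a


# Invented sample data: two relationships between the same pair, plus a symmetric one.
relationships = [
    Relationship("teaches", "Ms. Rao", "Asha"),
    Relationship("is HOD for", "Ms. Rao", "Asha"),
    Relationship("classmate of", "Asha", "Ravi", symmetric=True),
]


def relationships_between(a, b):
    """List the names of every relationship that holds from a to b."""
    return [r.name for r in relationships if holds(r, a, b)]


# Hierarchical relationship: each role reports to the one above it.
reports_to = {"Assistant Professor": "Professor", "Professor": "HOD", "HOD": "Principal"}


def chain_of_command(role):
    """Walk up the hierarchy from `role` to the top."""
    chain = [role]
    while chain[-1] in reports_to:
        chain.append(reports_to[chain[-1]])
    return chain


print(relationships_between("Ms. Rao", "Asha"))
print(relationships_between("Ravi", "Asha"))
print(chain_of_command("Assistant Professor"))
```

The `symmetric` flag is what separates “classmate of” from “teaches”: the first holds in both directions, the second only from teacher to student.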
Business Rules for Data
Last but not least, a data model has to respect the business rules and guidelines of the project. A number of questions have to be kept in mind while data modeling:
- What will be the cardinality of each relationship in the data model?
- Is a relationship between data subjects mandatory or optional?
- Will attribute values be restricted to a permissible set?
- What will be the “data change dynamics” policy?
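To make the first three questions concrete, here is a small Python sketch that checks some sample data against business rules. Every rule value in it (the allowed classes, the limit of three students per teacher, the requirement that each student has a teacher) is an assumption invented for this example, and the “data change dynamics” policy is not covered.

```python
# Hypothetical business rules for the school example.
ALLOWED_CLASSES = {"9-A", "9-B", "10-A", "10-B"}  # permissible attribute values
MAX_STUDENTS_PER_TEACHER = 3                       # cardinality limit on "teaches"
TEACHER_REQUIRED = True                            # is the "teaches" relationship mandatory?


def validate(students, teaches):
    """Check sample data against the rules above.

    students: dict mapping student name -> class
    teaches:  dict mapping teacher name -> list of student names
    Returns a list of human-readable rule violations.
    """
    errors = []
    # Permissible values: every class must come from the allowed set.
    for name, class_name in students.items():
        if class_name not in ALLOWED_CLASSES:
            errors.append(f"{name}: class {class_name!r} is not permitted")
    # Cardinality: a teacher may teach only a limited number of students.
    taught = set()
    for teacher, names in teaches.items():
        if len(names) > MAX_STUDENTS_PER_TEACHER:
            errors.append(f"{teacher}: teaches {len(names)} students, limit is {MAX_STUDENTS_PER_TEACHER}")
        taught.update(names)
    # Optionality: if the relationship is mandatory, every student needs a teacher.
    if TEACHER_REQUIRED:
        for name in students:
            if name not in taught:
                errors.append(f"{name}: has no teacher")
    return errors


# Invented sample data.
students = {"Asha": "10-A", "Ravi": "12-Z"}
teaches = {"Ms. Rao": ["Asha"]}
for problem in validate(students, teaches):
    print(problem)
```

Keeping the rules as named constants, separate from the checking logic, mirrors the idea that business rules belong to the data model rather than to any one program.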
flowchart created using https://miro.com/app/
The classic ER model is generally described as an enhanced and extended version of the model in Chen’s paper “The Entity-Relationship Model: Toward a Unified View of Data”. The post-classic ER model, by contrast, uses Crow’s foot notation instead of the traditional ovals and diamonds. In essence, though, the two data modeling specifications are semantically similar.
System modeling as a whole includes a category called semantic data modeling, which was partially developed by the US Air Force with the goal of having a single data modeling methodology. On the other side of the system modeling equation is UML, the Unified Modeling Language, which has its roots in object-oriented systems and has been adopted by the Object Management Group (OMG). It is a general-purpose modeling language and notation for software engineering.
Notations to keep in mind
The takeaway is that we want a data model to stay as close as possible to the real-world case, without being constrained by things like the rules of relational database management systems.