Normalization in DBMS: Types, Purpose, & More
Edgar F. Codd proposed the concept of normalization and the first normal form (1NF) in 1970. This was followed by the introduction of other normal forms. But how is this concept relevant to a DBMS? In a database, normalization is integral to ensure data integrity and efficiency. Anyone working with database management systems must understand the concept of normalization. This blog will explain normalization in DBMS, why there is a need for it, different types of normal forms, and practical examples to illustrate the process.
What is Normalization?
Normalization in DBMS is the process of structuring data and the attributes of a database to store the data logically and avoid redundancy. Data redundancy refers to the same data stored in several places in a database. In normalization, larger tables are divided into smaller ones and linked using relationships. To eliminate redundancy, we use various normal forms in DBMS. You can learn more about programming and managing databases with an SQL course.
Purpose of Normalization in DBMS
Data redundancy in database management systems leads to anomalies that make it hard to manage the database. Normalization is necessary to remove these anomalies and instill more consistency.
The different types of anomalies are:
- Insertion Anomalies: This refers to the situation when we are unable to insert data in a database due to some attributes not being present.
- Deletion Anomalies: This refers to the situation in which due to the deletion of one part of the data, the other necessary information is also deleted.
- Updation Anomalies: This refers to the situation when the updation of one data value requires the updation of multiple rows. It happens because the same data items are repeated with the same values but are not linked with each other.
Types of Normalization in DBMS
Various levels of normalization having their own set of guidelines is what is normal form in DBMS. These normal forms apply to individual relationships.
There is a concept of functional dependency related to the concept of normalization. It is a relationship between two sets of attributes of a relational table where one set ascertains the value of the other set of attributes. It is represented by X->Y. Here, X is known as a determinant and Y is a dependent.
Let us now discuss these normal forms to understand normalization in DBMS with examples.
First Normal Form (1NF)
It is the basic level of normalization. A relation is in first normal form if it contains an atomic value or every attribute contains a single-valued attribute and not a multi-valued or composite attribute. It helps in the elimination of duplicate data and simplification of queries.
If there arises a case where there is a multi-valued attribute, you can create a new row for each value of that attribute.
For example, a table <FruitRecord> contains information about fruits acquired from different states to store in a warehouse.
Fruit ID | Fruit Name | States |
101 | Mango | Uttar Pradesh, Andhra Pradesh, Karnataka |
102 | Oranges | Madhya Pradesh |
103 | Apple | Himachal Pradesh |
The “States” column in the above table is a multi-valued attribute. Hence, this relation is not in 1NF. To convert the table into 1NF, we will enter separate rows for each mango-producing state, and the following will be the result.
Fruit ID | Fruit Name | States |
101 | Mango | Uttar Pradesh |
101 | Mango | Andhra Pradesh |
101 | Mango | Karnataka |
102 | Oranges | Madhya Pradesh |
103 | Apple | Himachal Pradesh |
Get a confirmed ₹35,000 total stipend with our data science course with placement guarantee.
Second Normal Form (2NF)
A relational table must follow two rules to be in the second normal form.
- It must be in the first normal form.
- It should not contain any partial dependency.
Partial dependency is a condition where non-prime attributes or attributes not part of the candidate key are not fully functionally dependent on one of the primary keys. Therefore, no partial dependency will mean that all the non-prime attributes are fully functionally dependent on the primary key.
To remove the partially dependent attributes, we can move them to another table where they suit well.
For example, there is a table with information about customers of an insurance agency and the plans they have subscribed to. The following is the sample table:
Customer ID | Plan ID | Customer Name | Plan Name |
101 | A1 | Rahul | Health Benefit Plan |
102 | A2 | Seema | Life Insurance Plan |
103 | A3 | Neha | Vehicle Insurance Plan |
In the above table, the prime attributes are “Customer ID” and “Plan ID”. Because “Customer Name” can be determined by the “Customer ID” and “Plan Name” by “Plan ID”, there is partial dependency in the table.
To remove the dependency and convert it into a second normal form, we can divide this table into smaller tables. The following will be the result:
<CustomerInfo>
Customer ID | Customer Name |
101 | Rahul |
102 | Seema |
103 | Neha |
<CustomerPlan>
Customer ID | Plan ID |
101 | A1 |
102 | A2 |
103 | A3 |
<PlanDetail>
Plan ID | Plan Name |
A1 | Health Benefit Plan |
A2 | Life Insurance Plan |
A3 | Vehicle Insurance Plan |
Third Normal Form (3NF)
A relational table is in 3NF when it fulfills the following rules:
- It must be in the second normal form or 2NF.
- No non-prime attribute should transitively depend on the primary key.
- Also, for each functional dependency X->Z, at least one of the following conditions is true:
- The superkey of the table is X.
- The prime attribute of the table is Z.
Note that a superkey is a set of one or more attributes that uniquely identify a row in a table.
To remove the transitive dependency, we divide the table to exclude the transitively dependent attributes and put them into a new table with a copy of the determinant.
The following example showcases the table <WarehouseDetail> and we will normalize it to the third normal form.
Warehouse Code | Warehouse Name | Warehouse Pincode | Warehouse City |
101 | JC Warehouse | 462001 | Bhopal |
102 | AD Warehouse | 320008 | Ahmedabad |
103 | HG Warehouse | 600005 | Chennai |
In the above table, there is a transitive dependency because of two conditions:
- Warehouse Code-> Warehouse Pincode
- Warehouse Pincode-> Warehouse City
Warehouse Pincode is not a superkey, and Warehouse City is not a prime attribute.
To convert it into the third normal form, we will divide the table into two. The following will be the result:
<WarehouseDetail>
Warehouse Code | Warehouse Name | Warehouse Pincode |
101 | JC Warehouse | 462001 |
102 | AD Warehouse | 320008 |
103 | HG Warehouse | 600005 |
<WarehouseLocation>
Warehouse Pincode | Warehouse City |
462001 | Bhopal |
320008 | Ahmedabad |
600005 | Chennai |
Explore the best business analyst course with placement on Internshala and start you journey in data science.
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form or BCNF is a stronger or advanced version of the third normal form. It ensures that each determinant in the table is a candidate key. The following are the rules for a relational table to be in BCNF:
- The table must be in the third normal form.
- X is the superkey of the table for every non-trivial functional dependency X->Y. Therefore, X cannot be a non-prime attribute if Y is a prime attribute.
Take, for example, the following table.
<StudentProjectLead>
Student Code | Project ID | Project Leader |
101 | P1 | Rahul |
102 | P2 | Neha |
103 | P3 | Seema |
The above table violates the rules of the BCNF as “Project ID” is a prime attribute but “Project Leader” is a non-prime attribute.
We will divide this table into two to convert it into Boyce-Codd normal form.
<StudentProject>
Student Code | Project ID |
101 | P1 |
102 | P2 |
103 | P3 |
<ProjectLead>
Project Leader | Project ID |
Rahul | P1 |
Neha | P2 |
Seema | P3 |
Fourth Normal Form (4NF)
A table in the fourth normal form will adhere to the rules of the Boyce-Codd normal form and will have no multi-valued dependencies.
For example, the following table has multi-valued dependencies.
<StudentCourses>
Student ID | Courses |
101 | Math, History |
102 | Science |
103 | Math, English |
To convert the above table to 4NF, we will divide it into three tables.
<Student>
Student ID |
101 |
102 |
103 |
<Course>
Courses |
Math, History |
Science |
Math, English |
<StudentCourseNew>
Student ID | Course |
101 | Math |
101 | History |
102 | Science |
103 | Math |
103 | English |
Fifth Normal Form (5NF)
It is the highest level of normalization. The table here is divided into smaller ones to ensure there is no data redundancy, there is data integrity, and the table does not contain any join dependency. The joining is also without the loss of data.
A join dependency is when a table can be reconstructed by joining multiple smaller tables.
The following example will illustrate this normal form:
Suppose there are three tables.
<Books>
ISBN | Title | Author |
978-0451450524 | “Neuromancer” | William Gibson |
978-0547928210 | “The Hobbit” | J.R.R. Tolkien |
978-1402833317 | “1984” | George Orwell |
<Authors>
Author | Birth_Year |
William Gibson | 1948 |
J.R.R. Tolkien | 1892 |
George Orwell | 1903 |
<Publishers>
ISBN | Publisher |
978-0451450524 | Ace Books |
978-0547928210 | Houghton Mifflin Harcourt |
978-1402833317 | Rosettabooks |
Now, we will identify the join dependencies. There is one between the “Books” and “Authors” table based on the “Author” column. Another join dependency is between the “Books” and “Publishers” table based on the “ISBN” column.
The next step is to remove redundancy based on these join dependencies we have identified. So we will create the following tables:
<Books_Authors>
ISBN | Author |
978-0451450524 | William Gibson |
978-0547928210 | J.R.R. Tolkien |
978-1402833317 | George Orwell |
<Books_Publishers>
ISBN | Publisher |
978-0451450524 | Ace Books |
978-0547928210 | Houghton Mifflin Harcourt |
978-1402833317 | Rosettabooks |
To maintain the relationship between the divided smaller tables we will create a metadata table.
<Book_Metadata>
ISBN | Title | Author | Publisher |
978-0451450524 | “Neuromancer” | William Gibson | Ace Books |
978-0547928210 | “The Hobbit” | J.R.R. Tolkien | Houghton Mifflin Harcourt |
978-1402833317 | “1984” | George Orwell | Rosettabooks |
With these various normal forms, you will get a database devoid of redundancy and efficient in maintaining data integrity. The concept of normalization is useful for those who are looking forward to working as a data scientist or similar jobs.
Conclusion
Normalization in DBMS is an essential process for removing redundancy and optimizing database storage. It removes anomalies to make the data more consistent and we get to increase the efficiency of database operations. The various normal forms in DBMS given above will help you optimize your data.