Normalization in DBMS with Examples

Praveen Kumar

Edgar F. Codd proposed the concept of normalization and the first normal form (1NF) in 1970. This was followed by the introduction of other normal forms. But how is this concept relevant to a DBMS? In a database, normalization is integral to ensure data integrity and efficiency. Anyone working with database management systems must understand the concept of normalization. This blog will explain normalization in DBMS, why there is a need for it, different types of normal forms, and practical examples to illustrate the process.

Table of Contents

What is Normalization?

Normalization is the process of structuring data and the attributes of a database to store the data logically and avoid redundancy. Data redundancy refers to the same data stored in several places in a database. In normalization, larger tables are divided into smaller ones and linked using relationships. To eliminate redundancy, we use various normal forms in DBMS. You can learn more about programming and managing databases with an SQL course.

Purpose of Normalization

Data redundancy in database management systems leads to anomalies that make it hard to manage the database. Normalization is necessary to remove these anomalies and instill more consistency.

The different types of anomalies are:

Insertion Anomalies: This refers to the situation when we are unable to insert data in a database due to some attributes not being present.
Deletion Anomalies: This refers to the situation in which due to the deletion of one part of the data, the other necessary information is also deleted.
Updation Anomalies: This refers to the situation when the updation of one data value requires the updation of multiple rows. It happens because the same data items are repeated with the same values but are not linked with each other.

Types of Normalization

Various levels of normalization having their own set of guidelines is what is normal form in DBMS. These normal forms apply to individual relationships.

There is a concept of functional dependency related to the concept of normalization. It is a relationship between two sets of attributes of a relational table where one set ascertains the value of the other set of attributes. It is represented by X->Y. Here, X is known as a determinant and Y is a dependent.

Let us now discuss these normal forms to understand normalization in DBMS with examples.

First Normal Form (1NF)

It is the basic level of normalization. A relation is in first normal form if it contains an atomic value or every attribute contains a single-valued attribute and not a multi-valued or composite attribute. It helps in the elimination of duplicate data and simplification of queries.

If there arises a case where there is a multi-valued attribute, you can create a new row for each value of that attribute.

For example, a table <FruitRecord> contains information about fruits acquired from different states to store in a warehouse.

Fruit ID	Fruit Name	States
101	Mango	Uttar Pradesh, Andhra Pradesh, Karnataka
102	Oranges	Madhya Pradesh
103	Apple	Himachal Pradesh

The “States” column in the above table is a multi-valued attribute. Hence, this relation is not in 1NF. To convert the table into 1NF, we will enter separate rows for each mango-producing state, and the following will be the result.

Fruit ID	Fruit Name	States
101	Mango	Uttar Pradesh
101	Mango	Andhra Pradesh
101	Mango	Karnataka
102	Oranges	Madhya Pradesh
103	Apple	Himachal Pradesh

Get a confirmed ₹35,000 total stipend with our data science course with placement guarantee.

Second Normal Form (2NF)

A relational table must follow two rules to be in the second normal form.

It must be in the first normal form.
It should not contain any partial dependency.

Partial dependency is a condition where non-prime attributes or attributes not part of the candidate key are not fully functionally dependent on one of the primary keys. Therefore, no partial dependency will mean that all the non-prime attributes are fully functionally dependent on the primary key.

To remove the partially dependent attributes, we can move them to another table where they suit well.

For example, there is a table with information about customers of an insurance agency and the plans they have subscribed to. The following is the sample table:

Customer ID	Plan ID	Customer Name	Plan Name
101	A1	Rahul	Health Benefit Plan
102	A2	Seema	Life Insurance Plan
103	A3	Neha	Vehicle Insurance Plan

In the above table, the prime attributes are “Customer ID” and “Plan ID”. Because “Customer Name” can be determined by the “Customer ID” and “Plan Name” by “Plan ID”, there is partial dependency in the table.

To remove the dependency and convert it into a second normal form, we can divide this table into smaller tables. The following will be the result:

<CustomerInfo>

Customer ID	Customer Name
101	Rahul
102	Seema
103	Neha

<CustomerPlan>

Customer ID	Plan ID
101	A1
102	A2
103	A3

<PlanDetail>

Plan ID	Plan Name
A1	Health Benefit Plan
A2	Life Insurance Plan
A3	Vehicle Insurance Plan

Third Normal Form (3NF)

A relational table is in 3NF when it fulfills the following rules:

It must be in the second normal form or 2NF.
No non-prime attribute should transitively depend on the primary key.
Also, for each functional dependency X->Z, at least one of the following conditions is true:
- The superkey of the table is X.
- The prime attribute of the table is Z.

Note that a superkey is a set of one or more attributes that uniquely identify a row in a table.

To remove the transitive dependency, we divide the table to exclude the transitively dependent attributes and put them into a new table with a copy of the determinant.

The following example showcases the table <WarehouseDetail> and we will normalize it to the third normal form.

Warehouse Code	Warehouse Name	Warehouse Pincode	Warehouse City
101	JC Warehouse	462001	Bhopal
102	AD Warehouse	320008	Ahmedabad
103	HG Warehouse	600005	Chennai

In the above table, there is a transitive dependency because of two conditions:

Warehouse Code-> Warehouse Pincode
Warehouse Pincode-> Warehouse City

Warehouse Pincode is not a superkey, and Warehouse City is not a prime attribute.

To convert it into the third normal form, we will divide the table into two. The following will be the result:

<WarehouseDetail>

Warehouse Code	Warehouse Name	Warehouse Pincode
101	JC Warehouse	462001
102	AD Warehouse	320008
103	HG Warehouse	600005

<WarehouseLocation>

Warehouse Pincode	Warehouse City
462001	Bhopal
320008	Ahmedabad
600005	Chennai

Explore the best business analyst course with placement on Internshala and start you journey in data science.

Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form or BCNF is a stronger or advanced version of the third normal form. It ensures that each determinant in the table is a candidate key. The following are the rules for a relational table to be in BCNF:

The table must be in the third normal form.
X is the superkey of the table for every non-trivial functional dependency X->Y. Therefore, X cannot be a non-prime attribute if Y is a prime attribute.

Take, for example, the following table.

<StudentProjectLead>

Student Code	Project ID	Project Leader
101	P1	Rahul
102	P2	Neha
103	P3	Seema

The above table violates the rules of the BCNF as “Project ID” is a prime attribute but “Project Leader” is a non-prime attribute.

We will divide this table into two to convert it into Boyce-Codd normal form.

<StudentProject>

Student Code	Project ID
101	P1
102	P2
103	P3

<ProjectLead>

Project Leader	Project ID
Rahul	P1
Neha	P2
Seema	P3

Fourth Normal Form (4NF)

A table in the fourth normal form will adhere to the rules of the Boyce-Codd normal form and will have no multi-valued dependencies.

For example, the following table has multi-valued dependencies.

<StudentCourses>

Student ID	Courses
101	Math, History
102	Science
103	Math, English

To convert the above table to 4NF, we will divide it into three tables.

<Student>

Student ID

101

102

103

<Course>

Courses

Math, History

Science

Math, English

<StudentCourseNew>

Student ID	Course
101	Math
101	History
102	Science
103	Math
103	English

Fifth Normal Form (5NF)

It is the highest level of normalization in dbms. The table here is divided into smaller ones to ensure there is no data redundancy, there is data integrity, and the table does not contain any join dependency. The joining is also without the loss of data.

A join dependency is when a table can be reconstructed by joining multiple smaller tables.

The following example will illustrate this normal form:

Suppose there are three tables.

<Books>

ISBN	Title	Author
978-0451450524	“Neuromancer”	William Gibson
978-0547928210	“The Hobbit”	J.R.R. Tolkien
978-1402833317	“1984”	George Orwell

<Authors>

Author	Birth_Year
William Gibson	1948
J.R.R. Tolkien	1892
George Orwell	1903

<Publishers>

ISBN	Publisher
978-0451450524	Ace Books
978-0547928210	Houghton Mifflin Harcourt
978-1402833317	Rosettabooks

Now, we will identify the join dependencies. There is one between the “Books” and “Authors” table based on the “Author” column. Another join dependency is between the “Books” and “Publishers” table based on the “ISBN” column.

The next step is to remove redundancy based on these join dependencies we have identified. So we will create the following tables:

<Books_Authors>

ISBN	Author
978-0451450524	William Gibson
978-0547928210	J.R.R. Tolkien
978-1402833317	George Orwell

<Books_Publishers>

ISBN	Publisher
978-0451450524	Ace Books
978-0547928210	Houghton Mifflin Harcourt
978-1402833317	Rosettabooks

To maintain the relationship between the divided smaller tables we will create a metadata table.

<Book_Metadata>

ISBN	Title	Author	Publisher
978-0451450524	“Neuromancer”	William Gibson	Ace Books
978-0547928210	“The Hobbit”	J.R.R. Tolkien	Houghton Mifflin Harcourt
978-1402833317	“1984”	George Orwell	Rosettabooks

With these various normal forms, you will get a database devoid of redundancy and efficient in maintaining data integrity. The concept of normalization is useful for those who are looking forward to working as a data scientist or similar jobs.

Conclusion

Normalization in DBMS is an essential process for removing redundancy and optimizing database storage. It removes anomalies to make the data more consistent and we get to increase the efficiency of database operations. The various normal forms in DBMS given above will help you optimize your data.

Normalization in DBMS with Examples

What is Normalization?

Purpose of Normalization

Types of Normalization

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce-Codd Normal Form (BCNF)

Fourth Normal Form (4NF)

Fifth Normal Form (5NF)

Conclusion

Related Post

Programming

Business & Management

Core Engineering

Data Science

Design

Artificial Intelligence

Creative Arts

Language

Career Development

Architecture

Placement Guarantee Courses

IITM Pravartak Certification

What is Normalization?

Purpose of Normalization

Types of Normalization

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce-Codd Normal Form (BCNF)

Fourth Normal Form (4NF)

Fifth Normal Form (5NF)

Conclusion

Related Post

What is SQLite?

SQL Roadmap for Beginners: A Comprehensive Guide to Mastering SQL

What is a database? Importance, Types, and Components

Top 45 Database Interview Questions and Answers