Normalization is an important concept in database management systems (DBMS) that helps to ensure data consistency and minimize data redundancy. It is a process of organizing data in a database to reduce redundancy and dependency, and to improve data integrity and efficiency. Normalization is achieved by applying a set of rules, called normal forms, which define how data should be organized in a database.

 

There are different levels of normalization, ranging from first normal form (1NF) to fifth normal form (5NF). In this article, we will focus on the first three normal forms – 1NF, 2NF, and 3NF – as well as Boyce-Codd normal form (BCNF).

 

  1. First Normal Form (1NF)

 

The first normal form (1NF) is the most basic level of normalization. It requires that all data in a table be atomic, meaning that each attribute contains only a single value. This eliminates the problem of repeating groups or multivalued attributes, where a single attribute contains multiple values. To convert a table into 1NF, we need to ensure that each cell in the table contains only a single value.

 

For example, consider a table that stores information about employees, including their name, address, and phone number. This table is not in 1NF because the address and phone number attributes contain multiple values:

 

Employee ID

Name Address Phone Number

1

John Doe

123 Main St, Anytown, USA

555-1234

2

Jane Doe

456 Oak St, Anytown, USA

555-5678

3 Bob Smith 789 Maple St, Anytown, USA

555-9999

 

To convert this table into 1NF, we need to separate the address and phone number attributes into separate tables:

 

Employee ID

Name

1

John Doe

2

Jane Doe

3

Bob Smith

 

 

Employee ID

Address

1

123 Main St, Anytown, USA

2

456 Oak St, Anytown, USA

3

789 Maple St, Anytown, USA

 

 

Employee ID

Phone Number

1

555-1234

2

555-5678

3

555-9999

 

By separating the address and phone number attributes into separate tables, we have eliminated the problem of multivalued attributes and brought the table into 1NF.

 

  1. Second Normal Form (2NF)

 

The second normal form (2NF) builds on the first normal form by requiring that all non-key attributes be functionally dependent on the entire primary key. In other words, each non-key attribute must be dependent on the entire primary key, not just part of it.

 

To understand this better, let’s consider an example. Consider a table that stores information about customer orders, including the order number, customer ID, and product ID:

 

Order Number

Customer ID Product ID Product Name Quantity

1

101 001 Laptop 2

2

102 002 Smartphone

3

3 103 001 Laptop

1

4 104 003

 

 

The second normal form (2NF) is based on the concept of full functional dependency. A relation is said to be in 2NF if it is in 1NF and every non-key attribute is fully dependent on the primary key.

 

To explain this, let’s consider a new example of a table named ‘Orders’. The table has the following attributes:

 

Order_ID | Product | Customer | Price | Date | Quantity

 

Here, the primary key is ‘Order_ID’. The other attributes are ‘Product’, ‘Customer’, ‘Price’, ‘Date’ and ‘Quantity’. There are no partial dependencies, but there is a transitive dependency between ‘Order_ID’ and ‘Price’ through the attribute ‘Quantity’.

 

In the above table, ‘Order_ID’ determines ‘Customer’, ‘Date’ and ‘Product’, while ‘Product’ determines ‘Price’. However, ‘Quantity’ is dependent on both ‘Order_ID’ and ‘Product’. In other words, ‘Price’ is indirectly dependent on ‘Order_ID’.

 

To eliminate this transitive dependency, we can create a new table called ‘Order_Details’ with the attributes ‘Order_ID’, ‘Product’ and ‘Quantity’. This table will have a composite primary key consisting of ‘Order_ID’ and ‘Product’. We can also add the attribute ‘Price’ in this table, which will eliminate the transitive dependency.

 

  • Third Normal Form (3NF)

 

The third normal form (3NF) is based on the concept of transitive dependency. A relation is said to be in 3NF if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.

 

To explain this, let’s consider a new example of a table named ‘Students’. The table has the following attributes:

 

Roll_No | Name | Department | Course | Course_Instructor | Instructor_Email

 

Here, the primary key is ‘Roll_No’. The other attributes are ‘Name’, ‘Department’, ‘Course’, ‘Course_Instructor’ and ‘Instructor_Email’. There are no partial dependencies, but there is a transitive dependency between ‘Department’ and ‘Instructor_Email’ through the attribute ‘Course_Instructor’.

 

In the above table, ‘Roll_No’ determines ‘Name’ and ‘Department’, while ‘Course’ determines ‘Course_Instructor’. However, ‘Instructor_Email’ is dependent on ‘Course_Instructor’, which in turn is dependent on ‘Department’. In other words, ‘Instructor_Email’ is indirectly dependent on ‘Roll_No’.

 

To eliminate this transitive dependency, we can create a new table called ‘Instructors’ with the attributes ‘Course_Instructor’ and ‘Instructor_Email’. This table will have a primary key consisting of ‘Course_Instructor’. We can also add the attribute ‘Department’ in this table, which will eliminate the transitive dependency.

 

  • Boyce-Codd Normal Form (BCNF)

 

The Boyce-Codd normal form (BCNF) is a higher level of normalization and is based on the concept of functional dependency. A relation is said to be in BCNF if it is in 3NF and every determinant is a candidate key.

 

To explain this, let’s consider a new example of a table named ‘Employees’. The table has the following attributes:

 

Emp_ID | Name | Dept | Manager_ID | Manager_Name | Manager_Dept

 

Here, the primary key is ‘Emp_ID’. The other attributes are ‘Name’, ‘Dept’, ‘Manager_ID’, ‘Manager_Name’ and ‘Manager_Dept’. There are no partial dependencies, but there is a functional dependency between ‘Manager_ID’ and ‘Manager_Name’ and ‘Manager_Dept’.

 

In the above table, ‘Emp_ID’ determines ‘Name’ and ‘Dept’, while BCNF (Boyce-Codd Normal Form)

 

BCNF is a higher level of normalization that was introduced by Raymond Boyce and Edgar Codd. A relation is in BCNF if every determinant is a candidate key. If a relation is not in BCNF, it will have anomalies. A relation that is in BCNF is guaranteed to be free of all insertion, deletion, and update anomalies.

 

Example:

 

Consider the following relation:

 

R (A, B, C, D)

 

The functional dependencies are:

 

A → B

B → C

C → D

 

Here, the candidate keys are {A}, {B}, {C}.

 

Let’s check whether the relation is in BCNF or not.

For A → B, A is the determinant and {A, B} is the candidate key. Hence, this dependency satisfies the BCNF condition.

For B → C, B is the determinant and {B, C} is the candidate key. Hence, this dependency satisfies the BCNF condition.

For C → D, C is the determinant and {C, D} is the candidate key. Hence, this dependency satisfies the BCNF condition.

Therefore, the relation R is in BCNF.

 

Normalization is an essential concept in database design, as it helps to eliminate data redundancy, inconsistencies, and anomalies. The process of normalization involves breaking down a relation into smaller relations to ensure that each relation has a unique purpose and that there is no redundancy. The goal is to create a set of relations that are free from anomalies and that can be easily maintained and updated. The four normal forms – 1NF, 2NF, 3NF, and BCNF – help to achieve this goal by setting rules for the dependencies between attributes in a relation. By following these rules, we can create a well-organized and efficient database.

 

TalentEdge offers various online certification courses for working professionals from top institutes, some of the courses are:

 

 

Want to know how can this course help in your profile?