Normalization is an important concept in database management systems (DBMS) that helps to ensure data consistency and minimize data redundancy. It is a process of organizing data in a database to reduce redundancy and dependency, and to improve data integrity and efficiency. Normalization is achieved by applying a set of rules, called normal forms, which define how data should be organized in a database.
There are different levels of normalization, ranging from first normal form (1NF) to fifth normal form (5NF). In this article, we will focus on the first three normal forms – 1NF, 2NF, and 3NF – as well as Boyce-Codd normal form (BCNF).
- First Normal Form (1NF)
The first normal form (1NF) is the most basic level of normalization. It requires that all data in a table be atomic, meaning that each attribute contains only a single value. This eliminates the problem of repeating groups or multivalued attributes, where a single attribute contains multiple values. To convert a table into 1NF, we need to ensure that each cell in the table contains only a single value.
For example, consider a table that stores information about employees, including their name, address, and phone number. This table is not in 1NF because the address and phone number attributes contain multiple values:
Employee ID | Name | Address | Phone Number |
1 | John Doe | 123 Main St, Anytown, USA | 555-1234 |
2 | Jane Doe | 456 Oak St, Anytown, USA | 555-5678 |
3 | Bob Smith | 789 Maple St, Anytown, USA | 555-9999 |
To convert this table into 1NF, we need to separate the address and phone number attributes into separate tables:
Employee ID | Name |
1 | John Doe |
2 | Jane Doe |
3 | Bob Smith |
Employee ID | Address |
1 | 123 Main St, Anytown, USA |
2 | 456 Oak St, Anytown, USA |
3 | 789 Maple St, Anytown, USA |
Employee ID | Phone Number |
1 | 555-1234 |
2 | 555-5678 |
3 | 555-9999 |
By separating the address and phone number attributes into separate tables, we have eliminated the problem of multivalued attributes and brought the table into 1NF.
- Second Normal Form (2NF)
The second normal form (2NF) builds on the first normal form by requiring that all non-key attributes be functionally dependent on the entire primary key. In other words, each non-key attribute must be dependent on the entire primary key, not just part of it.
To understand this better, let’s consider an example. Consider a table that stores information about customer orders, including the order number, customer ID, and product ID:
Order Number | Customer ID | Product ID | Product Name | Quantity |
1 | 101 | 001 | Laptop | 2 |
2 | 102 | 002 | Smartphone | 3 |
3 | 103 | 001 | Laptop | 1 |
4 | 104 | 003 |
The second normal form (2NF) is based on the concept of full functional dependency. A relation is said to be in 2NF if it is in 1NF and every non-key attribute is fully dependent on the primary key.
To explain this, let’s consider a new example of a table named ‘Orders’. The table has the following attributes:
Order_ID | Product | Customer | Price | Date | Quantity
Here, the primary key is ‘Order_ID’. The other attributes are ‘Product’, ‘Customer’, ‘Price’, ‘Date’ and ‘Quantity’. There are no partial dependencies, but there is a transitive dependency between ‘Order_ID’ and ‘Price’ through the attribute ‘Quantity’.
In the above table, ‘Order_ID’ determines ‘Customer’, ‘Date’ and ‘Product’, while ‘Product’ determines ‘Price’. However, ‘Quantity’ is dependent on both ‘Order_ID’ and ‘Product’. In other words, ‘Price’ is indirectly dependent on ‘Order_ID’.
To eliminate this transitive dependency, we can create a new table called ‘Order_Details’ with the attributes ‘Order_ID’, ‘Product’ and ‘Quantity’. This table will have a composite primary key consisting of ‘Order_ID’ and ‘Product’. We can also add the attribute ‘Price’ in this table, which will eliminate the transitive dependency.
- Third Normal Form (3NF)
The third normal form (3NF) is based on the concept of transitive dependency. A relation is said to be in 3NF if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.
To explain this, let’s consider a new example of a table named ‘Students’. The table has the following attributes:
Roll_No | Name | Department | Course | Course_Instructor | Instructor_Email
Here, the primary key is ‘Roll_No’. The other attributes are ‘Name’, ‘Department’, ‘Course’, ‘Course_Instructor’ and ‘Instructor_Email’. There are no partial dependencies, but there is a transitive dependency between ‘Department’ and ‘Instructor_Email’ through the attribute ‘Course_Instructor’.
In the above table, ‘Roll_No’ determines ‘Name’ and ‘Department’, while ‘Course’ determines ‘Course_Instructor’. However, ‘Instructor_Email’ is dependent on ‘Course_Instructor’, which in turn is dependent on ‘Department’. In other words, ‘Instructor_Email’ is indirectly dependent on ‘Roll_No’.
To eliminate this transitive dependency, we can create a new table called ‘Instructors’ with the attributes ‘Course_Instructor’ and ‘Instructor_Email’. This table will have a primary key consisting of ‘Course_Instructor’. We can also add the attribute ‘Department’ in this table, which will eliminate the transitive dependency.
- Boyce-Codd Normal Form (BCNF)
The Boyce-Codd normal form (BCNF) is a higher level of normalization and is based on the concept of functional dependency. A relation is said to be in BCNF if it is in 3NF and every determinant is a candidate key.
To explain this, let’s consider a new example of a table named ‘Employees’. The table has the following attributes:
Emp_ID | Name | Dept | Manager_ID | Manager_Name | Manager_Dept
Here, the primary key is ‘Emp_ID’. The other attributes are ‘Name’, ‘Dept’, ‘Manager_ID’, ‘Manager_Name’ and ‘Manager_Dept’. There are no partial dependencies, but there is a functional dependency between ‘Manager_ID’ and ‘Manager_Name’ and ‘Manager_Dept’.
In the above table, ‘Emp_ID’ determines ‘Name’ and ‘Dept’, while BCNF (Boyce-Codd Normal Form)
BCNF is a higher level of normalization that was introduced by Raymond Boyce and Edgar Codd. A relation is in BCNF if every determinant is a candidate key. If a relation is not in BCNF, it will have anomalies. A relation that is in BCNF is guaranteed to be free of all insertion, deletion, and update anomalies.
Example:
Consider the following relation:
R (A, B, C, D)
The functional dependencies are:
A → B
B → C
C → D
Here, the candidate keys are {A}, {B}, {C}.
Let’s check whether the relation is in BCNF or not.
For A → B, A is the determinant and {A, B} is the candidate key. Hence, this dependency satisfies the BCNF condition.
For B → C, B is the determinant and {B, C} is the candidate key. Hence, this dependency satisfies the BCNF condition.
For C → D, C is the determinant and {C, D} is the candidate key. Hence, this dependency satisfies the BCNF condition.
Therefore, the relation R is in BCNF.
Normalization is an essential concept in database design, as it helps to eliminate data redundancy, inconsistencies, and anomalies. The process of normalization involves breaking down a relation into smaller relations to ensure that each relation has a unique purpose and that there is no redundancy. The goal is to create a set of relations that are free from anomalies and that can be easily maintained and updated. The four normal forms – 1NF, 2NF, 3NF, and BCNF – help to achieve this goal by setting rules for the dependencies between attributes in a relation. By following these rules, we can create a well-organized and efficient database.
TalentEdge offers various online certification courses for working professionals from top institutes, some of the courses are:
- Digital Marketing Courses
- Finance Courses
- Data Science Courses
- Supply Chain Management Courses
- Human Resource Management Courses
- Risk Management Courses
- IT & Technology Courses
- Business Management Courses
- Brand Management Courses
- Project Management Courses
- General Management Courses
- Entrepreneurship Courses
- Healthcare Courses