Relational databases Normal Forms explained

Dominik Szczepaniak
6 min readOct 8, 2022

The article has been originally posted on my Polish blog — devszczepaniak.pl.

This article covers the topic of normal forms in relational databases. The article includes practical examples of normalized tables for each normal form.

First, let’s explain what normalization is. Usually, normalization is understood as activities aimed at achieving the optimum level of order in a particular area under given circumstances. In the context of computer sciences and work with data normalization is focused on ordering datasets to process them easier and/or optimize their size. In our case normalization affects data stored in relational databases. Normalizing databases brings a bunch of benefits:

  • Decreases the risk of data inconsistency.
  • Simplifies operations performed on a database (adding, reading, updating, deleting).
  • Allows grouping data easier, for example by extracting some data to separate tables.
  • Decreases the eventual data size by removing duplicated data.

Normalizing databases also have some cons. Strictly obeying normalization rules may cause that data in particular tables will be tightly coupled with each other that should be represented via relationships. A simple SELECT query may be transferred into a query with multiple JOIN clauses and with intermediate tables. It may decrease the performance of the query as well as its readability and maintainability. Degraded performance equals longer execution time. Depending on application type and business requirements, data duplication or inconsistency may be a better trade-off than e.g. a significantly higher query execution time. For this reason, it is crucial to normalize databases consciously and when the benefits outweigh the costs.

Also, in the context of data normalization, it is worth recalling the words of Donald Knuth:

Premature optimization is the root of all evil.

Nevertheless, there are cases when normalizing data is necessary or at least highly recommended. For example, almost always it makes sense to split contact details into separate columns such as street, city, postal code, and so on.

Normal forms in relational databases

Normalizing relational databases involves changing the structure of tables in a database into such a form that they meet the rules described by normal forms. The most commonly used and known normal forms are:

  • The 1st Normal Form (1NF).
  • The 2nd Normal Form (2NF).
  • The 3rd Normal Form. (3NF).

In addition, in the literature, you can find a few less-known normal forms:

  • Boyce Codd’s Normal Form.
  • The 4th Normal Form (4NF).
  • The 5th Normal Form (5NF).

In this article, I’m going to focus on the first three normal forms, as they are common, and using them will bring you the most benefits.

The 1st Normal Form

To demonstrate the 1st Normal Form let’s first take a look at an example. The example table is a menu from an example restaurant:

The 1st Normal Form says that each value in the database should be atomic. In other words, the value should be indivisible. The presented example does not meet this rule. First of all, each row in the table shows a list of ingredients. Moreover, some of the ingredients are repeated in several rows. If you‘d like to modify the menu, for example, replace the cheese with Mozarella di Buffala, you will need to modify all the rows in the table. Searching for a table with such a structure for the ingredients also can be cumbersome.

In the presented case, the solution is to use a many-to-many relationship and divide the presented table into two separate tables. The first table should contain the menu, and the second table should be a table with ingredients. Thus, the ingredient becomes a separate entity in the database, it has a primary key that allows to identify of each ingredient. There’s also a need to add an additional, table that will keep relations between menu positions and ingredients. The presented example normalized to the 1st Normal Form looks as follows:

Other examples, where it is useful to use the 1st Normal Form are separating the first and last name fields into two separate ones or the aforementioned division of the address into street and city.

The 2nd Normal Form

The first mandatory condition to meet the 2nd Normal Form is satisfying the 1st Normal Form. The second condition is that all columns in a table must depend on the primary key. The example presented below presents a table that does not meet the 2nd Normal Form requirements:

The presented table contains orders from an online store. The 1st Normal Form is satisfied as each column is atomic. Nevertheless, the presented table contains far too much-unrelated data. First of all, the customer data i.e. first name, last name, address, and city should be moved to a separate table. For simplicity, let’s assume that there can be only one product per order. In this case, instead of the product name and its price, the table should contain a foreign key referring to a record in the table with products. The table with orders after normalization looks as follows:

All data not related to a primary key has been moved to other tables, while only foreign keys are stored in the order table.

The 3rd Normal Form

The first mandatory condition to satisfy the 3rd Normal Form is satisfying the 2nd Normal Form. The second obligatory condition to meet is that a non-key column must not depend on another non-key column. Below is an example of a table that will undergo normalization:

The presented table contains a list of contractors with values such as company name, address, city, phone, and dial code. The column is the one with the dialing code. The dialing code does not depend on the primary key, but on the city column, which is not a primary key. In this situation, the city and dial code should be extracted into a separate table, and in the presented table, add a foreign key pointing to the table containing the matching city and dial code.

Summary

I hope you learned something interesting through this article, and normalizing relational databases will no longer be a mystery to you. I encourage you to read the sources and additional materials and leave a like and comment.

Sources and extra links

--

--

Dominik Szczepaniak

Professionally Software Engineer at CKSource. Privately a blogger, a fan of Italian cuisine, and a fan of cycling and weight training.