Skip to main content

Normalization

  • Normalization is the process of organizing the data in the database.
  • Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate the undesirable characteristics like Insertion, Update and Deletion Anomalies.
  • Normalization divides the larger table into the smaller table and links them using relationship.
  • The normal form is used to reduce redundancy from the database table.

Why Data Normalization?

  1. Increased consistency. Information is stored in one place and one place only, reducing the possibility of inconsistent data.

  2. Easier object-to-data mapping. Highly-normalized data schemas in general are closer conceptually to object-oriented schemas because the object-oriented goals of promoting high cohesion and loose coupling between classes results in similar solutions (at least from a data point of view).

You typically want to have highly normalized operational data stores (ODSs) and data warehouses (DWs).

The primary disadvantage of normalization is slower reporting performance. You will want to have a denormalized schema to support reporting, particularly in data marts.

Anamolies of DB

Example: Suppose a manufacturing company stores the employee details in a table named employee that has four attributes: emp_id for storing employee's id, emp_name for storing employee's name, emp_address for storing employee's address and emp_dept for storing the department details in which the employee works. At some point of time the table looks like this

emp_idemp_nameemp_addressemp_dept
101RickDelhiD001
101RickDelhiD002
123MaggieAgraD890
166GlennChennaiD900
166GlennChennaiD004

The above table is not normalized. We will see the problems that we face when a table is not normalized.

Update anomaly

In the above table we have two rows for employee Rick as he belongs to two departments of the company. If we want to update the address of Rick then we have to update the same in two rows or the data will become inconsistent. If somehow, the correct address gets updated in one department but not in other then as per the database, Rick would be having two different addresses, which is not correct and would lead to inconsistent data

Insert anomaly

Suppose a new employee joins the company, who is under training and currently not assigned to any department then we would not be able to insert the data into the table if emp_dept field doesn't allow nulls

Delete anomaly

Suppose, if at a point of time the company closes the department D890 then deleting the rows that are having emp_dept as D890 would also delete the information of employee Maggie since she is assigned only to this department

Normalization

  • First normal form (1NF)
  • Second normal form (2NF)
  • Third normal form (3NF)
  • Boyce & Codd normal form (BCNF)
  • Fourth normal form (4NF)
  • Fifth normal form (5NF)

First Normal Form (1NF)

An attribute (column) of a table cannot hold multiple values. It should hold only atomic values.

Sample Employee table, it displays employees are working with multiple departments.

EmployeeAgeDepartment
Melvin32Marketing, Sales
Edward45Quality Assurance
Alex36Human Resource

Employee table following 1NF:

EmployeeAgeDepartment
Melvin32Marketing
Melvin32Sales
Edward45Quality Assurance
Alex36Human Resource

Second Normal Form (2NF)

  • Should be in 1NF
  • All non-key attributes are fully functional dependent on the primary key

Sample Products table:

productIDproductBrand
1MonitorApple
2MonitorSamsung
3ScannerHP
4Head phoneJBL

Product table following 2NF:

Products Category table:

productIDproduct
1Monitor
2Scanner
3Head phone

Brand table:

brandIDbrand
1Apple
2Samsung
3HP
4JBL

Products Brand table:

pbIDproductIDbrandID
111
212
323
434

image

Third Normal Form (3NF)

  • Should be already in 2NF
  • No transition dependency exists

no column entry should be dependent on any other entry (value) other than the key for the table.

If such an entity exists, move it outside into a new table.

Boyce-Codd Normal Form (BCNF)

3NF and all tables in the database should be only one primary key.

It is an advance version of 3NF that's why it is also referred as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and for every functional dependency X->Y, X should be the super key of the table.

Fourth Normal Form (4NF)

Tables cannot have multi-valued dependencies on a Primary Key.

Fifth Normal Form (5NF)

A composite key shouldn't have any cyclic dependencies.

https://www.w3schools.in/dbms/database-normalization

http://www.bkent.net/Doc/simple5.htm

Denormalization

The majority of modern applications need to be able to retrieve data in the shortest time possible. And that's when you can consider denormalizing a relational database. As the name suggests, denormalization is the opposite of normalization. When you normalize a database, you organize data to ensure integrity and eliminate redundancies. Database denormalization means you deliberately put the same data in several places, thus increasing redundancy.

The main purpose of denormalization is to significantly speed up data retrieval

Techniques

  • Storing derivable data
  • Using pre-joined tables
  • Using hardcoded values
  • Keeping details with the master
  • Repeating a single detail with its master
  • Adding short-circuit keys

https://rubygarage.org/blog/database-denormalization-with-examples