Normalization

Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate the undesirable characteristics like Insertion, Update and Deletion Anomalies.
Normalization divides the larger table into the smaller table and links them using relationship.
The normal form is used to reduce redundancy from the database table.

Why Data Normalization?

Increased consistency. Information is stored in one place and one place only, reducing the possibility of inconsistent data.
Easier object-to-data mapping. Highly-normalized data schemas in general are closer conceptually to object-oriented schemas because the object-oriented goals of promoting high cohesion and loose coupling between classes results in similar solutions (at least from a data point of view).

You typically want to have highly normalized operational data stores (ODSs) and data warehouses (DWs).

The primary disadvantage of normalization is slower reporting performance. You will want to have a denormalized schema to support reporting, particularly in data marts.

Anamolies of DB

Example: Suppose a manufacturing company stores the employee details in a table named employee that has four attributes: emp_id for storing employee's id, emp_name for storing employee's name, emp_address for storing employee's address and emp_dept for storing the department details in which the employee works. At some point of time the table looks like this

emp_id	emp_name	emp_address	emp_dept
101	Rick	Delhi	D001
101	Rick	Delhi	D002
123	Maggie	Agra	D890
166	Glenn	Chennai	D900
166	Glenn	Chennai	D004

The above table is not normalized. We will see the problems that we face when a table is not normalized.

Update anomaly

In the above table we have two rows for employee Rick as he belongs to two departments of the company. If we want to update the address of Rick then we have to update the same in two rows or the data will become inconsistent. If somehow, the correct address gets updated in one department but not in other then as per the database, Rick would be having two different addresses, which is not correct and would lead to inconsistent data

Insert anomaly

Suppose a new employee joins the company, who is under training and currently not assigned to any department then we would not be able to insert the data into the table if emp_dept field doesn't allow nulls

Delete anomaly

Suppose, if at a point of time the company closes the department D890 then deleting the rows that are having emp_dept as D890 would also delete the information of employee Maggie since she is assigned only to this department

Normalization

First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Boyce & Codd normal form (BCNF)
Fourth normal form (4NF)
Fifth normal form (5NF)

First Normal Form (1NF)

An attribute (column) of a table cannot hold multiple values. It should hold only atomic values.

Sample Employee table, it displays employees are working with multiple departments.

Employee	Age	Department
Melvin	32	Marketing, Sales
Edward	45	Quality Assurance
Alex	36	Human Resource

Employee table following 1NF:

Employee	Age	Department
Melvin	32	Marketing
Melvin	32	Sales
Edward	45	Quality Assurance
Alex	36	Human Resource

Second Normal Form (2NF)

Should be in 1NF
All non-key attributes are fully functional dependent on the primary key

Sample Products table:

productID	product	Brand
1	Monitor	Apple
2	Monitor	Samsung
3	Scanner	HP
4	Head phone	JBL

Product table following 2NF:

Products Category table:

productID	product
1	Monitor
2	Scanner
3	Head phone

Brand table:

brandID	brand
1	Apple
2	Samsung
3	HP
4	JBL

Products Brand table:

pbID	productID	brandID
1	1	1
2	1	2
3	2	3
4	3	4

Third Normal Form (3NF)

Should be already in 2NF
No transition dependency exists

no column entry should be dependent on any other entry (value) other than the key for the table.

If such an entity exists, move it outside into a new table.

Boyce-Codd Normal Form (BCNF)

3NF and all tables in the database should be only one primary key.

It is an advance version of 3NF that's why it is also referred as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and for every functional dependency X->Y, X should be the super key of the table.

Fourth Normal Form (4NF)

Tables cannot have multi-valued dependencies on a Primary Key.

Fifth Normal Form (5NF)

A composite key shouldn't have any cyclic dependencies.

https://www.w3schools.in/dbms/database-normalization

http://www.bkent.net/Doc/simple5.htm

Denormalization

The majority of modern applications need to be able to retrieve data in the shortest time possible. And that's when you can consider denormalizing a relational database. As the name suggests, denormalization is the opposite of normalization. When you normalize a database, you organize data to ensure integrity and eliminate redundancies. Database denormalization means you deliberately put the same data in several places, thus increasing redundancy.

The main purpose of denormalization is to significantly speed up data retrieval

Techniques

Storing derivable data
Using pre-joined tables
Using hardcoded values
Keeping details with the master
Repeating a single detail with its master
Adding short-circuit keys

https://rubygarage.org/blog/database-denormalization-with-examples

Why Data Normalization?​

Anamolies of DB​

Update anomaly​

Insert anomaly​

Delete anomaly​

Normalization​

First Normal Form (1NF)​

Second Normal Form (2NF)​

Third Normal Form (3NF)​

Boyce-Codd Normal Form (BCNF)​

Fourth Normal Form (4NF)​

Fifth Normal Form (5NF)​

Denormalization​

Techniques​

Why Data Normalization?

Anamolies of DB

Update anomaly

Insert anomaly

Delete anomaly

Normalization

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce-Codd Normal Form (BCNF)

Fourth Normal Form (4NF)

Fifth Normal Form (5NF)

Denormalization

Techniques