We will be looking at the process to convert our E/R diagram (our entities and attributes) to a relational database design. We want to create our well-structured relations (tables). The process we will use to convert our E/R to tables is called Normalization. Normalization is the process to create a well-structured set of tables (relations) for our database from our Entities on our E/R diagram. If you remember, a set of well-structured relations means that we have no anomalies in our tables.
The normalization process actually has 6 steps - but we only do the first 3 in most of our database creations. The following are the steps in normalization:
1. Remove any multi-valued attributes
2. Remove any partial functional dependencies
3. Remove any transitive dependencies
As we correct the problems from #1 above (removing multi-valued attributes) - we have a table in 1st Normal Form (1NF)
As we correct the problems from #2 above (removing any partial functional dependencies) - we have a table in 2nd Normal form (2NF)
As we correct the problems from #3 above (removing any transitive dependencies) - we have a table in 3rd Normal Form (3NF)
Multi-valued attribute: this occurs when a single cell of a two-dimensional table contains more than one value: for example - if we have an attribute for EMPLOYEE called Emp_Skill - we could have more than one thing listed. if you remember - each cell must at atomic - contain only one value! To correct this problem in our table we do the following:
For any row of data that contains a cell (column value) that has more than one value - we repeat the row (having identical data in the columns EXCEPT for the different multi-valued cell values. So, if we have an EMPLOYEE instance of John Smith who has 3 different skills - we would add two additional rows to my EMPLOYEE table that has attributes of Emp_SocSec, Emp_LName, Emp_Fname, Emp_Street, Emp_City, Emp_State, Emp_Zip, Emp_Phone, Emp_Skill. The columns would have the same data for John Smith in all of the attributes EXCEPT for the attribute Emp_Skill which would have each individual skill for John Smith listed in a different row. When we do this our table is in 1st Normal Form (1NF). Many tables don't have multivalued attributes so you don't have to do anything for this step as the table is already in 1NF.
Partial Functional Dependencies: this occurs because of a general rule that we MUST have for all tables - each attribute in a table MUST be functionally dependent on the Primary Key attribute. In other words - the Primary Key determines what the VALUE of each other attribute has to be. For example, if in our PATIENT table we have our attribute Pat_SocSec as our primary key attribute - then the value of what's inside Pat_SocSec determines the values of each of the other attributes for that row. Patient instance 389567890 (social security number for patient X) - then the Pat_Lname HAS to be the last name of the person who has social security number 389567890. the primary key determines the values of all of the other attributes. This is part of relational algebra: A -> B (where A is the primary Key, and B stands for another attribute in the table. If we have a table with 5 attributes: A, B, C, D, E and A is our primary Key attribute, then A -> B, C, D, E (the value of A determines what the values of B, C, D, E MUST be).
Sometimes - we have a composite primary key (a primary key made up of more than one attribute) and therefore, it is possible that some of the attributes in the table are determined by all of the attributes that make up the primary key, or some attributes may be determined by on part (partial) of the primary key attributes. For example, if we have a table that has 7 attributes (A, B, C, D, E, F, G) and both attributes A and B together make up the primary key - then it's possible to have the following partial dependencies:
A,B -> C, D (A and B together determine the values of C and D)
B -> E (B by itself determines the value of E)
A -> F, G (A by itself determines the values of F, G)
This a the partial functional dependency problem. In order to have a table in 2nd normal form we need to do the following to fix the problem:
For every partial functional determinent (our primary key attributes that separately determine other attributes as in our example above), we create a NEW table that contains the primary key attribute and any attributes that it determines. So, where we started out with one table where we had a composite primary key, we will end up with multiple NEW tables.
For our example above our original table would have had the 7 attributes: A, B, C, D, E, F, G. When we create our new tables based on the partial functional dependences - we have three tables: one table A, B, C, D; another table B, E; and another table A, F, G. Notice that our 2nd table has primary key B (which will tie back to our table that has both A and B as primary keys - B is tied to B), and the 3rd table with attributes A, F, G with primary key A is also tied back to our table that has both A and B as primary keys - A is tied to A). It is okay to have part of the same primary keys in more than one table. So our tables are:
A -> B, C, D
B -> E
A -> F, G
We ONLY have a problem with partial functional dependencies IF we have a composite primary key - otherwise your table should be in 2nd Normal form (2NF).
Transitive Dependencies: transitive dependencies are a bit unusual in that it occurs when a non primary key attribute determines another non primary key attribute in your table. For example: if we have table with attributes A, B, C, D, E, F, G, H and the table is in 1NF and 2NF and A is our primary key, then
A -> B, C, D, E, F, G, H. However, as we examine the data that would go into this table we also see that attribute G also determines what the value of H MUST be. So, the table is in 1NF and 2NF, but we have a transitive dependency as well as G -> H.
In order to fix the transitive dependency problem we do the following:
We create a new table with the transitive dependency attributes. In our example above we would then have a table G, H where G is the primary key. But we don't want to lose the connectivity with our other attributes from our original table that would now have A, B, C, D, E, F as attributes. We MUST leave the transitive attribute - G in our original table so we have connectivity with the new table. By leaving the attribute G also in your original table - G becomes a Foreign Key - a primary key in one table and a non primary key in another table!!
Our two tables would be: A-> B, C, D, E, F, G
G -> H
It is possible that you don't have any transitive dependencies and thus your table is in 3NF
Last Step - Relating Tables:
the tables in a database need to be related to other tables in the database. This allows us to properly search our tables, insert properly into the tables, and delete rows from our tables. Tables are ONLY related (or tied together) through Keys - either where Primary Keys are the same, but more than likely through Primary Key to Foreign Key.
We go back and look at our original E/R diagram to see which tables (entities) were related to each other with our Cardinalities. We need to be sure that our tables in our database after normalization will also still be related (tied to each other).
We follow a process to use our cardinalities as follows:
1. If two entities have a 1 to 1 cardinality (either a mandatory or optional) on each side; take the primary key attribute from either table and put it as a Foreign key attribute in the other table
2. If two entities have a 1 to many (either a mandatory or optional) take the primary key from the "1" side and put it as a foreign key in the many
3. If two entities have a many to many relationship (mandatory or optional) you will need to create a NEW table taking the primary key from both tables and put putting them into the new table. They are the only attributes in the table and it is a composite primary key. Your two tables are tied together through this new table.
Foreign Keys are crucial in typing together tables!!
Addiing any additional attributes to your tables: You can add Date and Time attributes to any table if you believe you will need these. They might not have been in your original E/R diagram set of attributes, but you may add these to any table.
You have now created your database schema (design schema). A schema is a type of model. Our set of well structured relations (tables) is our design database schema.