Overview
After creating our logical model through the process of Normalization, we have our set of well-structured relations that we need to actually put into a database. The Physical Design will be to choose a particular database organization - in our case the Relational Database Model, decide on database sizes and usage and begin the process for "loading" the data into the database.
Physical Design
As per the review in Chapter 1 - there are two major types of file organizations - flat files and database. Most of the data before the 1980s was stored into flat files. These data files still exist today and are used all the time. The advent of the database models has grown substantially since it was first introduced. The most popular type of database model is the Relational Database Model. Hierarchy databases are still used, but not to a large extent. The network model has all but disappeared, and the object-oriented model only exists in a few places.
Getting Ready to Input Data
After getting the logical model completed with the set of well-structured relations (tables), there are THREE more steps to complete.
First, you need to examine all of your names of the attributes included in your relations. You need to be sure you don't have one of the following conditions:
a. synonyms - where two or more attributes having different names, yet have the same meaning
b. homonyms - where the same name is used for attributes that have different meanings
If you find any synonyms or homonyms, you need to correct the attribute names. You can also create an alias if two departments want to call the same data by different names (synonym). An alias is an alternate name that is deliberately allowed in the database. **Be sure you have the correct data-type for each attribute!!
Second, you need to determine what the storage size will be for your database. The size of the database is calculated by looking at the size of each relation (table). The size of a table is calculated as follows:
Add up the number of total "characters" for all of the attributes for one row. For example, if Cust_ID has 6 characters, Cust_lastname has 15 characters, Cust_firstname has 12 characters, Cust_street has 20 characters, Cust_city has 15 characters, Cust_state has 2 characters and Cust_zip has 5 characters (where characters could be alph or numeric), then the total number of characters for the row would be 75 characters.
Then you need to think about the data volume - how many rows will this table POSSIBLY have (you need to consider growth potential). If we determine that we will have 2000 customers, then the CUSTOMER table would have a total storage volume (in characters) of 2000 X 75 = 150,000. You need to do this calculation for EACH of your relations (tables) to get the TOTAL data volume for the database. **remember to include growth potential for the foreseable future.
Third, you need to examine the usage analysis for each table to be stored in the database. The usage analysis is where we try to figure out how many "hits" this particular table will have. "Hits" are either a query about data that's in the table, an insert of new rows, changing existing attribute data values, or deleting rows. You will need to have a good idea about the type of "hits" that the table will have, and how often the different types will occur.
In order to get an idea of this usage analysis, look at each table for a time frame of one month. Determine if this will be a very active table with a lot of hits, or a more inactive table with few hits. Calculate your "best guess" with looking at the table (for instance CUSTOMER and PRODUCT tables will be quite active) and determine how many of each type of "hit" will occur in a month's time. Do this for each table in your set of relations.
Then, after getting the total for each table for the month, prioritize your tables by their data usage with the tables with the most "hits" at the top of your list. You need this information for the physical storage of your tables. Tables that are very active are normally placed on physical storage where they can be accessed as quickly as possible. Tables that aren't as active can be placed in locations that might take more time to access. This prioritized information is given to your IT technical database support personnel who will establish where particular tables will be physically stored.
Physical Storage
The actual physical storage of the database follows most of the same principles that we had for flat files. We will have indexing structures to help find the data. We will have compression techniques for very large sets of data (such as pictures). Data integrity will be maintained. Missing data will need to be checked.
Terms such as:
1. physical record - the amount of data that is written out to the physical device at one time (the data is normally groups of rows (records)).
2. page - the amount of data read or written by an operating system in one secondary memory (disk) input or output I/O operation. For I/O with a magnetic tape, the equivalent term is record block
3. blocking factor - the number of physical records per page
As part of physical storage, we want to optimize the data retrieval process. As mentioned previously, we usually store tables that will be very active (with a lot of "hits") in physical places that can be access very quickly. However, it might be that only part of a table (only some rows or some attributes (columns)) are really active, while other parts of the table are not.
We can actually split up the tables either by rows or by columns for optimization of retrieval.
1. horizontal partitioning is where we distribute the rows of a table into several separate physical spaces (actually separate files). We will need to establish indexes to find where these separate partitions are stored. To the user, however, it is still one table.
2. vertical partitioning is where we distribute the columns of a table into several separate physical records - again like the other partitioning.
RAID - Redundant Arrays of Inexpensive Disks
Another way of physically storing the data is by using lots of parallel disks that can be accessed at the same time - either writing new data or retrieving data. RAIDs are a set, or array, of physical disk drives that appear to the database user (and programs) as if they form one large logical storage unit