A basic point about "files" is that there are two general types of data files - flat files and databases. The definitions that you need to know are:
1. field - an individual piece of information - such as last name, first name, city, zip, number of dependents, etc.
2. record - a group of related fields - such as a record for STUDENT (all of the fields pertaining to student information) or COURSE (all of the fields pertaining to courses)
3. data file - a group of related records - for example, all of the students at Kennesaw State University: there is an individual record for each enrolled student, and the whole group of those records is the data file
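The three definitions above can be sketched in code. This is only an illustration: the class name `StudentRecord`, the field names, and the sample students are made up for the example, and a real data file would live on disk rather than in a Python list.

```python
from dataclasses import dataclass, fields


@dataclass
class StudentRecord:
    """One record: a group of related fields about a single student."""
    student_id: int   # field
    last_name: str    # field
    first_name: str   # field
    city: str         # field
    dependents: int   # field


# A data file is a group of related records, modeled here as a list.
student_file = [
    StudentRecord(1001, "Nguyen", "Anh", "Kennesaw", 0),
    StudentRecord(1002, "Smith", "Jordan", "Marietta", 2),
]

# List the field names that every record in this file shares.
field_names = [f.name for f in fields(StudentRecord)]
print(field_names)
print(len(student_file), "records in the data file")
```

Each attribute is a field, each `StudentRecord` object is a record, and the list of records plays the role of the data file.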
We define the "type" of the flat file by two things:
a. How the records are physically stored in the file (data structure)
b. How we retrieve the records from the file
The types of flat files (data structures) are:
1. Sequential files - records are physically stored one right after another (think old cassette tape). To get to record number 6, you have to physically "read" through records 1 through 5 to get there (no jumping). We retrieve the records one at a time, in order (sequentially). These files can be stored on any physical device (tape, disk, drum).
2. Indexed files - records are usually stored in groups (called blocks) on disks or drums (not tape). Indexed files have two major parts - the index (which points to the beginning of a particular block of records) and the records themselves, stored in blocks. An Indexed Sequential file is one where the records within the blocks are stored in order of a "primary key" field (such as Social Security number, student record number, etc.). We retrieve a record by using the index to locate a particular block and then reading sequentially through that block until we find the record we want. This is MUCH faster than retrieving from a sequential file (think CDs).
3. Direct files - these files are also stored on disks or drums (not tape) and have an index and records stored in blocks. However, in this case, the index usually points directly to each individual record rather than to a block, so a record can be retrieved without reading through its neighbors. This structure is more complex in some ways than the Indexed Sequential file structure.
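The difference between the three retrieval methods can be sketched with in-memory lists standing in for storage. This is a simplified model, not how any particular file system implements these files: the sample records, the block size of 3, and the function names are all assumptions made for the illustration, and only record reads are counted (index lookups are not).

```python
import bisect

BLOCK_SIZE = 3  # records per block (illustrative value)

# Records kept in primary-key order: (student_id, name). Sample data is made up.
records = [(101, "Avery"), (104, "Blake"), (109, "Casey"),
           (112, "Devon"), (118, "Ellis"), (121, "Finley"),
           (130, "Gray"), (135, "Harper"), (142, "Indra")]


def sequential_find(key):
    """Sequential file: read from the first record until the key is found."""
    reads = 0
    for rec in records:
        reads += 1
        if rec[0] == key:
            return rec, reads
    return None, reads


# Indexed sequential: the index holds the first key of each block.
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]
block_index = [block[0][0] for block in blocks]


def indexed_find(key):
    """Use the index to pick one block, then scan only that block."""
    pos = bisect.bisect_right(block_index, key) - 1
    if pos < 0:
        return None, 0  # key is smaller than every key in the file
    reads = 0
    for rec in blocks[pos]:
        reads += 1
        if rec[0] == key:
            return rec, reads
    return None, reads


# Direct: one index entry per record, pointing straight at its position.
record_index = {rec[0]: slot for slot, rec in enumerate(records)}


def direct_find(key):
    """Look up the record's position in the index and read just that record."""
    slot = record_index.get(key)
    if slot is None:
        return None, 0
    return records[slot], 1


for find in (sequential_find, indexed_find, direct_find):
    rec, reads = find(135)
    print(f"{find.__name__}: {rec} after {reads} record read(s)")
```

With this sample data, finding key 135 takes 8 record reads sequentially, 2 with the block index, and 1 with the per-record index. The trade-off is that the direct structure needs the largest index, since it holds one entry per record.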
Each of these flat file types is still used by companies. See the material included in this module for additional information about these types of files.
One of the problems with flat files is that when a new program was created, developers usually created new flat files to support the program instead of reusing files that already existed. This resulted in massive amounts of redundant data (for instance, the Social Security number field might be found in 10-15 different data files). To combat this and other problems, new data structures had to be created - thus, Databases!!
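A small sketch shows why redundant data is a problem. The two "files" below are hypothetical (the names `registration_file` and `billing_file` and their contents are invented for the example); each one was built for a different program and keeps its own copy of the same student fields.

```python
# Two hypothetical flat files, each built for a different program,
# both storing their own copy of the student's name and city.
registration_file = [{"student_id": 1001, "last_name": "Nguyen", "city": "Kennesaw"}]
billing_file = [{"student_id": 1001, "last_name": "Nguyen", "city": "Kennesaw"}]

# The registration program records a move; the billing file is not touched.
registration_file[0]["city"] = "Atlanta"

# Fields that appear in both files are redundant copies.
redundant_fields = sorted(set(registration_file[0]) & set(billing_file[0]))
print(redundant_fields)

# The two copies of the same fact now disagree.
mismatch = registration_file[0]["city"] != billing_file[0]["city"]
print(mismatch)
```

Once the same field lives in several files, every program that changes it must update every copy, or the files drift out of agreement as they do here.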
Databases were created to eliminate many of the problems of traditional flat files. Just as there are different types of flat files (data structures - file organizations), there are also different types of databases.
"Hierarchical" Database | The file structure for this type of database was an indexed file. The most prominent hierarchical database, IMS, was created by IBM and is still in existence today. |
"Network" Database | (Nothing to do with networking.) The file structure for this type of database was a linked list. This type of database did not last very long, as the IBM hierarchical database structure took over. |
"Relational" Database | The file structure for this type of database is a two-dimensional table (rows and columns). This is the most common type of database structure. |
Object-Oriented Database | The file structure for this type of database is an object. |
NoSQL Database | A NoSQL database has a hierarchy similar to a file folder system, and the data within it is unstructured or non-relational. This lack of structure allows NoSQL databases to process larger amounts of data quickly and makes them easier to expand in the future. Cloud computing regularly makes use of NoSQL databases. |
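To make the relational row of the table concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and sample rows are assumptions chosen for the illustration. Each table is a two-dimensional structure of rows and columns, and the student's details are stored once and linked to other tables by key instead of being copied into every file.

```python
import sqlite3

# An in-memory database, so the example leaves no file behind.
conn = sqlite3.connect(":memory:")

# Each table is a two-dimensional structure: columns are fields, rows are records.
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, last_name TEXT, city TEXT)"
)
conn.execute(
    "CREATE TABLE enrollment ("
    "student_id INTEGER REFERENCES student(student_id), course TEXT)"
)

conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1001, "Nguyen", "Kennesaw"), (1002, "Smith", "Marietta")],
)
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?)",
    [(1001, "Databases"), (1001, "Networking"), (1002, "Databases")],
)

# The student's name is stored once and joined in by key when needed.
rows = conn.execute(
    "SELECT s.last_name, e.course "
    "FROM student s JOIN enrollment e ON e.student_id = s.student_id "
    "ORDER BY s.last_name, e.course"
).fetchall()
print(rows)

conn.close()
```

The `enrollment` table holds only the `student_id` key, not a second copy of the name or city, which is how a relational design avoids the redundancy described for flat files.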