There is a good chance that any programming language you can think of is Turing complete. Actually, even some of the extremely basic tools you may be using may be Turing complete. However, being complete does not mean being good at any task: it just means that any computable problem can be solved, but does not imply anything in terms of efficiency, comfort, or usability.
In theory, pretty much any programming language can be used to
But to obtain a system that is fast in reading and writing on the disk, convenient to search in the data, and that provides as many “built-in” tools as possible, one should use a specialized tool.
In those lecture notes, we will introduce one of this tool–the SQL programming language– and the theory underneath it–the relational model–. We will also observe that a careful design is a mandatory step before implementing a catalog, and that how good a catalog is can be assessed, and introduce the tools to do so. Finally, we will discuss how an application interacting with a database can be implemented and secured, and the alternatives to SQL offered by the NoSQL approach, as well as the limitations and highlights of both models.
A database (DB) is a collection of related data. Data (= information, can be anything, really) + management (= logical organization of the data), through Database Management System.
A DBMS has multiple components, as follows:
Note that
The term “meta-data” has numerous definition (“data about the data”): we use it here to refer to the description of the organization of the data, and not e.g. to statistical data about the data.
A DBMS contains a general purpose software that is used to
You can think of a tool to
Exactly like a program can have
a DBMS offers multiple (sub)tasks and can be interacted with different persons with different roles.
Role | Task |
---|---|
Client | Specify the business statement, the specifications |
DB Administrator | Install, configure, secure and maintain up-to-date the DBMS |
Designer | Lay out the global organization of the data |
Programmer | Implement the database, work on the programs that will interface with it |
User | Provide, search, and edit the data (usually) |
In those lecture notes, the main focus will be on design and implementation, but we will have to do a little bit of everything, without forgetting which role we are currently playing.
From the business statement to the usage, a project generally follows one of this path:
Note that reverse-engineering can sometimes happen, i.e., if you are given a poor implementation and want to extract a relational model from it, to normalize it.
Let us consider the following:
STUDENT
Name | Student_number | Class | Major |
---|---|---|---|
Morgan | 18 | 2 | IT |
Bob | 17 | 1 | CS |
COURSE
Course_name | Course_number | Credit_hours | Department |
---|---|---|---|
Intro. to CS | 1301 | 4 | CS |
DB Systems | 3401 | 3 | CS |
Principles of Scripting and Automation | 2120 | 3 | AIST |
SECTION
Section_identifier | Course_num | Semster | Year | Instructor |
---|---|---|---|---|
2910 | 1301 | Fall | 2019 | Kate |
9230 | 2103 | Spring | 2020 | Todd |
GRADE_REPORT
Student_number | Section_identifier | Grade |
---|---|---|
17 | 2910 | A |
18 | 2910 | B |
PREREQUISITE
Course_number | Prerequisite_number |
---|---|
2120 | 1301 |
1302 | 1301 |
You can describe the structure as a collection of relations, and a collection of columns:
RELATIONS
Relation Name | Number of Columns |
---|---|
STUDENT | 4 |
COURSE | 4 |
SECTION | 5 |
GRADE_REPORT | 3 |
PREREQUISITE | 2 |
COLUMNS
Column Name | Datatype | Belongs to relation |
---|---|---|
Name | String | STUDENT |
Student_number | Integer | STUDENT |
Class | String | STUDENT |
Major | String | STUDENT |
Course_name | String | STUDENT |
Course_number | Integer | COURSE |
Credit_hours | Integer | COURSE |
Department | String | COURSE |
… | … | … |
Prerequisite_number | Integer | PREREQUISITE |
“What is the name of the course whose number is 1301?”,Note that this last query is a bit different, as it forces us to look up information in multiple relations.
“What courses is Kate teaching this semester?”,
“Does Bob meets the pre-requisite for 2910?”
Why are the files separated like that? Why do not we store the section with the course with the students? For multiple reasons:
In separating the datae, we also need to remember to be careful about consistency and referential integrity, which is a topic we will discuss in detail.
There is a gradation, from really abstract specification that is easy to modify, to more solidified description of what needs to be coded. When we will be discussing high-level models, we will come back to those notions. The global idea is that it is easier to move things around early in the conception, and harder once everything is implemented.
Exercise 1.1
What is the difference between a database and the meta-data of the database?
Exercise 1.2
Is a pile of trash a database? Why, or why not?
Exercise 1.3
Define the word “miniworld”.
Exercise 1.4
Expand the acronym “DBMS”.
Exercise 1.5
Name two DBMS.
Exercise 1.6
Name the four different kinds of action that can be performed on data.
Exercise 1.7
Assign each of the following task to one of the “character” (administrator, client, etc.) we introduced:
Task | Assigned to |
---|---|
Install a DBMS on a server. | |
Sketch the schema so that the data will not be redundant. | |
Write client-side application that uses the DBMS API. | |
Establish the purpose of the database. |
Exercise 1.8
List some of the tasks assigned to the Database Administrator.
Exercise 1.9
Why do DBMS include concurrency control?
Exercise 1.10
Do I have to change my DBMS if I want to change the structure of my data?
Exercise 1.11
Exercise 1.12
Assume that I have a file where one record corresponds to one student. Should the information about the classes a student is taking (e.g. room, instructor, code, etc.) being stored in the same file? Why, or why not?
Exercise 1.13
Which one comes first, the physical design, the conceptual design, or the logical design?
Exercise 1.14
What is a virtual data? How can I access it?
Solution 1.1
The data is the information we want to store, the meta-data is its organization, how we are going to store it. Meta-data is information about the data, but of no use on its own.
Solution 1.2
No, because it lacks a logical structure.
Solution 1.3
The mini-world is the part of the universe we want to represent in the database. It is supposed to be meaningful and will serve a purpose.
Solution 1.4
Database Management System
Solution 1.5
Oracle RDBMS, IBM DB2, Microsoft SQL Server, MySQL, PostgreSQL, Microsoft Access, etc., are valid answers. Are not valid “SQL”, “NoSQL”, “Relational Model”, or such: we are asking for the names of actual softwares!
Solution 1.6
The four actions are:
Solution 1.7
We can have something like:
Task | Assigned to |
---|---|
Install a DBMS on a server. | Administrator, IT service |
Sketch the schema so that the data will not be redundant. | Designer |
Write client-side application that uses the DBMS API. | Programmer, Developer |
Establish the purpose of the database. | Client, business owner |
Solution 1.8
The database administrator is in charge of installing, configuring, securing and keeping up-to-date the database management system. They also control the accesses and the performance of the system, troubleshoot it, and create backup of the data.
Solution 1.9
DBMS have concurrency control to ensure that several users trying to update the same data will do so in a controlled manner. It is to avoid inconsistency to appear in the data.
Solution 1.10
Normally no, data and programs are independent. But actually, this is true only if the model does not change: shifting to a “less structured model”, e.g., one of the NoSQL models, can require to change the DBMS.
Solution 1.11
The application should not be sensible to the “internals” of the definition and organization of the data. It matters because having this independence means that changing the data will not require to change the programs.
Solution 1.12
If we were to store all the information about the classes in the student records, then we would have to store it as many time as its number of students! It is better to store it in a different file, and then to “link” the two files, to avoid redundancy.
Solution 1.13
The conceptual design.
Solution 1.14
It is a set of information that is derived from the database but not directly stored in it. It is accessed through queries. For instance, we can infer the age of a person if their date of birth is in the database, but strictly speaking the age is not an information stored in the database.
What is independence between program and data? Why does it matter?
Problem 1.1 (Define a database for CAMPUS
)
Define a CAMPUS
database organized into three files as follows:
BUILDING
file storing the name and GPS coordinates of each building.ROOM
file storing the building, number and floor of each room.PROF
file storing the name, phone number, email and room number where the office is located for each professor.Pb 1.1 – Question 1
A database catalog is made of two part: a table containing the relations’ name and their number of columns, and a table containing the columns’ name, their data type, and the relation to which they belong. Refer to the example we made previously or consult, e.g., (Elmasri and Navathe 2010, Figure 1.3) or (Elmasri and Navathe 2015, Figure 1.3). Write the database catalog corresponding to the CAMPUS
database.
Pb 1.1 – Question 2
Invent data for such a database, with two buildings, three rooms and two professors.
Pb 1.1 – Question 3
Answer the following, assuming all the knowledge you have of the situation comes from the CAMPUS
database, which is an up-to-date and accurate representation of its miniworld:
Solution to Problem 1.1 (Define a database for CAMPUS
)
Pb 1.1 – Solution to Q. 1
The database catalog should be similar to the following:
RELATIONS
Relation name | Number of columns |
---|---|
BUILDING |
3 |
ROOM |
3 |
PROF |
4 |
COLUMNS
Column name | Datatype | Belongs to relation |
---|---|---|
Building_Name | Character(30) | Building |
GPSLat | Decimal(9,6) | Building |
GPSLon | Decimal(9,6) | Building |
Building_Name | Character(30) | ROOM |
Room_Number | Integer(1) | ROOM |
Floor | Integer (1) | ROOM |
Prof_Name | Character (30) | PROF |
Phone | Integer (10) | PROF |
Character (30) | PROF |
|
Room_Number | Integer (1) | PROF |
Pb 1.1 – Solution to Q. 2
For the data, you could have:
BUILDING
file, we could have:(Allgood Hall, 33.47520, -82.02503) (Institut Galilé, 48.959001, 2.339999)
ROOM
file, we could have:(Allgood Hall, 128, 1) (Institut Galilé, 205, 3) (Allgood Hall, 228, 2)
PROF
file, we could have:(Aubert, 839401, dae@ipn.net, 128) (Mazza, 938130, Dm@fai.net, 205)
Pb 1.1 – Solution to Q. 3
If everything we knew about the campus came from that database, then