Introduction

Resources

The Need for a Specialized Tool

There is a good chance that any programming language you can think of is Turing complete. Actually, even some of the extremely basic tools you may be using may be Turing complete. However, being complete does not mean being good at any task: it just means that any computable problem can be solved, but does not imply anything in terms of efficiency, comfort, or usability.

In theory, pretty much any programming language can be used to

But to obtain a system that is fast in reading and writing on the disk, convenient to search in the data, and that provides as many “built-in” tools as possible, one should use a specialized tool.

In those lecture notes, we will introduce one of this tool–the SQL programming language– and the theory underneath it–the relational model–. We will also observe that a careful design is a mandatory step before implementing a catalog, and that how good a catalog is can be assessed, and introduce the tools to do so. Finally, we will discuss how an application interacting with a database can be implemented and secured, and the alternatives to SQL offered by the NoSQL approach, as well as the limitations and highlights of both models.

Database

A database (DB) is a collection of related data. Data (= information, can be anything, really) + management (= logical organization of the data), through Database Management System.

A DBMS has multiple components, as follows:

DBMS diagram

Note that

Database Management System (DBMS)

A DBMS contains a general purpose software that is used to

  1. Define (= datatype, constraints, structures, etc.)
  2. Construct / Create the data (= store the data)
  3. Manipulate / Maintain (= change the structure, query the data, update it, etc.)
  4. Share / Control access (= among users, applications)

You can think of a tool to

  1. Specify a storage unit,
  2. Fill it,
  3. Allow to change its content, as well as its organization,
  4. Allow multiple persons to access all or parts of it at the same time.

Subtasks

Exactly like a program can have

a DBMS offers multiple (sub)tasks and can be interacted with different persons with different roles.

Role Task
Client Specify the business statement, the specifications
DB Administrator Install, configure, secure and maintain up-to-date the DBMS
Designer Lay out the global organization of the data
Programmer Implement the database, work on the programs that will interface with it
User Provide, search, and edit the data (usually)

In those lecture notes, the main focus will be on design and implementation, but we will have to do a little bit of everything, without forgetting which role we are currently playing.

Life of a Project

From the business statement to the usage, a project generally follows one of this path:

diagram of the life of a project

Note that reverse-engineering can sometimes happen, i.e., if you are given a poor implementation and want to extract a relational model from it, to normalize it.

An Example

Let us consider the following:

STUDENT

Name Student_number Class Major
Morgan 18 2 IT
Bob 17 1 CS

COURSE

Course_name Course_number Credit_hours Department
Intro. to CS 1301 4 CS
DB Systems 3401 3 CS
Principles of Scripting and Automation 2120 3 AIST

SECTION

Section_identifier Course_num Semster Year Instructor
2910 1301 Fall 2019 Kate
9230 2103 Spring 2020 Todd

GRADE_REPORT

Student_number Section_identifier Grade
17 2910 A
18 2910 B

PREREQUISITE

Course_number Prerequisite_number
2120 1301
1302 1301

You can describe the structure as a collection of relations, and a collection of columns:

RELATIONS

Relation Name Number of Columns
STUDENT 4
COURSE 4
SECTION 5
GRADE_REPORT 3
PREREQUISITE 2

COLUMNS

Column Name Datatype Belongs to relation
Name String STUDENT
Student_number Integer STUDENT
Class String STUDENT
Major String STUDENT
Course_name String STUDENT
Course_number Integer COURSE
Credit_hours Integer COURSE
Department String COURSE
Prerequisite_number Integer PREREQUISITE

Structure

Interactions

Organization

Why are the files separated like that? Why do not we store the section with the course with the students? For multiple reasons:

In separating the datae, we also need to remember to be careful about consistency and referential integrity, which is a topic we will discuss in detail.

How Is a Database Conceived?

There is a gradation, from really abstract specification that is easy to modify, to more solidified description of what needs to be coded. When we will be discussing high-level models, we will come back to those notions. The global idea is that it is easier to move things around early in the conception, and harder once everything is implemented.

Characteristics of the Database Approach

  1. A database is more than just data: it also contains a complete description of the structure and constraints. We generally have a catalog (a.k.a. the meta-data, the schema) and the data (we can also have self-describing data, where meta-data and data are interleaved, but note that both are still present).
  2. Data-abstraction: A DBMS provides a conceptual representation, and hides implementation details. This implies that changing the internals of the database should not require to change the application (the DBMS) or the way any of the client (program, or CLI) was interacting with the data.
  3. Support of multiple views of the data: view is a subset of the database, or virtual data.
  4. Sharing and multiuser transaction processing: concurrency control using transactions (= series of instructions that is supposed to execute a logically correct database access if executed in its entirety). Isolation, atomicity (all or nothing): cf. the ACID principles.

Exercises

Exercise 1.1

What is the difference between a database and the meta-data of the database?

Exercise 1.2

Is a pile of trash a database? Why, or why not?

Exercise 1.3

Define the word “miniworld”.

Exercise 1.4

Expand the acronym “DBMS”.

Exercise 1.5

Name two DBMS.

Exercise 1.6

Name the four different kinds of action that can be performed on data.

Exercise 1.7

Assign each of the following task to one of the “character” (administrator, client, etc.) we introduced:

Task Assigned to
Install a DBMS on a server.  
Sketch the schema so that the data will not be redundant.  
Write client-side application that uses the DBMS API.  
Establish the purpose of the database.  

Exercise 1.8

List some of the tasks assigned to the Database Administrator.

Exercise 1.9

Why do DBMS include concurrency control?

Exercise 1.10

Do I have to change my DBMS if I want to change the structure of my data?

Exercise 1.11

Exercise 1.12

Assume that I have a file where one record corresponds to one student. Should the information about the classes a student is taking (e.g. room, instructor, code, etc.) being stored in the same file? Why, or why not?

Exercise 1.13

Which one comes first, the physical design, the conceptual design, or the logical design?

Exercise 1.14

What is a virtual data? How can I access it?

Solution to Exercises

Solution 1.1

The data is the information we want to store, the meta-data is its organization, how we are going to store it. Meta-data is information about the data, but of no use on its own.

Solution 1.2

No, because it lacks a logical structure.

Solution 1.3

The mini-world is the part of the universe we want to represent in the database. It is supposed to be meaningful and will serve a purpose.

Solution 1.4

Database Management System

Solution 1.5

Oracle RDBMS, IBM DB2, Microsoft SQL Server, MySQL, PostgreSQL, Microsoft Access, etc., are valid answers. Are not valid “SQL”, “NoSQL”, “Relational Model”, or such: we are asking for the names of actual softwares!

Solution 1.6

The four actions are:

Solution 1.7

We can have something like:

Task Assigned to
Install a DBMS on a server. Administrator, IT service
Sketch the schema so that the data will not be redundant. Designer
Write client-side application that uses the DBMS API. Programmer, Developer
Establish the purpose of the database. Client, business owner

Solution 1.8

The database administrator is in charge of installing, configuring, securing and keeping up-to-date the database management system. They also control the accesses and the performance of the system, troubleshoot it, and create backup of the data.

Solution 1.9

DBMS have concurrency control to ensure that several users trying to update the same data will do so in a controlled manner. It is to avoid inconsistency to appear in the data.

Solution 1.10

Normally no, data and programs are independent. But actually, this is true only if the model does not change: shifting to a “less structured model”, e.g., one of the NoSQL models, can require to change the DBMS.

Solution 1.11

The application should not be sensible to the “internals” of the definition and organization of the data. It matters because having this independence means that changing the data will not require to change the programs.

Solution 1.12

If we were to store all the information about the classes in the student records, then we would have to store it as many time as its number of students! It is better to store it in a different file, and then to “link” the two files, to avoid redundancy.

Solution 1.13

The conceptual design.

Solution 1.14

It is a set of information that is derived from the database but not directly stored in it. It is accessed through queries. For instance, we can infer the age of a person if their date of birth is in the database, but strictly speaking the age is not an information stored in the database.

What is independence between program and data? Why does it matter?

Problems

Problem 1.1 (Define a database for CAMPUS)

Define a CAMPUS database organized into three files as follows:

Pb 1.1 – Question 1

A database catalog is made of two part: a table containing the relations’ name and their number of columns, and a table containing the columns’ name, their data type, and the relation to which they belong. Refer to the example we made previously or consult, e.g., (Elmasri and Navathe 2010, Figure 1.3) or (Elmasri and Navathe 2015, Figure 1.3). Write the database catalog corresponding to the CAMPUS database.

Pb 1.1 – Question 2

Invent data for such a database, with two buildings, three rooms and two professors.

Pb 1.1 – Question 3

Answer the following, assuming all the knowledge you have of the situation comes from the CAMPUS database, which is an up-to-date and accurate representation of its miniworld:

Solutions to Selected Problems

Solution to Problem 1.1 (Define a database for CAMPUS)

Pb 1.1 – Solution to Q. 1

The database catalog should be similar to the following:

RELATIONS

Relation name Number of columns
BUILDING 3
ROOM 3
PROF 4

COLUMNS

Column name Datatype Belongs to relation
Building_Name Character(30) Building
GPSLat Decimal(9,6) Building
GPSLon Decimal(9,6) Building
Building_Name Character(30) ROOM
Room_Number Integer(1) ROOM
Floor Integer (1) ROOM
Prof_Name Character (30) PROF
Phone Integer (10) PROF
Email Character (30) PROF
Room_Number Integer (1) PROF

Pb 1.1 – Solution to Q. 2

For the data, you could have:

Pb 1.1 – Solution to Q. 3

If everything we knew about the campus came from that database, then