Skip to main content

Guide to Database Systems: Introduction

Guide to Database Systems
Introduction
    • Notifications
    • Privacy
  • Project HomeGuide to Database Systems
  • Projects
  • Learn more about Manifold

Notes

Show the following:

  • Annotations
  • Resources
Search within:

Adjust appearance:

  • font
    Font style
  • color scheme
  • Margins
table of contents
  1. Preamble
    1. Preamble
    2. How to Use This Guide
    3. Planned Schedule
    4. Exams Yearbooks
    5. Typesetting and Acknowledgements
    6. Resources
    7. Copyright
  2. Introduction
    1. Introduction
    2. Resources
    3. The Need for a Specialized Tool
    4. Database
    5. Database Management System (DMBS)
    6. Subtasks
    7. Life of a Project
    8. An Example
    9. Characteristics of the Database Approach
    10. Exercises
    11. Solutions to Exercises
    12. Problems
    13. Solutions to Problems
  3. The Relational Model
    1. The Relational Model
    2. Resources
    3. Concepts
    4. Domains, Attributes, Tuples and Relations
    5. Constraints
    6. Keys
    7. Foreign Keys
    8. Example
    9. Transactions and Operations
    10. Exercises
    11. Solutions to Exercises
    12. Problems
    13. Solutions to Selected Problems
  4. The SQL Programming Language
    1. The SQL Programming Language
    2. Resources
    3. Actors
    4. First Commands
    5. Useful Commands
    6. Overview of Constraints
    7. Foreign Keys
    8. A First Look at Conditions
    9. Three-Valued Logic
    10. Various Tools
    11. More Select Queries
    12. More Procedures
    13. More Triggers
    14. Setting Up Your Work Environment
    15. Exercises
    16. Solutions to Exercises
    17. Problems
    18. Solutions to Selected Problems
  5. Designing a Good Database
    1. Designing a Good Database
    2. Resources
    3. Interest for High-Level Design
    4. Interest for High-Level Design
    5. ER to Relational Models Mapping
    6. Guidelines and Normal Form
    7. Unified Modeling Diagrams
    8. Exercises
    9. Solutions to Exercises
    10. Problems
    11. Solutions to Selected Problems
  6. Database Applications
    1. Database Applications
    2. Resources
    3. Overview
    4. Java's Way
    5. Flash Intro to Java
    6. A First Program
    7. Mapping Data Types
    8. Differences Between executeQuery, executeUpdate, and execute
    9. A Second Program
    10. Exercises
    11. Solutions to Exercises
    12. Problems
    13. Solutions to Selected Problems
  7. A Bit About Security
    1. A Bit About Security
    2. Usual Aspects
    3. SQL Injections
    4. Exercises
    5. Solutions to Exercises
    6. Problems
    7. Solutions to Selected Problems
  8. Presentation of NoSQL
    1. Presentation of NoSQL
    2. Resources
    3. A Bit of History
    4. Comparison
    5. Categories of NoSQL Systems
    6. MongoDB
    7. Principles
    8. Exercises
    9. Solutions to Exercises
    10. Problems
    11. Solutions to Selected Problems
  9. References

Introduction

Resources

  • (Elmasri and Navathe 2010, ch. 1.1-1.6)
  • (Elmasri and Navathe 2015, ch. 1.1–1.6)
  • (Watt and Eng 2014, ch. 2–3)

The Need for a Specialized Tool

There is a good chance that any programming language you can think of is Turing complete. Actually, even some of the extremely basic tools you may be using may be Turing complete. However, being complete does not mean being good at any task: it just means that any computable problem can be solved, but does not imply anything in terms of efficiency, comfort, or usability.

In theory, pretty much any programming language can be used to

  • Store, retrieve and update data,
  • Have accessible catalog describing the metadata,
  • Support transactions and concurrency,
  • Support authorization of access and update of data,
  • Enforce constraints.

But to obtain a system that is fast in reading and writing on the disk, convenient to search in the data, and that provides as many “built-in” tools as possible, one should use a specialized tool.

In those lecture notes, we will introduce one of this tool–the SQL programming language– and the theory underneath it–the relational model–. We will also observe that a careful design is a mandatory step before implementing a catalog, and that how good a catalog is can be assessed, and introduce the tools to do so. Finally, we will discuss how an application interacting with a database can be implemented and secured, and the alternatives to SQL offered by the NoSQL approach, as well as the limitations and highlights of both models.

Database

A database (DB) is a collection of related data. Data (= information, can be anything, really) + management (= logical organization of the data), through Database Management System.

  • Represent a mini-world, a “Universe of Disclosure” (UoD).
  • Logically coherent, with a meaning.
  • Populated for a purpose.

A DBMS has multiple components, as follows:

DBMS diagram

Note that

  • The program can be written in any language, be a web interface, etc. It is sometimes part of the software shipped with the DBMS, but not necessarily (you can, and we will, develop your own program to interact with the DBMS).
  • Most DBMS software includes a Command-Line Interface (CLI).
  • The catalog (or schema, meta-data) contains the description of how the data is stored, i.e., the datatypes, nature of the attributes, etc.
    The term “meta-data” has numerous definition (“data about the data”): we use it here to refer to the description of the organization of the data, and not e.g. to statistical data about the data.
  • Sometimes, catalog and data are closer than pictured (you can have “self-describing meta-data”, that is, they cannot be distinguished).

Database Management System (DBMS)

A DBMS contains a general purpose software that is used to

  1. Define (= datatype, constraints, structures, etc.)
  2. Construct / Create the data (= store the data)
  3. Manipulate / Maintain (= change the structure, query the data, update it, etc.)
  4. Share / Control access (= among users, applications)

You can think of a tool to

  1. Specify a storage unit,
  2. Fill it,
  3. Allow to change its content, as well as its organization,
  4. Allow multiple persons to access all or parts of it at the same time.

Subtasks

Exactly like a program can have

  • clients, that specify the requirements,
  • designers, that define the overall architecture of a program,
  • programmers, that implement the details of the program,
  • testers, that make sure the program is free of bugs, and
  • users, that actually use the program,

a DBMS offers multiple (sub)tasks and can be interacted with different persons with different roles.

RoleTask
ClientSpecify the business statement, the specifications
DB AdministratorInstall, configure, secure and maintain up-to-date the DBMS
DesignerLay out the global organization of the data
ProgrammerImplement the database, work on the programs that will interface with it
UserProvide, search, and edit the data (usually)

In those lecture notes, the main focus will be on design and implementation, but we will have to do a little bit of everything, without forgetting which role we are currently playing.

Life of a Project

From the business statement to the usage, a project generally follows one of this path:

diagram of the life of a project

Note that reverse-engineering can sometimes happen, i.e., if you are given a poor implementation and want to extract a relational model from it, to normalize it.

An Example

Let us consider the following:

STUDENT

NameStudent_numberClassMajor
Morgan182IT
Bob171CS

COURSE

Course_nameCourse_numberCredit_hoursDepartment
Intro. to CS13014CS
DB Systems34013CS
Principles of Scripting and Automation21203AIST

SECTION

Section_identifierCourse_numSemsterYearInstructor
29101301Fall2019Kate
92302103Spring2020Todd

GRADE_REPORT

Student_numberSection_identifierGrade
172910A
182910B

PREREQUISITE

Course_numberPrerequisite_number
21201301
13021301

You can describe the structure as a collection of relations, and a collection of columns:

RELATIONS

Relation NameNumber of Columns
STUDENT4
COURSE4
SECTION5
GRADE_REPORT3
PREREQUISITE2

COLUMNS

Column NameDatatypeBelongs to relation
NameStringSTUDENT
Student_numberIntegerSTUDENT
ClassStringSTUDENT
MajorStringSTUDENT
Course_nameStringSTUDENT
Course_numberIntegerCOURSE
Credit_hoursIntegerCOURSE
DepartmentStringCOURSE
………
Prerequisite_numberIntegerPREREQUISITE

Structure

  • Database structure and records, 5 files (=collection of records), each containing data records of the same type, stored in a persistent way.
  • Each record has a structure, different data elements, each has a data type.
  • Records have relationships between them (for instance, you expect the Course_number of PREREQUISITE to occur as a Course_number in COURSE).

Interactions

  • This organization will allow some interactions. For instance, we can obtain the answer to questions like
    “What is the name of the course whose number is 1301?”,
    “What courses is Kate teaching this semester?”,
    “Does Bob meets the pre-requisite for 2910?”
    Note that this last query is a bit different, as it forces us to look up information in multiple relations.

  • We should also be able to perform updates, removal, addition of records in an efficient way (using auxiliary files (indexes), optimization).
  • Finally, selection (for any operation) requires care: do we want all the records, some of them, exactly one?

Organization

Why are the files separated like that? Why do not we store the section with the course with the students? For multiple reasons:

  • To avoid redundancy (“data normalization”), or having it controlled,
  • To controle multiple levels of access (multiple user interface),
  • Without sacrificing the usability!

In separating the datae, we also need to remember to be careful about consistency and referential integrity, which is a topic we will discuss in detail.

How Is a Database Conceived?

  • Specification and analysis. “Each student number will be unique, but they can have the same name. We want to access the letter grade, but not the numerical grade”, etc. This gives the businnes statement.
  • Conceptual design
  • Logical design
  • Physical design

There is a gradation, from really abstract specification that is easy to modify, to more solidified description of what needs to be coded. When we will be discussing high-level models, we will come back to those notions. The global idea is that it is easier to move things around early in the conception, and harder once everything is implemented.

Characteristics of the Database Approach

  1. A database is more than just data: it also contains a complete description of the structure and constraints. We generally have a catalog (a.k.a. the meta-data, the schema) and the data (we can also have self-describing data, where meta-data and data are interleaved, but note that both are still present).
  2. Data-abstraction: A DBMS provides a conceptual representation, and hides implementation details. This implies that changing the internals of the database should not require to change the application (the DBMS) or the way any of the client (program, or CLI) was interacting with the data.
  3. Support of multiple views of the data: view is a subset of the database, or virtual data.
  4. Sharing and multiuser transaction processing: concurrency control using transactions (= series of instructions that is supposed to execute a logically correct database access if executed in its entirety). Isolation, atomicity (all or nothing): cf. the ACID principles.

Exercises

Exercise 1.1

What is the difference between a database and the meta-data of the database?

Exercise 1.2

Is a pile of trash a database? Why, or why not?

Exercise 1.3

Define the word “miniworld”.

Exercise 1.4

Expand the acronym “DBMS”.

Exercise 1.5

Name two DBMS.

Exercise 1.6

Name the four different kinds of action that can be performed on data.

Exercise 1.7

Assign each of the following task to one of the “character” (administrator, client, etc.) we introduced:

TaskAssigned to
Install a DBMS on a server.
Sketch the schema so that the data will not be redundant.
Write client-side application that uses the DBMS API.
Establish the purpose of the database.

Exercise 1.8

List some of the tasks assigned to the Database Administrator.

Exercise 1.9

Why do DBMS include concurrency control?

Exercise 1.10

Do I have to change my DBMS if I want to change the structure of my data?

Exercise 1.11

Exercise 1.12

Assume that I have a file where one record corresponds to one student. Should the information about the classes a student is taking (e.g. room, instructor, code, etc.) being stored in the same file? Why, or why not?

Exercise 1.13

Which one comes first, the physical design, the conceptual design, or the logical design?

Exercise 1.14

What is a virtual data? How can I access it?

Solution to Exercises

Solution 1.1

The data is the information we want to store, the meta-data is its organization, how we are going to store it. Meta-data is information about the data, but of no use on its own.

Solution 1.2

No, because it lacks a logical structure.

Solution 1.3

The mini-world is the part of the universe we want to represent in the database. It is supposed to be meaningful and will serve a purpose.

Solution 1.4

Database Management System

Solution 1.5

Oracle RDBMS, IBM DB2, Microsoft SQL Server, MySQL, PostgreSQL, Microsoft Access, etc., are valid answers. Are not valid “SQL”, “NoSQL”, “Relational Model”, or such: we are asking for the names of actual softwares!

Solution 1.6

The four actions are:

  • Add / Insert
  • Update / Modify
  • Search / Query
  • Delete / Remove

Solution 1.7

We can have something like:

TaskAssigned to
Install a DBMS on a server.Administrator, IT service
Sketch the schema so that the data will not be redundant.Designer
Write client-side application that uses the DBMS API.Programmer, Developer
Establish the purpose of the database.Client, business owner

Solution 1.8

The database administrator is in charge of installing, configuring, securing and keeping up-to-date the database management system. They also control the accesses and the performance of the system, troubleshoot it, and create backup of the data.

Solution 1.9

DBMS have concurrency control to ensure that several users trying to update the same data will do so in a controlled manner. It is to avoid inconsistency to appear in the data.

Solution 1.10

Normally no, data and programs are independent. But actually, this is true only if the model does not change: shifting to a “less structured model”, e.g., one of the NoSQL models, can require to change the DBMS.

Solution 1.11

The application should not be sensible to the “internals” of the definition and organization of the data. It matters because having this independence means that changing the data will not require to change the programs.

Solution 1.12

If we were to store all the information about the classes in the student records, then we would have to store it as many time as its number of students! It is better to store it in a different file, and then to “link” the two files, to avoid redundancy.

Solution 1.13

The conceptual design.

Solution 1.14

It is a set of information that is derived from the database but not directly stored in it. It is accessed through queries. For instance, we can infer the age of a person if their date of birth is in the database, but strictly speaking the age is not an information stored in the database.

What is independence between program and data? Why does it matter?

Problems

Problem 1.1 (Define a database for CAMPUS)

Define a CAMPUS database organized into three files as follows:

  • A BUILDING file storing the name and GPS coordinates of each building.
  • A ROOM file storing the building, number and floor of each room.
  • A PROF file storing the name, phone number, email and room number where the office is located for each professor.

Pb 1.1 – Question 1

A database catalog is made of two part: a table containing the relations’ name and their number of columns, and a table containing the columns’ name, their data type, and the relation to which they belong. Refer to the example we made previously or consult, e.g., (Elmasri and Navathe 2010, Figure 1.3) or (Elmasri and Navathe 2015, Figure 1.3). Write the database catalog corresponding to the CAMPUS database.

Pb 1.1 – Question 2

Invent data for such a database, with two buildings, three rooms and two professors.

Pb 1.1 – Question 3

Answer the following, assuming all the knowledge you have of the situation comes from the CAMPUS database, which is an up-to-date and accurate representation of its miniworld:

  • Is it possible to list all the professors?
  • Is it possible to tell in which department is a professor?
  • Is it possible to get the office hours of a professor?
  • Is it possible to list all the professors whose offices are in the same building?
  • Is it possible to list all the rooms?
  • If a new professor arrives, and has to share his office with another professor, do you have to revise your database catalog?
  • Can you list which professors are at the same floor?
  • Can you tell which professor has the highest evaluations?

Solutions to Selected Problems

Solution to Problem 1.1 (Define a database for CAMPUS)

Pb 1.1 – Solution to Q. 1

The database catalog should be similar to the following:

RELATIONS

Relation nameNumber of columns
BUILDING3
ROOM3
PROF4

COLUMNS

Column nameDatatypeBelongs to relation
Building_NameCharacter(30)Building
GPSLatDecimal(9,6)Building
GPSLonDecimal(9,6)Building
Building_NameCharacter(30)ROOM
Room_NumberInteger(1)ROOM
FloorInteger (1)ROOM
Prof_NameCharacter (30)PROF
PhoneInteger (10)PROF
EmailCharacter (30)PROF
Room_NumberInteger (1)PROF

Pb 1.1 – Solution to Q. 2

For the data, you could have:

  • For the BUILDING file, we could have:
    (Allgood Hall, 33.47520, -82.02503)
    (Institut Galilé,   48.959001, 2.339999)
  • For the ROOM file, we could have:
    (Allgood Hall, 128, 1)
    (Institut Galilé, 205, 3)
    (Allgood Hall, 228, 2)
  • For the PROF file, we could have:
    (Aubert, 839401, dae@ipn.net, 128)
    (Mazza, 938130, Dm@fai.net, 205)

Pb 1.1 – Solution to Q. 3

If everything we knew about the campus came from that database, then

  • Yes, we could list all the professors.
  • No, we could not tell in which department is a professor.
  • No, we could not get the office hours of a professor.
  • Yes, we could list all the professors whose offices are in the same building.
  • Yes, we could list all the rooms.
  • If a new professor arrives, and has to share his office with another professor, we would not have to revise our database catalog (it is fine for two professor to have the same room number, in our model).
  • Yes, we could list which professors are at the same floor.
  • No, we could not tell which professor has the highest evaluations.

Annotate

Next Chapter
The Relational Model
PreviousNext
This text is licensed under a CC BY 4.0 license.
Powered by Manifold Scholarship. Learn more at
Opens in new tab or windowmanifoldapp.org