Skip to main content

Guide to Database Systems: The Relational Model

Guide to Database Systems
The Relational Model
    • Notifications
    • Privacy
  • Project HomeGuide to Database Systems
  • Projects
  • Learn more about Manifold

Notes

Show the following:

  • Annotations
  • Resources
Search within:

Adjust appearance:

  • font
    Font style
  • color scheme
  • Margins
table of contents
  1. Preamble
    1. Preamble
    2. How to Use This Guide
    3. Planned Schedule
    4. Exams Yearbooks
    5. Typesetting and Acknowledgements
    6. Resources
    7. Copyright
  2. Introduction
    1. Introduction
    2. Resources
    3. The Need for a Specialized Tool
    4. Database
    5. Database Management System (DMBS)
    6. Subtasks
    7. Life of a Project
    8. An Example
    9. Characteristics of the Database Approach
    10. Exercises
    11. Solutions to Exercises
    12. Problems
    13. Solutions to Problems
  3. The Relational Model
    1. The Relational Model
    2. Resources
    3. Concepts
    4. Domains, Attributes, Tuples and Relations
    5. Constraints
    6. Keys
    7. Foreign Keys
    8. Example
    9. Transactions and Operations
    10. Exercises
    11. Solutions to Exercises
    12. Problems
    13. Solutions to Selected Problems
  4. The SQL Programming Language
    1. The SQL Programming Language
    2. Resources
    3. Actors
    4. First Commands
    5. Useful Commands
    6. Overview of Constraints
    7. Foreign Keys
    8. A First Look at Conditions
    9. Three-Valued Logic
    10. Various Tools
    11. More Select Queries
    12. More Procedures
    13. More Triggers
    14. Setting Up Your Work Environment
    15. Exercises
    16. Solutions to Exercises
    17. Problems
    18. Solutions to Selected Problems
  5. Designing a Good Database
    1. Designing a Good Database
    2. Resources
    3. Interest for High-Level Design
    4. Interest for High-Level Design
    5. ER to Relational Models Mapping
    6. Guidelines and Normal Form
    7. Unified Modeling Diagrams
    8. Exercises
    9. Solutions to Exercises
    10. Problems
    11. Solutions to Selected Problems
  6. Database Applications
    1. Database Applications
    2. Resources
    3. Overview
    4. Java's Way
    5. Flash Intro to Java
    6. A First Program
    7. Mapping Data Types
    8. Differences Between executeQuery, executeUpdate, and execute
    9. A Second Program
    10. Exercises
    11. Solutions to Exercises
    12. Problems
    13. Solutions to Selected Problems
  7. A Bit About Security
    1. A Bit About Security
    2. Usual Aspects
    3. SQL Injections
    4. Exercises
    5. Solutions to Exercises
    6. Problems
    7. Solutions to Selected Problems
  8. Presentation of NoSQL
    1. Presentation of NoSQL
    2. Resources
    3. A Bit of History
    4. Comparison
    5. Categories of NoSQL Systems
    6. MongoDB
    7. Principles
    8. Exercises
    9. Solutions to Exercises
    10. Problems
    11. Solutions to Selected Problems
  9. References

The Relational Model

Resources

  • (Elmasri and Navathe 2010, ch. 3), (Elmasri and Navathe 2015, ch. 5), including the exercises (look at the exercises 15 and 16, for instance).
  • The wikipedia page for Relational model and the category “Relational database management systems”.

Concepts

relational model diagram

Terminology

The relational data model (or relational database schema) is:

  • a mathematical model (use mathematical relations, set-theory, first-order predicate logic)
  • with multiple implementations (“engineering approximation”)

Domains, Attributes, Tuples and Relations

Definitions

  • Domain (or type) = set of atomic (as far as the relation is concerned) values. You can compare it to datatype and literals, and indeed it can be given in the form of a data type, but it can be named and carry a logical definition (i.e., List_of_major as an enumerated data type, instead of just String), enforce some constraints (i.e., UNIQUE, to force all the values to be different), or even have a default value.
  • Attribute = Attribute name + attribute domain (but we’ll just write the name).
  • Relation Schema (or scheme) = description of a relation, often written “RELATION_NAME(Attribute1, …, Attributen)”, where n is the degre (arity) of the relation, and the domain of Attributei is written dom(Attributei).
  • Tuple t of the schema R(A1, …, An) is an ordered list of values <v1, …, vn> where vi is in dom(Ai) or a special NULL value.
  • Relation (or relation state) r of the schema R(A1, …, An), also written r(R), is the set of n-tuples t1, …, tm where each ti is a tuple of the schema R(A1, …, An).

Characteristics of Relations

  • In a relation, the order of tuples does not matter (a relation is a set). Order in tuple do matter (alternate representation where this is not true exist, cf. self-describing data).
  • Value is atomic = “flat relational model”, we will always be in the first normal form (not composite, not multi-valued).
  • NULL is N/A, unknown, unavailable (or withheld).
  • While a relation schema is to be read like an assertion (e.g., “Every student has a name, a SSN, …”) a tuple is a fact (e.g., “The student Bob Taylor has SSN 12898, …”).
  • Relations represents uniformly entities (STUDENT(…)) and relations (PREREQUISITE(Course_number, Prerequisite_number)).

Notation

  • STUDENT = relation schema + current relation state
  • STUDENT(Name, …, Major) = relation schema only
  • STUDENT.Name = Attribute Name in the relation STUDENT
  • t[Name], t[Name, Major], t.Name (overloading the previous notation) for the value of Name (and Major) in the tuple t.

Constraints

We now study constraints on the tuples. There are constraints on the scheme, for instance, “a relation cannot have two attributes with the same name”, but we studied those already. The goal of those constraints is to maintain the validity of the relations, and to enforce particular connexions between relations.

Inherent Model-Based Constraints (implicit)

Those are part of the definition of the relational model and are independent of the particular relation we are looking at.

  • You can not have two identical tuples in the same relation,
  • The arity of the tuple must match the arity of the relation.

Schema-Based Constraints (explicit)

Those constraints are parts of the schema.

  • The value must match its domain (“Domain constraint”), knowing that a domain can have additional constraints (NOT NULL, UNIQUE).
  • The entity integrity constraint: no primary key value can be NULL.
  • The referential integrity constraint: referred values must exist.

Those last two constraints will be studied in the next section.

Application-Based Constraints (semantics)

Constraints that cannot be expressed in the schema, and hence must be enforced by

  • the application program,
  • or the database itself, using triggers or assertions.

Examples: “the age of an employee must be greater than 16”, “this year’s salary increase must be more than last year’s”.

Keys

Since we can not have two identical tuples in the same relation, there must be a subset of values that distinguish them. We study the corresponding subset of attributes.

  • A superkey is the subset of attributes for which no two tuples have the same values. Formally, the set of attributes SK is a superkey for the relation R, if for all relation state r of R, all tuples t1, t2 in r are such that t1[SK] ≠ t2[SK].
  • A key is a minimal superkey (i.e., removing any attribute from SK would break the uniqueness property).
  • A candidate key is a key, a primary key is the selected candidate key (it is underlined).

Let us consider the following example:

ABCD
YellowSquare10(5, 3)
BlueRectangle10(3, 9)
BlueCircle9(4, 6)

and the following sets of attributes:

{A, B, C, D}{A}{B, C}{D}
Superkey?✔✘✔✔
Key?✘✘✔✔

Note that here we “retro-fit” those definitions, in database design, they come first (i.e., you define what attributes should always distinguish between tuples before populating your database). We are making the assumption that the data pre-exist to the specification to make the concept clearer.

Foreign Keys

A foreign key (FK) is a set of attributes whose values must match the value in a tuple in another, pre-defined relation. Formally, the set of attributes FK in the relation schema R1 is a foreign key of R1 (“referencing relation”) that references R2 (“referenced relation”) if

  • FK refers to R2 (i.e., the attributes in FK have the same domain(s) as the primary key PK of R2)
  • a value of FK in a tuple t1 of r1(R1) either
    • occurs as a value of PK for some tuple t2 of r2(R2), i.e., t1[FK] = t2[PK]
    • is NULL
    in the first case, we say that “t1 refers to t2”.

If there is a foreign key from R1 to R2, then we say that there is a referential integrity constraint from R1 to R2. We draw it with an arrow from the FK to the PK. Note that it is possible that R1 = R2.

Example

diagram
  • In CAR, VIN is the primary key, so it must satisfy the entity integrity constraint, and its value can not be NULL. Note also that all the values must be different, as the same value cannot occur twice as the primary key of tuples: we don’t want to enter the same VIN twice, that would mean we are registering a car that was already registered in our database!
  • In DRIVER, State and License-num are the primary key, so they must together satisfy the intergrity constrant: neither of them can be NULL. Furthermore, their pair must be different from all the other values. Stated differently, you can have <GA, 1234>, <GA, 0000> and <NC, 1234> as values for the <State, License-num> pair, even if they have one element in common, what is forbidden is to have both element in common (i.e., you cannot have <GA, 1234> twice). If both elements were common, that would mean that we are registering a driver that was already in the database.
    Yes, we do need the state and the license number to uniquely identify a driver’s license, since many states use the same license format.
  • Insurance has a primary key, and three foreign keys. The foreign keys must satisfy the referential integrity constraint: if the value stored in Insured-Car is not NULL (which it could be), then it has to be a value that occurs as the VIN value of some tuple in the CAR relation. For the Insured-Driver-State and Insured-Driver-License-Num, the situation is similar: they must either both be NULL, or be values that occurs paired together as the values for State and Licence-Num in a tuple in the CAR relationship. If e.g. Insured-Car was containing the VIN of a car not in the CAR relation, that would mean we are trying to insure a car that is “not known” from the database’s perspective, something we certainly want to avoid.
  • In Price, we have a primary key and a foreign key that obey similar requirements as before.

Transactions and Operations

The operations you can perform on your data are of two kinds: retrievals and updates.

  • Retrievals leave the relation state as it is and output a result relation. That is, retrieval: relation state → result relation
  • Updates change the relation state. That is, update: relation state → relation state

There are two constraints for updates:

  • The new relation state must be “valid” (i.e., comply with the state constraints).
  • There might be transition constraints (your balance cannot become negative, for instance).

A transaction is a series of retrievals and updates performed by an application program, that leaves the database in a consistent state.

In the following, we give examples of insertion, deletion and update that could be performed, as well as how they could lead a database to become inconsistent. The annotations (1.), (2.) and (3.) refer to the “remedies”, discussed afterward.

Insert

Insert  <109920, Honda, Accord, 2012> into CAR

How things can go wrong:

  • Inserting the values in the wrong order (meta)
  • NULL for any value of the attributes of the primary key (1.)
  • Duplicate value for all the values in the primary key (1.)
  • Wrong number of arguments (1.)
  • Fail to reference an existing value for a foreign key (1.)

Delete

Delete  the DRIVER tuple with State = GA and License_number = 123

How things can go wrong:

  • Deleting tuples inadvertently (meta)
  • Deleting tuples that are referenced (1., 2., 3.)

Update (a.k.a. Modify)

Update  Name of tuple in DRIVER where State = GA and License_number = 123 to Georges

How things can go wrong:

  • NULL for the any value of the attributes of the primary key (1.)
  • Duplicate value for the primary key (1.)
  • Change value that are referenced (1., 2., 3.)
  • Change foreign key to a non-existing value (1.)

Dealing with Violations

When the operation leads the database to become inconsistent, you can either:

  • Reject (restrict) the operation,
  • Cascade (propagate) the modification,
  • Set default, or set NULL, the corresponding value(s).

Exercises

Exercise 2.1

Match the items in Column A to the items in Column B:

Column AColumn B
RowAttribute
Column headerTuple
TableRelation

Exercise 2.2

What do we call the number of attributes in a relation?

Exercise 2.3

At the logical level, does the order of the tuples in a relation matter?

Exercise 2.4

What is the difference between a database schema and a database state?

Exercise 2.5

What should we put as a value in an attribute if its value is unknown?

Exercise 2.6

What, if any, is the difference between a superkey, a key, and a primary key?

Exercise 2.7

Name the two kinds of integrity that must be respected by the tuples in a relation.

Exercise 2.8

What is entity integrity? Why is it useful?

Exercise 2.9

Are we violating an integrity constraint if we try to set the value of an attribute that is part of a primary key to NULL? If yes, which one?

Exercise 2.10

If in a relation R1, an attribute A1 is a foreign key referencing an attribute A3 in a relation R2, what does this implies about A2?

Exercise 2.11

Give three examples of operations.

Exercise 2.12

What is the difference between an operation and a transaction?

Exercise 2.13

Consider the following two relations:

COMPUTER(Owner, RAM, Year, Brand)
OS(Name, Version, Architecture)

For each, give

  1. The arity of the relation,
  2. A (preferably plausible) example of tuple to insert.

Exercise 2.14

Give three different ways to deal with operations whose execution in isolation would result in the violation of one of the constraint.

Exercise 2.15

Define what is the domain constraint.

Exercise 2.16

Consider the following three relations:

relations example

For each relation, answer the following:

  1. What is, presumably, the primary key?
  2. Are they, presumably, any foreign key?
  3. Using the model you defined, could we determine which author won the greatest number of awards a particular year?

Exercise 2.17

Consider the following three relations

relations example
  1. What are the foreign keys in the ASSIGNED-TO relation? What are they refering?
  2. In the ASSIGNED-TO relation, explain why the Date attribute is part of the primary key. What would happen if it was not?
  3. Assuming the database is empty, are the following instructions valid? If not, what integrity constraint are they violating?
    1. Insert <'AM-356', 'Surfliner', 2012> into TRAIN
    2. Insert <NULL, 'Graham Palmer', 'Senior'> into CONDUCTOR
    3. Insert <'XB-124', 'GPalmer', '02/04/2018'> into ASSIGNED-TO
    4. Insert <'BTed', 'Bobby Ted', 'Senior'> and <'BTed', 'Bobby Ted Jr.', 'Junior'> into CONDUCTOR

Exercise 2.18

Consider the following relation schema and state:

ABCD
2BlueAustintrue
1YellowParistrue
1PurplePisafalse
2YellowAugustatrue

Assuming that this is all the data we will ever have, discuss whenever {A, B, C, D}, {A, B} and {B} are superkeys and/or keys.

Exercise 2.19

Consider the following relation and possible state. Assuming that this is all the data we will ever have, give two superkeys, and one key, for this relation.

ABCD
1AustintrueShelly
1ParistrueCheryl
3PisafalseSheila
1AugustatrueAsh
1PisatrueLinda

Exercise 2.20

Consider the following relation and possible state. Assuming that this is all the data we will ever have, give three superkeys for this relation, and, for each of them, indicate if they are a key as well.

ABCD
1AAustintrue
2BParistrue
1CPisafalse
2CAugustatrue
1BAugustatrue

Exercise 2.21

Consider the following two relations:

relations diagram
  1. Give two possible tuples for the BUILDING relation, and two possible tuples for the ROOM relation such that the state is consistent.
  2. Based on the data you gave previously, write (in pseudo-code) one INSERT and one UPDATE instruction. Both should violate the integrity of your database.

Exercise 2.22

Consider the following two relations:

  • A Movie relation, with attributes “Title” and “Year”. The “Title” attribute should be the primary key.
  • A Character relation, with attributes “Name”, “First_Appearance”. The “Name” attribute should be the primary key, and the “First_Appearance” attribute should be a foreign key referencing the Movie relation.
  1. Draw its relational model.
  2. Give an example of data that would violate the integrity of your database, and name the kind of integrity you are violating.

Solutions to Exercises

Solution 2.1

Row is Tuple, Column header is Attribute, Table is Relation.

Solution 2.2

The degree, or arity, of the relation.

Solution 2.3

No, it is a set.

Solution 2.4

The schema is the organization of the database (the meta-data), while the state is the state is the content of the database (the data).

Solution 2.5

NULL

Solution 2.6

A superkey is a subset of attributes such that no two tuples have the same combination of values for all those attributes. A key is a minimal superkey, i.e., a superkey from which we cannot remove any attribute without losing the uniqueness constraint. The primary key is one of the candidate key, i.e., the key that was chosen.

Solution 2.7

Referential integrity and entity integrity.

Solution 2.8

Entity integrity ensures that each row of a table has a unique and non-null primary key value. It allows to make sure that every tuple is different from the others, and helps to “pick” elements in the database.

Solution 2.9

Yes, the entity integrity constraint.

Solution 2.10

Then we know that A2 is the primary key of R2, and that A1 and A2 have the same domain.

Solution 2.11

Reading from the database, performing UPDATE or DELETE operations.

Solution 2.12

An operation is an “atomic action” that can be performed on the database (adding an element, updating a value, removing an element, etc.). A transaction is a series of such operations, and the assumption is that, even if it can be made of operations that, taken individually, could violate a constraint, the overall transaction will leave the database in a consistent state.

Solution 2.13

  1. The arities of the relations are: COMPUTER has for arity 4, and OS has for arity 3.
  2. Examples of tuple to insert are (“Linda McFather”, 32, 2017, “Purism”), and (“Debian”, “Stable”, “amd64”).

Solution 2.14

An operation whose execution in isolation would result in the violation of a constraint can either a) be “restricted” (i.e., not executed), b) result in a propagation (i.e., the tuples that would violate a constraint are updated or deleted accordingly), or c) result in some values in tuples that would violate a constraint to be set to a default value, or the NULL value (this last option works only if the constraint violated is the referential entity constraint).

Solution 2.15

The requirement that each tuple must have for an attribute A an atomic value from the domain dom(A), or NULL.

Solution 2.16

To answer 1 and 2, the diagram would become:

diagram

For the last question, the answer is yes: based on the ISSN of the book, we can retrieve the author of the book. Hence, knowing which book was awarded which year, by looking in the GAINED-AWARD table, gives us the answer to that question.

Solution 2.17

  1. In ASSIGNED-TO, TrainRef is a FK to TRAIN.Ref, and ConductorID is a FK to CONDUCTOR.CompanyID.
  2. In this model, a conductor can be assigned to different trains on different days. If Date was not part of the PK of ASSIGNED-TO, then a conductor could be assigned to only one train.
    1. Yes, this instruction is valid.
    2. No, it violates the entity integrity constraint: NULL can be given as a value to an attribute that is part of the PK.
    3. No, it violates the referential integrity constraint: 'XB-124' and 'GPalmer' are not values in TRAIN.Ref and CONDUCTOR.CompanyID.
    4. No, it violates the key constraint: two tuples cannot have the same value for the values of the primary key.

Solution 2.18

  • {A, B, C, D} is a superkey (the set of all the attributes is always a superkey), but not a superkey, as removing e.g. D would still make it a superkey.
  • {A, B} is a superkey and a key, as neither {A} nor {B} are keys.
  • {A} is not a key, and not a superkey: multiple tuples have the value 1.

Solution 2.19

Possible superkeys are {A, B, C, D}, {A, B, C}, {A, C, D}, {B, C, D}, {A, B}, {B, C}. The possible keys are {A, B}, {A, C}, and {B, C}.

Solution 2.20

For this relation, {A, B, C, D}, {A, B, C}, and {D} are superkey. Only the latter, {D}, is a key (for {A, B, C}, removing either A or C still gives a superkey).

Solution 2.21

  1. For the BUILDING relation: <“A.H”, “123 Main St.”>, <“U.H.”, “123 Main St.”>. For the ROOM relation: <12, “A.H.”>, <15, “A.H.”>.
  2. INSERT <"A.H.", NULL> would violate the requirement not to have two tuples with the same value for the attributes that constitute the primary key in the BUILDING relation. UPDATE ROOM with CODE = 12 to Building = "G.C.C." would create an entry referencing a name in the BUILDING relation that does not exist.
Solution 2.22
  1. The relations would be drawn as follows:
diagram
  • Inserting <“Ash”, “Evil Dead”> into the CHARACTER relation would cause an error if the database was empty, since no movie with the primary key “Evil Dead” has been introduced yet: this would be a referential integrity constraint violation. To violate the entity integrity constraint, it would suffice to insert the value <NULL, 2019> into the MOVIE relation.

Problems

Problem 2.1 (Find a candidate key for the CLASS relation)

Consider the relation:

CLASS(Course_Number, Univ_Section_Number, Instructor_Name, Semester, Building_Code, Room_Number, Time, Weekdays, Credit_Hours)
  • This relation represents classes taught in a university.
  • The goal is to be able to have multiple offerings (classes) of courses over several semesters.
  • List three possible candidate keys and describe under what conditions each candidate key would be valid.
  • Each candidate key should have between one and three attributes.

Here are some examples of values for the attributes:

AttributePossible Value
Course_NumberCSCI3410, CSCI1302
Building_CodeAH, UH, ECC
Univ_Section_Number1, 2, 3
Room_NumberE127, N118
Instructor_NameJohn Smith, Sophie Adams
Time1400, 1230, 0900
SemesterSpring 2015, Fall 2010, Summer 2012
WeekdaysM, MW, MWF, T, TH
Credit_Hours1, 2, 3, 4

Problem 2.2 (Design a relational model for a cinema company)

A cinema company wants you to design a relational model for the following set-up:

  • The company has movie stars. Each star has a name, birth date, and unique ID.
  • The company has the following information about movies: title, year, length, and genre. Each movie has a unique ID and features multiple stars.
  • The company owns movie theaters as well. Each theater has a name, address, and a unique ID.
  • Furthermore, each theater has a set of auditoriums. Each auditorium has a unique number, and seating capacity.
  • Each theater can schedule movies at show-times. Each show-time has a unique ID, a start time, is for a specific movie, and is in a specific theater auditorium.
  • The company sells tickets for scheduled show-times. Each ticket has a unique ticket ID and a price.

Problem 2.3 (Design a relational model for bills)

Propose a relational model for the following situation:

  • The database will be used to store all of the bills that are debated and voted on by the U.S. House of Representatives (HR). Each bill has a name, a unique sponsor who must be a member of the HR, and an optional date of when it was discussed.
  • It must record the name, political group, and beginning and expected end-of-term dates for each HR member.
  • It will also record the names of the main HR positions: Speaker, Majority Leader, Minority Leader, Majority Whip, and Minority Whip.
  • Finally, it will record the vote of every member of the HR for each bill.

Problem 2.4 (Relational model for universities)

Propose a relational model for the following situation:

  • You want to store information about multiple universities. A university has multiple departments, a name and a website.
  • Each department offers multiple courses. A course has a name, one (or multiple, when it is cross-listed) code, a number of credit hours.
  • A campus has a name, an address, and belong to one university.
  • A department has a contact address, a date of creation and a unique code.

Problem 2.5 (Relational model for an auction website)

We want to design a relational model for an auction website. Members (that can be buyers, sellers, both or neither) can participate in the sale of items.

  • Members are identified by a unique identifier and have an email address and a nickname.
  • Buyers have a unique identifier, a preferred method of payment and a shipping address.
  • Sellers have a unique identifier, a rating and a bank account number.
  • Items are offered by a seller for sale and are identified by a unique item number. Items also have a name and a starting bid price.
  • Members make bids for items that are for sale. Each bid has a unique identifier, a bidding price and a timestamp.

When creating your schema, do not add any new information, and try as much as possible to avoid relations that will create redundant data and NULL entries. Note that we should be able to uniquely determine the member account linked to the seller account, and similarly for buyers accounts. Furthermore, members can have at most one buyer and one seller account.

Solutions to Selected Problems

Solution to Problem 2.2 (Design a relational model for a cinema company)

A possible solution is:

diagram

Solution to Problem 2.3 (Design a relational model for bills)

Be careful: saying that a bill has a unique sponsor does not imply that a the sponsor is a good primary key for the bills: a house member could very well be the sponsor of multiple bills! It just implies that a single attribute is enough to hold the name of the sponsor.

diagram

For simplicity, we added an ID to our MEMBER and BILL relations. Note that having a “role” in the MEMBER relation to store the information about speaker, etc., would be extremely inefficient, since we would add an attribute to the ~435 members that would be NULL in ~430 of them.

Solution to Problem 2.4 (Relational model for universities)

A possible solution follows. The part that is the hardest to accomodate is the fact that a course can have multiple codes. We are reading here “cross-listed” as “a course that is offered under more than one departmental heading and can receive different codes (e.g., CSCI XXXX and AIST YYYY)”.

diagram

Annotate

Next Chapter
The SQL Programming Language
PreviousNext
This text is licensed under a CC BY 4.0 license.
Powered by Manifold Scholarship. Learn more at
Opens in new tab or windowmanifoldapp.org