The Relational Model
Resources
- (Elmasri and Navathe 2010, ch. 3), (Elmasri and Navathe 2015, ch. 5), including the exercises (look at the exercises 15 and 16, for instance).
- The wikipedia page for Relational model and the category “Relational database management systems”.
Concepts
Terminology
The relational data model (or relational database schema) is:
- a mathematical model (use mathematical relations, set-theory, first-order predicate logic)
- with multiple implementations (“engineering approximation”)
Domains, Attributes, Tuples and Relations
Definitions
- Domain (or type) = set of atomic (as far as the relation is concerned) values. You can compare it to datatype and literals, and indeed it can be given in the form of a data type, but it can be named and carry a logical definition (i.e.,
List_of_major
as an enumerated data type, instead of justString
), enforce some constraints (i.e.,UNIQUE
, to force all the values to be different), or even have a default value. - Attribute = Attribute name + attribute domain (but we’ll just write the name).
- Relation Schema (or scheme) = description of a relation, often written “RELATION_NAME(Attribute1, …, Attributen)”, where n is the degre (arity) of the relation, and the domain of Attributei is written dom(Attributei).
- Tuple t of the schema R(A1, …, An) is an ordered list of values <v1, …, vn> where vi is in dom(Ai) or a special
NULL
value. - Relation (or relation state) r of the schema R(A1, …, An), also written r(R), is the set of n-tuples t1, …, tm where each ti is a tuple of the schema R(A1, …, An).
Characteristics of Relations
- In a relation, the order of tuples does not matter (a relation is a set). Order in tuple do matter (alternate representation where this is not true exist, cf. self-describing data).
- Value is atomic = “flat relational model”, we will always be in the first normal form (not composite, not multi-valued).
NULL
is N/A, unknown, unavailable (or withheld).- While a relation schema is to be read like an assertion (e.g., “Every student has a name, a SSN, …”) a tuple is a fact (e.g., “The student Bob Taylor has SSN 12898, …”).
- Relations represents uniformly entities (STUDENT(…)) and relations (PREREQUISITE(Course_number, Prerequisite_number)).
Notation
- STUDENT = relation schema + current relation state
- STUDENT(Name, …, Major) = relation schema only
- STUDENT.Name = Attribute Name in the relation STUDENT
- t[Name], t[Name, Major], t.Name (overloading the previous notation) for the value of Name (and Major) in the tuple t.
Constraints
We now study constraints on the tuples. There are constraints on the scheme, for instance, “a relation cannot have two attributes with the same name”, but we studied those already. The goal of those constraints is to maintain the validity of the relations, and to enforce particular connexions between relations.
Inherent Model-Based Constraints (implicit)
Those are part of the definition of the relational model and are independent of the particular relation we are looking at.
- You can not have two identical tuples in the same relation,
- The arity of the tuple must match the arity of the relation.
Schema-Based Constraints (explicit)
Those constraints are parts of the schema.
- The value must match its domain (“Domain constraint”), knowing that a domain can have additional constraints (
NOT NULL
,UNIQUE
). - The entity integrity constraint: no primary key value can be
NULL
. - The referential integrity constraint: referred values must exist.
Those last two constraints will be studied in the next section.
Application-Based Constraints (semantics)
Constraints that cannot be expressed in the schema, and hence must be enforced by
- the application program,
- or the database itself, using triggers or assertions.
Examples: “the age of an employee must be greater than 16”, “this year’s salary increase must be more than last year’s”.
Keys
Since we can not have two identical tuples in the same relation, there must be a subset of values that distinguish them. We study the corresponding subset of attributes.
- A superkey is the subset of attributes for which no two tuples have the same values. Formally, the set of attributes SK is a superkey for the relation R, if for all relation state r of R, all tuples t1, t2 in r are such that t1[SK] ≠ t2[SK].
- A key is a minimal superkey (i.e., removing any attribute from SK would break the uniqueness property).
- A candidate key is a key, a primary key is the selected candidate key (it is underlined).
Let us consider the following example:
A | B | C | D |
---|---|---|---|
Yellow | Square | 10 | (5, 3) |
Blue | Rectangle | 10 | (3, 9) |
Blue | Circle | 9 | (4, 6) |
and the following sets of attributes:
{A, B, C, D} | {A} | {B, C} | {D} | |
---|---|---|---|---|
Superkey? | ✔ | ✘ | ✔ | ✔ |
Key? | ✘ | ✘ | ✔ | ✔ |
Note that here we “retro-fit” those definitions, in database design, they come first (i.e., you define what attributes should always distinguish between tuples before populating your database). We are making the assumption that the data pre-exist to the specification to make the concept clearer.
Foreign Keys
A foreign key (FK) is a set of attributes whose values must match the value in a tuple in another, pre-defined relation. Formally, the set of attributes FK in the relation schema R1 is a foreign key of R1 (“referencing relation”) that references R2 (“referenced relation”) if
- FK refers to R2 (i.e., the attributes in FK have the same domain(s) as the primary key PK of R2)
- a value of FK in a tuple t1 of r1(R1) either
- occurs as a value of PK for some tuple t2 of r2(R2), i.e., t1[FK] = t2[PK]
- is
NULL
If there is a foreign key from R1 to R2, then we say that there is a referential integrity constraint from R1 to R2. We draw it with an arrow from the FK to the PK. Note that it is possible that R1 = R2.
Example
- In CAR, VIN is the primary key, so it must satisfy the entity integrity constraint, and its value can not be
NULL
. Note also that all the values must be different, as the same value cannot occur twice as the primary key of tuples: we don’t want to enter the same VIN twice, that would mean we are registering a car that was already registered in our database! - In DRIVER, State and License-num are the primary key, so they must together satisfy the intergrity constrant: neither of them can be
NULL
. Furthermore, their pair must be different from all the other values. Stated differently, you can have<GA, 1234>
,<GA, 0000>
and<NC, 1234>
as values for the <State, License-num> pair, even if they have one element in common, what is forbidden is to have both element in common (i.e., you cannot have<GA, 1234>
twice). If both elements were common, that would mean that we are registering a driver that was already in the database.Yes, we do need the state and the license number to uniquely identify a driver’s license, since many states use the same license format.
- Insurance has a primary key, and three foreign keys. The foreign keys must satisfy the referential integrity constraint: if the value stored in Insured-Car is not
NULL
(which it could be), then it has to be a value that occurs as the VIN value of some tuple in the CAR relation. For the Insured-Driver-State and Insured-Driver-License-Num, the situation is similar: they must either both beNULL
, or be values that occurs paired together as the values for State and Licence-Num in a tuple in the CAR relationship. If e.g. Insured-Car was containing the VIN of a car not in the CAR relation, that would mean we are trying to insure a car that is “not known” from the database’s perspective, something we certainly want to avoid. - In Price, we have a primary key and a foreign key that obey similar requirements as before.
Transactions and Operations
The operations you can perform on your data are of two kinds: retrievals and updates.
- Retrievals leave the relation state as it is and output a result relation. That is, retrieval: relation state → result relation
- Updates change the relation state. That is, update: relation state → relation state
There are two constraints for updates:
- The new relation state must be “valid” (i.e., comply with the state constraints).
- There might be transition constraints (your balance cannot become negative, for instance).
A transaction is a series of retrievals and updates performed by an application program, that leaves the database in a consistent state.
In the following, we give examples of insertion, deletion and update that could be performed, as well as how they could lead a database to become inconsistent. The annotations (1.), (2.) and (3.) refer to the “remedies”, discussed afterward.
Insert
Insert <109920, Honda, Accord, 2012> into CAR
How things can go wrong:
- Inserting the values in the wrong order (meta)
NULL
for any value of the attributes of the primary key (1.)- Duplicate value for all the values in the primary key (1.)
- Wrong number of arguments (1.)
- Fail to reference an existing value for a foreign key (1.)
Delete
Delete the DRIVER tuple with State = GA and License_number = 123
How things can go wrong:
- Deleting tuples inadvertently (meta)
- Deleting tuples that are referenced (1., 2., 3.)
Update (a.k.a. Modify)
Update Name of tuple in DRIVER where State = GA and License_number = 123 to Georges
How things can go wrong:
NULL
for the any value of the attributes of the primary key (1.)- Duplicate value for the primary key (1.)
- Change value that are referenced (1., 2., 3.)
- Change foreign key to a non-existing value (1.)
Dealing with Violations
When the operation leads the database to become inconsistent, you can either:
- Reject (restrict) the operation,
- Cascade (propagate) the modification,
- Set default, or set
NULL
, the corresponding value(s).
Exercises
Exercise 2.1
Match the items in Column A to the items in Column B:
Column A | Column B |
---|---|
Row | Attribute |
Column header | Tuple |
Table | Relation |
Exercise 2.2
What do we call the number of attributes in a relation?
Exercise 2.3
At the logical level, does the order of the tuples in a relation matter?
Exercise 2.4
What is the difference between a database schema and a database state?
Exercise 2.5
What should we put as a value in an attribute if its value is unknown?
Exercise 2.6
What, if any, is the difference between a superkey, a key, and a primary key?
Exercise 2.7
Name the two kinds of integrity that must be respected by the tuples in a relation.
Exercise 2.8
What is entity integrity? Why is it useful?
Exercise 2.9
Are we violating an integrity constraint if we try to set the value of an attribute that is part of a primary key to NULL
? If yes, which one?
Exercise 2.10
If in a relation R1, an attribute A1 is a foreign key referencing an attribute A3 in a relation R2, what does this implies about A2?
Exercise 2.11
Give three examples of operations.
Exercise 2.12
What is the difference between an operation and a transaction?
Exercise 2.13
Consider the following two relations:
COMPUTER(Owner, RAM, Year, Brand)
OS(Name, Version, Architecture)
For each, give
- The arity of the relation,
- A (preferably plausible) example of tuple to insert.
Exercise 2.14
Give three different ways to deal with operations whose execution in isolation would result in the violation of one of the constraint.
Exercise 2.15
Define what is the domain constraint.
Exercise 2.16
Consider the following three relations:
For each relation, answer the following:
- What is, presumably, the primary key?
- Are they, presumably, any foreign key?
- Using the model you defined, could we determine which author won the greatest number of awards a particular year?
Exercise 2.17
Consider the following three relations
- What are the foreign keys in the ASSIGNED-TO relation? What are they refering?
- In the ASSIGNED-TO relation, explain why the Date attribute is part of the primary key. What would happen if it was not?
- Assuming the database is empty, are the following instructions valid? If not, what integrity constraint are they violating?
Insert <'AM-356', 'Surfliner', 2012> into TRAIN
Insert <NULL, 'Graham Palmer', 'Senior'> into CONDUCTOR
Insert <'XB-124', 'GPalmer', '02/04/2018'> into ASSIGNED-TO
Insert <'BTed', 'Bobby Ted', 'Senior'> and <'BTed', 'Bobby Ted Jr.', 'Junior'> into CONDUCTOR
Exercise 2.18
Consider the following relation schema and state:
A | B | C | D |
---|---|---|---|
2 | Blue | Austin | true |
1 | Yellow | Paris | true |
1 | Purple | Pisa | false |
2 | Yellow | Augusta | true |
Assuming that this is all the data we will ever have, discuss whenever {A, B, C, D}, {A, B} and {B} are superkeys and/or keys.
Exercise 2.19
Consider the following relation and possible state. Assuming that this is all the data we will ever have, give two superkeys, and one key, for this relation.
A | B | C | D |
---|---|---|---|
1 | Austin | true | Shelly |
1 | Paris | true | Cheryl |
3 | Pisa | false | Sheila |
1 | Augusta | true | Ash |
1 | Pisa | true | Linda |
Exercise 2.20
Consider the following relation and possible state. Assuming that this is all the data we will ever have, give three superkeys for this relation, and, for each of them, indicate if they are a key as well.
A | B | C | D |
---|---|---|---|
1 | A | Austin | true |
2 | B | Paris | true |
1 | C | Pisa | false |
2 | C | Augusta | true |
1 | B | Augusta | true |
Exercise 2.21
Consider the following two relations:
- Give two possible tuples for the BUILDING relation, and two possible tuples for the ROOM relation such that the state is consistent.
- Based on the data you gave previously, write (in pseudo-code) one
INSERT
and oneUPDATE
instruction. Both should violate the integrity of your database.
Exercise 2.22
Consider the following two relations:
- A Movie relation, with attributes “Title” and “Year”. The “Title” attribute should be the primary key.
- A Character relation, with attributes “Name”, “First_Appearance”. The “Name” attribute should be the primary key, and the “First_Appearance” attribute should be a foreign key referencing the Movie relation.
- Draw its relational model.
- Give an example of data that would violate the integrity of your database, and name the kind of integrity you are violating.
Solutions to Exercises
Solution 2.1
Row is Tuple, Column header is Attribute, Table is Relation.
Solution 2.2
The degree, or arity, of the relation.
Solution 2.3
No, it is a set.
Solution 2.4
The schema is the organization of the database (the meta-data), while the state is the state is the content of the database (the data).
Solution 2.5
NULL
Solution 2.6
A superkey is a subset of attributes such that no two tuples have the same combination of values for all those attributes. A key is a minimal superkey, i.e., a superkey from which we cannot remove any attribute without losing the uniqueness constraint. The primary key is one of the candidate key, i.e., the key that was chosen.
Solution 2.7
Referential integrity and entity integrity.
Solution 2.8
Entity integrity ensures that each row of a table has a unique and non-null primary key value. It allows to make sure that every tuple is different from the others, and helps to “pick” elements in the database.
Solution 2.9
Yes, the entity integrity constraint.
Solution 2.10
Then we know that A2 is the primary key of R2, and that A1 and A2 have the same domain.
Solution 2.11
Reading from the database, performing UPDATE
or DELETE
operations.
Solution 2.12
An operation is an “atomic action” that can be performed on the database (adding an element, updating a value, removing an element, etc.). A transaction is a series of such operations, and the assumption is that, even if it can be made of operations that, taken individually, could violate a constraint, the overall transaction will leave the database in a consistent state.
Solution 2.13
- The arities of the relations are: COMPUTER has for arity 4, and OS has for arity 3.
- Examples of tuple to insert are (“Linda McFather”, 32, 2017, “Purism”), and (“Debian”, “Stable”, “amd64”).
Solution 2.14
An operation whose execution in isolation would result in the violation of a constraint can either a) be “restricted” (i.e., not executed), b) result in a propagation (i.e., the tuples that would violate a constraint are updated or deleted accordingly), or c) result in some values in tuples that would violate a constraint to be set to a default value, or the NULL
value (this last option works only if the constraint violated is the referential entity constraint).
Solution 2.15
The requirement that each tuple must have for an attribute A an atomic value from the domain dom(A), or NULL
.
Solution 2.16
To answer 1 and 2, the diagram would become:
For the last question, the answer is yes: based on the ISSN of the book, we can retrieve the author of the book. Hence, knowing which book was awarded which year, by looking in the GAINED-AWARD table, gives us the answer to that question.
Solution 2.17
- In ASSIGNED-TO, TrainRef is a FK to TRAIN.Ref, and ConductorID is a FK to CONDUCTOR.CompanyID.
- In this model, a conductor can be assigned to different trains on different days. If Date was not part of the PK of ASSIGNED-TO, then a conductor could be assigned to only one train.
- Yes, this instruction is valid.
- No, it violates the entity integrity constraint:
NULL
can be given as a value to an attribute that is part of the PK. - No, it violates the referential integrity constraint:
'XB-124'
and'GPalmer'
are not values inTRAIN.Ref
andCONDUCTOR.CompanyID
. - No, it violates the key constraint: two tuples cannot have the same value for the values of the primary key.
Solution 2.18
- {A, B, C, D} is a superkey (the set of all the attributes is always a superkey), but not a superkey, as removing e.g. D would still make it a superkey.
- {A, B} is a superkey and a key, as neither {A} nor {B} are keys.
- {A} is not a key, and not a superkey: multiple tuples have the value 1.
Solution 2.19
Possible superkeys are {A, B, C, D}, {A, B, C}, {A, C, D}, {B, C, D}, {A, B}, {B, C}. The possible keys are {A, B}, {A, C}, and {B, C}.
Solution 2.20
For this relation, {A, B, C, D}, {A, B, C}, and {D} are superkey. Only the latter, {D}, is a key (for {A, B, C}, removing either A or C still gives a superkey).
Solution 2.21
- For the BUILDING relation: <“A.H”, “123 Main St.”>, <“U.H.”, “123 Main St.”>. For the ROOM relation: <12, “A.H.”>, <15, “A.H.”>.
INSERT <"A.H.", NULL>
would violate the requirement not to have two tuples with the same value for the attributes that constitute the primary key in the BUILDING relation.UPDATE ROOM with CODE = 12 to Building = "G.C.C."
would create an entry referencing a name in the BUILDING relation that does not exist.
- The relations would be drawn as follows:
- Inserting <“Ash”, “Evil Dead”> into the CHARACTER relation would cause an error if the database was empty, since no movie with the primary key “Evil Dead” has been introduced yet: this would be a referential integrity constraint violation. To violate the entity integrity constraint, it would suffice to insert the value <NULL, 2019> into the MOVIE relation.
Problems
Problem 2.1 (Find a candidate key for the CLASS relation)
Consider the relation:
CLASS(Course_Number, Univ_Section_Number, Instructor_Name, Semester, Building_Code, Room_Number, Time, Weekdays, Credit_Hours)
- This relation represents classes taught in a university.
- The goal is to be able to have multiple offerings (classes) of courses over several semesters.
- List three possible candidate keys and describe under what conditions each candidate key would be valid.
- Each candidate key should have between one and three attributes.
Here are some examples of values for the attributes:
Attribute | Possible Value |
---|---|
Course_Number | CSCI3410, CSCI1302 |
Building_Code | AH, UH, ECC |
Univ_Section_Number | 1, 2, 3 |
Room_Number | E127, N118 |
Instructor_Name | John Smith, Sophie Adams |
Time | 1400, 1230, 0900 |
Semester | Spring 2015, Fall 2010, Summer 2012 |
Weekdays | M, MW, MWF, T, TH |
Credit_Hours | 1, 2, 3, 4 |
Problem 2.2 (Design a relational model for a cinema company)
A cinema company wants you to design a relational model for the following set-up:
- The company has movie stars. Each star has a name, birth date, and unique ID.
- The company has the following information about movies: title, year, length, and genre. Each movie has a unique ID and features multiple stars.
- The company owns movie theaters as well. Each theater has a name, address, and a unique ID.
- Furthermore, each theater has a set of auditoriums. Each auditorium has a unique number, and seating capacity.
- Each theater can schedule movies at show-times. Each show-time has a unique ID, a start time, is for a specific movie, and is in a specific theater auditorium.
- The company sells tickets for scheduled show-times. Each ticket has a unique ticket ID and a price.
Problem 2.3 (Design a relational model for bills)
Propose a relational model for the following situation:
- The database will be used to store all of the bills that are debated and voted on by the U.S. House of Representatives (HR). Each bill has a name, a unique sponsor who must be a member of the HR, and an optional date of when it was discussed.
- It must record the name, political group, and beginning and expected end-of-term dates for each HR member.
- It will also record the names of the main HR positions: Speaker, Majority Leader, Minority Leader, Majority Whip, and Minority Whip.
- Finally, it will record the vote of every member of the HR for each bill.
Problem 2.4 (Relational model for universities)
Propose a relational model for the following situation:
- You want to store information about multiple universities. A university has multiple departments, a name and a website.
- Each department offers multiple courses. A course has a name, one (or multiple, when it is cross-listed) code, a number of credit hours.
- A campus has a name, an address, and belong to one university.
- A department has a contact address, a date of creation and a unique code.
Problem 2.5 (Relational model for an auction website)
We want to design a relational model for an auction website. Members (that can be buyers, sellers, both or neither) can participate in the sale of items.
- Members are identified by a unique identifier and have an email address and a nickname.
- Buyers have a unique identifier, a preferred method of payment and a shipping address.
- Sellers have a unique identifier, a rating and a bank account number.
- Items are offered by a seller for sale and are identified by a unique item number. Items also have a name and a starting bid price.
- Members make bids for items that are for sale. Each bid has a unique identifier, a bidding price and a timestamp.
When creating your schema, do not add any new information, and try as much as possible to avoid relations that will create redundant data and NULL
entries. Note that we should be able to uniquely determine the member account linked to the seller account, and similarly for buyers accounts. Furthermore, members can have at most one buyer and one seller account.
Solutions to Selected Problems
Solution to Problem 2.2 (Design a relational model for a cinema company)
A possible solution is:
Solution to Problem 2.3 (Design a relational model for bills)
Be careful: saying that a bill has a unique sponsor does not imply that a the sponsor is a good primary key for the bills: a house member could very well be the sponsor of multiple bills! It just implies that a single attribute is enough to hold the name of the sponsor.
For simplicity, we added an ID
to our MEMBER
and BILL
relations. Note that having a “role” in the MEMBER
relation to store the information about speaker, etc., would be extremely inefficient, since we would add an attribute to the ~435 members that would be NULL
in ~430 of them.
Solution to Problem 2.4 (Relational model for universities)
A possible solution follows. The part that is the hardest to accomodate is the fact that a course can have multiple codes. We are reading here “cross-listed” as “a course that is offered under more than one departmental heading and can receive different codes (e.g., CSCI XXXX and AIST YYYY)”.