Designing a Good Database
Resources
This part of the lecture covers significantly more material than the other, hence we give the details of the references below:
- ER models: (Elmasri and Navathe 2010, ch. 7) or (Elmasri and Navathe 2015, ch. 3)
- The ER to Relational model: (Elmasri and Navathe 2010, ch. 9.1) or (Elmasri and Navathe 2015, ch. 9.1)
- Normalization: (Elmasri and Navathe 2010, ch. 7) or (Elmasri and Navathe 2015, ch. 3)
- UML: not so much in the textbook, but you can look at (Elmasri and Navathe 2010, ch. 7.8, 10.3) or (Elmasri and Navathe 2015, ch. 3.8).
Interest for High-Level Design
Previous relational models have mistakes and limitations:
- What if a hurricane is over more than one state?
- What if an insurance covers more than one car, more than one driver?
- Changing the code “on the fly”, as we did for the
HW_Lecture
andGrade
tables, is difficult and error-prone.
We could go back and forth between relational models (~ logical level) and SQL implementations (~ physical level), but we will use even more high-level tools (~ conceptual level):
- Entity Relationship Models (ER, static: DB)
- Unified Modelling Diagrams (UML, dynamic: DB + software)
-
Enhanced Entity Relationship Models (EER, adds operations to ER)
Feature Conceptual Logical Physical (Main) Audience Business Designer Programmer Entity Names ✔ ✔ Entity Relationships ✔ ✔ Attributes (✔) ✔ Cardinalities ✔ ✔ Primary Keys ✔ ✔ Foreign Keys ✔ ✔ Column Data types (Domain) (✔) ✔ Table Names ✔ Column Names ✔
The conceptual data model is (in theory at least) independent of the choice of database technology.
Remember that in relational models, relations were representing entities (Student
) and relationships (Majors_In
). At the conceptual level, and more particularly in ER diagram, the distinction is made between entities and relationship.
Entity-Relationship Model
Data is organized into entities (with attributes), relationships between entities (with attributes as well).
Entities
- Entity = Thing, object, with independent existence.
- Each entity has attributes (properties)
Entity A:
- Name = Clément Aubert
- Address = HCOB, HA, E. 128 ; Invented St., Auguta, GA
- Diploma = Ph.D in CS; BS in Math
- Highest Diploma = Ph.D in CS
- Favorite Class = CSCI 1301
- Favorite Sport = NULL
Some vocabulary:
- Entity = actual thing (individual)
- Entity type = collection of entities with the same attributes
- Entity set (or collection) = collection of all entities of a particular entity type.
Attributes
Attributes can be
- Composite (divided in smaller parts) or simple (atomic)
- Single-valued or multi-valued
- Stored vs derived
- Nested!
{…} = multi-valued
(…) = complex
For instance, one could
- store the name using a composite attribute (First Name, {Middle Name}, Last Name),
- store multiple addresses using the “schema”
{Address(Street, Number, Apt, City, State, ZIP)}
, - derive the value of “Highest Diploma” using the value(s) stored in “Diploma”.
Key Attributes
A key attribute is an attribute whose value is distinct for each entity in the entity set.
- Serve to identify an entity,
- Can be more than one such attribute (and we leave the options open),
- Cannot be multiple attributes: if more than one attribute is needed to make a key attribute, combine them into a composite attribute and make it the key.
- A composite attribute that is a key attribute should not still be a key attribute if we were to remove one of the attribute (similar to the minimality requirement).
- An entity with no key is called a weak entity type: it is an entity that will be identified thanks to its relation to other entities, and thanks to its partial key (we will discuss this later).
Drawing Entity Types
- Entity = squared box (name in upper case)
- Attribute = rounded box connected to square box (name in lower case)
If the attribute is …, | then… |
---|---|
composite | other attributes are connected to it |
multi-valued | the box have double lines |
derived | the box have dotted lines |
a key | the name of the attribute is underlined |
In the following, we’ll focus on the relationship between the entities more than on the attributes of particular entities, so we’ll sometimes simply draw
leaving the attributes un-specified (but that does not mean that they all have to be atomic) or even just
but that does not mean that the entity type have no attribute!
Relationships
Vocabulary
- Relationship = actual relation (or action) between entities (“teaches”, “loves”, “possesses”, etc.).
- Relationship instance = r1 associates n entities e1, …, en (“Pr. X teaches CSCI YYY”, “There is love between Mary and Paul”, etc.)
- Relationship set = collection of instances
- Relationship type = abstraction (“Every course belong to one instructor”, “Love is a relation between two persons”, etc).
E1, … En participate in R, e1, …, en participate in r1, n is the degree.
Note that we can have Entity Set 1 = Entity Set 2, in which case we say the relation is recursive.
Some sources call the relationships between an entity and itself “unary.” Note that with our convention, it does not make sense to speak of a unary relationship.
Naming convention:
- Use a singular name for entity types.
- Use a verb for relationship.
- Relationship types are drawn in diamonds.
- Drawing usually reads right to left, and up to down.
Role Names and Recursive Relations
Convenient, and sometimes mandatory, to give role names.
If we want to stress that we are considering only one aspect of an entity type (that is, a person is not only an employee, a company is not only an employer, but this aspect is crucial for the “EMPLOYS” relation):
We can also use it to make the “right-side” and the “left-side” of a recursive relationship explicit:
Finally, we will sometimes use “Role Name of Entity 1 : Role Name of Entity 2” as a notation for the relation between them. For instance, we can write “Employer:Employee” to denote the “EMPLOYS” relation, and we will also use this notation when the relationship is between different entities, and write e.g. “PERSON:POSITION” for the “OCCUPIES” relation.
Constraints
Two constraints, called “structural constraints”, applies to relationship types: cardinality ratio and participation constraint. They both concerns the number of relationship instances an entity can participate in (which is different from the cardinality of a relationship type).
Cardinality Ratio
Maximum number of relationships instances that an entity can participate in.
For binary relations, can be 1 : 1, N : 1, 1 : N, or M : N. The 1 stands for “at most 1”, and the M, N, and P stand for “possibly more than 1”, or “no maximum”. In ER diagram, we do not count, and do not make the distinction between “at most 5” and “at most 10”, for instance.
An alternative notation, detailled later on, will address this shortcoming.
Possible examples include:
Relation | Possible Ratio | Explanation |
---|---|---|
MENTOR : MENTEE | 1 : N | “A mentor can have multiple mentees, a mentee has at most one mentor.” |
PERSON : SSN | 1 : 1 | “A person has one SSN, a SSN belongs to one person.” |
COURSE : DEPARTMENT | N : 1 | “A course is offered by one department, a department can offer any number of courses.” |
STUDENT : TEAM | M : N | “A student can participate in multiple team, a team can have multiple students.” |
We indicate the ratio on the edges:
Note that reflexive relations can have any ratio as well. An example of M : N recursive relation could be:
Participation Constraint
Minimum number of relationships instances that an entity can participant it, a.k.a. “minimum cardinality constraint.”
The participation can be total (a.k.a. existence dependency, the entity must be in that relationship at least once) or partial (the entity may or may not be in that relationship).
Total is drawn with a double line, partial is drawn with a single line:
This reads “a course must be offered by a department, but a department may or may not offer courses.”
Attributes
Relationships can have attributes too. The typical example is a date attribute, but other examples include
- TEACHING relation between PROF and CLASS (N : M) could have a “Quarter” attribute.
- MENTORING relation between MENTOR and MENTEE (1 : N) could have a “Since” attribute.
- EMITED_DRIVING_LICENCE between DMV and PERSON (N : 1) could have a “Date” attribute.
Note that an attribute on a relationship type can be atomic or composite, single or multi-valued, stored or derived, but that it cannot be a key attribute (after all, there are no entity to identify!).
Note that there are some moving aspects here: atributes on 1 : 1, 1 : N, N : 1 relationships can be migrated (to the N side when there is one, or to either side where there is none).
For instance, imagine that every phone uses exactly (= “at most and at least”) one carrier, that a carrier can provide network to multiple phones, and that the average quality of the network is an attribute in this relationship:
Then each instance of the relation would be of the form (“Phone X”, “Carrier Y”, “9/10”) for some way of ranking the average quality from 0 to 10. Note that, from the fact that the relationship is N : 1, this means that there is only one tuple involving “Phone X”: this means that the average quality could actually be seen as a property of the phone, and hence be migrated as an attribute to the phone side:
Note that we could not migrate the “average phone quality” to the “Carrier” side: imagine if we had the instances (“Phone X”, “Carrier Y”, “9/10”) and (“Phone Z”, “Carrier Y”, “3/10”), then should the attribute of “Carrier Y” be “9/10” or “3/10”: we have no way of deciding based on this model. Whenever it is a good choice to migrate this attribute or not will depend on the requirement of the models, and it may not always be appropriate to migrate the attribute to the entity. In the case of 1 : 1 relationship, migrating the attribute to both sides (i.e., to both entities) would be a mistake, since it would introduce redundancy in your model.
As an exercise, you can look at the relationships TEACHING, MENTORING and EMITED_DRIVING_LICENCE that are listed above, and see if the attributes can be migrated or not, and if yes, on which side.
Relationships of Degree Higher than Two
Of course, relationships can have a degree higher than two. An example of a ternary relation could be:
To determine cardinality ratio, one should fix all but one parameters, and wonder how many values of the remaining parameter can be in that relationship. Another wording for the same idea can be found in this thread.
Four our example, Customer Y and Bank Z could be in relationship with more than one account (hence the “N”). On the opposite, Customer Y and Account K would be in relationship with only one bank (hence the “1” on the bottom), and Bank Z and Account K would belong to only one customer (hence the “1” on the left).
Let us look at two other examples. First, assume we want to collect information about the treatment prescribed by physicians to patients, we could use a relationship like the following one:
Where
- The “P” stands for the fact that the same physician can prescribe the same treatment to multiple patients,
- The “N” stands for the fact that different treatment can be prescribe by the same physiciant to the same patient,
- The “M” stands for the fact that the same patient can get the same treatment from different physicians.
Now, if we want to store information about who is the president of a country during a term, we could get something like:
Note that this representation of the data assumes that a citizen cannot be the president of two different countries during the same term (the right 1), which could be debatable.
It is sometimes impossible to do without relations with arity greater than 2. For instance, consider the following two diagrams
You should realize that they convey different information. For instance, you can know for a fact that a person visit a bookshop only if they bought something in it, while the second diagram de-correlate the act of buying with the visit to a bookshop. Similarly, the second diagram could give you a hint that a person that owns a copy of a book Z and visits a bookshop X that sells it could also visit it, but you will not know that for sure.
An example of recursive ternary relation could be:
An example of relation of degree 4 could be:
The cardinality ratio are computed using the same method as described before.
Weak Entity Types
There are actually two sorts of entity types:
- Strong (a.k.a. regular, the ones we studied so far), with a key attribute,
- Weak, without key attribute.
Weak (or child) entity types are identified by identifying / owner type that is related to it, in conjunction with one attribute (the partial key). That relation is called identifying (or supporting) relationship, and weak entities have a total participation constraint. The partial key is an attribute, that, when paired with an entity with which they are in relation through their identifying relationship, allows to identify a particular entity.
Weak entities and identifying relationships have a double border, and partial key have a dotted underline, as follows:
The idea here is that we do not need to gather data about all the dependent in the world, or in isolation, but are interested in dependent only if they are related to en employee in our database. Just having the name of a dependent is not enough to identify them, but having their name and the SSN of the employee they are related to is enough. The identifying relation always have ratio 1 : M or 1 : 1: a weak entity cannot be related to more than one entity of the owner type, so that M : N ratio are not possible (cf. e.g. StackExchange). If you need to have, for instance, a dependant connected to multiple employees, then that means that your dependent entity should be strong, because it has an existence “of its own”.
You may wonder why we do not represent weak entities simply as (composite, multi-valued) attributes of their owner type. For instance, why would we use
instead of
? The answer depends whenever we need to have the ability to represent our weak entities (here, PET) as being in relationship with other entities (that can themselves be weak!), as follows:
This would be impossible if PET was an attribute of FRIEND! Whenever the pet entity type is involved in other relationships or not should help you in deciding which representation to choose.
- Weak entities types can sometimes be replaced by complex (composite, multi-valued) attributes, unless they are involved in other relationships.
- Owner can itself be weak!
- The degree of the identifying relationship can be more than 2 (cf. e.g., StackOverflow).
Another example of weak entity whose owner is weak as well could be:
The idea being that the Health care provider cares about an insure only if they are covered by them, and that they care about the doula only if they are currently helping one of their insure.
Alternative Notations
Multiple notations have been used to represent the ratio and constraint on relationship.
In the following, we introduce two of them: the Min/Max and the Crow’s foot notations.
Notation with Explicit Maximal (Min/Max Notation)
The two constraints can be written on the same side, and the N, M, P ratio can be replaced by actual number, providing more information.
For instance,
could be drawn as
meaning that
- A car can be used to carpool between 1 and 5 persons (and that it must be used for at least 1 person),
- A person can be registered for 0, 1, 2 or 3 carpool at the same time.
More generally, we have the following:
Crow’s Foot Notation
Enhanced Entity–Relationship Model
Extended (or Enhanced) ER Models (EER) have additionaly:
- Subtype / Subclass: “every professor is an employee”. There is a class / subclass relationship (you can proceed by specialization or generalization).
- Category (to represent UNION): an OWNER entity that can be either a PERSON, a BANK, or a COMPANY entity type.
Closer to object-oriented programming.
Reverse Engineering
It is possible to go from relational models to ER models, and sometimes needed: if you are given an implementation that seems poorly design, this can be a way of “backing up” and thinking about the (sometimes implicit) choices that were made during the implementation, to eventually correct them.
For instance, consider the code we studied in “A First Example”:
CREATE TABLE STORM ( NAME VARCHAR(25) PRIMARY KEY, Kind ENUM ("Tropical Storm", "Hurricane"), WindSpeed INT, Creation DATE ); -- We can change the enumerated datatype: ALTER TABLE STORM MODIFY Kind ENUM ("Tropical Storm", "Hurricane", "Typhoon"); CREATE TABLE STATE ( NAME VARCHAR(25) UNIQUE, Postal_abbr CHAR(2) PRIMARY KEY, Affected_by VARCHAR(25), FOREIGN KEY (Affected_by) REFERENCES STORM (NAME) ON DELETE SET NULL ON UPDATE CASCADE );
It corresponds to the following relational model:
which in turn corresponds to the following ER diagram:
Looking at this diagram made it obvious that our code has a flaw: a stom can affect more than one state! Turning the 1 on the left-hand side of the “AFFECTS” relationship into a M is immediate on the diagram, but, of course, mapping it back to a relational model, and then implementing it correctly, will require more work. In any case, if you had not noted already this flaw, reverse-engineering this code highlighted it quite clearly.
If we look back at Problem 3.5 (Revisiting the PROF table), we had already made a first step, since we converted the code into the following relational model:
Going a bit further, we could extrapolate just a little bit and get the following ER diagram:
As we noted in our solution to the second question, this model has several limitations. To list a few, this representation can not handle the following situations:
- If multiple instructors teach the same class,
- If the lecture is taught more than once a year (either because it is taught in the Fall, Spring and Summer, or if multiple sections are offered at the same time),
- If a Lecture is cross-listed, then some duplication of information will be needed.
Looking at it as an ER diagram should help you in understanding why we have those flaws, and how they could be addressed, and “testing” the model should be made easier in its ER form than as SQL code.
ER-to-Relational Models Mapping
Intro
We have to map all of the following:
- Entity: Strong, Weak
- Attributes: Composite, Key, Atomic, Multi-valued, Partial Key
- Relationships: Binary (1 : 1, N : 1, 1 : N, N : M), n-ary
Using four tools: Relations, Attributes, Primary Keys, Foreign Keys.
Algorithm
We will use three techniques to represent some of the relationships, the foreign key approach, the merged relations approach and the cross-reference approach. They are detailed and illustrated after the algorithm, which goes as follows:
# | is mapped to | |
---|---|---|
1 | Strong Entity | Relation with all the simple attributes. Decompose complex (composite) attributes. Pick a key to be the PK, if it is composite, take its elements. |
2 | Weak Entity | Relation with all the simple attributes. Decompose complex attributes. Add as a foreign key the primary key of the relation corresponding to the owner entity type, and make it a primary key, in addition to the partial key of the weak entity. If the owner entity type is itself weak, start with it. |
3 | Binary 1 : 1 Relationship Types | Foreign Key, Merge Relations or Cross-Reference approach |
4 | Binary 1 : N Relationship Types | Foreign Key or Cross-Reference approach |
5 | Binary M : N Relationship Types | Cross-Reference approach |
6 | n-ary Relationship Types | Cross-Reference approach |
7 | Multi-valued Attributes | Create a new relation, add as a foreign key the primary key of the relation corresponding to the original strong entity type. Make all the attributes be the primary key. |
whose primary key is the foreign key to the relation corresponding to the entity.
- Foreign Key Approach: choose one of the relation (preferably with total participation constraint, or on the N side), add a foreign key and all the attributes of the relationship.
- Merged Relation Approach: If both participation constraints are total, just merge them. Primary key = just pick one (or take both). If we were working on the implementation, we would add a
NOT NULL
constraint on the attribute that is not part of the primary key anymore. - Cross-Reference or Relationship Relation Approach: Create a lookup table with an appropriate number of foreign keys, pick some of them (the one on the N side, both if the ratio is M : N, for n-ary it is a bit more complex, cf. example below) as the primary key.
Every time a relationships have attributes, they are mapped to the resulting relation.
Let us look in more details at some of those steps. For strong entities, using steps 1 and 7, the following:
would give:
And note that if Serial was a complex attribute, we would just “unfold” it, or decompose it, and make all the resulting attributes the primary key of the relation. If one of the attribute was at the same time multi-valued and composite, as follows:
Then we would obtain:
For relationships, things are a bit more complicated. Consider the following:
Since it is a 1 : 1 relationship where one of the side has a partial constraint, we have the choice between two approaches. The foreign key approach would give:
Note that we could also have added the foreign key on the side of ENT.B, referencing the key of ENT.A. But since ENT.A has a total participation constraint, we know that the value of FK will always exist, whereas some entities in ENT.B may not be in relationship with an entity from ENT.A, creating the (nefast) need for NULL
values.
For the same diagram, the cross-reference approach would give:
Similarly, note that, in MAPPING, KeyB, or KeyA and KeyB, would also be valid primary keys, but that it makes more sense to have KeyA being the primary key, since we know that ENT.A has a total participation constraint, but ENT.B does not.
If both participation constraints were total, as follows:
Then we could use the merged relations approach, and get:
We picked KeyA to be the primary key for the same reason as before. Note that merging the two entities into one relation also means that you have eventually to do some work on the relations that were referring to them.
Of course, if ENT.A and ENT.B are the same entity (that is, REL is recursive), we would get:
or
depending on the approach we chose.
Binary 1 : N and binary M : N relationships are dealt with in a similar way, using foreign key or cross-reference approaches. The most difficult part of the mapping is with n-ary relationships: we have to use cross-reference approaches, but determining the primary key is not an easy task. Consider the following.
This developement was actually asked at Stack Exchange.
The arity constraints here can be rephrased as:
- A member can reserve a particular equipment at multiple time slots (the N),
- An equipment can be reserved at a particular time slot by only one member (the 1 on the left),
- A member can reserve only one equipment per time slot (the 1 on the right).
And note that there is no total participation constraint.
To reprent the RESERVES relationship, we need to create a relation with attributes referencing the primary key of MEMBER, the primary key of TIME_SLOT, and the primary key of EQUIPMENT. Making them all the primary key does not represent the fact that the same equipment cannot be booked twice during the same slot, nor that a member can book only one equipment per slot, but allows members to reserve a particular equipment at multiple time slots. To improve this situation, we can either
- take the foreign key to MEMBER and the foreign key to TIME_SLOT to be the primary key of this relation,
- or take the foreign key to EQUIPMENT and the foreign key to TIME_SLOT to be the primary key of this relation.
Both solutions enforce only some of the requirement expressed by the ER diagram.
Outro
ER Model | Relational Model |
---|---|
Entity type | Entity relation |
1 : 1 or 1 : N relationship type | Foreign key (or relationship relation) |
M : N relationship type | Relationship relation and two foreign keys |
n-ary relationship type | Relationship relation and n foreign keys |
Simple attribute | Attribute |
Composite attribute | Set of simple component attributes |
Multivalued attribute | Relation and foreign key |
Value set | Domain |
Key attribute | Primary key |
You can have a look at e.g. Holowczak to get a slightly different explanation of this conversion, and additional pointers.
Guidelines and Normal Form
What makes a good database? At the logical (conceptual) and physical (implementation) levels.
Goals:
- Information preservation (and avoid loss of information)
- Minimum redundancy
- Make queries easy (avoid redundant work, make
SELECT
and select-project-join easy)
For ER diagrams, some of the usual techniques are:
Cf. for instance Stanford Infolab.
- Limit the use of weak entity sets.
- Do not use an entity when an attribute will do.
General Rules
Semantics
1 relation corresponds to 1 entity or 1 relationship type
No Anomalies
- Insertion Anomalies
- Having to invent values or to put NULL to insert tuples, especially on a key attribute!
- Deletion Anomalies
- Loosing information inadvertently
- Modification Anomalies
- Updates have to be consistent.
(Bad!) Example:
---------- (Login, Name, AdvisoryName, AdvisorOffice, Major, MajorHead)
-----------(Office, PhoneNumber, Building)
- Advisor without student
- Delete last student of advisor
- Advisor change name.
NULL
Should Be Rare
NULL
has 3 meanings, wastes space, and makes join / nested projections harder.
Example:
STUDENT(Login, …, siblingEnrolled)
Transform into “Emergency Contact in University” relation (bonus: allow multiple contacts).
Identical Attributes in Different Tables Should Be (Primary, Forgein) Key Pairs
Example with advisorOffice and Office: if we try to write a join to obtain the phone number of a student’s advisor, we will obtain all the phone.
Example
MARKER(Owner, Color, OwnerOffice, Brand, BrandEmail)
TEACHER(Office, Name, Phone)
Corrected to:
MARKER(Owner, Color, B͟r͟a͟n͟d͟)
TEACHER(Office, N͟a͟m͟e͟, Phone)
BRAND(N͟a͟m͟e͟, Email)
Functional Dependencies
Functional dependencies (FD) is a formal tool used to assess how “good” a database is, a property of the relation schema. Functional dependencies list the constraints between two sets of attributes from the database. For instance, if X and Y are (sets of) attributes, X → Y reads “X fixes Y”, and implies that the value(s) of Y is fixed by the value(s) of X.
Using Semantics of Attributes
“What should be.”
Let us list all the attributes of our previous example:
MARKER.Owner, MARKER.Color, MAKER.Brand, TEACHER.Office, TEACHER.Name,
TEACHER.Phone, BRAND.Name, BRAND.Email
Think about their dependencies, and list them:
TEACHER.Name → TEACHER.Office
BRAND.Name → BRAND.Email
TEACHER.Office → TEACHER.Name
TEACHER.Office → TEACHER.Phone
MAKER.Owner and MARKER.Color → MARKER.Brand
?
Using Relation States
“What is.”, can disprove some of the assumptions made previously, but should not add new dependencies based on it (they may be by chance!).
- Maybe
TEACHER.Office → TEACHER.Name
does not hold, because teachers share offices? - Maybe
TEACHER.Name → MARKER.Brand
andMARKER.Color
seemed to be enforced by the state, but we should not add a functional dependency based on that: there are no “requirement” that a Teacher must always buy the same brand and color, this could simply true be by chance so far and should not be imposed to the teachers.
A particular state cannot enforce a FD, but it can negate one.
Example:
Att. 1 | Att. 2 | Att. 3 |
---|---|---|
Bob | 15 | Boston |
Bob | 13 | Boston |
Jane | 12 | Augusta |
Emily | 12 | Augusta |
May hold | Will not hold |
---|---|
Att. 2 → Att. 3 | Att1 → Att2 |
Att. 3 → Att. 2 | Att. 3 → Att. 2 |
Att. 1 → Att. 3 | Att. 2 → Att. 1 |
{Att. 1, Att. 2} → Att. 3 | {Att. 3, Att. 2} → Att. 1 |
Notations
Or, more conveniently:
If an attribute is a foreign key to another, we will draw an arrow between relations:
Note that:
- X and Y are sets, we will write A instead of {A}, but keep writing {A, B} for {A, B}.
- {A1, …, An} → {B1, …, Bm} means that A1 and … and An fix B1, and that A1 and … and An fix Bn, etc.
- FD1, FD2, …, FDn for the list of functional dependencies, F for all of them.
- A → B does not imply nor refute B → A.
- We will not write the FD that are implied by (this variation of) Armstrong’s axioms:
- Reflexivity: If Y is a subset of X, then X → Y
- Augmentation: If X → Y, then {X, Z} → Y
- Transitivity: If X → Y and Y → Z, then X → Z
We will assume that the consequence of those axioms always hold (“closure under those rules”), but will generaly not write them explicitely, since they do not carry any new or additional information.
Definitions
Remember superkey (not minimal key), key, candidate key, secondary key? We now have a formal definition.
In one particular relation R(A1, …, An),
- If {A1, …, An} → Y for all attribute Y, then {A1, …, An} is a superkey.
- If {A1, …, An}/Ai is not a superkey anymore for all Ai, then {A1, …, An} is a key.
- We will often discard candidate keys and focus on one primary key.
- If Ai is a member of some candidate key of R, it is a prime attribute of R. It is a non-prime attribute otherwise.
Given a FD {A1, …, An} → Y,
- It is a full functional dependency if for all Ai, {A1, …, An}/Ai → Y, does not hold.
- It is a partial dependency otherwise.
A FD : X → Y is a transivive dependency if there exist a set of attribute B s.t.
- B ≠ X, B ≠ Y
- B is not a candidate key,
- B is not a subset of any candidate key,
- X → B and B → Y hold
Normal Forms and Keys
First, Second, Third, Fourth, Fifth normal form (“X”NF). Stronger than the Third, there is the Boyce-Codd NF (BCNF)
If you satisfy N, you satisfy N − 1, N − 2, etc. The normal form of a relation is the highest normal form condition that it meets.
Fist Normal Form
Definition
The domain of all attributes must be atomic (simple, indivisible): exclude multi-valued and composite attributes.
Sometimes, additional requirement that every relation has a primary key. We will take this requirement to be part of the definition of 1NF, but some authors take a relation to be in 1NF if it has at least candidate keys (i.e., multiple possible keys, but no primary key, which makes their definition more general, cf. (Elmasri and Navathe 2015, 14.4.1)). Hence, we will always assume that a primary key is given, and it will be underlined.
Normalization
To be written
Second Normal Form
Definition
1NF + Every non-prime attribute is fully functionnaly dependent on the primary key.
Normalization
For each attribute A of the relation whose primary key is A1, …, An:
- Is it prime (i.e., is A ∈ {A1, …, An})?
- Yes → Done.
- No → Is it partially dependent on the primary key ?
- No, it is fully dependent on the primary key → Done
- Yes, it depends only of {A′1, …, A′k} → Do the following:
- Create a new relation with A and {A′1, …, A′k}, make {A′1, …, A′k} the primary key, and “import” all the functional dependencies,
- Remove A from the original relation, and all the functional dependencies that implied it,
- Add a foreign key from {A′1, …, A′k} to their original counterparts in the original relation.
becomes
Refinement: note that if more than one attribute depends of the same subset {A′1, …, A′k}, we will create two relations: that is useless, we could have created just one. For instance, considering
applying the algorithm would give
whereas a more subtle algorithm would give
Note that in both cases, all the relations are in Second Normal Form, though.
Note also that, sometimes, removing the “original” relation may be preferable: cf. an example in Problem 4.27 (COFFEE relation: primary key and normal form).
Note also that if our primary key is a singleton, then there is nothing to do, we are in 2NF as soon as we are in 1NF.
Third Normal Form
Definition
2NF + no non-prime attribute is transitively dependent on the primary key.
Normalization
For each attribute A of the relation whose primary key is A1, …, An:
- Is it prime (i.e., is A ∈ {A1, …, An})?
- Yes → Done.
- No → Is it transitively dependent on the primary key ?
- No, there is no {A′1, …, A′k} such that {A1, …, An} → {A′1, …, A′k} → A and {A′1, …, A′k} ⊈ {A1, …, An} and A ∉ {A′1, …, A′k} → Done
- Yes, there is such a {A′1, …, A′m} → Do the following:
- Create a new relation with A and {A′1, …, A′k}, make {A′1, …, A′k} the primary key, and import all the functional dependencies,
- Remove A from the original relation, as well as all the functional dependencies involving it,
- Add a foreign key from {A′1, …, A′k} to their original counterparts in the original relation.
Examples
We can have a look at another example:
Note that {State, Driver_Licence_Num}, would be a valid primary key for this relation, and that adding it would make it a relation in 1NF.
As we can see, the name “Driver” is somehow counter-intuitive, since the relation also carries information about Governors. This relation is actually not in 2NF, because the FD {State, Driver_Licence_Num} → Governor is not fully functional. A possible way to fix it is to get:
As you can see, the 2NF helped us in separating properly the entities.
An example of a relation that is in 2NF but not in 3NF could be:
As we can see, all the non-prime attributes are fully functionally dependent from Login, which is our primary key. But, obviously, one of this dependecy is transitive, and breaks the 3NF. A way to fix it is:
As we can see, 3NF also helped us in separating properly the entities, in a slightly different way.
In conclusion, we can observe that every FD X → Y s.t. X is a proper subset of the primary key, or a non-prime attribute, is problematic. 2NF is a guarantee that every entity has its own relation, 3NF is a way to avoid data inconsistency.
Unified Modeling Language Diagrams
Overview
One approach for analysis, design, implementation and deployment of databases and their applications. Databases interact with multiple softwares and users, we need a common language.
Unified Modeling Language is a standard:
- Generic
- Language-independent
- Platform-independent
Wide, powerful, but also intimidating.
You know UML from object-oriented programming language:
That is an example of a class diagram (with class name, attributes and operators, as well as a particular way to represent that a class extends another) , there are other types of diagrams, they are not unrelated! For instance, using communication diagrams, deployment diagrams, and state chart diagrams, you can collect the requirements needed to draw a class diagram! They each offer a viewpoint on a software that will help you in making sure the various pieces will fit together: it is a tool commonly used in software engineering, and useful in database design.
Types of Diagrams
There are 14 different types of diagrams, divided between two categories: structural and behavioral.
(Source: Wikimedia)
Structural UML Diagrams
They describe structural, or static, relationships between objects, softwares.
- Class diagram describes static structures: classes, interfaces, collaborations, dependencies, generalizations, etc. We can represent conceptual data base schema with them!
- Object diagram, a.k.a. instance diagram, represents the static view of a system at a particular time. You can think of a “freeze” of a program, to be able to observe the value of the variables and the objects (or instances) created.
- Component diagram describes the organization and the dependencies among software components (e.g., executables, files, libraries, etc.), to describe how an arbitrary large software system is split into pieces.
- Deployment diagram is the description of the physical deployment of artifacts (i.e., software components) on nodes (i.e., hardware). If your program runs on a local computer, fetching data from the Internet, and storing output on a server, you may describe this situation using this sort of diagram.
In this category also exist Composite structure diagram, Package diagram and Profile diagram.
Behavioral UML diagrams
They describe the behavioral, or dynamic, relationship, between components.
- Use case diagram describes the interaction between the user and the system. Supposedly, it is the privileged tool to communicate with end-users.
- State machine diagram, a.k.a., state chart diagram, describes how a system react to external events. You can picture yourself a complex form of finite state automata diagram.
- Activity diagram is a flow of control between activities. You may have seen them already, they are supposedly easy to follow:
Then there is the sub-category of “Interaction diagrams”:
- Sequence diagram describes the interactions between objects over time, the flow of information or messages between objects. It is helpful to grasp the time ordering of the interactions.
- Communication diagram, a.k.a., collaboration diagram, describes the interactions between objects as a serie of sequenced messages. It is helpful to grasp the structure of the objects, who is interacting with who.
This sub-category also comprise Timing diagram and Interaction overview diagram.
Zoom on Classes Diagrams
Looking at the “COMPANY conceptual schema in UML class diagram notation”, and comparing it with the “ER schema diagram for the COMPANY database” from the textbook, can help you in writing your own “Rosetta Stone” between ER and UML diagram. Let us introduce some UML terminology for the class diagrams.
UML | ER |
---|---|
Class | Entity Type |
Class Name | Entity Name |
Attributes | Attributes |
Operations (or Method) | Sometimes Derived Attributes |
Association | Relationship Type |
Link | Relationship Instance |
Multiplicities | Structural Constraint |
As well as for ER diagram, the domain (or data type) of the attributes is optional. A composite attribute in a ER diagram can be interpreted as a structured domain in a UML diagram (think of a struct), and a multi-valued attribute requires to create a new class.
Associations are, to some extend, more expressive than relationship types:
- As for relationship types, they can be recursive (or reflexive), and uses role names to clarify the roles of both parties.
- As for relationship types they can have attributes: actually, a whole class can be connected to an association.
- As for relationship types, they can express a cardinality constraint on the relation between classes. They are written as
min
..max
, with*
for “no maximum”, and the following shorthands:*
stands for0
..*
and1
stands for1
..1
. An association with1
on one side and*
on the other (resp.1
and1
,*
and1
,*
and*
) is sometimes called “one-to-many” (resp., “one-to-one”, “many-to-one”, “many-to-many”). The notation in partially inverted w.r.t. ER diagrams:
Additionally, associations can be “extended”, and they are not the only kind of relationship that can be expressed between two classes.
- As opposed to the relationship types, they can be given a direction, indicating that the user should be able to navigate them only in one direction, or in two (which is the default). This is used for security or privacy purposes.
- As opposed to the relationship types, they can be qualified, implying that a class is not connected to the other class as a whole, but to one particular attribute, called the qualifier, or discriminator.
- As opposed to the relationship types, they are part of a bigger collection of relationships. Other relationships include:
- Aggregation, a.k.a. “is part of” relationship, between a whole object and its component (that have their own existence).
- Composition, which is the particular case of aggregation where the component does not have an existence of their own.
- Generalization, a.k.a. inheritance, that eliminates redundancy and makes a class a specialization of another one.
- A full list can be consulted, e.g., at IBM.
Qualified associations can be used for weak entities, but not only.
Some of those subtleties depend on your need, and are subjective, but are important tool to design properly a database, and relieving the programmer from the burden of figuring out many details.
Exercises
- Exercise 4.1
Name the three high-level models we will be learning about in this class (expand the acronyms).
- Exercise 4.2
What could be the decomposition of an attribute used to store an email address? When could that be useful?
- Exercise 4.3
Draw the ER diagram for a “COMPUTER” entity that has one multivalued attribute “Operating_System”, a composite attribute “Devices” (decomposed into “Keyboard” and “Mouse”) and an “ID” key attribute.
- Exercise 4.4
Draw the ER diagram for a “CELLPHONE” entity that has a composite attribute “Plan” (decomposed into “Carrier” and “Price”), an “MIN” (Mobile Identification Number) key attribute, and a multi-valued “App_Installed” attribute.
- Exercise 4.5
Name one difference between a primary key in the relational model and a key attribute in the ER model.
- Exercise 4.6
What is a derived attribute? Give two examples and justify them.
- Exercise 4.7
Invent an entity type with at least one composite attribute and one atomic attribute, but no multi-valued attributes. Identify a possible key attribute and draw the entity type you obtained using the conventions we used in class.
- Exercise 4.8
What is the degree of a relationship type?
- Exercise 4.9
What is a self-referencing, or recursive, relationship type? Give two examples.
- Exercise 4.10
What does it mean for a binary relationship type “Owner” between entity types “Person” and “Computer” to have a cardinality ratio of M : N?
- Exercise 4.11
What are the two possible structural constraints on a relationship type?
- Exercise 4.12
Draw a diagram to represent a relationship type R between two entities types A and B such that:
- An entity in A may or may not be in relationship R with an entity in B.
- An entity in B must be in relationship R with an entity in A.
- An entity in A can be in relationship R with at most one entity in B.
- An entity in B can be in relationship R with any number of entities in A.
- Exercise 4.13
Express the constraints represented in the following diagram in plain English.
- Exercise 4.14
- What does it mean for a binary relationship type “is the Chair of” between entity types “Professor” and “Department” to have a cardinality ratio of 1:N? Would it make sense to be have a total participation constraint on one side, and if yes, on which side?
- Exercise 4.15
Express the constraints represented in the following diagram in plain English.
- Exercise 4.16
For the following binary relationships, suggest cardinality ratios based on the common-sense meaning of the entity types.
Entity 1 Cardinality Ratio Entity 2 STUDENT : MAJOR CAR : TAG INSTRUCTOR : LECTURE INSTRUCTOR : OFFICE COMPUTER : OPERATING_SYSTEM - Exercise 4.17
Give an example of a binary relationship type of cardinality 1 : N.
- Exercise 4.18
Give an example of a binary relationship type of cardinality N : 1 and draw the corresponding diagram (you do not have to include details on the participating entity types).
- Exercise 4.19
Draw an ER diagram with a single entity type, with two stored attributes, and one derived attribute. In your answer, it should be clear that the value for the derived attribute can always be obtained from the value(s) of the other attribute(s).
- Exercise 4.20
Draw an ER diagram expressing the total participation of an entity type “BURGER” in a binary relation “CONTAINS” with an entity type “INGREDIENT”. What would be the cardinality ratio of such a relation?
- Exercise 4.21
Under what condition(s) can an attribute of a binary relationship type be migrated to become an attribute of one of the participating entity types?
- Exercise 4.22
Suppose a “PRODUCES” relationship with an attribute “Amount” exists between a “PRODUCER” entity type and a “MOVIE” entity type, with ratio 1 : M. Migrate the “Amount” attribute to one of the entity types and draw the resulting diagram.
- Exercise 4.23
Suppose a “MEMBERSHIP relationship with an attribute”Level" (e.g., “silver”, “platinium”, etc.) exists between a “PERSON” entity type and a “CLUB” entity type, with ratio M : 1. Migrate the “Level” attribute to one of the entity types and draw the resulting diagram.
- Exercise 4.24
Assume with have three entity types, “Lecture Notes”, “Class” and “Professor.”
- Draw a diagram of a ternary relationship between the three entities.
- Draw a diagram that has two binary relationships from one of the three entities to the other two entities.
- Come up with a question that could be answered using one model but not the other from the previous steps (specify which relationship would be able to answer your question).
You can specify role names in your diagrams for added clarity, and remember to list all the constraints.
- Exercise 4.25
Can we always replace a ternary relationship with three binary relationships? Give an example.
- Exercise 4.26
What is the difference between an entity type and a weak entity type?
- Exercise 4.27
What is a partial key?
- Exercise 4.28
Why do weak entity type have a total participation constraint?
- Exercise 4.29
Invent a weak entity type, its identifying (owner) entity type and the identifiying (or supporting) relationship. Both entities should have (partial) key, and each should have at least one composite attribute.
- Exercise 4.30
Convert the following ER diagram into a relational model:
- Exercise 4.31
What is insertion anomaly? Give an example.
- Exercise 4.32
What is deletion anomaly? Is it a desirable feature?
- Exercise 4.33
Why should we avoid attributes whose value will often be
NULL
? Can the usage ofNULL
be completely avoided?- Exercise 4.34
Consider the following relation:
PROF(S͟S͟N͟, Name, Department, Bike_brand)
Why is it a poor design to have a “Bike_brand” attribute in such a relation? How should we store this information?
- Exercise 4.35
Consider the following relation:
STUDENT(S͟S͟N͟, Name, …, Sibling_On_Campus)
Why is it a poor design to have a “Sibling_On_Campus” attribute in such a relation? How should we store this information?
- Exercise 4.36
Consider the following relational database schema:
STUDENT(L͟o͟g͟i͟n͟, Name, …, Major, Major_Head)
DEPARTMENT(C͟o͟d͟e͟, Name, Major_Head)Assuming that “Major” is a foreign key referencing “DEPARTMENT.Code”, what is the problem with that schema? How could you address it?
- Exercise 4.37
Why can we not infer a functional dependency automatically from a particular relation state?
- Exercise 4.38
Consider the relation R(A, B, C, D, E, F) and the following functional dependencies:
- F → {D, C}, D → {B, E}, {B, E} → A
- {A, B} → {C, D}, {B, E} → F
- A → {C, D}, E → F, D → B
For each set of functional dependency, give a key for R. We want a key, so it has to be minimal.
- Exercise 4.39
Consider the relation R(A, B, C, D, E, F) and the following functional dependencies:
A → {D, E}, D → {B, F}, {B, E} → A, {A, C} → {B, D, F}, A → F
Answer the following:
- How many candidate keys is there? List them.
- How many transitive dependencies can you find? Give them and justify them.
- Exercise 4.40
What is a composite attribute in a ER diagram? Can a relational schema with composite attribute be in Second Normal Form?
- Exercise 4.41
Consider the relation R(A, B, C, D) and answer the following:
- If {A, B} is the only key, is {A, B} → {C, D}, {B, C} → D a 2NF? List the nonprime attributes and justify.
- If {A, B, C} is the only key, is A → {B, D}, {A, B, C} → D a 2NF? List the nonprime attributes and justify.
- Exercise 4.42
Consider the relation R(A, B, C, D, E, F) with candidate keys {A, B} and C. Remember that, in all generality, to be a prime attribute, you just need to be part of a possible candidate key. Answer the following:
- What are the prime attributes in R?
- Is {C, D} → E a fully functional dependency?
- Write a set of functional dependencies containing at least one transitive depency, and justify your answer.
- Exercise 4.43
Consider the relation R(A, B, C, D, E) and the following functional dependencies:
- C → D, {C, B} → A, A → {B, C, D}, B → E
- A → {C, D}, C → B, D → E, {E, C} → A
- {A, B} → D, D → {B, C}, E → C
For each one, give one candidate key for R.
- Exercise 4.44
Consider the relation R(A, B, C, D, E) and answer the following:
- If {A, B} is the primary key, is B → E, C → D a 2NF? List the nonprime attributes and justify.
- If {A} is the primary key, is B → C, B → D a 2NF? List the nonprime attributes and justify.
- Exercise 4.45
Consider the relation R(A, B, C, D, E, F), and let {B, D} be the primary key, and have additionnaly the functional dependencies {A, D} → E, C → F. This relation is not in 3NF, can you tell why?
- Exercise 4.46
Consider the relation R(A, B, C, D) and answer the following:
- If A is the only key, is A → {B, C, D}, {A, B} → C, {B, C} → D a 3NF? List the nonprime attributes and justify.
- If B is the only key, is B → {A, C, D}, A → {C, D}, {A, C} → D a 3NF? List the nonprime attributes and justify.
- Exercise 4.47
Consider the relation R(A, B, C, D, E) and the functional dependencies {A, B} → C, B → D, C → E. Answer the following:
- A by itself is not a primary key, but what is the only key that contains A?
- List the non-prime attributes.
- This relation is not in 2NF: what transformation can you operate to obtain a 2NF?
- One of the relation you obtained at the previous step is likely not to be in 3NF. Can you normalize it? If yes, how?
- Exercise 4.48
What are the two different categories of UML diagram?
- Exercise 4.49
Can a
C++
developer working on Linux and aJava
developer working on MacOS use the same class diagram as a basis to write their programs? Justify your answer.- Exercise 4.50
What kind of diagram should we use if we want to …
- describe the functional behavior of the system as seen by the user?
- capture the flow of messages in a software?
- represent the workflow of actions of an user?
- Exercise 4.51
Name two reasons why one would want to use a UML class diagram over an ER diagram to represent a conceptual schema.
- Exercise 4.52
Consider the following diagram:
Give the number of attributes for both classes, and suggest two operations for the class that does not have any. Discuss the multiplicities: why did the designer picked those values?
- Exercise 4.53
Convert the following ER diagram to a UML class diagram.
- Exercise 4.54
Briefly explain the difference between an aggregation and a composition association.
- Exercise 4.55
How is generalization (or inheritance) represented in a UML class diagram? Why is such a concept useful?
- Exercise 4.56
Convert the following ER diagram into a UML class diagram:
Solution to Exercises
- Solution 4.1
The three high-level models we will be learning about are the Unified Modeling Language, Entity Relationship, and Enhanced Entity–Relationship models.
- Solution 4.2
A useful decomposition of an email address attribute could be: the username part before the @ sign, and the domain part afterwards. It might be useful to have statistics about the domains of the users or to sort the usernames by length, etc.
- Solution 4.3
- Solution 4.4
- Solution 4.5
There can be more than one key in the ER model, but it has to be made of a single attribute, whereas a primary key can be made of multiple attributes.
- Solution 4.6
A derived attribute is an attribute whose value can be determined by the value of other attributes. For instance:
- The value of an “Age” attribute could be determined from the value of an “Date of birth” attribute and the current day.
- The value of a “State” attribute can be determined from the value of a “Zip code” attribute.
- The value of a “Body Mass Index” attribute could be calculated from the values of height and weight attributes.
- The value of an “Initials” attribute could be determined using the values of the “First Name”, “Middle Name”, and “Last Name” attributes.- Solution 4.7
- Solution 4.8
The degree of a realationship type is the number of its participating entity types.
- Solution 4.9
A self-referencing relationship type is where the same entity type participates more than once. On a SEATS entity type, it would be an attribute like “is to the left of” or on a PERSONS entity type, it would be and attribute like “is married to”.
- Solution 4.10
The cardinality ratio on the binary relationship type “Owner” between the entity types “Person” and “Computer” means that a person can own multiple computers, and a computer can have multiple owners.
- Solution 4.11
The two possible structural constraints on a relationship type are the cardinality ratio and participation constraints.
- Solution 4.12
We would obtain the following diagram:
- Solution 4.13
A key opens only one door, and every key must open at least one door. A door can be opened by multiple keys, and some doors may not be opened by any key (think of doors that do not have a lock).
- Solution 4.14
The binary relation type “is the Chair of” with a cardinality ratio of 1:N between entity types “Professor” and “Department” means that a department can have at most one professor as its chair, but that a professor can be the chair of multiple departments. It could make sense to require that every department has a chair, hence writing a double line between the Department entity and the “is the Chair of” relationship, but it would not make sense to have a total participation constraint on the side of the professor (which would mean that every professor has to be the chair of a department).
- Solution 4.15
An operating system may be supported by many computers, but it is also possible that no computer supports it (think of an operating system in development, or developed for embeded devices). A computer must support at least one operating system and can support multiple operating systems.
- Solution 4.16
Entity 1 Cardinality Ratio Entity 2 Explanation STUDENT N : 1 MAJOR “A student has one major, but multiple students can have the same major” CAR 1 : 1 TAG “A car has exactly one tag, a tag belongs to one particular car.” INSTRUCTOR 1 : N LECTURE “An instructor can teach multiple lecture, but a lecture is taught by only one person.” INSTRUCTOR 1 : N OFFICE “An instructor can have multiple office, but an office belongs to only one instructor” COMPUTER M : N OPERATING_SYSTEM “A computer can have multiple operating system, the same operating system can be installed on more than one computer.” Some of these choices are debatable (typically, almost any combination seems reasonable for the INSTRUCTOR : OFFICE relation).
- Solution 4.17
A binary of relationship of SUPERVISOR as a recursive relationship on EMPLOYEE.
- Solution 4.18
- Solution 4.19
- Solution 4.20
- Solution 4.21
An attribute of a binary relationship type can be migrated to one of the participating entity types when the cardinality ratio is 1 : N, 1 : 1, or N : 1. It can be migrated “to the N side” or, if there is no N side, to either side. Note that for n-ary relationships, at least one ratio needs to be 1 for the attribute to be allowed to migrate (and “to the N side”, or, if there is no N side, to any side).
- Solution 4.22
We could have the following:
- Solution 4.23
We could have the following:
- Solution 4.24
A possible example of ternary relationship is:
One example of two binary relationships could be:
A question like
“Who wrote the lecture notes X?”
could be answered with the binary relationships but not the ternary. Conversely, a question like
“What are the lecture notes refered to by Prof. X in their class Y?”
could not be answered using the binary relationships (since we do not know what classes are taught by Prof. X).
- Solution 4.25
No, a ternary relationship cannot always be replaced by three binary relationship. For instance, if I have a “Travelling to” relationship between a “Person”, a “City” and a “Transport mode”, to represent the fact that a person is travelling to a city using a particular mode of transportation, there is no way I can convey the same information using binary relationships.
- Solution 4.26
The weak entity type does not have a key attribute, it cannot be distinguised from the other weak entities based on a single attribute, for that we also need to know its relationship to some other entity type.
- Solution 4.27
For a weak entity attribute, it is the attribute that can uniquely identify weak entites that are related to the same owner entity.
- Solution 4.28
Otherwise, we could not identify entities in it without owner entity.
- Solution 4.29
A possible solution is:
Note that the two composite attributes are “generic”, in the sense that you can re-use those examples easily.
- Solution 4.30
A possible option is:
Note that “Stays_At” could also be a separate relation, with two attributes, “Address” and “Person”, linked to respectively PLACE.Address and PERSON.SSN, and both being the primary key of the relation.
- Solution 4.31
When you have to invent a primary key or add a lot of NULL value to be able to add a tuple. I want to add a room in my DB, but the only place where rooms are listed are as an attribute on a Instructor table, so I have to “fake” an instructor to add a room.
- Solution 4.32
A delete anomaly exists when certain attributes are lost because of the deletion of other attributes. It is not desirable, since it can lead to the loss of information.
- Solution 4.33
Because they waste space, they are ambiguous (N/A, or unknown, or not communicated?), and they make querries harder. No, it is necessary sometimes.
- Solution 4.34
Because it will be
NULL
most of the time. In a separate relation, e.g. a “BIKE” relation, with two attributes, “Owner” and “Brand”, “Owner” being a foreign key referencing the SSN attribute of PROF.- Solution 4.35
Because it will be
NULL
most of the time, and because students could have more than one sibling on campus. In a separate relation, e.g. in a “EMERGENCY_CONTACT” relation, with two attributes, “Student” (refercing the SSN attribute of STUDENT), and “Contact”. If the emergency contacts are not related to the student, or if we want to preserve the fact that one student is a sibling to another, we can create another relation to store that information.- Solution 4.36
Major_Head will give update anomalies. By putting the Head of the department in the DEPARTMENT relation only, i.e., removing it from STUDENT.
- Solution 4.37
Just because a coincidence exists (i.e., “in my data set, no android user is color-blind”) does not mean that it will always be true (i.e., “no color-blind person will ever use android”). Functional dependencies should come from a principled reasoning about the attributes, and not from the observation of the data.
- Solution 4.38
- F
- {A, B, E}
- {A, E}
- Solution 4.39
- Only one: {A, C},
- A → F by A → D, D → F.
- Solution 4.40
A composite attribute is an attribute made of multiple attributes, like an “Address” attribute could be composed of the “sub”-attributes “Street”, “City”, “Zip” and "State. A relational schema needs a primary key and to have only atomic domains to be in first normal form, so, no, a relational schema with composite attributes can not be in second normal form.
- Solution 4.41
- Yes. C and D are non prime, and they fully depend on {A, B}.
- No. D is the only non prime, and it depends only on A.
- Solution 4.42
- A, B and C.
- No, because we can remove D,
- A → D, D → E and A → E
- Solution 4.43
- {B, C}, A
- A, {C, E},
- {A, D, E}, {A, B, E}
- Solution 4.44
- No. C, D, E, and E has a partial relation to B
- Yes. Since the primary key is a singleton, it is obvious.
- Solution 4.45
{B, D} → C → F breaks the 3NF.
- Solution 4.46
- No. B, C and D are non prime, A → {B, C} → D breaks the 3NF.
- No. A, B and D are non prime, B → {A, C} → D breaks the 3NF.
- Solution 4.47
- {A, B},
- C, D, E,
- R1(A, B, C, E) and R2(B, D)
- R1(A, B, C), R2(C, E) and R3(B, D)
- Solution 4.48
The two different categories of UML diagram are behaviour and structure.
- Solution 4.49
Yes, UML diagram is language-independent and platform-independent.
- Solution 4.50
- Use-case
- Sequence diagram
- Activity diagram
- Solution 4.51
To use direction for association, to have a common language with someone less knowledgeable of other diagrammatic notations. For the concept of integration.
- Solution 4.52
Flight
has 5 attributes,Plane
has 4. ThePlane
class could have the operationsgetLastFlightNumber() : Integer
andsetMaximumSpeed(MPH) : void
.For the multiplicities: A flight could not have a plane assigned, and a plane could not be assigned to a flight. A plane can be assigned to multiple (or no) flights, but a flight must have at most one plane (and could have none).
- Solution 4.53
The absence of total participation constraint on the left side of the diagram may seem odd: what would be a hand not belonging to a person? Still, we have to accept it: we do not know what the requirements are, or the precise nature of the entities. As far as we know “hand” could refer to a card game, and “person” could refer to players. A straightforward representation of the same diagram as a UML class diagram could be:
Note that we could convey more information, for instance by using aggregation, or even composition, but, without more information about those entities and this relationship, it may be safer not to make any additional supposition.
- Solution 4.54
Aggregation: associated class can have an existence of its own.
Composition association: class does not exist without the association.
- Solution 4.55
Because it avoids redundancy.
- Solution 4.56
Problems
- Problem 4.1 (Design for your professor)
Your professor designed the following relational model, at some point in his career, to help him organize his exams and the students’ exam grades:
Table Name and Attributes Example of Value EXAM(Number, Date, Course) < 1, ‘2018-02-14’, ‘CSCI3410’> PROBLEM(Statement, Points, Length, Exam) < ‘Your professor designed…’, 10, ‘00:10:00’, 1> STUDENT_GRADE(Login, Exam, Grade) < ‘aalyx’, 1, 83> EXAM.Number
,PROBLEM.Statement
,STUDENT_GRADE.Login
andSTUDENT_GRADE.Exam
are all the primary key, andSTUDENT_GRADE.Exam
andPROBLEM.Exam
are foreign keys that both refer toEXAM.Number
.The idea was to have the following design elements:
- The
EXAM
table for storing information about exams. - The
PROBLEM
table for storing each problem as its’ own entry and to associate every problem to an exam. - The
STUDENT_GRADE
table for storing the grade of one student for one particular exam.
Unfortunately, this design turned out to be terrible.
- Describe at least one common and interesting situation where this model would fail to fulfill its purpose.
- Propose a way to correct the particular problem you identified.
- The
- Problem 4.2 (Reading the MOVIES database ER schema)
Consider the ER schema for the MOVIES database (inspired from (Elmasri and Navathe 2010, Figure 7.24)):
Movies Database Example Where the attributes are omitted, and separate entities are created for actors, producers and directors even if they happen to be the same person (to deal with e.g. pseudonyms or different attributes, like agent or address).
Given the constraints shown in the ER schema, respond to the following statements with True or False. Justify each answer.
- There are no actors in this database that have been in no movies.
- There might be actors who have acted in more than ten movies.
- Some actors could have done a lead role in multiple movies.
- A movie can have only a maximum of two lead actors.
- Every director has to have been an actor in a movie.
- No producer has ever been an actor.
- A producer cannot be an actor in some other movie.
- There could be movies with more than a dozen actors.
- Producers can be directors as well.
- A movie can have one director and one producer.
- A movie can have one director and several producers.
- There could be actors who have perfomed a lead role, directed a movie, and produced a movie.
- It is impossible for a director to play in the movie (s)he directed.
- Problem 4.3 (ER diagram for car insurance)
Draw the ER diagram for the following situation:
- A car insurance company wants to have a database of accidents.
- An accident involves cars, drivers, and several aspects: the time and location of where it took place, the amount of damages, and a unique report number.
- A car has a license, a model, a year, and an owner.
- A driver has an ID, an age, a name, and an address.
One of the interesting choices is: should “accident” be an entity type or a relationship type?
- Problem 4.4 (ER diagram for job and offers)
You want to design a database to help you apply for jobs and to compare offers. Every job has a salary range, a title, multiple requirements (like languages known, years of experience, etc.) and was advertised by a company at a particular url. Every company has a physical and numerical address, provides some benefits (assuming they provide the same benefits to all their employees). Sometimes you know one or multiple persons working there, and you want to keep track of their names, role, and (if this is the case) of the job they told you about. Finally, you want to keep track of the offers you received: the job they correspond to, the actual salary offered and the possible starting date.
- Draw the ER diagram for this situation.
- Add attributes for the key attributes if needed.
- Specify the cardinality ratios and participation constraints.
- Problem 4.5 (Incorrect ER diagram)
A company wants to develop a database to keep track of the programmers, projects and programming languages they know of. They are not willing to store guidelines for the sake of it, but believe that if a project requires a particular guideline (like, which IDE to use, what spacing convention they use, etc.), it should be stored somewhere. They want to accommodate the fact that a project can use multiple programming languages (and sometimes even multiple versions of the same language), and keep track of which programmer is leading which project. To ease “match making”, they also want to track which programmer is knowledgeable of what programming language. They would also like to store links to the specifications of programming languages, as well as urls of the projects and their guidelines.
They came up with the following ER diagram:
This diagram, to your expert eyes, has multiple flaws, missing constraints, and has some inconsistencies with their requirements. List as many as you can, and suggest improvments or solution when you can think of one.
- Problem 4.6 (Reverse engineering by hand)
Look at the following relational model and “reverse-engineer” it to obtain an ER diagram:
- Problem 4.7 (Discovering MySQL Workbench)
In this problem, we will install and explore the basic functionalities of MySQL Workbench, which is a cross-platform, open-source, and free graphical interface for database design.
- Install MySQL Workbench if needed. Maybe you already included it in the packages to install when you installed MySQL (cf. the instructions to install MySQL on Windows): try to find if this is the case before trying to install it. Otherwise, use your package manager, or download the binaries from MySQL. The installation should be straightforward for all operating systems.
- Once installed, execute the software. The instructions below were tested for the 6.3.8 version on Debian and 8.0.19 version on Windows. The trouble with GUI-software is that the menus may differ slightly with what you see, but the core tools we will be using should still be there under a similar name, if not the same name.
- Under the panel “MySQL Connections”, you should see your local installation listed as “Local instance 3306.” Click on the top-right corner of that box and then on “Edit Connections.” Alternatively, click on “Database”, then “Manage Connections”, and then on “Local instance 3306.”
- Check that all the parameters are correct. Normally, you only have to change the name of the user to “testuser” and leave the rest as it is. Click on “Test the connection” and enter your password (which should be “password”) when prompted. If you receive a warning about “Incompatible/nonstandard server version or connection protocol detected”, click on “Continue anyway.”
- Now, click on the box “Local instance 3306” and enter your password. A new tab appears in which you can see the list of schemas in the bottom part of the left panel.
- Click on “Database”, then on “Reverse Engineering” (or hit ctrl + r), then click on “next”, enter your password, and click on “next.” You should see the list of the schemas stored in your database. Select one (any one, we are just exploring the functionalities at that point. You can pick, for instance, HW_DB_COFFEE from Problem 3.7 (Read, correct, and write SQL statements for the COFFEE database)), click on “next”, then click on “execute”, “next”, and “close.”
- You’re back on the previous view, but you should now see “EER diagram” on the top of the middle panel. Click on “EER diagram” twice, scroll down if needed, and you should see the EER diagram.
- This diagram is not exaclty an ER diagram and it is not a UML diagram either: it is an EER diagram, that uses Crow’s foot notation. Make sure you understand it.
- Try to modify the EER diagram. Make some relations mandatory, change their name, add an attribute, change the name of another relation, insert a couple of elements in an entity, add a row in a table, etc. Make sure you understand the meaning of the lines between the entities.
- Once you’re done, try to “Forward Engineer” by hitting “Ctrl” + “G.” Click on “next” twice, enter your password, click on “next” once more and you should see the SQL code needed to produce the table you just designed using the graphical tool.
- Problem 4.8 (ER-to-Relation mapping for car insurance)
Apply the ER-to-Relation mapping to your ER diagram from Problem 4.3 (ER diagram for car insurance).
- Problem 4.9 (From ER diagram to Relational model – BIKE)
Consider the following ER diagram:
Using this diagram, answer the following:
Is it true that … Yes No … a customer cannot drop two bikes at the exact same time and date? … two different customers cannot drop two different bikes at the exact same time and date? … an employee cannot repair two bikes at the same time? … a customer can be assigned to more than one employee? … a customer can have a bike repaired by an employee that is not assigned to him/her? … a bike can be in the database without having been dropped by a customer? … an employee can be asked to repair a bike without having that type of bike as one of their specialties? Convert that ER diagram into a relational model. Try to make as few assumptions as possible.
- Problem 4.10 (From ER diagram to Relational model – RECORD)
Consider the following ER diagram:
Using this diagram, answer the following:
Is it true that … Yes No … a label can have multiple logos? … a recording can be released by multiple labels and at different dates? … a record shop can have multiple exclusivities? … two record shops can have the same address? … two logos can have the same name? … two recordings can have the same title? … a record shop must sell at least one recording? Convert that ER diagram into a relational model. Try to make as few assumptions as possible.
Problem 4.11 (ER-to-Relation mapping for Country)
Consider the following ER schema:
where
- “W_IN” stands for “WRITTEN_IN”, and
- “B_W_F” stands for “BORROWS_WORDS_FROM.”
- “W_IN” stands for “WRITTEN_IN”, and
- “B_W_F” stands for “BORROWS_WORDS_FROM”.
For this relationship, on the left-hand side is the language that borrows a word and on the right-hand side is the language that provides the loanword.
Map that ER diagram to a relational database schema.
- Problem 4.12 (From business statements to ER diagram – UNIVERSITY)
Consider the following requirements for a UNIVERSITY database used to keep track of students’ transcripts.
- The university keeps track of each student’s name, student number, class (freshman, sophomore, …, graduate), major department, minor department (if any), and degree program (BA, BS, …, PhD). Student number has unique values for each student.
- Each department is described by a name and has a unique department code.
- Each course has a course name, a course number, credit hours, and is offered by at least one department. The value of course number is unique for each course. A course has at least one section.
- Each section of a course has an instructor, a semester, a year, and a section number. The section number distinguishes different sections of the same course that are taught during the same semester/year; its values are 1, 2, 3, …, up to the number of sections taught during each semester. Students can enroll in sections and receive a letter grade and grade point (0, 1, 2, 3, 4 for F, D, C, B, A, respectively).
- Draw an ER diagram for this schema.
- Specify the key attributes of each entity type and the structural constraints on each relationship type.
- Note any unspecified requirements and make appropriate assumptions to complete the specification.
- Problem 4.13 (Normal form of a CAR_SALE relation)
Consider the following relation and its functional dependencies:
CAR_SALE(Car_no, Date_sold, Salesman_no, Commission, Discount_amt)
{Car_no, Salesman_no} → {Date_sold, Commission, Discount_amt} Date_sold → Discount_amt Salesman_no → Commission and let {Car_no, Salesman_no} be the primary key of this relation.
- Based on the given primary key, is this relation in 1NF, 2NF, or 3NF? Why or why not?
- Normalize it to its third normal form.
- Problem 4.14 (Normal form of a simple relation)
Consider the following relation:
REL(A, B, C, D, E)
Suppose we have the following dependencies:
A → D {A, B} → C D → E - What would be a suitable key for this relation?
- How could this relation not be in first normal form?
- Assume that it is in first normal form, and normalize it to the third normal form.
- Problem 4.15 (Normal form of a SCHEDULE relation)
Consider the following relation:
SCHEDULE(Period_Start, Period_End, Date, Room, Building, Organizer, Length)
And the following dependencies:
{Period_Start, Date} → {Room, Period_End} {Period_Start, Length} → Period_End {Period_Start, Period_End} → Length {Period_End, Length} → Period_Start {Date, Period_Start} → Organizer Room → Building - Based on those functional dependencies, what would be a suitable primary key?
- If this relation is not in second normal form, normalize it to the second normal form.
- If this relation or the relation(s) you obtained previously is (are) not in third normal form, then normalize it (them) to the third normal form.
- Problem 4.16 (Normalizing the FLIGHT relation)
Consider the following relation:
FLIGHT(From, To, Airline, Flight#, Date_Hour, HeadQuarter, Pilot, TZDifference)
A tuple in the FLIGHT relation contains information about an airplane flight: the airports of departure and arrival, the airline carrier, the number of the flight, its time of departure, the headquarter of the company chartering the flight, the name of the pilot(s), and the time zone difference between the departure and arrival airports.
- The “Pilot” attribute is multi-valued (so that between 1 and 4 pilot’s names can be stored in it).
- Given an airline and a flight number, one can determine the departure and arrival airports, as well as the date, time, and the pilot(s).
- Given the airline carrier, one can determine the headquarter.
- Finally, given the departure and arrival airports, one can determine their time zone difference.
Normalize the “FLIGHT” relation to its third normal form. You can indicate your steps, justify your reasoning, and indicate the foreign keys if you want to, but you do not have to.
- Problem 4.17 (From business statement to dependencies, BIKE)
This problem asks you to convert business statements into dependencies. Consider the following relation:
BIKE(Serial_no, Manufacturer, Model, Batch, Wheel_size, Retailer)
Each tuple in the relation BIKE contains information about a bike with a serial number, made by a manufacturer, with a particular model number, released in a certain batch, which has a certain wheel size, and is sold by a certain retailer.
- Write each of the following dependencies as a functional dependency (the first one is given as an example):
- A retailer cannot have two bikes of the same model from different batches. solution: {Retailer, Model} → Batch
- The manufacturer and serial number uniquely identifies the bike and where it is sold.
- A model number is registered by a manufacturer and therefore cannot be used by another manufacturer.
- All bikes in a particular batch are of the same model.
- All bikes of a certain model have the same wheel size.
- Based on those statements, what could be a key for this relation?
- Assuming all those functional dependencies hold, and taking the primary key you identified at the previous step, what is the degree of normality of this relation? Justify your answer.
- Write each of the following dependencies as a functional dependency (the first one is given as an example):
- Problem 4.18 (From business statement to dependencies, ROUTE)
This problem asks you to convert business statements into dependencies. Consider the following relation:
ROUTE(Name, Direction, Fare_zone, Ticket_price, Type_of_vehicle, Hours_of_operations)
A tuple in the ROUTE relation contains information about a public transportation route: its name (e.g. “Gold”, “Green”, …), its direction (e.g., “Medical Campus”, “GCC”, …), the fare zone where the route operates (e.g., “Zone 1”, “Zone 2”, …), the price of a ticket, the nature of the vehicles assuring the route (e.g., “subway”, “bus”, …) and the time of operations (e.g., “24 hours a day”, “from 0600 to 2200”, etc.).
- Write each of the following business statement as a functional dependency:
- Two different types of vehicle can not operate on routes with the same name.
- The ticket price depends of the fare zone and the type of vehicule.
- Both the name and the direction are needed to determine the hours of operations.
- Two routes with the same name and the same direction must have the same fare zone.
- Based on those statements, what could be a key for this relation?
- Write each of the following business statement as a functional dependency:
- Problem 4.19 (From business statement to dependencies, ISP)
Consider the following business statement:
We want to represent the market of Internet Service Providers (ISP). Each ISP offers multiple bundles, that have a maximum bandwith and a price. Some ISP uses the same name for their bundles (e.g. “premium”, or “unlimited”). Each ISP is given multiple Internet Protocol addresses (IP), and those never change. Every client has a ID that is proper to the ISP (i.e., ISP A and ISP B could both have a client with ID “00001”), an email and subscribes to a particular bundle from a particular ISP. The IP of a client changes over the time.
- Assuming we have a relation with all the attributes written in bold in the business statement, list all the functional dependencies given by the statement.
- Based on the functional dependencies you identified at the first step, construct a collection of relations, all in 3rd normal form, that would represent this situation.
- Problem 4.20 (Normalization)
Consider the relations R and T below and their functional dependencies (as well as the one induced by the primary keys):
R(E͟v͟e͟n͟t͟I͟d͟, E͟m͟a͟i͟l͟, Time, Date, Location, Status) T(I͟n͟v͟n͟o͟, Subtotal, Tax, Total, Email, Lname, Fname, Phone) {EventId, Email} → Status EventId → {Time, Date, Location} Invno → {Subtotal, Tax, Total, Email} Email → {Fname, Lname, Phone} Normalize the relations to 2NF and 3NF. Show all relations at each stage (2NF and 3NF) of the normalization process.
- Problem 4.21 (Normal form of the BOOK relation)
Consider the following relation for published books:
BOOK(Book_title, Book_type, Author_name, List_price, Author_affil, Publisher)
Suppose we have the following dependencies:
Book_title → { Publisher, Book_type } Book_type → List_price Author_name → Author_affil - What would be a suitable key for this relation?
- Explain how this relation could not be in first normal form.
- This relation is not in second normal form: explain why and normalize it.
- Are the relations you obtained in the previous step in third normal form? Explain why and normalize them if needed.
- Problem 4.22 (Normal form of the DELIVERY relation)
Consider the following relation for deliveries:
DELIVERY(Shipment, PackageNumber, RecipientName, Weight, DriverName, DriverPhone, RecipientPhone)
Suppose we have the following functional dependencies:
Shipment → DriverName PackageNumber → Shipment PackageNumber → {RecipientName, RecipientPhone} PackageNumber → Weight DriverName → DriverPhone Answer the following three questions:
- Reply Yes / No to the following: In this model…
- … can the shipment being handled by two drivers?
- … can a package have two different recipient?
- … can a package being in different shipments?
- … can a recipient have two different phone numbers for the same name?
- … can a driver have two different phone numbers for the same name?
- Find a primary key for this relation.
- Normalize this relation to the third normal form.
- Reply Yes / No to the following: In this model…
- Problem 4.23 (Normal form of the CONTACT relation)
Consider the relation
CONTACT(Phone, Call_center, Email, Zip, Brand, Website)
and the following functional dependencies:
{Zip, Brand} → {Phone} {Brand} → {Email} { Brand} → {Website} {Phone} → {Call_center} Assume that {Zip, Brand} is the primary key. Normalize this relation to the second normal form, and then to the third normal form. Give the relations, their primary keys, and functional dependencies for both steps.
- Problem 4.24 (Normal form of the MESSAGE relation)
This exercise asks you to convert business statements into dependencies. Consider the following relation:
MESSAGE(SenderId, Time, Date, ReceiverId, Content, Length, Attachment, Size)
A tuple in the MESSAGE relation contains information about a text message: its sender, the time and date when it was sent, the receiver, the content, the length (in characters), the attachment, and the size (in bytes).
- Write each of the following business statements as a functional dependency:
- The length of a message, which can be computed from its content.
- The content and attachment, which determines the size of a message.
- A sender can send the same content and attachment to multiple receivers at the exact same time and date, but cannot send two different contents and attachments at the exact same time and date.
- Assuming all the functional dependencies you identified at the previous step hold, determine a suitable primary key for this relation.
- Taking the primary key you identified at the previous step, what is the degree of normality of this relation? Justify your answer.
- If needed, normalize this relation to the third normal form.
- Write each of the following business statements as a functional dependency:
- Problem 4.25 (PRINT relation in third normal form)
Normalize the following relation to the third normal form.
Do not forget to indicate all the primary keys in your relations.
- Problem 4.26 (CONSULTATION relation: justification, primary key and normal form)
Consider the relation
CONSULTATION(Doctor_no, Patient_no, Date, Diagnosis, Treatment, Charge, Insurance)
with the following functional dependencies:
{Doctor_no, Patient_no, Date} → {Diagnosis} {Doctor_no, Patient_no, Date} → {Treatment} {Treatment, Insurance} → {Charge} {Patient_no} → {Insurance} - The designer decided not to add the functional dependency {Diagnosis} → {Treatment}. Explain what could be the designer’s justification (at the level of the mini-world).
- Identify a primary key for this relation.
- What is the degree of normalization of this relation? Normalize it to the third normal form if necessary.
- Problem 4.27 (COFFEE relation: primary key and normal form)
Consider the relation
COFFEE(Origin, Type_Of_Roast, Price, Roasted_Date, Best_Before, Color, Customer, Rating)
with the following functional dependencies:
{Origin, Type_Of_Roast} → Price {Origin, Type_Of_Roast, Customer} → Rating {Origin, Type_Of_Roast, Roasted_Date} → Color Roasted_Date → Best_Before Assume that all the attributes are atomic and answer the following.
- Based on those functional dependencies, what would be a suitable primary key?
- What is the degree of normalization of this relation? Justify your answer.
- Normalize this relation to the third normal form, and do not forget to indicate all the functional dependencies. You can indicate the second normal form if that helps you.
- Problem 4.28 (A Relation for Network Cards)
A network card (NIC) has a manufacturer, a model, and a unique serial number (MAC address). It offers one or multiple network technologies (ethernet, wi-fi, bluetooth, etc.), and can be connected to the motherboard using one or multiple connections (PCI connector, FireWire, usb, etc.).
- Assuming we have a NIC relation with all the attributes emphasized in the business statement, list all the functional dependencies.
- The relation you obtained is not in 1st normal form, since two of the attributes (network technology and connection) are multi-valued. Propose a way to fix it, and suggest a primary key for the relation(s) you obtained.
- Based on the functional dependencies you identified in the first step, is (are) the relation(s) you constructed in the second step in second normal form? If yes, explain why, if no, normalize it (them).
- Problem 4.29 (From ER to relational schema and UML class diagram – CAR_INFO)
Consider the following ER schema for the CAR_INFO database:
Note that a car can have at most one driver, N passengers, N insurances, and that the car insurance entity exists only if it is “tied up” to a car (i.e., it is a weak entity, and its identifying relationship is called “Insured”).
- Find the key attribute for Car, and the partial key for Car Insurance. If you cannot think of any, add a dummy attribute and make it be the key.
- Convert the ER diagram to a relational database schema.
- Convert the ER diagram to a UMLclass diagram. Hint: Comparing Figure 7.16 with Figure 7.2 from your textbook should guide you.
- Problem 4.30 (From Business Statement to ER Diagram to Relational Model – A Network of Libraries)
You are asked to design a database for a network of libraries.
Each library has a name, an address (made of a number, a street, and a zip), and have copies of documents available to borrow and to reserve. A document is of a particular kind (book, video, or disk), has a title, and an internal catalog number (that can be the ISBN, a barcode, etc.). There can be multiple copies of a document in the network, and each copy has a particular unique code. A copy of a document always “belongs” to a particular library, even when it is checked out.
Furthermore, you want to be able to add the patrons in your database. A patron has a name, a unique library card number, and an email. A patron can reserve (put a hold on) multiple copies of documents for up to two weeks, and can borrow multiple copies of documents for one week if it is a video or a disk, and one month if it is a book. Of course, a copy can be borrowed by only one patron, but it can be put on hold for one patron while being borrowed.
- Draw the ER diagram for this situation. Remember to add all the constraints on your relations.
- Convert your ER diagram to the relational model.
- Problem 4.31 (Using MySQL Workbench’s reverse engineering)
This problem requires you to have successfully completed Pb 4.7 and Pb 4.33.
Using the relational database schema you obtained in Pb 4.33, write the SQL implementation of that database. Then, using MySQL Workbench, use the “Reverse Engineering” function to obtain an EER diagram of your database and compare it with the UML diagram from Pb 4.33. Apart from the difference inherent to the nature of the diagram (i.e., UML vs EER), how else are they different? How are they the same? Is the automated tool as efficient and accurate as you are?
- Problem 4.32 (From business statements to dependencies – KEYBOARD)
This exercise asks you to convert business statements into dependencies. Consider the following relation:
KEYBOARD(Manufacturer, Model, Layout, Retail_Store, Price)
A tuple in the KEYBOARD relation contains information about a computer keyboard; its manufacturer, its model, its layout (AZERTY, QWERTY, etc.), the place where it is sold, and its price.
Write each of the following business statements as a functional dependency:
- A model has a fixed layout.
- A retail store cannot have two different models produced by the same manufacturer.
Based on those statements, what could be a key for this relation?
Assuming all those functional dependencies hold, and taking the primary key you identified at the previous step, what is the degree of normality of this relation? Justify your answer.
- Problem 4.33 (From UML to relational model – DRIVER)
Consider the UML diagram below, and convert it to the relational model. Do not forget to indicate primary and foreign keys.
Solutions to Selected Problems
- Solution to Problem 4.2 (Reading the MOVIES database ER schema)
- True. The double lines on either side of the “PERFORMS IN” relation denote a total participation constraint. This means in particaular that an actor must have performed in at least one movie and that a movie must have had at least one actor who performed in it.
- True. The arity on the right side of the “PERFORMS IN” relation is “N.” This means there are an unlimited amount of movies in which an actor may perform.
- True. The arity on the right side of “LEAD ROLE” is “N.” This means that an actor may have performed in multiple lead roles for movies.
- True. The arity on the left side of “LEAD ROLE” is “2.” This means that a move can have a maximum of two lead roles.
- False. The single line from the “DIRECTOR” to “ALSO A DIRECTOR” is unconstrained and means that the relation is optional for that entity.
- False. There exists an “ACTOR PRODUCER” realtion between the “ACTOR” and “PRODUCER” entities to facilitate this situation.
- False. Although the single line between “PRODUCER” and “ACTOR PRODUCER” means the relationship is optional, it does not mean it cannot happen.
- True. The arity on the left side of “PERFORMS IN” is “M.” This means that a movie can have an unlimited amount of actors in it.
- True. It is not explicit in this schema, but an actor can be both a director and a producer, which implies that a producer can also be a director by transitivity.
- True. In fact, there is a partial participation constraint that means a movie must have one director and one producer at least.
- True. The arity on the left of “DIRECTS” is “1” and is “M” on the left of “PRODUCES.” This means that a movie can only have up to one director, but have an unlimited amuont of producers.
- True. All of these relations exist to enable an actor to act in a lead role, be a director, and be a producer.
- False. There exists a relation for a director to be an actor and there are no constraints keeping the director to perform in the movie they are directing.
- Solution to Problem 4.4 (ER diagram for job and offers)
A possible solution is:
Note that
CONTACT
could be a weak entity with the identifying relationship being eitherDISCUSSED_BY
orEMPLOYS
, but both have disadvantages: they would not allow a contact to discuss more than one offer or to be hired by more than one company.- Solution to Problem 4.5 (Incorrect ER diagram)
Among the numerous flaws, come to mind:
- “Technical errors” (making the diagram incorrect):
- Multiple keys for an entity (“PROGRAMMER”),
- Absence of a key (“PROGRAMMING_LANGUAGE”),
- Absence of the total participation constraint between the weak entity and its identifying relationship (“GUIDELINES” and “RECOMMENDED_BY”),
- Absence of an arity (between the “USES” relationship and the “PROJECT” entity),
- To a certain extent, inconsistent naming (spaces / underscore, absence of capital letter for “leader” or “url”)
- Violations of the business statement:
- “They want to accommodate the fact that a project can use multiple programming languages (and sometimes even multiple versions of the same language)”: this is not possible. There are two ways of adding this feature:
- Make the “PROGRAMMING_LANGUAGE” entity become a “VERSION_OF_PROGRAMMING_LANGUAGE” entity, and make the “USE” relationship M:N.
- Leave the “PROGRAMMING_LANGUAGE” entity as it is, make the “USE” relationship M:N, and add a “Version” attribute to it. The first option being a bit more elegant, to some extend (but we lose the capacity of deriving the latest version).
- “keep track of which programmer is leading which project.”: As a leader is supposed to be a programmer as well, there should be a “IS_THE_LEADER_OF” relationship between “PROGRAMMER” and “PROJECT”, instead of an attribute on the “PROJECT” entity.
- “they also want to track which programmer is knowledgeable of what programming language”: the “KNOWS” relationship should not be “M:1”, as a programmer can probably being knowledgeable in more than one programming language.
- “if a project requires a particular guideline[…], it should be stored somewhere”: The “RECOMMENDED_BY” relationship should be 1:1.
- “They want to accommodate the fact that a project can use multiple programming languages (and sometimes even multiple versions of the same language)”: this is not possible. There are two ways of adding this feature:
- Basic understanding errors:
- The lack of “Name” attribute in the “PROGRAMMING_LANGUAGE” entity is puzzling.
- “CONTRIBUTES_TO” should probably not being total on any side, as you may want to record programmers even if they are not contributing to any project, and project even if nobody is actively working on them. Also, it should not be 1:1, but M:N.
- “RECOMMENDED_BY” could probably be renamed “GUIDE_THE_DEVELOPMENT_OF”, for added clarity.
- “Technical errors” (making the diagram incorrect):
- Solution to Problem 4.9 (From ER diagram to Relational model – BIKE)
Is it true that … Yes No … a customer cannot drop two bikes at the exact same time and date? ✔ … two different customers cannot drop two different bikes at the exact same time and date? ✔ … an employee cannot repair two bikes at the same time? ✔ … a customer can be assigned to more than one employee? ✔ … a customer can have a bike repaired by an employee that is not assigned to him/her? ✔ … a bike can be in the database without having been dropped by a customer? ✔ … an employee can be asked to repair a bike without having that type of bike as one of their specialties? ✔ - For the
1:M
relationships that are not identifying, we can choose between the foreign key and the cross-reference approaches. If we use the former, we obtain:
We could also have used a combination of both!
- Solution to Problem 4.10 (From ER diagram to Relational model – RECORD)
Is it true that … Yes No a label can have multiple logos? ✔ a recording can be released by multiple labels and at different dates? ✔ a record shop can have multiple exclusivities? ✔ two record shops can have the same address? ✔ two logos can have the same name? ✔ two recordings can have the same title? ✔ a record shop must sell at least one recording? ✔ - For the 1:M relationship IS_AN_EXCLUSIVITY_OF, we can choose between the foreign key and the cross-reference approaches. For the 1:1 relationship USES, we can use any approach we want (foreign key, merged relation, or cross-reference). We will choose to merge the two relations LABEL and LOGO and to have a look-up table for the IS_AN_EXCLUSIVITY_OF relation. This obtains:
- Solution to Problem 4.13 (Normal form of a CAR_SALE relation)
The CAR_SALE relation is in 1st normal form, since it has a primary key, and by assuming that all the attributes are atomic. This relation is not is 2nd Normal Form: since Date_sold → Discount_amount and Salesman_no → Commission, then some attributes (namely Discount_amount and Commission) are not fully functional dependent on the primary key. Hence, this relation cannot be in 3rd normal form either.
To normalize,
2NF:
Relations Functional Dependencies Car_Sale1(Car_no, Date_sold, Discount_amt) Car_no → {Date_Sold, Discount_amt} and Date_Sold → Discount_amt Car_Sale2(Car_no, Salesman_no) Car_no → Salesman_no Car_Sale3(Salesman_no, Commission) Salesman_no → Commission 3NF:
Relations Functional Dependencies Car_Sale1-1(Car_no, Date_sold) Car_no → Date_Sold Car_Sale1-2(Date_sold, Discount_amt) Date_Sold → Discount_amt Car_Sale2(Car_no, Salesman_no) Car_no → Salesman_no Car_Sale3(Salesman_no,Commission) Salesman_no → Commission - Solution to Problem 4.15 (Normal form of a SCHEDULE relation)
{Period_Start, Date} would be a suitable primary key.
This relation is already in second normal form: there are no non-prime attributes that are not fully dependent of the primary key. Stated differently, there are no non-prime A such that {Period_Start} → A or {Date} → A.
This relation is not in 3rd normal form. Consider the following relation: {Period_Start, Date} → {Period_Start, Period_End} → Length. {Period_Start, Period_End} is different from {Period_Start, Date} and from Length, and it is not included in a candidate key. The same goes for {Period_Start, Date} → Room → Building.
Once normalized to the third normal form, we get:
- Solution to Problem 4.17 (From business statement to dependencies, BIKE)
- The functional dependencies we obtain are:
- { Manufacturer, Serial_no } → { Model, Batch, Wheel_size, Retailer}
- Model → Manufacturer
- Batch → Model
- {Model, Manufacturer} → Wheel_size
- {Manufacturer, Serial_no }
- If every attribute is atomic, it is in second normal form. { Manufacturer, Serial_no } → Batch → Model breaks the 3NF.
- The functional dependencies we obtain are:
- Solution to Problem 4.18 (From business statement to dependencies, ROUTE)
The relation we consider is:
ROUTE(Name, Direction, Fare_zone, Ticket_price, Type_of_vehicle, Hours_of_operations)
- This problem asks to convert business statements into dependencies.
- Two different types of vehicles can not operate on routes with the same name. Name → Type_of_vehicle
- The ticket price depends of the fare zone and the type of vehicule. {Fare_zone, Type_of_vehicle} → Ticket_price,
- Two different types of vehicles can not operate on routes with the same name. Name → Type_of_vehicle
- Both the name and the direction are needed to determine the hours of operations. {Name, Direction} → Hours_of_operations
- Two routes with the same name and the same direction must have the same fare zone.
{Name, Direction} → Fare_zone - Based on those statements, {Name, Direction} is the only key for this relation.
- This problem asks to convert business statements into dependencies.
- Solution to Problem 4.19 (From business statement to dependencies, ISP)
- The relation we consider is:
ISP(ISP, bunle, bandwith, price, IP, ID, email, time)
The functional dependencies suggested by the business statement are:
{ISP, bundle} → {bandwidth, price} IP → ISP {ISP, ID} → {email, bundle} {ISP, ID, time} → IP We obtain the following four relations when we normalize it to the third normal form:
- BUNDLE(I͟S͟P͟, b͟u͟n͟d͟l͟e͟, bandwidth, price)
- IP(I͟S͟P͟, IP)
- CLIENT(I͟S͟P͟, i͟d͟, email, bundle)
- CLIENT_IP(I͟S͟P͟, i͟d͟, t͟i͟m͟e͟, IP)
- BUNDLE(I͟S͟P͟, b͟u͟n͟d͟l͟e͟, bandwidth, price)
- Solution to Problem 4.21 (Normal form of the BOOK relation)
- {Book Title, Author Name}
- If an attribute is composite or multi-valued, then the relation would not be in first normal form.
- It is not in second normal form because of { Book_title } → { Publisher, Book_type }. We can normalize it like so: (Book Title, Publisher, Book Type, List Price), (Author Name, Author Affiliation), (Author Name, Book Title).
- The relations are in third normal form because of {Book title} → { Book_type} → { List_price}| (Book Title, Publisher, Book Type) and (Book Type, List Price), (Author Name, Author Affiliation), (Author Name, Book Title).
- Solution to Problem 4.22 (Normal form of the DELIVERY relation)
- Reply Yes / No to the following: In this model…
- … can the shipment being handled by two drivers? No
- … can a package have two different recipient? No
- … can a package being in different shipments? No
- … can a recipient have two different phone numbers for the same name? Yes
- … can a driver have two different phone numbers for the same name? No
- PackageNumber would be a suitable primary key for this relation.
- A possible third normal form is (where the only functional dependencies are the one given by the primary keys): SHIPMENT(S͟h͟i͟p͟m͟e͟n͟t͟, DriverName)
DRIVER(D͟r͟i͟v͟e͟r͟N͟a͟m͟e͟, DriverPhone)
PACKAGE(P͟a͟c͟k͟a͟g͟e͟N͟u͟m͟b͟e͟r͟, Shipment, Weight, RecipientName, RecipientNumber)
- Reply Yes / No to the following: In this model…
- Solution to Problem 4.25 (PRINT relation in third normal form)
After normalizing PRINT to the second normal form (by adding the primary key {Author, Title, Size}, and working on dependencies like {Author, Title} → Technique, which does not fully depend of the primary key), we would obtain three relations that are already in third normal form:
- PRICING(A͟u͟t͟h͟o͟r͟, T͟i͟t͟l͟e͟, S͟i͟z͟e, Price)
- ART(A͟u͟t͟h͟o͟r͟, T͟i͟t͟l͟e͟,͟ Technique)
- SHIPPING_COSTS(S͟i͟z͟e͟, Price)
- PRICING(A͟u͟t͟h͟o͟r͟, T͟i͟t͟l͟e͟, S͟i͟z͟e, Price)
- Solution to Problem 4.26 (CONSULTATION relation: justification, primary key and normal form)
- The treatment for a particular disease can vary with the patient (for instance, his age can be a crucial parameter).
- {Doctor_no, Patient_no, Date} is a primary key for this relation.
- Because there are no partial dependencies, the given relation is in 2NF already. This, however, is not 3NF because Charge is a non-key attribute that is determined by another non-key attribute (Treatment). We must decompose the relation further:
CONSULTATION (Doctor_no, Patient_no, Date, Diagnosis, Treatment)
PRICE_LISTING (Treatment, Charge)- Solution to Problem 4.27 (COFFEE relation: primary key and normal form)
The original relation is:
COFFEE(Origin, Type_Of_Roast, Price, Roasted_Date, Best_Before, Color, Customer, Rating)
A suitable primary key would be PKCOFFEE = {Origin, Type_Of_Roast, Roasted_Date, Customer}. Note that it is the minimal and only primary key.
This relation is in first normal form because it has a primary key (the one we just defined), and because all the attributes are atomic. It is not in second normal form, because, for example, the functional dependency PKCOFFEE → Price is not fully functionally dependent, since {Origin, Type_Of_Roast} → Price holds.
Normalizing to the second normal form actually gives us relations in third normal form:
-CLIENT_RATING(Origin, Type_Of_Roast, Customer, Rating)
-PRICING(Origin, Type_Of_Roast, Price )
-EXPIRATION_DATE(Roasted_Date, Best_Before)
-COFFEE_BATCH(Origin, Type_Of_Roast, Roasted_Date, Color)
Where the functional dependencies always are in such a way that all the attributes but the last one fix the value of the last one, and are taken to be the primary key.
Checking that they are all in third normal form is straightforward. Note that the “original” relation was somewhat lost, since we do not have a relation whose primary key is PKCOFFEE anymore. We could have re-introduced a relation with only the attributes of PKCOFFEE to be on the “safe side”, but the benefit would not have been clear
- Solution to Problem 4.29 (From ER to relational schema and UML class diagram – CAR_INFO)
For Car, we need to create an attribute, like VIN. For Car Insurance, Policy Number is the perfect key attribute.
Note that, during the coversion, we had to make Insured Car part of the primary key of CAR INSURANCE.
- Solution to Problem 4.30 (From Business Statement to ER Diagram to Relational Model – A Network of Libraries)
- For the ER diagram, we could get something like:
Note that:- We want to represent the fact that a single document can have multiple copies, which suggests that DOCUMENT and COPY are two separate entities.
- COPY could be made into a weak entity, OF being the identifying relation.
- Nothing in the statement forces a relationship between the patron and the library to exist, so, by simplicity, we do not add it. However, adding it would not have been a mistake.
- The fact that a COPY has to be of a particular kind does not force the kind attribute to be multi-valued or composite: it just means that if we were representing the domains as well, this attribute would have a particular domain that restricts the values to three possibilities (book, video or disk).
- POSSESS is total on the COPY side because the statement reads “A copy of a document always ‘belongs’ to a particular library.”
- HOLD_BY could be N : M, since nothing in the statement says that a document can be put on hold by only one patron.
- Its mapping to a relational model could be:
Note that:- The relationships HOLD_BY and BORROWED_BY could be represented using the foreign key approach. However, their value would be NULL most of the time, so it would not be very efficient.
- COPY could be the only attribute in the primary key of BORROWING and HOLD because a copy can be borrowed or put on hold only once. Having both attributes being the primary key could allow for more flexibility (typically, a copy could be put on hold by multiple patrons at the same time).
- For the ER diagram, we could get something like:
- Solution to Problem 4.31 (Using MySQL Workbench’s reverse engineering)
We give the code first, then the drawing:
/* code/sql/HW_Person.sql */ DROP SCHEMA IF EXISTS HW_Person; CREATE SCHEMA HW_Person; USE HW_Person; CREATE TABLE PERSON ( ID VARCHAR(25) PRIMARY KEY, NAME VARCHAR(25), Street VARCHAR(25), City VARCHAR(25), Seat VARCHAR(25), Position VARCHAR(25) ); CREATE TABLE CAR ( Vin VARCHAR(25) PRIMARY KEY, Make VARCHAR(25), Model VARCHAR(25), Year DATE, Driver VARCHAR(25), FOREIGN KEY (Driver) REFERENCES PERSON (ID) ON UPDATE CASCADE ); ALTER TABLE PERSON ADD FOREIGN KEY (Seat) REFERENCES CAR (Vin); CREATE TABLE CAR_INSURANCE ( Policy_number VARCHAR(25) PRIMARY KEY, Company_name VARCHAR(25), Insured_car VARCHAR(25), FOREIGN KEY (Insured_car) REFERENCES CAR (Vin) ); CREATE TABLE PHONE ( ID VARCHAR(25), Number VARCHAR(25), FOREIGN KEY (ID) REFERENCES PERSON (ID), PRIMARY KEY (ID, number) );
mysql Workbench Diagram