This week I'm revisiting some old course notes that I originally published on www.big-oh.net. I'm moving away from self-hosting content on that domain, migrating blog-style content into the cloud ... Blogger in this case.
Following are notes that I used to study for the Data Base Theory and Design section of the Texas State University Computer Science graduate comprehensive examination.
Elementary DBMS Concepts
Functional Dependencies, Attributes and Keys
- A database management system (DBMS) promotes data integrity through elimination of data repetition, referential integrity and transactional operations.
- An attribute B has a functional dependency on attribute A (A -> B) if for each value of attribute A there is exactly one value of attribute B.
- A super key is a collection of attributes that uniquely identifies a row in a table -- i.e. the set of attributes in the table is functionally dependent on the super key.
- A candidate key is a minimal super key.
- A primary key is selected from all possible candidate keys.
- A non-prime attribute is one that is not part of any candidate key.
Normalization of Tables
1NF, 2NF, 3NF, and BCNF
- A table with a unique key is in first normal form (1NF). All tables with a primary key are, by definition, in 1NF.
- A 1NF table is in second normal form (2NF) if all functional dependencies of non-prime attributes depend on a complete candidate key (or another non-prime attribute). If a non-prime attribute has a functional dependency on a proper subset of a candidate key, then the table is not 2NF. Note that if no candidate keys are composite, then the table is 2NF by definition.
- For example, in an EMPLOYEE_SKILLS entity comprised of (EMPLOYEE_ID, EMPLOYEE_ADDRESS, SKILL), the EMPLOYEE_ADDRESS is functionally dependent on EMPLOYEE_ID, which is a proper subset of the only candidate key (EMPLOYEE_ID, SKILL). This is an intuitively undesirable situation because the EMPLOYEE_ADDRESS attribute will contain repetitive data that may become inconsistent due to update anomalies.
- A 2NF table is in third normal form (3NF) if all non-prime attributes are non-transitively dependent on every candidate key. As with 2NF, 3NF protects against logical inconsistencies brought on by update anomalies.
- BNCF is a slightly stronger version of 3NF in which for all non-trivial functional dependencies A -> B, A must be a super key. Most 3NF and BCNF tables are free of update, insertion and deletion anomalies.
- Normalization protects against three types of data anomalies.
Data Anomalies
- Update anomalies may arise when the same information is stored in multiple locations, thus potentially allowing an update to create inconsistent data.
- Insertion anomalies may arise when two pieces of unrelated information (A and B) are stored together in the same entity, thus potentially leading to a situation in which A cannot be defined in the DB because B does not yet exist in the DB.
- Deletion anomalies may arise when two pieces of unrelated information (A and B) are stored together in the same entity, thus potentially leading to a situation in which information about A will be (incorrectly) deleted because the last related B has been deleted.
Entity-Relationship Modeling
Diagram Basics
- Entities are typically nouns that are depicted in a rectangle.
- Relationships are typically verbs that are depicted in a diamond.
- Both entities and relationships can have attributes, which are depicted in ellipses. The names of attributes that form an entity's primary key are underlined.
- An arrow indicates an "at most one" relationship.
- A double line indicates mandatory participation.
- Wikipedia has an excellent example of an entity-relationship diagram.
No comments:
Post a Comment