Sunday, November 28, 2010

Course Notes -- Data Base Theory and Design (CS5332)

This week I'm revisiting some old course notes that I originally published on www.big-oh.net. I'm moving away from self-hosting content on that domain, migrating blog-style content into the cloud ... Blogger in this case.

Following are notes that I used to study for the Data Base Theory and Design section of the Texas State University Computer Science graduate comprehensive examination.


Elementary DBMS Concepts

Functional Dependencies, Attributes and Keys
  • A database management system (DBMS) promotes data integrity through elimination of data repetition, referential integrity and transactional operations.
  • An attribute B has a functional dependency on attribute A (A -> B) if for each value of attribute A there is exactly one value of attribute B.
  • A super key is a collection of attributes that uniquely identifies a row in a table -- i.e. the set of attributes in the table is functionally dependent on the super key.
  • A candidate key is a minimal super key.
  • A primary key is selected from all possible candidate keys.
  • A non-prime attribute is one that is not part of any candidate key.

Normalization of Tables

1NF, 2NF, 3NF, and BCNF
  • A table with a unique key is in first normal form (1NF). All tables with a primary key are, by definition, in 1NF.
  • A 1NF table is in second normal form (2NF) if all functional dependencies of non-prime attributes depend on a complete candidate key (or another non-prime attribute). If a non-prime attribute has a functional dependency on a proper subset of a candidate key, then the table is not 2NF. Note that if no candidate keys are composite, then the table is 2NF by definition.
  • For example, in an EMPLOYEE_SKILLS entity comprised of (EMPLOYEE_ID, EMPLOYEE_ADDRESS, SKILL), the EMPLOYEE_ADDRESS is functionally dependent on EMPLOYEE_ID, which is a proper subset of the only candidate key (EMPLOYEE_ID, SKILL). This is an intuitively undesirable situation because the EMPLOYEE_ADDRESS attribute will contain repetitive data that may become inconsistent due to update anomalies.
  • A 2NF table is in third normal form (3NF) if all non-prime attributes are non-transitively dependent on every candidate key. As with 2NF, 3NF protects against logical inconsistencies brought on by update anomalies.
  • BNCF is a slightly stronger version of 3NF in which for all non-trivial functional dependencies A -> B, A must be a super key. Most 3NF and BCNF tables are free of update, insertion and deletion anomalies.
  • Normalization protects against three types of data anomalies.
Data Anomalies
  • Update anomalies may arise when the same information is stored in multiple locations, thus potentially allowing an update to create inconsistent data.
  • Insertion anomalies may arise when two pieces of unrelated information (A and B) are stored together in the same entity, thus potentially leading to a situation in which A cannot be defined in the DB because B does not yet exist in the DB.
  • Deletion anomalies may arise when two pieces of unrelated information (A and B) are stored together in the same entity, thus potentially leading to a situation in which information about A will be (incorrectly) deleted because the last related B has been deleted.

Entity-Relationship Modeling

Diagram Basics
  • Entities are typically nouns that are depicted in a rectangle.
  • Relationships are typically verbs that are depicted in a diamond.
  • Both entities and relationships can have attributes, which are depicted in ellipses. The names of attributes that form an entity's primary key are underlined.
  • An arrow indicates an "at most one" relationship.
  • A double line indicates mandatory participation.
  • Wikipedia has an excellent example of an entity-relationship diagram.

No comments:

Post a Comment