Sunday, November 28, 2010

Course Notes -- Design of Programming Languages (CS5318)

This week I'm revisiting some old course notes that I originally published on www.big-oh.net. I'm moving away from self-hosting content on that domain, migrating blog-style content into the cloud ... Blogger in this case.

Following are notes that I used to study for the Design of Programming Languages section of the Texas State University Computer Science graduate comprehensive examination.


Language Design Criteria

The sub-headings below catalog some of the most important language design criteria.

Efficiency

The efficiency of a programming language relates to the speed/cost of performing basic activities of a language's lifecycle. Many facets of a programming language can be evaluated with respect to efficiency.

  • Efficiency of Execution - Influenced by language features like recursion and dynamic memory allocation.
  • Efficiency in Translation - How many passes does the compiler need?
  • Efficiency in Implementation - Influenced by the number of features provided and by level of orthogonality.
  • Efficiency in Programming - Time to train programmers and time to solve problems in the language.
  • Efficiency in Maintenance - Largely determined by readability.
Regularity

The regularity of a language relates to the interaction of its features. A language's regularity can be further decomposed into the following characteristics.

  • Generality - A language is general if there are no special cases in the availability or use of the language's operators. An example of failed generality in the C language is that individual items can be compared with the "==" operator, but entire structs cannot be compared using that operator.
  • Orthogonality - The constructs of a language are orthogonal if they do not behave differently in different contexts and can be easily combined in all meaningful ways. An example of failed orthogonality in the C language is that local variables can only be defined at the beginning of the block.
  • Uniformity - In a uniform language, things that are similar should look similar and things that are different should look different. An example of failed uniformity in the C language is that the difference between "=" and "==" is confusing.
Simplicity

Simple languages have the following properties.

  • Simple languages have fewer features, which makes them easier to implement.
  • Simple languages may have increased implementation efficiency, but decreased programming efficiency and decreased expressiveness.

Abstraction

Abstractions hide unnecessary details, i.e. those that are relevant only to the internal implementation. For example, the String data type is an abstraction that hides the details of storing and manipulating character data. Similarly, the "if" statement is an abstraction that eliminates the need to reason about logic-based jump commands.

The table below describes a taxonomy of abstraction levels in programming languages.

Abstraction Level | Data                                            | Control
Basic             | Variables as abstractions for memory locations. | Variable assignment and jump statements.
Structured        | Arrays, records, structs, classes, lists, etc.  | Guarded commands.
Unit              | Packages, components, libraries, etc.           |
Parallel          | Threads, processes, tasks, etc.                 |

Computational Paradigms

A single programming language may support one or more computational paradigms. The sub-headings below detail some of the computational paradigms covered by the comprehensive exam.

Imperative

The imperative computational paradigm is closely tied to the structure of the von Neumann computer model.

  • A.k.a. procedural.
  • Examples include Fortran, C, Pascal and Algol.
  • Characterized by sequenced instruction execution, loops, and variables that represent memory locations.
Functional

The functional computational paradigm structures programs in a manner that mirrors the structure of lambda-calculus.

  • A.k.a. applicative.
  • Examples include Lisp, Miranda and Scheme.
  • Reasoning about programs of this type is easier because the paradigm is based on recursive function theory.
  • Characterized by (recursive) function calls as the basis for program control. Instead of having variables that represent memory locations, this paradigm has function parameters that accept input values.
Logic

The logical computational paradigm is a higher-level paradigm that abstracts away the need to describe "how to do it," requiring that the programmer describe only "what is wanted."

  • A.k.a. declarative.
  • Examples include Prolog and SQL.
  • Characterized by variables that represent partial computations, which are inputs to Horn clause resolution.

Programming Language Levels

When writing a program in any language, errors can arise at many different levels:

  • Lexical - identifiers, operators, separators, comments
    // not a valid operator or identifier
    int x#1 = 2;
  • Context-Free Syntax - dictates the form of statements, functions, modules, etc.
    // missing parentheses violates 
    // if statement syntax
    if x == y { a=b; }
  • Context-Sensitive Syntax - type checking, forward references, variable declaration, etc.
    // duplicate variable declaration
    double a;
    ...
    int a; 
      
  • Dynamic Semantics - divide by zero, use of un-initialized variable, etc.
    double a = 0;
    double b = 12 / a;
      

Grammars

Definition

A grammar is defined by:

  • A finite set of non-terminal symbols.
  • A finite set of terminal symbols.
  • A single start symbol.
  • A finite set of production rules.
Context-Free Grammars

A context-free grammar consists of a series of grammar rules that observe the form:

Non-terminal -> {terminals and non-terminals}

This type of grammar is context-free because each left-hand-side non-terminal can always be replaced by any right-hand-side choice, no matter where the non-terminal might appear.

A context-free grammar also has a distinguished non-terminal called the start symbol. The language of the grammar is the set of all strings of terminals that can be derived from the start symbol.

By defining a programming language using a context-free grammar, the grammar also implicitly defines the steps that a parser must take to construct a parse tree for a string.
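
As a concrete illustration (my own example, not from the original notes), consider this small context-free grammar for arithmetic expressions and a derivation of the string "a + a * a" from its start symbol expr:

expr -> expr + term | term
term -> term * factor | factor
factor -> ( expr ) | a

expr => expr + term
     => term + term
     => factor + term
     => a + term
     => a + term * factor
     => a + factor * factor
     => a + a * factor
     => a + a * a

Each replacement step above corresponds to a node expansion in the parse tree that a parser would build for the string.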

Attribute Grammars

Attribute grammars are an extension of context-free grammars. Because context-free grammars describe only the lexical and syntax levels of a language, they are unable to describe many features of modern programming languages. For example, context-free grammars are unable to enforce type checking. Attribute grammars provide the ability to enforce static semantic rules, like type checking.

An attribute grammar has these properties:

  • Each grammar symbol has a set of (inherited or synthesized) attributes.
  • Each grammar rule has a set of (semantic or predicate) functions, which are defined over the attributes of the symbols in the rule; see the example below.
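
For example (an illustrative rule of my own, not from the course materials), a semantic function attached to an addition production can enforce type checking:

E -> E1 + E2   { E.type := if (E1.type == integer) and (E2.type == integer)
                           then integer
                           else type_error }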

Course Notes -- Algorithm Design and Analysis (CS5329)

This week I'm revisiting some old course notes that I originally published on www.big-oh.net. I'm moving away from self-hosting content on that domain, migrating blog-style content into the cloud ... Blogger in this case.

Following are notes that I used to study for the Algorithm Design and Analysis section of the Texas State University Computer Science graduate comprehensive examination.


Sorting Algorithms

Quick Reference

Here is a quick summary of the features offered by some of the most important sorting algorithms.

Algorithm      | Worst Case | Average Case | In Place? | Stable?
Quick Sort     | n^2        | n lg n       | Yes       | No
Merge Sort     | n lg n     | n lg n       | No        | Yes
Heap Sort      | n lg n     | n lg n       | Yes       | No
Insertion Sort | n^2        | n + d        | Yes       | Yes
Counting Sort  | n + k      | n + k        | No        | Yes

The table above uses several variables. The variable n represents the number of items to be sorted. The variable d represents the number of inversions in the input array. The variable k represents an upper bound on the values to be sorted.

Quick Sort

Quick sort is a classic divide-and-conquer algorithm in which the input array is split into left and right subsets such that all elements in the left subset are <= a pivot value and all elements in the right subset are >= a pivot value.

Quicksort(A,p,r)
  if (p<r)
    q = Partition(A,p,r);
    Quicksort(A,p,q-1);
    Quicksort(A,q+1,r);

Partition(A,p,r)
  x = A[r];
  i = p-1;
  for j = p to r-1
    if A[j] <= x
      i++;
      swap(A[i],A[j]);
  i++;
  swap(A[i], A[r]);
  return i;

To achieve the best-case run time for quick sort, the partition function must evenly divide the input array each time that it is called. T(n) = 2T(n/2) + O(n) --> T(n) is O(n lg n)

In the worst case, each call to the partition function creates subproblems of size n-1 and 0. T(n) = T(n-1) + T(0) + O(n) --> T(n) is O(n^2)

Merge Sort

Merge sort is also a divide-and-conquer algorithm. The algorithm relies on the fact that fewer steps are required to construct a sorted list from two sorted lists than two unsorted lists.

In the merge sort algorithm, unsorted lists are recursively divided in half until they can be divided no further. At that point, the ordered lists are merged together, forming ever-larger ordered lists.
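
Here is a minimal Java sketch of merge sort (my own illustration, not part of the original notes); the method names mergeSort and merge are assumptions of this example:

import java.util.Arrays;

public class MergeSortSketch
{
    // Recursively sort a[lo..hi] (inclusive bounds).
    static void mergeSort(int[] a, int lo, int hi)
    {
        if (lo >= hi)
            return;                      // a list of one element is already sorted
        int mid = (lo + hi) / 2;
        mergeSort(a, lo, mid);           // sort the left half
        mergeSort(a, mid + 1, hi);       // sort the right half
        merge(a, lo, mid, hi);           // merge the two sorted halves
    }

    // Merge the sorted runs a[lo..mid] and a[mid+1..hi].
    static void merge(int[] a, int lo, int mid, int hi)
    {
        int[] left = Arrays.copyOfRange(a, lo, mid + 1);
        int[] right = Arrays.copyOfRange(a, mid + 1, hi + 1);
        int i = 0, j = 0, k = lo;
        while (i < left.length && j < right.length)
            a[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];   // <= keeps the sort stable
        while (i < left.length)
            a[k++] = left[i++];
        while (j < right.length)
            a[k++] = right[j++];
    }
}

Note that the auxiliary arrays used by merge are the reason merge sort is not an in-place algorithm.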

Heap Sort

Heaps are complete binary trees (every level is full except possibly the lowest, which is filled from the left) in which every non-root node i observes the max-heap property: A[parent(i)] >= A[i]. Heap sort operates by building a heap from the input elements and then iteratively removing the top element from the heap.
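
For brevity, here is a Java sketch that leans on java.util.PriorityQueue instead of an in-place array heap (my own illustration; unlike the textbook version summarized in the table above, it uses O(n) extra space):

import java.util.Collections;
import java.util.PriorityQueue;

public class HeapSortSketch
{
    // Build a max-heap, then repeatedly remove the largest remaining element.
    static void heapSort(int[] a)
    {
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (int x : a)
            heap.add(x);                // build the heap
        for (int i = a.length - 1; i >= 0; i--)
            a[i] = heap.poll();         // the largest element fills the highest open slot
    }
}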

Insertion Sort

Insertion sort is an easy to implement comparison sort that builds a sorted array one element at a time. The first k iterations of the algorithm produce an ordered k-element sub-list. The algorithm requires n such iterations, each of which may need to perform k comparisons in the worst case.

Insertion sort has good performance on an almost-sorted input list. Despite poor worst-case performance of the algorithm, it can out-perform other more sophisticated comparison sorting algorithms when operating on very small input sets.
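
A minimal Java sketch of insertion sort (my own illustration, not from the original notes):

public class InsertionSortSketch
{
    // Insert a[i] into the already-sorted prefix a[0..i-1].
    static void insertionSort(int[] a)
    {
        for (int i = 1; i < a.length; i++)
        {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key)
            {
                a[j + 1] = a[j];         // shift larger elements one slot to the right
                j--;
            }
            a[j + 1] = key;
        }
    }
}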

Counting Sort

Counting sort is NOT a comparison sort. Counting sort takes an input of n integers whose values range from 1 to k. The general strategy for each input element x is to determine the number of inputs <= x and then put x directly into the correct output position.
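
A minimal Java sketch of counting sort for inputs in the range 1..k (my own illustration, not from the original notes):

public class CountingSortSketch
{
    // Sort n integers whose values fall in the range 1..k.
    static int[] countingSort(int[] a, int k)
    {
        int[] count = new int[k + 1];
        for (int x : a)
            count[x]++;                              // tally occurrences of each value
        for (int v = 1; v <= k; v++)
            count[v] += count[v - 1];                // count[v] = number of inputs <= v
        int[] out = new int[a.length];
        for (int i = a.length - 1; i >= 0; i--)      // iterate backwards to keep the sort stable
        {
            out[count[a[i]] - 1] = a[i];
            count[a[i]]--;
        }
        return out;
    }
}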

Solving Recurrences

Useful Mathematical Properties
  • Summation of 1..n = n(n+1)/2
  • x = log_a y <--> y = a^x (i.e. a raised to the power x equals y)
  • log_a a = 1
Asymptotic Notation
  • Big-Oh - An upper bound:
    f is Ο(g) if for some c and n0, 0 <= f(n) <= c * g(n) for all values of n >= n0
  • Big-Omega - A lower bound:
    f is Ω(g) if for some c and n0, 0 <= c * g(n) <= f(n) for all values of n >= n0
  • Big-Theta - An asymptotically tight bound:
    f is Θ(g) if for some c1, c2 and n0, 0 <= c1 * g(n) <= f(n) <= c2 * g(n) for all values of n >= n0
Recursion Trees
  • Each node in the tree is labeled with its complexity (excluding recursive calls).
  • The complexity of recursive calls is represented by child nodes in the tree.
  • The total complexity is determined by summing the work done at each level in the tree.
Master Method

Many (but not all) recurrences can be solved using the master method. Consider recurrences of the form:
T(n) = a * T(n/b) + f(n)
where a >= 1 and b > 1 and f(n) is an asymptotically positive function.

  • Case 1 - If for some constant e > 0,
    f(n) = Ο(n^(log_b a - e)),
    then T(n) = Θ(n^(log_b a))
  • Case 2 - If f(n) = Θ(n^(log_b a)),
    then T(n) = Θ(n^(log_b a) lg n)
  • Case 3 - If for some constant e > 0,
    a) f(n) = Ω(n^(log_b a + e))
    and
    b) a * f(n/b) <= c * f(n) for some c < 1 and all sufficiently large n,
    then T(n) = Θ(f(n))
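
As a quick worked example (mine, not from the original notes), consider the merge sort recurrence T(n) = 2T(n/2) + n. Here a = 2, b = 2 and n^(log_b a) = n^(log_2 2) = n. Since f(n) = n = Θ(n^(log_2 2)), Case 2 applies and T(n) = Θ(n lg n).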

Divide-and-Conquer Techniques

Divide-and-conquer is a recursive algorithm design technique in which subproblems are divided until they are small enough to solve directly. Solutions to subproblems are then combined to solve the problem at-large.

Divide-and-conquer algorithms have a related recurrence relation that can be used to determine the run time complexity of the algorithm.

The quick sort algorithm is an example of a divide-and-conquer algorithm.

Dynamic Programming

Elements of Dynamic Programming

Dynamic programming is applicable when subproblems share subsubproblems. It is typically utilized in solving optimization problems.

There are four steps to developing a dynamic programming algorithm:

  • Characterize the structure of an optimal solution.
  • Recursively define the value of an optimal solution.
  • Compute the value of an optimal solution in a bottom-up fashion.
  • Construct an optimal solution from computed information.
Assembly Line Scheduling - An Example

Two lines, each containing n stations. The problem is to determine which stations to choose from line 1 and which to choose from line 2 in order to minimize the total time through the factory.

  • We observe that the problem has optimal substructure: the optimal path to station S(i,j) contains an optimal path to either station S(1,j-1) or S(2,j-1).
  • Recursive definition: f1[j] = min( f1[j-1] + a(1,j), f2[j-1] + t(2,j-1) + a(1,j) )
  • Compute bottom-up with a table of the following form:
            | Station 1 | Station 2 | ... | Station n
      line1 | f/l       | f/l       | ... | f/l
      line2 | f/l       | f/l       | ... | f/l
    Where f is the cost to get to that station and l records the line number of the (j-1)th station used to achieve the optimal solution.
  • Optimal solution is constructed from table produced in prior step.
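
Here is a minimal Java sketch of the bottom-up computation (my own illustration; the arrays a, t, e and x stand in for the usual station, transfer, entry and exit costs, and the backpointer table that records line choices is omitted for brevity):

public class AssemblyLineSketch
{
    // a[i][j] = processing time at station j on line i (i = 0 or 1)
    // t[i][j] = time to transfer away from line i after station j
    // e[i]    = entry time onto line i, x[i] = exit time from line i
    static int fastestWay(int[][] a, int[][] t, int[] e, int[] x)
    {
        int n = a[0].length;
        int[] f1 = new int[n];
        int[] f2 = new int[n];
        f1[0] = e[0] + a[0][0];
        f2[0] = e[1] + a[1][0];
        for (int j = 1; j < n; j++)
        {
            // stay on the same line, or pay the transfer cost from the other line
            f1[j] = Math.min(f1[j - 1] + a[0][j], f2[j - 1] + t[1][j - 1] + a[0][j]);
            f2[j] = Math.min(f2[j - 1] + a[1][j], f1[j - 1] + t[0][j - 1] + a[1][j]);
        }
        return Math.min(f1[n - 1] + x[0], f2[n - 1] + x[1]);
    }
}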

Greedy Strategies

Elements of Greedy Strategies

Greedy strategies are a special case of dynamic programming. In addition to exhibiting optimal substructure, problems that are appropriately solved by a greedy strategy have the greedy choice property:

A globally optimal solution can be obtained by making a locally optimal (greedy) choice.

Greedy algorithms differ from traditional dynamic programming solutions in that they are a top-down strategy.

Activity Selection - An Example

Problem is to select a max size set of mutually compatible activities. The key to solving with a greedy strategy is to sort activities in order of finish time. The greedy solution always chooses the first compatible activity (i.e. the one with the earliest finish time). That solution is greedy in that it maximizes the remaining available time.

Note that the greedy strategy described above is appropriate for solving the activity selection problem because the problem exhibits:

  • Optimal substructure
  • Greedy choice property
Coin Changing - An Example

The coin changing problem highlights the fact that greedy strategies are not always optimal. In that problem, we are asked to produce the minimal number of coins representing a dollar amount.

By way of example, consider the scenario in which we are asked to compose $0.31 from these available denominations: $0.25, $0.10, $0.01. In that case, the Greedy solution would be:
25,1,1,1,1,1,1
However, the optimal solution is:
10,10,10,1
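
A minimal Java sketch of the greedy strategy (my own illustration, not from the original notes); called with an amount of 31 and denominations {25, 10, 1}, it returns the seven-coin answer shown above rather than the optimal four-coin answer:

import java.util.ArrayList;
import java.util.List;

public class GreedyChangeSketch
{
    // Denominations must be supplied in descending order.
    static List<Integer> greedyChange(int amount, int[] denominations)
    {
        List<Integer> coins = new ArrayList<>();
        for (int d : denominations)
        {
            while (amount >= d)          // always take the largest coin that still fits
            {
                coins.add(d);
                amount -= d;
            }
        }
        return coins;                    // e.g. greedyChange(31, new int[] {25, 10, 1}) -> [25, 1, 1, 1, 1, 1, 1]
    }
}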

Course Notes -- Data Base Theory and Design (CS5332)

This week I'm revisiting some old course notes that I originally published on www.big-oh.net. I'm moving away from self-hosting content on that domain, migrating blog-style content into the cloud ... Blogger in this case.

Following are notes that I used to study for the Data Base Theory and Design section of the Texas State University Computer Science graduate comprehensive examination.


Elementary DBMS Concepts

Functional Dependencies, Attributes and Keys
  • A database management system (DBMS) promotes data integrity through elimination of data repetition, referential integrity and transactional operations.
  • An attribute B has a functional dependency on attribute A (A -> B) if for each value of attribute A there is exactly one value of attribute B.
  • A super key is a collection of attributes that uniquely identifies a row in a table -- i.e. the set of attributes in the table is functionally dependent on the super key.
  • A candidate key is a minimal super key.
  • A primary key is selected from all possible candidate keys.
  • A non-prime attribute is one that is not part of any candidate key.

Normalization of Tables

1NF, 2NF, 3NF, and BCNF
  • A table whose attributes hold only atomic values and that has a unique key is in first normal form (1NF). All such tables with a primary key are, by definition, in 1NF.
  • A 1NF table is in second normal form (2NF) if all functional dependencies of non-prime attributes depend on a complete candidate key (or another non-prime attribute). If a non-prime attribute has a functional dependency on a proper subset of a candidate key, then the table is not 2NF. Note that if no candidate keys are composite, then the table is 2NF by definition.
  • For example, in an EMPLOYEE_SKILLS entity comprised of (EMPLOYEE_ID, EMPLOYEE_ADDRESS, SKILL), the EMPLOYEE_ADDRESS is functionally dependent on EMPLOYEE_ID, which is a proper subset of the only candidate key (EMPLOYEE_ID, SKILL). This is an intuitively undesirable situation because the EMPLOYEE_ADDRESS attribute will contain repetitive data that may become inconsistent due to update anomalies.
  • A 2NF table is in third normal form (3NF) if all non-prime attributes are non-transitively dependent on every candidate key. As with 2NF, 3NF protects against logical inconsistencies brought on by update anomalies.
  • BCNF is a slightly stronger version of 3NF in which, for all non-trivial functional dependencies A -> B, A must be a super key. Most 3NF and BCNF tables are free of update, insertion and deletion anomalies.
  • Normalization protects against three types of data anomalies.
Data Anomalies
  • Update anomalies may arise when the same information is stored in multiple locations, thus potentially allowing an update to create inconsistent data.
  • Insertion anomalies may arise when two pieces of unrelated information (A and B) are stored together in the same entity, thus potentially leading to a situation in which A cannot be defined in the DB because B does not yet exist in the DB.
  • Deletion anomalies may arise when two pieces of unrelated information (A and B) are stored together in the same entity, thus potentially leading to a situation in which information about A will be (incorrectly) deleted because the last related B has been deleted.

Entity-Relationship Modeling

Diagram Basics
  • Entities are typically nouns that are depicted in a rectangle.
  • Relationships are typically verbs that are depicted in a diamond.
  • Both entities and relationships can have attributes, which are depicted in ellipses. The names of attributes that form an entity's primary key are underlined.
  • An arrow indicates an "at most one" relationship.
  • A double line indicates mandatory participation.
  • Wikipedia has an excellent example of an entity-relationship diagram.

Course Notes -- Advanced Artificial Intelligence (CS5346)

This week I'm revisiting some old course notes that I originally published on www.big-oh.net. I'm moving away from self-hosting content on that domain, migrating blog-style content into the cloud ... Blogger in this case.

Following are notes that I used to study for the Advanced Artificial Intelligence section of the Texas State University Computer Science graduate comprehensive examination.


Conventional Search

Quick Reference

Conventional search strategies utilize only the facts provided in the problem definition. They are dumb searches in the sense that they are only capable of generating successors and distinguishing goal states from non-goal states.

Here is a quick summary of the features offered by some of the most important conventional search strategies.

Strategy | Summary | Complete? | Optimal? | Space
Breadth First Search | Uses a queue to manage successor nodes. | Yes | Yes | Ο(b^(d+1))
Depth First Search | Uses a stack to manage successor nodes. | No (when the search space is infinitely deep) | No | Ο(bm)
Depth Limited Search | A variation of depth first search that treats nodes at some cutoff maximum depth l as though they had no successors. | No | No | Ο(bl)
Iterative Deepening Depth Limited Search | Iteratively runs depth limited search with increasing cutoff values until a goal state is found. | Yes | Yes | Ο(bd)

The table above uses several variables. The variable b represents the branching factor of the search space. The variable d represents the depth of the shallowest solution in the search space. The variable m represents the maximum depth of the search tree. The variable l represents the depth cutoff used by depth limited search.

Informed Search

Best-First Strategies

Informed searches use problem-specific knowledge to find solutions (goal states) more efficiently. An important subclass of informed searches, best-first searches navigate the search space by choosing the next (best) node for expansion based on an evaluation function. The efficiency of a best-first search algorithm depends entirely on the accuracy of the evaluation function -- i.e. how often it can recommend the node that is next in leading us to a goal state. There are many best-first search algorithms, differing only in the evaluation functions that they use.

A priority queue is a useful data structure for implementing best-first search algorithms.
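
For example, here is a minimal Java sketch of a best-first frontier ordered by an evaluation function f(n) (my own illustration; the Node class and newFrontier method are assumptions of this example):

import java.util.Comparator;
import java.util.PriorityQueue;

public class BestFirstSketch
{
    // A search node carrying a precomputed evaluation value f(n).
    static class Node
    {
        final String state;
        final double f;

        Node(String state, double f)
        {
            this.state = state;
            this.f = f;
        }
    }

    // The frontier: poll() always returns the node with the smallest f(n).
    static PriorityQueue<Node> newFrontier()
    {
        return new PriorityQueue<Node>(Comparator.comparingDouble((Node n) -> n.f));
    }
}

For greedy best-first search the evaluation function is f(n) = h(n); for A* search it is f(n) = g(n) + h(n).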

Greedy Best-First Search

The greedy best-first search strategy uses the evaluation function f(n) = h(n), where h(n) is a heuristic that judges how close the current node is to a goal state. Solutions found by this evaluation strategy may not be optimal. Moreover, the strategy is not complete.

A* Search

The A* search strategy is a best-first search strategy that uses the evaluation function f(n) = g(n) + h(n), where g(n) is the cost to get to the current node and h(n) is a heuristic estimate of the cost from the current node to a goal state. If h(n) is an admissible heuristic, then A* is both complete and optimal. An admissible heuristic is one that never overestimates the cost to reach a goal state.

Problem relaxation and combination are both important methods for developing admissible heuristics for the A* search strategy.

Local Search

Overview

Local search strategies can be used when we seek only a goal state, not a path to the goal state. In general, local search algorithms operate by keeping track of a single current state and repeatedly moving to a neighboring state. Local search strategies typically use much less memory than the conventional and informed search methods described in the sections above.

Examples

Here is a list of some important local search strategies.

  • Hill climbing is the most basic local search technique. In hill climbing, we iteratively move to neighbor states with greater value until we hit a local maximum.
  • Random-restart hill climbing performs the hill climbing algorithm several times from random starting states.
  • In simulated annealing, the strategy is to pick a random successor move. If the move is an improvement, it is kept. Otherwise, if the move is not an improvement, it is only kept with some probability less than 100%.
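
Here is a minimal Java sketch of the simulated annealing acceptance rule described above (my own illustration; the state values and the temperature schedule are assumed to be supplied by the problem):

import java.util.Random;

public class SimulatedAnnealingSketch
{
    static final Random RANDOM = new Random();

    // Decide whether to accept a randomly chosen successor move.
    static boolean accept(double currentValue, double successorValue, double temperature)
    {
        double delta = successorValue - currentValue;
        if (delta > 0)
            return true;                                            // always keep an improving move
        return RANDOM.nextDouble() < Math.exp(delta / temperature); // keep a worsening move with probability e^(delta/T)
    }
}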

Minimax Adversarial Search

Overview

The minimax search strategy seeks to minimize the maximum possible loss a player will incur when playing a turn-based multiplayer game. The minimax search strategy assumes that the opposing player will always make optimal moves. The strategy also relies on having a utility function that can be used to score game states as well as a terminal-test function that can be used to identify terminal game states.

We typically describe the players of the game as A (who is trying to maximize the score) and B (who is trying to minimize the score). At each step in the minimax search tree, we assume that A and B make optimal choices, meaning that A selects the child with the largest score and B selects the child with the smallest score.

Example

Wikipedia has an excellent example of a 2-ply minimax search tree in which circles represent "A's turn" and squares represent "B's turn." Notice that circles (A) are filled-in with the largest child value and squares (B) are filled-in with the smallest child value.

Pseudo Code

The implementation of the minimax search strategy is very simple, requiring only two nearly identical recursive control functions:

function MaxValue(state) returns a utility value
 if Terminal-State(state) 
  return Utility(state); 
 v = -infinity;
 for s in Successors(state) 
  v = Max(v, MinValue(s));
 return v;
 
function MinValue(state) returns a utility value
 if Terminal-State(state)
  return Utility(state);
 v = +infinity;
 for s in Successors(state)
  v = Min(v, MaxValue(s));
 return v;
Alpha-Beta Pruning

Alpha-Beta pruning is a technique that allows us to calculate a correct minimax decision without considering every node in the game tree. Specifically, alpha-beta pruning trims away subtrees that cannot possibly influence the final value calculated by the minimax search strategy.

To apply the alpha-beta pruning concept to our minimax functions, we need only introduce two new parameters to the pseudo-code cataloged above:

  • a - the best (largest) value in the search path for Max
  • b - the best (smallest) value in the search path for Min
function MaxValue(state, a, b) returns a utility value
 if Terminal-State(state) 
  return Utility(state);
 v = -infinity;
 for s in Successors(state) 
  v = Max(v, MinValue(s, a, b));
  if v >= b
   return v;
  else
   a = Max(a, v);
 return v;
function MinValue(state, a, b) returns a utility value
 if Terminal-State(state)
  return Utility(state);
 v = +infinity;
 for s in Successors(state)
  v = Min(v, MaxValue(s, a, b));
  if v <= a
   return v;
  else
   b = Min(b, v);
 return v;
Imperfect Search

The minimax search procedure (with or without alpha-beta pruning) is an optimal search strategy. However, the large search space of most games prohibits execution of the minimax search procedure down to terminal nodes. In order to search for a good solution to an adversarial problem, an evaluation function can be used to turn non-terminal nodes into terminal nodes. To conduct imperfect searches, we must make two modifications to the minimax search routine:

  • The utility function is replaced by a heuristic evaluation function.
  • The terminal test is replaced by a cutoff test that decides when to call the evaluation function.

Knowledge Representation

Motivation

Logics (propositional or first-order predicate) are useful for building knowledge bases that can be used by logical agents to infer the answers to submitted problems. Such systems are generally declarative (as opposed to procedural) and are implemented by inference algorithms that are both sound (truth preserving) and complete.

Propositional Logic

The syntax of propositional logic is characterized by atomic sentences comprised of propositional symbols (e.g. P, Q, R), negations (-), conjunctions (^), disjunctions (or), implications (->) and biconditionals (<->). Truth tables are one way of defining the meaning of the above operators.

Propositional sentences can be classified according to validity and satisfiability:

  • A valid proposition is one that is true in every model.
  • A satisfiable proposition is one that is true in some model.
  • A non-satisfiable proposition is one that is true in no model.

Inference rules can be applied to propositional sentences to soundly derive new sentences.

  • Modus Ponens:
    * a->b
    * a
    entails b
  • And-Elimination:
    * a ^ b
    entails a
  • Unit Resolution:
    * a or b
    * -b
    entails a
  • Resolution:
    * a or b
    * -b or c
    entails a or c

Horn clauses are propositional logic clauses that are limited to being disjunctions of literals, of which at most one is positive. Every Horn clause can be rewritten as an implication. For example, the Horn clause (-a or -b or c) can be rewritten as the implication a ^ b -> c. Horn clauses support forward chaining and backward chaining.

Another standard format for propositional logic sentences is conjunctive normal form. All propositional logic sentences can be translated to conjunctive normal form, which is a conjunction of disjunctions: (A or -B or C) ^ (-D or E) ^ (F or G or H)

De Morgan's laws are useful when converting to CNF:

  • -(A ^ B) can be rewritten as (-A or -B)
  • -(A or B) can be rewritten as (-A ^ -B)
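
As a quick worked example (mine, not from the original notes), consider converting -(A ^ B) -> C to CNF. Rewriting the implication gives --(A ^ B) or C, eliminating the double negation gives (A ^ B) or C, and distributing "or" over "^" yields (A or C) ^ (B or C).
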
First-Order Predicate Logic

Propositional logic is not sufficient to succinctly represent complex environments. First-order predicate logic is more powerful because it deals with objects and their relations, not just propositional facts. The syntax of first-order predicate logic is comprised of:

  • Constants, which stand for objects -- e.g. John
  • Predicates, which stand for relations -- e.g. Brother(John, Tom)
  • Functions

First-order predicate logic also allows the use of the logical connectives we saw in propositional logic: ^, V, -, ->, <->. In addition, first-order predicate logic also allows the use of quantifiers, which allow us to make single statements that apply to collections of objects. The universal quantifier (∀) and the existential quantifier (∃) can be used to rewrite multiple propositional logic sentences as a single first-order predicate logic sentence. For example, we can capture the idea that "everyone loves someone" using this first-order predicate logic sentence: ∀x ∃y Loves(x,y)

Expert Systems

The techniques of forward chaining and backward chaining can both be used to construct expert systems.

Forward Chaining

Forward chaining is an example of data-driven reasoning -- i.e. reasoning in which the focus of attention begins with all known facts.

The forward chaining algorithm seeks to determine whether a query q is entailed by the (Horn clause) sentences in a knowledge base. Forward chaining begins by collecting all of the known facts (positive literals) in the knowledge base. Next, whenever all premises of a knowledge base implication are known, its conclusion is added to the collection of facts. That process is repeated until either the query q is derived as a fact or no further facts can be generated.
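
For example (an illustration of my own), given a knowledge base containing the facts A and B plus the Horn clauses A ^ B -> C and C -> Q, forward chaining first derives C (both premises of A ^ B -> C are known facts) and then derives Q, so the query Q is entailed.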

Backward Chaining

Backward chaining is an example of goal-directed reasoning -- i.e. reasoning in which the focus of attention begins with the query statement.

The backward chaining algorithm works backward from the query statement. The algorithm attempts to prove true a set of statements from which the query q is entailed. For each statement s needed to entail q, backward chaining is recursively applied to prove s.

Course Notes -- Survey of Software Engineering (CS5391)

This week I'm revisiting some old course notes that I originally published on www.big-oh.net. I'm moving away from self-hosting content on that domain, migrating blog-style content into the cloud ... Blogger in this case.

Following are notes that I used to study for the Survey of Software Engineering section of the Texas State University Computer Science graduate comprehensive examination.


Software Development Lifecycle Models

Waterfall Model
  • This model is document-driven and is also known as the "linear sequential" model.
  • Activities in this model are structured in a linear cascade of phases in which the output of the previous phase is the input to the next phase.
  • Phases are: Feasibility Study -> Requirements Analysis -> Technical Design -> Code and Unit Test -> Integration and System Test -> Deployment and Maintenance.
  • The linear structure of this model provides no feedback to prior phases. This is often impractical since, for example, it is not always possible to identify all requirements at the beginning of a project.
Incremental Model
  • The stages of this model consist of expanding increments of an operational software product.
  • Increments are delivered to customers as they are completed.
  • Each delivered unit is a self-contained functional unit.
  • This methodology allows for continuous validation (as opposed to a final validation phase).
  • This model retains all steps defined in the waterfall model, but performs those steps multiple times in an iterative fashion.
Spiral Model

This model iteratively utilizes the steps below to identify and mitigate high-risk issues in the development of high-quality software.

  • Step 1 - Identify quality objectives for a portion of the product.
  • Step 2 - Identify risks in achieving objectives and evaluate alternative mitigation strategies.
  • Step 3 - Develop and verify a unit of functionality.
  • Step 4 - Review development results and plan for next iteration.
Capability Maturity Model

The Capability Maturity Model (CMM) has the following characteristics:

  • The CMM is an application of process management and quality improvement concepts to software development and maintenance.
  • The CMM doesn't describe how to create an effective software organization. Rather it simply describes the features of such an organization.
  • The CMM defines an ordinal scale of five levels used to measure the capabilities of an organization's software process.

Each level of the CMM is composed of a set of key process areas for which related activities are defined:

  • Level 1 - Chaotic/Initial. Processes are unpredictable and poorly controlled. Success results from individual efforts.
  • Level 2 - Repeatable. Projects can repeat previously mastered tasks. Basic project management processes are established to track cost, schedule and functionality.
  • Level 3 - Defined. Processes are defined and understood by staff. The software process activities are standardized and documented.
  • Level 4 - Managed. Processes are measured and controlled (since you can only manage what you can measure).
  • Level 5 - Optimized. Organization is focused on process improvement.

Software Design Methodologies

Structured Analysis and Design
  • Structured analysis and design attempts to harness complexity through a hierarchical design created by iteratively decomposing program segments until the program is completed.
  • The algorithm is the fundamental building block of this design methodology.
  • Abstraction is used to focus on subproblems, ignoring the overall strategy of the algorithm.
Object-Oriented Analysis and Design
  • Classes and objects are the fundamental building blocks of this design methodology; they are extracted from the vocabulary of the problem domain.
  • Abstraction is applied to reuse the features of related classes that are already part of the object model.
  • Reuse is enabled by language features: inheritance and polymorphism.

Software Testing Strategies

Program Inspection
  • Goal: Ensure the program element follows the specification. Ensure the program element supports stated non-functional requirements.
  • Method: A review group (comprising the roles of Moderator, Author, Reviewer and Recorder) uses a checklist to review a program element against the above goals.
  • Phase: Implementation.
Black-Box Testing
  • Goal: Verify that a system or sub-system behaves as expected.
  • Method: Derive test cases from the public interface specification. May use equivalence partitioning and boundary value analysis strategies.
  • Phase: System and Integration Testing.
White-Box Testing
  • Goal: Verify that a software unit behaves as expected.
  • Method: Derive test cases from the internal structure of the program.
  • Phase: Implementation.
Load or Stress Testing
  • Goal: Gain confidence that software meets non-functional performance requirements.
  • Method: Find mean time to failure using an input set that is appropriately varied.
  • Phase: Integration Testing.
Regression Testing
  • Goal: Ensure that previously released features have not been broken by new software updates.
  • Method: Execute functional test cases to gain confidence that key software features have not been broken.
  • Phase: Maintenance.

Sunday, November 7, 2010

Logging HttpServletRequests

Motivation

Whereas observing ServletResponses is non-trivial, Sun (now Oracle) makes it much easier to observe HTTP requests via the ServletRequestListener interface. ServletRequestListeners are awesome; they make it super easy to record and log requests, thereby improving the understandability of your web applications. Recording and logging ServletRequests with a listener is preferable to doing that same work in the requested resource (Servlet, JSP, etc.) since the former strategy achieves an improved separation of concerns.

Implementation

Big-Oh Software's CommonWeb Framework provides RequestObserverListener, an abstract implementation of the ServletRequestListener interface that can be extended to observe ServletRequests in myriad meaningful ways. That abstract implementation of the listener interface collaborates with the ObservedHttpServletRequest class, which adapts an original ServletRequest into an object that is more easily observed, recorded and logged.

An Example: Servlet Request Logging

Using the RequestObserverListener to log incoming Servlet requests is shamefully easy. Only a few lines of Java and XML are needed:

package foo.bar;

import net.big_oh.common.web.listener.request.*;

public class MyRequestLoggerListener extends RequestObserverListener
{

 @Override
 protected void doOnServletRequestInitialized(ObservedServletRequest observedServletRequest)
 {
  System.out.println(observedServletRequest);
 }

}
... and ...
<!-- web.xml snippet -->

<listener>
  <listener-class>foo.bar.MyRequestLoggerListener</listener-class>
</listener>

By applying the above logging listener to a web application, you'll see that new ServletRequests are logged in this format:
Received a HTTP/1.1 request from localhost (127.0.0.1) for resource http://127.0.0.1:8080/CommonWeb/ (GET) -- jSessionId: B46435C89F9FABCE571AF701E0BC6820 -- userName: unknown