Sunday, November 28, 2010

Course Notes -- Advanced Artificial Intelligence (CS5346)

This week I'm revisiting some old course notes that I originally published on www.big-oh.net. I'm moving away from self-hosting content on that domain, migrating blog-style content into the cloud ... Blogger in this case.

Following are notes that I used to study for the Advanced Artificial Intelligence section of the Texas State University Computer Science graduate comprehensive examination.


Conventional Search

Quick Reference

Conventional search strategies utilize only the facts provided in the problem definition. They are dumb searches in the sense that they are only capable of generating successors and distinguishing goal states from non-goal states.

Here is a quick summary of the features offered by some of the most important conventional search strategies.

Strategy Summary Complete? Optimal? Space
Breadth First Search Uses a FIFO queue to manage successor nodes. Yes Yes (when step costs are uniform) O(b^(d+1))
Depth First Search Uses a stack to manage successor nodes. No (when the search space is infinitely deep) No O(bm)
Depth Limited Search A variation of depth first search that treats nodes at some cutoff depth l as though they had no successors. No No O(bl)
Iterative Deepening Depth Limited Search Iteratively runs depth limited search with increasing cutoff values until a goal state is found. Yes Yes (when step costs are uniform) O(bd)

The table above uses several variables. The variable b represents the branching factor of the search space. The variable d represents the depth of the shallowest solution in the search space. The variable m represents the maximum depth of the search tree. The variable l represents the depth cutoff used by depth limited search.
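To make the first row of the table concrete, here is a minimal breadth first search sketch in Java. The example graph, node names and path-reconstruction details are hypothetical; a real problem would supply its own successor function:

```java
import java.util.*;

public class BfsDemo
{
	// Breadth-first search over an explicitly given graph. Returns the
	// path from start to goal, or null if the goal is unreachable.
	static List<String> bfs(Map<String, List<String>> successors, String start, String goal)
	{
		Queue<String> frontier = new LinkedList<String>(); // FIFO queue of nodes awaiting expansion
		Map<String, String> parent = new HashMap<String, String>(); // also doubles as the visited set
		frontier.add(start);
		parent.put(start, null);
		while (!frontier.isEmpty())
		{
			String node = frontier.remove();
			if (node.equals(goal))
			{
				// Reconstruct the path by walking parent links back to start.
				LinkedList<String> path = new LinkedList<String>();
				for (String n = node; n != null; n = parent.get(n))
					path.addFirst(n);
				return path;
			}
			for (String s : successors.getOrDefault(node, Collections.<String>emptyList()))
			{
				if (!parent.containsKey(s)) // don't revisit states
				{
					parent.put(s, node);
					frontier.add(s);
				}
			}
		}
		return null;
	}

	public static void main(String[] args)
	{
		Map<String, List<String>> g = new HashMap<String, List<String>>();
		g.put("A", Arrays.asList("B", "C"));
		g.put("B", Arrays.asList("D"));
		g.put("C", Arrays.asList("D", "E"));
		System.out.println(bfs(g, "A", "E")); // a shallowest path from A to E
	}
}
```

Because the frontier is a FIFO queue, nodes are expanded in order of depth, which is what makes the search complete.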

Informed Search

Best-First Strategies

Informed searches use problem-specific knowledge to find solutions (goal states) more efficiently. An important subclass of informed searches, best-first searches navigate the search space by choosing the next (best) node for expansion based on an evaluation function. The efficiency of a best-first search algorithm depends entirely on the accuracy of the evaluation function -- i.e. how often it recommends a node that actually lies on a path to a goal state. There are many best-first search algorithms, differing only in the evaluation functions that they use.

A priority queue is a useful data structure for implementing best-first search algorithms.

Greedy Best-First Search

The greedy best-first search strategy uses the evaluation function f(n) = h(n), where h(n) is a heuristic that judges how close the current node is to a goal state. Solutions found by this evaluation strategy may not be optimal. Moreover, the strategy is not complete: it can get stuck in loops or dead ends that a better-informed search would avoid.

A* Search

The A* search strategy is a best-first search strategy that uses the evaluation function f(n) = g(n) + h(n), where g(n) is the cost to get to the current node and h(n) is a heuristic estimate of the cost from the current node to a goal state. If h(n) is an admissible heuristic, then A* is both complete and optimal. An admissible heuristic is one that never overestimates the cost to reach a goal state.

Problem relaxation and combination are both important methods for developing admissible heuristics for the A* search strategy.
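Here is a minimal Java sketch of the A* strategy using a priority queue ordered by f(n). The graph, edge costs and heuristic values are hypothetical, chosen only so that the heuristic is admissible:

```java
import java.util.*;

public class AStarDemo
{
	// One weighted edge in a hypothetical search graph.
	static class Edge { final String to; final int cost; Edge(String to, int cost) { this.to = to; this.cost = cost; } }

	// A frontier entry: a node name plus g(n), the cost paid to reach it.
	static class Node { final String name; final int g; Node(String name, int g) { this.name = name; this.g = g; } }

	// A* search: always expand the frontier node with the smallest
	// f(n) = g(n) + h(n). With an admissible h, the first expansion of the
	// goal yields the optimal path cost. Returns -1 if the goal is unreachable.
	static int aStar(Map<String, List<Edge>> graph, Map<String, Integer> h, String start, String goal)
	{
		PriorityQueue<Node> frontier = new PriorityQueue<Node>(Comparator.comparingInt((Node n) -> n.g + h.get(n.name)));
		Map<String, Integer> bestG = new HashMap<String, Integer>();
		frontier.add(new Node(start, 0));
		bestG.put(start, 0);
		while (!frontier.isEmpty())
		{
			Node n = frontier.poll();
			if (n.g > bestG.get(n.name))
				continue; // stale queue entry; a cheaper path to this node was already found
			if (n.name.equals(goal))
				return n.g;
			for (Edge e : graph.getOrDefault(n.name, Collections.<Edge>emptyList()))
			{
				int g2 = n.g + e.cost;
				if (g2 < bestG.getOrDefault(e.to, Integer.MAX_VALUE))
				{
					bestG.put(e.to, g2);
					frontier.add(new Node(e.to, g2));
				}
			}
		}
		return -1;
	}

	public static void main(String[] args)
	{
		Map<String, List<Edge>> g = new HashMap<String, List<Edge>>();
		g.put("S", Arrays.asList(new Edge("A", 1), new Edge("B", 4)));
		g.put("A", Arrays.asList(new Edge("G", 5)));
		g.put("B", Arrays.asList(new Edge("G", 1)));
		// Admissible heuristic: never overestimates the true cost to G.
		Map<String, Integer> h = new HashMap<String, Integer>();
		h.put("S", 4); h.put("A", 2); h.put("B", 1); h.put("G", 0);
		System.out.println(aStar(g, h, "S", "G")); // cheapest path S->B->G costs 5
	}
}
```

Note that A* first expands A (f = 1 + 2 = 3), finds the cost-6 path through it, and still returns the cheaper cost-5 path through B; an admissible heuristic guarantees this.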

Local Search

Overview

Local search strategies can be used when we seek only a goal state, but not a path to the goal state. In general, local search algorithms operate by starting from a single current state and then moving only to neighbors of that state. Local search strategies typically use much less memory than both the conventional and informed search methods described in the sections above.

Examples

Here is a list of some important local search strategies.

  • Hill climbing is the most basic local search technique. In hill climbing, we iteratively move to neighbor states with greater value until we hit a local maximum.
  • Random-restart hill climbing performs the hill climbing algorithm several times from random starting states.
  • In simulated annealing, the strategy is to pick a random successor move. If the move is an improvement, it is always kept. Otherwise, the move is kept only with some probability less than 100% -- a probability that shrinks as the move gets worse and as the schedule's "temperature" cools over time.
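Here is a small Java sketch of the simulated annealing acceptance rule. The objective function, neighbor scheme, cooling schedule and constants are all hypothetical choices, not the only reasonable ones:

```java
import java.util.*;

public class AnnealingDemo
{
	// A hypothetical objective to maximize: f(x) = -(x - 7)^2, which peaks at x = 7.
	// A state's neighbors are x - 1 and x + 1.
	static double value(int x) { return -(x - 7) * (x - 7); }

	static int anneal(int start, long seed)
	{
		Random rnd = new Random(seed);
		int current = start;
		// Geometric cooling schedule: as the temperature t shrinks,
		// worse moves are accepted less and less often.
		for (double t = 10.0; t > 0.01; t *= 0.99)
		{
			int next = current + (rnd.nextBoolean() ? 1 : -1); // pick a random successor
			double delta = value(next) - value(current);
			// Keep improvements always; keep a worse move with probability e^(delta/t).
			if (delta > 0 || rnd.nextDouble() < Math.exp(delta / t))
				current = next;
		}
		return current;
	}

	public static void main(String[] args)
	{
		// Typically finishes at or very near the maximum x = 7.
		System.out.println(anneal(0, 42L));
	}
}
```

Early on (large t) the walk accepts many bad moves and explores freely; by the end it behaves almost exactly like hill climbing.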

Minimax Adversarial Search

Overview

The minimax search strategy seeks to minimize the maximum possible loss a player will incur when playing a turn-based, two-player game. The minimax search strategy assumes that the opposing player will always make optimal moves. The strategy also relies on having a utility function that can be used to score game states as well as a terminal-test function that can be used to identify terminal game states.

We typically describe the players of the game as A (who is trying to maximize the score) and B (who is trying to minimize the score). At each step in the minimax search tree, we assume that A and B make optimal choices, meaning that A selects the child with the largest score and B selects the child with the smallest score.

Example

Wikipedia has an excellent example of a 2-ply minimax search tree in which circles represent "A's turn" and squares represent "B's turn." Notice that circles (A) are filled-in with the largest child value and squares (B) are filled-in with the smallest child value.

Pseudo Code

The implementation of the minimax search strategy is very simple, requiring only two nearly identical recursive control functions:

function MaxValue(state) returns a utility value
 if Terminal-State(state) 
  return Utility(state); 
 v = -infinity;
 for s in Successors(state) 
  v = Max(v, MinValue(s));
 return v;
 
function MinValue(state) returns a utility value
 if Terminal-State(state)
  return Utility(state);
 v = +infinity;
 for s in Successors(state)
  v = Min(v, MaxValue(s));
 return v;
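The pseudo-code above translates almost line-for-line into runnable Java. In this sketch the explicit State tree is hypothetical; a real game would generate successors and utilities on demand:

```java
import java.util.*;

public class MinimaxDemo
{
	// A game-tree node: either a terminal state with a utility, or an
	// internal state with successor states.
	static class State
	{
		final int utility;             // meaningful only for terminal states
		final List<State> successors;  // empty for terminal states
		State(int utility) { this.utility = utility; this.successors = Collections.emptyList(); }
		State(State... successors) { this.utility = 0; this.successors = Arrays.asList(successors); }
		boolean isTerminal() { return successors.isEmpty(); }
	}

	static int maxValue(State state)
	{
		if (state.isTerminal()) return state.utility;
		int v = Integer.MIN_VALUE;
		for (State s : state.successors) v = Math.max(v, minValue(s));
		return v;
	}

	static int minValue(State state)
	{
		if (state.isTerminal()) return state.utility;
		int v = Integer.MAX_VALUE;
		for (State s : state.successors) v = Math.min(v, maxValue(s));
		return v;
	}

	public static void main(String[] args)
	{
		// A 2-ply tree: A (max) moves, then B (min), then terminal utilities.
		State root = new State(
			new State(new State(3), new State(12)),   // B picks min -> 3
			new State(new State(2), new State(4)),    // B picks min -> 2
			new State(new State(14), new State(5)));  // B picks min -> 5
		System.out.println(maxValue(root)); // A picks max of {3, 2, 5} -> 5
	}
}
```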
Alpha-Beta Pruning

Alpha-Beta pruning is a technique that allows us to calculate a correct minimax decision without considering every node in the game tree. Specifically, alpha-beta pruning trims away subtrees that cannot possibly influence the final value calculated by the minimax search strategy.

To apply the alpha-beta pruning concept to our minimax functions, we need only introduce two new parameters to the pseudo-code cataloged above:

  • a - the best (largest) value in the search path for Max
  • b - the best (smallest) value in the search path for Min
function MaxValue(state, a, b) returns a utility value
 if Terminal-State(state) 
  return Utility(state);
 v = -infinity;
 for s in Successors(state) 
  v = Max(v, MinValue(s, a, b));
  if v >= b
   return v;
  else
   a = Max(a, v);
 return v;
function MinValue(state, a, b) returns a utility value
 if Terminal-State(state)
  return Utility(state);
 v = +infinity;
 for s in Successors(state)
  v = Min(v, MaxValue(s, a, b));
  if v <= a
   return v;
  else
   b = Min(b, v);
 return v;
Imperfect Search

The minimax search procedure (with or without alpha-beta pruning) is an optimal search strategy. However, the large search space of most games prohibits execution of the minimax search procedure down to terminal nodes. In order to search for a good solution to an adversarial problem, an evaluation function can be used to turn non-terminal nodes into terminal nodes. To conduct imperfect searches, we must make two modifications to the minimax search routine:

  • The utility function is replaced by a heuristic evaluation function.
  • The terminal test is replaced by a cutoff test that decides when to call the evaluation function.
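Sketching those two substitutions in Java, against an explicit game tree (the tree shape, the Eval scores stored in each state, and the depth-based cutoff test are all hypothetical illustrations):

```java
import java.util.*;

public class CutoffMinimaxDemo
{
	// A game state with a heuristic score and successor states; a state
	// with no successors is a genuine terminal state.
	static class State
	{
		final int eval;               // heuristic evaluation of this state
		final List<State> successors;
		State(int eval, State... successors) { this.eval = eval; this.successors = Arrays.asList(successors); }
	}

	// Modification 1 & 2: the terminal test becomes a cutoff test, and the
	// utility function becomes the heuristic Eval stored in each state.
	static boolean cutoffTest(State s, int depth, int limit) { return depth >= limit || s.successors.isEmpty(); }

	static int maxValue(State s, int depth, int limit)
	{
		if (cutoffTest(s, depth, limit)) return s.eval;
		int v = Integer.MIN_VALUE;
		for (State c : s.successors) v = Math.max(v, minValue(c, depth + 1, limit));
		return v;
	}

	static int minValue(State s, int depth, int limit)
	{
		if (cutoffTest(s, depth, limit)) return s.eval;
		int v = Integer.MAX_VALUE;
		for (State c : s.successors) v = Math.min(v, maxValue(c, depth + 1, limit));
		return v;
	}

	public static void main(String[] args)
	{
		// Searched to a cutoff depth of 1, the search never looks past the
		// two children, so their Eval scores decide the result.
		State root = new State(0,
			new State(8, new State(-50)),  // looks good at depth 1, bad deeper
			new State(3, new State(40)));
		System.out.println(maxValue(root, 0, 1)); // cutoff at depth 1 -> max(8, 3) = 8
	}
}
```

The example also hints at the cost of an imperfect search: a deeper search would have revealed that the seemingly best move leads to a terrible position.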

Knowledge Representation

Motivation

Logics (propositional or first-order predicate) are useful for building knowledge bases that can be used by logical agents to infer the answers to submitted problems. Such systems are generally declarative (as opposed to procedural) and are implemented by inference algorithms that are both sound (truth preserving) and complete.

Propositional Logic

The syntax of propositional logic is characterized by atomic sentences built from propositional symbols (e.g. P, Q, R), combined using negation (-), conjunction (^), disjunction (or), implication (->) and biconditional (<->). Truth tables are one way of defining the meaning of the above operators.

Propositional sentences can be classified according to validity and satisfiability:

  • A valid proposition is one that is true in every model.
  • A satisfiable proposition is one that is true in some model.
  • An unsatisfiable proposition is one that is true in no model.

Inference rules can be applied to propositional sentences to soundly derive new sentences.

  • Modus Ponens:
    * a->b
    * a
    entails b
  • And-Elimination:
    * a ^ b
    entails a
  • Unit Resolution:
    * a or b
    * -b
    entails a
  • Resolution:
    * a or b
    * -b or c
    entails a or c

Horn clauses are propositional logic clauses that are limited to being disjunctions of literals, of which at most one is positive. Every Horn clause can be rewritten as an implication. For example, the Horn clause (-a or -b or c) can be rewritten as the implication a ^ b -> c. Horn clauses support forward chaining and backward chaining.

Another standard format for propositional logic sentences is conjunctive normal form. All propositional logic sentences can be translated to conjunctive normal form, which is a conjunction of disjunctions: (A or -B or C) ^ (-D or E) ^ (F or G or H)

De Morgan's laws are useful when converting to CNF:

  • -(A ^ B) can be rewritten as (-A or -B)
  • -(A or B) can be rewritten as (-A ^ -B)
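Because the propositional operators have truth-table semantics, equivalences like De Morgan's laws can be checked exhaustively over all models. A quick Java sketch:

```java
public class DeMorganCheck
{
	// Verify De Morgan's laws over all four truth assignments to A and B,
	// i.e. show that both equivalences are valid propositions.
	public static void main(String[] args)
	{
		boolean law1 = true, law2 = true;
		for (boolean a : new boolean[] { false, true })
		{
			for (boolean b : new boolean[] { false, true })
			{
				law1 &= (!(a && b)) == (!a || !b); // -(A ^ B) <-> (-A or -B)
				law2 &= (!(a || b)) == (!a && !b); // -(A or B) <-> (-A ^ -B)
			}
		}
		System.out.println(law1 && law2); // true: both laws hold in every model
	}
}
```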
First-Order Predicate Logic

Propositional logic is not sufficient to succinctly represent complex environments. First-order predicate logic is more powerful because it deals with objects and their relations, not just propositional facts. The syntax of first-order predicate logic is comprised of:

  • Constants, which stand for objects -- e.g. John
  • Predicates, which stand for relations -- e.g. Brother(John, Tom)
  • Functions

First-order predicate logic also allows the use of the logical connectives we saw in propositional logic: ^, V, -, ->, <->. In addition, first-order predicate logic also allows the use of quantifiers, which allow us to make single statements that apply to collections of objects. The universal quantifier (∀) and the existential quantifier (∃) can be used to rewrite multiple propositional logic sentences as a single first-order predicate logic sentence. For example, we can capture the idea that "everyone loves someone" using this first-order predicate logic sentence: ∀x ∃y Loves(x,y)

Expert Systems

The techniques of forward chaining and backward chaining can both be used to construct expert systems.

Forward Chaining

Forward chaining is an example of data-driven reasoning -- i.e. reasoning in which the focus of attention begins with all known facts.

The forward chaining algorithm seeks to determine whether a query q is entailed by the (Horn clause) sentences in a knowledge base. Forward chaining begins by collecting all of the known facts (positive literals) in the knowledge base. Next, whenever all premises of an implication in the knowledge base are known facts, its conclusion is added to the collection of facts. That process is repeated until either the query q is derived as a fact or until no further facts can be generated.
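Here is a small Java sketch of forward chaining over Horn clauses. The Rule class and the example knowledge base are hypothetical:

```java
import java.util.*;

public class ForwardChainingDemo
{
	// One Horn clause written as an implication: premises -> conclusion.
	static class Rule
	{
		final List<String> premises;
		final String conclusion;
		Rule(String conclusion, String... premises) { this.conclusion = conclusion; this.premises = Arrays.asList(premises); }
	}

	// Forward chaining: repeatedly fire any rule whose premises are all
	// known facts, adding its conclusion, until the query appears or no
	// new facts can be derived.
	static boolean entails(List<Rule> kb, Set<String> facts, String query)
	{
		Set<String> known = new HashSet<String>(facts);
		boolean changed = true;
		while (changed)
		{
			if (known.contains(query))
				return true;
			changed = false;
			for (Rule r : kb)
			{
				if (!known.contains(r.conclusion) && known.containsAll(r.premises))
				{
					known.add(r.conclusion); // all premises known, so add the conclusion
					changed = true;
				}
			}
		}
		return known.contains(query);
	}

	public static void main(String[] args)
	{
		// Knowledge base: a ^ b -> c, c -> d, with known facts {a, b}
		List<Rule> kb = Arrays.asList(new Rule("c", "a", "b"), new Rule("d", "c"));
		Set<String> facts = new HashSet<String>(Arrays.asList("a", "b"));
		System.out.println(entails(kb, facts, "d")); // true: a, b => c => d
		System.out.println(entails(kb, facts, "e")); // false: e is underivable
	}
}
```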

Backward Chaining

Backward chaining is an example of goal-directed reasoning -- i.e. reasoning in which the focus of attention begins with the query statement.

The backward chaining algorithm works backward from the query statement. The algorithm attempts to prove true a set of statements from which the query q is entailed. For each statement s needed to entail q, backward chaining is recursively applied to prove s.
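Here is a small Java sketch of backward chaining. The rule representation (a map from each conclusion to its alternative premise sets) and the example knowledge base are hypothetical:

```java
import java.util.*;

public class BackwardChainingDemo
{
	// Backward chaining over Horn clauses: to prove query q, either find q
	// among the known facts, or find a rule concluding q and recursively
	// prove every one of its premises. The inProgress set guards against
	// looping forever on cyclic rules.
	static boolean prove(String q, Map<String, List<List<String>>> rules, Set<String> facts, Set<String> inProgress)
	{
		if (facts.contains(q))
			return true;
		if (!inProgress.add(q))
			return false; // already trying to prove q further up the call stack
		for (List<String> premises : rules.getOrDefault(q, Collections.<List<String>>emptyList()))
		{
			boolean allProven = true;
			for (String p : premises)
			{
				if (!prove(p, rules, facts, inProgress))
				{
					allProven = false;
					break;
				}
			}
			if (allProven)
			{
				inProgress.remove(q);
				return true;
			}
		}
		inProgress.remove(q);
		return false;
	}

	public static void main(String[] args)
	{
		// Knowledge base: a ^ b -> c and c -> d, with known facts {a, b}
		Map<String, List<List<String>>> rules = new HashMap<String, List<List<String>>>();
		rules.put("c", Arrays.asList(Arrays.asList("a", "b")));
		rules.put("d", Arrays.asList(Arrays.asList("c")));
		Set<String> facts = new HashSet<String>(Arrays.asList("a", "b"));
		System.out.println(prove("d", rules, facts, new HashSet<String>())); // true
		System.out.println(prove("e", rules, facts, new HashSet<String>())); // false
	}
}
```

Notice that, unlike forward chaining, the search touches only the statements relevant to the query.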

Course Notes -- Survey of Software Engineering (CS5391)


Following are notes that I used to study for the Survey of Software Engineering section of the Texas State University Computer Science graduate comprehensive examination.


Software Development Lifecycle Models

Waterfall Model
  • This model is document-driven and is also known as the "linear sequential" model.
  • Activities in this model are structured in a linear cascade of phases in which the output of the previous phase is the input to the next phase.
  • Phases are: Feasibility Study -> Requirements Analysis -> Technical Design -> Code and Unit Test -> Integration and System Test -> Deployment and Maintenance.
  • The linear structure of this model provides no feedback to prior phases. This is often impractical since, for example, it is not always possible to identify all requirements at the beginning of a project.
Incremental Model
  • The stages of this model consist of expanding increments of an operational software product.
  • Increments are delivered to customers as they are completed.
  • Each delivered unit is a self-contained functional unit.
  • This methodology allows for continuous validation (as opposed to a final validation phase).
  • This model retains all steps defined in the waterfall model, but performs those steps multiple times in an iterative fashion.
Spiral Model

This model iteratively utilizes the steps below to identify and mitigate high-risk issues in the development of high-quality software.

  • Step 1 - Identify quality objectives for a portion of the product.
  • Step 2 - Identify risks in achieving objectives and evaluate alternative mitigation strategies.
  • Step 3 - Develop and verify a unit of functionality.
  • Step 4 - Review development results and plan for next iteration.
Capability Maturity Model

The Capability Maturity Model (CMM) has the following characteristics:

  • The CMM is an application of process management and quality improvement concepts to software development and maintenance.
  • The CMM doesn't describe how to create an effective software organization. Rather it simply describes the features of such an organization.
  • The CMM defines an ordinal scale of five levels used to measure the capabilities of an organization's software process.

Each level of the CMM is composed of a set of key process areas for which related activities are defined:

  • Level 1 - Chaotic/Initial. Processes are unpredictable and poorly controlled. Success results from individual efforts.
  • Level 2 - Repeatable. Projects can repeat previously mastered tasks. Basic project management processes are established to track cost, schedule and functionality.
  • Level 3 - Defined. Processes are defined and understood by staff. The software process activities are standardized and documented.
  • Level 4 - Managed. Processes are measured and controlled (since you can only manage what you can measure).
  • Level 5 - Optimized. Organization is focused on process improvement.

Software Design Methodologies

Structured Analysis and Design
  • Structured analysis and design attempts to harness complexity through a hierarchical design created by iteratively decomposing program segments until the program is completed.
  • The algorithm is the fundamental building block of this design methodology.
  • Abstraction is used to focus on subproblems, ignoring the overall strategy of the algorithm.
Object-Oriented Analysis and Design
  • Classes and objects are the fundamental building blocks of this design methodology; they are extracted from the vocabulary of the problem domain.
  • Abstraction is applied to reuse the features of related classes that are already part of the object model.
  • Reuse is enabled by language features: inheritance and polymorphism.

Software Testing Strategies

Program Inspection
  • Goal: Ensure the program element follows the specification. Ensure the program element supports stated non-functional requirements.
  • Method: A review group (comprising the roles of Moderator, Author, Reviewer and Recorder) uses a checklist to review a program element to ensure the above goals.
  • Phase: Implementation.
Black-Box Testing
  • Goal: Verify that a system or sub-system behaves as expected.
  • Method: Derive test cases from the public interface specification. May use equivalence partitioning and boundary value analysis strategies.
  • Phase: System and Integration Testing.
White-Box Testing
  • Goal: Verify that a software unit behaves as expected.
  • Method: Derive test cases from the internal structure of the program.
  • Phase: Implementation.
Load or Stress Testing
  • Goal: Gain confidence that software meets non-functional performance requirements.
  • Method: Find mean time to failure using an input set that is appropriately varied.
  • Phase: Integration Testing.
Regression Testing
  • Goal: Ensure that previously released features have not been broken by new software updates.
  • Method: Execute functional test cases to gain confidence that key software features have not been broken.
  • Phase: Maintenance.

Sunday, November 7, 2010

Logging HttpServletRequests

Motivation

Whereas observing ServletResponses is non-trivial, Sun (now Oracle) makes it much easier to observe HTTP requests via the ServletRequestListener interface. ServletRequestListeners are awesome; they make it super easy to record and log requests, thereby improving the understandability of your web applications. Recording and logging ServletRequests with a listener is preferable to doing that same work in the requested resource (Servlet, JSP, etc.) since the former strategy achieves an improved separation of concerns.

Implementation

Big-Oh Software's CommonWeb Framework provides RequestObserverListener, an abstract implementation of the ServletRequestListener interface that can be extended to observe ServletRequests in myriad meaningful ways. That abstract implementation of the listener interface collaborates with the ObservedHttpServletRequest class, which adapts an original ServletRequest into an object that is more easily observed, recorded and logged.

An Example: Servlet Request Logging

Using the RequestObserverListener to log incoming Servlet requests is shamefully easy. Only a few lines of Java and XML are needed:

package foo.bar;

import net.big_oh.common.web.listener.request.*;

public class MyRequestLoggerListener extends RequestObserverListener
{

 @Override
 protected void doOnServletRequestInitialized(ObservedServletRequest observedServletRequest)
 {
  System.out.println(observedServletRequest);
 }

}
... and ...
<!-- web.xml snippet -->

<listener>
  <listener-class>foo.bar.MyRequestLoggerListener</listener-class>
</listener>

By applying the above logging listener to a web application, you'll see that new ServletRequests are logged in this format:
Received a HTTP/1.1 request from localhost (127.0.0.1) for resource http://127.0.0.1:8080/CommonWeb/ (GET) -- jSessionId: B46435C89F9FABCE571AF701E0BC6820 -- userName: unknown