Sunday, November 28, 2010

Course Notes -- Algorithm Design and Analysis (CS5329)

This week I'm revisiting some old course notes that I originally published on www.big-oh.net. I'm moving away from self-hosting content on that domain, migrating blog-style content into the cloud ... Blogger in this case.

Following are notes that I used to study for the Algorithm Design and Analysis section of the Texas State University Computer Science graduate comprehensive examination.


Sorting Algorithms

Quick Reference

Here is a quick summary of the features offered by some of the most important sorting algorithms.

Algorithm        Worst Case   Average Case   In Place?   Stable?
Quick Sort       n^2          n lg n         Yes         No
Merge Sort       n lg n       n lg n         No          Yes
Heap Sort        n lg n       n lg n         Yes         No
Insertion Sort   n^2          n + d          Yes         Yes
Counting Sort    n + k        n + k          No          Yes

The table above uses several variables. The variable n represents the number of items to be sorted. The variable d represents the number of inversions present in the input array. The variable k represents an upper bound on the values to be sorted.

Quick Sort

Quick sort is a classic divide-and-conquer algorithm in which the input array is split into left and right subsets such that every element in the left subset is <= the pivot value and every element in the right subset is >= the pivot value.

Quicksort(A,p,r)
  if (p<r)
    q = Partition(A,p,r);     // pivot lands at index q
    Quicksort(A,p,q-1);       // sort the elements <= pivot
    Quicksort(A,q+1,r);       // sort the elements >= pivot

Partition(A,p,r)
  x = A[r];                   // use the last element as the pivot
  i = p-1;                    // i marks the end of the "<= pivot" region
  for j = p to r-1
    if A[j] <= x
      i++;
      swap(A[i],A[j]);        // grow the "<= pivot" region to include A[j]
  i++;
  swap(A[i], A[r]);           // place the pivot between the two regions
  return i;                   // return the pivot's final index

To achieve the best-case run time for quick sort, the partition function must evenly divide the input array each time that it is called: T(n) = 2T(n/2) + O(n) --> T(n) is O(n lg n).

In the worst case, each call to the partition function creates subproblems of size n-1 and 1: T(n) = T(n-1) + T(1) + O(n) --> T(n) is O(n^2).
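For concreteness, here is a minimal Python translation of the pseudocode above (the function names are mine, not part of the original notes):

def quicksort(A, p, r):
    # sorts A[p..r] in place
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)

def partition(A, p, r):
    x = A[r]                         # pivot
    i = p - 1                        # end of the "<= pivot" region
    for j in range(p, r):            # j runs from p to r-1
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    i += 1
    A[i], A[r] = A[r], A[i]          # put the pivot in its final position
    return i

A call such as quicksort(data, 0, len(data) - 1) sorts the whole list.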

Merge Sort

Merge sort is also a divide-and-conquer algorithm. The algorithm relies on the fact that fewer steps are required to construct a sorted list from two sorted lists than from two unsorted lists.

In the merge sort algorithm, unsorted lists are recursively divided in half until they can be divided no further. At that point, the single-element lists (which are trivially ordered) are merged together, forming ever-larger ordered lists.
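A minimal Python sketch of this idea (the helper names merge_sort and merge are mine, assuming the input is a list of comparable items):

def merge_sort(items):
    if len(items) <= 1:              # a list of zero or one items is already sorted
        return items
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))

def merge(left, right):
    # building one sorted list from two sorted lists takes only linear time
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]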

Heap Sort

Heaps are complete binary trees (every level is full, with the possible exception of the lowest level, which is filled from the left) in which all non-root nodes observe the heap property: A[parent(i)] >= A[i]. Heap sort operates by building a max-heap from the input elements and then iteratively removing the top (maximum) element from the heap.
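A minimal in-place Python sketch (sift_down is a helper name I've introduced; it is not from the original notes):

def heap_sort(a):
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):    # build a max-heap bottom-up
        sift_down(a, i, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]        # move the current maximum to the end
        sift_down(a, 0, end)               # restore the heap property on the remaining prefix
    return a

def sift_down(a, i, size):
    # push a[i] down until parent >= both children (the max-heap property)
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest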

Insertion Sort

Insertion sort is an easy-to-implement comparison sort that builds a sorted array one element at a time. The first k iterations of the algorithm produce an ordered k-element sub-list. The algorithm requires n such iterations, each of which may need to perform up to k comparisons in the worst case.

Insertion sort performs well on an almost-sorted input list. Despite its poor worst-case performance, insertion sort can outperform more sophisticated comparison sorting algorithms when operating on very small input sets.
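A minimal Python sketch (the function name is mine):

def insertion_sort(a):
    # after the k-th pass, a[0..k] holds the first k+1 elements in sorted order
    for k in range(1, len(a)):
        key, i = a[k], k - 1
        while i >= 0 and a[i] > key:       # shift larger elements one slot to the right
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key                     # insert the element into its place
    return a

Note how few shifts the inner loop performs when the input is already nearly sorted.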

Counting Sort

Counting sort is NOT a comparison sort. Counting sort takes an input of n integers whose values range from 1 to k. The general strategy for each input element x is to determine the number of inputs <= x and then put x directly into the correct output position.
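A minimal Python sketch, assuming the inputs are non-negative integers no larger than k (names are mine):

def counting_sort(a, k):
    count = [0] * (k + 1)
    for x in a:                            # count occurrences of each value
        count[x] += 1
    for v in range(1, k + 1):              # count[v] is now the number of inputs <= v
        count[v] += count[v - 1]
    out = [0] * len(a)
    for x in reversed(a):                  # walking backwards keeps the sort stable
        count[x] -= 1
        out[count[x]] = x                  # place x directly into its output position
    return out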

Solving Recurrences

Useful Mathematical Properties
  • 1 + 2 + ... + n = n(n+1)/2
  • x = log_a y  <-->  y = a^x  (a raised to the power x equals y)
  • log_a a = 1
Asymptotic Notation
  • Big-Oh - An upper bound:
    f is Ο(g) if for some positive constants c and n0, 0 <= f(n) <= c * g(n) for all values of n >= n0
  • Big-Omega - A lower bound:
    f is Ω(g) if for some positive constants c and n0, 0 <= c * g(n) <= f(n) for all values of n >= n0
  • Big-Theta - An asymptotically tight bound:
    f is Θ(g) if for some positive constants c1, c2 and n0, 0 <= c1 * g(n) <= f(n) <= c2 * g(n) for all values of n >= n0
Recursion Trees
  • Each node in the tree is labeled with its complexity (excluding recursive calls).
  • The complexity of recursive calls is represented by child nodes in the tree.
  • The total complexity is determined by summing the work done at each level in the tree.
Master Method

Many (but not all) recurrences can be solved using the master method. Consider recurrences of the form:
T(n) = a * T(n/b) + f(n)
where a >= 1 and b > 1 and f(n) is an asymptotically positive function.

  • Case 1 - If f(n) = Ο(n^(log_b a - e)) for some constant e > 0,
    then T(n) = Θ(n^(log_b a))
  • Case 2 - If f(n) = Θ(n^(log_b a)),
    then T(n) = Θ(n^(log_b a) * lg n)
  • Case 3 - If, for some constant e > 0,
    a) f(n) = Ω(n^(log_b a + e))
    and
    b) a * f(n/b) <= c * f(n) for some constant c < 1 and all sufficiently large n,
    then T(n) = Θ(f(n))
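As a quick example, merge sort's recurrence T(n) = 2T(n/2) + n has a = 2 and b = 2, so n^(log_b a) = n; f(n) = n matches Case 2, which predicts T(n) = Θ(n lg n). The little Python check below (my own sketch, assuming T(1) = 1) evaluates the recurrence directly and shows T(n) / (n lg n) settling near a constant:

from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def T(n):
    # T(n) = 2T(n/2) + n, with T(1) = 1 assumed
    return 1 if n <= 1 else 2 * T(n // 2) + n

for n in [2 ** 10, 2 ** 15, 2 ** 20]:
    print(n, T(n) / (n * log2(n)))         # the ratio approaches a constant, consistent with Θ(n lg n)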

Divide-and-Conquer Techniques

Divide-and-conquer is a recursive algorithm design technique in which a problem is divided into subproblems until they are small enough to solve directly. Solutions to the subproblems are then combined to solve the original problem.

Divide-and-conquer algorithms have a related recurrence relation that can be used to determine the run time complexity of the algorithm.

The quick sort algorithm is an example of a divide-and-conquer algorithm.

Dynamic Programming

Elements of Dynamic Programming

Dynamic programming is applicable when subproblems share subsubproblems. It is typically utilized in solving optimization problems.

There are four steps to developing a dynamic programming algorithm:

  • Characterize the structure of an optimal solution.
  • Recursively define the value of an optimal solution.
  • Compute the value of an optimal solution in a bottom-up fashion.
  • Construct an optimal solution from computed information.
Assembly Line Scheduling - An Example

The factory has two assembly lines, each containing n stations. Station Si,j is the jth station on line i; ai,j is the processing time at that station, and ti,j is the time to transfer away from line i after station j. The problem is to determine which stations to choose from line 1 and which to choose from line 2 in order to minimize the total time through the factory.

  • We observe that the problem has optimal substructure: The optimal path to station Si,j contains an optimal path to either station S1,j-1 or S2,j-1.
  • Recursive definition: f1[j] = min(f1[j-1] + a1,j, f2[j-1] + t2,j-1 + a1,j)
  • Compute bottom-up with a table of the following form:
             Station 1   Station 2   ...   Station n
    line 1     f/l         f/l       ...     f/l
    line 2     f/l         f/l       ...     f/l
    where f is the fastest time to reach that station and l records the line of the (j-1)th station used to achieve it.
  • Optimal solution is constructed from the table produced in the prior step, as sketched below.
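A minimal Python sketch of the bottom-up computation (my own function and variable names; a[i][j] is the time at station j on line i, t[i][j] is the time to transfer away from line i after station j, and entry/exit times are omitted for brevity):

def fastest_way(a, t):
    n = len(a[0])
    f = [[0] * n for _ in range(2)]        # f[i][j]: fastest time to finish station j on line i
    l = [[0] * n for _ in range(2)]        # l[i][j]: line used for station j-1 on that fastest path
    f[0][0], f[1][0] = a[0][0], a[1][0]
    for j in range(1, n):
        for i in range(2):
            other = 1 - i
            stay = f[i][j - 1] + a[i][j]
            switch = f[other][j - 1] + t[other][j - 1] + a[i][j]
            f[i][j], l[i][j] = min((stay, i), (switch, other))
    # construct the optimal route by walking the l table from the last station back to the first
    line = 0 if f[0][n - 1] <= f[1][n - 1] else 1
    route = [line]
    for j in range(n - 1, 0, -1):
        line = l[line][j]
        route.append(line)
    return min(f[0][n - 1], f[1][n - 1]), route[::-1]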

Greedy Strategies

Elements of Greedy Strategies

Greedy strategies are a special case of dynamic programming. In addition to exhibiting optimal substructure, problems that are appropriately solved by a greedy strategy have the greedy choice property:

A globally optimal solution can be obtained by making a locally optimal (greedy) choice.

Greedy algorithms differ from traditional dynamic programming solutions in that they are a top-down strategy.

Activity Selection - An Example

The problem is to select a maximum-size set of mutually compatible activities. The key to solving it with a greedy strategy is to sort the activities in order of finish time. The greedy solution always chooses the first compatible activity (i.e. the one with the earliest finish time). That choice is greedy in that it maximizes the remaining available time.
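A minimal Python sketch of that strategy (the function name is mine; activities are (start, finish) pairs):

def select_activities(activities):
    chosen = []
    last_finish = float("-inf")
    # sort by finish time, then repeatedly take the first compatible activity
    for start, finish in sorted(activities, key=lambda act: act[1]):
        if start >= last_finish:
            chosen.append((start, finish))
            last_finish = finish
    return chosen

For example, select_activities([(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (6, 10), (8, 11)]) returns [(1, 4), (5, 7), (8, 11)].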

Note that the greedy strategy described above is appropriate for solving the activity selection problem because the problem exhibits:

  • Optimal substructure
  • Greedy choice property
Coin Changing - An Example

The coin changing problem highlights the fact that greedy strategies are not always optimal. In that problem, we are asked to make change for a given dollar amount using the minimal number of coins.

By way of example, consider the scenario in which we are asked to compose $0.31 from these available denominations: $0.25, $0.10, $0.01. In that case, the greedy solution would be (in cents):
25, 1, 1, 1, 1, 1, 1 (seven coins)
However, the optimal solution is:
10, 10, 10, 1 (four coins)
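The contrast is easy to see in code. A small Python sketch (my own, with amounts in cents and the denominations 25, 10 and 1 assumed) compares the greedy choice with a dynamic-programming solution:

def greedy_change(amount, denoms=(25, 10, 1)):
    coins = []
    for d in sorted(denoms, reverse=True):   # always take the largest coin that fits
        while amount >= d:
            coins.append(d)
            amount -= d
    return coins

def dp_change(amount, denoms=(25, 10, 1)):
    best = [0] + [None] * amount             # best[v]: fewest coins that make v cents
    used = [0] * (amount + 1)                # used[v]: a coin used in some optimal way to make v
    for v in range(1, amount + 1):
        for d in denoms:
            if d <= v and best[v - d] is not None:
                if best[v] is None or best[v - d] + 1 < best[v]:
                    best[v], used[v] = best[v - d] + 1, d
    coins, v = [], amount
    while v > 0:                              # walk back through the coins used
        coins.append(used[v])
        v -= used[v]
    return coins

print(greedy_change(31))   # [25, 1, 1, 1, 1, 1, 1] -- seven coins
print(dp_change(31))       # [10, 10, 10, 1] -- four coins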
