Sunday, November 28, 2010

Course Notes -- Design of Programming Languages (CS5318)

This week I'm revisiting some old course notes that I originally published on www.big-oh.net. I'm moving away from self-hosting content on that domain, migrating blog-style content into the cloud ... Blogger in this case.

Following are notes that I used to study for the Design of Programming Languages section of the Texas State University Computer Science graduate comprehensive examination.


Language Design Criteria

The sub-headings below catalog some of the most important language design criteria.

Efficiency

The efficiency of a programming language relates to the speed/cost of performing basic activities of a language's lifecycle. Many facets of a programming language can be evaluated with respect to efficiency.

  • Efficiency of Execution - Influenced by language features like recursion and dynamic memory allocation; see the sketch after this list.
  • Efficiency in Translation - How many passes does the compiler need?
  • Efficiency in Implementation - Influenced by the number of features provided and by the level of orthogonality.
  • Efficiency in Programming - Time to train programmers and time to solve problems in the language.
  • Efficiency in Maintenance - Largely determined by readability.
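
A minimal C sketch of the execution-efficiency point above (a hypothetical illustration, not from the original notes): the recursive factorial pays for a new stack frame on every call, while the iterative version reuses a single frame and accumulator.

    #include <stdio.h>

    /* Recursive factorial: every call pushes a new stack frame, which is
     * one way recursion can add run-time cost. */
    unsigned long fact_recursive(unsigned int n)
    {
        return (n <= 1) ? 1UL : n * fact_recursive(n - 1);
    }

    /* Iterative factorial: a single frame and a single accumulator,
     * typically cheaper on a conventional machine. */
    unsigned long fact_iterative(unsigned int n)
    {
        unsigned long acc = 1UL;
        for (unsigned int i = 2; i <= n; i++)
            acc *= i;
        return acc;
    }

    int main(void)
    {
        printf("%lu %lu\n", fact_recursive(10), fact_iterative(10));
        return 0;
    }
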
Regularity

The regularity of a language relates to the interaction of its features. A language's regularity can be further decomposed into the following characteristics.

  • Generality - A language is general if there are no special cases in the availability or use of its operators. An example of failed generality in the C language is that individual scalar values can be compared with the "==" operator, but entire structs cannot be compared with that operator.
  • Orthogonality - The constructs of a language are orthogonal if they do not behave differently in different contexts and can be combined in all meaningful ways. An example of failed orthogonality in the C language is that (before C99) local variables could only be declared at the beginning of a block.
  • Uniformity - In a uniform language, things that are similar look similar and things that are different look different. An example of failed uniformity in the C language is that "=" (assignment) and "==" (equality) are easily confused. The C sketch after this list illustrates all three pitfalls.
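
The following C sketch (a hypothetical illustration, not from the original notes) shows all three regularity failures in one place.

    struct point { int x; int y; };

    int main(void)
    {
        int a = 1, b = 1;
        struct point p = { 1, 2 }, q = { 1, 2 };
        int scalars_equal, structs_equal;

        /* Generality failure: "==" compares scalar values ... */
        scalars_equal = (a == b);

        /* ... but not whole structs: "(p == q)" does not compile, so a
         * memberwise comparison is needed instead. */
        structs_equal = (p.x == q.x && p.y == q.y);

        /* Orthogonality failure: before C99, a declaration like this one,
         * appearing after ordinary statements, was not allowed. */
        int late_declaration = scalars_equal + structs_equal;

        /* Uniformity failure: "=" and "==" look alike, so this
         * assignment-in-a-condition compiles but was probably meant
         * to be "a == 0". */
        if (a = 0) {
            return 1;
        }

        return late_declaration;
    }
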
Simplicity

Simple languages have the following properties.

  • Simple languages have fewer features, which makes them easier to implement.
  • Simple languages may have increased implementation efficiency, but decreased programming efficiency and decreased expressiveness.

Abstraction

Abstractions hide unnecessary details, i.e., those that are relevant only to the internal implementation. For example, the String data type is an abstraction that hides the details of storing and manipulating character data. Similarly, the "if" statement is an abstraction that eliminates the need to write logic-based jump instructions.

The table below describes a taxonomy of abstraction levels in programming languages.

Abstraction Level | Data                                            | Control
Basic             | Variables as abstractions for memory locations. | Variable assignment and jump statements.
Structured        | Arrays, records, structs, classes, lists, etc.  | Guarded commands.
Unit              | Packages, components, libraries, etc.           |
Parallel          | Threads, processes, tasks, etc.                 |
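
To make the first two rows of the table concrete, the C sketch below (a hypothetical illustration, not from the original notes) contrasts basic abstractions (plain variables, assignment, and a jump) with structured abstractions (a while loop and a struct).

    #include <stdio.h>

    /* Structured data abstraction: a record groups related values. */
    struct account { double balance; int id; };

    int main(void)
    {
        /* Basic abstractions: a variable standing for a memory location,
         * assignment, and an explicit jump (goto) for control. */
        int i = 0;
    loop:
        if (i < 3) {
            printf("jump-based iteration %d\n", i);
            i = i + 1;
            goto loop;
        }

        /* Structured control abstraction: the same iteration expressed
         * with a while loop, which hides the jump. */
        int j = 0;
        while (j < 3) {
            printf("structured iteration %d\n", j);
            j++;
        }

        struct account acct = { 100.0, 42 };
        printf("account %d has balance %.2f\n", acct.id, acct.balance);
        return 0;
    }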

Computational Paradigms

A single programming language may support one or more computational paradigms. The sub-headings below detail some of the computational paradigms covered by the comprehensive exam.

Imperative

The imperative computational paradigm is closely tied to the structure of the von Neumann computer model.

  • A.k.a. procedural.
  • Examples include Fortran, C, Pascal, and Algol.
  • Characterized by sequenced instruction execution, loops, and variables that represent memory locations; the C sketch after this list shows these traits.
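
A minimal C sketch of these characteristics (a hypothetical illustration, not from the original notes): sequenced statements, an explicit loop, and a variable that names a memory location updated in place.

    #include <stdio.h>

    int main(void)
    {
        /* A variable names a memory location that is updated in place. */
        int sum = 0;

        /* Sequenced execution and an explicit loop drive the computation. */
        for (int i = 1; i <= 10; i++) {
            sum += i;            /* destructive update of a memory cell */
        }

        printf("sum = %d\n", sum);
        return 0;
    }
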
Functional

The functional computational paradigm structures programs in a manner that mirrors the structure of lambda-calculus.

  • A.k.a. applicative.
  • Examples include Lisp, Miranda and Scheme.
  • Reasoning about programs in this paradigm is easier because it is grounded in recursive function theory.
  • Characterized by (recursive) function calls as the basis for program control. Instead of variables that represent memory locations, this paradigm has function parameters that accept input values; the recursive sum in the sketch after this list hints at the style.
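
C is not a functional language, but the sketch below (a hypothetical illustration, not from the original notes) mimics the functional style: the loop and mutable accumulator of the previous example are replaced by a recursive function whose parameters carry the state.

    #include <stdio.h>

    /* Recursion replaces the loop; the parameters n and acc play the
     * role of the mutable variables in the imperative version. */
    int sum_to(int n, int acc)
    {
        return (n == 0) ? acc : sum_to(n - 1, acc + n);
    }

    int main(void)
    {
        printf("sum = %d\n", sum_to(10, 0));
        return 0;
    }
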
Logic

The logic computational paradigm is a higher-level paradigm that abstracts away the need to describe "how to do it," requiring that the programmer describe only "what s/he wants."

  • A.k.a. declarative.
  • Examples include Prolog and SQL.
  • Characterized by variables that represent partial computations, which are inputs to Horn clause resolution.

Programming Language Levels

When writing a program in any language, errors can arise at many different levels:

  • Lexical - identifiers, operators, separators, comments
    // not a valid operator or identifier
    int x#1 = 2;
  • Context-Free Syntax - dictates the form of statements, functions, modules, etc.
    // missing parentheses violates 
    // if statement syntax
    if x == y { a=b; }
  • Context-Sensitive Syntax - type checking, forward references, variable declaration, etc.
    // duplicate variable declaration
    double a;
    ...
    int a; 
      
  • Dynamic Semantics - divide by zero, use of an uninitialized variable, etc.
    // runtime error: integer division by zero
    int a = 0;
    int b = 12 / a;

Grammars

Definition

A grammar is defined by the following four components (a small example appears after this list):

  • A finite set of non-terminal symbols.
  • A finite set of terminal symbols.
  • A single start symbol.
  • A finite set of production rules.
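For example, the four components might be instantiated as follows for a tiny expression grammar (an illustrative example, not taken from the exam material):

    Non-terminals: exp, term
    Terminals:     number, +, *
    Start symbol:  exp
    Productions:   exp  -> exp + term | term
                   term -> term * number | number
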
Context-Free Grammars

A context-free grammar consists of a series of grammar rules that observe the form:

Non-terminal -> {terminals and non-terminals}

This type of grammar is context-free because each left-hand-side non-terminal can always be replaced by any right-hand-side choice, no matter where the non-terminal might appear.

A context-free grammar also has a distinguished non-terminal called the start symbol. The language defined by the grammar is the set of all strings of terminals that can be derived from the start symbol.

Defining a programming language with a context-free grammar also implicitly defines the steps that a parser must take to construct a parse tree for a string, as the recursive-descent sketch below illustrates.
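
Here is a hedged sketch (not from the original notes) of a recursive-descent parser in C for the tiny expression grammar given earlier. Each non-terminal becomes one parsing function; the left-recursive rules are rewritten as loops because recursive descent cannot handle left recursion directly.

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Recursive-descent parser for the grammar, rewritten as:
     *   exp  -> term { + term }
     *   term -> number { * number }
     */
    static const char *input;

    static int parse_number(void)
    {
        if (!isdigit((unsigned char)*input)) {
            fprintf(stderr, "syntax error at '%c'\n", *input);
            exit(EXIT_FAILURE);
        }
        int value = 0;
        while (isdigit((unsigned char)*input))
            value = value * 10 + (*input++ - '0');
        return value;
    }

    static int parse_term(void)
    {
        int value = parse_number();
        while (*input == '*') {      /* one step per use of the term rule */
            input++;
            value *= parse_number();
        }
        return value;
    }

    static int parse_exp(void)
    {
        int value = parse_term();
        while (*input == '+') {      /* one step per use of the exp rule */
            input++;
            value += parse_term();
        }
        return value;
    }

    int main(void)
    {
        input = "2+3*4";
        printf("2+3*4 = %d\n", parse_exp());   /* prints 2+3*4 = 14 */
        return 0;
    }

Running the sketch on "2+3*4" yields 14, reflecting the operator precedence that the grammar encodes by placing "*" below "+".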

Attribute Grammars

Attribute grammars are an extension of context-free grammars. Because context-free grammars describe only the lexical and syntax levels of a language, they are unable to describe many features of modern programming languages. For example, context-free grammars are unable to enforce type checking. Attribute grammars provide the ability to enforce static semantic rules, like type checking.

An attribute grammar has these properties:

  • Each grammar symbol has a set of (inherited or synthesized) attributes.
  • Each grammar rule has a set of (semantic or predicate) functions, which are defined over the attributes of the symbols in the rule; an example follows this list.
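
For example, a type-checking attribute grammar for a small expression language might attach a synthesized type attribute to each expression (an illustrative example, not taken from the exam material):

    exp -> exp1 + exp2     exp.type := if exp1.type = int and exp2.type = int
                                        then int else type-error
    exp -> number          exp.type := int

Here type is a synthesized attribute, computed from the attributes of the right-hand-side symbols; this is exactly the kind of static semantic (context-sensitive) rule that a plain context-free grammar cannot express.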
