Catch errors that are not caught by parsing (because many constructs are not context-free).

Examples (C):

  1. All identifiers are declared
  2. Types are used consistently
  3. ...
The set of legal programs is a subset of the parseable programs.

Scope

Symbol Tables

Types

Type checking formalism

Some rules

i is an integer literal (constant)
----------------------------------  [Int]
     |- i:int
|-e1:int    |-e2:int
----------------------------------  [Add]
     |-e1+e2:int

Type Checking

Subtyping relation between types

Soundness of a static type system

Our programming language

Our programming language is a subset of C/C++ with the following exceptions/features:

Type rules for our programming language

------------------  [false]
  |- false : bool
------------------  [true]
  |- true : bool
 s is a character literal
------------------------- [char]
  |- s : char
 s is a string literal
----------------------- [string]
  |- s : char *
    |-b:bool
----------------------- [Not]
  |- !b : bool
  |-x:int  |-y:int
----------------------- [Relational]
  |- x<y : bool
Notice that !i where i:int is not well-typed in our language. Similarly, true<false is not well-typed in our language.

Two expressions (or statements) may be sequenced using the semi-colon operator, such that the type of the compound expression is the same as the type of the second expression. If a new scope is created with an expression in its body, then the type of that new scope is the same as the type of its body.

|- e1:T1    |- e2:T2
-------------------- [Sequence]
  |- e1;e2 : T2
 |- e:T
---------- [Scope]
 |- {e}:T
For example, the expression { { 2 + 3; 4; }; 5 } evaluates to 5.

Assignment expression

Now let's consider an assignment expression (or statement). Recall that an assignment expression evaluates to the value that has been assigned, and its type is the type of the assigned variable.

 |-x:T0  |-y:T1 T1<:T0
-------------------------- [Assignment]
  |- x := y : T1
This rule seems to give a valid type to the expression 1 := 2, but that expression should be rejected in our language. To support assignment properly, we need to distinguish between a reference type and a value type.

For a primitive type T, its corresponding reference type is written T&. A reference type represents an object (region of memory) holding the corresponding value type. By definition, T& is a subtype of T, i.e., T& <: T.

Example: the following subtyping relations can be present in our language: U<:T, T&<:T, U&<:U, U&<:T& (to form a diamond structure).
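As a rough C++ analogy (only an analogy; C++'s own reference rules differ in the details), a reference can be used where a value is expected, but a plain value cannot stand in for a reference:

int main() {
    int x = 1;
    int& r = x;       // r : int&, a name for x's location
    int v = r;        // fine: a reference used where a value is expected (T& <: T)
    // int& bad = 2;  // rejected: a value is not a reference (int is not <: int&)
    return v;
}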

Thus, the assignment rule now becomes:

 |-e0:T0&  |-e1:T1 T1<:T0
------------------------------- [Assignment]
  |- e0 := e1 : val(T1)
Here val(T1) returns the value type of T1 if T1 is a reference type; otherwise it returns T1 itself. Notice that the conclusion assigns a value type to the assignment expression, which rejects (x := y) := z expressions but accepts x := (y := z). If we wanted to admit (x := y) := z, we could drop val and make the conclusion T1 (which, when e1 is a variable, is a reference to e1), but that would be quite non-intuitive, as it would likely mean that z is assigned to y in the execution semantics.
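A minimal sketch of how this rule might be checked, using a hypothetical Type representation that tags each type with a shape (the names are illustrative, and base types are left unrelated here, so class hierarchies are not modeled):

#include <cassert>

// Hypothetical type representation: a base tag plus a shape
// (plain value, reference T&, or pointer T*).
struct Type {
    enum Base { Int, Bool, Char } base;
    enum Shape { Value, Ref, Ptr } shape;
};

// val(T): strip a reference to get the underlying value type;
// any other type is returned unchanged.
Type val(Type t) {
    return t.shape == Type::Ref ? Type{t.base, Type::Value} : t;
}

// Subtyping restricted to this representation: every type is a
// subtype of itself, and T& <: T.
bool subtype(Type t1, Type t0) {
    if (t1.base != t0.base) return false;
    if (t1.shape == t0.shape) return true;
    return t1.shape == Type::Ref && t0.shape == Type::Value;
}

// [Assignment]: e0 must be a reference, T1 <: T0, result is val(T1).
Type checkAssignment(Type t0, Type t1) {
    assert(t0.shape == Type::Ref);   // rejects 1 := 2
    assert(subtype(t1, val(t0)));    // T1 <: T0
    return val(t1);                  // a value type, so (x := y) := z is rejected
}

Returning val(t1) rather than t1 is exactly what makes the result unusable on the left-hand side of another assignment.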

The assignment rule admits the following code to be well-typed:

class A { ... };
class B : A { int y; ... };
A a = B();
If we replace this rule with a less general version like the following, then the code above does not remain well-typed.
 |-x:T&  |-y:T
------------------- [Assignment]
  |- x := y : T
This rule is sound but not precise, in the sense that it does not admit all the programs we would like to admit. This is similar to how a previous imprecise rule did not admit (1+2)+3. Notice that the more general rule's conclusion assigns the more specific type val(T1), which admits (A a = B()).y as type-correct.

While loop

 |-e:TB  |-x:T  TB<:bool
---------------------------- [While]
  |- while (e) x : void
This is a design decision. One could have said that the type of a while loop is the type of the last statement that executes, but then what if the body never executes even once? Hence the type 'void': it means the statement's value is not usable.

Address-of

   |-s:T&
----------------------- [address-of]
  |- &s : T *
Notice that the address-of operator is only well-typed when used on a reference type, e.g., it is illegal to say &1.
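Continuing the sketch from the assignment rule, the check is short (again illustrative):

// [address-of], reusing the sketch's Type: &s requires s : T& and
// yields T*; &1 fails because 1 has shape Value, not Ref.
Type checkAddressOf(Type t) {
    assert(t.shape == Type::Ref);
    return Type{t.base, Type::Ptr};
}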

If-then-else

    |- e1 : bool
    |- e2 : T2
    |- e3 : T3
--------------------------------  [if-then-else]
|- if e1 then e2 else e3 : void
Notice that it is by design that if-then-else always has type void, even when T2=T3.

Ternary

    |- e1 : bool
    |- e2 : T2
    |- e3 : T3
--------------------------------  [Ternary]
|- e1 ? e2 : e3 : val(lub(T2,T3))
An upper bound U(T2,T3) of types T2 and T3 is defined so that the following hold: T2<:U and T3<:U. A least upper bound lub(T2,T3) is defined so that lub is an upper bound of T2 and T3, and further, for any other upper bound U(T2,T3), lub <: U holds.

The lub need not be unique for a partially ordered set (poset), even in the presence of a void type. However, for our language, it turns out that the lub is unique. If the lub is not unique, this rule should be modified to resolve the ambiguity; one option is to add a hypothesis requiring that a unique lub of T2 and T3 exists for this expression to be well-typed.

The val operator ensures that the result cannot be used on the LHS of an assignment.
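For a single-inheritance class hierarchy, the lub is unique and can be computed by walking ancestor chains. A small sketch (the class names and the parent map are hypothetical):

#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical single-inheritance hierarchy: child -> parent, rooted
// at "Object". Single inheritance is what makes the lub unique.
std::unordered_map<std::string, std::string> parent = {
    {"B", "A"}, {"C", "A"}, {"A", "Object"}
};

std::string lub(const std::string& t2, const std::string& t3) {
    std::unordered_set<std::string> anc;   // t2 and all its ancestors
    for (std::string t = t2; ; t = parent.at(t)) {
        anc.insert(t);
        if (!parent.count(t)) break;       // reached the root
    }
    for (std::string t = t3; ; t = parent.at(t)) {
        if (anc.count(t)) return t;        // first common ancestor
        if (!parent.count(t)) break;
    }
    return "Object";                       // the root bounds everything
}

For example, lub("B", "C") is "A", and lub("B", "A") is "A".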

Variable

But what is the type of a variable reference?

   x is a variable
  ----------------  [Var]
      |- x:?
The local, structural rule does not carry enough information to give x a type. We need more information in the rules.

A type environment gives types for free variables

The sentence O |- e:T is read: "under the assumption that the free variables have the types given by O, it is provable that the expression e has type T."

i is an integer literal (constant)
----------------------------------  [Int]
    O |- i:int
 O |- e1:int    O |- e2:int
----------------------------------  [Add]
    O |- e1+e2:int
Notice that the same assumptions are required in both the hypotheses and the conclusion.

And we can write our first rule that uses type environment O

O(x) = T
-------------  [Var]
O |- x:T&
Notice that we choose to keep the value type in O, and use it to assign the reference type to x --- this is a succinct way of saying that all variables must be of a reference type.

We allow a declaration only at the beginning of a scope; moreover, there can be at most one declaration at the beginning of a scope, and multiple declarations require multiple nested scopes.

  O[T0/x] |- e1:T1
   T0 != void
----------------------------    [Let-No-Init]
 O |- { T0 x; e1 } : T1
O[T/x] represents a function that agrees with O everywhere except on x:
O[T/x](x) = T
O[T/x](y) = O(y) for y != x
Notice that we use T0/x instead of T0&/x --- that's because the Var rule only expects the value type of the variable in the type environment, and uses it to assign a reference type to the variable.

When we enter the scope, we add a new assumption in our type environment. When we leave the scope, we remove the assumption.

This behavior is exactly that of a symbol table, so the type environment is stored in the symbol table.
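Since the environment grows on scope entry and shrinks on scope exit, a stack of per-scope tables is a natural implementation. A sketch of such a symbol-table-backed environment (the interface names are illustrative):

#include <string>
#include <unordered_map>
#include <vector>

// Type environment O as a stack of per-scope tables: entering a scope
// pushes a table (so O[T/x] shadows outer bindings), leaving pops it.
struct TypeEnv {
    std::vector<std::unordered_map<std::string, std::string>> scopes;

    void enterScope() { scopes.push_back({}); }
    void exitScope()  { scopes.pop_back(); }
    void add(const std::string& x, const std::string& t) {
        scopes.back()[x] = t;   // store the value type, as [Var] expects
    }
    // O(x): the innermost binding wins; "" means x is unbound.
    std::string lookup(const std::string& x) const {
        for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {
            auto f = it->find(x);
            if (f != it->end()) return f->second;
        }
        return "";
    }
};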

Our language also allows the initialization of a newly-declared variable.
      O[T0/x] |- e1:T1
         O |- e2:T0
         T0 != void
----------------------------    [Let-Init]
 O |- { T0 x = e2; e1 } : T1
Notice that this rule says that the initializer e2 should have the same type T0 as x. This is imprecise; we should ideally allow such an assignment as long as e2's type is a subtype of T0.

A better version of [Let-Init] using subtyping:

      O[T/x] |- e1:T1
         O |- e2:T0
       T0 <: T
        T != void
----------------------------    [Let-Init]
 O |- { T x = e2; e1 } : T1
Notice that T0 only needs to be a subtype of T and not a subtype of T&. This allows int x = 3. Notice that we disallow the construction of a term of type void&. Consequently, we also disallow the construction of a term of type void*.

Typing Methods

C++ example
        O |- e1:T1
        ...
        O |- en:Tn
------------------------------ [Dispatch]
   O |- f(e1,e2,...,en):?
Also maintain a mapping for method signatures in O
O(f) = (T1,...,Tn,T(n+1))
means that there is a method f such that
f(x1:T1,x2:T2,x3:T3,...,xn:Tn):T(n+1)
Hence, our dispatch rule becomes:
        O |- e1:T1
        ...
        O |- en:Tn
        O |- f:(T1',..,Tn',T(n+1))
        T1<:T1', ..., Tn<:Tn'
---------------------------------------- [Dispatch]
   O |- f(e1,e2,...,en):T(n+1)
Notice that this rule tackles both call-by-value and call-by-reference: for example, T1' could be a reference type. Further, a reference type can be used as an argument where a value type was expected, but not vice versa.
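A sketch of these checks, reusing the Type and subtype() of the assignment sketch (Signature is a hypothetical helper). Because subtype() already treats T& as a subtype of T, the same loop accepts a reference argument for a value parameter but not the reverse:

#include <cassert>
#include <vector>

// Method signature: formal parameter types plus the return type.
struct Signature { std::vector<Type> formals; Type ret; };

// [Dispatch]: each actual's type must be a subtype of the matching
// formal's type; the call then has the signature's return type.
Type checkDispatch(const Signature& sig, const std::vector<Type>& actuals) {
    assert(sig.formals.size() == actuals.size());
    for (size_t i = 0; i < actuals.size(); i++)
        assert(subtype(actuals[i], sig.formals[i]));  // Ti <: Ti'
    return sig.ret;                                   // T(n+1)
}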

Object-oriented languages also implement encapsulation (e.g., some functions are only visible in certain classes). To handle this, one can extend O to be a function from a tuple of class name and identifier name, e.g., C.f instead of f.

Example: The following shows an example syntax of a method declaration and dispatch.

class C {
   ....
   ...foo(T1 a1, ..., Tn an) .... //method declaration.
};

C c;
c.foo(....);

The following shows an example syntax of a method declaration and dispatch with inheritance (subtyping) where D<:C.

class C {
   ....
   ...foo(T1 a1, ..., Tn an) .... //method declaration.
};

D c;
c.foo(x1,...,xn);
In this case, we will use scoping rules to resolve "which foo?"; here, it is C's foo. Notice that the scoping rule will need to know the type of c and potentially the types of x1..xn to be able to identify "which foo?", so scoping and typing work hand in hand. Regardless, once the question has been answered, we annotate each such dispatch with the class name corresponding to the method being dispatched, e.g., the dispatch in the example above becomes:
D c;
c@C.foo(....);
This disambiguation is required to determine the type of the return value, e.g., C.foo may have a different return type than D.foo.

To express inheritance, we will use the subtype-relation for the this object in the method dispatch.

     O |- e0:T0&
     O |- e1:T1
     ...
     O |- en:Tn
     O |- T.f : (T1',..,Tn',T(n+1))
     T0 <: T
     T1<:T1', ...,  Tn<:Tn'
---------------------------------------- [Static-Dispatch]
   O |- e0@T.f(e1,e2,...,en):T(n+1)
Notice that we require e0 to be of a reference type. In other words, e0 should represent a region of memory, also called an object.

Type polymorphism examples

The following definitions can co-exist.
  1. int foo(int)
  2. long foo(long)
  3. string foo(string)
Notice that string foo(int) cannot co-exist with the first foo.

Type resolution in this case is performed by looking at the types of the arguments: a definition is enabled when each actual argument's type is a subtype of the corresponding formal argument's type, as sketched below. If no definition is enabled, it is a type error. If more than one definition is enabled, that is usually a type-ambiguity error. (Notice that resolution rules such as "most closely matching type" are unlikely to be intuitive for the programmer when there are multiple arguments and one argument matches more closely for one candidate while another argument matches more closely for another candidate.)
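A sketch of this resolution scheme, reusing Signature and subtype() from the dispatch sketch:

// Collect the indices of the enabled candidates: those whose formals
// are (componentwise) supertypes of the actual argument types.
std::vector<int> enabledCandidates(const std::vector<Signature>& cands,
                                   const std::vector<Type>& actuals) {
    std::vector<int> enabled;
    for (int i = 0; i < (int)cands.size(); i++) {
        if (cands[i].formals.size() != actuals.size()) continue;
        bool ok = true;
        for (size_t j = 0; j < actuals.size(); j++)
            ok = ok && subtype(actuals[j], cands[i].formals[j]);
        if (ok) enabled.push_back(i);
    }
    return enabled;  // empty: type error; more than one: ambiguity error
}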

this keyword

For object-oriented languages, to resolve the this keyword, we also need to know the "current class" C in which the expression is sitting. To support this, we change the form of a typing rule (also called a sentence in logic) to:

O,C |- e:T
We use C=null for a context which is free of any "current class".

For example,

 O,C |- e1:int    O,C |- e2:int
----------------------------------  [Add]
    O,C |- e1+e2:int
The rule for the this keyword would be the following:
  C != null
---------------  [This]
 O,C |- this : C*

Summary

Warning: type rules are very compact! They pack a lot of information into a small notation, and it takes time to interpret them.

Implementing Type Checking

Let's consider the following rule:
 O,C |- e1:int    O,C |- e2:int
----------------------------------  [Add]
    O,C |- e1+e2:int
This says that to type-check e1+e2, we have to type-check e1 and e2 separately in the same O,C environment.
TypeCheck(Environment, e1+e2) = {
  T1 = TypeCheck(Environment, e1);
  Check T1 == int;
  T2 = TypeCheck(Environment, e2);
  Check T2 == int;
  return int;
}
Let's consider a more complex example:
         O |- e2:T0
       O[T/x] |- e1:T1
       T0 <: T
        T != void
----------------------------    [Let-Init]
 O |- { T x = e2; e1 } : T1
Let's look at the corresponding type-checking implementation:
TypeCheck(Environment, { T x = e2; e1 }) = {
  T0 = TypeCheck(Environment, e2);
  T1 = TypeCheck(Environment.add(x:T), e1);
  Check subtype(T0,T);
  Check T != void;
  return T1;
}

Recall that we claimed that type inference and type checking are similar, in that they use the same typing rules. The typing rules can also be used to implement type inference. We can modify the TypeCheck function, so that along with the Environment, it also passes down the acceptable types for a sub-expression, or more generally type constraints. At the end, these constraints can be solved to identify the types of the leaf nodes of the AST to implement type inference. If the constraints have no solution, it is a type error. If the constraints have more than one solution, either there is a type ambiguity error, or there is a rule to resolve the ambiguity. For example, the most general type may be used when all the candidate types are ordered by the subtyping relation, i.e., the type that is the supertype of all other candidates, e.g., \alpha-list is more general than int-list.
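As a minimal flavor of the constraint idea (a sketch only; the subtype constraints described above would need a richer solver than plain equality):

#include <map>
#include <string>

// Solved bindings: type-variable name -> concrete type name.
std::map<std::string, std::string> solution;

// Record the constraint "type variable v equals type t". A conflicting
// constraint means the constraint set has no solution: a type error.
bool constrain(const std::string& v, const std::string& t) {
    auto ins = solution.emplace(v, t);
    return ins.second || ins.first->second == t;
}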

Static vs. Dynamic Typing

Languages with static type checking also have a notion of dynamic types; they just may or may not implement the corresponding dynamic checks. The dynamic type of a term is determined through the execution semantics.

If a type system is sound, we do not need to implement dynamic checks for the properties that have been checked statically. For example, we do not need to check dynamically, during the execution of 1+2, that 1 and 2 are both of type int.

Consider the following C++ program:

class A { ... };
class B : public A { ... };
void foo() {
  A* foo = new A;
  B* bar = new B;
  ....
  foo =  bar;
}
Static type of foo is "A*", but the dynamic type of foo is "A*" and "B*" at different program points.

Formalizing the relationship for a sound type system: for all expressions E, dynamic_type(E) <: static_type(E).

In all executions, E evaluates to values of (a subtype of) the type inferred by the compiler. You have to actually run the program to get the dynamic_type. In the early days of programming languages, equality (instead of the subtype relation) was proved for type systems.
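In actual C++, the dynamic type can be observed at runtime. A small self-contained variant of the example above:

#include <iostream>

class A { public: virtual ~A() = default; };  // polymorphic base
class B : public A {};

int main() {
    A* foo = new B;  // static type: A*; dynamic type: B*
    // dynamic_cast succeeds precisely because the dynamic type (B*)
    // is a subtype of the static type (A*).
    std::cout << (dynamic_cast<B*>(foo) != nullptr) << "\n";  // prints 1
    delete foo;
    return 0;
}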

Notice that rejecting all programs is sound. That's why we also care about precision, i.e., how many of the programs we would like to accept are actually accepted.

Completeness

Completeness means that all dynamically type-correct programs are accepted by the static typechecker. Consider the following program:
C c;
D d; //where D is a subtype of C
....
D d2 = cond ? c : d;
Let's say that at runtime cond can never be true, so the ternary always evaluates to d and this program is dynamically type-correct. However, statically figuring out that cond is never true is undecidable.

Show a Venn diagram: statically-type-correct ⊂ dynamically-type-correct ⊂ parseable. For soundness, statically-type-correct ⊆ dynamically-type-correct should hold. For completeness, dynamically-type-correct ⊆ statically-type-correct should hold.

All operations that can be used on an object of type C can also be used on an object of type C' <: C.

Subclasses only add attributes or methods (they never remove attributes/methods).

The dynamic type is obtained through the execution semantics of the program, e.g., if we have an expression x++, and the current state of the program determines x to be of type int&, there is a dynamic typing rule that says that the type of the new expression is also int&. Consider the example { T x = e0; x; } where e0:T0 <: T; here the dynamic type of the expression is T0 (not T!). Similarly, the dynamic type of if x then y:T1 else z:T2 will be either T1 or T2 (not lub(T1,T2) or void).

In other words, soundness states that the static type system will not accept any program that would fail a dynamic type check of equal power. A dynamic check for array bounds is not of equal power (it has more checks enabled); a sound static check for array bounds would reject all incorrect programs (but it would also reject a few programs that would actually pass the dynamic check).

Soundness of a static type system: all dynamically-type-incorrect programs will be rejected. This is ensured by requiring that static_type is a supertype of dynamic_type. A trivial static type system that rejects all programs is sound (but not useful).

Completeness: all dynamically-type-correct programs will be accepted. Not possible to ensure both soundness and completeness in general.

Example where static type system can be too restrictive

While a static type system may be sound, it can be too restrictive, disallowing certain programs that are dynamically well-typed.
class Count {
  int i = 0; //default value = 0
  Count inc() { i := i + 1; return *this; }
};
Now consider a subclass Stock of Count:
class Stock : public Count {
  string name; //name of the item
};
And the following use of Stock:
Stock a;
Stock b = a.inc();
... b.name ...
This code does not type-check because the return type of inc is Count, which is not a subtype of Stock.

This seems like a major limitation, as derived classes (subtypes) are now unable to use the inc method.

a.inc() has dynamic type Stock.

So, dynamically speaking, it is legitimate to write Stock b := a.inc(). But this is not well-typed, because

  • a.inc() has static type Count.

Common workarounds to the incompleteness of a static type system:

Error Recovery