We must specify for every C expression what happens when it is evaluated. (this is not exactly C, but close).
The definition of a programming language:
We have specified evaluation rules indirectly
This is a complete description
Assembly-language descriptions of language implementations have irrelevant detail
Another approach: reference implementations. Again, there will be artifacts of the particular way it was implemented that you didn't mean to be a part of the language (over-specification)
Many ways to specify semantics
Operational semantics
Denotational semantics
Axiomatic semantics
Notation: Logical rules of inference, as in type checking
Recall the typing judgement
Context |- e:CIn the given context, expression
e has type C.
We use something similar for evaluation
Context |- e:vIn the given context (the meaning of context is different from the context for typing judgements), expression
e evaluates to value v.
Example
Context |- e1: 5 Context |- e2 : 7 ----------------- Context |- e1+e2 : 12The
Context does not do much in this rule. In general, Context may provide mappings from variables to values for the variables used in e1 and e2.
Example
y <-- x + 1We track variables and their values with:
A variable environment maps variables to locations
E = [a:l1, b:l2]
Store maps memory locations to values
S = [l1 --> 5, l2 --> 7]
S' = S[12/l1] defines a store S' such that
S'(l1) = 12 S'(l) = S(l) if l != l1
C++ values are objects
X(a1=l1, ..., an=ln) is a C++ object where
X is the class of the object
ai are the attributes (including inherited ones)
li is the location where the value of ai is stored
Notice that the specification is deliberately keeping things as abstract as possible (avoiding over-specification). e.g., the layout of the object is not specified.
There are constant values in C/C++ that do not have associated memory locations; they only have values., e.g., (int)5, (bool)true, (char const *)"constant string", \ldots
There is a special value void:
void value
The evaluation judgement is:
s0, E,S |- e:v, S'
s0 the current (this/self) object
E the current variable environment
S the current store
e terminates then
e is v
S'
s0 and the current environment E does not change by evaluating an expression. These are invariant under evaluation. However, the contents of the memory may change.
If a linked-list of two elements starting at variable head exists in the current context, then the following is a possible state of E and S:
E = (head -> l1) S = (l1 -> NODE(1,l2); l2 -> NODE(2,nil))
Also notice: there is a qualification which says that if the evaluation of e terminates. Read as "if e terminates..."
"Result" of evaluation is a value and a new store
Some things don't change
Start with simple operational-semantic rules and work our way up to more complex rules.
------------------------------ [bool-true] s0,E,S |- true: bool(true), S
-------------------------------- [bool-false] s0,E,S |- false: bool(false), S
i is an integer literal ----------------------- [int] s0,E,S |- i: int(i), S
s is a string literal ---------------------------------- [string-lit] s0,E,S |- s: (char const *)(s), S
E(id) = lid S(lid) = v ---------------- [id] s0,E,S |- id:v,SThe rule above is specifically for the
rvalue
use of id. For uses as an lvalue,
we use explicit rules for each potential use as an lvalue, e.g.,
for assignment or address-of operations.
-------------------- [this] s0,E,S|- this: s0, S
s0,E,S |- e:v, S1 E(id) = lid S2 = S[v/lid] --------------------------- [assignment] s0,E,S |- id <-- e:v, S2(e.g.,
x <-- 1 + 1)
s0,E,S |- e1:v1, S1 s0,E,S1 |- e2:v2, S2 --------------------------- [add] s0,E,S |- e1+e2 : v1+v2, S2(notice that the store used while evaluating
e2 includes the side-effects of evaluating e1). These stores also dictate the order of evaluation of the expressions --- e1 needs to be evaluated first to get S1 which is needed by evaluation of e2
s0,E,S |- e1:v1, S1 s0,E,S1 |- e2:v2, S2 ----------------------- [stmt] s0,E,S |- e1;e2 : v2,S2
s0,E,S |- e:v,S1
----------------------- [stmt-block]
s0,E,S |- {e} : v,S1
Example: Consider the expression
{ X <-- 7 + 5; 4; }
Let's say that initially, s0=s0, E=x:l, l<-0. Let's evaluate this expression in this context (start-state/environment/store).
....
.....
--------------------------------------------- -------------------------------
s0,[X:l],[l<-0] |- X<-7+5:12,[l<-12} s0,[x:l],[l:..] |- 4:4,[l:...]
------------------------------------------------------------------------
s0,[X:l],[l<-0] |- {X <- 7+5; 4 } : 4, [l:12]
s0,E,S |- e1:bool(true),S1 s0,E,S1 |- e2:v,S2 -------------------------------------- [ite-true] s0,E,S |- if e1 then e2 else e3 : v,S2
s0,E,S |- e1:bool(false),S1 s0,E,S1 |- e3:v,S3 -------------------------------------- [ite-false] s0,E,S |- if e1 then e2 else e3 : v,S3
s0,E,S |- e1 : bool(false), S1
------------------------------------ [while-false]
s0,E,S |- while (e1) {e2} : void, S1
s0,E,S |- e1 : bool(true), S1
s0,E,S1 |- e2 : v, S2
s0,E,S2 |- while e1 {e2} : void, S3
------------------------------------ [while-true]
s0,E,S |- while (e1) {e2} : void, S3
Partial rule
s0,E,S |- e1 : v1, S1
s0,?,? |- e2: v2, S2
---------------------------------------------
s0,E,S |- { Decl(id:T <-- e1); e2 } v2, S2
In what context should e2 be evaluated?
E but with a new binding of id to a fresh location lnew.
S1 but with lnew mapped to v1.
We write lnew = newloc(S) to say that lnew is a location not already used in S.
newloc is like the memory allocation function.newloc identifies a location that is unique from the domain of input S. Notice that this is a superset of the range of E, e.g., for a linked-list, only the head variable maps to a location in E but multiple links point to other allocated locations (and newloc should return a location distinct from existing locations).Complete rule
s0,E,S |- e1 : v1, S1
lnew = newloc(S1)
s0,E[lnew/id],S1[v1/lnew] |- e2: v2, S2
---------------------------------------------
s0,E,S |- { Decl(id:T <-- e1); e2 } v2, S2
Notice that the S2 at the end of the scope will still contain mappings for lnew. We call this a memory leak. There are two types of consequences of this:
lnew in its values (mapped through S). This also puts additional burden on the code generator as the code generator is now burdened to preserve lnew even at the end of the scope in which it was declared. For example, it can no longer allocate id on the stack, as the lifetime of an AR on stack may be smaller than the lifetime of id.
As an aside: if we wanted to allow only a bounded set of locations, then we could potentially remove lnew from S2 (say S2-lnew)
in the conclusion of the declaration rule.
new T
T
Let's assume that for each type A, there is a default value D_A. In reality, C/C++ have a notion of uninitialized variables, and associated undefined behaviour. We use:
D_int = int(0)
D_bool = bool(false)
D_(T *) = (T *)(NULL)
For a class A, we write
class(A) = (a1:T1 <-- e1, a2:T2 <-- e2, ..., an <-- en}where
ai are the attributes (including the inherited ones)
Ti are the attributes' declared types
ei are the initializers
class(C) = (a1:.., a2:..., b1:.., b2:.., c1:.., c2:..)
class(T) = (a1:T1 <- e1, ..., an:Tn <- en)
l1,l2,...,ln = newlocs(S,n)
v = T(a1=l1,..an=ln)
S1=S[DT1/l1,...,DTn/ln]
E'=[a1:l1, ..., an:ln]
v,E',S1 |- { a1 <-- e1; ... an <-- en; } : vn, S2
----------------------------------------------------------
s0,E,S |- new T : v,S2
Notice that E' has nothing to do with E, E' is due to a completely different scope. Also notice that the evaluation of e1, ..., en follows the same order as in class T (greatest ancestor first). Also notice that the initializers are evaluated with a new value of the self object (v). Also notice that vn is ignored.
newlocs(S,n) generates n distinct locs that are not already present in S.
Summarizing:
this is the current object
e0.f(e1,...,en)
e1,..., en
e0 be the target object
X be the dynamic type of the target object
f the definition of f (with n args).
n new locations and an environment that maps f's formal arguments to those locations
self to the target object and evaluate f's body
For a class A and a method f of A (possibly inherited):
impl(A, f) = (x1, ..., xn, ebody)where
xi are the names of the formal arguments
ebody is the body of the method
Complete rule
s0,E,S |- e1 : v1,S1
s0,E,S1 |- e2 : v2,S2
...
s0,E,S(n-1) |- en : vn,Sn
s0,E,Sn |- e0 : v0,S(n+1)
v0 = X(a1=l1, ..., am=lm)
impl(X,f) = (x1,..,xn, ebody)
lxi = newloc(Sn+1) for i=1..n
E' = [a1:l1, ..., am:lm][lx1/x1,...,lxn/xn]
S(n+2) = S(n+1)[v1/lx1,...,vn/lxn]
v0,E',S(n+2) |- ebody:v,S(n+3)
------------------------------------------------------- [dispatch]
s0,E,S |- e0.f(e1,...,en) : v, S(n+3)-{lx1,lx2,...,lxn}
We are interested in the class-tag of v0 (X) and its attributes. Notice that this is the dynamic type of the object v0. (This provides dynamic dispatch).
What are the names in scope of the method f: all attributes and all the formal arguments. E' assigns locations to all these names.
The construction of E' is interesting because it has two square brackets: this is because it is possible that a formal parameter may have the same name as an attribute name, and we want to capture this overriding semantics.
The function body is evaluated in an updated store, where the locations of formal arguments are replaced by expression values (call-by-value). Also the self object is v0. Notice that E' has nothing to do with E (static scoping). Dynamic scoping may have E' that is dependent on E.
Notice that we delete the locations lxi (corresponding to call arguments) after finishing the execution of ebody.