We must specify for every C expression what happens when it is evaluated. (this is not exactly C, but close).
The definition of a programming language:
We have specified evaluation rules indirectly
This is a complete description
Assembly-language descriptions of language implementations have irrelevant detail
Another approach: reference implementations. Again, there will be artifacts of the particular way it was implemented that you didn't mean to be a part of the language (over-specification)
Many ways to specify semantics
Operational semantics
Denotational semantics
Axiomatic semantics
Notation: Logical rules of inference, as in type checking
Recall the typing judgement
Context |- e:CIn the given context, expression
e
has type C
.
We use something similar for evaluation
Context |- e:vIn the given context (the meaning of context is different from the context for typing judgements), expression
e
evaluates to value v
.
Example
Context |- e1: 5 Context |- e2 : 7 ----------------- Context |- e1+e2 : 12The
Context
does not do much in this rule. In general, Context
may provide mappings from variables to values for the variables used in e1
and e2
.
Example
y <-- x + 1We track variables and their values with:
A variable environment maps variables to locations
E = [a:l1, b:l2]
Store maps memory locations to values
S = [l1 --> 5, l2 --> 7]
S' = S[12/l1]
defines a store S' such that
S'(l1) = 12 S'(l) = S(l) if l != l1
C++ values are objects
X(a1=l1, ..., an=ln)
is a C++ object where
X
is the class of the object
ai
are the attributes (including inherited ones)
li
is the location where the value of ai
is stored
Notice that the specification is deliberately keeping things as abstract as possible (avoiding over-specification). e.g., the layout of the object is not specified.
There are constant values in C/C++ that do not have associated memory locations; they only have values., e.g., (int)5
, (bool)true
, (char const *)"constant string", \ldots
There is a special value void
:
void
value
The evaluation judgement is:
s0, E,S |- e:v, S'
s0
the current (this/self) object
E
the current variable environment
S
the current store
e
terminates then
e
is v
S'
s0
and the current environment E
does not change by evaluating an expression. These are invariant under evaluation. However, the contents of the memory may change.
If a linked-list of two elements starting at variable head
exists in the current context, then the following is a possible state of E
and S
:
E = (head -> l1) S = (l1 -> NODE(1,l2); l2 -> NODE(2,nil))
Also notice: there is a qualification which says that if the evaluation of e
terminates. Read as "if e
terminates..."
"Result" of evaluation is a value and a new store
Some things don't change
Start with simple operational-semantic rules and work our way up to more complex rules.
------------------------------ [bool-true] s0,E,S |- true: bool(true), S
-------------------------------- [bool-false] s0,E,S |- false: bool(false), S
i is an integer literal ----------------------- [int] s0,E,S |- i: int(i), S
s is a string literal ---------------------------------- [string-lit] s0,E,S |- s: (char const *)(s), S
E(id) = lid S(lid) = v ---------------- [id] s0,E,S |- id:v,SThe rule above is specifically for the
rvalue
use of id
. For uses as an lvalue
,
we use explicit rules for each potential use as an lvalue, e.g.,
for assignment or address-of operations.
-------------------- [this] s0,E,S|- this: s0, S
s0,E,S |- e:v, S1 E(id) = lid S2 = S[v/lid] --------------------------- [assignment] s0,E,S |- id <-- e:v, S2(e.g.,
x <-- 1 + 1
)
s0,E,S |- e1:v1, S1 s0,E,S1 |- e2:v2, S2 --------------------------- [add] s0,E,S |- e1+e2 : v1+v2, S2(notice that the store used while evaluating
e2
includes the side-effects of evaluating e1
). These stores also dictate the order of evaluation of the expressions --- e1 needs to be evaluated first to get S1 which is needed by evaluation of e2
s0,E,S |- e1:v1, S1 s0,E,S1 |- e2:v2, S2 ----------------------- [stmt] s0,E,S |- e1;e2 : v2,S2
s0,E,S |- e:v,S1 ----------------------- [stmt-block] s0,E,S |- {e} : v,S1
Example: Consider the expression
{ X <-- 7 + 5; 4; }Let's say that initially,
s0=s0
, E=x:l
, l<-0
. Let's evaluate this expression in this context (start-state/environment/store).
.... ..... --------------------------------------------- ------------------------------- s0,[X:l],[l<-0] |- X<-7+5:12,[l<-12} s0,[x:l],[l:..] |- 4:4,[l:...] ------------------------------------------------------------------------ s0,[X:l],[l<-0] |- {X <- 7+5; 4 } : 4, [l:12]
s0,E,S |- e1:bool(true),S1 s0,E,S1 |- e2:v,S2 -------------------------------------- [ite-true] s0,E,S |- if e1 then e2 else e3 : v,S2
s0,E,S |- e1:bool(false),S1 s0,E,S1 |- e3:v,S3 -------------------------------------- [ite-false] s0,E,S |- if e1 then e2 else e3 : v,S3
s0,E,S |- e1 : bool(false), S1 ------------------------------------ [while-false] s0,E,S |- while (e1) {e2} : void, S1
s0,E,S |- e1 : bool(true), S1 s0,E,S1 |- e2 : v, S2 s0,E,S2 |- while e1 {e2} : void, S3 ------------------------------------ [while-true] s0,E,S |- while (e1) {e2} : void, S3
Partial rule
s0,E,S |- e1 : v1, S1 s0,?,? |- e2: v2, S2 --------------------------------------------- s0,E,S |- { Decl(id:T <-- e1); e2 } v2, S2In what context should
e2
be evaluated?
E
but with a new binding of id
to a fresh location lnew
.
S1
but with lnew
mapped to v1
.
We write lnew = newloc(S)
to say that lnew
is a location not already used in S
.
newloc
is like the memory allocation function.newloc
identifies a location that is unique from the domain of input S
. Notice that this is a superset of the range of E
, e.g., for a linked-list, only the head
variable maps to a location in E
but multiple links point to other allocated locations (and newloc
should return a location distinct from existing locations).Complete rule
s0,E,S |- e1 : v1, S1 lnew = newloc(S1) s0,E[lnew/id],S1[v1/lnew] |- e2: v2, S2 --------------------------------------------- s0,E,S |- { Decl(id:T <-- e1); e2 } v2, S2Notice that the
S2
at the end of the scope will still contain mappings for lnew
. We call this a memory leak. There are two types of consequences of this:
lnew
in its values (mapped through S
). This also puts additional burden on the code generator as the code generator is now burdened to preserve lnew
even at the end of the scope in which it was declared. For example, it can no longer allocate id
on the stack, as the lifetime of an AR on stack may be smaller than the lifetime of id
.
As an aside: if we wanted to allow only a bounded set of locations, then we could potentially remove lnew
from S2
(say S2-lnew
)
in the conclusion of the declaration rule.
new T
T
Let's assume that for each type A
, there is a default value D_A
. In reality, C/C++ have a notion of uninitialized variables, and associated undefined behaviour. We use:
D_int = int(0)
D_bool = bool(false)
D_(T *) = (T *)(NULL)
For a class A
, we write
class(A) = (a1:T1 <-- e1, a2:T2 <-- e2, ..., an <-- en}where
ai
are the attributes (including the inherited ones)
Ti
are the attributes' declared types
ei
are the initializers
class(C) = (a1:.., a2:..., b1:.., b2:.., c1:.., c2:..)
class(T) = (a1:T1 <- e1, ..., an:Tn <- en) l1,l2,...,ln = newlocs(S,n) v = T(a1=l1,..an=ln) S1=S[DT1/l1,...,DTn/ln] E'=[a1:l1, ..., an:ln] v,E',S1 |- { a1 <-- e1; ... an <-- en; } : vn, S2 ---------------------------------------------------------- s0,E,S |- new T : v,S2Notice that
E'
has nothing to do with E
, E'
is due to a completely different scope. Also notice that the evaluation of e1
, ..., en
follows the same order as in class T
(greatest ancestor first). Also notice that the initializers are evaluated with a new value of the self object (v
). Also notice that vn
is ignored.
newlocs(S,n)
generates n
distinct locs that are not already present in S
.
Summarizing:
this
is the current object
e0.f(e1,...,en)
e1,..., en
e0
be the target object
X
be the dynamic type of the target object
f
the definition of f
(with n
args).
n
new locations and an environment that maps f
's formal arguments to those locations
self
to the target object and evaluate f
's body
For a class A
and a method f
of A
(possibly inherited):
impl(A, f) = (x1, ..., xn, ebody)where
xi
are the names of the formal arguments
ebody
is the body of the method
Complete rule
s0,E,S |- e1 : v1,S1 s0,E,S1 |- e2 : v2,S2 ... s0,E,S(n-1) |- en : vn,Sn s0,E,Sn |- e0 : v0,S(n+1) v0 = X(a1=l1, ..., am=lm) impl(X,f) = (x1,..,xn, ebody) lxi = newloc(Sn+1) for i=1..n E' = [a1:l1, ..., am:lm][lx1/x1,...,lxn/xn] S(n+2) = S(n+1)[v1/lx1,...,vn/lxn] v0,E',S(n+2) |- ebody:v,S(n+3) ------------------------------------------------------- [dispatch] s0,E,S |- e0.f(e1,...,en) : v, S(n+3)-{lx1,lx2,...,lxn}We are interested in the class-tag of
v0
(X
) and its attributes. Notice that this is the dynamic type of the object v0. (This provides dynamic dispatch).
What are the names in scope of the method f
: all attributes and all the formal arguments. E'
assigns locations to all these names.
The construction of E'
is interesting because it has two square brackets: this is because it is possible that a formal parameter may have the same name as an attribute name, and we want to capture this overriding semantics.
The function body is evaluated in an updated store, where the locations of formal arguments are replaced by expression values (call-by-value). Also the self object is v0. Notice that E' has nothing to do with E
(static scoping). Dynamic scoping may have E' that is dependent on E.
Notice that we delete the locations lxi
(corresponding to call arguments) after finishing the execution of ebody
.