We must specify for every C expression what happens when it is evaluated. (this is not exactly C, but close).
The definition of a programming language:
We have specified evaluation rules indirectly
This is a complete description
Assembly-language descriptions of language implementations have irrelevant detail
Another approach: reference implementations. Again, there will be artifacts of the particular way it was implemented that you didn't mean to be a part of the language (over-specification)
Many ways to specify semantics
Operational semantics
Denotational semantics
Axiomatic semantics
Notation: Logical rules of inference, as in type checking
Recall the typing judgement
Context |- e:CIn the given context, expression
e
has type C
.
We use something similar for evaluation
Context |- e:vIn the given context (the meaning of context is different from the context for typing judgements), expression
e
evaluates to value v
.
Example
Context |- e1: 5 Context |- e2 : 7 ----------------- Context |- e1+e2 : 12The
Context
does not do much in this rule. In general, Context
may provide mappings from variables to values for the variables used in e1
and e2
.
Example
y <-- x + 1We track variables and their values with:
A variable environment maps variables to locations
E = [a:l1, b:l2]
Store maps memory locations to values
S = [l1 --> 5, l2 --> 7]
S' = S[12/l1]
defines a store S' such that
S'(l1) = 12 S'(l) = S(l) if l != l1
C++ values are objects
X(a1=l1, ..., an=ln)
is a C++ object where
X
is the class of the object
ai
are the attributes (including inherited ones)
li
is the location where the value of ai
is stored
Notice that the specification is deliberately keeping things as abstract as possible (avoiding over-specification). e.g., the layout of the object is not specified.
There are constant values in C/C++ that do not have associated memory locations; they only have values., e.g., (int)5
, (bool)true
, (char const *)"constant string", \ldots
There is a special value void
:
void
value
The evaluation judgement is:
s0, E,S |- e:v, S'
s0
the current (this/self) object
E
the current variable environment
S
the current store
e
terminates then
e
is v
S'
s0
and the current environment E
does not change by evaluating an expression. These are invariant under evaluation. However, the contents of the memory may change.
Also notice: there is a qualification which says that if the evaluation of e
terminates. Read as "if e
terminates..."
"Result" of evaluation is a value and a new store
Some things don't change
Start with simple operational-semantic rules and work our way up to more complex rules.
------------------------------ [bool-true] s0,E,S |- true: bool(true), S
-------------------------------- [bool-false] s0,E,S |- false: bool(false), S
i is an integer literal ----------------------- [int] s0,E,S |- i: int(i), S
s is a string literal ---------------------------------- [string-lit] s0,E,S |- s: (char const *)(s), S
E(id) = lid S(lid) = v ---------------- [id] s0,E,S |- id:v,S
-------------------- [this] s0,E,S|- this: s0, S
s0,E,S |- e:v, S1 E(id) = lid S2 = S[v/lid] --------------------------- [assignment] s0,E,S |- id <-- e:v, S2(e.g.,
x <-- 1 + 1
)
s0,E,S |- e1:v1, S1 s0,E,S1 |- e2:v2, S2 --------------------------- [add] s0,E,S |- e1+e2 : v1+v2, S2(notice that the store used while evaluating
e2
includes the side-effects of evaluating e1
). These stores also dictate the order of evaluation of the expressions --- e1 needs to be evaluated first to get S1 which is needed by evaluation of e2
s0,E,S |- e1:v1, S1 s0,E,S1 |- e2:v2, S2 ----------------------- [stmt] s0,E,S |- e1;e2 : v2,S2
s0,E,S |- e:v,S1 ----------------------- [stmt-block] s0,E,S |- {e} : v,S1
Example: Consider the expression
{ X <-- 7 + 5; 4; }Let's say that initially,
s0=s0
, E=x:l
, l<-0
. Let's evaluate this expression in this context (start-state/environment/store).
.... ..... --------------------------------------------- ------------------------------- s0,[X:l],[l<-0] |- X<-7+5:12,[l<-12} s0,[x:l],[l:..] |- 4:4,[l:...] ------------------------------------------------------------------------ s0,[X:l],[l<-0] |- {X <- 7+5; 4 } : 4, [l:12]
s0,E,S |- e1:bool(true),S1 s0,E,S1 |- e2:v,S2 -------------------------------------- [ite-true] s0,E,S |- if e1 then e2 else e3 : v,S2
s0,E,S |- e1:bool(false),S1 s0,E,S1 |- e3:v,S3 -------------------------------------- [ite-false] s0,E,S |- if e1 then e2 else e3 : v,S3
s0,E,S |- e1 : bool(false), S1 ------------------------------------ [while-false] s0,E,S |- while (e1) {e2} : void, S1
s0,E,S |- e1 : bool(true), S1 s0,E,S1 |- e2 : v, S2 s0,E,S2 |- while e1 {e2} : void, S3 ------------------------------------ [while-true] s0,E,S |- while (e1) {e2} : void, S3
Partial rule
s0,E,S |- e1 : v1, S1 s0,?,? |- e2: v2, S2 --------------------------------------------- s0,E,S |- { Decl(id:T <-- e1); e2 } v2, S2In what context should
e2
be evaluated?
E
but with a new binding of id
to a fresh location lnew
.
S1
but with lnew
mapped to v1
.
We write lnew = newloc(S)
to say that lnew
is a location not already used in S
.
newloc
is like the memory allocation function
Complete rule
s0,E,S |- e1 : v1, S1 lnew = newloc(S1) s0,E[lnew/id],S1[v1/lnew] |- e2: v2, S2 --------------------------------------------- s0,E,S |- { Decl(id:T <-- e1); e2 } v2, S2
new T
T
Let's assume that for each type A
, there is a default value D_A
. In reality, C/C++ have a notion of uninitialized variables, and associated undefined behaviour.
D_A = void
for any A
(could have had default values for some types, e.g., D_int = int(0)
).
For a class A
, we write
class(A) = (a1:T1 <-- e1, a2:T2 <-- e2, ..., an <-- en}where
ai
are the attributes (including the inherited ones)
Ti
are the attributes' declared types
ei
are the initializers
class(C) = (a1:.., a2:..., b1:.., b2:.., c1:.., c2:..)
class(T) = (a1:T1 <- e1, ..., an:Tn <- en) l1 = newloc(S) for i = 1,..,n v = T(a1=l1,..an=ln) S1=S[DT1/l1,...,DTn/ln] E'=[a1:l1, ..., an:ln] v,E',S1 |- { a1 <-- e1; ... an <-- en; } : vn, S2 ---------------------------------------------------------- s0,E,S |- new T : v,S2Notice that
E'
has nothing to do with E
, E'
is due to a completely different scope. Also notice that the evaluation of e1
, ..., en
follows the same order as in class T
(greatest ancestore first). Also notice that the initializers are evaluated with a new value of the self object (v
). Also notice that vn
is ignored.
Summarizing:
this
is the current object
e0.f(e1,...,en)
e1,..., en
e0
be the target object
X
be the dynamic type of the target object
f
the definition of f
(with n
args).
n
new locations and an environment that maps f
's formal arguments to those locations
self
to the target object and evaluate f
's body
For a class A
and a method f
of A
(possibly inherited):
impl(A, f) = (x1, ..., xn, ebody)where
xi
are the names of the formal arguments
ebody
is the body of the method
Complete rule
s0,E,S |- e1 : v1,S1 s0,E,S1 |- e2 : v2,S2 ... s0,E,S(n-1) |- en : vn,Sn s0,E,Sn |- e0 : v0,S(n+1) v0 = X(a1=l1, ..., am=lm) impl(X,f) = (x1,..,xn, ebody) lxi = newloc(Sn+1) for i=1..n E' = [a1:l1, ..., am:lm][x1/lx1,...,xn/lxn] S(n+2) = S(n+1)[v1/lx1,...,vn/lxn] v0,E',S(n+2) |- ebody:v,S(n+3) ---------------------------------------------- [dispatch] s0,E,S |- e0.f(e1,...,en) : v, S(n+3)We are interested in the class-tag of
v0
(X
) and its attributes. Notice that this is the dynamic type of the object v0. (This provides dynamic dispatch).
What are the names in scope of the method f
: all attributes and all the formal arguments. E'
assigns locations to all these names.
The construction of E'
is interesting because it two square brackets: this is because it is possible that a formal parameter may have the same name as an attribute name, and we want to capture this overriding semantics.
The function body is evaluated in an updated store, where the locations of formal arguments are replaced by expression values (call-by-value). Also the self object is v0. Notice that E' has nothing to do with E
(static scoping). Dynamic scoping may have E' that is dependent on E.