COL729 Lab 1 : Anticipated (Needed) Expressions Analysis

Due date - 23:55, 7 April 2021

Weightage: 40%

Overview

In this assignment you will implement Anticipated or Needed Expressions Analysis (notes, video). Make sure you understand it before proceeding ahead.

In order to implement the analysis, we need (broadly) two things: (1) a framework to represent programs as flow graphs which and, (2) a DFA frawework within which we can implement our analysis. For this assignment we will work with the compilerai framework (developed in-house here at IITD). We give a brief overview of the framework below (for more details refer to doc/ directory of superopt-project).

Compilerai framework

Programs are represented as Control Flow Graphs (CFGs) and are referred to as TFGs (Transfer Function Graphs) in the code.

Instructions

The source code for the framework is available in the form of a Git repository in the server machine provided to you for lab work. You will clone it in your home directory and work on it from there only (see also: doc/GettingStarted.md and doc/discipline.md).

Getting started

Please follow the following steps to get started:

  1. Clone the Git repo in your local: scratch directory: git clone --recursive /home/abhishek/scratch/superopt-project
    You are advised to use scratch as the rootfs does not have enough free space for everyone.
  2. Change directory to superopt-project.
  3. Copy the tars directory: cp -r /home/abhishek/scratch/superopt-project/tars ..
  4. Build the project: make.
    If you get any issues in this step, please make a post on Piazza with full logs (i.e. the full output of make).

At this point we have completed the initial setup and can now start working with the code.

Change directory to superopt, this is the repository where you will implement the analysis. From now we will assume that you are already in the root of superopt. The primary places where you need to make changes have been marked with COL729_LAB1 TODO. Search for them using git grep 'COL729_LAB1 TODO'.

Working with the code

Recall that in our framework a program is represented using TFG, in the code we have a class named tfg for the same. In fact, tfg is part of a hierarchy of classes which build upon each other to achieve more complex functionality. At the bottom of this hierarchy is the graph class which is mostly concerned with "graph" properties i.e. edges and nodes. On the other hand, at the top are specialized classes such as tfg_llvm_t which represent a TFG created from LLVM IR. The relevant source code paths are include/graph/ and include/tfg/.

The DFA framework builds upon graph and provides an implementation of the worklist algorithm. To implement an analysis, one needs to subclass the data_flow_analysis and provide overriding implementation for the xfer_and_meet() function which accepts 3 parameters: (1) the corresponding edge, (2) input value (i.e. IN), (3) output value (i.e. OUT) passed as reference. The function is supposed to compute the new OUT value using the IN value and the old OUT value (by using meet() for the latter), and indicate as return value whether there was a change in OUT value. The DFA framework itself is defined in include/graph/dfa.h, and the files include/graph/avail_exprs_dfa.h, include/graph/liveness_dfa.h contain implementation of available expressions and liveness analysis using the DFA framework. The available expressions analysis is actually implemented as a super dataflow analysis. You can find the implementation of its xfer_and_meet() in lib/graph/graph_with_aliasing.cpp, line 1362 under the function name avail_exprs_xfer_and_meet().

Studying the implementation of xfer_and_meet of available expressions analysis would give you enough information to implement the anticipated expressions analysis. Most things remain same in both analyses: you would need the expressions generated and killed in each edge (populate_gen_and_kill_sets_for_edge()), you would have to kill those expressions which depend on a killed expression (get_locs_potentially_read_in_expr_using_locs_map() and set_intersect(deps, killed_locs)) and so on.

On locs: As you must have noticed, instead of returning expressions (or set of expr_ref), the function populate_gen_and_kill_sets_for_edge() returns locs. A loc is one of the following: an LLVM variable, a register, or a unit of memory whose size and location are known (these are called memory slots or memslot for short). It represents a unit of memory about which the framework can reason about. For example, a write to an LLVM variable would modify the loc corresponding to that variable, a write to a memory location would modify all memory slots whose address can potentially alias with the target of memory write and so on. It is possible to convert a loc (graph_loc_id_t in code) to an expression (expr_ref) and vice-versa using two special mappings maintained by TFG: exprid2locid_map and locid2expr_map.
On expressions: We represent an expression with the expr class (see include/expr/expr.h). In the code, however, you would mostly see references to expr_ref only. This is because an expr_ref is a reference to an immutable expr object which is reference-counted in order to simplify memory management.

In order to simplify things for the lab assignment and allow you to focus on only the important parts, some skeleton code has been provided. In particular:

Running and testing your changes

Once you have filled-in the required code, you need to first uncomment the lines 575-587 and comment line 588 to run the anticipated expressions analysis.

Steps for building and running the code:

  1. Run make in superopt-project to build your changes.
  2. Now switch directory to lab1_test and run ./c2bc.sh a.c to compile C file a.c to LLVM bitcode a.c.bc.
  3. While still in lab1_test, run ./run.sh a.c.bc to convert LLVM bitcode to TFG and run the analysis. The output TFG file name will be named ${INPUT_FILENAME}.etfg, e.g., a.c.bc.etfg for a.c.bc. Log (stdout) of run is automatically saved in last_run.log.
    You can turn on debug output by appending --dyn_debug=DEBUG_CLASS to llvm2tfg command in run.sh. DYN_DEBUG statements in your code will get executed if their DEBUG_CLASS matches the supplied DEBUG_CLASS. For example, in order execute this statement:
    DYN_DEBUG(anticipated_analysis, cout << "Here" << endl);
    you will supply --dyn_debug=anticipated_analysis to llvm2tfg.
Visual representation of TFG: For debugging you may also find the etfg2dot tool helpful. etfg2dot converts an LLVM TFG to Graphviz's dot file, which can then be viewed graphically using xdot or similar dot viewers. You can build etfg2dot by switching to superopt and then running make etfg2dot. For running, use this command: superopt/build/etfg_i386/etfg2dot --collapse false ${FunctionName} ${TFGfile}, e.g. superopt/build/etfg_i386/etfg2dot --collapse false sum a.c.bc.etfg.
Interpreting the results
The result of your anticipated expressions analysis will be available in serialized format in the output TFG file: search for text =Anticipated expressions in. Example:
=Anticipated expressions in src.llvm.sum
=anticipated expressions at L0%0%1 in src.llvm.sum
=anticipated_exprs begin
=expr
1 : 0 : BV:32
2 : input.src.llvm-%n : BV:32
3 : bvslt(1, 2) : BOOL
=expr
=anticipated_exprs end
=anticipated expressions at Lentry%1%1 in src.llvm.sum
=anticipated_exprs begin
...
...
...
=anticipated_exprs end
=anticipated expressions at Lfor.cond%1%1 in src.llvm.sum
...
...
...
The above example represents the following information: at PC L0%0%1, the expression bvslt(0, input.src.llvm-%n), i.e. (0 < %n) is anticipated (we can probably guess here that the first statement of the corresponding C code has a check with n > 0 in it, where n is a parameter).
Requirements
Your implementation should be able find all anticipated expressions at every PC in the input program. As the framework can transparently handle things like aliasing and function calls, your implementation should not assume absence of any program features.

Turn in

Submit your changes to superopt repository in the form of a compressed patch file:

NOTE: It is your responsibility to ensure that a sequence of gunzip -c ${your_group_members_entry_nums}.gz | patch -p1, when executed in superopt root with HEAD at col728_lab1, produces a tree equivalent to yours.

NOTE: Please start early in this lab. This is the first time we are using this framework for teaching, therefore the documentation (and code) is not very beginner friendly. We may not have anticipated all newbie problems and you might have trouble understanding the code/docs initially.