[edit]
# Separating Analysis from Execution

## Analyzing Evaluator

## The Separation

**Test Your Understanding**

What type of argument(s) does the procedure returned by analyze accept?

## Level Confusion

## So What?

## An Example

To work with the ideas in this section, get the analyzing metacircular evaluator:

```
cp ~cs61as/lib/analyze.scm .
```

The Metacircular Evaluator implementation in Lesson 12 is simple, but it is very inefficient because of how the syntactic analysis of expressions is interleaved with their execution. Thus, if a program is executed many times, its syntax is analyzed many times. Let's consider an example.

Suppose we’ve defined the `factorial`

function as follows:

```
(define (fact num)
(if (= num 0)
1
(* num (fact (- num 1)))))
```

What happens when we compute `(fact 3)`

?

```
eval (fact 3)
self-evaluating? ==> #f
variable? ==> #f
quoted? ==> #f
assignment? definition?
if? ==> #f
lambda? ==> #f
begin? ==> #f
cond? ==> #f
application? ==> #t
eval fact
self-evaluating? ==> #f
variable? ==> #t
lookup-variable-value ==> <procedure fact>
list-of-values (3)
eval3 ==> 3
apply <procedure fact> (3)
eval (if (= num 0) ...)
self-evaluating? ==> #f
variable? ==> #f
quoted? ==> #f
assignment? ==> #f
definition? ==> #f
if? ==> #t
eval-if (if (= num 0) ...)
if-predicate ==> (= num 0)
eval (= num 0)
self-evaluating? ==> #f
...
if-alternative ==> (* num (fact (- num 1)))
eval (* num (fact (- num 1)))
self-evaluating? ==> #f
...
list-of-values (num (fact (- num 1)))
...
eval (fact (- num 1))
...
apply <procedure fact> (2)
eval (if (= num 0) ...)
```

Four separate times, the evaluator has to examine the procedure body, decide that it’s an if expression, pull out its component parts, and evaluate those parts (which in turn involves deciding what type of expression each part is).

This is one reason why interpreted languages are so much slower than compiled languages: The interpreter does the syntactic analysis of the program over and over again. The compiler does the analysis once, and the compiled program can just do the part of the computation that depends on the actual values of variables. In this section, we will study the analyzing evaluator to see how to prevent the repetitive analysis of a program's syntax.

`eval`

takes two arguments, an expression and an environment. Of those, the expression argument is the same every time we revisit the same expression, whereas the environment will be different each time. For example, when we compute `(fact 3)`

, we evaluate the body of `fact`

in an environment in which `num`

has the value `3`

. That body includes a recursive call to compute `(fact 2)`

, in which we evaluate the same body, but now in an environment with `num`

bound to `2`

.

Our plan is to look at the evaluation process, find those parts which depend
only on `exp`

and not on `env`

, and do those only once. The procedure that
does this work is called `analyze`

.

What is the result of `analyze`

? It has to be something that can be combined
somehow with an environment in order to return a value. The solution is that
`analyze`

returns a procedure that takes only `env`

as an argument, and does
the rest of the evaluation.

Instead of

```
(eval exp env) ==> value
```

we now have

```
1. (analyze exp) ==> exp-procedure
2. (exp-procedure env) ==> value
```

What type of argument(s) does the procedure returned by analyze accept?

When we evaluate the same expression again, we only have to repeat step 2. What we’re doing is akin to memoization, in that we remember the result of a computation to avoid having to repeat it. The difference is that now we’re remembering something that’s only part of the solution to the overall problem, instead of a complete solution.

We can duplicate the effect of the original `eval`

this way:

```
(define (eval exp env)
((analyze exp) env))
```

`analyze`

`analyze`

has a structure similar to that of the original `eval`

:

```
(define (analyze exp)
(cond
((self-evaluating? exp)
(analyze-self-eval exp))
((variable? exp)
(analyze-var exp))
...
((foo? exp) (analyze-foo exp))
...))
```

The difference is that the procedures such as `eval-if`

that take an expression and an environment as arguments have been replaced by procedures such as `analyze-if`

that take only the expression as argument. How do these analysis procedures work? As an intermediate step in our understanding, here is a version of `analyze-if`

that exactly follows the structure of `eval-if`

and doesn’t save any time:

`eval-if`

:

```
(define (eval-if exp env)
(if (true? (eval (if-predicate exp) env))
(eval (if-consequent exp) env)
(eval (if-alternative exp) env)))
```

`analyze-if`

:

```
(define (analyze-if exp)
(lambda (env)
(if (true? (eval (if-predicate exp) env))
(eval (if-consequent exp) env)
(eval (if-alternative exp) env))))
```

This version of `analyze-if`

returns a procedure with `env`

as its argument,
whose body is exactly the same as the body of the original `eval-if`

.
Therefore, if we do

```
((analyze-if some-if-expression) some-environment)
```

the result will be the same as if we’d said

```
(eval-if some-if-expression some-environment)
```

in the original metacircular evaluator.

But we’d like to improve on this first version of `analyze-if`

because it
doesn’t really avoid any work. Each time we call the procedure that ```
analyze-
if
```

returns, it will do all of the work that the original `eval-if`

did.

The first version of `analyze-if`

contains three calls to `eval`

. Each of
those calls does an analysis of an expression and then a computation of the
value in the given environment. What we’d like to do is split each of those
`eval`

calls into its two separate parts, and do the first part only once, not
every time:

```
(define (analyze-if exp)
(let ((pproc (analyze (if-predicate exp)))
(cproc (analyze (if-consequent exp)))
(aproc (analyze (if-alternative exp))))
(lambda (env)
(if (true? (pproc env))
(cproc env)
(aproc env)))))
```

In this final version, the procedure returned by `analyze-if`

doesn’t contain
any analysis steps. All of the components were already analyzed before we call
that procedure, so no further analysis is needed.

The biggest gain in efficiency comes from the way in which `lambda`

expressions are handled. In the original metacircular evaluator, leaving out
some of the data abstraction for clarity here, we have

```
(define (eval-lambda exp env) (list ’procedure exp env))
```

The evaluator does essentially nothing for a `lambda`

expression except to
remember the procedure’s text and the environment in which it was created. But
in the analyzing evaluator we analyze the body of the procedure (using the
`analyze-sequence`

procedure); what is stored as the representation of the
procedure does not include its text! Instead, the evaluator represents a
procedure in the metacircular Scheme as a procedure in the underlying Scheme,
along with the formal parameters and the defining environment.

(Be sure to read Section 4.1.7 from SICP to see how all of the syntactic analysis procedures are implemented).

The analyzing evaluator turns an expression such as

```
(if A B C)
```

into a procedure

```
(lambda (env)
(if (A-execution-procedure env)
(B-execution-procedure env)
(C-execution-procedure env)))
```

This may seem like a step backward; we’re trying to implement `if`

and we end
up with a procedure that does an `if`

. Isn’t this an infinite regress?

No, it isn’t. The `if`

in the execution procedure is handled by the underlying
Scheme, not by the metacircular Scheme. Therefore, there’s no regress; we
don’t call `analyze-if`

for that one. Also, the `if`

in the underlying Scheme
is much faster than having to do the syntactic analysis for the `if`

in the
meta-Scheme.

The syntactic analysis of expressions is a large part of what a compiler does. In a sense, this analyzing evaluator is a compiler! It compiles Scheme into Scheme, so it’s not a very useful compiler, but it’s really not that much harder to compile into something else, such as the machine language of a particular computer.

A compiler whose structure is similar to this one is called a *recursive descent* compiler. Today, in practice, most compilers use a different
technique (called a stack machine) because it’s possible to automate the
writing of a parser that way. (I mentioned this earlier as an example of data-
directed programming.) But if you’re writing a parser by hand, it’s easiest to
use recursive descent.

(Be sure to read section 4.1.7 of SICP before proceeding).

Here is a nice example of evaluating `factorial`

using the analyzing
evaluator. Let's consider the following Scheme code:

```
(define factorial
(lambda (n)
(if (= n 1)
1
(* (factorial (- n 1)) n))))
(factorial 2) ;; low argument, so that the example is not too long)))
```

There are two statements here: one definition and one application.

We start with the definition, which we will call `d`

here (where `d`

stands
for '(define (factorial n) ...)'):

```
(eval d env)
((analyze d) env)
((analyze-definition d) env)
```

`analyze-definition`

will first analyze the `definition-value`

and then create
an execution procedure that, when executed, will `define`

the variable name to
the analyzed `definition-value`

.

This point is crucial. We are not just assigning a `lambda`

to `factorial`

, we
are assigning an *analyzed* `lambda`

to `factorial`

. This will provide a
performance boon later on.

So, to figure out the value of `factorial`

, we analyze the `lambda`

, with...
`analyze-lambda`

, of course (through the dispatch in analyze).

The boon that `analyze-lambda`

provides is really from analyzing the body
*once*, and then making a procedure Abstract Data Type with an `analyzed`

body
(a scheme procedure), instead of a simple list of instructions, like in the
old `eval`

.

The point is that, on invocations of our `lambda`

, we won't have to deal with
parsing. Parsing will *only* be done upon creating the `lambda`

.

Let's see this in action.

(NOTE: I'll be using `:=`

as a way to denote storing:

```
var := value
```

This isn't really scheme, but I think it's easier than having a bunch of `let`

statements.)

```
(analyze-lambda '(lambda (n) ...)')
```

Now we need to `analyze`

the body, then store it for later, so that we don't
redundantly `analyze`

the body again.

```
analyzed-body := (analyze (lambda-body '(lambda (n) (if ...))'))
(analyze-if '(if (= n 1)
1
(* (factorial (- n 1)) n))')
```

`analyze-if`

analyzes everything it's given, stores it, and then creates a new
execution procedure with those stored values.

```
if-pred := (analyze '(= n 1)')
; this is the execution procedure: (lambda (env)
; (execute-application (analyzed/= env)
if-true := (analyze '1')
; this is the execution procedure: (lambda (env) 1)
if-false := (analyze '(* (factorial (- n 1)) n)')
; this is too long to write out, but it's
; kind of like if-pred
;;this is the execution procedure we return:
;;let's call this execution procedure 'analyzed-fact-if'
(lambda (env)
(if (true? (if-pred env))
(if-true env)
(if-false env)))
```

And now that we know that result, let's go back to `analyze-lambda`

.

```
analyzed-body := analyzed-fact-if
(analyze-lambda '(lambda (n) ...)')
=> (lambda (env) (make-procedure '(n) analyzed-body env'))
```

We store the last expression into the `factorial`

variable, and we're done
defining `factorial`

. Note that we only analyze the body *ONCE*: during the
analysis stage. We never `analyze`

during the evaluation stage! This means
that during evaluation, every time we call this `factorial`

function, we know
its body contains an `if`

statement, and that the `if`

statement checks if `n`

equals `0`

(and what to do if the predicate is true or false).

Now, on to evaluating factorial. This is where you'll see all the cryptic analyzing work pay off.

```
(eval '(factorial 2) env') ; env has factorial definition
((analyze '(factorial 2)') env)
((analyze-application '(factorial 2)') env)
((lambda (env) (execute-application ...)) env)
((procedure-body {internal factorial value})
(extend-environment ...)) ; extend-environment is same as old eval
;; let's call the extended environment, env2
(analyzed-body env2) ; analyzed-body from definition above
((lambda (env)
(if (true? (if-pred env))
(if-true env)
(if-false env)))
env2)
(if (true? (if-pred env2)) ; (= n 0)
(if-true env2) ; 1
(if-false env2)) ; (* (factorial (- n 1)) n)
```

Here, `n = 2 != 0`

, so we'll end up calling executing `(if-false env2)`

. `if-false`

will do an application of `*`

to `(factorial (- n 1))`

and `n`

, but
these arguments have already been `analyzed`

(when we did `analyze-lambda`

).
So we evaluate the analyzed `(factorial (- n 1))`

, which is:

```
(analyzed-factorial {result of calling analyzed (- n 1)})
(analyzed-body env3)
;env3 := env2 extended with n := (- {previous n} 1) = (- 2 1) = 1
(if (true? (if-pred env3)) ; (= n 0)
(if-true env3) ; 1
(if-false env3)) ; (* (factorial (- n 1)) n)
```

We recurse again, in the same fashion:

```
(if (true? (if-pred env4)) ; (= n 0)
(if-true env4) ; 1
(if-false env4)) ; (* (factorial (- n 1)) n)
```

Here, `n`

actually equals `0`

, so we call `(if-true env4).`

`if-true`

disregards `env4`

and returns the number `1`

. Then, we go back to all the
execute-application primitive applications and multiply everything together.

And we get... `2`

.

So, we're done.

Notice that during the evaluation phase, we never check the syntax of a
statement. The syntax has already been looked at, and `analyzed`

. We simply
carry out what these `analyzed`

statements tell us to do. Think about the gain
in efficiency here when computing something like `(factorial 100)`

.

12 - Analyzing Evaluator and Lazy Evaluator