Introduction
If src has been computed before then some previous instruction has had the same csrc and has entered it in the cache (see Step 5). Thus the cheapest expression for this src is the cheapest s-expr in csrc's equivalence class. If a cheaper expression for src is found, it replaces src in the original register transfer. Otherwise, src is not disturbed.
A similar procedure replaces addr with its cheapest equivalent.
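The replacement step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: s-exprs are modeled as plain strings, the cache is a dictionary from a canonical source (csrc) to the members of its equivalence class, and `cost` is a hypothetical stand-in for the machine-dependent cheapness function.

```python
def cost(sexpr):
    """Hypothetical cheapness function: registers < constants < memory < other."""
    if sexpr.startswith("r["):
        return 0
    if sexpr.lstrip("-").isdigit():
        return 1
    if sexpr.startswith("m["):
        return 2
    return 3


def cheapest_equivalent(src, csrc, cache):
    """Return the cheapest cached s-expr equivalent to src, or src itself.

    cache maps a canonical source (csrc) to the list of s-exprs known
    to hold the same value (its equivalence class).
    """
    candidates = cache.get(csrc, [])
    best = min(candidates, key=cost, default=src)
    # Replace src only if the cached expression is strictly cheaper.
    return best if cost(best) < cost(src) else src


# Assumed example: an earlier instruction loaded m[fp+8] into r[3].
cache = {"m[fp+8]": ["m[fp+8]", "r[3]"]}
replaced = cheapest_equivalent("m[fp+8]", "m[fp+8]", cache)
untouched = cheapest_equivalent("m[fp+12]", "m[fp+12]", cache)
```

Here the memory reference is replaced by the register that already holds its value, while a source with no cache entry is left undisturbed.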
Cheapness is determined by a machine-dependent function described below. cacher also creates and maintains use lists, which link the instructions that use each particular s-expr. When an instruction is first encountered, it is added to the use lists of each s-expr it uses. When an instruction is changed to use a cheaper reference, it is removed from the use lists of the s-exprs it had used, and it is added to the use lists of the new s-exprs that it now uses. When a register's use list becomes empty, the instruction that loaded that register is deleted and removed from the use lists of the s-exprs it had used. This may trigger further deletions and removals, recursively. This prompt removal from use lists assumes that the code generator uses each temporary just once. This simplifies management of temporaries for all parties yet sacrifices nothing because the code generator may use as many registers as it needs.
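The use-list bookkeeping described above might look like the sketch below. The class and method names (`Instr`, `UseLists`, `add`, `retarget`) are assumptions made for illustration, s-exprs are again plain strings, and the sketch relies on the stated assumption that each temporary is loaded by exactly one instruction.

```python
from collections import defaultdict


class Instr:
    def __init__(self, dst, uses):
        self.dst = dst            # register loaded, or None for a store
        self.uses = list(uses)    # s-exprs this instruction reads
        self.deleted = False


class UseLists:
    def __init__(self):
        self.users = defaultdict(set)  # s-expr -> instructions that use it
        self.loader = {}               # register -> instruction that loaded it

    def add(self, instr):
        """Record a newly encountered instruction on the use lists of its s-exprs."""
        if instr.dst is not None:
            self.loader[instr.dst] = instr
        for s in instr.uses:
            self.users[s].add(instr)

    def retarget(self, instr, new_uses):
        """Switch instr to cheaper references and update the use lists."""
        old = instr.uses
        instr.uses = list(new_uses)
        for s in instr.uses:
            self.users[s].add(instr)
        for s in old:
            if s not in instr.uses:
                self._drop(s, instr)

    def _drop(self, sexpr, instr):
        """Remove instr from sexpr's use list; delete the loader if the list empties."""
        self.users[sexpr].discard(instr)
        loader = self.loader.get(sexpr)
        if loader is not None and not self.users[sexpr] and not loader.deleted:
            loader.deleted = True
            for s in loader.uses:      # may trigger further deletions
                self._drop(s, loader)


# r1 = m[a]; r2 = f(r1); r3 = g(r2). Then r3's instruction is retargeted to r0.
i1, i2, i3 = Instr("r1", ["m[a]"]), Instr("r2", ["r1"]), Instr("r3", ["r2"])
lists = UseLists()
for ins in (i1, i2, i3):
    lists.add(ins)
lists.retarget(i3, ["r0"])
```

After the retargeting, the use list of r2 is empty, so its loader is deleted; that in turn empties the use list of r1 and deletes its loader as well.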
Finally, cacher may be asked to keep some values out of registers. For example, the PDP-11 has so few allocatable registers that it may be better to save them for values harder to access than, say, constants. Code generators implement this by marking instructions that load such values. cacher treats marked instructions like all others, except that it will not reuse their cdsts in step 4 above. A marked instruction may, however, be deleted if it provides input to a larger redundant computation.
Though simple, cacher does several optimizations at once. Among these are redundant load elimination, common subexpression elimination, dead-variable identification, and peephole definition.
Redundant Load Elimination
Since register references are cheaper than memory references, the algorithm above will remove loads. Also, some instructions have side effects (e.g., divisions often yield a remainder as well as a quotient) that cannot be used in higher-level CSE because they do not appear at a higher level. Similarly, expanding comparisons requires, on some machines, a subtraction followed by a comparison with zero. The difference may be redundant, but it cannot be recognized as such at a higher level because it does not appear until after the machine-dependent stages of code expansion. The situations that create machine-specific common subexpressions are ad hoc and hard to enumerate, but they occur nonetheless. 
Dead-variable Identification
cacher records where cells are last used, so it passes this information to the peephole optimizer and the register assigner, which can better combine instructions and assign registers if they know where cells die. Compilers usually identify dead variables earlier, but just as machine-level CSE permits a few new optimizations, so does machine-level dead-variable analysis. For example, many calling sequences return function values in a fixed register. After the function return, the calling sequence often moves the value to another register, in case the special register is needed again. If it is not needed again, this move will prove unnecessary. Conventional compilers identify such avoidable moves deep in a machine-dependent code generator. cacher identifies such moves with a far more general operation.
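As a rough illustration of recording where cells die, the sketch below scans a straight-line instruction list and notes the index of the last instruction that reads each cell. The tuple representation `(dst, srcs)` and the helper names are assumptions for this example, not the paper's data structures.

```python
def last_uses(instrs):
    """Map each cell to the index of the last instruction that reads it."""
    last = {}
    for i, (_dst, srcs) in enumerate(instrs):
        for s in srcs:
            last[s] = i
    return last


def dies_at(last, cell, index):
    """True if the instruction at index is the final reader of cell."""
    return last.get(cell) == index


# Assumed calling sequence: the function value arrives in r0 and is
# copied to r5 in case r0 is needed again.
code = [
    ("r0", ["call f"]),   # function value returned in fixed register r0
    ("r5", ["r0"]),       # defensive move out of the fixed register
    ("m[x]", ["r5"]),     # store the result
]
last = last_uses(code)
move_is_avoidable = dies_at(last, "r0", 1)
```

Because r0 dies at the move, a later pass knows that the copy into r5 is a candidate for removal.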
Register Assignment
assigner maps the pseudo-registers onto the real registers. It assigns a real register to each pseudo-register, and it replaces each use of the pseudo-register with the associated real one. It frees the real register when the pseudo-register dies. ences. This function must be revised for each machine, but the change typically affects fewer than ten lines of code.
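The pseudo-to-real mapping performed by assigner can be sketched as below. This is a simplified illustration under stated assumptions: instructions are `(dst, srcs)` tuples, `last_use` gives the index at which each pseudo-register dies, spilling is not modeled, and a pseudo-register that is never read is never freed.

```python
def assign(instrs, last_use, real_regs):
    """Rewrite pseudo-registers to real ones, freeing each real register at death."""
    free = list(real_regs)
    mapping = {}                      # live pseudo-register -> real register
    out = []
    for i, (dst, srcs) in enumerate(instrs):
        # Replace each use of a pseudo-register with its real register.
        new_srcs = [mapping.get(s, s) for s in srcs]
        # Free the real registers of pseudo-registers that die here.
        for s in srcs:
            if s in mapping and last_use.get(s) == i:
                free.append(mapping.pop(s))
        new_dst = dst
        if dst is not None:
            if not free:
                raise RuntimeError("out of real registers (spilling not modeled)")
            mapping[dst] = free.pop(0)
            new_dst = mapping[dst]
        out.append((new_dst, new_srcs))
    return out


code = [
    ("t1", ["m[a]"]),
    ("t2", ["m[b]"]),
    ("t3", ["t1", "t2"]),
    (None, ["t3"]),                   # a use with no register result
]
deaths = {"t1": 2, "t2": 2, "t3": 3}
assigned = assign(code, deaths, ["r0", "r1"])
```

In this example three pseudo-registers fit into two real registers, because t1 and t2 die at the instruction that defines t3.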
The interference function reports a conflict when assignment to cdst invalidates a cache entry src. This happens when cdst appears in src, when cdst indexes an array used in src, when cdst indexes a global array and src uses a parameter array, and when cdst indexes a parameter array and src uses a global array. These rules involve fewer than ten lines of machine-dependent code, though languages with more opportunities for aliasing [Aho] than Y (e.g., pointers) might require a few more.
assigner is retargeted by replacing the patterns that identify the names and numbers of the machine registers and by giving code templates for loading and storing such registers. These changes are typically simpler than those made to cacher.
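The four interference rules stated above can be written down compactly. The sketch below is an assumption-laden model rather than the paper's code: each cell is a hypothetical `Ref` with a name and a kind, a cache entry is reduced to the list of cells it reads, and array indices are ignored so that any write to an array conflicts with any read of the same array.

```python
from dataclasses import dataclass

ARRAY_KINDS = {"global_array", "param_array"}


@dataclass(frozen=True)
class Ref:
    name: str
    kind: str   # "scalar", "global_array", or "param_array"


def interferes(cdst, src_refs):
    """True if assigning to cdst may invalidate a cache entry reading src_refs."""
    for ref in src_refs:
        # cdst appears in src, or cdst indexes an array used in src.
        if ref.name == cdst.name:
            return True
        # A global array and a parameter array may name the same storage.
        if (cdst.kind in ARRAY_KINDS and ref.kind in ARRAY_KINDS
                and cdst.kind != ref.kind):
            return True
    return False


same_scalar = interferes(Ref("i", "scalar"), [Ref("i", "scalar")])
global_vs_param = interferes(Ref("g", "global_array"), [Ref("p", "param_array")])
two_globals = interferes(Ref("g", "global_array"), [Ref("h", "global_array")])
```

Two distinct global arrays do not conflict in this model, while a global array and a parameter array are conservatively assumed to alias.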
Discussion
Conventional code generators optimize as early as possible. This often simplifies the requisite analysis and avoids machine-dependence, but it may sacrifice some code quality. Whenever an intermediate code is expanded, it is possible for the expansion to introduce optimizable patterns that will be missed by 'early' optimizers.
Experience with cacher shows that at least one optimization traditionally applied to machine-independent triples or quadruples can be applied at reasonable cost to equivalent register transfers.
It is now natural to seek other optimizations that can be usefully applied to object code. For example, address expansion often produces code that can be moved out of loops and that needs global register allocation. Work in progress is adapting such existing optimizations to the machine level, but other optimizations may merit similar treatment.
