We present a VLSI synthesis environment dedicated to the design of image processing architectures. The environment includes a "front-end" data-flow emulator for validating algorithms and an RTL-synthesis system called ALPHA. The latter implements a stochastic search of the design space and produces efficient solutions for the "restricted" domain of target applications. Two simulated annealing (SA) algorithms run in sequence, first for data-path synthesis (scheduling and module selection) and then for control synthesis and data-path completion (binding). An interesting feature of the first optimization is its use of the regularity of the data-flow graph to predict the influence of control on the future design. A few designs have already been compiled under this environment, including a default detector presented here.
INTRODUCTION
Today's demand for both high-performance and severely constrained machine vision functions leads to an ever-increasing diversification of algorithms while design cycles get shorter. Although the computing power of general-purpose microprocessors and DSPs is increasing dramatically accordingly, it is still surprisingly difficult to find a good enough ( A global approach to the automatic synthesis of complex image processing systems therefore appears necessary. The automatic mapping of algorithms onto specific target architectures is the major problem yet unsolved.
We advocate a "functional approach" to describing both algorithms and architectures through a single formalism. The "functional decomposition" of vision tasks has been studied in our lab for approximately a decade and has led to the design of three complete environments: one for algorithmic design based on analogies [1], a second for hardware automata emulation based on ASIC assembly and intuitive programming [2], and a third, an intermediate version, to be used here [3, 4]. It is a specialized highly parallel computer coupled to the high level 
3 Scheduling Algorithm
Once an arbitrary solution has been given, a first optimization procedure is run to find the optimal scheduling of the data-flow graph on a synchronous machine. Module selection is achieved simultaneously. This is done by using a Simulated Annealing-based stochastic search. Let us describe in detail the technique, which already appears in [12] and [13].
328 F.S. VERDIER AND B. ZAVIDOVIQUE
The main optimization phase consists in applying a randomly selected transformation to the data-flow graph representing the architecture. The simulated annealing process iterates these transformations until a "good" solution is reached (Fig. 6). There
A first efficient use of the regularity notion appears in [15]. The regularity of a data-flow graph has also been considered successfully in [16]. Our method consists in computing the regularity of a scheduled graph from a statistical measure of the diversity of its motifs.
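As an illustration of the annealing loop described above, here is a minimal Python sketch. It is not the ALPHA implementation: the toy graph `DFG`, the unit-latency assumption, the single "shift one operation by one cycle" transformation, the cost function (latency plus weighted operator count) and the cooling parameters are all assumptions made for this example.

```python
import math
import random

# Hypothetical toy data-flow graph: node -> (operator type, predecessors).
DFG = {
    "a": ("mul", []),
    "b": ("mul", []),
    "c": ("mul", []),
    "d": ("add", ["a", "b"]),
    "e": ("add", ["c", "d"]),
}

def is_valid(schedule):
    """Each node must run strictly after its predecessors (unit latency assumed)."""
    return all(schedule[p] < schedule[n]
               for n, (_, preds) in DFG.items() for p in preds)

def cost(schedule, area_weight=2.0):
    """Assumed cost: number of cycles plus weighted number of operator instances."""
    latency = max(schedule.values()) + 1
    per_cycle = {}
    for node, cycle in schedule.items():
        key = (cycle, DFG[node][0])
        per_cycle[key] = per_cycle.get(key, 0) + 1
    units = {}
    for (_, kind), count in per_cycle.items():
        # Module selection: one instance per simultaneous use of a type.
        units[kind] = max(units.get(kind, 0), count)
    return latency + area_weight * sum(units.values())

def anneal(schedule, t0=5.0, alpha=0.95, steps=2000, seed=0):
    """Iterate random transformations with Metropolis acceptance."""
    rng = random.Random(seed)
    current, best = dict(schedule), dict(schedule)
    temp = t0
    for _ in range(steps):
        cand = dict(current)
        node = rng.choice(sorted(cand))
        # Random transformation: move one operation one cycle earlier or later.
        cand[node] = max(0, cand[node] + rng.choice((-1, 1)))
        if is_valid(cand):
            delta = cost(cand) - cost(current)
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current = cand
                if cost(current) < cost(best):
                    best = dict(current)
        temp = max(temp * alpha, 1e-3)  # geometric cooling with a floor
    return best
```

Calling `anneal({"a": 0, "b": 0, "c": 0, "d": 1, "e": 2})` starts from an arbitrary valid schedule and returns the cheapest valid schedule visited; uphill moves are accepted with a probability that shrinks as the temperature decreases.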
Conjecture 1 The more regular a scheduled graph (in terms of its data transfers and the number of operations in a given machine-cycle), the simpler its associated control (in terms of area, number of control signals) (Fig. 8).
One can find an illustration of this conjecture in systolic architectures, where the control part is reduced to a mere clock distribution. We shall see that in such a case the regularity is infinite. The examples in Figure 8 depict the impact of increased regularity on the control complexity and on the data-path implementation as well. The increase in interconnection complexity (buses, registers, multiplexers) in the irregular example of Figure 8 leads to an increase in control complexity (more control signals) and in area.
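To make the notion concrete, the following Python sketch computes one possible regularity measure. The exact statistical measure used by the authors is not reproduced in this section, so the definition here is an assumption: the motif of a machine cycle is taken to be the multiset of operator types executed in it, diversity is the Shannon entropy of the motif distribution, and regularity is its inverse. Under this assumed definition, a schedule whose cycles all show the same motif, as in a systolic array, has infinite regularity.

```python
import math
from collections import Counter

def regularity(schedule, op_type):
    """Inverse entropy of per-cycle motifs (an assumed, illustrative measure).

    schedule: node -> machine cycle
    op_type:  node -> operator type
    """
    cycles = {}
    for node, cycle in schedule.items():
        cycles.setdefault(cycle, []).append(op_type[node])
    # A motif is the sorted tuple of operator types used in one cycle.
    motifs = Counter(tuple(sorted(ops)) for ops in cycles.values())
    total = sum(motifs.values())
    entropy = -sum((k / total) * math.log2(k / total) for k in motifs.values())
    # A single repeated motif gives zero diversity, hence infinite regularity.
    return math.inf if entropy == 0 else 1.0 / entropy
```

For example, a schedule that performs one multiplication and one addition in every cycle yields `math.inf`, whereas a two-cycle schedule with two different motifs has an entropy of 1 bit and a regularity of 1.0.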
We can notice that:
Remark 1 As mentioned in [19], the configuration with the lowest global cost is not necessarily
Let CDFG be the directed hierarchical data-flow graph:
and
$$\mathrm{type}(edge_r) = (\mathrm{type}(node_i), \mathrm{type}(node_j)) \qquad (2)$$
we also have the following sets:
(Fig. 9).
The second structure is a set of slices. Each slice is a set of functional nodes that have resource conflicts in common (multiple non-exclusive similar operations executed in the same machine cycle, or overlapping multicycle operations). By regrouping these nodes in the same structure, one can easily detect any module parallelism and therefore find a new instance candidate for a particular node. A particular slice contains not only a set of exclusive nodes (in terms of hardware sharing) but also a list of the hardware instances available for their implementation (an example of slices is shown in Fig. 10).
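The slice structure can be sketched in Python as follows. This is an illustrative simplification, not the ALPHA data structure: it assumes unit-latency operations (so overlapping multicycle operations are ignored), and the names `Slice`, `build_slices`, `free_instances` and `library` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Slice:
    """Nodes that cannot share hardware, plus the instances available to them."""
    op_type: str
    cycle: int
    nodes: list = field(default_factory=list)
    instances: list = field(default_factory=list)

def build_slices(schedule, op_type, library):
    """Group nodes of the same operator type scheduled in the same cycle.

    schedule: node -> machine cycle
    op_type:  node -> operator type
    library:  operator type -> list of available hardware instances
    """
    groups = defaultdict(list)
    for node in sorted(schedule):
        groups[(schedule[node], op_type[node])].append(node)
    return [
        Slice(kind, cycle, nodes, list(library.get(kind, [])))
        for (cycle, kind), nodes in sorted(groups.items())
        if len(nodes) > 1  # a lone node has no resource conflict
    ]

def free_instances(slc, binding, node):
    """Instance candidates for `node`: those not bound to another node of the slice."""
    taken = {binding[n] for n in slc.nodes if n != node and n in binding}
    return [inst for inst in slc.instances if inst not in taken]
```

With two multiplications scheduled in cycle 0 and a library of two multipliers, `build_slices` returns a single slice, and once the first node is bound to one multiplier, `free_instances` proposes the other one for the second node.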
5 Netlist Generation
Figure 12 shows the edge-detected input image, the image of misaligned pixels and the final binarized output image produced by emulating the algorithm on the Functional computer.
The interesting feature in this selected example is the hierarchical specification. It is illustrated by the separate description of the main process and the 1-direction macro (Fig. 13). The synthesis of the algorithm has been done by compiling the 1-direction macro separately and by including its hardware structure into the module database. In doing so, the semantic level of available modules is shown to be extended by the synthesis system itself. Table I gives the final features of the two synthesis results.
FIGURE 13 Hierarchical FP description of the default detector.
First, the 1-direction macro has been synthesized as a structural netlist of register-transfer-level operators. Figure 14 shows the evolution of the optimization parameters during the synthesis step. The resulting netlist has been compiled, placed, routed and saved in the form of a macro-cell. Second, the main algorithm has been synthesized, and each occurrence of the 1-direction operation has been chosen to be implemented by the multicycled macro-cell. The clock frequency is adapted to the cell working frequency. Figure 15 gives the layout of the synthesized chip.
FIGURE 15 Layout of the synthesized default detector chip.
in a systematic manner for more complex architectures and compared with another design method tested in our laboratory, based on simple derivation from the emulator architecture.
