Abstract-We introduce an algorithm for the efficient computation of the continuous Haar transform of 2D patterns that can be described by polygons. These patterns are ubiquitous in VLSI processes where they are used to describe design and mask layouts. There, speed is of paramount importance due to the magnitude of the problems to be solved and hence very fast algorithms are needed. We show that by techniques borrowed from computational geometry we are not only able to compute the continuous Haar transform directly, but also to do it quickly. This is achieved by massively pruning the transform tree and thus dramatically decreasing the computational load when the number of vertices is small, as is the case for VLSI layouts. We call this new algorithm the pruned continuous Haar transform. We implement this algorithm and show that for patterns found in VLSI layouts the proposed algorithm was in the worst case as fast as its discrete counterpart and up to 12 times faster.
I. INTRODUCTION
The Haar transform (HT) is often a tool of choice in image processing due to its edge detection property, low complexity and the simplicity of its implementation. It is particularly suited for piecewise constant functions that have a very sparse and accurate representation in the Haar domain. An important class of twodimensional (2D) piecewise constant functions is the class of functions described by a union of disjoint polygonal subsets of R 2 . Such a description is often used in different areas of image processing such as contour detection, segmentation, tomography image reconstruction [1] or for VLSI layouts description [2] . A polygonal shape is usually described by an ordered list of its vertices. This description has the advantage of being very compact and natural to understand. Many algorithms in computational geometry make efficient use of this description to solve various problems like intersections of polygons, area computations or point inclusion [3] . However, this description has no fixed length which makes it more cumbersome for use in other applications in image processing including machine learning, pattern matching or measuring similarity. The Haar transform provides such a fixedlength representation.
Optical lithography is the process that allows mass production of VLSI circuits [4] . The HT has been used so far in lithography to compress the Fourier precompensation filters for electron beam lithography [5] and also to regularize the obtained mask in inverse lithography [6] . More recently, Kryszczuk et al. introduced the direct printability prediction of VLSI layouts using machine learning techniques [7] . They use fixed-length feature vectors from orthogonal transforms and train a classifier to predict the printability of VLSI layouts without having to go through the detailed, and thus computationally expensive, simulation of the physical process of lithography. The HT is a perfect candidate to provide features in that case due to its close match to the polygons found in VLSI layouts. However, to obtain these features one has first to perform the transform of the enormous amount of data contained in modern VLSI layouts. It is thus crucial to have a very fast algorithm to yield HT coefficients from the vertex description of the polygons.
The most straightforward way would be to first create a discrete image by sampling the polygons, and then use the discrete Haar transform (DHT) on the resulting image. However, the polygons describe an inherently continuous function, which allows us to compute the continuous Haar transform (CHT) coefficients instead. By using techniques borrowed from computational geometry to compute the inner products with the CHT basis functions, we are able to massively prune the transform flow-diagram in addition to avoiding sampling completely. This leads to a dramatic decrease of the computational load when the number of vertices is small. We call this new algorithm pruned continuous Haar transform (PCHT). The outputs of the DHT and the PCHT are identical for 2D polygonal patterns. The PCHT was concretely implemented in a lithography tool and proved to have significantly lower runtime compared to the DHT for the particular case of rectilinear polygons from VLSI design layouts.
The main contribution of this work is PCHT, a fast algorithm, and to the best of our knowledge the first of its kind for the CHT of 2D piecewise constant polygonal patterns. Its efficiency compared to the DHT for the case of VLSI layouts is demonstrated with potential high impact for pattern matching techniques envisioned in computational lithography.
In Section II, we will first briefly introduce the signal model we consider, namely 2D piecewise constant polygonal patterns. In Section III, we give a reminder on the 1D and 2D Haar transform while Section IV presents the PCHT algorithm. The results of the application to VLSI design layouts are given in Section V and we conclude in Section VI.
II. SIGNAL MODEL
The signal model we consider is one of 2D piecewise constant polygonal patterns. This is the class of images described by the union of a finite number of disjoint simple polygons. Mathematically, a polygon is described by a set of points called vertices. A polygon P ⊂ R 2 is typically defined by its boundary, which is a collection of straight segments, called edges. The polygon contains all the points inside the boundary. The description of a polygon is the list of its K vertices, ordered clockwise:
where any two successive vertices describe one edge of the boundary. In addition, simple polygons have the property that no two edges intersect each other. A subclass of simple polygons termed rectilinear comprises all those with only right angles and is the building block of VLSI layouts. An example of such polygons is shown in Figure 1 . A 2D piecewise constant polygonal pattern is described by a collection of M disjoint polygons each with an associated weight {(Pi, wi)} M −1 i=0 and with disjoint interiors, i.e. Int (Pi) ∩ Int (Pj) = ∅ ∀i = j, where Int(P) is the interior of polygon P, and wi ∈ R. Finally the continuous image model is
where we use an indicator function ½P(x, y) = 1 if (x, y) ∈ P and 0 otherwise.
III. THE HAAR TRANSFORM
The 1D Haar basis is an orthonormal basis on [0, T ), composed of the family of functions
The functions ϕ and ψ are respectively defined as
As the 2D Haar basis is separable, we can define it in terms of the 1D basis. Now we want to work over a surface T = [0, Tx) × [0, Ty) we call a tile. The scaling function is given by:
where j = kx = ky = 0. The other basis functions are given by the possible combinations of ϕ and ψ, one of them being: 
where j ∈ N is the scale and kx, ky ∈ {0, . . . , 2 j − 1} the shifts in the x and y directions respectively. Similarly, we get ψ
Given the basis functions defined in (2) and (3), we derive the Haar transform as the inner product between the function f to transform and the Haar basis functions. Using the L 2 (T ) and the l 2 (T ) inner products, with a discretized tileT and discretized basis functions, we respectively get the dyadic continuous and discrete transforms [8] . The CHT and DHT coefficients are identical for 2D piecewise constant polygonal patterns.
Both the CHT and the DHT can be computed using the fast orthogonal wavelet transform (FWT) [9] . This algorithm is constructed using the two-scale relationships that link the basis functions at different scales:
where gn and hn are the taps of two discrete-time filters [8] .
The Haar filters are defined as gn
This results in a Cooley-Tukey butterfly structure [10] where only the inner products with the scaling function at the lowest level need be computed. The full flow diagram for a length-8 1D transform is shown in light grey in Figure 2 .
Using the separability of the 2D transform and the two-scale relationships, we obtain the relations between the 2D basis functions of the different scales. For example, ψ By replacing hngm in the sum by gngm, gnhm and hnhm we obtain ϕ j,kx,ky , ψ (lh) j,kx,ky and ψ (hh) j,kx,ky respectively. As in the 1D case, these relations induce a 2D butterfly structure. Therefore transform coefficients can be computed as a linear combination of inner products with the scaling function at different scales. 
IV. PRUNED CONTINUOUS HAAR TRANSFORM

A. Algorithm Derivation
Let us consider the FWT described in Section III. First, using the signal model from (1) and the linearity of the inner product, we can decompose the transform into a sum of inner products of individual polygons with basis functions:
wi ½P i , ϕ j,kx,ky .
Thus, from now on we consider only the transform of a single polygon. The second idea is to use computational geometry techniques to compute the inner product. The continuous inner product between the indicator of a polygon and the scaling function is the area of the geometrical intersection of the polygon and the support of the scaling function, multiplied by 2 j / TxTy. The Haar transform acts as a discontinuity detector and all the transform coefficients will be zero except for basis functions that intersect the boundary of the polygon. As a consequence, the basis functions completely inside or outside a polygon yield a zero inner product. Moreover, all the coefficients below such a basis function in the transform tree are also zero (see Figure 2) . Therefore the transform can be written as a divide-and-conquer algorithm. Divide the tile in four rectangular parts recursively until the part considered is completely inside or outside the polygon. Pseudocode for the PCHT is given in Algorithm 1, in which T j,kx,ky = [kxTx/2 j , (kx +1)Ty/2 j )×[kyTy/2 j , (ky +1)Ty/2 j ) is the support of ϕ j,kx,ky . An example of the pruned transform flow-diagram for the 1D case is shown in black in Figure 2 . In order to compute all the transform coefficients, the algorithm is called in the following way:
where X j,kx,ky = f , ϕ j,kx ,ky and s0 = 1/ TxTy. The transform coefficients are C 
V. APPLICATION TO VLSI LAYOUTS
We now show a practical example of the application of this algorithm to compute the CHT coefficients of a VLSI layout. In practice a layout is described using a vector format such as OASIS [2] . Compared to general simple polygons, those found in VLSI layouts have two additional properties. All vertices are placed on the integer grid and all edges are parallel to either the x or y axis. We call these polygons rectilinear. The routine used to compute the intersection area in Algorithm 1 is specifically adapted Algorithm 1 PCHT(P, j, kx, ky, s) Require: The polygon to transform P, the scale j, the shifts kx and ky and the scaling factor s. Ensure: C (hl) ,C (lh) ,C (hh) contain the transform coefficients.
1: i ← IntersectionArea P, T j,kx,ky 2: if i = 0 or i = TxTy/2 2j or j = J then 3: Return i 4: end if 5: x ← PCHT(P, j + 1, 2kx, 2ky, 2s) 6: y ← PCHT(P, j + 1, 2kx + 1, 2ky, 2s) 7: z ← PCHT(P, j + 1, 2kx, 2ky + 1, 2s) 8: t ← PCHT(P, j + 1, 2kx + 1, 2ky + 1, 2s)
for rectilinear polygons from classical computational geometry techniques [3] .
We implemented the PCHT and DHT algorithms in a computational lithography tool that we ran on a 3GHz Intel Xeon 5450 running Linux in 64-bit mode. All the code is C++, single-threaded and was compiled using GCC 4.1.2 with option "-O3". The DHT was custom implemented taking into account the knowledge that the input image is binary. For both transforms the output coefficients were discarded instead of being stored in order to minimized the impact of memory transfers on the runtime measurements. We ran a benchmark of the PCHT and the DHT on different layers of a 22nm VLSI layout. Layers M1 and M2 are metal layers that contain both rectangles and other polygons, while the contact array (CA) layer contains only rectangles. Examples of tiles from the different layers are shown in Figure 3 . Figure 4 shows the runtime as a function of the number of vertices K in 1024nm×1024nm tiles from the M1 layer, the worst configuration for the PCHT in our experiment. The empirical distribution of the number of vertices in the tiles is shown in light grey. Although the runtime of the continuous transform grows with K, it outperforms its discrete counterpart for about half the tiles. The peaks around K = 190 are caused by the very low number of tiles in that range, as shown by the empirical distribution, and all these tiles having a worse than average runtime. The DHT also shows a slight dependence on K due to the time needed to create the discrete image. The average speed-up of the runtime as a function of the tile size is shown in Figure 5 . The speed-up is defined as the ratio of the runtimes of the DHT and pruned CHT. Because it has the highest vertex density, the M1 layer shows the least improvement, between 1 and 3 times speed-up. For large tiles, layers M2 and CA show speed-up over 6 and 12 times, respectively. In a practical scenario where the output coefficients need to be stored for further use we expect the PCHT to outperform even more the DHT as its pruned structure avoids completely the computation of zero coefficients and thus their storage and associated memory transfers. On the other hand, the DHT has no knowledge of which coefficients will be zero and thus either tries to store every output coefficients or an if statement can be used if we know in advance that most coefficients will be zero, as is the case here. In addition, further optimizations such as parallelization and optimization for the cache size are possible.
VI. CONCLUSIONS
We introduced the PCHT, a new algorithm for the computation of the CHT of 2D polygonal patterns. We showed significant speedup compared to the DHT in an implementation targeting rectilinear polygons found in VLSI layouts. We expect the PCHT to impact machine learning techniques being developed for application in computational lithography, such as printability prediction [7] , as well as in the VLSI design process in general.
The natural next step for our work would be to analyze the computational complexity of PCHT in order to validate theoretically its superiority over the DHT. The performance of both algorithms implementations should also be reassessed when the coefficients are stored in memory in order to account for the impact of memory transfers. Another important implementation step is the parallelization of the code as current and future increases in computation power come primarily through multi-core chips. The recursive nature of the PCHT algorithm makes it a perfect candidate for parallelization.
We also intend to apply our vertex based approach to other transforms such as the continuous Fourier series. A fast algorithm to compute the Fourier series would also be very valuable in lithography where the fast Fourier transform is routinely used in optical lithography process simulation despite the introduction of aliasing due to the infinite bandwith of the polygons. Moreover, we expect an optimized vertex based fast continuous Fourier series algorithm to be inherently faster than the fast Fourier transform. 
