DNA probe arrays have emerged as a core genomic technology that enables cost-effective gene expression monitoring, mutation detection, single nucleotide polymorphism analysis and other genomic analyses. DNA chips are manufactured through a highly scalable process, Very Large-Scale Immobilized Polymer Synthesis (VLSIPS), that combines photolithographic technologies adapted from the semiconductor industry with combinatorial chemistry. Commercially available DNA chips contain more than half a million probes and are expected to exceed one hundred million probes in the next generation. This paper is one of the first attempts to apply VLSI CAD methods to the problem of probe placement in DNA chips, where the main objective is to minimize total border cost (i.e., the number of nucleotide mismatches between adjacent sites).
Introduction
DNA probe arrays (DNA arrays or DNA chips for short) have recently emerged as one of the core genomic technologies. They provide a cost-effective means of obtaining fast and accurate results in a wide range of genomic analyses, including gene expression monitoring, mutation detection, and single nucleotide polymorphism analysis (see [23] for a survey). Existing applications already span diverse fields ranging from healthcare to environmental sciences and law enforcement, and the number of applications is growing at an exponential rate [14, 27]. The rapid adoption of DNA arrays is due to a unique combination of robust manufacturing technology that leverages semiconductor wafer processes, massively parallel measurement capabilities, and highly accurate and reproducible results.
DNA arrays are manufactured through a highly scalable process, referred to as Very Large-Scale Immobilized Polymer Synthesis (VLSIPS), that combines photolithographic technologies adapted from the semiconductor industry with combinatorial chemistry [1, 2, 13]. Similar to Very Large Scale Integration (VLSI) circuit manufacturing, multiple copies of a DNA array are simultaneously synthesized on a wafer, which is typically made of quartz. When synthesis is complete, the wafers are diced and the arrays are packaged individually. Depending on the number of distinct probes per array, a single 5" × 5" wafer can yield between 49 and 400 arrays.
The VLSIPS manufacturing process can be briefly described as follows. To initiate synthesis, linker molecules carrying a photolabile protective group are attached to the wafer, forming a regular 2-dimensional pattern of synthesis sites. Probe synthesis then proceeds in successive steps, with one nucleotide (A, C, T, or G) being synthesized at a selected set of sites in each step. To select which sites will receive nucleotides, photolithographic masks are placed over the wafer. Exposure to light de-protects linker molecules at the non-masked sites. Once the desired sites have been activated in this way, a solution containing a single type of nucleotide (which bears its own photolabile protection group to prevent any probe from growing by more than one nucleotide) is flushed over the wafer's surface. Protected nucleotides attach to the unprotected linkers, initiating the probe synthesis process. In each subsequent step, a new mask is used to enable selective de-protection and single-nucleotide synthesis. This cycle is repeated until all probes have been fully synthesized.
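To make the step-by-step process concrete, here is a minimal Python sketch that simulates it. It assumes that each site's mask exposures are given as a 0/1 string over the deposition sequence (one character per mask step, `1` meaning the site is unmasked). The function name `synthesize` and this encoding are illustrative choices, not part of any VLSIPS tooling.

```python
def synthesize(deposition, embeddings):
    """Simulate mask-by-mask VLSIPS synthesis (illustrative sketch).

    deposition: nucleotide deposition sequence, one character per mask step.
    embeddings: dict mapping a site (row, col) to a 0/1 string of the same
        length as `deposition`; '1' at position i means the site is left
        unmasked (exposed to light) in step i.
    Returns (masks, probes): masks[i] is the set of sites exposed in step i,
    and probes[site] is the probe grown at that site.
    """
    for site, emb in embeddings.items():
        if len(emb) != len(deposition):
            raise ValueError(f"embedding for {site} has the wrong length")
    masks = [set() for _ in deposition]
    probes = {site: "" for site in embeddings}
    for i, nucleotide in enumerate(deposition):
        for site, emb in embeddings.items():
            if emb[i] == "1":
                # Light de-protects this site; the flushed nucleotide attaches,
                # and its own protective group stops further growth this step.
                masks[i].add(site)
                probes[site] += nucleotide
    return masks, probes


masks, probes = synthesize("ACGTACGT", {(0, 0): "10010000", (0, 1): "01100000"})
print(probes)  # {(0, 0): 'AT', (0, 1): 'CG'}
```

Each probe grows by at most one base per step, and only at the sites in that step's exposure set, mirroring the protect/de-protect cycle described above.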
Current commercial DNA arrays integrate hundreds of thousands of different probes on a surface only slightly larger than 1 cm². Next-generation designs, enabled by the rapid scaling of DNA array manufacturing processes into the sub-micron domain, are expected to integrate up to hundreds of millions of different probes [2, 23]. As the number and size of DNA array designs are expected to ramp up in coming years, there is an urgent need for high-quality software tools to assist in the design and manufacturing process. Existing design tools, dominated by ad-hoc heuristics with unknown suboptimality properties, are not well suited to handle the next generation of high-density arrays.
The Border Minimization Problem
Let $M_1, M_2, \ldots, M_K$ denote the sequence of masks used in the synthesis of an array, and let $s_i \in \{A, C, T, G\}$ be the nucleotide synthesized after exposing mask $M_i$. Every probe in the array must be a subsequence of the nucleotide deposition sequence $S = s_1 s_2 \cdots s_K$.
In case a probe corresponds to multiple subsequences of $S$, one such subsequence, or "embedding" of the probe into $S$, must be chosen for synthesizing the probe.
[Figure caption fragment: Total border length is 24 (7 on the A mask, 4 on the C mask, 6 on the T mask, and 7 on the G mask).]
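The border length quoted in the figure can be computed directly from the embeddings. The Python sketch below (helper names are hypothetical) builds the greedy leftmost embedding of a probe, which is one valid choice among possibly many, and then counts, for each mask, the adjacent site pairs where exactly one of the two sites is exposed.

```python
def leftmost_embedding(probe, deposition):
    """Greedy leftmost embedding of `probe` into the deposition sequence.

    Returns a 0/1 string marking the mask steps that synthesize the probe's
    bases, or None if the probe is not a subsequence of the sequence.
    """
    bits, j = [], 0
    for nucleotide in deposition:
        if j < len(probe) and probe[j] == nucleotide:
            bits.append("1")
            j += 1
        else:
            bits.append("0")
    return "".join(bits) if j == len(probe) else None


def border_per_mask(grid, deposition):
    """Border length contributed by each mask for a 2-D grid of embeddings.

    grid[r][c] is the 0/1 embedding of the probe at site (r, c).  Mask i has a
    border between two adjacent sites when exactly one of them is exposed.
    """
    counts = [0] * len(deposition)
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            for r2, c2 in ((r, c + 1), (r + 1, c)):  # right and lower neighbours
                if r2 < rows and c2 < cols:
                    for i, (a, b) in enumerate(zip(grid[r][c], grid[r2][c2])):
                        if a != b:
                            counts[i] += 1
    return counts


S = "ACGTACGT"
grid = [[leftmost_embedding("AC", S), leftmost_embedding("CG", S)]]
per_mask = border_per_mask(grid, S)
print(per_mask, sum(per_mask))  # [1, 0, 1, 0, 0, 0, 0, 0] 2
```

The total border length is the sum of the per-mask counts, which is the same as summing the Hamming distances between the embeddings of all adjacent site pairs.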

Contributions of this Work
In this paper, we make several contributions. First, we propose the first partitioning-based algorithms for DNA probe placement and compare them to previous methods (Section 3.1). Second, we give a simple in-place probe re-embedding algorithm that gives better solution quality than the "chessboard" and batched greedy algorithms proposed in [18] (Section 3.2). Third, we experimentally study the scalability and suboptimality of existing and newly proposed DNA probe placement algorithms (Section 3.3). We observe that border cost normalized by the number of pairs of adjacent array sites decreases with array size for all studied algorithms (whereas it remains constant for a random placement). This is an encouraging observation for the future scaling of DNA array technology.
[Figure fragment: For all probes $p \in R$, insert $p$ into the yet-unfilled partition of $R$ whose centroid has minimum distance to $p$.]
(…) that the possible interactions between probes must be modeled by a complete graph and, furthermore, the border cost between two neighboring placed partitions can only be determined after the detailed placement step, which finalizes probe placements at the border between the two partitions. In this section we describe a new centroid-based quadrisection method that applies the recursive partitioning paradigm to DNA probe placement. Assume that at a certain depth of the recursive partitioning procedure, a probe set $R$ is to be quadrisectioned into four partitions $R_1, R_2, R_3$ and $R_4$. We would like to iteratively assign each probe $p \in R$ to some partition $R_i$ such that a minimum number of conflicts will result. To approximately achieve this goal within practical runtimes, we propose to base the assignment on the number of conflicts between $p$ and some representative, or centroid, probe $C_i \in R_i$. In our approach, for every partition $R$ we select four centroids, one for each of the four new (sub-)partitions. To achieve balanced partitions, we heuristically find four probes in $R$ that have maximum total distance among themselves, then use these as the centroids. This procedure, described in Figure 2, is reminiscent of the k-center approach to clustering studied by Alpert et al. [4], and of methods used in large-scale document classification [9].
After a given maximum partitioning depth $L$ is reached, a final detailed placement step is needed to place each partition's probes within the partition's corresponding region on the chip. For this step, we use the row-epitaxial algorithm of [19], which for completeness of exposition is replicated in Figure 3.
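The row-epitaxial detailed placement step can be sketched in Python as follows. This is a simplified illustration, not the code of [19]: probes are compared by plain Hamming distance (as if all probes were synchronously embedded), each empty site in a rectangular region is filled, row by row, with whichever of the first `m` remaining candidates has the fewest conflicts with its already-placed neighbours, and filled sites outside the region also count as neighbours. The function name and signature are hypothetical.

```python
def hamming(p, q):
    """Mismatch count between two equal-length (synchronously embedded) probes."""
    return sum(a != b for a, b in zip(p, q))


def reptx_fill(grid, top, left, height, width, probes, m=20000):
    """Fill a rectangular region of `grid` row by row, epitaxial style.

    grid: list of rows, with None marking empty sites.  Already-filled sites
        outside the region count as neighbours, so the fill accounts for
        previously placed adjacent regions.
    probes: the region's probes in their initial order (at least
        height * width of them).
    m: lookahead, i.e. how many remaining candidates are examined per site.
    """
    remaining = list(probes)
    rows, cols = len(grid), len(grid[0])
    for r in range(top, top + height):
        for c in range(left, left + width):
            neighbours = [grid[r2][c2]
                          for r2, c2 in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                          if 0 <= r2 < rows and 0 <= c2 < cols and grid[r2][c2] is not None]
            window = remaining[:m]
            best = min(range(len(window)),
                       key=lambda i: sum(hamming(window[i], q) for q in neighbours))
            grid[r][c] = remaining.pop(best)
    return grid


grid = [[None, None], [None, None]]
reptx_fill(grid, 0, 0, 2, 2, ["AAAA", "TTTT", "AAAT", "TTTA"], m=4)
print(grid)  # [['AAAA', 'AAAT'], ['TTTA', 'TTTT']]
```

Reducing `m` makes each site cheaper to fill at the price of fewer candidates to choose from, which is the quality/runtime knob of this heuristic.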
The complete partitioning-based placement algorithm for DNA arrays is given in Figure 4. At a high level, our method resembles global-detailed approaches in the VLSI CAD literature [17, 21].
The algorithm recursively quadrisects every partition at a given level, assigning the probes so as to minimize distance to the centroids of the subpartitions. In the innermost of the three nested for loops of Figure 4, we apply a multi-start heuristic, trying $r$ different random probes as seed $C_0$ and using the result that minimizes total distance to the centroids. Once the maximum level $L$ of the recursive partitioning is attained, detailed placement is executed via the row-epitaxial algorithm. Additional details and commentary are as follows.
• Our construction of partitions has an obvious dependence on the order in which probes $p$ are considered. Our work has not yet examined the impact of this probe-ordering degree of freedom on centroid-based partitioning.
• Within the innermost of the three nested for loops, our implementation actually performs, and benefits from, a dynamic update of the partition centroid whenever a probe is added to a given partition. Intuitively, this can lead to "elongated" rather than "round" clusters, but it can also correct for unfortunate choices of the initial four centroids.
• The straightforward implementation of Reptx()-based detailed placement within a given partition will treat the last locations "unfairly"; e.g., for the last location considered, only one candidate probe will remain. To balance the options for every position, our implementation permits "borrowing" probes from the next region in the Reptx() procedure. For every position, we select the best probe from at most $m$ probes in the current region and the next region, where $m$ is a pre-determined constant. (Except as noted, we set $m$ to 20000 for all of our experiments.) Our Reptx() implementation is also "border-aware", that is, it takes into account Hamming distances to the placed probes in adjacent regions.
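A compact Python sketch of the recursive quadrisection described in this section. Several details are assumptions rather than the paper's exact procedure: centroids are chosen greedily in farthest-point style (each new centroid maximizes its total Hamming distance to those already chosen) as a stand-in for SelectCentroid(), centroids stay fixed instead of being updated dynamically, chips are square with a power-of-two side, and leaf regions are filled row by row in place of row-epitaxial detailed placement. All names are hypothetical.

```python
import random


def hamming(p, q):
    """Number of mismatching positions between two equal-length probes."""
    return sum(a != b for a, b in zip(p, q))


def select_centroids(group, seed, k=4):
    """Greedy farthest-point choice of k centroid indices, starting from `seed`."""
    chosen = [seed]
    while len(chosen) < min(k, len(group)):
        rest = [i for i in range(len(group)) if i not in chosen]
        chosen.append(max(rest, key=lambda i: sum(hamming(group[i], group[j]) for j in chosen)))
    return chosen


def assign(group, centroid_ids, capacity):
    """Insert each probe into the nearest yet-unfilled partition (static centroids)."""
    parts = [[group[c]] for c in centroid_ids]
    cost = 0
    for i, p in enumerate(group):
        if i in centroid_ids:
            continue
        open_parts = [k for k in range(len(parts)) if len(parts[k]) < capacity]
        best_k = min(open_parts, key=lambda k: hamming(p, parts[k][0]))
        cost += hamming(p, parts[best_k][0])
        parts[best_k].append(p)
    return parts, cost


def partition_place(probes, size, L, r=3, rng=None):
    """Place size*size probes on a size x size grid by recursive quadrisection."""
    rng = rng or random.Random(0)
    grid = [[None] * size for _ in range(size)]

    def recurse(group, top, left, side, level):
        if level == L or side == 1:
            # Stand-in for detailed placement: write the probes row by row.
            for idx, p in enumerate(group):
                grid[top + idx // side][left + idx % side] = p
            return
        best = None
        for seed in rng.sample(range(len(group)), min(r, len(group))):  # multi-start
            parts, cost = assign(group, select_centroids(group, seed), len(group) // 4)
            if best is None or cost < best[1]:
                best = (parts, cost)
        half = side // 2
        for k, part in enumerate(best[0]):  # quadrants: top-left, top-right, bottom-left, bottom-right
            recurse(part, top + (k // 2) * half, left + (k % 2) * half, half, level + 1)

    recurse(list(probes), 0, 0, size, 0)
    return grid


rng = random.Random(7)
chip = ["".join(rng.choice("ACGT") for _ in range(25)) for _ in range(64)]
layout = partition_place(chip, 8, L=2)
```

Each quadrant's capacity is exactly a quarter of its parent's probes, which keeps the partitions balanced so that every subregion receives exactly as many probes as it has sites.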
Time Complexity
Let the number of probes in a chip be $n$. The procedure SelectCentroid() for a partitioned region at recursion depth $l$ takes $O(n/4^l)$ steps, and grouping all the probes into four partitions also takes $O(n/4^l)$ steps. Therefore, the runtime for every recursion depth $l$ is
$$O\left(4^l \cdot \frac{n}{4^l}\right) = O(n).$$
Since there are $L$ recursion depths, the overall runtime for partitioning is $O(Ln)$. For the Reptx() procedure, at most $m = 20000$ comparisons are executed for every position. Therefore, the total runtime is $O(n(L + m))$. Since $L \le \log_4 n$, the runtime is $O(n(\log_4 n + m))$.
Comparison of Probe Placement Heuristics
Partitioning-Based vs. Other Methods
We compare four two-dimensional placement algorithms.
1. TSP+Threading [16]:
This algorithm computes a TSP tour in the complete graph with the probes as vertices and edge costs given by Hamming distances. The tour is then threaded into the two-dimensional array of sites using the 1-threading method described in [16].
2. Row-Epitaxial [19]:
An implementation of the epitaxial algorithm in [18], where the computation is sped up by (a) filling array sites in a predefined order (row by row), and (b) considering only a limited number of candidate probes when filling each array site. Unless otherwise specified, the number of candidates is bounded by 20000 in our experiments.
3. Sliding-Window Matching (SWM) [19]:
After an initial placement is obtained by 1-threading of the probes in lexicographic order, this algorithm iteratively improves the placement by selecting an "independent" set of probes from a sliding window and then optimally re-placing them using a minimum-weight perfect matching algorithm (cf. "row-ironing" [6]).
4. Recursive partitioning based placement:
As described in Section 2 above.
Table 1 compares the results produced by the first three (previously known) algorithms on random instances with chip sizes between 100 and 500 and probe length equal to 25. We find that Row-Epitaxial is the algorithm with the highest solution quality, while SWM is the fastest, offering competitive solution quality with much less runtime. Besides total border cost, we also report the border cost normalized by the number of pairs of adjacent array sites, i.e., the average number of conflicts per pair of adjacent sites. This number decreases with increasing chip size, which can be attributed to the greater freedom of choice available when placing a higher number of probes. Table 2 presents results for our new recursive partitioning method with different maximum recursion depths L = 1, 2, 3. Compared with Row-Epitaxial, the best previously known technique (Table 1), recursive partitioning based placement achieves on average similar or better results with improved runtime.
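A rough Python illustration of the TSP+Threading baseline and of the normalized border cost reported in Table 1. Two simplifications are assumptions of this sketch: the tour is built with a greedy nearest-neighbour heuristic rather than a stronger TSP solver, and it is threaded with a simple boustrophedon (row-snake) layout that may differ from the 1-threading of [16]. Helper names are hypothetical.

```python
def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))


def nearest_neighbour_tour(probes):
    """Greedy tour: repeatedly visit the closest unvisited probe."""
    tour, rest = [probes[0]], list(probes[1:])
    while rest:
        nxt = min(rest, key=lambda p: hamming(tour[-1], p))
        rest.remove(nxt)
        tour.append(nxt)
    return tour


def snake_thread(tour, n):
    """Lay a tour into n x n sites, reversing every other row so that
    consecutive tour probes stay adjacent on the chip."""
    grid = [tour[r * n:(r + 1) * n] for r in range(n)]
    for r in range(1, n, 2):
        grid[r].reverse()
    return grid


def border_cost(grid):
    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += hamming(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:
                total += hamming(grid[r][c], grid[r + 1][c])
    return total


def normalized_border_cost(grid):
    """Average conflicts per adjacent pair; an N x N chip has 2N(N-1) pairs."""
    rows, cols = len(grid), len(grid[0])
    return border_cost(grid) / (rows * (cols - 1) + cols * (rows - 1))


grid = snake_thread(nearest_neighbour_tour(["AAAA", "TTTT", "AAAT", "AATT"]), 2)
print(grid, border_cost(grid), normalized_border_cost(grid))
# [['AAAA', 'AAAT'], ['TTTT', 'AATT']] 8 2.0
```

For example, a 100 × 100 chip has 2 · 100 · 99 = 19,800 adjacent pairs, so its normalized border cost is the total border cost divided by 19,800.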
A Note on In-Place Re-Embedding
While VLSI physical design automation may be viewed as primarily a "Place & Route" process, VLSIPS (DNA array) physical design automation is rather a "Place & Embed" process. Recall that embedding exploits the possibility of asynchronous embedding of probes within the mask sequence to further reduce the total number of conflicts. Here, we present a new in-place re-embedding algorithm (i.e., a method which, given a two-dimensional probe placement, improves the embedding of the probes without re-placing them) and compare it against two previous in-place re-embedding algorithms.
Each of the following three algorithms uses the dynamic programming algorithm in [18] for optimal re-embedding of a probe with respect to the embeddings of its neighbors:
1. Batched greedy [18]:
This algorithm repeatedly selects the probe whose optimal re-embedding gives the largest decrease in conflict cost and re-embeds it, until no further decreases are possible. To improve the runtime, the greedy choices are made in phases, in a batched manner: in each phase the gains for all probes are computed, and then a maximal set of non-adjacent probes is selected for re-embedding by traversing the probes in non-increasing order of gain.
2. Chessboard [18]:
In this method, the 2-dimensional placement grid is divided into "black" and "white" locations as in the chessboard (or checkerboard) grid of Akers [3]. The sites within each set represent a maximum independent set of locations. The Chessboard algorithm alternates between optimal re-embedding of probes placed in "black" (respectively "white") sites with respect to their neighbors (all of which are at opposite-color locations).
3. Sequential: Our new algorithm performs optimal re-embedding of probes in a sequential row-by-row fashion.
Table 5: Total border cost, normalized border cost, gap from the asynchronous lower bound [18], and CPU seconds (averages over 10 random instances) for the recursive partitioning algorithm followed by sequential in-place re-embedding.
We believe that a main shortcoming of Batched Greedy and Chessboard is that these methods always re-embed an independent set of sites on the DNA chip. Our intuition is that dropping this requirement permits faster propagation of the effects of any new embedding, and hence convergence to a better local optimum. Table 3 compares the three algorithms on random instances with chip sizes between 100 and 500 and probe length 25. The initial two-dimensional placements were obtained using TSP+1-threading. All algorithms are stopped when the cost improvement achieved in one iteration over the whole chip drops below 0.1% of the total cost. The results show that re-embedding the probes in sequential row-by-row order reduces the border cost by 0.8% compared to the chessboard algorithm.
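The re-embedding step shared by these methods can be sketched as a dynamic program over (mask step, number of probe bases already synthesized): each step is either masked or, when its nucleotide matches the next probe base, exposed, and it costs one conflict per neighbour that disagrees. The Python below is an illustrative DP of this kind, not necessarily the exact formulation of [18]. It is wrapped in a sweep that visits sites either sequentially (row by row) or in Chessboard order (all "black" sites, then all "white" ones) and stops once a full pass improves the total cost by less than 0.1%. Function names and signatures are hypothetical.

```python
def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))


def reembed(probe, deposition, neighbour_embs):
    """Optimal embedding of `probe` given fixed neighbour embeddings.

    dp[i][j] = min conflicts over the first i mask steps with the first j
    bases of `probe` synthesized.  Returns (embedding, cost), or (None, inf)
    if the probe is not a subsequence of the deposition sequence.
    """
    K, l, N = len(deposition), len(probe), len(neighbour_embs)
    ones = [sum(e[i] == "1" for e in neighbour_embs) for i in range(K)]
    INF = float("inf")
    dp = [[INF] * (l + 1) for _ in range(K + 1)]
    dp[0][0] = 0
    for i in range(K):
        for j in range(l + 1):
            if dp[i][j] == INF:
                continue
            # Masked at step i: conflicts with every neighbour exposed at i.
            dp[i + 1][j] = min(dp[i + 1][j], dp[i][j] + ones[i])
            # Exposed at step i: synthesizes base j, conflicts with masked neighbours.
            if j < l and deposition[i] == probe[j]:
                dp[i + 1][j + 1] = min(dp[i + 1][j + 1], dp[i][j] + N - ones[i])
    if dp[K][l] == INF:
        return None, INF
    bits, j = [], l
    for i in range(K, 0, -1):  # backtrack from the full-probe state
        if (j > 0 and deposition[i - 1] == probe[j - 1]
                and dp[i - 1][j - 1] + N - ones[i - 1] == dp[i][j]):
            bits.append("1")
            j -= 1
        else:
            bits.append("0")
    return "".join(reversed(bits)), dp[K][l]


def total_cost(emb):
    rows, cols = len(emb), len(emb[0])
    return sum(hamming(emb[r][c], emb[r2][c2])
               for r in range(rows) for c in range(cols)
               for r2, c2 in ((r, c + 1), (r + 1, c)) if r2 < rows and c2 < cols)


def in_place_reembed(probes, emb, deposition, order="sequential", tol=0.001):
    """Re-embed every site (emb is updated in place) until a pass gains < tol."""
    rows, cols = len(emb), len(emb[0])
    sites = [(r, c) for r in range(rows) for c in range(cols)]
    if order == "chessboard":
        sites.sort(key=lambda rc: (rc[0] + rc[1]) % 2)  # black sites, then white
    cost = total_cost(emb)
    while True:
        for r, c in sites:
            nbrs = [emb[r2][c2] for r2, c2 in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= r2 < rows and 0 <= c2 < cols]
            emb[r][c], _ = reembed(probes[r][c], deposition, nbrs)
        new_cost = total_cost(emb)
        if cost - new_cost <= tol * cost:
            return emb, new_cost
        cost = new_cost


S = "ACGTACGT"
emb, cost = in_place_reembed([["AC", "CG"]], [["00001100", "01100000"]], S)
print(emb[0][0], cost)  # 11000000 2
```

Because the current embedding is always one of the DP's options, no single re-embedding can increase the total cost. The two orders therefore differ only in how quickly the improvements propagate across the chip.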
Comparison of Placement and Embedding Flows
Another series of experiments executes the complete placement and embedding flow. We compare the four two-dimensional placement algorithms discussed in Section 3.1, each followed by sequential in-place re-embedding. Results are given in Tables 4 and 5. Table 4 compares existing methods: TSP+1Thr, row-epitaxial (REPTX) with 20000 lookahead probes, and sliding-window matching (SWM) with 6 × 6 windows and an overlap of 3. TSP+1Thr is dominated by both REPTX and SWM in both conflict cost and running time. REPTX produces fewer conflicts than SWM, but SWM is considerably faster. This tradeoff between speed and quality is, of course, reminiscent of the VLSI CAD experience with heuristics for difficult optimizations. Table 5 gives the number of conflicts for the proposed recursive quadrisection partitioning technique; L indicates the recursion depth for which the given results are obtained.
Comparing these results to those in Table 4, we see that DNA chip placement using recursive partitioning outperforms the best previous flow (row-epitaxial + sequential re-embedding) by an average of 4.0%.
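For reference, one pass of sliding-window matching (configured in these experiments with 6 × 6 windows and an overlap of 3) can be sketched in Python. The sketch makes the following assumptions:

- Probes are compared by plain Hamming distance.
- The independent set in each window is the sites at even row and column offsets from the window corner.
- The minimum-weight perfect matching is solved exactly with a small bitmask dynamic program, which is adequate for the 9 such sites in a 6 × 6 window, instead of a general matching library.

The last rows or columns may be left uncovered when the chip size is not aligned with the window step. Function names are hypothetical.

```python
def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))


def border_cost(grid):
    rows, cols = len(grid), len(grid[0])
    return sum(hamming(grid[r][c], grid[r2][c2])
               for r in range(rows) for c in range(cols)
               for r2, c2 in ((r, c + 1), (r + 1, c)) if r2 < rows and c2 < cols)


def optimal_assignment(cost):
    """Exact minimum-cost perfect matching for a small square cost matrix.

    Bitmask DP: dp[mask] is the cheapest way to place the first popcount(mask)
    probes on the sites in `mask`.  Returns assign[k] = site index of probe k.
    """
    n = len(cost)
    dp = [float("inf")] * (1 << n)
    choice = [None] * (1 << n)
    dp[0] = 0
    for mask in range(1 << n):
        k = bin(mask).count("1")  # index of the next probe to place
        if k == n or dp[mask] == float("inf"):
            continue
        for s in range(n):
            if not (mask >> s) & 1:
                new = mask | (1 << s)
                if dp[mask] + cost[k][s] < dp[new]:
                    dp[new], choice[new] = dp[mask] + cost[k][s], s
    assign, mask = [None] * n, (1 << n) - 1
    for k in range(n - 1, -1, -1):
        assign[k] = choice[mask]
        mask ^= 1 << assign[k]
    return assign


def swm_pass(grid, w=6, overlap=3):
    """One sliding-window matching pass over `grid` (modified in place)."""
    rows, cols = len(grid), len(grid[0])
    step = w - overlap
    for top in range(0, max(rows - w, 0) + 1, step):
        for left in range(0, max(cols - w, 0) + 1, step):
            # Sites at even offsets are pairwise non-adjacent (independent).
            sites = [(r, c) for r in range(top, min(top + w, rows), 2)
                     for c in range(left, min(left + w, cols), 2)]
            probes = [grid[r][c] for r, c in sites]

            def site_cost(p, r, c):
                # Neighbours are outside the independent set, so they stay fixed.
                return sum(hamming(p, grid[r2][c2])
                           for r2, c2 in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                           if 0 <= r2 < rows and 0 <= c2 < cols)

            cost = [[site_cost(p, r, c) for r, c in sites] for p in probes]
            for p, s in zip(probes, optimal_assignment(cost)):
                r, c = sites[s]
                grid[r][c] = p
    return grid
```

Since no two selected sites are adjacent, the total border cost equals a fixed term plus the sum of the selected sites' costs. The optimal matching can therefore never make a window worse than keeping the current assignment.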
Quantified Suboptimality of Placement and Embedding Algorithms
As noted in the introduction, next-generation DNA probe arrays will contain up to one hundred million probes, and therefore present instance complexities for placement that will far outstrip those of VLSI designs. Thus, it is of interest to study not only runtime scaling, but also scaling of suboptimality, for available heuristics. To this end, we apply the experimental framework for quantifying suboptimality of placement heuristics that was originated by Boese and by Hagen et al. (…) cells; this corresponds to a "placement example with known optimal wirelength" (PEKO). In our DNA probe placement context, there is no need to generate a netlist hypergraph. Rather, we realize the concept of minimum (border) cost edges (adjacencies) by constructing a set of probes, and their placement, using 2-dimensional Gray codes [11]. Our construction generates $4^k$ probes which are placeable such that every probe has border cost of 1 to each of its neighboring probes. This construction is illustrated in Figure 5.
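The following Python sketch illustrates the Gray-code idea at the level of embeddings (0/1 vectors over the mask sequence). How the paper maps these codes to actual nucleotide probes is not spelled out in this section, so working with embeddings directly is an assumption of the sketch. Because all $4^k$ embeddings are distinct, every adjacent pair costs at least 1. A placement in which every adjacent pair costs exactly 1 is therefore optimal for this set, with total cost $2N(N-1)$ on an $N \times N$ grid.

```python
def gray(i):
    """Binary reflected Gray code of i."""
    return i ^ (i >> 1)


def gray_grid(k):
    """2^k x 2^k grid of distinct 2k-bit embeddings (k >= 1).

    Site (r, c) gets Gray(r) followed by Gray(c), so horizontally or
    vertically adjacent sites differ in exactly one mask step.
    """
    n = 1 << k
    return [[format(gray(r), f"0{k}b") + format(gray(c), f"0{k}b") for c in range(n)]
            for r in range(n)]


def border_cost(grid):
    rows, cols = len(grid), len(grid[0])
    return sum(sum(a != b for a, b in zip(grid[r][c], grid[r2][c2]))
               for r in range(rows) for c in range(cols)
               for r2, c2 in ((r, c + 1), (r + 1, c)) if r2 < rows and c2 < cols)


g = gray_grid(3)                        # 8 x 8 = 4^3 sites
print(border_cost(g), 2 * 8 * (8 - 1))  # 112 112
```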
Instances with known suboptimal solutions. Because constructed instances with known optimum solutions may not be representative of "real" instances, we also apply a technique [15] that allows real instances to be scaled, such that they offer insights into scaling of heuristic suboptimality. The technique of Hagen et al. is applied as follows. Beginning with an instance $I$ consisting of a ("real") DNA chip probe set, we induce three isomorphic versions of $I$ by three distinct mappings of the nucleotide set $\{A, C, G, T\}$ onto itself. Each mapping yields a new probe set that can be placed with optimum border cost exactly equal to the optimum border cost of $I$. Our scaled instance $I'$ consists of the union of the original probe set and its three isomorphic copies. Observe that one placement solution for $I'$ is to optimally place $I$ and its isomorphic copies as individual chips, and then to adjoin these placements as the four quadrants of a larger chip. Thus, an upper bound on the optimum border cost for $I'$ is 4 times the optimum border cost for $I$, plus the border cost between the copies of $I$; see Figure 6.
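A small Python sketch of this scaling construction. It assumes border cost is the Hamming distance between equal-length probes, so any permutation of the nucleotides preserves costs inside each copy. The sketch adjoins a placement of $I$ and three relabelled copies as quadrants. The resulting cost is four times the cost of the given placement plus the seam cost between copies, which is the constructive upper bound described above when the placement of $I$ is optimal. The particular permutations and helper names are illustrative choices, not the paper's.

```python
def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))


def border_cost(grid):
    rows, cols = len(grid), len(grid[0])
    return sum(hamming(grid[r][c], grid[r2][c2])
               for r in range(rows) for c in range(cols)
               for r2, c2 in ((r, c + 1), (r + 1, c)) if r2 < rows and c2 < cols)


def relabel(grid, mapping):
    """Apply a nucleotide permutation, e.g. {"A": "C", "C": "A", ...}, to every probe."""
    table = str.maketrans(mapping)
    return [[p.translate(table) for p in row] for row in grid]


def scaled_placement(grid, mappings):
    """Adjoin a placement of I and three relabelled copies as the quadrants of I'."""
    q0, q1, q2, q3 = [grid] + [relabel(grid, m) for m in mappings]
    top = [a + b for a, b in zip(q0, q1)]
    bottom = [a + b for a, b in zip(q2, q3)]
    return top + bottom


MAPPINGS = [  # three distinct non-identity permutations of {A, C, G, T}
    {"A": "C", "C": "A", "G": "T", "T": "G"},
    {"A": "G", "G": "A", "C": "T", "T": "C"},
    {"A": "T", "T": "A", "C": "G", "G": "C"},
]

I = [["AC", "AG"], ["AT", "AA"]]
big = scaled_placement(I, MAPPINGS)
print(border_cost(I), border_cost(big))  # 4 28
```

Here the quadrants contribute 4 · 4 = 16 and the seams between copies contribute the remaining 12. Because every probe of this small $I$ starts with A and each mapping sends A to a different nucleotide, the four copies are disjoint. For a general $I$, this sketch does not check for collisions between copies.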
If a heuristic $H$ places $I'$ with cost $c_H(I') \ge 4 \cdot c_H(I)$, then we may infer that the heuristic's suboptimality is growing with instance size. On the other hand, if $c_H(I') < 4 \cdot c_H(I)$, then at least on this class of instances, the heuristic's solution quality would be said to scale well.
Table 6 shows results from executing the various placement heuristics on PEKO-style test cases, with instance sizes ranging from 16 × 16 through 512 × 512 (recall that our Gray code construction yields instances with $4^k$ probes). We see from these results that sliding-window matching comes closest to the optimal values, with a suboptimality gap of around 30%. Overall, DNA array placement algorithms appear to perform better than their VLSI counterparts [7] when it comes to results on special-case instances with known optimal cost. Of course, results from placement algorithms (whether for VLSI or DNA chips) on special benchmark instances should not be generalized to arbitrary benchmarks. Our results, though, do illustrate this point: algorithms that perform best for arbitrary benchmarks are not necessarily the best performers for specially constructed benchmarks.
Table 7 shows results from executing the various placement heuristics on scaled versions of random DNA probe sets, with the original instances ranging in size from 100 × 100 to 500 × 500, and the scaled instances thus ranging in size from 200 × 200 to 1000 × 1000. This table shows that, in general, placement algorithms for DNA arrays offer excellent suboptimality scaling. We believe that this is primarily due to the already noted fact that algorithm quality (as reflected by normalized border costs) improves with instance size. The larger number of probes in the scaled instances gives more freedom to the placement algorithms, leading to heuristic placements whose total conflict is well under the constructive upper bound.
• Proposing a new DNA probe array placement algorithm that recursively places the probes on the chip in a manner similar to top-down VLSI placers, via a centroid-based strategy, and a new probe re-embedding technique for asynchronous re-embedding of placed probes within the mask sequence. Experimental results show that combining the new algorithms yields an average improvement of 4.0% over the best previous flows.
• Studying and quantifying the performance of existing and newly proposed DNA Place & Embed algorithms in experiments on benchmarks with known optimal cost as well as in scaling-suboptimality experiments, in a manner similar to recent studies in the VLSI CAD field.
We conclude with some remarks that address observed similarities and contrasts between VLSI placement and VLSIPS placement. First, while VLSI placement performance in general degrades as the problem size increases, this does not appear to be the case for DNA array placement. Current algorithms are able to find DNA array placements with smaller normalized border cost when the number of probes in the design grows. In fact, our experiments show that the gap between the lower bound of [18] and the actual number of
