Hardware-Accelerated Collision Detection using Bounded-Error Fixed-Point Arithmetic
Andreas Raabe¹, Stefan Hochgürtel¹, Joachim K. Anlauf¹, Gabriel Zachmann²
¹Technical Computer Science
Bonn University, Germany
{raabe, hochguer, anlauf}@cs.uni-bonn.de
²Computer Graphics
Clausthal University, Germany
zach@in.tu-clausthal.de
ABSTRACT
A novel approach for highly space-efficient hardware-accelerated collision detection is presented. This paper focuses on the architecture for traversing bounding volume hierarchies in hardware. It is based on a novel algorithm for testing discretely oriented polytopes (DOPs) for overlap that uses only fixed-point (i.e., integer) arithmetic. We derive a bound on the deviation from the mathematically correct result and give a formal proof that no false negatives are produced.
Simulation results show that real-time collision detection of complex objects can be achieved at the rates required by force-feedback and physically-based simulations. In addition, synthesis results show the architecture to be highly space-efficient. We compare our FPGA-optimized design with a fully parallelized ASIC-targeted architecture and with a software implementation.
1 INTRODUCTION
Detecting collisions between a pair of graphical objects is a fundamental task in many areas such as physically-based simulation, automatic path finding, or tolerance checking. Applications are in games, animation systems, and virtual reality, e.g., virtual assembly simulation, or medical training and planning systems.
In most of the applications in these areas, the goal is to avoid collisions, or to enable real-time physically-based simulation. Most approaches today are reactive, i.e., they first place objects at a new trial position, check for collisions, and then compute new forces or positions, based on physical laws, so as to remove any collisions.
This approach demands very efficient collision detection, because many collision checks must be performed per simulation cycle. An emerging application area is the mobile devices market (smart phones, portable game devices). Here, the challenges, besides speed, are size and energy consumption. Another particularly demanding application is force feedback, where update rates of about 1000 Hz are needed in order to achieve stable force computations.
Since collision detection is such a fundamental yet challenging task, it is highly desirable to have hardware acceleration available, just like 3D graphics accelerators. The benefit is two-fold: a) the system can process objects with higher polygon counts, and b) the system's CPU can be freed from computing collisions.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Journal of WSCG, ISSN 1213-6972, Vol.14, 2006
WSCG’2006, January 30 – February 3, 2006
Plzen, Czech Republic.
Copyright UNION Agency – Science Press
In this paper, we present a novel, efficient architecture for hierarchical collision detection of two rigid objects. It is based on a novel algorithm for testing a pair of bounding volumes for overlap, which can even be implemented using fixed-point arithmetic. We derive a tight bound on the deviation between overlap testing in fixed-point and in floating-point arithmetic; this ensures that no false negatives are obtained while producing very few false positives.
We also present an implementation on FPGA hardware along with simulation results concerning its speed and synthesis results concerning its size. Finally, we compare these results with an earlier, parallelized ASIC-targeted architecture, and with a software implementation.
2 RELATED WORK
Considerable work has been done on hierarchical collision detection in software [3, 4, 8, 14, 15]. Some of the bounding volumes (BVs) utilized are spheres, axis-aligned bounding boxes (AABB), oriented bounding boxes (OBB), and discretely oriented polytopes (DOP).
The first work on dedicated hardware for collision detection was presented in [16, 17]. However, those authors presented only a functional simulation, while we present an RT-level implementation along with synthesis results. [12] presented a design that was targeted at ASICs and optimized for speed only; it thus utilizes a total of over 4 million gates and a 756-bit wide bus to DDR2-RAM. Recently, commercial hardware was announced that supposedly can do collision detection, among other things [1]. However, no details have been published, in particular no performance results.
Most other hardware-related research has tried to utilize existing graphics accelerator boards (GPUs) [2, 5, 6, 10, 11]. While earlier approaches, such as [11], can basically handle only convex objects, later algorithms, such as [2, 10], have extended these to more general cases such as unions of convex objects or closed objects. The general class of “polygon soups” can be handled by [5], but it uses a hybrid approach where the graphics hardware only identifies potentially colliding sets.
Journal of WSCG 17 ISBN 1213-6972  ISBN 80-86943-09-7
All of the approaches using graphics hardware have the disadvantage that they either compete with the rendering process for the same hardware resource, or an additional graphics board must be dedicated to collision detection. The former slows down the overall frame rate considerably, while the latter would be a tremendous waste, since most of the hardware's resources would not be utilized at all. Furthermore, most of these approaches work in image space, which reduces precision significantly.
3 BASICS
3.1 Hierarchical Collision Detection with k-DOPs
In this section, we give a short outline of hierarchical collision detection in conjunction with a special kind of bounding volume, the k-DOP. Additionally, we briefly recap the separating axis theorem and one of its applications, the Separating Axis Test (SAT).
In this paper, hierarchical collision detection is utilized to avoid checking every triangle of an object O for intersection with all triangles of an object Q. The acceleration data structure is a so-called bounding volume hierarchy (BVH), where each leaf corresponds to one triangle and inner nodes correspond to groups of triangles. Each node has a bounding volume (BV) attached that bounds all triangles associated with it. In order to achieve a feasible hardware design, we use a binary tree here.
If two objects are checked for overlap, both hierarchies are traversed simultaneously. If their BVs intersect, the next level of BVs is checked. Since two objects will usually intersect only in a very small number of primitives, this yields a significant speed-up in the average case. In practical cases, the complexity is in $O(\log n)$ (n = number of primitives), because only a small diagonal “slice” of constant width down the BVH needs to be visited [9].
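The simultaneous traversal can be illustrated with a short software model. This is a minimal Python sketch under stated assumptions, not the hardware controller described in Section 5: the `Node` class and the `bv_overlap` and `tri_intersect` callbacks are hypothetical, and descending into all child pairs of two inner nodes is one common choice among several. Pending BV pairs are kept on an explicit stack.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical binary BVH node; leaves carry exactly one triangle id."""
    bv: object
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    triangle: Optional[int] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

    def children(self) -> List["Node"]:
        return [c for c in (self.left, self.right) if c is not None]


def traverse(root_o: Node, root_q: Node,
             bv_overlap: Callable[[object, object], bool],
             tri_intersect: Callable[[int, int], bool]) -> List[Tuple[int, int]]:
    """Traverse two BVHs simultaneously; return all intersecting triangle pairs."""
    hits: List[Tuple[int, int]] = []
    stack = [(root_o, root_q)]          # pending BV pairs
    while stack:
        o, q = stack.pop()
        if not bv_overlap(o.bv, q.bv):
            continue                    # disjoint BVs: prune both subtrees
        if o.is_leaf() and q.is_leaf():
            if tri_intersect(o.triangle, q.triangle):
                hits.append((o.triangle, q.triangle))
            continue
        # Descend one level: an inner node is replaced by its children,
        # a leaf is kept as it is.
        side_o = [o] if o.is_leaf() else o.children()
        side_q = [q] if q.is_leaf() else q.children()
        for co in side_o:
            for cq in side_q:
                stack.append((co, cq))
    return hits
```

Any BV type works as long as `bv_overlap` understands it; in the rest of the paper, the BVs are k-DOPs and the overlap test is the SAT-based one derived in Section 4.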
In this work, we use k-DOPs as BVs, because extensive benchmarking in software has shown that they yield very fast collision queries [15], and they also performed very well in our hardware studies [12].
k-DOPs are defined over a fixed orientation matrix $D = (D_1, \ldots, D_{k/2}, D_{k/2+1}, \ldots, D_k)$ of vectors in $\mathbb{R}^3$. Each vector $D_i$ is antiparallel to $D_{i+k/2}$.
An individual k-DOP is defined by k distances $d_i$, one along each vector $D_i$, each defining a half-space. These DOP coefficients $(d_1, \ldots, d_k)$ are the distances of the associated half-spaces to the origin.¹ Each of the k/2 pairs of DOP coefficients $(d_i, d_{i+k/2})$ forms a so-called slab [15]. The intersection of these slabs forms the BV:
$$\mathrm{DOP} = \bigcap_{i=1,\ldots,k} H_i, \qquad H_i : D_i \cdot x - d_i \le 0 \tag{1}$$
The orientation matrix D, consisting of all the vectors $D_i$, is fixed and identical for all objects. This yields a very memory-efficient description of every k-DOP: only the k coefficients $d_i$ need to be stored.
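As a concrete illustration of Eq. (1), the following Python sketch computes the tightest DOP coefficients of a point set and tests whether a point lies inside the resulting k-DOP. The orientation matrix `D6` is an assumption chosen for brevity: it is the k = 6 case, i.e., an axis-aligned bounding box, whereas the paper uses 24-DOPs whose orientations are not listed here. Only the k coefficients are stored, mirroring the memory-efficient representation described above.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Assumed example orientation matrix for k = 6 (an AABB); D[i + k/2] = -D[i].
D6: List[Vec3] = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
                  (-1, 0, 0), (0, -1, 0), (0, 0, -1)]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def dop_coefficients(points: Sequence[Vec3], D: Sequence[Vec3]) -> List[float]:
    """Tightest k-DOP: d_i = max over all points x of D_i . x."""
    return [max(dot(Di, x) for x in points) for Di in D]


def dop_contains(d: Sequence[float], D: Sequence[Vec3], x: Vec3) -> bool:
    """Eq. (1): x is inside iff D_i . x - d_i <= 0 for every half-space H_i."""
    return all(dot(Di, x) - di <= 0 for Di, di in zip(D, d))
```

For the triangle with vertices (0, 0, 0), (2, 0, 0), (0, 1, 3), `dop_coefficients(triangle, D6)` yields [2, 1, 3, 0, 0, 0]; a 24-DOP would simply use a longer orientation list.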
3.2 Separating Axis Test (SAT)
In this paper, we use the so-called Separating Axis Test (SAT) introduced by [4, 14].
[4] have shown that two convex polytopes are disjoint if and only if there exists a separating axis orthogonal to a face of either polytope or orthogonal to an edge from each polytope (Separating Axis Theorem).
If only a subset of these axes is tested, false positives might occur, i.e., the polytopes are disjoint while the (incomplete) test reports an intersection. The complete SAT is always correct.
To perform the test, both polytopes must be projected onto each of the candidate separating axes. For each axis, a pair of intervals on that axis results. If one of these pairs is disjoint, then the polytopes must be disjoint (see Fig. 1).
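The interval test on candidate axes can be sketched in Python as below. This generic version projects explicit vertex lists and is only meant to illustrate the test; the DOP-specific projection, which avoids touching vertices at run time, is derived in Section 4. The axes are supplied by the caller, so a `True` result is always correct, while `False` from an incomplete axis set may be a false positive.

```python
from typing import Iterable, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def project(points: Iterable[Vec3], axis: Vec3) -> Tuple[float, float]:
    """Image of a polytope on an axis: [min, max] of its vertex projections."""
    values = [dot(axis, p) for p in points]
    return min(values), max(values)


def separated_on_axis(pa: Sequence[Vec3], pb: Sequence[Vec3], axis: Vec3) -> bool:
    a_min, a_max = project(pa, axis)
    b_min, b_max = project(pb, axis)
    return a_max < b_min or b_max < a_min   # the two intervals are disjoint


def sat_disjoint(pa: Sequence[Vec3], pb: Sequence[Vec3],
                 axes: Iterable[Vec3]) -> bool:
    """True if some tested axis separates the polytopes (then they are disjoint)."""
    return any(separated_on_axis(pa, pb, L) for L in axes)
```

For two unit cubes offset by 3 along x, the axis (1, 0, 0) already separates them; offset by 0.5, none of the three coordinate axes does.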
4 EFFICIENT SAT FOR K-DOPS
In this section, we will derive an efficient Separating Axis Test for k-DOPs. Additionally, we will show how the resulting overlap test can be done in fixed-point arithmetic such that no false negatives occur. Finally, we will derive a bound on the deviation of the projection of the fixed-point DOP with respect to the mathematically correct image.
4.1 Precomputation
Since with DOPs the set of vectors $\{D_1, \ldots, D_k\}$ is fixed, we can exploit the fact that all possible face orientations of the DOPs within a DOP-tree are the same.
Assume object O is placed relative to object Q by rotation M and translation T. Let DT(O) and DT(Q) denote the DOP-trees of these objects. Following the notation of Section 3.1, let $(A_1, \ldots, A_k)$ be the orientations of the DOPs' faces shared by all DOPs in DT(O) after applying rotation M, and let $(a_1, \ldots, a_k)$ denote the DOP coefficients of DOPs in DT(O). Analogously, let $(B_1, \ldots, B_k)$ denote the vectors shared by all DOPs in DT(Q), and let $(b_1, \ldots, b_k)$ denote the corresponding DOP coefficients.
¹ Note that the origin is not necessarily the center of the DOP nor even contained in it.
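[Figure: DOPs A and B with their origins, the extremal vertices $v^A_{\min}$ and $v^B_{\max}$, the offset p, the interval end points $a_{\min}, a_{\max}, b_{\min}, b_{\max}$, the coefficients $a_{j_0}, a_{j_1}, b_{j_0+k/2}, b_{j_1+k/2}$, and the gap diff on test axis $L_i$.]
Figure 1: Two DOPs are projected onto test axis $L_i$. Since their images do not intersect, $L_i$ is a separating axis.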
Note that everything independent of $(a_1, \ldots, a_k)$ and $(b_1, \ldots, b_k)$ is constant throughout both DOP-trees. Hence it can be precomputed at startup to initialize the algorithm (and, later on, the hardware). Precomputing as much as possible significantly reduces the resulting hardware costs. Since this is done only once per pair of DOP-trees, it is not time-critical.
First, we can precompute the n test axes $L_i$. All of the following is done for each $L_i$, so for the sake of simplicity we omit the index i from now on.
Second, the projection $p = L \cdot T$ is precomputed.
Third, for each L a DOP has two vertices $v^A_{\min}$ and $v^A_{\max}$ whose projections onto L have maximum distance. Each of these vertices is formed by the intersection of three faces of the DOP. The correspondences $(j_{A,0}, j_{A,1}, j_{A,2})$ of the orientations whose faces meet in $v^A_{\min}$ are calculated.
Fourth, and most importantly, in the actual projection
$$a_{\min} = v^A_{\min} \cdot L = \begin{pmatrix} a_{j_{A,0}} & a_{j_{A,1}} & a_{j_{A,2}} \end{pmatrix} \cdot \begin{pmatrix} A_{j_{A,0}} & A_{j_{A,1}} & A_{j_{A,2}} \end{pmatrix}^{-1} \cdot L$$
we can precompute the trailing product (the matrix has the three face normals as its columns)
$$P_A := \begin{pmatrix} A_{j_{A,0}} & A_{j_{A,1}} & A_{j_{A,2}} \end{pmatrix}^{-1} \cdot L \tag{2}$$
$P_B$ can be precomputed analogously. The mapping vectors for $v^A_{\max}$ and $v^B_{\max}$ are $-P_A$ and $-P_B$, respectively; this exploits the fact that the k/2 pairs of DOP orientations are antiparallel. Note that this is only an estimate of the correct solution, since not all possible combinations of DOP coefficients share the same maximum vertices. However, a point formed by the intersection of three face planes can never lie inside the DOP, hence only false positives can result.
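The precomputation of Eq. (2) amounts to solving one 3×3 linear system per test axis. The Python sketch below does this with Cramer's rule, a choice made here for brevity and not necessarily how the original software does it. It reads the matrix in Eq. (2) as having the three face normals as its columns, which follows from the vertex satisfying $A_{j} \cdot v = a_{j}$ for each of its three faces. The function name `mapping_vector`, the 0-based face indices, and the AABB orientations in the example are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cross(u: Sequence[float], v: Sequence[float]) -> Vec3:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def det3(c0, c1, c2) -> float:
    """Determinant of the 3x3 matrix with columns c0, c1, c2 (scalar triple product)."""
    return sum(x * y for x, y in zip(c0, cross(c1, c2)))


def mapping_vector(faces: Sequence[Vec3], j: Sequence[int], L: Vec3) -> List[float]:
    """P = (A_j0 A_j1 A_j2)^-1 . L for the three faces meeting in v_min."""
    cols = [faces[i] for i in j]
    det = det3(*cols)
    if abs(det) < 1e-12:
        raise ValueError("face normals do not meet in a single vertex")
    P = []
    for i in range(3):
        replaced = list(cols)
        replaced[i] = L                 # Cramer's rule: swap column i for L
        P.append(det3(*replaced) / det)
    return P
```

For example, with the six AABB normals (1,0,0), (0,1,0), (0,0,1), (−1,0,0), (0,−1,0), (0,0,−1), the axis L = (0.6, 0.8, 0), and faces 3, 4, 5 meeting in $v^A_{\min}$, the sketch yields P ≈ (−0.6, −0.8, 0). With the coefficients (1, 2, 0) of those faces (an assumed example box), the dot product with P gives −2.2, the lower end of that box's image on L.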
4.2 Intersection Testing
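Using these precomputations, we can project onto the test axes very efficiently:
$$\begin{aligned}
a_{\min} &= \begin{pmatrix} a_{j_{A,0}} & a_{j_{A,1}} & a_{j_{A,2}} \end{pmatrix} \cdot P_A \\
a_{\max} &= \begin{pmatrix} a_{j_{A,0}+k/2} & a_{j_{A,1}+k/2} & a_{j_{A,2}+k/2} \end{pmatrix} \cdot (-P_A)
\end{aligned} \tag{3}$$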
[Figure: an original DOP with coefficients $a_{j_0}, a_{j_1}$ and its enclosing fixed-point DOP with $a'_{j_0}, a'_{j_1}$ on the fixed-point grid, showing the error caused by rounding, the projections by P and by P′, and the resulting $a'_{\min}, a'_{\max}$.]
Figure 2: A DOP and its enclosing fixed-point equivalent. Both rounding the DOP to fixed-point numbers and projecting it with P′ instead of P increase the DOP's image. When checked for intersection, false positives can occur.
This is done analogously for $b_{\min}$ and $b_{\max}$.
The condition for separation is now straightforward. Let
$$\begin{aligned}
\mathrm{diff}_1 &:= (a_{\min} + p) - b_{\max} \\
\mathrm{diff}_2 &:= b_{\min} - (a_{\max} + p)
\end{aligned} \tag{4}$$
$$\mathrm{diff} := \max(\mathrm{diff}_1, \mathrm{diff}_2) \tag{5}$$
then the intervals $[a_{\min} + p, a_{\max} + p]$ and $[b_{\min}, b_{\max}]$ are disjoint if and only if diff > 0. From the Separating Axis Theorem we know that
$$(\mathrm{diff} > 0) \Rightarrow \text{separation}. \tag{6}$$
Eqs. (3)–(6) are the computations that must be performed for each DOP test (and hence cannot be precomputed).
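Eqs. (3)–(6) can be transcribed into a small floating-point reference model. The Python sketch below assumes 0-based coefficient indices with the wrap-around convention that face `(j + k/2) mod k` is antiparallel to face `j`; the function names are hypothetical. The mapping vectors and the offset p = L·T are taken as already precomputed.

```python
from typing import Sequence, Tuple


def project_dop(coeffs: Sequence[float], j: Sequence[int],
                P: Sequence[float], k: int) -> Tuple[float, float]:
    """Eq. (3): image [v_min . L, v_max . L] of one DOP on the test axis."""
    half = k // 2
    a = [coeffs[i] for i in j]                    # faces meeting in v_min
    a_k = [coeffs[(i + half) % k] for i in j]     # antiparallel faces (v_max)
    v_min = sum(x * p for x, p in zip(a, P))
    v_max = sum(x * -p for x, p in zip(a_k, P))
    return v_min, v_max


def separated(a_min: float, a_max: float,
              b_min: float, b_max: float, p: float) -> bool:
    """Eqs. (4)-(6): the image of object A is shifted by p = L . T."""
    diff1 = (a_min + p) - b_max
    diff2 = b_min - (a_max + p)
    return max(diff1, diff2) > 0
```

With the assumed box coefficients d = [2, 1, 1, 1, 2, 0], j = (3, 4, 5), and P = (−0.6, −0.8, 0), `project_dop` returns approximately (−2.2, 2.0). Testing this image against itself reports a separation for p = 5 or p = −5, but not for p = 1.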
4.3 Fixed-Point Arithmetic
In hardware, floating-point arithmetic is very expensive with respect to circuit size and depth. Unfortunately, simply rounding DOP coefficients to fixed-point numbers would result in false negatives, because the intervals on the test axes could become smaller than the projection of the enclosed object. These false negatives are unacceptable, because we might miss collisions. Naïve rounding of the mapping vectors $P_A$ and $P_B$ would lead to even more false negatives, since the distance of the images could be overestimated. Hence we need to round in such a manner that each fixed-point DOP image contains the corresponding floating-point image (see Fig. 2).
First, we need to handle the limited range of fixed-point numbers by dividing all DOP coefficients of all DOPs by the largest absolute value of the DOP coefficients in the scenario. This way, 16-bit accuracy still allows for DOPs the size of a skyscraper and of a 6 mm screw in the same scene; 36 bits even allow for DOPs the size of the sun and of a screw.
Let rounding of the DOP coefficients to b bits after the binary point towards +∞ be denoted by $a'_i = \lceil a_i \rceil$. Clearly, the rounded (i.e., fixed-point) DOP contains the original one. Then $\varepsilon_i = a'_i - a_i$ is the resulting rounding error, with $0 \le \varepsilon_i < 2^{-b}$.
By ensuring that the dihedral angle between all pairs of neighboring faces of a DOP is larger than $\pi/2$, all $P_{A,i}$ lie in the interval $[-1, 0]$ [7].²
Rounding $P_{A,i}$ towards −∞ to c-bit accuracy results in a rounding error $0 \le \eta_i = P_{A,i} - P'_{A,i} < 2^{-c}$.
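The two directed roundings can be modelled exactly with Python's `fractions.Fraction`, which makes the stated error bounds checkable. In this sketch, DOP coefficients are rounded towards +∞ to b fractional bits and mapping-vector entries towards −∞ to c fractional bits; the function names `round_up` and `round_down` are illustrative.

```python
import math
from fractions import Fraction


def round_up(x: Fraction, bits: int) -> Fraction:
    """Round towards +inf onto the grid with `bits` fractional bits (a'_i = ceil(a_i))."""
    scale = 1 << bits
    return Fraction(math.ceil(x * scale), scale)


def round_down(x: Fraction, bits: int) -> Fraction:
    """Round towards -inf onto the grid with `bits` fractional bits (P'_i = floor(P_i))."""
    scale = 1 << bits
    return Fraction(math.floor(x * scale), scale)
```

For example, `round_up(Fraction(1, 3), 4)` is 3/8, so ε = 1/24 < 2^-4, and `round_down(Fraction(-1, 3), 4)` is −3/8, so η = 1/24 < 2^-4.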
By simply truncating $P_{A,i}$, the resulting image would become too small in the case of negative DOP coefficients, whereas always rounding up would create the same problem with positive coefficients. Fortunately, we can solve this during the calculation simply by adding $2^{-c}$ to $\lfloor P_{A,i} \rfloor$ before multiplying it with negative DOP coefficients.
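Let $a' := (a'_{j_{A,0}}, a'_{j_{A,1}}, a'_{j_{A,2}})$ and $a'_k := (a'_{j_{A,0}+k/2}, a'_{j_{A,1}+k/2}, a'_{j_{A,2}+k/2})$. Let $sn(x)$ be the sum of all $x_i < 0$. Then, conservative rounding of the images amounts to:
$$\begin{aligned}
a'_{\min} &= P'_A \cdot a' + 2^{-c}\, sn(a') \\
a'_{\max} &= -\left(P'_A \cdot a'_k + 2^{-c}\, sn(a'_k)\right)
\end{aligned} \tag{7}$$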
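Eq. (7) and the containment property behind it can be checked numerically. The following Python sketch, using exact `Fraction` arithmetic and hypothetical names, computes the fixed-point image with the correction term and, for comparison, the exact image from Eq. (3); it includes its own rounding helpers so that it is self-contained.

```python
import math
from fractions import Fraction
from typing import Sequence, Tuple


def round_up(x: Fraction, bits: int) -> Fraction:
    return Fraction(math.ceil(x * (1 << bits)), 1 << bits)


def round_down(x: Fraction, bits: int) -> Fraction:
    return Fraction(math.floor(x * (1 << bits)), 1 << bits)


def sn(xs: Sequence[Fraction]) -> Fraction:
    """Sum of all negative entries."""
    return sum((x for x in xs if x < 0), Fraction(0))


def exact_image(a, a_k, P) -> Tuple[Fraction, Fraction]:
    """Eq. (3) with exact values."""
    return (sum((x * p for x, p in zip(a, P)), Fraction(0)),
            -sum((x * p for x, p in zip(a_k, P)), Fraction(0)))


def fixed_point_image(a, a_k, P, b: int, c: int) -> Tuple[Fraction, Fraction]:
    """Eq. (7): coefficients rounded up to b bits, P rounded down to c bits,
    plus 2^-c times every negative coefficient (i.e., P'_i + 2^-c is used there)."""
    a_f = [round_up(x, b) for x in a]
    ak_f = [round_up(x, b) for x in a_k]
    P_f = [round_down(p, c) for p in P]
    eps = Fraction(1, 1 << c)
    a_min = sum((p * x for p, x in zip(P_f, a_f)), Fraction(0)) + eps * sn(a_f)
    a_max = -(sum((p * x for p, x in zip(P_f, ak_f)), Fraction(0)) + eps * sn(ak_f))
    return a_min, a_max
```

For any mapping vector with entries in [−1, 0], the fixed-point interval returned by `fixed_point_image` should enclose the interval returned by `exact_image`, which is exactly the property that rules out false negatives.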
Finally, when computing $\mathrm{diff}'_1$, we can simply truncate p to z bits ($p' = \lfloor p \rfloor$). This can create only false positives, because a smaller p′ only decreases the apparent distance between the two DOP images. For $\mathrm{diff}'_2$ we need to round p up to $\lceil p \rceil$, which, again, can be done efficiently by adding $2^{-z}$ to $\lfloor p \rfloor$.
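Overall, calculating the distances of the fixed-point DOP images amounts to
$$\begin{aligned}
\mathrm{diff}'_1 &= (a'_{\min} + p') - b'_{\max} \\
\mathrm{diff}'_2 &= b'_{\min} - \left(a'_{\max} + (p' + 2^{-z})\right)
\end{aligned} \tag{8}$$
$$\mathrm{diff}' = \max(\mathrm{diff}'_1, \mathrm{diff}'_2) \tag{9}$$
Now the condition for separation can be given analogously to Eq. (6):
$$\left((\mathrm{diff}'_1 > 0) \text{ or } (\mathrm{diff}'_2 > 0)\right) \Rightarrow \text{separation}. \tag{10}$$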
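The fixed-point separation test of Eqs. (8)–(10) additionally needs only the offset p rounded down to z bits. In this Python sketch (hypothetical names, exact `Fraction` arithmetic), the rounded-up offset is formed as p′ + 2^-z, as in Eq. (8). The inputs are assumed to be fixed-point images that already enclose the exact ones.

```python
import math
from fractions import Fraction


def round_down(x: Fraction, bits: int) -> Fraction:
    return Fraction(math.floor(x * (1 << bits)), 1 << bits)


def fixed_point_separated(a_min: Fraction, a_max: Fraction,
                          b_min: Fraction, b_max: Fraction,
                          p: Fraction, z: int) -> bool:
    """Eqs. (8)-(10) on fixed-point images [a'_min, a'_max] and [b'_min, b'_max]."""
    p_fix = round_down(p, z)                          # p' = floor(p)
    diff1 = (a_min + p_fix) - b_max
    diff2 = b_min - (a_max + (p_fix + Fraction(1, 1 << z)))
    return diff1 > 0 or diff2 > 0
```

With A = [0, 1], B = [1.5, 2], and z = 4, an offset p = 0.25 leaves a gap of 0.25 and is reported as a separation. An offset p = 0.49 leaves an exact gap of 0.01 that disappears after rounding (p′ + 2^-4 = 0.5), so the pair is conservatively kept: a false positive, never a false negative.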
Simulations done early in the design process showed that fixed-point accuracy influences calculation time (Fig. 3). Below 18 bits of accuracy, an increasing number of false positives occurs compared to the floating-point implementation, which decreases calculation speed. Above 18 bits, a second memory burst is needed to fetch the DOP coefficients from DDR-RAM.
² This is no hard restriction, since every well-constructed DOP should avoid acute angles anyway to improve tightness of fit (even for oblong objects in random orientation).
4.4 Bound on Fixed-Point Deviation
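In this section we will derive a bound on the deviation of the fixed-point image from the mathematically correct image. Let err denote this deviation (called fixed-point error in the following):
$$\mathrm{err} := \mathrm{diff} - \mathrm{diff}' \tag{11}$$
Since diff is defined as $\max(\mathrm{diff}_1, \mathrm{diff}_2)$ (and diff′ analogously), we know that, with
$$\begin{aligned}
\mathrm{err}_1 &:= \mathrm{diff}_1 - \mathrm{diff}'_1 \\
\mathrm{err}_2 &:= \mathrm{diff}_2 - \mathrm{diff}'_2
\end{aligned} \tag{12}$$
$$\min(\mathrm{err}_1, \mathrm{err}_2) \le \mathrm{err} \le \max(\mathrm{err}_1, \mathrm{err}_2) \tag{13}$$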
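Inserting Eqs. (3)–(5) and (7)–(9) into Eq. (12) yields
$$\begin{aligned}
\mathrm{err}_1 = {} & (P_A \cdot a - P'_A \cdot a') - 2^{-c}\, sn(a') \\
& + (P_B \cdot b_k - P'_B \cdot b'_k) - 2^{-c}\, sn(b'_k) \\
& + (p - p')
\end{aligned} \tag{14}$$
and $\mathrm{err}_2$ can be calculated analogously.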
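To calculate bounds on $\mathrm{err}_1$ and $\mathrm{err}_2$, we need to bound the errors caused by the products of mapping vectors and DOP coefficients. Since all of these are handled in the same way, we show it for $P_A \cdot a - P'_A \cdot a'$ as an example:
$$\begin{aligned}
P_A \cdot a - P'_A \cdot a' &= (P_A - P'_A) \cdot a' + P_A \cdot (a - a') \\
&= \sum_{i=0}^{2} (P_{A,i} - P'_{A,i})\, a'_{j_{A,i}} + \sum_{i=0}^{2} P_{A,i}\, (a_{j_{A,i}} - a'_{j_{A,i}}) \\
&= \sum_{\substack{i=0 \\ a'_{j_{A,i}} \ge 0}}^{2} (P_{A,i} - P'_{A,i})\, a'_{j_{A,i}} + \sum_{\substack{i=0 \\ a'_{j_{A,i}} < 0}}^{2} (P_{A,i} - P'_{A,i})\, a'_{j_{A,i}} + \sum_{i=0}^{2} P_{A,i}\, (a_{j_{A,i}} - a'_{j_{A,i}})
\end{aligned} \tag{15}$$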
The sum of the components of any mapping vector can be interpreted as the image of a vertex of the maximum DOP (all $d_i = 1$). Let $r_{\max}$ be the greatest and $r_{\min} = 1$ the smallest distance of a vertex of the maximum DOP to the origin. Then $-r_{\max}$ is a lower bound and $-r_{\min}$ is an upper bound for the component sum of any mapping vector P.
Let $sp(x)$ be the sum of all $x_i \ge 0$, analogously to $sn(x)$, the sum of all $x_i < 0$.
With the known boundaries
$$\begin{aligned}
-1 &\le a'_i \le 1, & -2^{-b} &\le a_i - a'_i \le 0, \\
-1 &\le P_{A,i} \le 0, & 0 &\le P_{A,i} - P'_{A,i} \le 2^{-c}, \\
0 &\le p - p' \le 2^{-z},
\end{aligned}$$
we can bound all summands of Eq. (15):
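$$\begin{aligned}
0 \cdot sp(a') \le {} & \sum_{\substack{i=0 \\ a'_{j_{A,i}} \ge 0}}^{2} (P_{A,i} - P'_{A,i})\, a'_{j_{A,i}} \le 2^{-c} \cdot sp(a') \\
2^{-c} \cdot sn(a') \le {} & \sum_{\substack{i=0 \\ a'_{j_{A,i}} < 0}}^{2} (P_{A,i} - P'_{A,i})\, a'_{j_{A,i}} \le 0 \cdot sn(a') \\
-r_{\min} \cdot 0 \le {} & \sum_{i=0}^{2} P_{A,i}\, (a_{j_{A,i}} - a'_{j_{A,i}}) \le -r_{\max} \cdot (-2^{-b})
\end{aligned}$$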
Along with Eq. (15), this amounts to
$$2^{-c} \cdot sn(a') \le P_A \cdot a - P'_A \cdot a' \le 2^{-c} \cdot sp(a') + r_{\max} \cdot 2^{-b} \tag{16}$$
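Inserting Eq. (16) into Eq. (14) yields
$$\begin{aligned}
\mathrm{err}_1 \le {} & 2^{-c} \cdot sp(a') + r_{\max} \cdot 2^{-b} - 2^{-c}\, sn(a') \\
& + 2^{-c} \cdot sp(b'_k) + r_{\max} \cdot 2^{-b} - 2^{-c}\, sn(b'_k) + 2^{-z}
\end{aligned} \tag{17}$$
and
$$\begin{aligned}
\mathrm{err}_1 \ge {} & 2^{-c} \cdot sn(a') - 2^{-c}\, sn(a') \\
& + 2^{-c} \cdot sn(b'_k) - 2^{-c}\, sn(b'_k) + 0 = 0
\end{aligned} \tag{18}$$
Calculating bounds on $\mathrm{err}_2$ can be done analogously and results in
$$\begin{aligned}
\mathrm{err}_2 \le {} & 2^{-c} \cdot sp(b') + r_{\max} \cdot 2^{-b} - 2^{-c}\, sn(b') \\
& + 2^{-c} \cdot sp(a'_k) + r_{\max} \cdot 2^{-b} - 2^{-c}\, sn(a'_k) - 0 + 2^{-z}
\end{aligned} \tag{19}$$
and
$$\begin{aligned}
\mathrm{err}_2 \ge {} & 2^{-c} \cdot sn(b') - 2^{-c}\, sn(b') \\
& + 2^{-c} \cdot sn(a'_k) - 2^{-c}\, sn(a'_k) - 2^{-z} + 2^{-z} = 0
\end{aligned} \tag{20}$$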
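Since we ensured that the dihedral angles between all pairs of neighboring faces exceed $\pi/2$, $r_{\max}$ is bounded by $\sqrt{3}$ [7]. Moreover, since $|x_i| \le 1$ for all fixed-point coefficients, $sp(x) - sn(x) = \sum_i |x_i| \le 3$ for each coefficient triple in Eqs. (17) and (19). Combining this with Eqs. (17)–(20) and inserting the result into Eq. (13) yields the overall result
$$0 \le \mathrm{err} \le \sqrt{3} \cdot 2^{-b+1} + 6 \cdot 2^{-c} + 2^{-z} \tag{21}$$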
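This gives a bound on the deviation of the image size of a fixed-point DOP with respect to the exact image. Additionally, since err ≥ 0, i.e., $\mathrm{diff}' \le \mathrm{diff}$, every separation reported by the fixed-point test is also a separation in exact arithmetic; this formally proves that no false negatives can occur.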
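The right-hand side of Eq. (21) is easy to evaluate for concrete bit widths. The Python helper below is only a transcription of Eq. (21); the bound is expressed in the normalized coordinates of Section 4.3, i.e., relative to the largest absolute DOP coefficient in the scene. Choosing b = c = z in the loop is an assumption made here for illustration, since the split of bits is not fixed by the formula.

```python
import math


def fixed_point_error_bound(b: int, c: int, z: int) -> float:
    """Right-hand side of Eq. (21): sqrt(3) * 2^(1-b) + 6 * 2^-c + 2^-z."""
    return math.sqrt(3) * 2.0 ** (1 - b) + 6 * 2.0 ** (-c) + 2.0 ** (-z)


for bits in (16, 18, 35):
    print(bits, fixed_point_error_bound(bits, bits, bits))
```

With b = c = z = n, the bound simplifies to (2√3 + 7)·2^-n ≈ 10.5·2^-n, i.e., roughly 4·10^-5 for n = 18.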
5 THE ARCHITECTURE
5.1 The Pipeline
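Combining Eqs. (7)–(10) results in the separation condition
$$\begin{aligned}
& P'_A \cdot a' + 2^{-c}\, sn(a') + P'_B \cdot b'_k + 2^{-c}\, sn(b'_k) + p' > 0 \\
\text{or}\quad & P'_B \cdot b' + 2^{-c}\, sn(b') + P'_A \cdot a'_k + 2^{-c}\, sn(a'_k) - (p' + 2^{-z}) > 0 \\
\Rightarrow\quad & \text{separation}
\end{aligned} \tag{22}$$
The evaluation of Eq. (22) is divided into seven stages to enable pipelining.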
[Plot: calculation time (msec) over fixed-point precision (bits) for fixed-point arithmetic, with 32-bit float as reference.]
Figure 3: Speed of fixed-point arithmetic for different bit widths. Beyond 18 bits a second, and beyond 40 bits a third memory burst is needed.
Selection. Stage one selects the 12 out of k DOP coefficients defining the outer (maximal) vertices for a given candidate separating axis, based on the correspondences $(j_A, j_B)$.³ The correspondences $j_A$ and $j_B$ contain the indices of 6 of them. The indices of the 6 remaining ones can be derived by simply increasing each of these indices by k/2. Since we assume wrap-around indexing here, this does not need any combinational hardware; it can be done by simply feeding the coefficients into the multiplexers in a modified order.
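The selection stage can be modelled in a few lines. In this Python sketch (hypothetical names, 0-based indices), the three indices of each correspondence address the coefficients for the minimal vertex, and adding k/2 modulo k addresses the antiparallel faces of the maximal vertex. In software this is a modulo operation; in hardware, as described above, it is only a fixed permutation of the multiplexer inputs.

```python
from typing import List, Sequence, Tuple


def select_coefficients(a: Sequence[float], b: Sequence[float],
                        j_a: Sequence[int], j_b: Sequence[int],
                        k: int) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Stage one: pick the 12 coefficients needed for one candidate axis."""
    half = k // 2
    a_min_c = [a[j] for j in j_a]                  # 3 indices from j_A
    a_max_c = [a[(j + half) % k] for j in j_a]     # + k/2 with wrap-around
    b_min_c = [b[j] for j in j_b]                  # 3 indices from j_B
    b_max_c = [b[(j + half) % k] for j in j_b]     # + k/2 with wrap-around
    return a_min_c, a_max_c, b_min_c, b_max_c
```

For k = 24 and j_A = (0, 5, 13), the derived indices for the maximal vertex are 12, 17, and 1.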
Scalar Products and Fixed-Point Correction. Stages two to five implement the calculation of the scalar products and of the fixed-point correction term: the DOP coefficients are multiplied by the entries of the P′ vectors and summed up by an adder tree, and p′ (or $-(p' + 2^{-z})$ in the case of $\mathrm{diff}'_2$) is added. Concurrently, negative DOP coefficients are selected and accumulated. Stage six adds the results of both summations. Multiplication by $2^{-c}$ is done implicitly by shifting.
Result. Testing $\max(\mathrm{diff}'_1, \mathrm{diff}'_2) > 0$ is done by negating the conjunction of the sign bits.
5.2 Overall Design
The overall architecture is shown in Fig. 4. The calculation is initialized by the host system, which sends $(P'_A, P'_B, p', j_A, j_B)$ and the addresses of the DOP-trees to the hardware. A controller keeps track of the DOP overlap tests that still must be executed and requests the needed DOP coefficients and triangle data. The module "GetData" reads them from memory concurrently with the current calculation. As soon as the parameters are loaded and the last calculation is finished, it feeds them into the pipeline (or into the triangle unit, respectively). The pipeline receives not only the DOP coefficients but also, from the controller, the data for the next axis test.
³ There are 2 k-DOPs, 2 maximal vertices per DOP, and 3 coefficients defining each vertex (see Section 4.1).
[Block diagram with the components Host, API, FPGA, Controller (BV-control, Axis-control, BV-Stack), GetData, DDR-RAM, PipeData, Pipeline, and Triangle-Unit; signals include addresses, last, test axes, BV parameters, triangle data, the result (separation on axis), and triangle intersections.]
Figure 4: The complete intersection test hardware.
For each DOP pair, n axes are tested. A shift register ("PipeData") holds additional bookkeeping information: for every pipeline stage, it contains the indices of the processed DOPs and whether the contained calculation is the last axis test to be executed for the current DOP pair. If this last axis test leaves the pipeline and none of the test axes is a separating axis, the controller schedules the child DOPs to be tested. If a separating axis is found, the remaining calculations belonging to the same DOP pair are obsolete: no new axis tests are initiated, the results of the calculations still in the pipeline are ignored, and no new DOP tests are scheduled.
[13] showed that scheduling DOP tests on a stack is far superior to queue control with regard to memory usage, so a stack is used here. As soon as the stack, the pipeline, and the GetData module are empty and no intersecting triangles have been found, the objects do not intersect, and this is reported to the host application. Every intersecting pair of triangles, on the other hand, is reported to the host immediately.
To check triangles for intersection, we utilize the same algorithm that was already proposed in [16] and implemented in VHDL in [13]. It transforms both triangles so that one of them becomes the “unit” triangle. That way, the checks to be performed on the other triangle become very simple and standardized.
5.3 Control
As mentioned in Section 3.2, it is not necessary to test all axes $L_i$ for being separating axes. Moreover, [14] has shown that it is not efficient to test all axes for OBBs, since the probability of the BVs being disjoint decreases rapidly with every non-separating test axis found so far. Fig. 5 shows that this applies to DOPs, too.
On the other hand, we want to eliminate disjoint branches of the DOP trees as early as possible to reduce expensive loading of DOP coefficients. Therefore, we determined which n ≤ N gives the best trade-off between axis testing and parameter loading. As Fig. 6 indicates, n = 24 yields the optimum performance for 24-DOPs and the given memory architecture. 24 axis tests suffice to test all candidate separating axes generated from the 12 face orientations of each DOP. Although this exceeds the time to load a complete set of DOP coefficients (only 20 clock cycles) by 4 cycles, testing 24 axes seems to reduce the number of false positives enough to yield a performance gain.
[Plot: separations per BV test over test axes count, both on logarithmic scales.]
Figure 5: The more axes are tested for intersection, the less probable it is for other axes to be separating.
[Plot: calculation time (msec) over the number of test axes n.]
Figure 6: For fixed k = 24, our design performs best using n = 24 on the target architecture.
Still, there is no reason to stop testing axes if the next DOP pair is not completely loaded yet. This can happen, for instance, if the memory subsystem is occupied by triangle data. As shown in Fig. 7, continuing to test axes until the next set of DOP coefficients has been fetched from memory speeds up the calculation.
6 RESULTS
The target architecture is a Xilinx Virtex-II (XC2V6000, speed grade −4) on an Alpha Data ADM-XRC-II board with 256 MB DDR-RAM at 100 MHz, connected via a 64-bit wide bus. The FPGA features 144 18-bit multipliers and 6 million gate equivalents. CoCentric from Synopsys was used to compile SystemC RTL to VHDL code. Synthesis, mapping, and place and route were done with Xilinx ISE 6.3.
[Plot: calculation time (msec) over object distance for FPGA-accelerated (n = 24) and FPGA-accelerated (n ≥ 24).]
Figure 7: Testing further axes until the next DOP pair is loaded yields a speed-up.
6.1 Synthesis Results
Although 19-bit accuracy performs best on our test data with respect to calculation time (Fig. 3), we decided to implement the pipeline for 35-bit fixed-point 24-DOPs in order to tolerate bigger differences in DOP size (see Section 4.3). Since the target architecture features only 18-bit multipliers, this results in two extra pipeline stages to implement pipelined 35-bit multipliers.
Overall, the pipeline utilizes a total of 7278 out of 33792 slices (21%, corresponding to about 1.26 million gate equivalents). The maximum clock frequency is 111.117 MHz.
6.2 Benchmarking
All results presented here were obtained with two identical objects (a car headlight) with 5947 triangles each [13]. They are placed at different distances from each other and with different rotations. For each constellation, the time to detect all intersecting triangles is measured.
Fig. 8 compares our new architecture with a state-of-the-art software intersection test running on a 1 GHz Pentium III with 512 MB of main memory. Memory bandwidth and speed are identical on both systems, which allows for a direct performance comparison. The presented acceleration hardware yields a speed-up of about a factor of 4.
7 CONCLUSION AND FUTURE WORK
We have presented a novel algorithm for hierarchical collision detection of pairs of virtual objects. We have also presented a highly space-efficient, FPGA-optimized architecture that implements this algorithm using fixed-point arithmetic. The fixed-point calculations do not produce any false negatives, and we have given bounds on the deviations from floating-point arithmetic.
[Plot: calculation time (msec) over object distance for the software implementation and the FPGA-accelerated architecture.]
Figure 8: The presented architecture is approximately 4 times faster than a state-of-the-art software intersection test.
Simulation results for collision queries using this architecture showed that a speed-up of 4 compared to state-of-the-art software intersection tests on a standard CPU can be obtained. Taking earlier ASIC-targeted results into account [13], we conclude that an ASIC implementation of our novel algorithm and architecture will be one to two orders of magnitude faster than a software implementation and even than the FPGA implementation. Synthesis results showed the design to be highly space-efficient.
In addition, our novel DOP overlap test algorithm lends itself well to parallelization. Only a slight modification of the controller is necessary to use multiple pipelines that test multiple candidate separating axes in parallel. Together with its low area consumption, this allows for an easy implementation of a highly parallelized architecture with an expected speed-up linear in the number of pipelines.
Here, memory bandwidth becomes the limiting factor for the speed of collision queries. Possible solutions are the compression of DOP coefficients and the introduction of a cache.
Another important topic is fixed-point accuracy. Here, many different ways to obtain smaller projections are conceivable.
Collision detection of deformable objects is another important issue. It remains an open problem which algorithms and data structures are best suited for hardware implementation. Furthermore, we will evaluate different kinds of primitives, such as quadrangles and NURBS.
REFERENCES
[1] Ageia. White paper, May 2005. http://www.ageia.com/pdf/wp_2005_3_physics_gameplay.pdf.
[2] George Baciu, Wingo Sai-Keung Wong, and Hanqiu Sun. RECODE: an image-based collision detection algorithm. The Journal of Visualization and Computer Animation, 10(4):181–192, October–December 1999. ISSN 1049-8907.
[3] Jens Eckstein and Elmar Schömer. Dynamic Collision Detection in Virtual Reality Applications. In Proc. The 7th Int’l Conf. in Central Europe on Comp. Graphics, Vis. and Interactive Digital Media ’99 (WSCG’99), pages 71–78, Plzen, Czech Republic, February 1999. University of West Bohemia.
[4] Stefan Gottschalk, Ming Lin, and Dinesh Manocha. OBB-Tree: A Hierarchical Structure for Rapid Interference Detection. In SIGGRAPH 96 Conference Proceedings, Holly Rushmeier, Ed., pages 171–180. ACM SIGGRAPH, Addison Wesley, August 1996. Held in New Orleans, Louisiana, 04–09 August 1996.
[5] Naga K. Govindaraju, Stephane Redon, Ming C. Lin, and Dinesh Manocha. CULLIDE: Interactive Collision Detection Between Complex Models in Large Environments using Graphics Hardware. In Graphics Hardware 2003, pages 25–32, July 2003.
[6] Alexander Gress and Gabriel Zachmann. Object-Space Interference Detection on Programmable Graphics Hardware. In SIAM Conf. on Geometric Design and Computing, M. L. Lucian and M. Neamtu, Eds., pages 311–328, Seattle, Washington, November 13–17, 2003. Nashboro Press.
[7] Stefan Hochgürtel, Andreas Raabe, Gabriel Zachmann, and Joachim K. Anlauf. Collision Detection for k-DOPs using SAT with Error Bounded Fixed-Point Arithmetic. Tech. rep., University of Bonn, September 2005. http://www.collisionchip.de.
[8] P. M. Hubbard. Collision detection for interactive graphics applications. IEEE Transactions on Visualization and Computer Graphics, 1(3):218–230, September 1995. ISSN 1077-2626.
[9] Jan Klein and Gabriel Zachmann. The Expected Running Time of Hierarchical Collision Detection. In SIGGRAPH 2005, Poster, Los Angeles, August 2005. http://www.gabrielzachmann.org/.
[10] Dave Knott and Dinesh K. Pai. CInDeR: Collision and Interference Detection in Real-Time Using Graphics Hardware. In Proc. of Graphics Interface, Halifax, Nova Scotia, Canada, June 11–13, 2003.
[11] Karol Myszkowski, Oleg G. Okunev, and Tosiyasu L. Kunii. Fast collision detection between complex solids using rasterizing graphics hardware. The Visual Computer, 11(9):497–512, 1995. ISSN 0178-2789.
[12] Andreas Raabe, Blazej Bartyzel, Joachim K. Anlauf, and Gabriel Zachmann. Hardware Accelerated Collision Detection — An Architecture and Simulation Results. In Design Automation and Test (DATE), Munich, Germany, March 7–11, 2005. http://www.gabrielzachmann.org/.
[13] Andreas Raabe, Blazej Bartyzel, Joachim K. Anlauf, and Gabriel Zachmann. Hardware Accelerated Collision Detection — An Architecture and Simulation Results. In Design Automation and Test (DATE), Munich, Germany, March 7–11, 2005. http://www.collisionchip.de.
[14] Gino Johannes Apolonia van den Bergen. Collision Detection in Interactive 3D Computer Animation. PhD dissertation, Eindhoven University of Technology, 1999.
[15] Gabriel Zachmann. Rapid Collision Detection by Dynamically Aligned DOP-Trees. In Proc. of IEEE Virtual Reality Annual International Symposium; VRAIS ’98, pages 90–97, Atlanta, Georgia, March 1998.
[16] Gabriel Zachmann and Günter Knittel. An Architecture for Hierarchical Collision Detection. In Journal of WSCG ’2003, pages 149–156, University of West Bohemia, Plzen, Czech Republic, February 3–7, 2003. http://www.gabrielzachmann.org/.
[17] Gabriel Zachmann and Günter Knittel. High-Performance Collision Detection Hardware. Tech. Rep. CG-2003-3, University Bonn, Informatik II, Bonn, Germany, August 2003. http://www.gabrielzachmann.org/.
