Putting Queens in Carry Chains by Preußer, Thomas B. et al.
Putting Queens in Carry Chains
Thomas B. Preußer, Bernd Nägel
Rainer G. Spallek
Institut für Technische Informatik
TUD-FI09-03 March 2009
Technische Berichte
Technical Reports
ISSN 1430-211X
Fakultät Informatik
Technische Universität Dresden
Fakultät Informatik
D-01062 Dresden
Germany
URL: http://www.inf.tu-dresden.de/
Putting Queens in Carry Chains⁰
Thomas B. Preußer, Bernd Nägel, Rainer G. Spallek
Department of Computer Science, Technische Universität Dresden, Germany
{thomas.preusser,rainer.spallek}@tu-dresden.de
bernd.naegel@mailbox.tu-dresden.de
Dresden, March 2009
This paper describes an FPGA implementation of a solution-counting solver for
the N-Queens Puzzle. The proposed algorithmic mapping utilizes the fast carry-chain
logic found on modern FPGA architectures in order to achieve a regular
and efficient design. From an initial full chessboard mapping, several optimization
strategies are explored. We also describe the infrastructure that we have
constructed for computing the currently unknown solution count of the 26-Queens
Puzzle. Finally, we compare the performance of the concrete FPGA device mappings
we used, also against general-purpose CPUs.
1 Introduction
The 8-Queens Puzzle asks for the placement of 8 queens onto a chessboard in such a way that
none of them is able to capture any other within a single move. This puzzle is known to have
92 solutions. These can be partitioned into 12 fundamental equivalence classes whose members
are mutually convertible by board reflection and rotation. The N-Queens Puzzle generalizes
the original problem instance by requiring N non-attacking queens to be accommodated on
an N × N board. Several questions may be asked concerning this basic setup:
• Are there generic solution templates valid for all board sizes?
• How can a non-trivial valid solution be found quickly?
• What is the overall number of (fundamental) solutions for a given board size?
This paper targets the last of these questions. While no tight lower bound on its computational
complexity is known, all available approaches amount to an over-exponential brute-force search
that calculates and counts every single solution. This effort has been undertaken for all N up
to 25 by various projects [1, Seq. A000170]. While modern standard desktop processors finish
problem sizes up to N = 20 within minutes, the initial calculation for N = 25 already required
a sequential computing time of slightly over 53 years, which was delivered by a grid computing
for more than six months [2].
⁰ Parts of this work were presented at and included in the informal proceedings of the 3rd HiPEAC Workshop
on Reconfigurable Computing, Paphos, Cyprus, January 25, 2009.
The grid computation thrives on the fact that the N -Queens Puzzle is an embarrassingly
parallel problem. This means that it is easily partitionable into independent subproblems
so that parallelization comes at virtually no cost while achieving close-to-perfect speedups.
The key idea is to compute the valid placements for subboards comprising only a few initial
columns before having independent solver instances calculate and count their individual valid
completions. The overall total is then trivially obtained by the summation of the subtotals.
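The partitioning can be illustrated with a minimal Python sketch. It is not the authors' solver code: the bit-vector encoding (bit y stands for row y, counted from the bottom) and the function names `completions`, `subboards` and `partitioned_count` are illustrative assumptions. Every legal placement of the first `cols` columns becomes an independent subproblem, and the overall total is the sum of the subtotals.

```python
def completions(full, h, u, d):
    """Count the ways to complete a board, given the horizontal (h), upward
    diagonal (u) and downward diagonal (d) blocking vectors entering the next
    column. Bit y of each vector stands for row y (0 = bottom)."""
    if h == full:                  # every row holds a queen: one full solution
        return 1
    total = 0
    avail = ~(h | u | d) & full    # fields of the next column not under attack
    while avail:
        q = avail & -avail         # lowest remaining free field
        avail ^= q
        total += completions(full, h | q, ((u | q) << 1) & full, (d | q) >> 1)
    return total


def subboards(n, cols):
    """Blocking vectors left behind by every legal placement of queens in the
    first `cols` columns of an n x n board; each one is a subproblem."""
    full = (1 << n) - 1
    result = []

    def place(c, h, u, d):
        if c == cols:
            result.append((h, u, d))
            return
        avail = ~(h | u | d) & full
        while avail:
            q = avail & -avail
            avail ^= q
            place(c + 1, h | q, ((u | q) << 1) & full, (d | q) >> 1)

    place(0, 0, 0, 0)
    return result


def partitioned_count(n, cols):
    """Solve all subproblems independently and sum up their subtotals."""
    full = (1 << n) - 1
    return sum(completions(full, h, u, d) for h, u, d in subboards(n, cols))
```

For the 8-Queens Puzzle, `partitioned_count(8, 2)` distributes the work over 42 two-column subboards and still sums up to the known total of 92.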
While this paper does not attack the fundamental over-exponential computational complexity
of current algorithmic approaches, we describe an implementation tailored to FPGA-specific
arithmetic logic structures. This implementation is the basis of a massively-parallel
solver infrastructure, with which we set out to determine the still unknown solution count for
N = 26. Our approach challenges the NQueens@Home project [3], which follows a world-wide
distributed computing approach utilizing BOINC¹ [4]. We hope to establish an example of
a problem for which an appropriate FPGA implementation utilizing a few local devices can
outperform the globally-distributed workforce of an immense network computing infrastructure.
As an intermediate achievement of our effort, we were able to identify and report a critical
counter overflow in the solver implementation used by NQueens@Home. This not only saved
them from further pointless computation but also resolved a pending dispute over the
solution count for N = 24 [5].
After introducing the common backtracking solver approach in Sec. 2, this paper presents
its FPGA-specific adaptation in Sec. 3, which also details the exploitation of fast carry-chain
structures. Sec. 4 then evaluates several approaches to increasing the performance of the basic
design. The resulting improved implementation is currently employed in pursuit of the solution
of the 26-Queens Puzzle. The infrastructure used and a current snapshot of the computing
devices and their performance are presented in Sec. 5 before Sec. 6 concludes the paper.
2 Solver Approach
There is currently no faster approach to determining the solution count of the N-Queens
Puzzle than the backtracking search for valid queen placements. Keeping in mind that
rows and columns as well as directions are interchangeable, we will assume an implementation
that fills the board from the leftmost to the rightmost column and each column from the bottom up.
When the search advances to a new column, a new queen is successively placed onto each of the
fields that are not attacked by the queens already present to the left. Each such successful
placement unfolds a subproblem, which is explored in full through the placements in the remaining
columns before the queen is advanced to the next unexplored legal field. When no such field
remains, control is returned to the preceding column if it exists; otherwise the search is
finished. The desired solution count is incremented each time a legal queen placement in the
last column completes a placement of all N queens.
Fast implementations of this search derive the fields available for the placement of a queen
from calculated blocking vectors that describe the attacked fields. This avoids the rather ex-
pensive validation of speculative placements in all of the fields of a column. The necessary
information about occupied rows and diagonals can be conveniently kept within three bit
vectors for horizontal, diagonally upwards and diagonally downwards blocking. Setting the
position of the queen placed in the current column and shifting the diagonal vectors appropri-
ately yields the blocking information for the next column. A C reference implementation of
this approach is provided by Jeff Somers [6].
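The following Python sketch shows this bit-vector formulation. It is written in the spirit of, but not copied from, the cited C reference implementation; the function name is an illustrative assumption, and bit y of each vector stands for row y of the current column, counted from the bottom. When moving on to the next column, the upward diagonal vector is shifted towards higher rows and the downward one towards lower rows.

```python
def count_solutions(n):
    """Backtracking count of all n-queens solutions with three blocking
    vectors: horizontal (bh), diagonally up (bu) and diagonally down (bd)."""
    full = (1 << n) - 1

    def solve(col, bh, bu, bd):
        # Fields of this column that are not attacked from the left.
        avail = ~(bh | bu | bd) & full
        if col == n - 1:
            # Each legal field in the last column completes a solution.
            return bin(avail).count("1")
        count = 0
        while avail:
            q = avail & -avail        # next free field, bottom up
            avail ^= q
            # Mark the queen and shift the diagonals towards the next column.
            count += solve(col + 1, bh | q, ((bu | q) << 1) & full, (bd | q) >> 1)
        return count

    return solve(0, 0, 0, 0)
```

`count_solutions(8)` returns 92, the known solution count of the 8-Queens Puzzle.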
¹ Berkeley Open Infrastructure for Network Computing
[Figure 1: Adaptation of the N-Queens Problem in Hardware (labels: Backtracking Logic, Carry Chain Logic, column selector CS, cell registers BH, BU, BD, QN; signals Start, Done, Solution)]
3 Hardware Adaptation
The N-Queens Problem can be mapped nicely onto a solver implemented in hardware. While
our implementation, in principle, also follows the iterative search approach, it extensively
exploits opportunities for parallelism that are not available on general-purpose CPUs. On
the one hand, fine-grained parallelism allows a new queen placement and its implied changes of
the blocking vectors to be completed within a single clock cycle. On the other hand,
coarse-grained parallelism is applied by stuffing a single FPGA device with many more subproblem
solvers than there are computation cores within current general-purpose CPUs. Both of these
measures are vital to make up for the clock frequency gap between the FPGA and the CPUs
executing software implementations.
Our initial adaptation of the N-Queens problem to an FPGA fabric is depicted in Fig. 1. The
board is represented by a square matrix of identical computation cells, each representing
one of its fields. The state of each cell comprises its bits of the three local blocking vectors
(BH - horizontal, BU - diagonally up, BD - diagonally down) to be fed into the succeeding
column as well as the current placement of a queen. All of this state may be dropped for the
final rightmost column since (a) no blocking information needs to be forwarded to a succeeding
neighbor, and (b) there will never be a second valid queen placement within this column even
after a successful placement has completed a solution. Successful placements in this column
trigger the increment of the solution counter.

[Figure 2: Simplified Carry-Chain Structures of Sample FPGA Devices (panels: Spartan / Virtex, Manchester Carry Chain with LUT and MUXCY; Stratix, Carry Select Chain; Stratix II, Ripple Carry Adder with designated LUT)]
All the cells of a column are connected through a fast signaling chain that serves the calcu-
lation of the next free field within the column. For that purpose, the active column is fed a
token from the one-hot column selector (CS) at the bottom. It is forwarded by all cells that
have any of the blocking signals set, but it is consumed by the first free cell, which thereby
forces a corresponding queen placement. If the token leaves the column at the top, there is no
unexplored legal placement remaining within the column. In this case, the backtracking logic
allows the closest left neighbor column with a free cell to move up its already placed queen.
All columns with no remaining free cell are passed, and their last queen placements are cleared
to enable a clean placement calculation when they are activated again. If the leftmost column
can no longer advance its queen, the calculation terminates.
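The column-wise control flow described above can be mimicked in software. The Python sketch below is a hypothetical model rather than the hardware description: integers stand in for the per-column registers, the token either enters a cleared column at the bottom (as fed by CS) or starts just above the queen that is to be moved up, and a token leaving at the top clears the column's queen and returns control to the left neighbor. For simplicity, it steps back one column at a time instead of jumping directly to the closest column with a remaining free cell; the result is the same.

```python
def count_columnwise(n):
    """Column-wise search modelled on the hardware control flow."""
    full = (1 << n) - 1
    queen = [0] * n    # one-hot queen register per column (0 = cleared)
    bh = [0] * n       # blocking vectors entering each column
    bu = [0] * n
    bd = [0] * n
    solutions = 0
    x = 0              # index of the active column
    while x >= 0:
        blocked = bh[x] | bu[x] | bd[x]
        # The token enters a cleared column at the bottom (as fed by CS) or
        # starts right above the queen that is to be moved up.
        start = queen[x] << 1 if queen[x] else 1
        candidates = ~blocked & full & ~(start - 1)
        token = candidates & -candidates   # consumed by the first free cell
        if token == 0:
            # The token leaves the column at the top: clear and backtrack.
            queen[x] = 0
            x -= 1
            continue
        if x == n - 1:
            # A placement in the last column completes a solution; there is
            # never a second one in this column, so backtrack right away.
            solutions += 1
            x -= 1
            continue
        queen[x] = token
        bh[x + 1] = bh[x] | token
        bu[x + 1] = ((bu[x] | token) << 1) & full
        bd[x + 1] = (bd[x] | token) >> 1
        x += 1
    return solutions
```

Like the recursive formulation, `count_columnwise(8)` returns 92.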
The token signalling chains used within our design utilize the fast carry-chain logic found in
all modern FPGAs. Their original purpose is the acceleration of simple implementations for
carry-rippling operations like additions and subtractions. The beneficial use of these structures
for other tasks such as token-based round-robin arbitration has been shown before [7]. In
our case, their employment circumvents the use of comparatively slow general-purpose signal
routing in favor of these fast direct interconnects. The speed advantage of these fast signal
routes is vital for the design as the token paths are its critical paths limiting the achievable
clock frequency.
As exemplified in Fig. 2 for some FPGA devices, the particularities of the available carry-chain
logic differ with vendor and device family². While the libraries providing the implementations
for standard arithmetic automatically utilize these logic structures, the mapping of other
custom-defined functions generally requires the manual instantiation of specific low-level
primitives. These primitives may be marking buffers without a logic function of their own, such
as the CARRY_SUM buffer provided by Altera libraries, or they may be explicit instantiations of
the hardware components found in the carry chain, such as the MUXCY provided by Xilinx libraries.
² For details, see the manufacturers’ online device handbooks and user guides [8–11].
While the first approach is the more general one as it is not bound to a particular carry-chain
structure, we found the constraints imposed by its implementation in Altera's synthesis tool
chain to be inconvenient, at the least. For example, the fan-in restrictions of the buffer often
force the designer to partition the driving logic manually, e.g. by instantiating explicit
LCELL primitives. An automatic partitioning, on the other hand, appears entirely feasible and
could take the peculiarities of specific target platforms (such as the available LUT sizes) into
account.
If no low-level primitives are available for carry-chain mapping or if these cannot be
instantiated reliably³, the designer may fall back on standard binary word addition, which
enjoys thorough synthesis support. This does, however, require some effort for the derivation
of appropriate sum terms. We identified the following procedure as helpful:
1. Derivation of the Carry Propagation
For each position in the chain, identify the classic disjoint carry-propagation cases as
determined by the local inputs:
Case              $c_{i+1}$   Description
$k_i$: Kill       0           never an outgoing carry
$p_i$: Propagate  $c_i$       outgoing carry exactly on incoming carry
$g_i$: Generate   1           always an outgoing carry
As the complement to the other two cases, $k_i$ is rarely given explicitly.
Taking the token propagation within a board column as an example, we obtain:
$p_i = BH_{x,i} + BU_{x,i} + BD_{x,i}$
$g_i = QN_{x,i}$
2. Determination of Addends
In order to map the function onto an addition, find addends that induce the desired
carry propagation for each chain position:
Case        $a_i$   $b_i$
$k_i = 1$   0       0
$p_i = 1$   0       1       (or 1 and 0: choose any)
$g_i = 1$   1       1
Although the distribution of the 0 and the 1 can be chosen arbitrarily for the propagating
case, this choice might impact the overall complexity of the functions obtained for ai and
bi.
We can make the following trivial choice for the example token propagation:
$a_i = g_i + p_i = QN_{x,i} + BH_{x,i} + BU_{x,i} + BD_{x,i}$
$b_i = g_i = QN_{x,i}$
Two extra bits copying $CS_x = q_{x,0}$ are appended at the right of the addends to feed
the incoming carry; an additional set of identical bits appended to the left makes the
outgoing carry appear in the added most significant sum bit.
³ This was, indeed, the case with the instantiation of CARRY_SUM buffers on Stratix II devices. Their designated
ripple-carry adders no longer allow a straightforward mapping of the custom function, which apparently
establishes an obstacle for the Quartus II synthesis tool.
[Figure 3: Typical Development of Number of Partial Solutions (partial solutions, scaled to max = 1, plotted over L/N for problem sizes N = 7, 11, 15, 19)]
3. Fix-up of Sum Bits
Finally, the originally intended function needs to be derived from the generated sum
bits and the local inputs. Note that direct access to the intermediate carry signals is
not available when using the binary word addition. They are, however, always derivable
from the local $p_i$ inputs and $s_i$ outputs as $s_i = p_i \oplus c_i$.
For the placement to be calculated by the token propagation, observe that a queen is
placed if and only if the affected cell does not propagate an incoming carry (is not
blocked) and if a carry (or token), indeed, arrives. As the incoming carry is visible as
sum output in this case, we obtain:
$s'_i = \overline{p_i} \cdot s_i$
Although the fallback to word addition appears somewhat clumsy and might not yield optimal
results for the processing of local signals, it is, nonetheless, the most portable way to get one’s
hands onto the carry-chain structures.
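The three steps can be checked with a small Python model of a single column. This is a sketch under stated assumptions: bit i of each integer stands for cell i counted from the bottom, the blocked cells `p` and the queen register `g` are assumed to be disjoint, the bits appended on the left are zeros, and `column_step` is a hypothetical name. Python's unbounded integer addition stands in for the adder that synthesis would map onto the carry chain.

```python
def column_step(blocked, queen, cs, n):
    """Emulate one column's token chain with a single binary addition.

    blocked -- p_i = BH | BU | BD (bit i = cell i, 0 = bottom)
    queen   -- g_i = QN, the one-hot queen register (0 if empty)
    cs      -- incoming token from the column selector (0 or 1)
    Returns (placement, carry_out): the one-hot new queen position (0 if none)
    and whether the token left the column at the top.
    """
    mask = (1 << n) - 1
    p = blocked & mask
    g = queen & mask
    # Step 2: addends a_i = g_i + p_i and b_i = g_i, each extended on the
    # right by one bit copying CS, which injects the incoming carry.
    a = ((g | p) << 1) | cs
    b = (g << 1) | cs
    total = a + b
    # Drop the appended right bit; the zero bits on the left catch the carry.
    s = (total >> 1) & mask
    carry_out = (total >> (n + 1)) & 1
    # Step 3: a queen is placed where a carry reaches a non-blocked cell.
    placement = ~p & s & mask
    return placement, carry_out
```

For a column with the cells 0, 1 and 3 blocked, `column_step(0b01011, 0, 1, 5)` places the queen in cell 2; moving that queen up with `column_step(0b01011, 0b00100, 0, 5)` skips the blocked cell 3 and places it in cell 4.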
4 Design Refinement
In the search for optimization potential, we studied the typical development of the number of
valid placements of L queens within the first L columns of the complete N × N board. As
exemplified in Fig. 3⁴, these numbers all display a peak around L = N − 4, which is flanked by
both a steep rise and a steep decline. The latter implies that many of the valid subboard
placements in this region already contain a hidden conflict that prevents their legal completion
to the whole board. An early recognition of these cases would prevent the search within hopeless
subtrees and, thus, deliver a valuable shortcut for the computation. Note that any such early
conflict recognition is, in fact, a trade-off as the detection itself requires resources. While
a software implementation trades run time for run time, a hardware implementation would
typically invest additional logic to increase the computation speed.

⁴ Note that all plotted functions are discrete with L = 1, 2, …, N. The connecting splines only
serve as a visual aid.
In principle, all the knowledge required to identify a conflict is encoded within the available
blocking vectors. For the actual extraction of the conflict, we evaluated two models:
1. The projection of the current blocking vectors onto some later column. If this column is
already blocked totally, a completion of the board is impossible.
2. The projection of the current blocking vectors onto two later adjacent columns. As two
queens must be placed within them that do not attack each other, there must be two
free fields available that are not located within a single 2×2 square; otherwise the board
is again incompletable.
In both cases, the further exploration of the identified incompletable subboards can be
canceled. This is achieved by invalidating even a possibly successful placement within the
column to be processed next.
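Both conflict models can be sketched on top of the three blocking vectors. The Python functions below are hypothetical illustrations, not the evaluated hardware detectors: `h`, `u` and `d` are the vectors entering the next column, bit y stands for row y (counted from the bottom), and the projection `k` columns further ahead simply shifts the diagonal vectors while ignoring the queens still to be placed in between.

```python
def free_fields(h, u, d, full, k):
    """Fields left free in the column k steps beyond the one that h, u and d
    enter, projecting only the queens placed so far."""
    return ~(h | ((u << k) & full) | (d >> k)) & full


def lowest_row(f):
    return (f & -f).bit_length() - 1


def highest_row(f):
    return f.bit_length() - 1


def model1_dead(h, u, d, full, k):
    """Model 1: the projected column is blocked completely."""
    return free_fields(h, u, d, full, k) == 0


def model2_dead(h, u, d, full, k):
    """Model 2: the projected columns k and k + 1 offer no pair of free fields
    whose rows differ by at least 2, i.e. every candidate pair lies within a
    single 2x2 square, so the two queens would attack each other."""
    f1 = free_fields(h, u, d, full, k)
    f2 = free_fields(h, u, d, full, k + 1)
    if f1 == 0 or f2 == 0:
        return True
    return (highest_row(f1) - lowest_row(f2) < 2 and
            highest_row(f2) - lowest_row(f1) < 2)
```

With rows 0, 1, 3 and 4 already taken horizontally on a 5 × 5 board (`h = 0b11011`), every later column still offers row 2, so model 1 reports no conflict, whereas model 2 recognizes that two adjacent later columns cannot both be served.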
Although we could achieve some speedup with these models, it was on a rather disappointing
level of 3-4 percent for each individual conflict model. Considering that more than 90% of the
partial solutions are stale at the peak, these conflict models are obviously fairly weak. Some
small comfort comes from the fact that the conflicts detected by both models appear to be
mostly distinct, as the concurrent application of both detectors yields a speedup that is
almost 90% of the plain sum of the individual savings. Consequently, an effective yet simple
conflict model to increase the computation speed is still needed.
The second vital performance parameter besides plain computation speed is concurrency,
which we then made our primary optimization goal. For this purpose, we retained only a single
active column, from which successful placements are shifted out into a field only consisting of
shift registers. These placements are restored when backtracking – now a single column per
cycle. We further merged all blocking information into three field-wide vectors, in which placed
queens are marked and the ones falling victim to the backtracking are cleared. Note that in
a software implementation, this computational restoration of blocking information would be
more expensive than the usual approach of retrieving it from a history stack (or array).
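The field-wide bookkeeping can still be illustrated in software, keeping in mind the remark above that it is not the most efficient choice for a CPU. The Python function below is a hypothetical model: one vector covers all rows, one all 2N − 1 upward diagonals (indexed by y − x + N − 1) and one all 2N − 1 downward diagonals (indexed by x + y), while a small list of row indices plays the role of the shift registers holding the placements. Queens are marked when placed and cleared when they are moved up or backtracked over.

```python
def count_fieldwide(n):
    """Search with board-wide blocking vectors that are marked on placement
    and cleared on backtracking (hypothetical software model)."""
    full = (1 << n) - 1
    rows = ups = downs = 0   # rows, upward diagonals (y - x), downward diagonals (x + y)
    placed = [-1] * n        # row of each column's queen, -1 = none ("shift registers")
    solutions = 0
    x = 0
    while x >= 0:
        start = 0
        y = placed[x]
        if y >= 0:
            # Clear the queen of the active column before moving it up.
            rows &= ~(1 << y)
            ups &= ~(1 << (y - x + n - 1))
            downs &= ~(1 << (x + y))
            start = y + 1
        # Project the board-wide vectors onto the active column.
        blocked = (rows | (ups >> (n - 1 - x)) | (downs >> x)) & full
        candidates = ~blocked & full & ~((1 << start) - 1)
        if candidates == 0:
            placed[x] = -1
            x -= 1
            continue
        y = (candidates & -candidates).bit_length() - 1
        if x == n - 1:
            solutions += 1   # last column: count the solution and backtrack
            x -= 1
            continue
        rows |= 1 << y
        ups |= 1 << (y - x + n - 1)
        downs |= 1 << (x + y)
        placed[x] = y
        x += 1
    return solutions
```

`count_fieldwide(8)` again yields 92.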
The obtained design was, indeed, significantly slower and took more than twice the time
for a solution. However, it was also nearly four times smaller, which more than compensated
for the loss in speed. The smaller granularity of the single solvers further enabled a far better
device utilization. In addition to this, the critical paths shortened so that the supported clock
frequencies grew by more than 50%. Consequently, we adopted this version of the design to
approach N = 26.
5 Tackling N = 26
After thoroughly testing our solver architecture with problem sizes of N up to 22, we set out
to determine the currently unknown solution count for N = 26. The first essential decision
to make was the granularity of the subproblems, i.e. the number of pre-placed columns. On
the one hand, this choice would have to allow for an extremely high degree of parallelism so
as to cope with the required tremendous amount of computing effort but, on the other hand,
it would also need to ensure that the costs of the communication required to distribute the
subproblems and to collect their subtotals will not dominate the actual computation. Recalling
Fig. 3, the number of preplaced columns L should, therefore, be kept clearly below N/2 so as to
avoid running into the unfavorable case explosion with many incompletable subboards.
[Figure 4: Infrastructure for Solving N = 26 (database DB, server, hosts; FPGA with UART, FIFOs, Solver #0 and Solver #1)]
We finally decided on L = 6, which safely meets this requirement. As this choice produces
subproblems that are solved within minutes, clearly below an hour, an easy verification of
individual results is possible. This timeframe also limits the costs of a solver node
failure as each completed subproblem establishes a natural checkpoint without requiring an
explicit checkpointing infrastructure. This is particularly important in our setup, which largely
utilizes the idle time of lab resources that are regularly withdrawn from the active
solver pool.
The chosen problem partition yields 25204802 subboards to be explored. The number of
legal queen placements within the initial 6 columns is actually twice as high. We can, however,
apply a widely-adopted optimization, which exploits the fact that these subboards are pairwise
mirror images of each other with respect to reflection about the middle horizontal. Thus, the
required computing effort is effectively cut in half.
Our system infrastructure is depicted in Fig. 4. A central MySQL database holds the open
subproblems and collects their solution counts. It is exclusively accessed by a distribution
server, which supervises the handout and completion of subproblems in order to detect failed
solvers and to re-issue subproblems as required. The direct communication partners of this
server are local distributors running on the individual host computers. These distributors can
be instructed to fork C-implemented solvers onto the available CPU cores and to serve solver
FPGAs attached via a serial RS232 link. They also keep track of the assigned subproblems
so as to allow a local re-issuing as a first fallback when attached solvers fail or, more
likely, are withdrawn. The entire distribution backbone is implemented in Java. It utilizes the
Java Database Connectivity (JDBC) [12] for database access and the Java Native Interface
(JNI) [13] to incorporate the C solvers. Integrity protection is achieved by a cryptographic
hash on the server-host path and a CRC-9 checksum on the packets transferred over the serial
links to the FPGAs.
An essential measure taken to ensure a steady throughput is the integration of input and
output FIFO buffers at all nodes of the network including the FPGA nodes themselves. This
ensures that solvers do not have to run idle when potentially bursty communication requests
consume the available bandwidth. Thus, even when multiple solvers on a single FPGA device
finish within a short timeframe, they generally do not hinder each other in committing their
solutions and fetching new subproblems.
A selection of our FPGA devices performing the computation is given in Tab. 1. The
performance is normalized to that of a single FPGA solver running at 100 MHz (1 SE). To put the
achieved performance into perspective, we also provide this measure for several current
general-purpose CPUs. Observe that a single CPU core is clearly faster than a single FPGA
solver slice: its far higher clock frequency more than makes up for all the instruction and
branching overhead. The strength of the FPGA devices is concurrency, so that the 15 solvers
running on a Spartan-3 XC3S1000 clearly beat a quad-core GHz CPU.

Table 1: Capacity and Performance of Selected Devices

Count  Device                                      Solvers/Cores  Clock     Performance
4×     Spartan-3 XC3S1000                          15             90 MHz    13.5 SE
1×     Stratix EP1S80                              74             120 MHz   88.8 SE
1×     Stratix II EP2SGX90                         78             160 MHz   124.8 SE
5×     Virtex-5 XC5VLX50T                          33             160 MHz   52.8 SE
       AMD Quad-Core Phenom(tm) 9850, Stepping 3   4              2.5 GHz   8.0 SE
       Intel Core2 Duo CPU, Stepping 11            2              2.7 GHz   3.6 SE
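The FPGA rows of Tab. 1 are consistent with a simple normalization, which we infer from the listed numbers rather than from an explicit statement: the performance in SE equals the number of solvers times the clock frequency divided by 100 MHz. The short Python check below assumes exactly this formula; the CPU rows are measured values that it cannot reproduce.

```python
# Assumption inferred from Tab. 1: SE = solvers * clock / 100 MHz for FPGA devices.
FPGA_ROWS = [
    # (device, solvers, clock in MHz, performance in SE as listed in Tab. 1)
    ("Spartan-3 XC3S1000", 15, 90, 13.5),
    ("Stratix EP1S80", 74, 120, 88.8),
    ("Stratix II EP2SGX90", 78, 160, 124.8),
    ("Virtex-5 XC5VLX50T", 33, 160, 52.8),
]


def solver_equivalents(solvers, clock_mhz):
    """Performance relative to a single solver clocked at 100 MHz."""
    return solvers * clock_mhz / 100.0


for device, solvers, clock, listed in FPGA_ROWS:
    print(f"{device}: {solver_equivalents(solvers, clock):.1f} SE (listed: {listed} SE)")
```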
An up-to-date device listing and the achieved results are constantly published on our project
website [14]. Our computation is currently well ahead of the NQueens@Home project – even
when neglecting their overflow issue mentioned above. We currently expect to complete our
computation by the end of 2009. We do, however, constantly strive to make more devices
available for our computation.
6 Conclusions
This paper proposed an FPGA-specific implementation for a backtracking solver of the N-Queens
Puzzle and detailed its mapping into the efficient arithmetic structures of modern
FPGAs. It further described an infrastructure utilizing these solvers in order to determine the
solution count of the 26-Queens Puzzle and evaluated the performance of the specific device
implementations also in comparison to general-purpose GHz CPUs. Due to the achieved massive
concurrency, this system is likely to outpace the global computation of the NQueens@Home
project.
References
[1] Neil Sloane et al., “On-line encyclopedia of integer sequences.”
[2] F. Letellier, “Objectweb proactive breaks world grid record,” August 2005,
http://www.theserverside.com/news/thread.tss?thread_id=35629.
[3] “NQueens@Home project,” http://nqueens.ing.udec.cl/.
[4] D. P. Anderson, “BOINC: A system for public-resource computing and storage,” in GRID
’04: Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing.
Washington, DC, USA: IEEE Computer Society, 2004, pp. 4–10.
[5] T. B. Preußer et al., “NQueens@Home discussion forum: Dispute a result,”
http://nqueens.ing.udec.cl/forum_thread.php?id=82.
[6] J. Somers, “The n queens problem – a study in optimization,”
http://www.jsomers.com/nqueen_demo/nqueens.html.
[7] T. B. Preußer, M. Zabel, and R. G. Spallek, “About carries and tokens - re-using adder
circuits for arbitration,” in IEEE Workshop on Signal Processing Systems. IEEE Press,
2005.
[8] Stratix Device Handbook, Altera Corporation, June 2006.
[9] Stratix II Device Handbook, Altera Corporation, January 2008.
[10] Spartan-3 Generation FPGA User Guide, Xilinx Inc., June 2008.
[11] Virtex-5 FPGA User Guide, Xilinx Inc., September 2008.
[12] G. Hamilton, R. Cattell, and M. Fisher, JDBC Database Access with Java: A Tutorial
and Annotated Reference. Boston, MA, USA: Addison-Wesley Longman Publishing Co.,
Inc., 1997.
[13] S. Liang, Java Native Interface: Programmer’s Guide and Specification. Boston, MA,
USA: Addison-Wesley Longman Publishing Co., Inc., 1999.
[14] “Queens@TUD project,” http://queens.inf.tu-dresden.de.
Appendix A: Symmetry Considerations
[Figure 5: Possible Board Symmetries]
Although our design discovers all solutions of the N-Queens problem in a brute-force fashion,
the board symmetry may be exploited (a) to cut the search space in half and (b) to determine
the number of solutions that are unique even with respect to reflection and rotation.
The reduction of the search space is achieved through the observation that the reflection of
a valid solution about the center horizontal of the board also yields a valid solution. Both of
these solutions are distinct unless we have N = 1, in which case all the queens can and do sit
on the very reflection axis. The search can easily be restricted to one solution of each such
pair by limiting the valid placements of a queen within the first column to the bottom half. If there
is a center row, the subsearch with a queen placed into it applies this placement restriction to
the second column. All solutions discovered by this restricted search account for exactly two
distinct solutions.
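The restriction can be sketched in Python as follows. This is an illustrative model, not the hardware search; the function names and the bit-vector encoding (bit y for row y, counted from the bottom) are assumptions. The first column only tries the rows of the bottom half; for odd N, the case with the first queen in the center row restricts the second column instead; every solution found is counted twice.

```python
def count_with_mirror(n):
    """Count all solutions while exploring only one of each pair of mirror
    images with respect to the center horizontal."""
    if n == 1:
        return 1                  # the only solution is its own mirror image
    full = (1 << n) - 1
    half = n // 2                 # rows below the center horizontal

    def completions(h, u, d):
        if h == full:
            return 1
        total = 0
        avail = ~(h | u | d) & full
        while avail:
            q = avail & -avail
            avail ^= q
            total += completions(h | q, ((u | q) << 1) & full, (d | q) >> 1)
        return total

    total = 0
    # First column: only the rows of the bottom half.
    for y in range(half):
        q = 1 << y
        total += completions(q, (q << 1) & full, q >> 1)
    if n % 2:
        # First queen in the center row: restrict the second column instead.
        c = 1 << half
        h, u, d = c, (c << 1) & full, c >> 1
        avail = ~(h | u | d) & full & ((1 << half) - 1)
        while avail:
            q = avail & -avail
            avail ^= q
            total += completions(h | q, ((u | q) << 1) & full, (d | q) >> 1)
    return 2 * total              # each solution found stands for two
```

For N = 1 to 10, it reproduces the known totals 1, 0, 0, 2, 10, 4, 40, 92, 352 and 724 while exploring roughly half of the search tree.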
For the derivation of the number of unique solutions, those that are symmetric to themselves
play a special role. With the exception of the aforementioned reduction of the search
space, the implemented search will generally also discover all the images of a solution with
respect to reflection or rotation as distinct solutions. Only if the solution is self-symmetric
do some of these images coincide with it and, hence, escape a distinct discovery.
As shown in Fig. 5, up to eight distinct solutions may be reflected or rotated images of
each other. The number of distinct images all representing a class of symmetric solutions is,
however, reduced if a solution displays self-symmetric properties.
Disregarding the trivial problem size of N = 1, no solution can ever be symmetric to itself
with respect to reflection: all four possible reflection axes are attacking paths so that no more
than a single queen can be placed directly upon it. Every queen not placed upon it would,
however, result in an image attacking the original, which disqualifies any such placement as a
valid solution.
Rotational symmetries are, however, very well possible. In particular, a solution that is
symmetric to itself with respect to a 180° rotation will only have four different appearances.
This can be verified in Fig. 5 where each image of the top row will cross-merge into one of
the bottom row. If a solution is even self-symmetric with respect to a 90° rotation, only two
distinct images remain, which are reflections of each other.
Now, assume the solver keeps three separate counts:
S counts every completed solution;
W counts only those solutions, which have been found self-symmetric with respect to a 180°
rotation; and
V counts only those solutions, which have been found self-symmetric with respect to a 90°
rotation.
The counters are not incremented in a mutually exclusive fashion. Quite on the contrary, an
increment of V even implies an increment of W. Similarly, an increment of W implies one of
S. The converse statements clearly do not hold.
As observed before, a non-self-symmetric solution will be encountered in eight different
appearances all summed up in S and only in S. A solution self-symmetric with respect to
a 180° rotation (but not to a 90° rotation) contributes four increments to both S and W. The
90°-symmetric solutions, finally, cause only two increments but of all of the three counters.
In order to derive the number u of unique solutions with respect to any symmetry, call the
number of unique 90°-symmetric solutions v, the number of unique 180°-symmetric solutions
that are not 90°-symmetric w, and the number of all other unique solutions s. Note that the
set of unique solutions is partitioned into disjoint classes this time. In terms of the
maintained counter values, we obtain:
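$S = 8s + 4w + 2v$
$W = 4w + 2v$
$V = 2v$
Resolved to $u = s + w + v$, this yields:
$v = \frac{V}{2}$
$w = \frac{W - V}{4}$
$s = \frac{S - W}{8}$
$u = \frac{S + W + 2V}{8}$

[Figure 6: Quadrant Division for Symmetry Detection (quadrants A and B in the top half, B′ and A′ in the bottom half)]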
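As a worked check of these formulas, consider N = 8 with its well-known 92 solutions in 12 unique classes, exactly one of which is self-symmetric with respect to a 180° rotation and none with respect to a 90° rotation, and N = 5 with 10 solutions in 2 unique classes, one of them 90°-symmetric. The counter values below follow from these facts; they are not results reported in this paper.

```latex
\begin{aligned}
N = 8:&\quad S = 92,\; W = 4,\; V = 0
  &&\Rightarrow\; v = 0,\; w = \tfrac{4 - 0}{4} = 1,\; s = \tfrac{92 - 4}{8} = 11,\; u = \tfrac{92 + 4 + 2 \cdot 0}{8} = 12\\
N = 5:&\quad S = 10,\; W = 2,\; V = 2
  &&\Rightarrow\; v = 1,\; w = \tfrac{2 - 2}{4} = 0,\; s = \tfrac{10 - 2}{8} = 1,\; u = \tfrac{10 + 2 + 2 \cdot 2}{8} = 2
\end{aligned}
```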
The detection of self-symmetric properties of a found solution can be implemented by a
comparison of the board’s quadrants, which are obtained by its central horizontal and vertical
division as depicted in Fig. 6. While the equivalence (AB) = (A′B′) establishes the 180°
self-symmetry, an additional comparison of any two directly neighboring quadrants taking a
90° rotation into account serves to prove a 90° self-symmetry. While these comparisons can
be implemented as straightforward bitwise equivalence tests in hardware, their realizations on
word-oriented general-purpose processors appear comparatively clumsy.
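For small boards, the three counters can be obtained by brute force. The Python sketch below does not use the quadrant comparison described above; as a simpler (and slower) substitute, it compares each complete placement directly with its rotated images. The representation (entry x holds the row of the queen in column x) and all function names are illustrative assumptions, and the final formula presumes N > 1, since the single solution for N = 1 is also reflection-symmetric.

```python
def all_solutions(n):
    """Enumerate all solutions; entry x of a tuple is the queen's row in column x."""
    found = []

    def extend(rows):
        x = len(rows)
        if x == n:
            found.append(tuple(rows))
            return
        for y in range(n):
            if all(y != r and abs(y - r) != x - c for c, r in enumerate(rows)):
                rows.append(y)
                extend(rows)
                rows.pop()

    extend([])
    return found


def rotate90(sol):
    """Rotate a placement by 90 degrees: the queen at (x, y) moves to (y, n-1-x)."""
    n = len(sol)
    rotated = [0] * n
    for x, y in enumerate(sol):
        rotated[y] = n - 1 - x
    return tuple(rotated)


def symmetry_counts(n):
    """Return (S, W, V) as defined above."""
    S = W = V = 0
    for sol in all_solutions(n):
        r90 = rotate90(sol)
        S += 1
        if rotate90(r90) == sol:      # self-symmetric under 180 degrees
            W += 1
            if r90 == sol:            # self-symmetric under 90 degrees
                V += 1
    return S, W, V


def unique_count(n):
    """u = (S + W + 2V) / 8, valid for n > 1."""
    S, W, V = symmetry_counts(n)
    return (S + W + 2 * V) // 8
```

`symmetry_counts(8)` yields (92, 4, 0) and hence 12 unique solutions; in line with the argument that follows, V stays 0 for N = 6 and N = 7.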
If the board size N is odd, there will be a center row and a center column. Both of them
can be excluded from the quadrant comparison. Instead, it is simply necessary to verify that
the center field (lying on both the center row and column) hosts a queen. If it does not, the
queen in this row and column would have to be found on a different field outside the rotational
center. Then its images obtained from 90° or 180° rotation will be located on fields attacked
by the original. This, in turn, precludes any rotational self-symmetry.
Finally, consider boards with N = 4k + r with r ∈ {2, 3}. Their quadrants (excluding any
center row or column) have sides of length:
$D = \left\lfloor \frac{N}{2} \right\rfloor = 2k + 1$
On a completely filled board, the first D columns must also host D queens. If the board
were self-symmetric, none of these queens would be found in a potential center row according
to the argument of the previous paragraph. Thus, they must be distributed among the top and
bottom quadrants. Since D is odd, an even distribution and, hence, a 90° self-symmetry is
impossible. Thus, checking for a 90° self-symmetry is wasted effort for half of the board
sizes, i.e. those N that are congruent to 2 or 3 modulo 4.
Fortunately, the explicit symmetry detection of explored solutions can be safely neglected
when only the solution counts are of interest. Instead, the self-symmetric solutions can
be counted in a separate, independent exploration. Due to the additional constraints induced
by the self-symmetry, such explorations are feasible for somewhat larger N. The corresponding
values W and V are known for N larger than 30 and 60, respectively [1, Seq. A032522], [1,
Seq. A033148]. Having then determined the number of all solutions S, the last missing count
of unique solutions u can be derived through the formula given above.
