An All-to-All Comparison problem is one in which every element of a data set
is compared with every other element. This is analogous to projective planes
and affine planes, in which every pair of points shares a common line.
For large data sets, the comparison computations can be distributed across a
cluster of computers. All-to-All Comparison does not fit the highly successful
Map-Reduce pattern, so a new distributed computing framework is required. The
principal challenge is to distribute the data in such a way that computations
can be scheduled where the data already lies.
This paper uses projective planes, affine planes and balanced incomplete
block designs to construct data distributions and schedule computations. The data
distributions based on these geometric and combinatorial structures achieve
minimal data replication whilst balancing the computational load across the
cluster.
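
As an illustration of the idea, the following minimal sketch builds the
projective plane of order q and treats each line as the set of data elements
stored on one node. It is not the paper's implementation: the function names
projective_plane and assign_pairs are hypothetical, and the construction
assumes q is prime so that arithmetic modulo q forms a field. Because every
pair of points lies on exactly one line, every pair of data elements is stored
together on exactly one node, where its comparison can be scheduled locally.

```python
from itertools import combinations


def projective_plane(q):
    """Return the lines of the projective plane of order q (q assumed prime).

    Each line is a list of point indices. Read as a data distribution,
    line k is the block of data elements stored on node k.
    """
    # Points in normalised homogeneous coordinates over GF(q):
    # the first nonzero coordinate is 1, giving q*q + q + 1 points.
    points = [(1, a, b) for a in range(q) for b in range(q)]
    points += [(0, 1, a) for a in range(q)]
    points += [(0, 0, 1)]

    # By duality, lines are indexed by the same coordinate triples.
    # A point lies on a line when their dot product is 0 modulo q.
    lines = []
    for line in points:
        block = [
            i for i, p in enumerate(points)
            if sum(x * y for x, y in zip(line, p)) % q == 0
        ]
        lines.append(block)
    return lines


def assign_pairs(blocks):
    """Map each pair of elements to the list of nodes holding both of them."""
    schedule = {}
    for node, block in enumerate(blocks):
        for pair in combinations(block, 2):
            schedule.setdefault(pair, []).append(node)
    return schedule


if __name__ == "__main__":
    q = 3  # hypothetical example size; must be prime for this sketch
    blocks = projective_plane(q)
    schedule = assign_pairs(blocks)
    n = q * q + q + 1
    print(f"{n} elements on {len(blocks)} nodes, {q + 1} elements per node")
    print("every pair on exactly one node:",
          len(schedule) == n * (n - 1) // 2
          and all(len(nodes) == 1 for nodes in schedule.values()))
```

In this sketch each node stores q + 1 of the q^2 + q + 1 elements and performs
the same number of comparisons, (q + 1)q / 2, so the load is balanced by
construction.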