Search engines are a fundamental building block of the web. Whether they are general-purpose web search engines,
product search engines for online catalogues, or people search in online networks, search engines provide
easy access to a huge amount of information. To cope with this volume, search engines
distribute their work across many servers.
For instance, to search the web quickly, search engines partition the web index over many machines,
and consult every partition when answering a query. To increase throughput, replicas are added for each
of these machines. The key parameter of these search algorithms is the trade-off between replication
and partitioning: increasing the partitioning level typically reduces query completion time, because more
servers work on each query in parallel. However, partitioning too much also has drawbacks: each sub-query
incurs a non-negligible startup cost, so total throughput decreases. Finding the right operating point and
adapting to it can significantly improve performance and reduce costs.
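
The trade-off described above can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the model used in this thesis: it assumes a fixed per-sub-query startup cost, perfectly parallel sub-queries, and no queueing. The function names and the parameters `work`, `overhead` and `servers` are hypothetical.

```python
# Toy model of the partitioning/replication trade-off (illustrative only).
# Assumptions: every sub-query pays a fixed startup cost `overhead` (seconds),
# the matching work `work` (seconds on one server) splits evenly over p
# sub-queries, sub-queries run fully in parallel, and queueing is ignored.

def query_delay(p, work, overhead):
    """Delay of one query that is split into p parallel sub-queries."""
    return overhead + work / p


def throughput(p, servers, work, overhead):
    """Queries per second the pool sustains at partitioning level p.

    Each query consumes p * overhead + work server-seconds in total.
    """
    return servers / (p * overhead + work)


def smallest_level_meeting(target_delay, servers, work, overhead):
    """Smallest partitioning level whose delay meets the target, or None.

    In this model throughput falls as p grows, so the smallest feasible
    level is also the one with the highest throughput.
    """
    for p in range(1, servers + 1):
        if query_delay(p, work, overhead) <= target_delay:
            return p
    return None
```

In this model, raising the partitioning level always shortens the delay but also lowers the sustainable throughput, so the best operating point is the smallest level that still meets the target delay.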
In this thesis, we propose that the trade-off between partitioning and replication should be easily
configurable. To this end we introduce Rendezvous On a Ring (ROAR), a novel distributed algorithm
that enables on-the-fly re-configuration of the partitioning level. ROAR can add and remove servers
without stopping the system, cope with server failures, and provide good load-balancing even with a
heterogeneous server pool.
We experimentally show that it is possible to dynamically adjust the partitioning level to cope with
different loads while meeting target query delays, and in doing so the system can reduce its power
consumption significantly.
To test ROAR, we introduce Privacy Preserving Search (PPS), a search application that allows
users to store encrypted data online while still being able to search that data easily. Our contributions include
novel protocols that support PPS for numeric values, as well as a proof-of-concept implementation of PPS
running on top of ROAR that lets users match as many as 5 million files in well under 1 s.