A Secure Protocol for Computing String Distance Metrics
Abstract
An important problem is finding matching pairs of records across heterogeneous databases while preserving the privacy of the parties that hold them. As we have shown in earlier work, distance metrics are a useful tool for record linkage in many domains, so secure computation of distance metrics is important for secure record linkage. In this paper, we consider the computation of a number of distance metrics in a secure multiparty setting. Towards this goal, we propose a stochastic scalar product protocol that is provably consistent and is as secure as the underlying set-intersection cryptographic protocol. We then use this stochastic scalar product protocol to securely compute some standard distance metrics, namely TFIDF, SoftTFIDF, and the Euclidean distance. Not only are the resulting estimates asymptotically consistent, but experiments show that they are also quite close to the true values after just 1000 samples. These secure distance computations can then be used to perform secure matching of records.
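To make the idea of a sampling-based scalar product estimate concrete, here is a minimal sketch in Python. It is not the protocol from the paper: the function name `stochastic_dot`, its parameters, and the plain index comparison are assumptions made for illustration, and the comparison merely stands in for the cryptographic set-intersection step, so the sketch offers no privacy at all. It assumes non-negative vectors of equal length. Each party draws indices with probability proportional to its own entries, and the fraction of draws on which the two parties agree, rescaled by the two vector sums, is an unbiased estimate of the scalar product.

```python
import random


def stochastic_dot(x, y, n_samples=1000, seed=0):
    """Estimate the scalar product of two non-negative vectors by sampling.

    Hypothetical illustration only: the plain comparison of sampled
    indices replaces the secure set-intersection step, so nothing here
    is private.
    """
    rng = random.Random(seed)
    zx, zy = sum(x), sum(y)
    if zx == 0 or zy == 0:
        return 0.0  # a zero vector has scalar product 0 with anything

    indices = range(len(x))
    # Each party samples indices with probability x_i / zx (resp. y_i / zy).
    a = rng.choices(indices, weights=x, k=n_samples)
    b = rng.choices(indices, weights=y, k=n_samples)

    # P(a_k == b_k) = sum_i (x_i / zx) * (y_i / zy) = (x . y) / (zx * zy),
    # so the match rate times zx * zy estimates x . y.
    matches = sum(1 for i, j in zip(a, b) if i == j)
    return zx * zy * matches / n_samples


if __name__ == "__main__":
    x = [0.5, 0.3, 0.2]
    y = [0.4, 0.4, 0.2]
    exact = sum(xi * yi for xi, yi in zip(x, y))
    print("exact:", exact, "estimate:", stochastic_dot(x, y, n_samples=1000))
```

Under these assumptions the estimate converges to the exact value as `n_samples` grows, which is the sense of consistency the abstract refers to. For vectors normalized to unit length, the same quantity gives a cosine-style similarity such as TFIDF.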