Nonparametric Bayesian Modeling for Automated Database Schema Matching
The problem of merging databases arises in many government and commercial
applications. Schema matching, a common first step, identifies equivalent
fields between databases. We introduce a schema matching framework that builds
nonparametric Bayesian models for each field and compares them by computing the
probability that a single model could have generated both fields. Our
experiments show that our method is more accurate and faster than the existing
instance-based matching algorithms, in part because of the use of nonparametric
Bayesian models.
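The comparison step described above, asking how probable it is that one model generated both fields, can be illustrated with a small sketch. This is not the paper's method: as a stand-in for the nonparametric models it uses a simple parametric Dirichlet-multinomial model over categorical field values. The function names, the symmetric prior `alpha`, and the prior match probability `prior_same` are all assumptions made for illustration.

```python
from collections import Counter
from math import exp, lgamma, log


def log_marginal(counts, vocab, alpha=1.0):
    """Log marginal likelihood of a sequence of categorical values with the
    given counts, under a symmetric Dirichlet(alpha) prior over `vocab`."""
    n = sum(counts.values())
    k = len(vocab)
    total = lgamma(k * alpha) - lgamma(k * alpha + n)
    for value in vocab:
        total += lgamma(alpha + counts.get(value, 0)) - lgamma(alpha)
    return total


def same_source_probability(field_a, field_b, alpha=1.0, prior_same=0.5):
    """Posterior probability that a single model generated both fields
    (hypothetical helper; a parametric stand-in, not the paper's model)."""
    counts_a, counts_b = Counter(field_a), Counter(field_b)
    vocab = set(counts_a) | set(counts_b)
    # Hypothesis "same": one shared distribution explains the pooled values.
    log_same = log_marginal(counts_a + counts_b, vocab, alpha)
    # Hypothesis "different": each field has its own distribution.
    log_diff = log_marginal(counts_a, vocab, alpha) + log_marginal(counts_b, vocab, alpha)
    log_odds = log_same - log_diff + log(prior_same / (1.0 - prior_same))
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    e = exp(log_odds)
    return e / (1.0 + e)
```

Under these assumptions, two fields drawn from the same set of state codes would score above 0.5, while a state-code field compared with a color field would score close to 0.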
Distributed PCP Theorems for Hardness of Approximation in P
We present a new distributed model of probabilistically checkable proofs
(PCP). A satisfying assignment $x \in \{0,1\}^n$ to a CNF formula $\varphi$ is
shared between two parties, where Alice knows $x_1, \dots, x_{n/2}$, Bob knows
$x_{n/2+1}, \dots, x_n$, and both parties know $\varphi$. The goal is to have
Alice and Bob jointly write a PCP that $x$ satisfies $\varphi$, while
exchanging little or no information. Unfortunately, this model as-is does not
allow for nontrivial query complexity. Instead, we focus on a non-deterministic
variant, where the players are helped by Merlin, a third party who knows all of
$x$.
Using our framework, we obtain, for the first time, PCP-like reductions from
the Strong Exponential Time Hypothesis (SETH) to approximation problems in P.
In particular, under SETH we show that there are no truly-subquadratic
approximation algorithms for Bichromatic Maximum Inner Product over
{0,1}-vectors, Bichromatic LCS Closest Pair over permutations, Approximate
Regular Expression Matching, and Diameter in Product Metric. All our
inapproximability factors are nearly-tight. In particular, for the first two
problems we obtain nearly-polynomial factors of $2^{(\log n)^{1-o(1)}}$; only
$(1+o(1))$-factor lower bounds (under SETH) were known before.
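For reference, the quadratic-time baseline that these lower bounds are measured against can be sketched for Bichromatic Maximum Inner Product over {0,1}-vectors: given two sets of vectors, find the pair, one from each set, with the largest inner product. The sketch below is only the naive exact algorithm, not a construction from the paper, and the function name and return convention are assumptions.

```python
def bichromatic_max_inner_product(A, B):
    """Brute-force search over all pairs (a, b) with a in A and b in B.

    A and B are non-empty sequences of equal-length {0,1}-vectors.
    Returns (best_value, (i, j)), where i indexes A and j indexes B.
    Runs in O(|A| * |B| * d) time for d-dimensional vectors.
    """
    if not A or not B:
        raise ValueError("both vector sets must be non-empty")
    best_value, best_pair = -1, None
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            # For {0,1}-vectors the inner product counts shared 1-coordinates.
            value = sum(x & y for x, y in zip(a, b))
            if value > best_value:
                best_value, best_pair = value, (i, j)
    return best_value, best_pair
```

The result stated in the abstract says that, under SETH, no truly-subquadratic algorithm can even approximate this maximum to within the stated factors.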