A wide range of learning tasks require human input in labeling massive data.
The collected data though are usually low quality and contain inaccuracies and
errors. As a result, modern science and business face the problem of learning
from unreliable data sets.
In this work, we provide a generic approach that is based on
\textit{verification} of only few records of the data set to guarantee high
quality learning outcomes for various optimization objectives. Our method,
identifies small sets of critical records and verifies their validity. We show
that many problems only need poly(1/ε) verifications, to
ensure that the output of the computation is at most a factor of (1±ε) away from the truth. For any given instance, we provide an
\textit{instance optimal} solution that verifies the minimum possible number of
records to approximately certify correctness. Then using this instance optimal
formulation of the problem we prove our main result: "every function that
satisfies some Lipschitz continuity condition can be certified with a small
number of verifications". We show that the required Lipschitz continuity
condition is satisfied even by some NP-complete problems, which illustrates the
generality and importance of this theorem.
In case this certification step fails, an invalid record will be identified.
Removing these records and repeating until success, guarantees that the result
will be accurate and will depend only on the verified records. Surprisingly, as
we show, for several computation tasks more efficient methods are possible.
These methods always guarantee that the produced result is not affected by the
invalid records, since any invalid record that affects the output will be
detected and verified