13,971 research outputs found
Noise-Tolerant Learning, the Parity Problem, and the Statistical Query Model
We describe a slightly sub-exponential time algorithm for learning parity
functions in the presence of random classification noise. This results in a
polynomial-time algorithm for the case of parity functions that depend on only
the first O(log n log log n) bits of input. This is the first known instance of
an efficient noise-tolerant algorithm for a concept class that is provably not
learnable in the Statistical Query model of Kearns. Thus, we demonstrate that
the set of problems learnable in the statistical query model is a strict subset
of those problems learnable in the presence of noise in the PAC model.
In coding-theory terms, what we give is a poly(n)-time algorithm for decoding
linear k by n codes in the presence of random noise for the case of k = c log n
loglog n for some c > 0. (The case of k = O(log n) is trivial since one can
just individually check each of the 2^k possible messages and choose the one
that yields the closest codeword.)
A natural extension of the statistical query model is to allow queries about
statistical properties that involve t-tuples of examples (as opposed to single
examples). The second result of this paper is to show that any class of
functions learnable (strongly or weakly) with t-wise queries for t = O(log n)
is also weakly learnable with standard unary queries. Hence this natural
extension to the statistical query model does not increase the set of weakly
learnable functions
A Framework for Efficient Adaptively Secure Composable Oblivious Transfer in the ROM
Oblivious Transfer (OT) is a fundamental cryptographic protocol that finds a
number of applications, in particular, as an essential building block for
two-party and multi-party computation. We construct a round-optimal (2 rounds)
universally composable (UC) protocol for oblivious transfer secure against
active adaptive adversaries from any OW-CPA secure public-key encryption scheme
with certain properties in the random oracle model (ROM). In terms of
computation, our protocol only requires the generation of a public/secret-key
pair, two encryption operations and one decryption operation, apart from a few
calls to the random oracle. In~terms of communication, our protocol only
requires the transfer of one public-key, two ciphertexts, and three binary
strings of roughly the same size as the message. Next, we show how to
instantiate our construction under the low noise LPN, McEliece, QC-MDPC, LWE,
and CDH assumptions. Our instantiations based on the low noise LPN, McEliece,
and QC-MDPC assumptions are the first UC-secure OT protocols based on coding
assumptions to achieve: 1) adaptive security, 2) optimal round complexity, 3)
low communication and computational complexities. Previous results in this
setting only achieved static security and used costly cut-and-choose
techniques.Our instantiation based on CDH achieves adaptive security at the
small cost of communicating only two more group elements as compared to the
gap-DH based Simplest OT protocol of Chou and Orlandi (Latincrypt 15), which
only achieves static security in the ROM
Leveraging Crowdsourcing Data For Deep Active Learning - An Application: Learning Intents in Alexa
This paper presents a generic Bayesian framework that enables any deep
learning model to actively learn from targeted crowds. Our framework inherits
from recent advances in Bayesian deep learning, and extends existing work by
considering the targeted crowdsourcing approach, where multiple annotators with
unknown expertise contribute an uncontrolled amount (often limited) of
annotations. Our framework leverages the low-rank structure in annotations to
learn individual annotator expertise, which then helps to infer the true labels
from noisy and sparse annotations. It provides a unified Bayesian model to
simultaneously infer the true labels and train the deep learning model in order
to reach an optimal learning efficacy. Finally, our framework exploits the
uncertainty of the deep learning model during prediction as well as the
annotators' estimated expertise to minimize the number of required annotations
and annotators for optimally training the deep learning model.
We evaluate the effectiveness of our framework for intent classification in
Alexa (Amazon's personal assistant), using both synthetic and real-world
datasets. Experiments show that our framework can accurately learn annotator
expertise, infer true labels, and effectively reduce the amount of annotations
in model training as compared to state-of-the-art approaches. We further
discuss the potential of our proposed framework in bridging machine learning
and crowdsourcing towards improved human-in-the-loop systems
- …