357 research outputs found
On The Robustness of a Neural Network
With the development of neural networks based machine learning and their
usage in mission critical applications, voices are rising against the
\textit{black box} aspect of neural networks as it becomes crucial to
understand their limits and capabilities. With the rise of neuromorphic
hardware, it is even more critical to understand how a neural network, as a
distributed system, tolerates the failures of its computing nodes, neurons, and
its communication channels, synapses. Experimentally assessing the robustness
of neural networks involves the quixotic venture of testing all the possible
failures, on all the possible inputs, which ultimately hits a combinatorial
explosion for the first, and the impossibility to gather all the possible
inputs for the second.
In this paper, we prove an upper bound on the expected error of the output
when a subset of neurons crashes. This bound involves dependencies on the
network parameters that can be seen as being too pessimistic in the average
case. It involves a polynomial dependency on the Lipschitz coefficient of the
neurons activation function, and an exponential dependency on the depth of the
layer where a failure occurs. We back up our theoretical results with
experiments illustrating the extent to which our prediction matches the
dependencies between the network parameters and robustness. Our results show
that the robustness of neural networks to the average crash can be estimated
without the need to neither test the network on all failure configurations, nor
access the training set used to train the network, both of which are
practically impossible requirements.Comment: 36th IEEE International Symposium on Reliable Distributed Systems 26
- 29 September 2017. Hong Kong, Chin
Counterexample-Guided Learning of Monotonic Neural Networks
The widespread adoption of deep learning is often attributed to its automatic
feature construction with minimal inductive bias. However, in many real-world
tasks, the learned function is intended to satisfy domain-specific constraints.
We focus on monotonicity constraints, which are common and require that the
function's output increases with increasing values of specific input features.
We develop a counterexample-guided technique to provably enforce monotonicity
constraints at prediction time. Additionally, we propose a technique to use
monotonicity as an inductive bias for deep learning. It works by iteratively
incorporating monotonicity counterexamples in the learning process. Contrary to
prior work in monotonic learning, we target general ReLU neural networks and do
not further restrict the hypothesis space. We have implemented these techniques
in a tool called COMET. Experiments on real-world datasets demonstrate that our
approach achieves state-of-the-art results compared to existing monotonic
learners, and can improve the model quality compared to those that were trained
without taking monotonicity constraints into account
An adaptive transport framework for joint and conditional density estimation
We propose a general framework to robustly characterize joint and conditional
probability distributions via transport maps. Transport maps or "flows"
deterministically couple two distributions via an expressive monotone
transformation. Yet, learning the parameters of such transformations in high
dimensions is challenging given few samples from the unknown target
distribution, and structural choices for these transformations can have a
significant impact on performance. Here we formulate a systematic framework for
representing and learning monotone maps, via invertible transformations of
smooth functions, and demonstrate that the associated minimization problem has
a unique global optimum. Given a hierarchical basis for the appropriate
function space, we propose a sample-efficient adaptive algorithm that estimates
a sparse approximation for the map. We demonstrate how this framework can learn
densities with stable generalization performance across a wide range of sample
sizes on real-world datasets
What Circuit Classes Can Be Learned with Non-Trivial Savings?
Despite decades of intensive research, efficient - or even sub-exponential time - distribution-free PAC learning algorithms are not known for many important Boolean function classes. In this work we suggest a new perspective on these learning problems, inspired by a surge of recent research in complexity theory, in which the goal is to determine whether and how much of a savings over a naive 2^n runtime can be achieved.
We establish a range of exploratory results towards this end. In more detail,
(1) We first observe that a simple approach building on known uniform-distribution learning results gives non-trivial distribution-free learning algorithms for several well-studied classes including AC0, arbitrary functions of a few linear threshold functions (LTFs), and AC0 augmented with mod_p gates.
(2) Next we present an approach, based on the method of random restrictions from circuit complexity, which can be used to obtain several distribution-free learning algorithms that do not appear to be achievable by approach (1) above. The results achieved in this way include learning algorithms with non-trivial savings for LTF-of-AC0 circuits and improved savings for learning parity-of-AC0 circuits.
(3) Finally, our third contribution is a generic technique for converting lower bounds proved using Neciporuk\u27s method to learning algorithms with non-trivial savings. This technique, which is the most involved of our three approaches, yields distribution-free learning algorithms for a range of classes where previously even non-trivial uniform-distribution learning algorithms were not known; these classes include full-basis formulas, branching programs, span programs, etc. up to some fixed polynomial size
Lower Bounds for DeMorgan Circuits of Bounded Negation Width
We consider Boolean circuits over {or, and, neg} with negations applied only to input variables. To measure the "amount of negation" in such circuits, we introduce the concept of their "negation width". In particular, a circuit computing a monotone Boolean function f(x_1,...,x_n) has negation width w if no nonzero term produced (purely syntactically) by the circuit contains more than w distinct negated variables. Circuits of negation width w=0 are equivalent to monotone Boolean circuits, while those of negation width w=n have no restrictions. Our motivation is that already circuits of moderate negation width w=n^{epsilon} for an arbitrarily small constant epsilon>0 can be even exponentially stronger than monotone circuits.
We show that the size of any circuit of negation width w computing f is roughly at least the minimum size of a monotone circuit computing f divided by K=min{w^m,m^w}, where m is the maximum length of a prime implicant of f. We also show that the depth of any circuit of negation width w computing f is roughly at least the minimum depth of a monotone circuit computing f minus log K. Finally, we show that formulas of bounded negation width can be balanced to achieve a logarithmic (in their size) depth without increasing their negation width
Predicting Audio Advertisement Quality
Online audio advertising is a particular form of advertising used abundantly
in online music streaming services. In these platforms, which tend to host tens
of thousands of unique audio advertisements (ads), providing high quality ads
ensures a better user experience and results in longer user engagement.
Therefore, the automatic assessment of these ads is an important step toward
audio ads ranking and better audio ads creation. In this paper we propose one
way to measure the quality of the audio ads using a proxy metric called Long
Click Rate (LCR), which is defined by the amount of time a user engages with
the follow-up display ad (that is shown while the audio ad is playing) divided
by the impressions. We later focus on predicting the audio ad quality using
only acoustic features such as harmony, rhythm, and timbre of the audio,
extracted from the raw waveform. We discuss how the characteristics of the
sound can be connected to concepts such as the clarity of the audio ad message,
its trustworthiness, etc. Finally, we propose a new deep learning model for
audio ad quality prediction, which outperforms the other discussed models
trained on hand-crafted features. To the best of our knowledge, this is the
first large-scale audio ad quality prediction study.Comment: WSDM '18 Proceedings of the Eleventh ACM International Conference on
Web Search and Data Mining, 9 page
- …