On the Learnability of Deep Random Networks
In this paper we study the learnability of deep random networks from both
theoretical and practical points of view. On the theoretical front, we show
that the learnability of random deep networks with sign activation drops
exponentially with depth. On the practical front, we find that learnability
drops sharply with depth even with state-of-the-art training methods,
suggesting that our stylized theoretical results are close to reality.
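
As a rough illustration of the experiment this abstract describes (not the authors' code), the sketch below labels data with a random deep sign-activation teacher network and trains a student to match it; the widths, depths, sample sizes, and the scikit-learn MLPClassifier student are all illustrative assumptions.

```python
# Minimal sketch: how well can a trained student recover labels
# produced by a random deep sign network, as teacher depth grows?
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
d, width = 20, 32                      # input dimension, teacher width (illustrative)

def random_sign_teacher(depth):
    """Random linear maps; sign() is applied between them at evaluation time."""
    dims = [d] + [width] * depth
    hidden = [rng.standard_normal((m, n)) / np.sqrt(n)
              for n, m in zip(dims[:-1], dims[1:])]
    return hidden + [rng.standard_normal((1, dims[-1]))]

def teacher_label(Ws, X):
    h = X.T
    for W in Ws[:-1]:
        h = np.sign(W @ h)             # sign activation after each hidden layer
    return (Ws[-1] @ h > 0).ravel().astype(int)

for depth in [1, 2, 4, 8]:
    Ws = random_sign_teacher(depth)
    Xtr = rng.standard_normal((5000, d))
    Xte = rng.standard_normal((1000, d))
    student = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=300,
                            random_state=0).fit(Xtr, teacher_label(Ws, Xtr))
    print(f"depth={depth}: test agreement = "
          f"{student.score(Xte, teacher_label(Ws, Xte)):.3f}")
```

Under the paper's thesis, the printed agreement should fall toward chance level as the teacher's depth increases.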
Learning Boolean Circuits with Neural Networks
While neural networks are trained efficiently on some natural distributions
using gradient-based algorithms, learning them is known to be computationally
hard in the worst case. To separate hard from easy-to-learn distributions, we
study the property of local correlation: the correlation between local patterns
of the input and the target label. We focus on learning deep neural networks
with a gradient-based algorithm when the target function is a tree-structured
Boolean circuit. We show that in this case, the existence of correlation
between the gates of the circuit and the target label determines whether the
optimization succeeds or fails. Using this result, we show that neural networks
can learn the (log n)-parity problem for most product distributions. These
results hint that local correlation may play an important role in separating
easy from hard-to-learn distributions. We also obtain a novel depth-separation
result: we show that a shallow network cannot express some functions, while
there exists an efficient gradient-based algorithm that can learn the very same
functions using a deep network. The negative expressivity result for shallow
networks is obtained by a reduction from results in communication complexity,
which may be of independent interest.
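
A small numerical illustration of the local-correlation quantity (our own construction, with a 4-bit parity standing in for (log n)-parity): under a biased product distribution the sub-parities computed by the gates of a tree-structured parity circuit correlate with the label, while under the uniform distribution those correlations vanish.

```python
# E[gate * label] for first-level gates of a balanced tree parity circuit,
# with inputs drawn i.i.d. from a product distribution over {-1, +1}.
import numpy as np

rng = np.random.default_rng(0)
n_samples, k = 200_000, 4              # k relevant bits (illustrative)

def gate_correlations(p):
    X = rng.choice([-1.0, 1.0], size=(n_samples, k), p=[1 - p, p])
    label = X.prod(axis=1)             # full k-bit parity
    gates = [X[:, 0] * X[:, 1], X[:, 2] * X[:, 3]]  # first tree level
    return [float(np.mean(g * label)) for g in gates]

print("uniform p=0.5:", gate_correlations(0.5))   # ~0: no local signal
print("biased  p=0.7:", gate_correlations(0.7))   # ~(2p-1)^2, away from 0
```

For a first-level gate g = x0*x1 we have g * label = x2*x3, so the correlation is (2p-1)^2: exactly zero in the uniform case and bounded away from zero for any biased product distribution, which is the local signal the optimization can follow.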
Hardness of Learning Neural Networks with Natural Weights
Neural networks are highly successful in practice despite strong theoretical
hardness results. The existing hardness results focus on the network
architecture and assume that the network's weights are arbitrary. A natural
approach to settling this discrepancy is to assume that the network's weights
are "well-behaved" and possess some generic properties that may allow efficient
learning. This approach is supported by the intuition that the weights in
real-world networks are not arbitrary, but exhibit some "random-like"
properties with respect to some "natural" distributions. We prove negative
results in this regard, and show that for depth-2 networks and many "natural"
weight distributions, such as the normal and the uniform distribution, most
networks are hard to learn. Namely, there is no efficient learning algorithm
that is provably successful for most weights and every input distribution.
This implies that there is no generic property that holds with high probability
in such random networks and allows efficient learning.
A Deep Conditioning Treatment of Neural Networks
We study the role of depth in training randomly initialized overparameterized
neural networks. We give a general result showing that depth improves
trainability of neural networks by improving the conditioning of certain kernel
matrices of the input data. This result holds for arbitrary non-linear
activation functions under a certain normalization. We provide versions of the
result that hold for training just the top layer of the neural network, as well
as for training all layers, via the neural tangent kernel. As applications of
these general results, we provide a generalization of the results of Das et al.
(2019) showing that learnability of deep random neural networks with a large
class of non-linear activations degrades exponentially with depth. We also show
how benign overfitting can occur in deep neural networks via the results of
Bartlett et al. (2019b). We also give experimental evidence that normalized
versions of ReLU are a viable alternative to more complex operations like Batch
Normalization in training deep neural networks.
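
A minimal numpy sketch of the conditioning phenomenon, using the standard Gaussian (NNGP-style) kernel recursion for a mean-zero, unit-variance normalized ReLU; the closed-form dual activation below is a known Gaussian identity, while the data sizes and depth range are illustrative assumptions rather than the paper's exact setup.

```python
# Condition number of the depth-L kernel matrix for normalized ReLU.
import numpy as np

def normalized_relu_dual(rho):
    """E[s(u)s(v)] for standard normals with correlation rho, where
    s(x) = (relu(x) - mu) / sigma is normalized to mean 0, variance 1."""
    rho = np.clip(rho, -1.0, 1.0)
    raw = (np.sqrt(1 - rho**2) + rho * (np.pi - np.arccos(rho))) / (2 * np.pi)
    mu_sq, var = 1 / (2 * np.pi), 0.5 - 1 / (2 * np.pi)
    return (raw - mu_sq) / var

rng = np.random.default_rng(0)
v = rng.standard_normal(100)
X = rng.standard_normal((40, 100)) + 2.0 * v    # strongly correlated inputs
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm rows
K = X @ X.T                                     # depth-0 kernel: pairwise cosines
for depth in range(9):
    if depth % 2 == 0:
        print(f"depth={depth}: cond(K) = {np.linalg.cond(K):,.1f}")
    K = normalized_relu_dual(K)                 # one more hidden layer
```

Because the activation is normalized, the diagonal stays at 1 while off-diagonal correlations contract toward zero under the recursion, so the kernel matrix approaches the identity and its condition number improves with depth, matching the abstract's claim.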
High Accuracy and High Fidelity Extraction of Neural Networks
In a model extraction attack, an adversary steals a copy of a remotely
deployed machine learning model, given oracle prediction access. We taxonomize
model extraction attacks around two objectives: *accuracy*, i.e., performing
well on the underlying learning task, and *fidelity*, i.e., matching the
predictions of the remote victim classifier on any input.
To extract a high-accuracy model, we develop a learning-based attack
exploiting the victim to supervise the training of an extracted model. Through
analytical and empirical arguments, we then explain the inherent limitations
that prevent any learning-based strategy from extracting a truly high-fidelity
model---i.e., extracting a functionally-equivalent model whose predictions are
identical to those of the victim model on all possible inputs. Addressing these
limitations, we expand on prior work to develop the first practical
functionally-equivalent extraction attack for direct extraction (i.e., without
training) of a model's weights.
We perform experiments both on academic datasets and on a state-of-the-art
image classifier trained with 1 billion proprietary images. In addition to
broadening the scope of model extraction research, our work demonstrates the
practicality of model extraction attacks against production-grade systems.
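
As a toy illustration of the accuracy/fidelity distinction and of a learning-based extraction attack (scikit-learn stand-ins of our own choosing, not the paper's models or data): the adversary labels its own unlabeled inputs with the victim's predictions and trains a substitute on them.

```python
# Learning-based model extraction: train a substitute on victim-labeled queries,
# then measure accuracy (vs. ground truth) and fidelity (vs. the victim).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Victim: a model the adversary can only query for predictions.
X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
X_vic, X_adv, y_vic, y_adv = train_test_split(X, y, test_size=0.5, random_state=0)
victim = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300,
                       random_state=0).fit(X_vic, y_vic)

# Extraction: query the victim on adversary-owned inputs, train on its labels.
X_q, X_eval, _, y_eval = train_test_split(X_adv, y_adv, test_size=0.2, random_state=0)
stolen = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300,
                       random_state=0).fit(X_q, victim.predict(X_q))

acc = stolen.score(X_eval, y_eval)                                # accuracy objective
fid = np.mean(stolen.predict(X_eval) == victim.predict(X_eval))   # fidelity objective
print(f"extracted model: accuracy={acc:.3f}, fidelity={fid:.3f}")
```

Note the two metrics can diverge: a substitute can score well on the task while disagreeing with the victim on many inputs, which is why the paper treats high-fidelity (and functionally-equivalent) extraction as a separate, harder objective.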