The computational complexity of ReLU network training parameterized by data dimensionality
Understanding the computational complexity of training simple neural networks with rectified linear units (ReLUs) has recently been a subject of intensive research. Closing gaps and complementing results from the literature, we present several results on the parameterized complexity of training two-layer ReLU networks with respect to various loss functions. After a brief discussion of other parameters, we focus on analyzing the influence of the dimension d of the training data on the computational complexity. We provide running time lower bounds in terms of W[1]-hardness for parameter d and prove that known brute-force strategies are essentially optimal (assuming the Exponential Time Hypothesis). In comparison with previous work, our results hold for a broad(er) range of loss functions, including ℓp-loss for all p ∈ [0, ∞]. In particular, we improve a known polynomial-time algorithm for constant d and convex loss functions to a more general class of loss functions, matching our running time lower bounds also in these cases.
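A minimal sketch of the brute-force idea the abstract alludes to, for the
simplest possible instance: a single ReLU unit in dimension d = 1 under
squared loss. In one dimension the unit's activation pattern over the sorted
data is always a prefix or a suffix, so enumerating O(n) candidate patterns
and solving a least-squares fit per pattern finds an exact minimizer; the
paper's algorithms generalize this pattern-enumeration idea to arbitrary
fixed d and full two-layer networks. The function below is a hypothetical
illustration, not the paper's construction.

```python
import numpy as np

def fit_single_relu_1d(x, y):
    """Exhaustive search for one ReLU unit y ~ max(0, w*x + b) in d = 1
    under squared loss. Along sorted x the active set is a prefix or a
    suffix, so O(n) patterns suffice; for each we fit (w, b) by least
    squares on the active points and keep only sign-consistent fits."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(xs)
    best = (np.sum(ys ** 2), 0.0, 0.0)        # all-inactive: w = b = 0
    for lo in range(n + 1):
        for active in (np.arange(n) >= lo,    # suffix active (w > 0)
                       np.arange(n) < lo):    # prefix active (w < 0)
            if not active.any():
                continue
            A = np.stack([xs[active], np.ones(active.sum())], axis=1)
            (w, b), *_ = np.linalg.lstsq(A, ys[active], rcond=None)
            # keep only fits whose sign pattern matches the guess
            if np.all((w * xs + b >= -1e-9) == active):
                pred = np.maximum(0.0, w * xs + b)
                loss = np.sum((pred - ys) ** 2)
                if loss < best[0]:
                    best = (loss, w, b)
    return best   # (loss, w, b)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.maximum(0.0, 1.5 * x - 0.5)            # realizable toy data
print(fit_single_relu_1d(x, y))               # loss should be ~0
```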
Effects of the Generation Size and Overlap on Throughput and Complexity in Randomized Linear Network Coding
To reduce computational complexity and delay in randomized network coded
content distribution, and for some other practical reasons, coding is not
performed simultaneously over all content blocks, but over much smaller,
possibly overlapping subsets of these blocks, known as generations. A penalty
of this strategy is throughput reduction. To analyze the throughput loss, we
model coding over generations with random generation scheduling as a coupon
collector's brotherhood problem. This model enables us to derive the expected
number of coded packets needed for successful decoding of the entire content as
well as the probability of decoding failure (the latter only when generations
do not overlap) and further, to quantify the tradeoff between computational
complexity and throughput. Interestingly, with a moderate increase in the
generation size, throughput quickly approaches link capacity. Overlaps between
generations can further improve throughput substantially for relatively small
generation sizes.
Comment: To appear in IEEE Transactions on Information Theory, Special Issue:
Facets of Coding Theory: From Algorithms to Networks, Feb 2011
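The coupon collector's brotherhood model is easy to probe numerically: draw
generation indices uniformly at random and count packets until every
generation has accumulated at least its generation size. A hedged Monte
Carlo sketch (function name and parameters are illustrative, and it
optimistically treats every received packet as innovative, ignoring linear
dependence within a generation):

```python
import random

def expected_packets(num_generations, gen_size, trials=2000):
    """Monte Carlo estimate of the coded packets needed until every
    generation has received at least gen_size packets under uniform
    random generation scheduling (coupon collector's brotherhood)."""
    total = 0
    for _ in range(trials):
        counts = [0] * num_generations
        packets = 0
        while min(counts) < gen_size:
            counts[random.randrange(num_generations)] += 1
            packets += 1
        total += packets
    return total / trials

# Larger generations shrink the per-block scheduling overhead.
for g in (1, 4, 16):
    n = 64 // g  # keep total content fixed at 64 blocks (hypothetical)
    est = expected_packets(n, g)
    print(f"gen_size={g:2d}: ~{est:.0f} packets for {n * g} blocks")
```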
Scalable Coordinated Beamforming for Dense Wireless Cooperative Networks
To meet the ever-growing demand for both high throughput and uniform coverage
in future wireless networks, dense network deployment will be ubiquitous, for
which cooperation among the access points is critical. Considering the
computational complexity of designing coordinated beamformers for dense
networks, low-complexity and suboptimal precoding strategies are often
adopted. However, it is not clear how much performance loss they incur. To enable
optimal coordinated beamforming, in this paper we propose a framework for
designing a scalable beamforming algorithm based on the alternating direction
method of multipliers (ADMM). Specifically, we first apply the matrix
stuffing technique to transform the original optimization problem into an
equivalent ADMM-compliant problem, which is much more efficient than using
the widely-used modeling framework CVX. We then use the ADMM algorithm, also
known as the operator splitting method, to solve the transformed
ADMM-compliant problem efficiently. In particular, the subproblems of the
ADMM algorithm at each iteration can be solved in closed form and in parallel.
Simulation results show that the proposed techniques yield significant
computational speedups over state-of-the-art interior-point solvers.
Furthermore, they demonstrate that optimal coordinated beamforming can
significantly improve system performance compared to suboptimal zero-forcing
beamforming.
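The operator-splitting structure the abstract exploits is easiest to see on a
small generic example rather than on the beamforming cone program itself. The
sketch below applies ADMM to a lasso problem purely as an illustration of the
pattern described: recast the problem in a splitting-friendly form, then
alternate subproblems that each admit closed-form, parallelizable solutions.
Problem sizes and parameters are arbitrary assumptions, not from the paper.

```python
import numpy as np

def soft_threshold(v, kappa):
    # Closed-form prox of the l1 norm: elementwise, trivially parallel.
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, lam, rho=1.0, iters=200):
    """ADMM (operator splitting) for min 0.5*||Ax-b||^2 + lam*||z||_1
    subject to x = z. Both subproblems are closed-form, mirroring the
    structure the paper exploits (this toy stands in for the actual
    ADMM-compliant beamforming program)."""
    m, n = A.shape
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    # Factor once; every x-update is then a cheap closed-form solve.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        z = soft_threshold(x + u, lam / rho)   # parallel elementwise prox
        u = u + x - z                          # dual update for x = z
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20); x_true[:3] = (1.0, -2.0, 0.5)
b = A @ x_true + 0.01 * rng.standard_normal(50)
print(np.round(admm_lasso(A, b, lam=0.5), 2))
```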
Recovery Guarantees for One-hidden-layer Neural Networks
In this paper, we consider regression problems with one-hidden-layer neural
networks (1NNs). We distill some properties of activation functions that lead
to local strong convexity in the neighborhood of the ground-truth parameters
for the 1NN squared-loss objective. Most popular nonlinear activation
functions satisfy the distilled properties, including rectified linear units
(ReLUs), leaky ReLUs, squared ReLUs and sigmoids. For activation functions
that are also smooth, we show local linear convergence guarantees of gradient
descent under a resampling rule. For homogeneous activations, we show tensor
methods are able to initialize the parameters to fall into the local strong
convexity region. As a result, tensor initialization followed by gradient
descent is guaranteed to recover the ground truth with sample complexity
d · log(1/ε) · poly(k, λ) and computational complexity n · d · poly(k, λ) for
smooth homogeneous activations with high probability, where d is the
dimension of the input, k (k ≤ d) is the number of hidden nodes, λ is a
conditioning property of the ground-truth parameter matrix between the input
layer and the hidden layer, ε is the targeted precision and n is the number
of samples. To the best of our knowledge, this is the first work that
provides recovery guarantees for 1NNs with both sample complexity and
computational complexity linear in the input dimension and logarithmic in
the precision.
Comment: ICML 2017
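To make the recovery pipeline concrete, here is a hedged numerical sketch for
a planted 1NN with squared-ReLU activations (smooth and homogeneous, as the
abstract requires). Since the tensor-method initialization is involved, the
sketch stands it in with a small perturbation of the ground truth, which is
what the initialization guarantees: a point inside the locally strongly
convex region. It also reuses one batch rather than resampling per step.
Step size, perturbation radius, and problem sizes are illustrative
assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 10, 3, 2000

def act(t):        # squared ReLU: smooth and homogeneous (degree 2)
    return np.maximum(t, 0.0) ** 2

def act_grad(t):
    return 2.0 * np.maximum(t, 0.0)

# Planted ground-truth hidden weights W*; second-layer weights fixed to 1.
W_star = rng.standard_normal((k, d))
W_star /= np.linalg.norm(W_star, axis=1, keepdims=True)
X = rng.standard_normal((n, d))
y = act(X @ W_star.T).sum(axis=1)

# Stand-in for tensor initialization: a point near W* inside the locally
# strongly convex neighborhood (radius 0.05 is an assumed value).
W = W_star + 0.05 * rng.standard_normal((k, d))

lr = 0.05
for it in range(500):
    Z = X @ W.T                                  # (n, k) pre-activations
    r = act(Z).sum(axis=1) - y                   # residuals
    G = (r[:, None] * act_grad(Z)).T @ X / n     # grad of 0.5*mean(r^2)
    W -= lr * G
    if it % 100 == 0:
        print(f"iter {it:3d}  loss {0.5 * np.mean(r ** 2):.3e}")

print("relative recovery error:",
      np.linalg.norm(W - W_star) / np.linalg.norm(W_star))
```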