262,064 research outputs found

    The computational complexity of ReLU network training parameterized by data dimensionality

    Understanding the computational complexity of training simple neural networks with rectified linear units (ReLUs) has recently been a subject of intensive research. Closing gaps and complementing results from the literature, we present several results on the parameterized complexity of training two-layer ReLU networks with respect to various loss functions. After a brief discussion of other parameters, we focus on analyzing the influence of the dimension d of the training data on the computational complexity. We provide running time lower bounds in terms of W[1]-hardness for parameter d and prove that known brute-force strategies are essentially optimal (assuming the Exponential Time Hypothesis). In comparison with previous work, our results hold for a broad(er) range of loss functions, including ℓp-loss for all p ∈ [0, ∞]. In particular, we improve a known polynomial-time algorithm for constant d and convex loss functions to a more general class of loss functions, matching our running time lower bounds also in these cases.
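    For reference, the following is a minimal NumPy sketch of the object being analyzed: a two-layer (one hidden layer) ReLU network and an ℓp training loss. The network width k = 2, the toy data, and all variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def two_layer_relu(X, W, a):
    """Two-layer network: x -> a . relu(W x), with k hidden ReLU units."""
    return np.maximum(X @ W.T, 0.0) @ a          # shape (n,)

def lp_loss(pred, y, p=2.0):
    """ell_p training loss over the n data points (shown here for p >= 1)."""
    return float(np.sum(np.abs(pred - y) ** p))

rng = np.random.default_rng(0)
n, d, k = 10, 3, 2                               # n points in R^d, k hidden units
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
W = rng.normal(size=(k, d))                      # hidden-layer weights (one row per unit)
a = rng.normal(size=k)                           # output weights

print(lp_loss(two_layer_relu(X, W, a), y, p=2.0))
```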

    Effects of the Generation Size and Overlap on Throughput and Complexity in Randomized Linear Network Coding

    To reduce computational complexity and delay in randomized network coded content distribution, and for some other practical reasons, coding is not performed simultaneously over all content blocks, but over much smaller, possibly overlapping subsets of these blocks, known as generations. A penalty of this strategy is throughput reduction. To analyze the throughput loss, we model coding over generations with random generation scheduling as a coupon collector's brotherhood problem. This model enables us to derive the expected number of coded packets needed for successful decoding of the entire content, as well as the probability of decoding failure (the latter only when generations do not overlap), and further to quantify the tradeoff between computational complexity and throughput. Interestingly, with a moderate increase in the generation size, throughput quickly approaches link capacity. Overlaps between generations can further improve throughput substantially for relatively small generation sizes. Comment: To appear in IEEE Transactions on Information Theory, Special Issue: Facets of Coding Theory: From Algorithms to Networks, Feb 201
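    To make the coupon-collector view concrete, here is a toy Monte Carlo sketch of random generation scheduling. It assumes non-overlapping generations and that every packet received for a generation is innovative (so a generation of size g decodes after g of its packets arrive); field-size effects and overlaps are deliberately ignored, and all parameter values are illustrative.

```python
import random

def expected_packets(num_generations, generation_size, trials=500, seed=0):
    """Monte Carlo estimate of the number of coded packets needed until every
    generation has received generation_size packets, when each arriving packet
    belongs to a uniformly random generation (coupon collector's brotherhood)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        counts = [0] * num_generations
        packets = 0
        while min(counts) < generation_size:
            counts[rng.randrange(num_generations)] += 1
            packets += 1
        total += packets
    return total / trials

# Relative overhead (received packets / minimum possible) shrinks as generations grow,
# mirroring the observation that throughput approaches capacity for moderate generation sizes.
for g in (16, 32, 64):
    est = expected_packets(num_generations=16, generation_size=g)
    print(g, round(est), round(est / (16 * g), 3))
```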

    Scalable Coordinated Beamforming for Dense Wireless Cooperative Networks

    To meet the ever-growing demand for both high throughput and uniform coverage in future wireless networks, dense network deployment will be ubiquitous, for which cooperation among the access points is critical. Given the computational complexity of designing coordinated beamformers for dense networks, low-complexity, suboptimal precoding strategies are often adopted; however, it is unclear how much performance is lost. To enable optimal coordinated beamforming, in this paper we propose a framework for designing a scalable beamforming algorithm based on the alternating direction method of multipliers (ADMM). Specifically, we first apply the matrix stuffing technique to transform the original optimization problem into an equivalent ADMM-compliant problem, which is much more efficient than the widely used modeling framework CVX. We then use the ADMM algorithm, a.k.a. the operator splitting method, to solve the transformed ADMM-compliant problem efficiently. In particular, the subproblems of the ADMM algorithm at each iteration can be solved in closed form and in parallel. Simulation results show that the proposed techniques yield significant computational savings compared to state-of-the-art interior-point solvers, and that optimal coordinated beamforming can significantly improve system performance compared to suboptimal zero-forcing beamforming.
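    The sketch below illustrates the kind of ADMM / operator-splitting iteration with closed-form subproblems that the abstract refers to, applied to a generic norm-ball-constrained least-squares problem rather than the paper's actual beamforming formulation; the splitting, penalty parameter rho, and problem data are all illustrative assumptions.

```python
import numpy as np

def admm_ball_constrained_ls(A, b, radius, rho=1.0, iters=200):
    """Scaled-form ADMM for  min ||Ax - b||^2  s.t.  ||x|| <= radius,
    split as f(x) = ||Ax - b||^2 and g(z) = indicator of the norm ball,
    with the consensus constraint x = z. Both updates are closed-form."""
    n = A.shape[1]
    x = z = u = np.zeros(n)
    # x-update system: (A^T A + (rho/2) I) x = A^T b + (rho/2)(z - u).
    M = A.T @ A + (rho / 2.0) * np.eye(n)
    for _ in range(iters):
        x = np.linalg.solve(M, A.T @ b + (rho / 2.0) * (z - u))  # closed-form least-squares step
        v = x + u
        z = v * min(1.0, radius / (np.linalg.norm(v) + 1e-12))   # closed-form projection onto the ball
        u = u + x - z                                            # scaled dual update
    return z

rng = np.random.default_rng(1)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
x_hat = admm_ball_constrained_ls(A, b, radius=0.5)
print(np.linalg.norm(x_hat))  # stays within the radius-0.5 ball
```

    The relevant design point is that each iteration reduces to a cached linear solve plus a cheap projection; a per-user or per-antenna decomposition with the same structure is what would make such updates parallelizable in the beamforming setting.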

    Recovery Guarantees for One-hidden-layer Neural Networks

    In this paper, we consider regression problems with one-hidden-layer neural networks (1NNs). We distill some properties of activation functions that lead to local strong convexity in the neighborhood of the ground-truth parameters for the 1NN squared-loss objective. Most popular nonlinear activation functions satisfy the distilled properties, including rectified linear units (ReLUs), leaky ReLUs, squared ReLUs and sigmoids. For activation functions that are also smooth, we show local linear convergence guarantees of gradient descent under a resampling rule. For homogeneous activations, we show tensor methods are able to initialize the parameters to fall into the local strong convexity region. As a result, tensor initialization followed by gradient descent is guaranteed to recover the ground truth with sample complexity d · log(1/ϵ) · poly(k, λ) and computational complexity n · d · poly(k, λ) for smooth homogeneous activations with high probability, where d is the dimension of the input, k (k ≤ d) is the number of hidden nodes, λ is a conditioning property of the ground-truth parameter matrix between the input layer and the hidden layer, ϵ is the targeted precision and n is the number of samples. To the best of our knowledge, this is the first work that provides recovery guarantees for 1NNs with both sample complexity and computational complexity linear in the input dimension and logarithmic in the precision. Comment: ICML 201
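    As a rough illustration of the 1NN squared-loss objective and the gradient-descent phase of such a recovery procedure, here is a NumPy sketch with ReLU activations and fixed unit output weights. The random initialization near the ground truth is a stand-in assumption for the paper's tensor-method initialization, and the resampling rule is not modeled; all dimensions and step sizes are illustrative.

```python
import numpy as np

def onn_forward(X, W, v):
    """One-hidden-layer network: f(x) = sum_i v_i * relu(w_i . x)."""
    return np.maximum(X @ W.T, 0.0) @ v

def squared_loss_grad(X, y, W, v):
    """Gradient of 0.5 * sum_n (f(x_n) - y_n)^2 with respect to W (shape (k, d))."""
    H = np.maximum(X @ W.T, 0.0)            # (n, k) hidden activations
    r = H @ v - y                           # (n,) residuals
    G = (X @ W.T > 0).astype(float)         # ReLU derivative indicators
    return (G * np.outer(r, v)).T @ X       # row i: sum_n r_n * v_i * 1[w_i.x_n > 0] * x_n

rng = np.random.default_rng(2)
d, k, n = 8, 3, 2000
W_true = rng.normal(size=(k, d))            # ground-truth hidden-layer weights
v = np.ones(k)                              # fixed output weights
X = rng.normal(size=(n, d))
y = onn_forward(X, W_true, v)               # noiseless labels from the ground truth

W = W_true + 0.1 * rng.normal(size=(k, d))  # stand-in for the tensor-method initialization
for _ in range(500):
    W -= 0.05 * squared_loss_grad(X, y, W, v) / n
print(np.linalg.norm(W - W_true))           # parameter error should shrink from its initial value
```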