31 research outputs found

    Distributed Dictionary Learning

    Full text link
    The paper studies distributed Dictionary Learning (DL) problems where the learning task is distributed over a multi-agent network with time-varying (nonsymmetric) connectivity. This formulation is relevant, for instance, in big-data scenarios where massive amounts of data are collected/stored in different spatial locations and it is infeasible to aggregate and/or process all the data in a fusion center, due to resource limitations, communication overhead or privacy considerations. We develop a general distributed algorithmic framework for the (nonconvex) DL problem and establish its asymptotic convergence. The new method hinges on Successive Convex Approximation (SCA) techniques coupled with i) a gradient tracking mechanism, instrumental in locally estimating the missing global information; and ii) a consensus step, as a mechanism to distribute the computations among the agents. To the best of our knowledge, this is the first distributed algorithm with provable convergence for the DL problem and, more generally, for bi-convex optimization problems over (time-varying) directed graphs.
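
    As a rough, hypothetical illustration of the two mechanisms named above (gradient tracking plus a consensus step), the Python sketch below runs them on a toy convex least-squares problem over a fixed ring of agents; the paper's local SCA surrogate is replaced here by a plain gradient step, and the mixing matrix W, the step size, and the problem data are all illustrative assumptions rather than the paper's setup.

        import numpy as np

        # Hypothetical toy setup: each agent holds a local least-squares cost.
        rng = np.random.default_rng(0)
        n_agents, dim = 5, 10
        A = [rng.standard_normal((20, dim)) for _ in range(n_agents)]
        b = [rng.standard_normal(20) for _ in range(n_agents)]

        def local_grad(i, x):
            # Gradient of the local smooth cost f_i(x) = 0.5 * ||A_i x - b_i||^2.
            return A[i].T @ (A[i] @ x - b[i])

        # Doubly stochastic mixing weights on a fixed ring (the paper handles
        # time-varying directed graphs; a static ring keeps this sketch short).
        W = np.zeros((n_agents, n_agents))
        for i in range(n_agents):
            W[i, i] = 0.5
            W[i, (i + 1) % n_agents] = 0.25
            W[i, (i - 1) % n_agents] = 0.25

        x = np.zeros((n_agents, dim))                 # one local copy per agent
        y = np.array([local_grad(i, x[i]) for i in range(n_agents)])  # trackers
        step = 1e-2
        for _ in range(500):
            g_old = np.array([local_grad(i, x[i]) for i in range(n_agents)])
            x = W @ (x - step * y)          # local step followed by consensus
            g_new = np.array([local_grad(i, x[i]) for i in range(n_agents)])
            y = W @ y + g_new - g_old       # gradient tracking update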

    Hybrid Random/Deterministic Parallel Algorithms for Nonconvex Big Data Optimization

    Full text link
    We propose a decomposition framework for the parallel optimization of the sum of a differentiable (possibly nonconvex) function and a nonsmooth (possibly nonseparable), convex one. The latter term is usually employed to enforce structure in the solution, typically sparsity. The main contribution of this work is a novel parallel, hybrid random/deterministic decomposition scheme wherein, at each iteration, a subset of (block) variables is updated at the same time by minimizing local convex approximations of the original nonconvex function. To tackle huge-scale problems, the (block) variables to be updated are chosen according to a mixed random and deterministic procedure, which captures the advantages of both pure deterministic and random update-based schemes. Almost sure convergence of the proposed scheme is established. Numerical results show that on huge-scale problems the proposed hybrid random/deterministic algorithm outperforms both random and deterministic schemes. (Comment: the order of the authors is alphabetical.)
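
    A minimal sketch of the mixed random/deterministic block-selection idea, under illustrative assumptions: the composite objective is a toy LASSO-type problem, the deterministic picks are the blocks with the largest gradient norm, and the local convex approximation is replaced by a plain proximal gradient step on each selected block (the paper's surrogates are more general).

        import numpy as np

        rng = np.random.default_rng(1)
        n, d, blk = 200, 400, 20                     # 20 blocks of 20 variables
        A = rng.standard_normal((n, d))
        b = rng.standard_normal(n)
        lam = 0.1
        step = 1.0 / np.linalg.norm(A, 2) ** 2       # conservative step size

        def soft_threshold(v, t):
            return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

        x = np.zeros(d)
        blocks = [np.arange(i, i + blk) for i in range(0, d, blk)]
        for _ in range(300):
            g = A.T @ (A @ x - b)                    # gradient of the smooth part
            scores = [np.linalg.norm(g[idx]) for idx in blocks]
            greedy = list(np.argsort(scores)[-2:])                     # deterministic picks
            random_ = list(rng.choice(len(blocks), 2, replace=False))  # random picks
            for j in set(greedy + random_):          # parallelizable block updates
                idx = blocks[j]
                x[idx] = soft_threshold(x[idx] - step * g[idx], step * lam)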

    On the impact of activation and normalization in obtaining isometric embeddings at initialization

    Full text link
    In this paper, we explore the structure of the penultimate Gram matrix in deep neural networks, which contains the pairwise inner products of outputs corresponding to a batch of inputs. In several architectures it has been observed that this Gram matrix becomes degenerate with depth at initialization, which dramatically slows training. Normalization layers, such as batch or layer normalization, play a pivotal role in preventing this rank collapse. Despite promising advances, the existing theoretical results (i) do not extend to layer normalization, which is widely used in transformers, and (ii) cannot quantitatively characterize the bias of normalization at finite depth. To bridge this gap, we prove that layer normalization, in conjunction with activation layers, biases the Gram matrix of a multilayer perceptron towards isometry at an exponential rate with depth at initialization. We quantify this rate using the Hermite expansion of the activation function, highlighting the importance of higher-order (≥ 2) Hermite coefficients in the bias towards isometry.
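
    The rate in the abstract is expressed through Hermite coefficients of the activation. The snippet below is one simple way (not taken from the paper) to estimate normalized probabilists' Hermite coefficients c_k = E[phi(Z) He_k(Z)] / sqrt(k!) by Monte Carlo, so that the higher-order (≥ 2) coefficients can be inspected for common activations; the normalization and sample size are illustrative choices.

        import math
        import numpy as np
        from numpy.polynomial.hermite_e import hermeval

        # Monte Carlo estimate of c_k = E[phi(Z) He_k(Z)] / sqrt(k!), Z ~ N(0, 1).
        rng = np.random.default_rng(2)
        z = rng.standard_normal(500_000)

        def hermite_coeffs(phi, max_order=4):
            coeffs = []
            for k in range(max_order + 1):
                he_k = hermeval(z, [0.0] * k + [1.0])   # probabilists' He_k(z)
                coeffs.append(np.mean(phi(z) * he_k) / math.sqrt(math.factorial(k)))
            return np.array(coeffs)

        relu = lambda t: np.maximum(t, 0.0)
        print("ReLU:", np.round(hermite_coeffs(relu), 3))
        print("tanh:", np.round(hermite_coeffs(np.tanh), 3))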

    Batch Normalization Orthogonalizes Representations in Deep Random Networks

    Full text link
    This paper underlines a subtle property of batch normalization (BN): successive batch normalizations with random linear transformations make hidden representations increasingly orthogonal across layers of a deep neural network. We establish a non-asymptotic characterization of the interplay between depth, width, and the orthogonality of deep representations. More precisely, under a mild assumption, we prove that the deviation of the representations from orthogonality rapidly decays with depth, up to a term inversely proportional to the network width. This result has two main implications: 1) theoretically, as the depth grows, the distribution of the representations -- after the linear layers -- contracts to a Wasserstein-2 ball around an isotropic Gaussian distribution, and the radius of this Wasserstein ball shrinks with the width of the network; 2) in practice, the orthogonality of the representations directly influences the performance of stochastic gradient descent (SGD). When representations are initially aligned, we observe that SGD wastes many iterations orthogonalizing the representations before classification. Nevertheless, we experimentally show that starting optimization from orthogonal representations is sufficient to accelerate SGD, with no need for BN.
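
    The claim can be checked numerically with a few lines of code. The sketch below (a hypothetical setup, not the paper's experiment) pushes a nearly aligned batch through random linear layers interleaved with batch normalization and prints how the off-diagonal part of the normalized batch Gram matrix shrinks with depth; the width, batch size, and depth are arbitrary illustrative values.

        import numpy as np

        rng = np.random.default_rng(3)
        width, batch, depth = 512, 8, 50
        base = rng.standard_normal((width, 1))
        H = base @ np.ones((1, batch)) + 0.1 * rng.standard_normal((width, batch))

        def batch_norm(X):
            # Zero mean, unit variance for each hidden unit (row) over the batch.
            return (X - X.mean(axis=1, keepdims=True)) / (X.std(axis=1, keepdims=True) + 1e-8)

        for layer in range(1, depth + 1):
            W = rng.standard_normal((width, width)) / np.sqrt(width)  # random linear layer
            H = batch_norm(W @ H)
            G = H.T @ H / width                      # normalized batch Gram matrix
            off_diag = np.linalg.norm(G - np.diag(np.diag(G)))
            if layer % 10 == 0:
                print(f"depth {layer:3d}: off-diagonal norm = {off_diag:.4f}")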

    Decentralized Dictionary Learning Over Time-Varying Digraphs

    Full text link
    This paper studies Dictionary Learning problems wherein the learning task is distributed over a multi-agent network, modeled as a time-varying directed graph. This formulation is relevant, for instance, in Big Data scenarios where massive amounts of data are collected/stored in different locations (e.g., sensors, clouds) and aggregating and/or processing all data in a fusion center might be inefficient or infeasible, due to resource limitations, communication overheads or privacy issues. We develop a unified decentralized algorithmic framework for this class of nonconvex problems, which is proved to converge to stationary solutions at a sublinear rate. The new method hinges on Successive Convex Approximation techniques, coupled with a decentralized tracking mechanism aimed at locally estimating the gradient of the smooth part of the sum-utility. To the best of our knowledge, this is the first provably convergent decentralized algorithm for Dictionary Learning and, more generally, for bi-convex problems over (time-varying) (di)graphs.
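
    The "bi-convex" structure mentioned above refers to the dictionary-learning objective being convex in the dictionary for fixed codes and convex in the codes for a fixed dictionary. The sketch below illustrates only that centralized, single-agent structure (alternating convex subproblems for a toy l1-regularized factorization); the paper's decentralized tracking and consensus machinery is not reproduced here, and all sizes and the regularization weight are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(4)
        n_features, n_samples, n_atoms = 20, 100, 30
        X = rng.standard_normal((n_features, n_samples))
        D = rng.standard_normal((n_features, n_atoms))
        D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
        S = np.zeros((n_atoms, n_samples))
        lam = 0.1

        def soft_threshold(v, t):
            return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

        for _ in range(50):
            # Convex in S for fixed D: a few ISTA steps on 0.5*||X - DS||_F^2 + lam*||S||_1.
            step = 1.0 / np.linalg.norm(D, 2) ** 2
            for _ in range(10):
                S = soft_threshold(S - step * D.T @ (D @ S - X), step * lam)
            # Convex in D for fixed S: ridge least squares, then renormalize the atoms.
            D = X @ S.T @ np.linalg.inv(S @ S.T + 1e-6 * np.eye(n_atoms))
            D /= np.maximum(np.linalg.norm(D, axis=0), 1e-8)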

    Residual Energy Based Cluster-head Selection in WSNs for IoT Application

    Full text link
    Wireless sensor networks (WSNs) group specialized transducers that provide sensing services to Internet of Things (IoT) devices with limited energy and storage resources. Since replacing or recharging batteries in sensor nodes is almost impossible, power consumption becomes one of the crucial design issues in WSNs. Clustering algorithms play an important role in power conservation for such energy-constrained networks: choosing cluster heads appropriately balances the load in the network, thereby reducing energy consumption and enhancing lifetime. This paper focuses on an efficient cluster-head election scheme that rotates the cluster-head role among the nodes with higher energy levels than the others. The algorithm considers the initial energy, the residual energy and an optimal number of cluster heads to elect the next group of cluster heads, making it suitable for IoT applications such as environmental monitoring, smart cities, and similar systems. Simulation analysis shows that the modified version outperforms the LEACH protocol, enhancing throughput by 60%, lifetime by 66%, and residual energy by 64%.
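
    The abstract does not state the election rule explicitly; a common way such schemes are written (assumed here, not necessarily the paper's exact formula) is a LEACH-style rotation threshold weighted by each node's residual-to-initial energy ratio, as in the sketch below, where p is the desired fraction of cluster heads and the node fields are hypothetical.

        import random

        def ch_threshold(p, round_no, residual, initial):
            # LEACH-style rotation threshold, weighted by the remaining energy
            # fraction (assumed modification; p chosen so that 1/p is an integer).
            base = p / (1.0 - p * (round_no % int(round(1.0 / p))))
            return base * (residual / initial)

        def elect_cluster_heads(nodes, p, round_no):
            # nodes: list of dicts with 'id', 'residual', 'initial', 'eligible'.
            heads = []
            for node in nodes:
                if not node["eligible"]:
                    continue                  # already served as CH in this epoch
                if random.random() < ch_threshold(p, round_no, node["residual"], node["initial"]):
                    heads.append(node["id"])
                    node["eligible"] = False  # rotate the CH role to other nodes
            return heads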

    Communication technologies for edge learning and inference: a novel framework, open issues, and perspectives

    Get PDF
    With the continuous advancement of smart devices and their demand for data, complex computation that was previously exclusive to cloud servers is now moving towards the edge of the network. For numerous reasons (e.g., applications demanding low latency and data privacy), data-based computation has been brought closer to its originating source, forging the Edge Computing paradigm. Together with Machine Learning, Edge Computing has turned into a powerful local decision-making tool, thus fostering the advent of Edge Learning. The latter, however, is delay-sensitive as well as resource-hungry in terms of hardware and networking. New methods have been developed to solve, or at least mitigate, these issues, as proposed in this research. In this study, we first investigate representative communication methods for edge learning and inference (ELI), focusing on data compression, latency, and resource management. Next, we propose an ELI-based video data prioritization framework that only considers data containing events and hence significantly reduces transmission and storage resources when implemented in surveillance networks. Furthermore, we critically examine various communication aspects related to Edge Learning, analyzing their issues and highlighting their advantages and disadvantages. Finally, we discuss challenges and present issues that are yet to be overcome.
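
    The abstract leaves the prioritization framework at a high level; one simple, assumed reading of "only considers data containing events" is frame differencing, where a surveillance frame is kept only if it differs enough from the previous one, as in the illustrative sketch below (the threshold value and the function name prioritize_frames are hypothetical).

        import numpy as np

        def prioritize_frames(frames, threshold=12.0):
            # frames: iterable of 2-D numpy arrays (grayscale surveillance frames).
            # Keep only the indices of frames whose mean absolute difference from
            # the previous frame exceeds a threshold, i.e., likely "event" frames.
            kept, prev = [], None
            for idx, frame in enumerate(frames):
                f = frame.astype(np.float32)
                if prev is None or np.mean(np.abs(f - prev)) > threshold:
                    kept.append(idx)
                prev = f
            return kept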