176 research outputs found

    Data Brushes: Interactive Style Transfer for Data Art

    Get PDF

    Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling

    Full text link
    The Gaussian kernel and its traditional normalizations (e.g., row-stochastic) are popular approaches for assessing similarities between data points, commonly used for manifold learning and clustering, as well as supervised and semi-supervised learning on graphs. In many practical situations, the data can be corrupted by noise that prohibits traditional affinity matrices from correctly assessing similarities, especially if the noise magnitudes vary considerably across the data, e.g., under heteroskedasticity or outliers. An alternative approach that provides a more stable behavior under noise is the doubly stochastic normalization of the Gaussian kernel. In this work, we investigate this normalization in a setting where points are sampled from an unknown density on a low-dimensional manifold embedded in high-dimensional space and corrupted by possibly strong, non-identically distributed, sub-Gaussian noise. We establish the pointwise concentration of the doubly stochastic affinity matrix and its scaling factors around certain population forms. We then utilize these results to develop several tools for robust inference. First, we derive a robust density estimator that can substantially outperform the standard kernel density estimator under high-dimensional noise. Second, we provide estimators for the pointwise noise magnitudes, the pointwise signal magnitudes, and the pairwise Euclidean distances between clean data points. Lastly, we derive robust graph Laplacian normalizations that approximate popular manifold Laplacians, including the Laplace Beltrami operator, showing that the local geometry of the manifold can be recovered under high-dimensional noise. We exemplify our results in simulations and on real single-cell RNA-sequencing data. In the latter, we show that our proposed normalizations are robust to technical variability associated with different cell types

    Some New Results in Distributed Tracking and Optimization

    Get PDF
    The current age of Big Data is built on the foundation of distributed systems, and efficient distributed algorithms to run on these systems.With the rapid increase in the volume of the data being fed into these systems, storing and processing all this data at a central location becomes infeasible. Such a central \textit{server} requires a gigantic amount of computational and storage resources. Even when it is possible to have central servers, it is not always desirable, due to privacy concerns. Also, sending huge amounts of data to such servers incur often infeasible bandwidth requirements. In this dissertation, we consider two kinds of distributed architectures: 1) star-shaped topology, where multiple worker nodes are connected to, and communicate with a server, but the workers do not communicate with each other; and 2) mesh topology or network of interconnected workers, where each worker can communicate with a small number of neighboring workers. In the first half of this dissertation (Chapters 2 and 3), we consider distributed systems with mesh topology.We study two different problems in this context. First, we study the problem of simultaneous localization and multi-target tracking. Multiple mobile agents localize themselves cooperatively, while also tracking multiple, unknown number of mobile targets, in the presence of measurement-origin uncertainty. In situations with limited GPS signal availability, agents (like self-driving cars in urban canyons, or autonomous vehicles in hazardous environments) need to rely on inter-agent measurements for localization. The agents perform the additional task of tracking multiple targets (pedestrians and road-signs for self-driving cars). We propose a decentralized algorithm for this problem. To be effective in real-time applications, we propose efficient Gaussian and Gaussian-mixture based filters, rather than the computationally expensive particle-based methods in the existing literature. Our novel factor-graph based approach gives better performance, in terms of both agent localization errors, and target-location and cardinality errors. Next, we study an online convex optimization problem, where a network of agents cooperate to minimize a global time-varying objective function. Only the local functions are revealed to individual agents. The agents also need to satisfy their individual constraints. We propose a primal-dual update based decentralized algorithm for this problem. Under standard assumptions, we prove that the proposed algorithm achieves sublinear regret and constraint violation across the network. In other words, over a long enough time horizon, the decisions taken by the agents are, on average, as good as if all the information was revealed ahead of time. In addition, the individual constraint violations of the agents, averaged over time, are zero. In the next part of the dissertation (Chapters 4), we study distributed systems with a star-shaped topology. The problem we study is distributed nonconvex optimization. With the recent success of deep learning, coupled with the use of distributed systems to solve large-scale problems, this problem has gained prominence over the past decade. The recently proposed paradigm of Federated Learning (which has already been deployed by Google/Apple in Android/iOS phones) has further catalyzed research in this direction. The problem we consider is minimizing the average of local smooth, nonconvex functions. Each node has access only to its own loss function, but can communicate with the server, which aggregates updates from all the nodes, before distributing them to all the nodes. With the advent of more and more complex neural network architectures, these updates can be high dimensional. To save resources, the problem needs to be solved via communication-efficient approaches. We propose a novel algorithm, which combines the idea of variance-reduction, with the paradigm of carrying out multiple local updates at each node before averaging. We prove the convergence of the approach to a first-order stationary point. Our algorithm is optimal in terms of computation, and state-of-the-art in terms of the communication requirements. Lastly in Chapter 5, we consider the situation when the nodes do not have access to function gradients, and need to minimize the loss function using only function values. This problem lies in the domain of zeroth-order optimization. For simplicity of analysis, we study this problem only in the single-node case. This problem finds application in simulation-based optimization, and adversarial example generation for attacking deep neural networks. We propose a novel function value based gradient estimator, which has better variance, and better query-efficiency compared to existing estimators. The proposed estimator covers the most commonly used existing estimators as special cases. We conduct a comprehensive convergence analysis under different conditions. We also demonstrate its effectiveness through a real-world application to generating adversarial examples from a black-box deep neural network

    A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery

    Full text link
    Semantic segmentation (classification) of Earth Observation imagery is a crucial task in remote sensing. This paper presents a comprehensive review of technical factors to consider when designing neural networks for this purpose. The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and transformer models, discussing prominent design patterns for these ANN families and their implications for semantic segmentation. Common pre-processing techniques for ensuring optimal data preparation are also covered. These include methods for image normalization and chipping, as well as strategies for addressing data imbalance in training samples, and techniques for overcoming limited data, including augmentation techniques, transfer learning, and domain adaptation. By encompassing both the technical aspects of neural network design and the data-related considerations, this review provides researchers and practitioners with a comprehensive and up-to-date understanding of the factors involved in designing effective neural networks for semantic segmentation of Earth Observation imagery.Comment: 145 pages with 32 figure
    • …
    corecore