4 research outputs found

    Randomized structure-adaptive optimization

    This thesis advances the state of the art of randomized optimization algorithms for efficiently solving the large-scale composite optimization problems that appear increasingly often in modern statistical machine learning and signal processing applications in the big-data era. Its contribution rests on a specific observation: the low-dimensional structure of the solution of a composite optimization problem (such as sparsity, group sparsity, piecewise smoothness, or low-rank structure) can be actively exploited by purposefully tailored optimization algorithms, termed structure-adaptive algorithms, to achieve even faster convergence rates. Driven by this motivation, several randomized optimization algorithms are designed and analyzed in this thesis. The proposed methods, which provably have the desired structure-adaptive property, include sketched gradient descent algorithms and structure-adaptive variants of accelerated stochastic variance-reduced gradient descent and randomized coordinate descent algorithms. The thesis provides successful paradigms for the algorithmic design of randomized structure-adaptive methods, confirming that low-dimensional structure is indeed a promising “hidden treasure” to be exploited for accelerating large-scale optimization.

    Analyzing intentions from big data traces of human activities

    The rapid growth of big data generated by human activities makes research on intention analysis both challenging and rewarding. We study multifaceted problems in analyzing intentions from big data traces of human activities; these problems span machine learning, optimization, and security and privacy. We show that analyzing intentions from industry-scale human activity data can effectively improve the accuracy of computational models. Specifically, we take query auto-completion as a case study. We identify two hitherto-undiscovered problems: adaptive query auto-completion and mobile query auto-completion. We develop two computational models by analyzing intentions from big data traces of human activities on search interface interactions and on mobile application usage, respectively. Solving the large-scale optimization problems in the proposed query auto-completion models motivates a deeper study of the solvers. Hence, we consider generalized machine learning problem settings and focus on developing lightweight stochastic algorithms, with theoretical guarantees, as solvers for large-scale convex optimization problems. For optimizing strongly convex objectives, we design an accelerated stochastic block coordinate descent method with optimal sampling; for optimizing non-strongly convex objectives, we design a stochastic variance-reduced alternating direction method of multipliers with the doubling trick. Human activities are inherently human-centric, so research on them can inform security and privacy. On one hand, research on intention analysis from human activities can be motivated from the security perspective. For instance, to reduce false alarms about medical service providers' suspicious accesses to electronic health records, we discover potential de facto diagnosis specialties that reflect such providers' genuine and permissible intentions in accessing records with certain diagnoses. On the other hand, we examine the privacy risk in anonymized heterogeneous information networks representing large-scale human activities, such as in social networking. Such data are released for external researchers to improve the prediction accuracy for users' online social networking intentions on the publishers' microblogging site. We show a negative result that makes a compelling argument: privacy must be a central goal for publishers of sensitive human activity data.

    Robust and Scalable Data Representation and Analysis Leveraging Isometric Transformations and Sparsity

    The main focus of this doctoral thesis is the problem of robust and scalable data representation and analysis. The success of any machine learning or signal processing framework relies on how the data is represented and analyzed. Thus, in this work, we focus on three closely related problems: (i) supervised representation learning, (ii) unsupervised representation learning, and (iii) fault-tolerant data analysis. For the first task, we put forward new theoretical results on why a certain family of neural networks can become extremely deep and how we can improve this scalability property in a mathematically sound manner. We further investigate how we can employ them to generate data representations that are robust to outliers and to retrieve representative subsets of huge datasets. For the second task, we discuss two different methods, namely compressive sensing (CS) and nonnegative matrix factorization (NMF). We show that prior knowledge, such as slow variation in time, can be used to introduce an unsupervised learning component into the traditional CS framework and to learn better compressed representations. Furthermore, we show that prior knowledge and sparsity constraints can be used in the context of NMF, not to find sparse hidden factors, but to enforce other structures, such as piecewise continuity. Finally, for the third task, we investigate how a data analysis framework can become robust to faulty data and faulty data processors. We employ Bayesian inference and propose a scheme that can solve the CS recovery problem in an asynchronous parallel manner. Furthermore, we show how sparsity can be used to make an optimization problem robust to faulty data measurements. The methods investigated in this work have applications in practical problems such as resource allocation in wireless networks, source localization, image/video classification, and search engines. A detailed discussion of these practical applications is presented for each method.

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference
