
    Greedy Column Subset Selection for Large-scale Data Sets

    In today's information systems, the availability of massive amounts of data necessitates the development of fast and accurate algorithms to summarize these data and represent them in a succinct format. One crucial problem in big data analytics is the selection of representative instances from large and massively-distributed data, which is formally known as the Column Subset Selection (CSS) problem. The solution to this problem enables data analysts to gain insight into the data and explore its hidden structure. The selected instances can also be used for data preprocessing tasks such as learning a low-dimensional embedding of the data points or computing a low-rank approximation of the corresponding matrix. This paper presents a fast and accurate greedy algorithm for large-scale column subset selection. The algorithm minimizes an objective function which measures the reconstruction error of the data matrix based on the subset of selected columns. The paper first presents a centralized greedy algorithm for column subset selection which depends on a novel recursive formula for calculating the reconstruction error of the data matrix. The paper then presents a MapReduce algorithm which selects a few representative columns from a matrix whose columns are massively distributed across several commodity machines. The algorithm first learns a concise representation of all columns using random projection, and it then solves a generalized column subset selection problem at each machine, in which a subset of columns is selected from the sub-matrix on that machine such that the reconstruction error of the concise representation is minimized. The paper demonstrates the effectiveness and efficiency of the proposed algorithm through an empirical evaluation on benchmark data sets. Comment: Under consideration for publication in Knowledge and Information Systems
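    To make the minimized objective concrete, here is a minimal, self-contained sketch of greedy column subset selection in NumPy: at each step it adds the column that most reduces the Frobenius-norm reconstruction error of the data matrix and then projects that column out of the residual. The naive rescoring sweep and the `greedy_css` helper are illustrative assumptions only; the paper's recursive error formula and the distributed MapReduce variant are not reproduced here.

```python
import numpy as np

def greedy_css(A, k):
    """Greedily pick k columns of A so that projecting A onto their span
    gives a small reconstruction error ||A - P_S A||_F^2.
    Naive rescoring at every step; not the paper's recursive formula."""
    selected = []
    E = A.astype(float).copy()                 # residual after removing chosen columns
    for _ in range(k):
        G = E.T @ E                            # Gram matrix of the residual
        col_norms = np.maximum(np.diag(G), 1e-12)
        gains = (G ** 2).sum(axis=0) / col_norms   # error reduction per candidate column
        gains[selected] = -np.inf              # never re-select a column
        j = int(np.argmax(gains))
        selected.append(j)
        c = E[:, j]
        E = E - np.outer(c, c @ E) / max(c @ c, 1e-12)  # project the chosen column out
    return selected

A = np.random.default_rng(0).random((100, 40))
print("selected columns:", greedy_css(A, 5))
```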

    Far-Field Compression for Fast Kernel Summation Methods in High Dimensions

    We consider fast kernel summations in high dimensions: given a large set of points in $d$ dimensions (with $d \gg 3$) and a pair-potential function (the {\em kernel} function), we compute a weighted sum of all pairwise kernel interactions for each point in the set. Direct summation is equivalent to a (dense) matrix-vector multiplication and scales quadratically with the number of points. Fast kernel summation algorithms reduce this cost to log-linear or linear complexity. Treecodes and Fast Multipole Methods (FMMs) deliver tremendous speedups by constructing approximate representations of interactions of points that are far from each other. In algebraic terms, these representations correspond to low-rank approximations of blocks of the overall interaction matrix. Existing approaches require an excessive number of kernel evaluations as $d$ and the number of points in the dataset increase. To address this issue, we use a randomized algebraic approach in which we first sample the rows of a block and then construct its approximate, low-rank interpolative decomposition. We examine the feasibility of this approach theoretically and experimentally. We provide a new theoretical result showing a tighter bound on the reconstruction error from uniformly sampling rows than the existing state-of-the-art. We demonstrate that our sampling approach is competitive with existing (but prohibitively expensive) methods from the literature. We also construct kernel matrices for the Laplacian, Gaussian, and polynomial kernels -- all commonly used in physics and data analysis. We explore the numerical properties of blocks of these matrices, and show that they are amenable to our approach. Depending on the data set, our randomized algorithm can successfully compute low-rank approximations in high dimensions. We report results for data sets with ambient dimensions from four to 1,000. Comment: 43 pages, 21 figures
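    The sample-then-factor idea can be sketched in a few lines: form a Gaussian kernel block between two well-separated point sets, uniformly sample a few of its rows, pick skeleton columns from the sample with a pivoted QR, and interpolate the remaining columns. The kernel choice, sample sizes, and the `sampled_id_approx` helper below are illustrative assumptions, not the paper's algorithm or its error bounds.

```python
import numpy as np
from scipy.linalg import qr

def gaussian_kernel(X, Y, h=1.0):
    # K[i, j] = exp(-||x_i - y_j||^2 / (2 h^2))
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * h**2))

def sampled_id_approx(K, rank, n_rows, rng):
    """Uniformly sample rows of the block, choose skeleton columns from the
    sampled rows via pivoted QR, and interpolate all remaining columns."""
    rows = rng.choice(K.shape[0], size=n_rows, replace=False)
    S = K[rows, :]                                        # sampled row block
    _, _, piv = qr(S, mode='economic', pivoting=True)
    skel = piv[:rank]                                     # skeleton column indices
    T, *_ = np.linalg.lstsq(S[:, skel], S, rcond=None)    # interpolation matrix
    return K[:, skel] @ T                                 # low-rank approximation

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 10))
Y = rng.standard_normal((250, 10)) + 2.0                  # a "far-field" target cluster
K = gaussian_kernel(X, Y, h=4.0)
K_hat = sampled_id_approx(K, rank=15, n_rows=40, rng=rng)
print("relative error:", np.linalg.norm(K - K_hat) / np.linalg.norm(K))
```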

    Estimation Considerations in Contextual Bandits

    Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We study a consideration for the exploration-vs.-exploitation framework that does not arise in multi-armed bandits but is crucial in contextual bandits: the way exploration and exploitation are conducted in the present affects the bias and variance of the potential outcome model estimates in subsequent stages of learning. We develop parametric and non-parametric contextual bandits that integrate balancing methods from the causal inference literature into their estimation, making them less prone to estimation bias. We provide the first regret bound analyses for contextual bandits with balancing in the domain of linear contextual bandits, matching the state-of-the-art regret bounds. We demonstrate the strong practical advantage of balanced contextual bandits on a large number of supervised learning datasets and on a synthetic example that simulates model mis-specification and prejudice in the initial training data. Additionally, we develop contextual bandits with simpler assignment policies by leveraging sparse model estimation methods from the econometrics literature, and we demonstrate empirically that in the early stages of learning they can improve the rate of learning and decrease regret.
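    A toy implementation may help fix ideas: the sketch below combines an epsilon-greedy linear contextual bandit with inverse-propensity weighting in the per-arm ridge regressions, which is one simple form of "balancing" the estimation. The class name, the epsilon-greedy policy, and the weight clipping are assumptions for illustration, not the authors' estimator or their regret-optimal algorithms.

```python
import numpy as np

class BalancedLinearBandit:
    """Linear contextual bandit whose per-arm ridge regressions are weighted
    by inverse propensity scores (a minimal sketch of balanced estimation)."""
    def __init__(self, n_arms, dim, lam=1.0, eps=0.1, rng=None):
        self.A = [lam * np.eye(dim) for _ in range(n_arms)]   # weighted Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]       # weighted responses
        self.eps, self.n_arms = eps, n_arms
        self.rng = rng or np.random.default_rng()

    def choose(self, x):
        # Epsilon-greedy exploration; return the chosen arm and its propensity.
        greedy = int(np.argmax([x @ np.linalg.solve(A, b)
                                for A, b in zip(self.A, self.b)]))
        arm = int(self.rng.integers(self.n_arms)) if self.rng.random() < self.eps else greedy
        propensity = self.eps / self.n_arms + (1 - self.eps) * (arm == greedy)
        return arm, propensity

    def update(self, x, arm, reward, propensity):
        w = 1.0 / max(propensity, 1e-3)        # inverse-propensity (balancing) weight
        self.A[arm] += w * np.outer(x, x)
        self.b[arm] += w * reward * x

bandit = BalancedLinearBandit(n_arms=3, dim=5)
x = np.random.default_rng(1).standard_normal(5)
arm, p = bandit.choose(x)
bandit.update(x, arm, reward=1.0, propensity=p)
```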

    Optimal CUR Matrix Decompositions

    The CUR decomposition of an $m \times n$ matrix $A$ finds an $m \times c$ matrix $C$ with a subset of $c < n$ columns of $A$, together with an $r \times n$ matrix $R$ with a subset of $r < m$ rows of $A$, as well as a $c \times r$ low-rank matrix $U$ such that the matrix $CUR$ approximates the matrix $A$, that is, $\| A - CUR \|_F^2 \le (1+\epsilon) \| A - A_k \|_F^2$, where $\|\cdot\|_F$ denotes the Frobenius norm and $A_k$ is the best $m \times n$ matrix of rank $k$ constructed via the SVD. We present input-sparsity-time and deterministic algorithms for constructing such a CUR decomposition where $c = O(k/\epsilon)$, $r = O(k/\epsilon)$, and $\mathrm{rank}(U) = k$. Up to constant factors, our algorithms are simultaneously optimal in $c$, $r$, and $\mathrm{rank}(U)$. Comment: small revision in lemma 4
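    For readers unfamiliar with the object being constructed, here is a hedged baseline in Python: it samples columns and rows uniformly at random and sets $U = C^{+} A R^{+}$, the best middle factor for those particular choices. This is only a toy; the paper's contribution is the input-sparsity-time and deterministic selection achieving $c = O(k/\epsilon)$, $r = O(k/\epsilon)$, and $\mathrm{rank}(U) = k$, which the uniform sampling below does not.

```python
import numpy as np

def simple_cur(A, c, r, rng):
    """Toy CUR: sample c columns and r rows uniformly, then set
    U = pinv(C) @ A @ pinv(R) so that C U R is the best fit for those choices."""
    cols = rng.choice(A.shape[1], size=c, replace=False)
    rows = rng.choice(A.shape[0], size=r, replace=False)
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)     # c x r middle factor
    return C, U, R

rng = np.random.default_rng(0)
A = rng.random((200, 150))
C, U, R = simple_cur(A, c=20, r=20, rng=rng)
print("Frobenius error:", np.linalg.norm(A - C @ U @ R, 'fro'))
```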

    Restricted Boltzmann machine to determine the input weights for extreme learning machines

    The Extreme Learning Machine (ELM) is a single-hidden-layer feedforward neural network (SLFN) learning algorithm that can learn effectively and quickly. The ELM training phase assigns the input weights and bias randomly and does not change them during the rest of the process. Although the network works well, the random weights in the input layer can make the algorithm less effective and impact its performance. Therefore, we propose a new approach to determine the input weights and bias for the ELM using the restricted Boltzmann machine (RBM), which we call RBM-ELM. We compare our new approach with a well-known approach to improve the ELM and a state-of-the-art algorithm for selecting the weights of the ELM. The results show that RBM-ELM outperforms both methodologies and achieves better performance than the standard ELM. Comment: 14 pages, 7 figures and 5 tables
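    A minimal ELM sketch clarifies which weights the RBM is meant to replace: in the standard ELM the hidden-layer weights `W` and biases `b` are drawn at random and frozen, and only the output weights `beta` are solved in closed form by least squares. The function names and sigmoid activation below are illustrative; the RBM pre-training step itself is not shown, but its output would simply be passed in as `W` and `b`.

```python
import numpy as np

def train_elm(X, T, n_hidden, W=None, b=None, rng=None):
    """Single-hidden-layer ELM: W and b stay fixed (random by default;
    the paper proposes learning them with an RBM instead), and only the
    output weights beta are fit by least squares."""
    rng = rng or np.random.default_rng()
    n_features = X.shape[1]
    if W is None:
        W = rng.standard_normal((n_features, n_hidden))
    if b is None:
        b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden activations
    beta = np.linalg.pinv(H) @ T             # Moore-Penrose least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
T = (X[:, :1] > 0).astype(float)             # toy binary target
W, b, beta = train_elm(X, T, n_hidden=50, rng=rng)
pred = elm_predict(X, W, b, beta)
```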

    Deterministic Sampling of Sparse Trigonometric Polynomials

    One can recover sparse multivariate trigonometric polynomials from few randomly taken samples with high probability (as shown by Kunis and Rauhut). We give a deterministic sampling of multivariate trigonometric polynomials inspired by Weil's exponential sum. Our sampling can produce a deterministic matrix satisfying the statistical restricted isometry property, as well as nearly optimal Grassmannian frames. We show that one can exactly reconstruct every $M$-sparse multivariate trigonometric polynomial with fixed degree and of length $D$ from the deterministic sampling $X$, using orthogonal matching pursuit, provided $\#X$ is a prime number greater than $(M \log D)^2$. This result is almost optimal within the $(\log D)^2$ factor. The simulations show that the deterministic sampling can offer reconstruction performance similar to that of random sampling. Comment: 9 pages
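    Orthogonal matching pursuit is the recovery workhorse referenced above; the sketch below recovers a sparse coefficient vector of a length-$D$ trigonometric dictionary from a handful of samples. For simplicity the sample points here are drawn at random rather than built from the Weil-sum construction, so the example illustrates only the OMP recovery step, not the deterministic sampling itself.

```python
import numpy as np

def omp(Phi, y, sparsity):
    """Plain orthogonal matching pursuit: repeatedly pick the dictionary column
    most correlated with the residual, then re-fit on the support by least squares."""
    residual, support, coef = y.copy(), [], None
    for _ in range(sparsity):
        j = int(np.argmax(np.abs(Phi.conj().T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x = np.zeros(Phi.shape[1], dtype=complex)
    x[support] = coef
    return x

rng = np.random.default_rng(0)
D, M, n = 128, 5, 40                       # dictionary length, sparsity, number of samples
t = rng.random(n)                          # sample points (random stand-in, not Weil sums)
Phi = np.exp(2j * np.pi * np.outer(t, np.arange(D))) / np.sqrt(n)
x_true = np.zeros(D, dtype=complex)
x_true[rng.choice(D, M, replace=False)] = rng.standard_normal(M)
y = Phi @ x_true
x_hat = omp(Phi, y, M)
print("max coefficient error:", np.abs(x_hat - x_true).max())
```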

    Iteratively Reweighted $\ell_1$ Approaches to Sparse Composite Regularization

    Motivated by the observation that a given signal $\boldsymbol{x}$ admits sparse representations in multiple dictionaries $\boldsymbol{\Psi}_d$ but with varying levels of sparsity across dictionaries, we propose two new algorithms for the reconstruction of (approximately) sparse signals from noisy linear measurements. Our first algorithm, Co-L1, extends the well-known lasso algorithm from the $\ell_1$ regularizer $\|\boldsymbol{\Psi}\boldsymbol{x}\|_1$ to composite regularizers of the form $\sum_d \lambda_d \|\boldsymbol{\Psi}_d \boldsymbol{x}\|_1$ while self-adjusting the regularization weights $\lambda_d$. Our second algorithm, Co-IRW-L1, extends the well-known iteratively reweighted $\ell_1$ algorithm to the same family of composite regularizers. We provide several interpretations of both algorithms: i) majorization-minimization (MM) applied to a non-convex log-sum-type penalty, ii) MM applied to an approximate $\ell_0$-type penalty, iii) MM applied to Bayesian MAP inference under a particular hierarchical prior, and iv) variational expectation-maximization (VEM) under a particular prior with deterministic unknown parameters. A detailed numerical study suggests that our proposed algorithms yield significantly improved recovery SNR when compared to their non-composite $\ell_1$ and IRW-L1 counterparts.
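    A rough sketch of the alternation described above: approximately solve the composite weighted-$\ell_1$ problem, then re-adjust each $\lambda_d$ according to how sparse the current estimate looks under $\boldsymbol{\Psi}_d$. The crude subgradient inner solver and the particular weight-update formula below are assumptions for illustration, not the Co-L1 or Co-IRW-L1 updates derived in the paper.

```python
import numpy as np

def co_l1_sketch(y, A, Psis, outer_iters=10, inner_iters=300, step=1e-3, eps=1e-3):
    """Alternate between (a) approximately minimizing
         0.5 * ||y - A x||^2 + sum_d lam_d * ||Psi_d x||_1
    by plain subgradient descent, and (b) re-weighting lam_d so that
    dictionaries under which x is sparser receive larger weights."""
    x = np.zeros(A.shape[1])
    lam = np.ones(len(Psis))
    for _ in range(outer_iters):
        for _ in range(inner_iters):
            grad = A.T @ (A @ x - y)
            for ld, Psi in zip(lam, Psis):
                grad = grad + ld * Psi.T @ np.sign(Psi @ x)
            x = x - step * grad
        # Assumed re-weighting rule: weight inversely proportional to ||Psi_d x||_1.
        lam = np.array([Psi.shape[0] / (np.abs(Psi @ x).sum() + eps) for Psi in Psis])
    return x, lam

rng = np.random.default_rng(0)
n, m = 64, 32
A = rng.standard_normal((m, n)) / np.sqrt(m)
Psis = [np.eye(n), np.fft.fft(np.eye(n)).real / np.sqrt(n)]   # two toy dictionaries
x_true = np.zeros(n)
x_true[rng.choice(n, 4, replace=False)] = 3.0
y = A @ x_true + 0.01 * rng.standard_normal(m)
x_hat, lam = co_l1_sketch(y, A, Psis)
```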

    Motion Planning of Uncertain Ordinary Differential Equation Systems

    This work presents a novel motion planning framework, rooted in nonlinear programming theory, that treats uncertain fully and under-actuated dynamical systems described by ordinary differential equations. Uncertainty in multibody dynamical systems comes from various sources, such as system parameters, initial conditions, sensor and actuator noise, and external forcing. Treatment of uncertainty in design is of paramount practical importance because all real-life systems are affected by it, and poor robustness and suboptimal performance result if it is not accounted for in a given design. In this work uncertainties are modeled using Generalized Polynomial Chaos and are solved quantitatively using a least-squares collocation method. The computational efficiency of this approach enables the inclusion of uncertainty statistics in the nonlinear programming optimization process. As such, the proposed framework allows the user to pose, and answer, new design questions related to uncertain dynamical systems. Specifically, the new framework is explained in the context of forward, inverse, and hybrid dynamics formulations. The forward dynamics formulation, applicable to both fully and under-actuated systems, prescribes deterministic actuator inputs which yield uncertain state trajectories. The inverse dynamics formulation is the dual of the forward dynamics formulation and is only applicable to fully-actuated systems; deterministic state trajectories are prescribed and yield uncertain actuator inputs. The inverse dynamics formulation is more computationally efficient, as it requires only algebraic evaluations and completely avoids numerical integration. Finally, the hybrid dynamics formulation is applicable to under-actuated systems, where it leverages the benefits of inverse dynamics for actuated joints and forward dynamics for unactuated joints; it prescribes actuated state and unactuated input trajectories which yield uncertain unactuated states and actuated inputs. The benefits of the ability to quantify uncertainty when planning the motion of multibody dynamic systems are illustrated through several case studies. The resulting designs determine optimal motion plans, subject to deterministic and statistical constraints, for all possible systems within the probability space.
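    The gPC-with-least-squares-collocation machinery can be illustrated on a one-parameter toy problem: sample the uncertain parameter at collocation nodes, integrate the ODE at each node, fit probabilists' Hermite coefficients by least squares, and read the mean and variance off the orthogonal expansion. The decaying ODE, Gaussian parameter model, and expansion order below are illustrative assumptions, not the dynamical systems or formulations treated in this work.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermevander
from scipy.integrate import solve_ivp

order, n_colloc = 4, 25
rng = np.random.default_rng(0)
xi = rng.standard_normal(n_colloc)           # standard-normal collocation nodes
k = 1.0 + 0.2 * xi                           # uncertain decay rate k(xi)

# Integrate the toy system dx/dt = -k * x, x(0) = 1, at every collocation node.
x_final = np.array([
    solve_ivp(lambda t, x, kk=kk: -kk * x, (0.0, 1.0), [1.0]).y[0, -1]
    for kk in k
])

# Least-squares collocation: fit the probabilists' Hermite (gPC) coefficients.
V = hermevander(xi, order)                   # design matrix of He_0 .. He_order
coeffs, *_ = np.linalg.lstsq(V, x_final, rcond=None)

# For standard-normal xi, E[He_n He_m] = n! * delta_nm, so the statistics follow
# directly from the coefficients and could feed an optimizer as constraints.
norms = np.array([math.factorial(n) for n in range(order + 1)])
mean = coeffs[0]
variance = np.sum(coeffs[1:] ** 2 * norms[1:])
print(f"E[x(1)] ~ {mean:.4f}, Std[x(1)] ~ {np.sqrt(variance):.4f}")
```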

    Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes

    We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work
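    The learning-in-policy-space idea can be made concrete with a tiny, non-relational example: each API iteration estimates Q-values of the current policy by rollouts and then builds a new policy that imitates the greedy action choices. The five-state chain MDP and the table-based "policy learner" below are stand-ins for the paper's relational domains, policy language, and random-walk bootstrapping.

```python
import numpy as np

N_STATES, ACTIONS, GAMMA, HORIZON = 5, (-1, +1), 0.95, 30

def step(s, a):
    """Deterministic chain: move left/right; reward 1 for being in the rightmost state."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def rollout_return(policy, s, a, horizon=HORIZON):
    """Estimate Q^pi(s, a): take action a once, then follow the current policy."""
    s, r = step(s, a)
    total, discount = r, 1.0
    for _ in range(horizon):
        discount *= GAMMA
        s, r = step(s, policy[s])
        total += discount * r
    return total

rng = np.random.default_rng(0)
policy = {s: int(rng.choice(ACTIONS)) for s in range(N_STATES)}   # uninformed start
for _ in range(5):                                                # API iterations
    new_policy = {}
    for s in range(N_STATES):
        q = {a: rollout_return(policy, s, a) for a in ACTIONS}
        new_policy[s] = max(q, key=q.get)                         # imitate the greedy choice
    policy = new_policy
print(policy)   # converges to "always move right" toward the rewarding state
```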

    A Near-Optimal Sampling Strategy for Sparse Recovery of Polynomial Chaos Expansions

    Compressive sampling has become a widely used approach to construct polynomial chaos surrogates when the number of available simulation samples is limited. Originally, these expensive simulation samples would be obtained at random locations in the parameter space. It was later shown that the choice of sample locations could significantly impact the accuracy of the resulting surrogates. This motivated new sampling strategies and design-of-experiment approaches, such as coherence-optimal sampling, which aim at improving the coherence property. In this paper, we propose a sampling strategy that can identify near-optimal sample locations that improve both the local-coherence property and the cross-correlation properties of the measurement matrix. We provide theoretical motivation for the proposed sampling strategy, along with several numerical examples showing that our near-optimal sampling strategy produces substantially more accurate results compared to other sampling strategies.
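    To make "improving the coherence/cross-correlation of the measurement matrix" concrete, the sketch below greedily selects sample locations for a one-dimensional Legendre polynomial chaos basis so that the mutual coherence (largest off-diagonal cross-correlation of the column-normalized measurement matrix) stays small. The candidate pool, greedy criterion, and basis choice are simplifying assumptions, not the near-optimal strategy proposed in the paper.

```python
import numpy as np
from numpy.polynomial.legendre import legvander

def mutual_coherence(V):
    """Largest off-diagonal cross-correlation of the column-normalized matrix."""
    G = V / np.maximum(np.linalg.norm(V, axis=0, keepdims=True), 1e-12)
    C = np.abs(G.T @ G)
    np.fill_diagonal(C, 0.0)
    return C.max()

order, n_samples, n_candidates = 8, 20, 500
rng = np.random.default_rng(1)
pool = rng.uniform(-1.0, 1.0, n_candidates)     # candidate sample locations
V_pool = legvander(pool, order)                 # candidate measurement rows

chosen = [int(np.argmax(np.abs(pool)))]         # arbitrary seed point
while len(chosen) < n_samples:
    best_j, best_mu = None, np.inf
    for j in range(n_candidates):
        if j in chosen:
            continue
        mu = mutual_coherence(V_pool[chosen + [j], :])
        if mu < best_mu:
            best_j, best_mu = j, mu
    chosen.append(best_j)                       # add the point that keeps coherence lowest
print(f"mutual coherence of selected design: {best_mu:.3f}")
```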