3,589 research outputs found

    Nonparametric likelihood based estimation of linear filters for point processes

    Full text link
    We consider models for multivariate point processes where the intensity is given nonparametrically in terms of functions in a reproducing kernel Hilbert space. The likelihood function involves a time integral and is consequently not given in terms of a finite number of kernel evaluations. The main result is a representation of the gradient of the log-likelihood, which we use to derive computable approximations of the log-likelihood and the gradient by time discretization. These approximations are then used to minimize the approximate penalized log-likelihood. For time and memory efficiency the implementation relies crucially on the use of sparse matrices. As an illustration we consider neuron network modeling, and we use this example to investigate how the computational costs of the approximations depend on the resolution of the time discretization. The implementation is available in the R package ppstat.Comment: 10 pages, 3 figure

    Challenges of Big Data Analysis

    Full text link
    Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promises for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottleneck, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinguished and require new computational and statistical paradigm. This article give overviews on the salient features of Big Data and how these features impact on paradigm change on statistical and computational methods as well as computing architectures. We also provide various new perspectives on the Big Data analysis and computation. In particular, we emphasis on the viability of the sparsest solution in high-confidence set and point out that exogeneous assumptions in most statistical methods for Big Data can not be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions

    Robust Subspace Learning: Robust PCA, Robust Subspace Tracking, and Robust Subspace Recovery

    Full text link
    PCA is one of the most widely used dimension reduction techniques. A related easier problem is "subspace learning" or "subspace estimation". Given relatively clean data, both are easily solved via singular value decomposition (SVD). The problem of subspace learning or PCA in the presence of outliers is called robust subspace learning or robust PCA (RPCA). For long data sequences, if one tries to use a single lower dimensional subspace to represent the data, the required subspace dimension may end up being quite large. For such data, a better model is to assume that it lies in a low-dimensional subspace that can change over time, albeit gradually. The problem of tracking such data (and the subspaces) while being robust to outliers is called robust subspace tracking (RST). This article provides a magazine-style overview of the entire field of robust subspace learning and tracking. In particular solutions for three problems are discussed in detail: RPCA via sparse+low-rank matrix decomposition (S+LR), RST via S+LR, and "robust subspace recovery (RSR)". RSR assumes that an entire data vector is either an outlier or an inlier. The S+LR formulation instead assumes that outliers occur on only a few data vector indices and hence are well modeled as sparse corruptions.Comment: To appear, IEEE Signal Processing Magazine, July 201
    corecore