Sparse Matrix-based Random Projection for Classification
As a typical dimensionality reduction technique, random projection can be
implemented as a simple linear projection while preserving the pairwise
distances of high-dimensional data with high probability. Since this
technique is mainly exploited for classification tasks, this paper studies
the construction of the random matrix from the viewpoint of feature
selection, rather than of traditional distance preservation. This yields a
somewhat surprising theoretical result: a sparse random matrix with exactly
one nonzero element per column can achieve better feature selection
performance than denser matrices, provided the projection dimension is
sufficiently large (namely, not much smaller than the number of feature
elements); otherwise, it performs comparably to them. For random projection,
this result implies considerable improvements in both complexity and
performance, which is widely confirmed by classification experiments on both
synthetic and real data.
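A minimal sketch of the matrix construction the abstract describes, with exactly one nonzero entry (a random sign in a random row) per column; the dimensions, the data, and the distance check are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_features, n_proj = 1000, 200  # assumed illustrative sizes

# Sparse random matrix: each column holds exactly one nonzero,
# a random sign (+1/-1) placed in a uniformly random row.
R = np.zeros((n_proj, n_features))
R[rng.integers(0, n_proj, size=n_features), np.arange(n_features)] = \
    rng.choice([-1.0, 1.0], size=n_features)

# Project synthetic data and compare pairwise distances before/after:
# E[||Rx||^2] = ||x||^2 for this construction, so no rescaling is needed.
X = rng.standard_normal((20, n_features))
Y = X @ R.T
ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i, j in combinations(range(20), 2)]
print(round(float(np.median(ratios)), 2))  # median distance ratio, close to 1
```

Because each column has a single nonzero, computing `X @ R.T` costs one add per input coordinate, which is the complexity gain over dense random projections.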
Challenges of Big Data Analysis
Big Data brings new opportunities to modern society and new challenges to
data scientists. On the one hand, Big Data holds great promise for
discovering subtle population patterns and heterogeneities that cannot be
found with small-scale data. On the other hand, the massive sample size and
high dimensionality of Big Data introduce unique computational and
statistical challenges, including scalability and storage bottlenecks, noise
accumulation, spurious correlation, incidental endogeneity, and measurement
errors. These challenges are distinctive and require new computational and
statistical paradigms. This article gives an overview of the salient
features of Big Data and of how these features drive paradigm changes in
statistical and computational methods as well as computing architectures. We
also provide various new perspectives on Big Data analysis and computation.
In particular, we emphasize the viability of the sparsest solution in a
high-confidence set and point out that the exogeneity assumptions in most
statistical methods for Big Data cannot be validated due to incidental
endogeneity. This can lead to wrong statistical inferences and,
consequently, wrong scientific conclusions.
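The spurious-correlation point can be illustrated with a quick simulation (my own sketch, not from the article): with a fixed sample size, the largest absolute sample correlation between a response and a growing number of completely independent noise predictors keeps increasing.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50  # fixed sample size (assumed for illustration)

def max_abs_corr(p):
    # Largest absolute sample correlation between a response y and
    # p predictors, all drawn independently, so any correlation is spurious.
    y = rng.standard_normal(n)
    X = rng.standard_normal((n, p))
    yc = (y - y.mean()) / y.std()
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    return float(np.abs(yc @ Xc / n).max())

vals = [max_abs_corr(p) for p in (10, 1000, 100_000)]
print([round(v, 2) for v in vals])  # grows with p despite independence
```

Classical theory suggests the maximum grows roughly like sqrt(2 log p / n), so at high dimensionality some noise variable will look strongly "predictive" by chance.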
Illumination strategies for intensity-only imaging
We propose a new strategy for narrow-band, active array imaging of localized
scatterers when only the intensities are recorded at the array.
We consider a homogeneous medium so that wave propagation is fully coherent. We
show that imaging with intensity-only measurements can be carried out using the
time reversal operator of the imaging system, which can be obtained from
intensity measurements using an appropriate illumination strategy and the
polarization identity. Once the time reversal operator has been obtained, we
show that the images can be formed using its singular value decomposition
(SVD). We use two SVD-based methods to image the scatterers. The proposed
approach is simple and efficient. It does not need prior information about the
sought image, and guarantees exact recovery in the noise-free case.
Furthermore, it is robust with respect to additive noise. Detailed numerical
simulations illustrate the performance of the proposed imaging strategy when
only the intensities are captured.
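The polarization identity the abstract relies on recovers a complex inner product from four intensity-only (squared-magnitude) measurements. A minimal numpy sketch, with my own notation and random test vectors standing in for array responses under different illuminations:

```python
import numpy as np

def inner_from_intensities(u, v):
    # Recover <u, v> (conjugate-linear in u, matching np.vdot) from four
    # squared-norm "intensity" measurements via the polarization identity.
    I = lambda w: np.linalg.norm(w) ** 2      # intensity-only measurement
    re = (I(u + v) - I(u - v)) / 4.0          # real part
    im = (I(u - 1j * v) - I(u + 1j * v)) / 4.0  # imaginary part
    return re + 1j * im

rng = np.random.default_rng(2)
u = rng.standard_normal(8) + 1j * rng.standard_normal(8)
v = rng.standard_normal(8) + 1j * rng.standard_normal(8)
print(np.allclose(inner_from_intensities(u, v), np.vdot(u, v)))  # True
```

In the imaging setting, pairwise inner products of this kind, obtained by illuminating with sums and differences of probing vectors, assemble the entries of the time reversal operator, whose SVD then yields the images.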