
    On asymptotics of ICA estimators and their performance indices

    Independent component analysis (ICA) has become a popular multivariate analysis and signal processing technique with diverse applications. This paper discusses the theoretical large-sample properties of ICA unmixing matrix functionals. We provide a formal definition of an unmixing matrix functional and consider two popular estimators in detail: the family based on two scatter matrices with the independence property (e.g., the FOBI estimator) and the family of deflation-based fastICA estimators. The limiting behavior of the corresponding estimates is discussed, and the asymptotic normality of the deflation-based fastICA estimate is proven under general assumptions. Furthermore, properties of several performance indices commonly used for comparing different unmixing matrix estimates are discussed, and a new performance index is proposed. The proposed index has three desirable features that promote its use in practice and distinguish it from others: it is easy to interpret, it is fast to compute, and its asymptotic properties can be inferred from the asymptotics of the unmixing matrix estimate. We illustrate the derived asymptotic results and the use of the proposed index with a small simulation study.
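    A minimal NumPy sketch of the FOBI estimator mentioned above, assuming the usual construction: whiten the data with the symmetric inverse square root of the sample covariance matrix, then rotate by the eigenvectors of the fourth-moment scatter matrix E[|z|^2 z z^T], which identifies the components only when their kurtoses are distinct. The new performance index proposed in the paper is not reproduced here; as a stand-in, the sketch scores the gain matrix G = W A with the widely used Amari index, which is 0 exactly when G is a scaled permutation matrix. The function names fobi and amari_index are hypothetical.

```python
import numpy as np


def fobi(x: np.ndarray) -> np.ndarray:
    """Return a FOBI unmixing matrix W for data x of shape (n, p).

    Estimated sources are (x - mean) @ W.T. Assumes the independent
    components have distinct kurtoses.
    """
    xc = x - x.mean(axis=0)
    # Symmetric inverse square root of the sample covariance matrix.
    evals, evecs = np.linalg.eigh(np.cov(xc, rowvar=False))
    cov_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    z = xc @ cov_inv_sqrt  # whitened data (cov_inv_sqrt is symmetric)
    # Fourth-moment scatter matrix: average of |z_i|^2 z_i z_i^T.
    r2 = np.sum(z ** 2, axis=1)
    cov4 = (z * r2[:, None]).T @ z / len(z)
    # Its eigenvectors give the rotation from whitened data to the components.
    _, u = np.linalg.eigh(cov4)
    return u.T @ cov_inv_sqrt


def amari_index(g: np.ndarray) -> float:
    """Amari index of a square gain matrix G = W A, scaled to [0, 1].

    It is 0 exactly when G is a scaled permutation matrix.
    """
    g = np.abs(g)
    p = g.shape[0]
    rows = (g.sum(axis=1) / g.max(axis=1) - 1.0).sum()
    cols = (g.sum(axis=0) / g.max(axis=0) - 1.0).sum()
    return float((rows + cols) / (2 * p * (p - 1)))
```

    For simulated data x = s A^T with independent sources s and a known mixing matrix A, amari_index(fobi(x) @ A) should approach 0 as n grows; how quickly it does so is the kind of question the asymptotic results above address.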

    Space Warps II. New Gravitational Lens Candidates from the CFHTLS Discovered through Citizen Science

    We report the discovery of 29 promising (and 59 total) new lens candidates from the CFHT Legacy Survey (CFHTLS) based on about 11 million classifications performed by citizen scientists as part of the first Space Warps lens search. The goal of the blind lens search was to identify lens candidates missed by robots (the RingFinder on galaxy scales and ArcFinder on group/cluster scales) that had previously been used to mine the CFHTLS for lenses. We compare some properties of the samples detected by these algorithms to the Space Warps sample and find them to be broadly similar. The image separation distribution calculated from the Space Warps sample shows that previous constraints on the average density profile of lens galaxies are robust. Space Warps recovers about 65% of known lenses, while the new candidates show a richer variety than those found by the two robots. This detection rate could be increased to 80% by using only classifications performed by expert volunteers (albeit at the cost of a lower purity), indicating that the training and performance calibration of the citizen scientists is very important for the success of Space Warps. In this work, we present the SIMCT pipeline, used for generating in situ a sample of realistic simulated lensed images. This training sample, along with the false positives identified during the search, has legacy value for testing future lens-finding algorithms. We make the pipeline and the training set publicly available. Comment: 23 pages, 12 figures, MNRAS accepted, minor to moderate changes in this version

    Fourth Moments and Independent Component Analysis

    In independent component analysis, it is assumed that the components of the observed random vector are linear combinations of latent independent random variables, and the aim is to find an estimate of the transformation matrix back to these independent components. In the engineering literature, there are several traditional estimation procedures based on fourth moments, such as FOBI (fourth order blind identification), JADE (joint approximate diagonalization of eigenmatrices), and FastICA, but the statistical properties of these estimates are not well known. In this paper, various independent component functionals based on fourth moments are discussed in detail, starting with the corresponding optimization problems, deriving the estimating equations and estimation algorithms, and finding the asymptotic statistical properties of the estimates. Comparisons of the asymptotic variances of the estimates in wide independent component models show that in most cases JADE and the symmetric version of FastICA perform better than their competitors. Comment: Published at http://dx.doi.org/10.1214/15-STS520 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
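    To make the fourth-moment approach concrete, here is a sketch of deflation-based FastICA with the kurtosis nonlinearity g(u) = u^3, one of the procedures named above. It assumes the data are whitened as in FOBI, that at most one component is Gaussian, and it uses the standard fixed-point step w <- E[z (w'z)^3] - 3w followed by Gram-Schmidt deflation against the rows already found and renormalization. The function name fastica_deflation, the random initialization, and the stopping rule are illustrative choices, not the paper's algorithm.

```python
import numpy as np


def fastica_deflation(
    x: np.ndarray, max_iter: int = 200, tol: float = 1e-9, seed: int = 0
) -> np.ndarray:
    """Deflation-based FastICA with g(u) = u**3.

    Returns W such that estimated sources are (x - mean) @ W.T.
    """
    xc = x - x.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(xc, rowvar=False))
    cov_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    z = xc @ cov_inv_sqrt  # whitened data
    p = z.shape[1]
    rng = np.random.default_rng(seed)
    u = np.zeros((p, p))  # rows: unit vectors estimated so far
    for k in range(p):
        w = rng.normal(size=p)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            y = z @ w
            # Fixed-point step for the kurtosis criterion: E[z y^3] - 3 w.
            w_new = (z * (y ** 3)[:, None]).mean(axis=0) - 3.0 * w
            # Deflation: project out the k directions already estimated.
            w_new -= u[:k].T @ (u[:k] @ w_new)
            w_new /= np.linalg.norm(w_new)
            done = abs(abs(w_new @ w) - 1.0) < tol  # converged up to sign
            w = w_new
            if done:
                break
        u[k] = w
    return u @ cov_inv_sqrt
```

    Because each row is estimated after, and orthogonal to, the earlier ones, estimation errors in early components propagate to later ones; the symmetric variant instead updates all rows jointly and then orthogonalizes them together.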

    Robust Machine Learning Applied to Astronomical Datasets I: Star-Galaxy Classification of the SDSS DR3 Using Decision Trees

    We provide classifications for all 143 million non-repeat photometric objects in the Third Data Release of the Sloan Digital Sky Survey (SDSS) using decision trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate that these star/galaxy classifications are expected to be reliable for approximately 22 million objects with r ≲ 20. The general machine learning environment Data-to-Knowledge and supercomputing resources enabled extensive investigation of the decision tree parameter space. This work presents the first public release of objects classified in this way for an entire SDSS data release. The objects are classified as either galaxy, star or nsng (neither star nor galaxy), with an associated probability for each class. To demonstrate how to make effective use of these classifications, we perform several important tests. First, we detail selection criteria within the probability space defined by the three classes to extract samples of stars and galaxies to a given completeness and efficiency. Second, we investigate the efficacy of the classifications and the effect of extrapolating from the spectroscopic regime by performing blind tests on objects in the SDSS, 2dF Galaxy Redshift and 2dF QSO Redshift (2QZ) surveys. Given the photometric limits of our spectroscopic training data, we effectively begin to extrapolate past our star-galaxy training set at r ~ 18. By comparing the number counts of our training sample with the classified sources, however, we find that our efficiencies appear to remain robust to r ~ 20. As a result, we expect our classifications to be accurate for 900,000 galaxies and 6.7 million stars, and to remain robust via extrapolation for a total of 8.0 million galaxies and 13.9 million stars. [Abridged] Comment: 27 pages, 12 figures, to be published in ApJ, uses emulateapj.cls
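    The completeness/efficiency trade-off described above can be illustrated with a single cut on the galaxy probability, the simplest selection in the three-class probability space. This is a hypothetical sketch: completeness is taken as the fraction of true galaxies that pass the cut and efficiency as the fraction of selected objects that are true galaxies; the paper's actual selection criteria are not reproduced, and the toy inputs are made up.

```python
from typing import Sequence, Tuple


def completeness_efficiency(
    p_galaxy: Sequence[float], is_galaxy: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Select objects with P(galaxy) >= threshold and score the selection."""
    selected = [p >= threshold for p in p_galaxy]
    true_pos = sum(s and g for s, g in zip(selected, is_galaxy))
    n_true = sum(is_galaxy)  # all true galaxies in the sample
    n_selected = sum(selected)  # all objects passing the cut
    completeness = true_pos / n_true if n_true else 0.0
    efficiency = true_pos / n_selected if n_selected else 0.0
    return completeness, efficiency


if __name__ == "__main__":
    # Hypothetical toy data: classifier P(galaxy) and true labels.
    p_gal = [0.9, 0.8, 0.6, 0.4, 0.2]
    truth = [True, True, False, True, False]
    for cut in (0.1, 0.5, 0.7):
        print(cut, completeness_efficiency(p_gal, truth, cut))
```

    On this five-object toy sample, lowering the cut from 0.5 to 0.1 raises completeness from 2/3 to 1 while efficiency falls from 2/3 to 0.6, which is the kind of trade-off the selection criteria are tuned against.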