
    Limitations of Principal Component Analysis for Dimensionality-Reduction for Classification of Hyperspectral Data

    It is common practice in the remote-sensing community to apply principal component analysis (PCA) to a high-dimensional feature space to achieve dimensionality reduction. Factors behind the popularity of PCA include its simplicity, ease of use, availability in popular remote-sensing packages, and optimality in terms of mean square error. These advantages have prompted the remote-sensing research community to overlook many limitations of PCA when it is used as a dimensionality-reduction tool for classification and target-detection applications. This thesis addresses the limitations of PCA when used as a dimensionality-reduction technique for extracting discriminating features from hyperspectral data. Theoretical and experimental analyses are presented to demonstrate that PCA is not necessarily an appropriate feature-extraction method for high-dimensional data when the objective is classification or target recognition. The influence of certain data-distribution characteristics, such as within-class covariance, between-class covariance, and correlation, on the PCA transformation is analyzed. The classification accuracies obtained using PCA features are compared to accuracies obtained using other feature-extraction methods, such as variants of the Karhunen-Loève transform and greedy search algorithms in the spectral and wavelet domains. Experimental analyses are conducted for both two-class and multi-class cases. The classification accuracies obtained from higher-order PCA components are compared to the classification accuracies of features extracted from different regions of the spectrum. The comparative study of the classification accuracies obtained using the above feature-extraction methods shows that PCA may not be an appropriate tool for dimensionality reduction of certain hyperspectral data distributions when the objective is classification or target recognition.
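The core limitation described in this abstract can be illustrated with a minimal sketch. It is not the thesis's experiment: the two-dimensional synthetic data, the function names, and the choice of the two-class Fisher discriminant as the comparison method are all assumptions made for illustration. The within-class variance is large along one axis while the class means differ along the other, so the first principal component follows the high-variance axis and carries almost no class information.

```python
import numpy as np

def first_pca_direction(X):
    """Unit vector of maximum total variance (first principal component)."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]

def fisher_direction(X0, X1):
    """Two-class Fisher discriminant direction, Sw^{-1} (m1 - m0), normalised."""
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    w = np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))
    return w / np.linalg.norm(w)

def separation(X0, X1, w):
    """Fisher criterion of the 1-D projections onto w (larger = more separable)."""
    p0, p1 = X0 @ w, X1 @ w
    return (p1.mean() - p0.mean()) ** 2 / (p0.var() + p1.var())

# Synthetic (assumed) data: high within-class variance on axis 0,
# class means separated only on axis 1.
rng = np.random.default_rng(0)
within_cov = np.diag([25.0, 0.25])
X0 = rng.multivariate_normal([0.0, -1.0], within_cov, size=500)
X1 = rng.multivariate_normal([0.0, 1.0], within_cov, size=500)

w_pca = first_pca_direction(np.vstack([X0, X1]))
w_lda = fisher_direction(X0, X1)

print("PCA direction:   ", np.round(w_pca, 3))
print("Fisher direction:", np.round(w_lda, 3))
print("separation along PCA direction:   ", separation(X0, X1, w_pca))
print("separation along Fisher direction:", separation(X0, X1, w_lda))
```

In this toy setting, keeping only the first principal component discards the axis that separates the classes, which is the kind of data distribution the abstract warns about.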

    Galaxy morphological classification in deep-wide surveys via unsupervised machine learning

    Galaxy morphology is a fundamental quantity that is essential not only for the full spectrum of galaxy-evolution studies, but also for a plethora of science in observational cosmology. While a rich literature exists on morphological-classification techniques, the unprecedented data volumes, coupled, in some cases, with the short cadences of forthcoming 'Big-Data' surveys (e.g. from the LSST), present novel challenges for this field. Large data volumes make such datasets intractable for visual inspection (even via massively-distributed platforms like Galaxy Zoo), while short cadences make it difficult to employ techniques like supervised machine learning, since it may be impractical to repeatedly produce training sets on short timescales. Unsupervised machine learning, which does not require training sets, is ideally suited to the morphological analysis of new and forthcoming surveys. Here, we employ an algorithm that performs clustering of graph representations, in order to group image patches with similar visual properties and objects constructed from those patches, like galaxies. We implement the algorithm on the Hyper-Suprime-Cam Subaru-Strategic-Program Ultra-Deep survey, to autonomously reduce the galaxy population to a small number (160) of 'morphological clusters', populated by galaxies with similar morphologies, which are then benchmarked using visual inspection. The morphological classifications (which we release publicly) exhibit a high level of purity, and reproduce known trends in key galaxy properties as a function of morphological type at z

    Automatically determining dominant motions in crowded scenes by clustering partial feature trajectories

    We present a system for automatically identifying dominant motions in a crowded scene. Accurately tracking individual objects in such scenes is difficult due to inter- and intra-object occlusions that cannot be easily resolved. Our approach begins by independently tracking low-level features using optical flow. While many of the feature point tracks are unreliable, we show that they can be clustered into dominant motions using a distance measure for feature trajectories based on longest common subsequences. Results on real video sequences demonstrate that the approach can successfully identify both dominant and anomalous motions in crowded scenes. These fully automatic algorithms could be easily incorporated into distributed camera networks for autonomous scene analysis.
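A trajectory distance based on longest common subsequences (LCSS) might look like the following sketch. The abstract does not give the exact formulation, so the matching tolerance `eps`, the normalisation by the shorter track, and the function names are assumptions. Two points are treated as matching when they lie within `eps` of each other, which lets a short partial track score as close to a longer track it overlaps.

```python
import numpy as np

def lcss_length(a, b, eps):
    """Length of the longest common subsequence of two point sequences.

    Points a[i] and b[j] count as equal when their Euclidean distance is <= eps.
    """
    n, m = len(a), len(b)
    table = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if np.linalg.norm(a[i - 1] - b[j - 1]) <= eps:
                table[i, j] = table[i - 1, j - 1] + 1
            else:
                table[i, j] = max(table[i - 1, j], table[i, j - 1])
    return int(table[n, m])

def lcss_distance(a, b, eps=1.0):
    """Distance in [0, 1]: 0 when the shorter track is fully matched, 1 when nothing matches."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if len(a) == 0 or len(b) == 0:
        return 1.0
    return 1.0 - lcss_length(a, b, eps) / min(len(a), len(b))

# Hypothetical feature tracks (x, y) in pixels.
track_a = [(0, 0), (1, 0), (2, 0), (3, 0)]   # full track
track_b = [(1, 0.2), (2, 0.1), (3, -0.1)]    # partial track, same motion
track_c = [(0, 5), (0, 6), (0, 7)]           # different motion

print(lcss_distance(track_a, track_b, eps=0.5))  # 0.0
print(lcss_distance(track_a, track_c, eps=0.5))  # 1.0
```

A pairwise matrix of such distances could then be passed to any clustering method that accepts precomputed distances; which clustering step the authors used is not stated in the abstract.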

    Using video-analysis technology to estimate social mixing and simulate influenza transmission at a mass gathering

    Mass gatherings create settings conducive to infectious disease transmission. Empirical data to model infectious disease transmission at mass gatherings are limited. Video-analysis technology could be used to generate data on social mixing patterns needed for simulating influenza transmission at mass gatherings. We analyzed short video recordings of persons attending the GameFest event at a university in Troy, New York, in April 2013 to demonstrate the feasibility of this approach. Attendees were identified and tracked during three randomly selected time periods using an object-tracking algorithm. Tracks were analyzed to calculate the number and duration of unique pairwise contacts. A contact occurred each time two attendees were within 2 m of each other. We built and tested an agent-based stochastic influenza simulation model assuming two scenarios of mixing patterns in a geospatially accurate representation of the event venue: one calibrated to the mean cumulative contact duration estimated from GameFest video recordings and the other using a uniform mixing pattern. We compared one-hour attack rates (i.e., becoming infected) generated from these two scenarios following the introduction of a single infectious seed. Across the video recordings, 278 attendees were identified and tracked, resulting in 1,247 unique pairwise contacts with a cumulative mean contact duration of 74.76 s (SD: 80.71). The one-hour simulated mean attack rates were 2.17% (95% CI: 1.45–2.82) and 0.21% (95% CI: 0.14–0.28) in the calibrated and uniform mixing model scenarios, respectively. We simulated influenza transmission at the GameFest event using social mixing data objectively captured through video-analysis technology. Microlevel geospatially accurate simulations can be used to assess the effect of event-venue layout on social mixing and disease transmission. Future work can expand on this demonstration project to larger spatial and temporal scenes in more diverse settings.
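The contact-extraction step described in this abstract can be sketched as below. Only the 2 m threshold comes from the abstract; the track layout (one position per frame, in metres, `NaN` when an attendee is not visible), the frame interval, and the function name are assumptions for illustration, and this is not the study's actual pipeline.

```python
from itertools import combinations
import numpy as np

def pairwise_contacts(tracks, threshold_m=2.0, frame_dt_s=1.0):
    """Cumulative contact duration (seconds) for each pair that was ever in contact.

    tracks: dict mapping attendee id -> array of shape (n_frames, 2) with
            positions in metres; rows are NaN for frames where the attendee
            is not visible. All tracks must cover the same frames (assumption).
    """
    durations = {}
    for id_a, id_b in combinations(sorted(tracks), 2):
        pos_a = np.asarray(tracks[id_a], dtype=float)
        pos_b = np.asarray(tracks[id_b], dtype=float)
        dist = np.linalg.norm(pos_a - pos_b, axis=1)
        with np.errstate(invalid="ignore"):
            close = dist <= threshold_m  # NaN distances compare as False
        frames_close = int(np.count_nonzero(close))
        if frames_close > 0:
            durations[(id_a, id_b)] = frames_close * frame_dt_s
    return durations

# Hypothetical three-frame tracks sampled every 0.5 s.
nan = float("nan")
tracks = {
    "A": [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)],
    "B": [(1.0, 0.0), (3.0, 0.0), (1.5, 0.0)],
    "C": [(10.0, 10.0), (nan, nan), (10.0, 10.0)],
}

contacts = pairwise_contacts(tracks, threshold_m=2.0, frame_dt_s=0.5)
print(contacts)                            # {('A', 'B'): 1.0}
print(len(contacts))                       # number of unique pairwise contacts
print(np.mean(list(contacts.values())))    # mean cumulative contact duration
```

The number of keys corresponds to the count of unique pairwise contacts, and the mean of the values to the mean cumulative contact duration, the quantity the abstract says was used to calibrate the simulation.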