
    A computational framework to emulate the human perspective in flow cytometric data analysis

    Background: In recent years, intense research efforts have focused on developing methods for automated flow cytometric data analysis. However, in designing such applications, little or no attention has been paid to the human perspective that is absolutely central to the manual gating process of identifying and characterizing cell populations. In particular, many common techniques assume that cell populations can be modeled reliably with pre-specified distributions, an assumption that may not hold in real-life samples, which can have populations of arbitrary shapes and considerable inter-sample variation.
    Results: To address this, we developed a new framework, flowScape, for emulating certain key aspects of the human perspective in analyzing flow data, which we implemented in multiple steps. First, flowScape creates a mathematically rigorous map of the high-dimensional flow data landscape based on dense and sparse regions defined by relative concentrations of events around modes. In the second step, these modal clusters are connected with a global hierarchical structure. This representation allows flowScape to perform ridgeline analysis both for traversing the landscape and for isolating cell populations at different levels of resolution. Finally, we extended manual gating with a new capacity for constructing templates that can identify target populations in terms of their relative parameters, as opposed to the more commonly used absolute or physical parameters. This allows flowScape to apply such templates in batch mode for detecting the corresponding populations in a flexible, sample-specific manner. We also demonstrated different applications of our framework to flow data analysis and showed its superiority over other analytical methods.
    Conclusions: The human perspective, built on top of intuition and experience, is a very important component of flow cytometric data analysis. By emulating some of its approaches and extending these with automation and rigor, flowScape provides a flexible and robust framework for computational cytomics.

    Representing complex data using localized principal components with application to astronomical data

    Often the relation between the variables constituting a multivariate data space might be characterized by one or more of the terms "nonlinear", "branched", "disconnected", "bended", "curved", "heterogeneous", or, more generally, "complex". In these cases, simple principal component analysis (PCA) as a tool for dimension reduction can fail badly. Of the many alternative approaches proposed so far, local approximations of PCA are among the most promising. This paper gives a short review of localized versions of PCA, focusing on local principal curves and local partitioning algorithms. Furthermore, we discuss projections other than the local principal components. When performing local dimension reduction for regression or classification problems, it is important to focus not only on the manifold structure of the covariates, but also on the response variable(s). Local principal components only achieve the former, whereas localized regression approaches concentrate on the latter. Local projection directions derived from the partial least squares (PLS) algorithm offer an interesting trade-off between these two objectives. We apply these methods to several real data sets. In particular, we consider simulated astrophysical data from the future Galactic survey mission Gaia.
    Comment: 25 pages. In "Principal Manifolds for Data Visualization and Dimension Reduction", A. Gorban, B. Kegl, D. Wunsch, and A. Zinovyev (eds), Lecture Notes in Computational Science and Engineering, Springer, 2007, pp. 180-204, http://www.springer.com/dal/home/generic/search/results?SGWID=1-40109-22-173750210-
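
    The idea of a local partitioning approach to PCA mentioned in the abstract above can be illustrated with a minimal sketch. It is not the method reviewed in the paper: the equal-width partition along x, the toy parabola data, and the names `first_pc_2d` and `local_pcs` are all assumptions made here for illustration, and the closed-form angle only applies to 2-D data.

```python
import math

def first_pc_2d(points):
    """Return (mean, unit direction) of the leading principal component of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form angle of the leading eigenvector of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))

def local_pcs(points, n_cells):
    """Partition points into equal-width cells along x (an illustrative
    assumption, not a data-driven partition) and fit one PC per cell."""
    xs = [p[0] for p in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_cells
    cells = [[] for _ in range(n_cells)]
    for p in points:
        idx = min(int((p[0] - lo) / width), n_cells - 1)
        cells[idx].append(p)
    # Cells with fewer than two points carry no direction information.
    return [first_pc_2d(c) for c in cells if len(c) >= 2]

# Toy curved data: points on the parabola y = x^2 for x in [-2, 2].
curve = [(k / 10.0, (k / 10.0) ** 2) for k in range(-20, 21)]
global_mean, global_dir = first_pc_2d(curve)
local = local_pcs(curve, n_cells=4)
local_slopes = [d[1] / d[0] for _, d in local]
```

    On this toy curve a single global component cannot follow the bend, whereas the per-cell directions change sign from the left branch to the right branch, which is the behaviour a localized method is meant to capture.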

    Filtered kernel density estimation


    A family of spatial biodiversity measures based on graphs

    While much research in ecology has focused on spatially explicit modelling as well as on measures of biodiversity, the concept of spatial (or local) biodiversity has been discussed very little. This paper generalises existing measures of spatial biodiversity and introduces a family of spatial biodiversity measures by flexibly defining the notion of the individuals' neighbourhood within the framework of graphs associated with a spatial point pattern. We consider two non-independent aspects of spatial biodiversity: scattering, i.e. the spatial arrangement of the individuals in the study area, and exposure, i.e. the local diversity in an individual's neighbourhood. A simulation study reveals that measures based on the most commonly used neighbourhood, defined by the geometric graph, do not distinguish well between scattering and exposure. This problem is much less pronounced when other graphs are used. In an analysis of the spatial diversity in a rainforest, the results based on the geometric graph spuriously indicated a decrease in spatial biodiversity, whereas no such trend was detected by the other types of neighbourhoods. We also show that the choice of neighbourhood markedly affects the classification of species according to how strongly and in what way different species spatially structure species diversity. Clearly, in an analysis of spatial or local diversity, an appropriate choice of local neighbourhood is crucial, in particular for the biological interpretation of the results. Due to its general definition, the approach discussed here offers the necessary flexibility that allows suitable and varying neighbourhood structures to be chosen.
    Postprint. Peer reviewed.
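
    As a rough illustration of what a neighbourhood-based local diversity ("exposure") measure on a geometric graph might look like, the sketch below links individuals that lie within a fixed radius of each other and scores each individual by the diversity of its neighbours' species. The abstract does not state the paper's actual measures, so the Simpson index, the radius value, the toy coordinates, and the function names are all assumptions chosen here.

```python
import math
from collections import Counter

def geometric_neighbours(points, r):
    """Geometric graph: link two individuals if they are at most r apart."""
    n = len(points)
    nbrs = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if math.hypot(dx, dy) <= r:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs

def simpson(labels):
    """Simpson diversity 1 - sum(p_k^2); defined as 0.0 for an empty neighbourhood."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def local_diversity(points, species, r):
    """Diversity of the species found among each individual's graph neighbours."""
    nbrs = geometric_neighbours(points, r)
    return [simpson([species[j] for j in nb]) for nb in nbrs]

# Toy point pattern: two well-separated clumps of three individuals each.
points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0)]
segregated = local_diversity(points, ["a", "a", "a", "b", "b", "b"], r=1.5)
mixed = local_diversity(points, ["a", "b", "a", "b", "a", "b"], r=1.5)
```

    With the same point locations, the segregated labelling gives zero local diversity everywhere, while the mixed labelling does not. Swapping `geometric_neighbours` for another graph construction changes the neighbourhoods, and hence the scores, without touching the rest of the code.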