2,593 research outputs found

    Data reduction for spectral clustering to analyze high throughput flow cytometry data

    Get PDF
    Background: Recent biological discoveries have shown that clustering large datasets is essential for better understanding biology in many areas. Spectral clustering in particular has proven to be a powerful tool amenable for many applications. However, it cannot be directly applied to large datasets due to time and memory limitations. To address this issue, we have modified spectral clustering by adding an information preserving sampling procedure and applying a post-processing stage. We call this entire algorithm SamSPECTRAL.Results: We tested our algorithm on flow cytometry data as an example of large, multidimensional data containing potentially hundreds of thousands of data points (i.e., events in flow cytometry, typically corresponding to cells). Compared to two state of the art model-based flow cytometry clustering methods, SamSPECTRAL demonstrates significant advantages in proper identification of populations with non-elliptical shapes, low density populations close to dense ones, minor subpopulations of a major population and rare populations.Conclusions: This work is the first successful attempt to apply spectral methodology on flow cytometry data. An implementation of our algorithm as an R package is freely available through BioConductor. © 2010 Zare et al; licensee BioMed Central Ltd

    Understanding Health and Disease with Multidimensional Single-Cell Methods

    Full text link
    Current efforts in the biomedical sciences and related interdisciplinary fields are focused on gaining a molecular understanding of health and disease, which is a problem of daunting complexity that spans many orders of magnitude in characteristic length scales, from small molecules that regulate cell function to cell ensembles that form tissues and organs working together as an organism. In order to uncover the molecular nature of the emergent properties of a cell, it is essential to measure multiple cell components simultaneously in the same cell. In turn, cell heterogeneity requires multiple cells to be measured in order to understand health and disease in the organism. This review summarizes current efforts towards a data-driven framework that leverages single-cell technologies to build robust signatures of healthy and diseased phenotypes. While some approaches focus on multicolor flow cytometry data and other methods are designed to analyze high-content image-based screens, we emphasize the so-called Supercell/SVM paradigm (recently developed by the authors of this review and collaborators) as a unified framework that captures mesoscopic-scale emergence to build reliable phenotypes. Beyond their specific contributions to basic and translational biomedical research, these efforts illustrate, from a larger perspective, the powerful synergy that might be achieved from bringing together methods and ideas from statistical physics, data mining, and mathematics to solve the most pressing problems currently facing the life sciences.Comment: 25 pages, 7 figures; revised version with minor changes. To appear in J. Phys.: Cond. Mat

    A computational framework to emulate the human perspective in flow cytometric data analysis

    Get PDF
    Background: In recent years, intense research efforts have focused on developing methods for automated flow cytometric data analysis. However, while designing such applications, little or no attention has been paid to the human perspective that is absolutely central to the manual gating process of identifying and characterizing cell populations. In particular, the assumption of many common techniques that cell populations could be modeled reliably with pre-specified distributions may not hold true in real-life samples, which can have populations of arbitrary shapes and considerable inter-sample variation. <p/>Results: To address this, we developed a new framework flowScape for emulating certain key aspects of the human perspective in analyzing flow data, which we implemented in multiple steps. First, flowScape begins with creating a mathematically rigorous map of the high-dimensional flow data landscape based on dense and sparse regions defined by relative concentrations of events around modes. In the second step, these modal clusters are connected with a global hierarchical structure. This representation allows flowScape to perform ridgeline analysis for both traversing the landscape and isolating cell populations at different levels of resolution. Finally, we extended manual gating with a new capacity for constructing templates that can identify target populations in terms of their relative parameters, as opposed to the more commonly used absolute or physical parameters. This allows flowScape to apply such templates in batch mode for detecting the corresponding populations in a flexible, sample-specific manner. We also demonstrated different applications of our framework to flow data analysis and show its superiority over other analytical methods. <p/>Conclusions: The human perspective, built on top of intuition and experience, is a very important component of flow cytometric data analysis. By emulating some of its approaches and extending these with automation and rigor, flowScape provides a flexible and robust framework for computational cytomics

    Single Cell Proteomics in Biomedicine: High-dimensional Data Acquisition, Visualization and Analysis

    Get PDF
    New insights on cellular heterogeneity in the last decade provoke the development of a variety of single cell omics tools at a lightning pace. The resultant high-dimensional single cell data generated by these tools require new theoretical approaches and analytical algorithms for effective visualization and interpretation. In this review, we briefly survey the state-of-the-art single cell proteomic tools with a particular focus on data acquisition and quantification, followed by an elaboration of a number of statistical and computational approaches developed to date for dissecting the high-dimensional single cell data. The underlying assumptions, unique features, and limitations of the analytical methods with the designated biological questions they seek to answer will be discussed. Particular attention will be given to those information theoretical approaches that are anchored in a set of first principles of physics and can yield detailed (and often surprising) predictions

    Development of machine learning techniques for flow cytometry data

    Get PDF

    Computational approaches in high-throughput proteomics data analysis

    Get PDF
    Proteins are key components in biological systems as they mediate the signaling responsible for information processing in a cell and organism. In biomedical research, one goal is to elucidate the mechanisms of cellular signal transduction pathways to identify possible defects that cause disease. Advancements in technologies such as mass spectrometry and flow cytometry enable the measurement of multiple proteins from a system. Proteomics, or the large-scale study of proteins of a system, thus plays an important role in biomedical research. The analysis of all high-throughput proteomics data requires the use of advanced computational methods. Thus, the combination of bioinformatics and proteomics has become an important part in research of signal transduction pathways. The main objective in this study was to develop and apply computational methods for the preprocessing, analysis and interpretation of high-throughput proteomics data. The methods focused on data from tandem mass spectrometry and single cell flow cytometry, and integration of proteomics data with gene expression microarray data and information from various biological databases. Overall, the methods developed and applied in this study have led to new ways of management and preprocessing of proteomics data. Additionally, the available tools have successfully been used to help interpret biomedical data and to facilitate analysis of data that would have been cumbersome to do without the use of computational methods.Proteiineilla on tärkeä merkitys biologisissa systeemeissä sillä ne koordinoivat erilaisia solujen ja organismien prosesseja. Yksi biolääketieteellisen tutkimuksen tavoitteista on valottaa solujen viestintäreittejä ja niiden toiminnassa tapahtuvia muutoksia eri sairauksien yhteydessä, jotta tällaisia muutoksia voitaisiin korjata. Proteomiikka on proteiinien laajamittaista tutkimista solusta, kudoksesta tai organismista. Proteomiikan menetelmät kuten massaspektrometria ja virtaussytometria ovat keskeisiä biolääketieteellisen tutkimuksen menetelmiä, joilla voidaan mitata näytteestä samanaikaisesti useita proteiineja. Nykyajan kehittyneet proteomiikan mittausteknologiat tuottavat suuria tulosaineistoja ja edellyttävät laskennallisten menetelmien käyttöä aineiston analyysissä. Bioinformatiikan menetelmät ovatkin nousseet tärkeäksi osaksi proteomiikka-analyysiä ja viestintäreittien tutkimusta. Tämän tutkimuksen päätavoite oli kehittää ja soveltaa tehokkaita laskennallisia menetelmiä laajamittaisten proteomiikka-aineistojen esikäsittelyyn, analyysiin ja tulkintaan. Tässä tutkimuksessa kehitettiin esikäsittelymenetelmä massaspektrometria-aineistolle sekä automatisoitu analyysimenetelmä virtaussytometria-aineistolle. Proteiinitason tietoa yhdistettiin mittauksiin geenien transkriptiotasoista ja olemassaolevaan biologisista tietokannoista poimittuun tietoon. Väitöskirjatyö osoittaa, että laskennallisilla menetelmillä on keskeinen merkitys proteomiikan aineistojen hallinnassa, esikäsittelyssä ja analyysissä. Tutkimuksessa kehitetyt analyysimenetelmät edistävät huomattavasti biolääketieteellisen tiedon laajempaa hyödyntämistä ja ymmärtämistä

    Flow cytometry data standards

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Flow cytometry is a widely used analytical technique for examining microscopic particles, such as cells. The Flow Cytometry Standard (FCS) was developed in 1984 for storing flow data and it is supported by all instrument and third party software vendors. However, FCS does not capture the full scope of flow cytometry (FCM)-related data and metadata, and data standards have recently been developed to address this shortcoming.</p> <p>Findings</p> <p>The Data Standards Task Force (DSTF) of the International Society for the Advancement of Cytometry (ISAC) has developed several data standards to complement the raw data encoded in FCS files. Efforts started with the Minimum Information about a Flow Cytometry Experiment, a minimal data reporting standard of details necessary to include when publishing FCM experiments to facilitate third party understanding. MIFlowCyt is now being recommended to authors by publishers as part of manuscript submission, and manuscripts are being checked by reviewers and editors for compliance. Gating-ML was then introduced to capture gating descriptions - an essential part of FCM data analysis describing the selection of cell populations of interest. The Classification Results File Format was developed to accommodate results of the gating process, mostly within the context of automated clustering. Additionally, the Archival Cytometry Standard bundles data with all the other components describing experiments. Here, we introduce these recent standards and provide the very first example of how they can be used to report FCM data including analysis and results in a standardized, computationally exchangeable form.</p> <p>Conclusions</p> <p>Reporting standards and open file formats are essential for scientific collaboration and independent validation. The recently developed FCM data standards are now being incorporated into third party software tools and data repositories, which will ultimately facilitate understanding and data reuse.</p
    • …
    corecore