Clustering environmental flow cytometry data by searching density peaks
Microbial single cells can be characterized by their phenotypic properties using flow cytometry, so flow cytometry can be used to analyze various aspects of environmental microbial communities. In recent years, researchers have focused on fully exploiting the multivariate data that such analyses generate. Because the interest lies in the diversity of an environmental sample, a proper estimate of the number of species and their abundances is needed. We modified a recently published algorithm to estimate microbial diversity from flow cytometry data. After giving a brief sketch of the problem setup, we review this algorithm alongside its various implementations. Finally, we present our current implementation together with the future challenges we foresee.
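As a rough illustration of what clustering by density peaks involves, the sketch below implements the density-peaks procedure of Rodriguez and Laio (2014) on a small synthetic point cloud. That this is the "recently published algorithm" is an assumption, since the abstract does not name it; the function name `density_peaks`, the hard cutoff `d_c`, and the rule of seeding `n_clusters` clusters from the points with the largest product of density and separation are illustrative choices, not the authors' modified implementation.

```python
import numpy as np

def density_peaks(X, d_c, n_clusters):
    """Density-peaks clustering sketch (assumed Rodriguez-Laio style)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Dense pairwise Euclidean distances (fine for small samples only).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Local density rho: number of neighbours closer than d_c (self excluded).
    rho = (D < d_c).sum(axis=1) - 1
    order = np.argsort(-rho, kind="stable")  # highest density first
    delta = np.empty(n)
    nearest_higher = np.full(n, -1)
    # The densest point gets the largest distance in the data as its delta.
    delta[order[0]] = D[order[0]].max()
    for rank in range(1, n):
        p = order[rank]
        higher = order[:rank]  # points ranked denser than p
        j = higher[np.argmin(D[p, higher])]
        delta[p] = D[p, j]  # distance to the nearest denser point
        nearest_higher[p] = j
    gamma = rho * delta
    # The densest point always seeds a cluster; add the next best centers.
    others = [i for i in np.argsort(-gamma, kind="stable") if i != order[0]]
    centers = [order[0]] + others[: n_clusters - 1]
    labels = np.full(n, -1)
    for k, c in enumerate(centers):
        labels[c] = k
    # Walk in descending density, so each point's denser neighbour is labelled first.
    for p in order:
        if labels[p] == -1:
            labels[p] = labels[nearest_higher[p]]
    return labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(5.0, 0.3, size=(50, 2))])
labels = density_peaks(X, d_c=0.5, n_clusters=2)
print(labels)
```

With two well-separated synthetic blobs, each blob should receive a single label. Because distances are computed densely, this sketch only suits small samples; flow cytometry files with very many events would need neighbour searches or subsampling.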
Understanding Health and Disease with Multidimensional Single-Cell Methods
Current efforts in the biomedical sciences and related interdisciplinary
fields are focused on gaining a molecular understanding of health and disease,
which is a problem of daunting complexity that spans many orders of magnitude
in characteristic length scales, from small molecules that regulate cell
function to cell ensembles that form tissues and organs working together as an
organism. In order to uncover the molecular nature of the emergent properties
of a cell, it is essential to measure multiple cell components simultaneously
in the same cell. In turn, cell heterogeneity requires multiple cells to be
measured in order to understand health and disease in the organism. This review
summarizes current efforts towards a data-driven framework that leverages
single-cell technologies to build robust signatures of healthy and diseased
phenotypes. While some approaches focus on multicolor flow cytometry data and
other methods are designed to analyze high-content image-based screens, we
emphasize the so-called Supercell/SVM paradigm (recently developed by the
authors of this review and collaborators) as a unified framework that captures
mesoscopic-scale emergence to build reliable phenotypes. Beyond their specific
contributions to basic and translational biomedical research, these efforts
illustrate, from a larger perspective, the powerful synergy that might be
achieved by bringing together methods and ideas from statistical physics,
data mining, and mathematics to solve the most pressing problems currently
facing the life sciences.
Comment: 25 pages, 7 figures; revised version with minor changes. To appear in
J. Phys.: Cond. Mat
Computational and Systems Biology Advances to Enable Bioagent-Agnostic Signatures
Enumerated threat agent lists have long driven biodefense priorities. The
global SARS-CoV-2 pandemic demonstrated the limitations of searching for known
threat agents as compared to a more agnostic approach. Recent technological
advances are enabling agent-agnostic biodefense, especially through the
integration of multi-modal observations of host-pathogen interactions directed
by a human immunological model. Although well-developed technical assays exist
for many aspects of human-pathogen interaction, the analytic methods and
pipelines to combine and holistically interpret the results of such assays are
immature and require further investments to exploit new technologies. In this
manuscript, we discuss potential immunologically based bioagent-agnostic
approaches and the computational tool gaps that the community should prioritize
filling.
Machine learning in marine ecology: an overview of techniques and applications
Machine learning covers a large set of algorithms that can be trained to identify patterns in data. Thanks to the increase in the amount of data and computing power available, it has become pervasive across scientific disciplines. We first highlight why machine learning is needed in marine ecology. Then we provide a quick primer on machine learning techniques and vocabulary. We built a database of ∼1000 publications that implement such techniques to analyse marine ecology data. For various data types (images, optical spectra, acoustics, omics, geolocations, biogeochemical profiles, and satellite imagery), we present a historical perspective on applications that proved influential, can serve as templates for new work, or represent the diversity of approaches. Then, we illustrate how machine learning can be used to better understand ecological systems, by combining various sources of marine data. Through this coverage of the literature, we demonstrate an increase in the proportion of marine ecology studies that use machine learning, the pervasiveness of images as a data source, the dominance of machine learning for classification-type problems, and a shift towards deep learning for all data types. This overview is meant to guide researchers who wish to apply machine learning methods to their marine datasets.
Challenges in the Multivariate Analysis of Mass Cytometry Data: The Effect of Randomization
Cytometry by time-of-flight (CyTOF) has emerged as a high-throughput single cell
technology able to provide large samples of protein readouts. Already, there exists a
large pool of advanced high-dimensional analysis algorithms that explore the observed
heterogeneous distributions and draw intriguing biological inferences. A fact largely
overlooked by these methods, however, is the effect of the established data
preprocessing pipeline on the distributions of the measured quantities. In this article,
we focus on randomization, a transformation used for improving data visualization,
which can negatively affect multivariate data analysis methods such as dimensionality
reduction, clustering, and network reconstruction algorithms. Our results indicate that
randomization should be used only for visualization purposes, but not in conjunction
with high-dimensional analytical tools.
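To make the concern concrete, the sketch below assumes that randomization means adding independent uniform noise on [-1, 0) to each integer ion count, as CyTOF acquisition software is commonly described as doing; the simulated Poisson counts and the function name `randomize_counts` are hypothetical. Two channels carrying identical counts are perfectly correlated before randomization, but independent per-channel noise lowers their correlation and removes every exact zero, the kind of distortion that can propagate into correlation-based network reconstruction or density-based clustering.

```python
import numpy as np

def randomize_counts(counts, rng):
    """Assumed CyTOF-style randomization: add U[-1, 0) noise to each count."""
    return counts + rng.uniform(-1.0, 0.0, size=counts.shape)

rng = np.random.default_rng(1)
# Hypothetical low-abundance marker measured on 10,000 cells.
marker = rng.poisson(lam=0.8, size=10_000).astype(float)
# Two channels carrying identical counts: a perfectly co-varying pair.
raw = np.column_stack([marker, marker])
randomized = randomize_counts(raw, rng)

raw_corr = np.corrcoef(raw[:, 0], raw[:, 1])[0, 1]
rand_corr = np.corrcoef(randomized[:, 0], randomized[:, 1])[0, 1]
raw_zeros = np.mean(raw == 0)
rand_zeros = np.mean(randomized == 0)

print(f"correlation: raw {raw_corr:.3f}, randomized {rand_corr:.3f}")
print(f"fraction of exact zeros: raw {raw_zeros:.3f}, randomized {rand_zeros:.3f}")
```

With a Poisson mean of 0.8, the count variance is 0.8 and the added noise has variance 1/12, so the randomized correlation should land near 0.8 / (0.8 + 1/12), roughly 0.91, even though the underlying counts are identical.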
Chronic helminth infection burden differentially affects haematopoietic cell development while ageing selectively impairs adaptive responses to infection
Throughout the lifespan of an individual, the immune system undergoes complex changes while facing novel and chronic infections. Helminths, which infect over one billion people and impose heavy livestock productivity losses, typically cause chronic infections by avoiding and suppressing host immunity. Yet, how age affects immune responses to lifelong parasitic infection is poorly understood. To disentangle the processes involved, we employed supervised statistical learning techniques to identify which factors among haematopoietic stem and progenitor cells (HSPC) and innate and adaptive responses regulate parasite burdens, and how they are affected by host age. Older mice harboured greater numbers of the parasites’ offspring than younger mice. Protective immune responses that did not vary with age were dominated by HSPC, while ageing specifically eroded adaptive immunity, with reduced numbers of naïve T cells, poor T cell responsiveness to parasites, and impaired antibody production. We identified immune factors consistent with previously reported immune responses to helminths, and also revealed novel interactions between helminths and HSPC maturation. Our approach thus allowed us to disentangle the concurrent effects of ageing and infection across the full maturation cycle of the immune response, and highlights the potential of such approaches to improve understanding of the immune system within the whole organism.
A Kernel-Based Change Detection Method to Map Shifts in Phytoplankton Communities Measured by Flow Cytometry
1. Automated, ship-board flow cytometers provide high-resolution maps of phytoplankton composition over large swaths of the world's oceans. They therefore pave the way for understanding how environmental conditions shape community structure. Identifying community changes along a cruise transect commonly involves segmenting the data into distinct regions. However, existing segmentation methods are generally not applicable to flow cytometry data, as these data are recorded as ‘point cloud’ data, with hundreds or thousands of particles measured during each time interval. Moreover, nonparametric segmentation methods that do not rely on prior knowledge of the number of species are desirable for mapping community shifts.
2. We present CytoSegmenter, a kernel-based change-point estimation method for segmenting point cloud data. Our method allows us to represent and summarize a point cloud of data points by a single element in a Hilbert space. The change-point locations can be found using a fast dynamic programming algorithm.
3. Through an analysis of 12 cruises, we demonstrate that CytoSegmenter allows us to locate abrupt changes in phytoplankton community structure. We show that the changes in community structure generally coincide with changes in the temperature and salinity of the ocean. We also illustrate how the main parameter of CytoSegmenter can be easily calibrated using limited auxiliary annotated data.
4. CytoSegmenter is generally applicable for segmenting series of point cloud data from any domain. Moreover, it readily scales to thousands of point clouds, each containing thousands of points. In the context of flow cytometry data collected during research cruises, it does not require prior clustering of particles to define taxa labels, eliminating a potential source of error. This represents an important advance in automating the analysis of large datasets now emerging in biological oceanography and other fields. It also allows the approach to be applied during research cruises.
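The abstract names two ingredients: each point cloud is summarized by a single element of a Hilbert space, and change points are found by dynamic programming. A minimal sketch of that general recipe follows, assuming a Gaussian kernel, kernel mean embeddings compared through their inner products, a within-segment scatter cost, and a known number of segments. The function names and the `bandwidth` parameter are hypothetical; this is a generic kernel change-point illustration, not CytoSegmenter's code, and it does not identify which parameter the authors calibrate.

```python
import numpy as np

def mean_embedding_inner(A, B, bandwidth):
    """<mu_A, mu_B> for a Gaussian kernel: mean of k(a, b) over all pairs."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth**2)).mean()

def embedding_gram(clouds, bandwidth):
    """Gram matrix of the mean embeddings of all point clouds."""
    T = len(clouds)
    G = np.empty((T, T))
    for i in range(T):
        for j in range(i, T):
            G[i, j] = G[j, i] = mean_embedding_inner(clouds[i], clouds[j], bandwidth)
    return G

def segment_costs(G):
    """cost[s, e]: scatter of embeddings s..e-1 around their segment mean."""
    T = len(G)
    diag = np.concatenate([[0.0], np.cumsum(np.diag(G))])
    P = np.zeros((T + 1, T + 1))
    P[1:, 1:] = G.cumsum(axis=0).cumsum(axis=1)  # P[a, b] = G[:a, :b].sum()
    cost = np.full((T + 1, T + 1), np.inf)
    for s in range(T):
        for e in range(s + 1, T + 1):
            block = P[e, e] - P[s, e] - P[e, s] + P[s, s]
            cost[s, e] = (diag[e] - diag[s]) - block / (e - s)
    return cost

def kernel_change_points(clouds, n_segments, bandwidth=1.0):
    T = len(clouds)
    cost = segment_costs(embedding_gram(clouds, bandwidth))
    # dp[k, e]: lowest total cost for splitting the first e clouds into k segments.
    dp = np.full((n_segments + 1, T + 1), np.inf)
    back = np.zeros((n_segments + 1, T + 1), dtype=int)
    dp[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for e in range(k, T + 1):
            candidates = dp[k - 1, :e] + cost[:e, e]
            back[k, e] = int(np.argmin(candidates))
            dp[k, e] = candidates[back[k, e]]
    # Walk back from the end to recover the start index of every segment.
    starts, e = [], T
    for k in range(n_segments, 0, -1):
        e = int(back[k, e])
        starts.append(e)
    return sorted(starts)[1:]  # drop the first segment's start (0)

rng = np.random.default_rng(0)
clouds = [rng.normal(0.0, 1.0, size=(100, 2)) for _ in range(10)]
clouds += [rng.normal(3.0, 1.0, size=(100, 2)) for _ in range(10)]
print(kernel_change_points(clouds, n_segments=2))
```

On these 20 synthetic clouds, whose distribution shifts after the tenth, the sketch should report a single change point at index 10. Its all-pairs Gram computation grows with the square of both the number of clouds and the points per cloud, so unlike the method described, this naive version would need approximations to scale to thousands of clouds.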