584 research outputs found
Nested effects models for high-dimensional phenotyping screens
Motivation: In high-dimensional phenotyping screens, a large number of cellular features is observed after perturbing genes by knockouts or RNA interference. Comprehensive analysis of perturbation effects is one of the most powerful techniques for attributing functions to genes, but not much work has been done so far to adapt statistical and computational methodology to the specific needs of large-scale and high-dimensional phenotyping screens.
Results: We introduce and compare probabilistic methods to efficiently infer a genetic hierarchy from the nested structure of observed perturbation effects. These hierarchies elucidate the structures of signaling pathways and regulatory networks. Our methods achieve two goals: (1) they reveal clusters of genes with highly similar phenotypic profiles, and (2) they order (clusters of) genes according to subset relationships between phenotypes. We evaluate our algorithms in the controlled setting of simulation studies and show their practical use in two experimental scenarios: (1) a data set investigating the response to microbial challenge in Drosophila melanogaster, and (2) a compendium of expression profiles of Saccharomyces cerevisiae knockout strains. We show that our methods identify biologically justified genetic hierarchies of perturbation effects.
Availability: The software used in our analysis is freely available in the R package ‘nem’ from www.bioconductor.or
Structure Learning in Nested Effects Models
Nested Effects Models (NEMs) are a class of graphical models introduced to
analyze the results of gene perturbation screens. NEMs explore noisy subset
relations between the high-dimensional outputs of phenotyping studies, e.g. the
effects showing in gene expression profiles or as morphological features of the
perturbed cell.
In this paper we expand the statistical basis of NEMs in four directions:
First, we derive a new formula for the likelihood function of a NEM, which
generalizes previous results for binary data. Second, we prove model
identifiability under mild assumptions. Third, we show that the new formulation
of the likelihood allows to efficiently traverse model space. Fourth, we
incorporate prior knowledge and an automated variable selection criterion to
decrease the influence of noise in the data
How to understand the cell by breaking it: network analysis of gene perturbation screens
Modern high-throughput gene perturbation screens are key technologies at the
forefront of genetic research. Combined with rich phenotypic descriptors they
enable researchers to observe detailed cellular reactions to experimental
perturbations on a genome-wide scale. This review surveys the current
state-of-the-art in analyzing perturbation screens from a network point of
view. We describe approaches to make the step from the parts list to the wiring
diagram by using phenotypes for network inference and integrating them with
complementary data sources. The first part of the review describes methods to
analyze one- or low-dimensional phenotypes like viability or reporter activity;
the second part concentrates on high-dimensional phenotypes showing global
changes in cell morphology, transcriptome or proteome.Comment: Review based on ISMB 2009 tutorial; after two rounds of revisio
Inferring Regulatory Networks by Combining Perturbation Screens and Steady State Gene Expression Profiles
Reconstructing transcriptional regulatory networks is an important task in
functional genomics. Data obtained from experiments that perturb genes by
knockouts or RNA interference contain useful information for addressing this
reconstruction problem. However, such data can be limited in size and/or are
expensive to acquire. On the other hand, observational data of the organism in
steady state (e.g. wild-type) are more readily available, but their
informational content is inadequate for the task at hand. We develop a
computational approach to appropriately utilize both data sources for
estimating a regulatory network. The proposed approach is based on a three-step
algorithm to estimate the underlying directed but cyclic network, that uses as
input both perturbation screens and steady state gene expression data. In the
first step, the algorithm determines causal orderings of the genes that are
consistent with the perturbation data, by combining an exhaustive search method
with a fast heuristic that in turn couples a Monte Carlo technique with a fast
search algorithm. In the second step, for each obtained causal ordering, a
regulatory network is estimated using a penalized likelihood based method,
while in the third step a consensus network is constructed from the highest
scored ones. Extensive computational experiments show that the algorithm
performs well in reconstructing the underlying network and clearly outperforms
competing approaches that rely only on a single data source. Further, it is
established that the algorithm produces a consistent estimate of the regulatory
network.Comment: 24 pages, 4 figures, 6 table
Getting started in probabilistic graphical models
Probabilistic graphical models (PGMs) have become a popular tool for
computational analysis of biological data in a variety of domains. But, what
exactly are they and how do they work? How can we use PGMs to discover patterns
that are biologically relevant? And to what extent can PGMs help us formulate
new hypotheses that are testable at the bench? This note sketches out some
answers and illustrates the main ideas behind the statistical approach to
biological pattern discovery.Comment: 12 pages, 1 figur
DRUG-NEM: Optimizing drug combinations using single-cell perturbation response to account for intratumoral heterogeneity.
An individual malignant tumor is composed of a heterogeneous collection of single cells with distinct molecular and phenotypic features, a phenomenon termed intratumoral heterogeneity. Intratumoral heterogeneity poses challenges for cancer treatment, motivating the need for combination therapies. Single-cell technologies are now available to guide effective drug combinations by accounting for intratumoral heterogeneity through the analysis of the signaling perturbations of an individual tumor sample screened by a drug panel. In particular, Mass Cytometry Time-of-Flight (CyTOF) is a high-throughput single-cell technology that enables the simultaneous measurements of multiple ([Formula: see text]40) intracellular and surface markers at the level of single cells for hundreds of thousands of cells in a sample. We developed a computational framework, entitled Drug Nested Effects Models (DRUG-NEM), to analyze CyTOF single-drug perturbation data for the purpose of individualizing drug combinations. DRUG-NEM optimizes drug combinations by choosing the minimum number of drugs that produce the maximal desired intracellular effects based on nested effects modeling. We demonstrate the performance of DRUG-NEM using single-cell drug perturbation data from tumor cell lines and primary leukemia samples
Morphological Profiling for Drug Discovery in the Era of Deep Learning
Morphological profiling is a valuable tool in phenotypic drug discovery. The
advent of high-throughput automated imaging has enabled the capturing of a wide
range of morphological features of cells or organisms in response to
perturbations at the single-cell resolution. Concurrently, significant advances
in machine learning and deep learning, especially in computer vision, have led
to substantial improvements in analyzing large-scale high-content images at
high-throughput. These efforts have facilitated understanding of compound
mechanism-of-action (MOA), drug repurposing, characterization of cell
morphodynamics under perturbation, and ultimately contributing to the
development of novel therapeutics. In this review, we provide a comprehensive
overview of the recent advances in the field of morphological profiling. We
summarize the image profiling analysis workflow, survey a broad spectrum of
analysis strategies encompassing feature engineering- and deep learning-based
approaches, and introduce publicly available benchmark datasets. We place a
particular emphasis on the application of deep learning in this pipeline,
covering cell segmentation, image representation learning, and multimodal
learning. Additionally, we illuminate the application of morphological
profiling in phenotypic drug discovery and highlight potential challenges and
opportunities in this field.Comment: 44 pages, 5 figure, 5 table
Reconstructing evolving signalling networks by hidden Markov nested effects models
Inferring time-varying networks is important to understand the development and evolution of interactions over time. However, the vast majority of currently used models assume direct measurements of node states, which are often difficult to obtain, especially in fields like cell biology, where perturbation experiments often only provide indirect information of network structure. Here we propose hidden Markov nested effects models (HM-NEMs) to model the evolving network by a Markov chain on a state space of signalling networks, which are derived from nested effects models (NEMs) of indirect perturbation data. To infer the hidden network evolution and unknown parameter, a Gibbs sampler is developed, in which sampling network structure is facilitated by a novel structural Metropolis–Hastings algorithm. We demonstrate the potential of HM-NEMs by simulations on synthetic time-series perturbation data. We also show the applicability of HM-NEMs in two real biological case studies, in one capturing dynamic crosstalk during the progression of neutrophil polarisation, and in the other inferring an evolving network underlying early differentiation of mouse embryonic stem cells.This is the final published manuscript, originally published by The Annals of Applied Statistics here: http://projecteuclid.org/euclid.aoas/1396966294
Emerging Paradigms in Genomics-Based Crop Improvement
Next generation sequencing platforms and high-throughput genotyping assays have remarkably expedited the pace of development of genomic tools and resources for several crops. Complementing the technological developments, conceptual shifts have also been witnessed in designing experimental populations. Availability of second generation mapping populations encompassing multiple alleles, multiple traits, and extensive recombination events is radically changing the phenomenon of classical QTL mapping. Additionally, the rising molecular breeding approaches like marker assisted recurrent selection (MARS) that are able to harness several QTLs are of particular importance in obtaining a “designed” genotype carrying the most desirable combinations of favourable alleles. Furthermore, rapid generation of genome-wide marker data coupled with easy access to precise and accurate phenotypic screens enable large-scale exploitation of LD not only to discover novel QTLs via whole genome association scans but also to practise genomic estimated breeding value (GEBV)-based selection of genotypes. Given refinements being experienced in analytical methods and software tools, the multiparent populations will be the resource of choice to undertake genome wide association studies (GWAS), multiparent MARS, and genomic selection (GS). With this, it is envisioned that these high-throughput and high-power molecular breeding methods would greatly assist in exploiting the enormous potential underlying breeding by design approach to facilitate accelerated crop improvement
- …