
    MaGIC: a machine learning tool set and web application for monoallelic gene inference from chromatin

    Background: A large fraction of human and mouse autosomal genes are subject to random monoallelic expression (MAE), an epigenetic mechanism characterized by allele-specific gene expression that varies between clonal cell lineages. MAE is highly cell-type specific, and mapping it in a large number of cell and tissue types can provide insight into its biological function. Its detection, however, remains challenging.
    Results: We previously reported that a sequence-independent chromatin signature identifies, with high sensitivity and specificity, genes subject to MAE in multiple tissue types using readily available ChIP-seq data. Here we present an implementation of this method as a user-friendly, open-source software pipeline for monoallelic gene inference from chromatin (MaGIC). The source code for the MaGIC pipeline and the Shiny app is available at https://github.com/gimelbrantlab/magic.
    Conclusion: The pipeline can be used by researchers to map monoallelic expression in a variety of cell types using existing models and to train new models with additional sets of chromatin marks.
    Funding: National Institutes of Health (U.S.) (award U54 HG007963).

    Machine Aided Biological Discovery and Design

    Advances in biotechnology and the life sciences are primarily driven by biologists conducting rigorous experimentation. However, biology is often too complex, with intractable combinatorial search spaces and functional landscapes, to explore, understand, and engineer comprehensively through iterative experimentation alone. Next-generation sequencing has made it possible to measure biological systems at high throughput, providing observational insight into this complexity. Further, in recent years it has become possible both to manipulate biological systems with fine-grained control and to directly synthesize large libraries of DNA molecules with specified sequences, providing an unprecedented ability to engineer biology. We explore the thesis that computational methods designed with experimental considerations in mind, and trained on carefully selected high-throughput experimental data, can drive advances in the life sciences by making accurate predictions that are then used to iteratively generate hypotheses and design biological sequences for further experimental validation.
    To test this thesis, we introduce and apply computational approaches for modeling cellular differentiation trajectories, identifying nonspecific antibodies, and designing diverse libraries of biological sequences that reflect desired objectives. First, we introduce a generative machine learning model for inferring cellular developmental landscapes from cross-sectional sequencing of in vitro differentiation time series. We validate this model against ground-truth lineage-tracing experiments and show that it can simulate cellular differentiation trajectories in silico, including under perturbations. Next, we present a computational framework that uses sequencing data from therapeutic discovery campaigns to identify nonspecific antibody therapeutics in large candidate pools. We show that this approach outperforms, and removes the need for, costly combinatorial affinity selection experiments, identifying pairwise nonspecificity from single-target selection data alone. Finally, we introduce an algorithm for the rational design of high-diversity synthetic antibody libraries using machine learning models and stochastic optimization. We show how it can be used to build large libraries optimized for specific targets or for developability characteristics, yielding more promising candidates from affinity selection.
    Ph.D.