51 research outputs found

    Learning Bayesian network structure from massive datasets: The “sparse candidate” algorithm

    Learning Bayesian networks is often cast as an optimization problem, where the computational task is to find a structure that maximizes a statistically motivated score. By and large, existing learning tools address this optimization problem using standard heuristic search techniques. Since the search space is extremely large, such search procedures can spend most of their time examining candidates that are extremely unreasonable. This problem becomes critical when we deal with data sets that are large in both the number of instances and the number of attributes. In this paper, we introduce an algorithm that achieves faster learning by restricting the search space. This iterative algorithm restricts the parents of each variable to belong to a small subset of candidates. We then search for a network that satisfies these constraints. The learned network is then used for selecting better candidates for the next iteration. We evaluate this algorithm on both synthetic and real-life data. Our results show that it is significantly faster than alternative search procedures without loss of quality in the learned structures.
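
    The iterative loop described in this abstract can be summarized in a few lines of code. The sketch below is a minimal illustration, assuming discretized data in a pandas DataFrame; the mutual-information ranking used to pick candidates and the greedy stand-in for the constrained structure search are simplifications for illustration, not the paper's exact discrepancy measure or search procedure.

```python
import numpy as np
import pandas as pd

def mutual_information(x, y):
    """Empirical mutual information between two discrete pandas Series."""
    joint = pd.crosstab(x, y, normalize=True).values
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def restrict_candidates(data, parents, k):
    """Candidate parents for each variable: its current parents plus the
    highest-MI remaining variables, capped at k per variable."""
    cands = {}
    for v in data.columns:
        others = [u for u in data.columns if u != v and u not in parents[v]]
        ranked = sorted(others, key=lambda u: mutual_information(data[v], data[u]),
                        reverse=True)
        cands[v] = list(parents[v]) + ranked[: max(0, k - len(parents[v]))]
    return cands

def search_within_candidates(data, cands, max_parents=2):
    """Stand-in for the constrained structure search: greedily keep the
    highest-MI candidates as parents. A real implementation would optimize a
    network score (e.g. BDe or BIC) subject to acyclicity constraints."""
    return {v: sorted(cands[v], key=lambda u: mutual_information(data[v], data[u]),
                      reverse=True)[:max_parents]
            for v in data.columns}

def sparse_candidate(data, k=5, iterations=3):
    """Alternate between restricting candidate parents and searching within them."""
    parents = {v: [] for v in data.columns}
    for _ in range(iterations):
        cands = restrict_candidates(data, parents, k)    # restriction step
        parents = search_within_candidates(data, cands)  # constrained search step
    return parents
```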

    Integrated live imaging and molecular profiling of embryoid bodies reveals a synchronized progression of early differentiation.

    Embryonic stem cells can spontaneously differentiate into cell types of all germ layers within embryoid bodies (EBs) in a highly variable manner. Whether there exists an intrinsic differentiation program common to all EBs is unknown. Here, we present a novel combination of high-throughput live two-photon imaging and gene expression profiling to study early differentiation dynamics spontaneously occurring within developing EBs. Onset timing of Brachyury-GFP was highly variable across EBs, while the spatial patterns as well as the dynamics of mesendodermal progression following onset were remarkably similar. We therefore defined a 'developmental clock' using the Brachyury-GFP signal onset timing. Mapping snapshot gene expression measurements to this clock revealed their temporal trends, indicating that loss of pluripotency, formation of the primitive streak and mesodermal lineage progression are synchronized in EBs. Exogenous activation of Wnt or BMP signaling accelerated the intrinsic clock. CHIR down-regulated Wnt3, allowing insights into dependency mechanisms between canonical Wnt signaling and multiple genes. Our findings reveal a developmental clock characteristic of an early differentiation program common to all EBs, further establishing them as an in vitro developmental model.
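
    A minimal sketch of the clock-alignment idea, assuming per-EB Brachyury-GFP onset times and snapshot measurements in a pandas DataFrame; the column names and the rolling-median smoothing are illustrative assumptions, not the study's actual analysis pipeline.

```python
import pandas as pd

def align_to_clock(snapshots: pd.DataFrame, onset_times: dict) -> pd.DataFrame:
    """snapshots: one row per EB snapshot with columns
    ['eb_id', 'measurement_time_h', <gene columns...>].
    onset_times: eb_id -> Brachyury-GFP onset time in hours.
    Re-indexes each snapshot by time elapsed since its EB's own onset."""
    df = snapshots.copy()
    df["clock_time_h"] = df["measurement_time_h"] - df["eb_id"].map(onset_times)
    return df.sort_values("clock_time_h")

def temporal_trend(aligned: pd.DataFrame, gene: str, window: int = 5) -> pd.Series:
    """Rolling median of one gene's expression along the shared developmental clock."""
    return aligned.set_index("clock_time_h")[gene].rolling(window, center=True).median()
```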

    Sculpting with stem cells: how models of embryo development take shape

    During embryogenesis, organisms acquire their shape given boundary conditions that impose geometrical, mechanical and biochemical constraints. A detailed, integrative understanding of how these morphogenetic information modules pattern and shape the mammalian embryo is still lacking, mostly owing to the inaccessibility of the embryo in vivo for direct observation and manipulation. These impediments are circumvented by the developmental engineering of embryo-like structures (stembryos) from pluripotent stem cells that are easy to access, track, manipulate and scale. Here, we explain how unlocking distinct levels of embryo-like architecture through controlled modulations of the cellular environment enables the identification of minimal sets of mechanical and biochemical inputs necessary to pattern and shape the mammalian embryo. We detail how this can be complemented with precise measurements and manipulations of tissue biochemistry, mechanics and geometry across spatial and temporal scales to provide insights into the mechanochemical feedback loops governing embryo morphogenesis. Finally, we discuss how, even in the absence of active manipulations, stembryos display intrinsic phenotypic variability that can be leveraged to define the constraints that ensure reproducible morphogenesis in vivo.

    Dynamic single cell imaging of direct reprogramming reveals an early specifying event

    The study of induced pluripotency often relies on experimental approaches that average measurements across a large population of cells, the majority of which do not become pluripotent. Here we used high-resolution, time-lapse imaging to trace the reprogramming process over 2 weeks from single mouse embryonic fibroblasts (MEFs) to pluripotency factor–positive colonies. This enabled us to calculate a normalized cell-of-origin reprogramming efficiency that takes into account only the initial MEFs that respond to form reprogrammed colonies, rather than the larger number of final colonies. Furthermore, this retrospective analysis revealed that successfully reprogramming cells undergo a rapid shift in their proliferative rate that coincides with a reduction in cellular area. This event occurs as early as the first cell division and with similar kinetics in all cells that form induced pluripotent stem (iPS) cell colonies. These data contribute to the theoretical modeling of reprogramming and suggest that certain parts of the reprogramming process follow defined rather than stochastic steps. Funding: Burroughs Wellcome Fund (Career Award at the Scientific Interface); Pew Charitable Trusts; Massachusetts Life Sciences Center (New Investigator grant); Broad Institute (Investigator of the Merkin Foundation for Stem Cell Research); Howard Hughes Medical Institute (Early Career Scientist); Alfred P. Sloan Foundation; National Institutes of Health (U.S.) (Pioneer Award).
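
    A toy illustration of the normalization described above: efficiency is computed over the initial MEFs whose lineages were traced (by imaging) to reprogrammed colonies, rather than over the final colony count. All numbers below are made up purely for illustration.

```python
# Hypothetical counts, for illustration only.
plated_mefs = 10_000       # all starting fibroblasts
responding_mefs = 40       # initial cells traced by imaging to iPS colonies
final_colonies = 120       # colonies counted at the endpoint (one origin cell can yield several)

# Counting final colonies overstates how many starting cells actually responded.
colony_based_efficiency = final_colonies / plated_mefs
# Cell-of-origin efficiency counts only the responding initial MEFs.
cell_of_origin_efficiency = responding_mefs / plated_mefs

print(f"colony-based: {colony_based_efficiency:.2%}, "
      f"cell-of-origin: {cell_of_origin_efficiency:.2%}")
```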

    Introducing v0.5 of the AI Safety Benchmark from MLCommons

    This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English) and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform and downloadable tool, called ModelBench, that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report that benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.
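
    A schematic sketch of how the pieces enumerated above could fit together in code: test items grouped by hazard category and persona, with a per-category pass rate computed for one system under test. The field names, types, and the simple pass-rate rule are assumptions for illustration only; they are not the working group's actual schema, ModelBench's API, or the benchmark's grading system.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class TestItem:
    item_id: str
    hazard_category: str   # one of the 7 tested categories (of 13 in the taxonomy)
    persona: str           # e.g. "typical", "malicious", or "vulnerable"
    prompt: str            # generated from a template

def grade(responses: dict[str, bool], items: list[TestItem]) -> dict[str, float]:
    """Per-hazard-category pass rate for one SUT.
    responses maps item_id -> True if that response was judged safe."""
    passed, total = defaultdict(int), defaultdict(int)
    for item in items:
        total[item.hazard_category] += 1
        passed[item.hazard_category] += int(responses.get(item.item_id, False))
    return {cat: passed[cat] / total[cat] for cat in total}
```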

    BRNI: Modular analysis of transcriptional regulatory programs

    Background: Transcriptional responses often consist of regulatory modules – sets of genes with a shared expression pattern that are controlled by the same regulatory mechanisms. Previous methods allow dissecting regulatory modules from genomics data, such as expression profiles, protein-DNA binding, and promoter sequences. In cases where physical protein-DNA data are lacking, such methods are essential for the analysis of the underlying regulatory program. Results: Here, we present a novel approach for the analysis of modular regulatory programs. Our method – Biochemical Regulatory Network Inference (BRNI) – is based on an algorithm that learns from expression data a biochemically-motivated regulatory program. It describes the expression profiles of gene modules consisting of hundreds of genes using a small number of regulators and affinity parameters. We developed an ensemble learning algorithm that ensures the robustness of the learned model. We then used the topology of the learned regulatory program to guide the discovery of a library of cis-regulatory motifs, and determined the motif compositions associated with each module. We tested our method on the cell cycle regulatory program of fission yeast. We discovered 16 coherent modules, covering diverse processes from cell division to metabolism, and associated them with 18 learned regulatory elements, including both known cell-cycle regulatory elements (MCB, Ace2, PCB, ACCCT box) and novel ones, some of which are associated with G2 modules. We integrated the regulatory relations from the expression- and motif-based models into a single network, highlighting specific topologies that result in distinct dynamics of gene expression in the fission yeast cell cycle. Conclusion: Our approach provides a biologically-driven, principled way for deconstructing a set of genes into meaningful transcriptional modules and identifying their associated cis-regulatory programs. Our analysis sheds light on the architecture and function of the regulatory network controlling the fission yeast cell cycle, and a similar approach can be applied to the regulatory underpinnings of other modular transcriptional responses.
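
    A conceptual sketch of the modular idea: describe each module's mean expression profile with a small number of regulator profiles and per-regulator weights. The least-squares fit and truncation to the strongest regulators below stand in for BRNI's biochemically-motivated model, ensemble learning, and motif analysis, none of which are specified in the abstract; treat everything here as illustrative.

```python
import numpy as np

def fit_module(module_profile, regulator_profiles, max_regulators=3):
    """module_profile: (timepoints,) mean expression of one module.
    regulator_profiles: (n_regulators, timepoints) candidate regulator activities.
    Returns the indices of the retained regulators and their fitted weights."""
    X = regulator_profiles.T                                  # timepoints x regulators
    w, *_ = np.linalg.lstsq(X, module_profile, rcond=None)    # dense initial fit
    keep = np.argsort(np.abs(w))[::-1][:max_regulators]       # keep strongest regulators
    w_sparse, *_ = np.linalg.lstsq(X[:, keep], module_profile, rcond=None)
    return keep, w_sparse
```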

    Using Bayesian networks to analyze expression data

    DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a “snapshot” of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biological features of cellular systems. In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graph-based model of joint multivariate probability distributions that captures properties of conditional independence between variables. Such models are attractive for their ability to describe complex stochastic processes and because they provide a clear methodology for learning from (noisy) observations. We start by showing how Bayesian networks can describe interactions between genes. We then describe a method for recovering gene interactions from microarray data using tools for learning Bayesian networks. Finally, we demonstrate this method on the S. cerevisiae cell-cycle measurements of Spellman et al. (1998). Key words: gene expression, microarrays, Bayesian methods.
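
    A minimal, self-contained sketch of the scoring-based learning idea, assuming expression levels have been discretized into a pandas DataFrame: each gene's candidate parent set is scored with a BIC-style criterion and parents are added greedily. This illustrates the general approach to learning Bayesian networks from expression data, not the specific score or search procedure used in the paper.

```python
import numpy as np
import pandas as pd

def family_bic(data: pd.DataFrame, child: str, parents: list) -> float:
    """BIC-style score of one gene given a parent set, for discretized levels."""
    n = len(data)
    r = data[child].nunique()
    groups = data.groupby(parents, observed=True)[child] if parents else [(None, data[child])]
    loglik, q = 0.0, 0
    for _, col in groups:
        q += 1                                 # one observed parent configuration
        counts = col.value_counts().values
        probs = counts / counts.sum()
        loglik += float((counts * np.log(probs)).sum())
    penalty = 0.5 * np.log(n) * q * (r - 1)    # free parameters per configuration
    return loglik - penalty

def greedy_parents(data: pd.DataFrame, child: str, candidates: list, max_parents: int = 3):
    """Greedily add the candidate parent that most improves the score."""
    parents, best = [], family_bic(data, child, [])
    while len(parents) < max_parents:
        scored = [(family_bic(data, child, parents + [c]), c)
                  for c in candidates if c not in parents]
        if not scored:
            break
        score, cand = max(scored)
        if score <= best:
            break
        parents.append(cand)
        best = score
    return parents, best
```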