
    RepeatFS: A File System Providing Reproducibility Through Provenance and Automation

    Reproducibility is of central importance to the scientific process. The difficulty of consistently replicating and verifying experimental results is magnified in the era of big data, in which computational analysis often involves complex multi-application pipelines operating on terabytes of data. These processes result in thousands of possible permutations of data preparation steps, software versions, and command-line arguments. Existing reproducibility frameworks are cumbersome and involve redesigning computational methods. To address these issues, we developed two conceptual models and implemented them through RepeatFS, a file system that records, replicates, and verifies computational workflows with no alteration to the original methods. RepeatFS also provides provenance visualization and task automation. We used RepeatFS to successfully visualize and replicate a variety of bioinformatics tasks consisting of over a million operations, and it correctly identified all software inconsistencies that resulted in replication differences.
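
    RepeatFS performs this recording and verification transparently at the file-system layer. The following user-space sketch only approximates the underlying record-and-verify idea: hash a step's inputs and outputs at first run, then re-run the recorded command and compare output hashes. The function names, the JSON log format, and the hash-based comparison are illustrative assumptions, not RepeatFS's actual interface.

        import hashlib
        import json
        import subprocess
        from pathlib import Path

        def sha256(path: str) -> str:
            """Hash a file's contents so replicated outputs can be compared."""
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            return h.hexdigest()

        def record(cmd: list[str], inputs: list[str], outputs: list[str], log: str) -> None:
            """Run one pipeline step and record its provenance (command + file hashes)."""
            subprocess.run(cmd, check=True)
            entry = {
                "cmd": cmd,
                "inputs": {p: sha256(p) for p in inputs},
                "outputs": {p: sha256(p) for p in outputs},
            }
            Path(log).write_text(json.dumps(entry, indent=2))

        def replicate(log: str) -> bool:
            """Re-run the recorded command; True if outputs are byte-identical."""
            entry = json.loads(Path(log).read_text())
            subprocess.run(entry["cmd"], check=True)
            return all(sha256(p) == h for p, h in entry["outputs"].items())

    A mismatch returned by replicate is the user-space analogue of the replication differences RepeatFS traces back to software inconsistencies.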

    MediAlly: A Provenance-Aware Remote Health Monitoring Middleware


    Trusted Artificial Intelligence in Manufacturing

    The successful deployment of AI solutions in manufacturing environments hinges on their security, safety, and reliability, which become more challenging in settings where multiple AI systems (e.g., industrial robots, robotic cells, Deep Neural Networks (DNNs)) interact as atomic systems and with humans. To guarantee the safe and reliable operation of AI systems on the shopfloor, many challenges must be addressed in the scope of complex, heterogeneous, dynamic, and unpredictable environments. Specifically, data reliability, human-machine interaction, security, transparency, and explainability challenges need to be addressed at the same time. Recent advances in AI research (e.g., in deep neural network security and explainable AI (XAI) systems), coupled with novel research outcomes in the formal specification and verification of AI systems, provide a sound basis for safe and reliable AI deployments in production lines. Moreover, the legal and regulatory dimension of safe and reliable AI solutions in production lines must be considered as well.

    To address some of the challenges listed above, fifteen European organizations collaborate in the scope of the STAR project, a research initiative funded by the European Commission under its H2020 program (Grant Agreement Number: 956573). STAR researches, develops, and validates novel technologies that enable AI systems to acquire knowledge in order to make timely and safe decisions in dynamic and unpredictable environments. Moreover, the project researches and delivers approaches that enable AI systems to confront sophisticated adversaries and to remain robust against security attacks.

    This book is co-authored by the STAR consortium members and provides a review of technologies, techniques, and systems for trusted, ethical, and secure AI in manufacturing. The chapters of the book cover systems and technologies for industrial data reliability, responsible and transparent artificial intelligence systems, human-centred manufacturing systems such as human-centred digital twins, cyber-defence in AI systems, simulated reality systems, human-robot collaboration systems, and automated mobile robots for manufacturing environments. A variety of cutting-edge AI technologies are employed by these systems, including deep neural networks, reinforcement learning systems, and explainable artificial intelligence systems. Furthermore, relevant standards and applicable regulations are discussed. Beyond reviewing state-of-the-art standards and technologies, the book illustrates how the STAR research goes beyond the state of the art, towards enabling and showcasing human-centred technologies in production lines. Emphasis is put on dynamic human-in-the-loop scenarios, where ethical, transparent, and trusted AI systems co-exist with human workers. The book is available as an open access publication, making it broadly and freely available to the AI and smart manufacturing communities.

    Mark My Words: Analyzing and Evaluating Language Model Watermarks

    The capabilities of large language models have grown significantly in recent years, and so too have concerns about their misuse. In this context, the ability to distinguish machine-generated text from human-authored content becomes important. Prior works have proposed numerous schemes to watermark text, which would benefit from a systematic evaluation framework. This work focuses on text watermarking techniques - as opposed to image watermarks - and proposes MARKMYWORDS, a comprehensive benchmark for them under different tasks as well as practical attacks. We focus on three main metrics: quality, size (e.g., the number of tokens needed to detect a watermark), and tamper-resistance. Current watermarking techniques are good enough to be deployed: Kirchenbauer et al. [1] can watermark Llama2-7B-chat with no perceivable loss in quality, the watermark can be detected with fewer than 100 tokens, and the scheme offers good tamper-resistance to simple attacks. We argue that watermark indistinguishability, a criterion emphasized in some prior works, is too strong a requirement: schemes that slightly modify logit distributions outperform their indistinguishable counterparts with no noticeable loss in generation quality. We publicly release our benchmark (https://github.com/wagner-group/MarkMyWords). Comment: 18 pages, 11 figures.
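
    For intuition about the "slightly modify logit distributions" family evaluated here: the Kirchenbauer et al. [1] scheme biases generation toward a pseudorandom "green" subset of the vocabulary and detects the watermark with a z-test on the observed green-token fraction. The sketch below is a simplified toy version of that idea, not the benchmark's code; the vocabulary size, green fraction, bias strength, and per-token seeding are illustrative assumptions.

        import numpy as np

        VOCAB = 50_000   # illustrative vocabulary size
        GAMMA = 0.5      # fraction of the vocabulary marked "green"
        DELTA = 2.0      # logit bias added to green tokens

        def green_ids(prev_token: int) -> np.ndarray:
            """Pseudorandom green list seeded by the previous token (toy keying)."""
            rng = np.random.default_rng(prev_token)
            return rng.choice(VOCAB, size=int(GAMMA * VOCAB), replace=False)

        def watermark_logits(logits: np.ndarray, prev_token: int) -> np.ndarray:
            """Slightly shift the logit distribution toward green tokens."""
            out = logits.copy()
            out[green_ids(prev_token)] += DELTA
            return out

        def detect(tokens: list[int]) -> float:
            """z-score of the green-token fraction; large values => watermarked."""
            hits = sum(t in set(green_ids(p)) for p, t in zip(tokens, tokens[1:]))
            n = len(tokens) - 1
            return (hits - GAMMA * n) / np.sqrt(GAMMA * (1 - GAMMA) * n)

    In unwatermarked text roughly GAMMA of tokens land in the green list by chance, so the z-score stays near zero; the logit bias pushes the fraction, and hence the score, upward, which is why detection needs only on the order of 100 tokens.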

    Conceptual Framework and Methodology for Analysing Previous Molecular Docking Results

    Modern drug discovery relies on in-silico computational simulations such as molecular docking. Molecular docking models biochemical interactions to predict where and how two molecules would bind. The results of large-scale molecular docking simulations can provide valuable insight into the relationship between two molecules. This is useful to a biomedical scientist before conducting in-vitro or in-vivo wet-lab experiments. Although this field has seen great advancements, feedback from biomedical scientists shows that there is a need for storage and further analysis of molecular docking results. To meet this need, biomedical scientists need access to computing, data, and network resources, and require specific knowledge or skills they might lack. Therefore, a conceptual framework specifically tailored to enable biomedical scientists to reuse molecular docking results, and a methodology which uses regular input from scientists, have been proposed. The framework is composed of 5 types of elements and 13 interfaces. The methodology is lightweight and relies on frequent communication between biomedical science and computer science experts, specified by particular roles. It shows how developers can benefit from using the framework, which allows them to determine whether a scenario fits the framework, whether an already implemented element can be reused, or whether a newly proposed tool can be used as an element. Three scenarios that show the versatility of this new framework, and the methodology based on it, have been identified and implemented. A methodical planning and design approach was used, and it was shown that the implementations are at least as usable as existing solutions. To eliminate the need for access to expensive computing infrastructure, state-of-the-art cloud computing techniques are used. The implementations enable faster identification of new molecules for use in docking, direct querying of existing databases, and simpler learning of good molecular docking practice without the need to manually run multiple tools. Thus, the framework and methodology enable more user-friendly implementations and less error-prone use of computational methods in drug discovery. Their use could lead to more effective discovery of new drugs.
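
    As a concrete illustration of the "direct querying of existing databases" capability mentioned above, stored docking results might be queried as sketched below. The SQLite schema, table, and column names are hypothetical stand-ins, not the framework's actual elements or interfaces.

        import sqlite3

        # Hypothetical schema for stored docking results (not the framework's own):
        # results(ligand_id TEXT, receptor_id TEXT, binding_affinity REAL)
        conn = sqlite3.connect("docking_results.db")
        conn.execute(
            """CREATE TABLE IF NOT EXISTS results (
                   ligand_id TEXT, receptor_id TEXT, binding_affinity REAL)"""
        )

        def best_ligands(receptor_id: str, limit: int = 10) -> list[tuple[str, float]]:
            """Return ligands with the most favourable (lowest) predicted affinity."""
            cur = conn.execute(
                "SELECT ligand_id, binding_affinity FROM results "
                "WHERE receptor_id = ? ORDER BY binding_affinity ASC LIMIT ?",
                (receptor_id, limit),
            )
            return cur.fetchall()

        # Example usage (hypothetical receptor identifier):
        # best_ligands("receptor_1abc")

    Reusing stored results this way is what lets a biomedical scientist shortlist candidate molecules without re-running the docking tools themselves.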

    Understanding Gene Regulation In Development And Differentiation Using Single Cell Multi-Omics

    Transcriptional regulation is a major determinant of tissue-specific gene expression during development. My thesis research leverages powerful single-cell approaches to address this fundamental question in two developmental systems, C. elegans embryogenesis and mouse embryonic hematopoiesis. I have also developed much-needed computational algorithms for single-cell data analysis and exploration. C. elegans is an animal with few cells, but a striking diversity of cell types. In this thesis, I characterize the molecular basis for their specification by analyzing the transcriptomes of 86,024 single embryonic cells. I identified 502 terminal and pre-terminal cell types, mapping most single-cell transcriptomes to their exact position in C. elegans’ invariant lineage. Using these annotations, I find that: 1) the correlation between a cell’s lineage and its transcriptome increases from mid to late gastrulation, then falls dramatically as cells in the nervous system and pharynx adopt their terminal fates; 2) multilineage priming contributes to the differentiation of sister cells at dozens of lineage branches; and 3) most distinct lineages that produce the same anatomical cell type converge to a homogeneous transcriptomic state. Next, I studied the development of hematopoietic stem cells (HSCs). All HSCs arise from a specialized type of endothelial cells in the major arteries of the embryo called hemogenic endothelium (HE). To examine the cellular and molecular transitions underlying the formation of HSCs, we profiled nearly 40,000 rare single cells from the caudal arteries of embryonic day 9.5 (E9.5) to E11.5 mouse embryos using single-cell RNA-Seq and single-cell ATAC-Seq. I identified a continuous developmental trajectory from endothelial cells to early precursors of HSCs, along with several critical transitional cell types during this process. The intermediate stage most proximal to HE, which we termed pre-HE, is characterized by increased accessibility of chromatin enriched for SOX, FOX, GATA, and SMAD binding motifs. I also identified a developmental bottleneck that separates pre-HE from HE, and found that RUNX1 dosage regulates the efficiency of the pre-HE to HE transition. A distal enhancer of Runx1 shows high accessibility in pre-HE cells at the bottleneck, but loses accessibility thereafter. Once cells pass the bottleneck, they follow distinct developmental trajectories leading to an initial wave of lympho-myeloid-biased progenitors, followed by precursors of HSCs. During the course of both projects, I developed novel computational methods for analyzing single-cell multi-omics data, including VERSE, PIVOT, and VisCello. Together, these tools constitute a comprehensive single-cell data analysis suite that facilitates the discovery of novel biological mechanisms.
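
    Finding 1) above rests on quantifying, stage by stage, how well a cell's lineage position predicts its expression state. One minimal way to do this is a Mantel-style correlation between pairwise lineage-tree distances and pairwise transcriptome distances; the sketch below uses random toy data and assumed distance definitions, not the thesis's actual pipeline or its tools (VERSE, PIVOT, VisCello).

        import numpy as np
        from scipy.spatial.distance import pdist
        from scipy.stats import spearmanr

        def lineage_transcriptome_corr(expr: np.ndarray, lineage_dist: np.ndarray) -> float:
            """Correlate pairwise transcriptome distances with lineage distances.

            expr: cells x genes expression matrix (hypothetical input).
            lineage_dist: condensed pairwise lineage distances, same cell order.
            """
            expr_dist = pdist(expr, metric="correlation")  # 1 - Pearson r per cell pair
            rho, _ = spearmanr(expr_dist, lineage_dist)
            return rho

        # Toy usage with random data (illustrative only):
        rng = np.random.default_rng(0)
        cells, genes = 50, 200
        expr = rng.poisson(1.0, size=(cells, genes)).astype(float)
        lineage = pdist(rng.normal(size=(cells, 3)))  # stand-in for tree distances
        print(lineage_transcriptome_corr(expr, lineage))

    Computing this correlation separately for cells at each developmental stage would trace the rise through gastrulation and the subsequent fall described in finding 1).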