19,958 research outputs found

    Automating biomedical data science through tree-based pipeline optimization

    Full text link
    Over the past decade, data science and machine learning has grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning---pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators---such as synthetic feature constructors---that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design.Comment: 16 pages, 5 figures, to appear in EvoBIO 2016 proceeding

    Automating property-based testing of evolving web services

    Get PDF
    Web services are the most widely used service technology that drives the Service-Oriented Computing~(SOC) paradigm. As a result, effective testing of web services is getting increasingly important. In this paper, we present a framework and toolset for testing web services and for evolving test code in sync with the evolution of web services. Our approach to testing web services is based on the Erlang programming language and QuviQ QuickCheck, a property-based testing tool written in Erlang, and our support for test code evolution is added to Wrangler, the Erlang refactoring tool. The key components of our system include the automatic generation of initial test code, the inference of web service interface changes between versions, the provision of a number of domain specific refactorings and the automatic generation of refactoring scripts for evolving the test code. Our framework provides users with a powerful and expressive web service testing framework, while minimising users' effort in creating, maintaining and evolving the test model. The framework presented in this paper can be used by both web service providers and consumers, and can be used to test web services written in whatever language; the approach advocated here could also be adopted in other property-based testing frameworks and refactoring tools

    Tea: A High-level Language and Runtime System for Automating Statistical Analysis

    Full text link
    Though statistical analyses are centered on research questions and hypotheses, current statistical analysis tools are not. Users must first translate their hypotheses into specific statistical tests and then perform API calls with functions and parameters. To do so accurately requires that users have statistical expertise. To lower this barrier to valid, replicable statistical analysis, we introduce Tea, a high-level declarative language and runtime system. In Tea, users express their study design, any parametric assumptions, and their hypotheses. Tea compiles these high-level specifications into a constraint satisfaction problem that determines the set of valid statistical tests, and then executes them to test the hypothesis. We evaluate Tea using a suite of statistical analyses drawn from popular tutorials. We show that Tea generally matches the choices of experts while automatically switching to non-parametric tests when parametric assumptions are not met. We simulate the effect of mistakes made by non-expert users and show that Tea automatically avoids both false negatives and false positives that could be produced by the application of incorrect statistical tests.Comment: 11 page

    Working Notes from the 1992 AAAI Workshop on Automating Software Design. Theme: Domain Specific Software Design

    Get PDF
    The goal of this workshop is to identify different architectural approaches to building domain-specific software design systems and to explore issues unique to domain-specific (vs. general-purpose) software design. Some general issues that cut across the particular software design domain include: (1) knowledge representation, acquisition, and maintenance; (2) specialized software design techniques; and (3) user interaction and user interface

    The Knowledge-Based Software Assistant: Beyond CASE

    Get PDF
    This paper will outline the similarities and differences between two paradigms of software development. Both support the whole software life cycle and provide automation for most of the software development process, but have different approaches. The CASE approach is based on a set of tools linked by a central data repository. This tool-based approach is data driven and views software development as a series of sequential steps, each resulting in a product. The Knowledge-Based Software Assistant (KBSA) approach, a radical departure from existing software development practices, is knowledge driven and centers around a formalized software development process. KBSA views software development as an incremental, iterative, and evolutionary process with development occurring at the specification level

    Automated robotic liquid handling assembly of modular DNA devices

    Get PDF
    Recent advances in modular DNA assembly techniques have enabled synthetic biologists to test significantly more of the available "design space" represented by "devices" created as combinations of individual genetic components. However, manual assembly of such large numbers of devices is time-intensive, error-prone, and costly. The increasing sophistication and scale of synthetic biology research necessitates an efficient, reproducible way to accommodate large-scale, complex, and high throughput device construction. Here, a DNA assembly protocol using the Type-IIS restriction endonuclease based Modular Cloning (MoClo) technique is automated on two liquid-handling robotic platforms. Automated liquid-handling robots require careful, often times tedious optimization of pipetting parameters for liquids of different viscosities (e.g. enzymes, DNA, water, buffers), as well as explicit programming to ensure correct aspiration and dispensing of DNA parts and reagents. This makes manual script writing for complex assemblies just as problematic as manual DNA assembly, and necessitates a software tool that can automate script generation. To this end, we have developed a web-based software tool, http://mocloassembly.com, for generating combinatorial DNA device libraries from basic DNA parts uploaded as Genbank files. We provide access to the tool, and an export file from our liquid handler software which includes optimized liquid classes, labware parameters, and deck layout. All DNA parts used are available through Addgene, and their digital maps can be accessed via the Boston University BDC ICE Registry. Together, these elements provide a foundation for other organizations to automate modular cloning experiments and similar protocols. The automated DNA assembly workflow presented here enables the repeatable, automated, high-throughput production of DNA devices, and reduces the risk of human error arising from repetitive manual pipetting. Sequencing data show the automated DNA assembly reactions generated from this workflow are ~95% correct and require as little as 4% as much hands-on time, compared to manual reaction preparation

    Enabling security checking of automotive ECUs with formal CSP models

    Get PDF
    • …
    corecore