Search CORE

19,958 research outputs found

Automating biomedical data science through tree-based pipeline optimization

Author: Andrews Peter C.
Kidd La Creis
Lavender Nicole A.
Moore Jason H.
Olson Randal S.
Urbanowicz Ryan J.
Publication venue
Publication date: 27/01/2016
Field of study

Over the past decade, data science and machine learning has grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning---pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators---such as synthetic feature constructors---that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design.Comment: 16 pages, 5 figures, to appear in EvoBIO 2016 proceeding

arXiv.org e-Print Archive

Scipedia

Automating property-based testing of evolving web services

Author: Francisco Miguel Angel
Lamela Seijas Pablo
Li Huiqing
Thompson Simon
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

Web services are the most widely used service technology that drives the Service-Oriented Computing~(SOC) paradigm. As a result, effective testing of web services is getting increasingly important. In this paper, we present a framework and toolset for testing web services and for evolving test code in sync with the evolution of web services. Our approach to testing web services is based on the Erlang programming language and QuviQ QuickCheck, a property-based testing tool written in Erlang, and our support for test code evolution is added to Wrangler, the Erlang refactoring tool. The key components of our system include the automatic generation of initial test code, the inference of web service interface changes between versions, the provision of a number of domain specific refactorings and the automatic generation of refactoring scripts for evolving the test code. Our framework provides users with a powerful and expressive web service testing framework, while minimising users' effort in creating, maintaining and evolving the test model. The framework presented in this paper can be used by both web service providers and consumers, and can be used to test web services written in whatever language; the approach advocated here could also be adopted in other property-based testing frameworks and refactoring tools

Crossref

Kent Academic Repository

Tea: A High-level Language and Runtime System for Automating Statistical Analysis

Author: Berger Emery D.
Chasins Sarah E.
Daum Maureen
Jun Eunice
Just Rene
Reinecke Katharina
Roesch Jared
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/04/2019
Field of study

Though statistical analyses are centered on research questions and hypotheses, current statistical analysis tools are not. Users must first translate their hypotheses into specific statistical tests and then perform API calls with functions and parameters. To do so accurately requires that users have statistical expertise. To lower this barrier to valid, replicable statistical analysis, we introduce Tea, a high-level declarative language and runtime system. In Tea, users express their study design, any parametric assumptions, and their hypotheses. Tea compiles these high-level specifications into a constraint satisfaction problem that determines the set of valid statistical tests, and then executes them to test the hypothesis. We evaluate Tea using a suite of statistical analyses drawn from popular tutorials. We show that Tea generally matches the choices of experts while automatically switching to non-parametric tests when parametric assumptions are not met. We simulate the effect of mistakes made by non-expert users and show that Tea automatically avoids both false negatives and false positives that could be produced by the application of incorrect statistical tests.Comment: 11 page

arXiv.org e-Print Archive

Crossref

Working Notes from the 1992 AAAI Workshop on Automating Software Design. Theme: Domain Specific Software Design

Author: Barstow David
Keller Richard M.
Lowry Michael R.
Tong Christopher H.
Publication venue
Publication date
Field of study

The goal of this workshop is to identify different architectural approaches to building domain-specific software design systems and to explore issues unique to domain-specific (vs. general-purpose) software design. Some general issues that cut across the particular software design domain include: (1) knowledge representation, acquisition, and maintenance; (2) specialized software design techniques; and (3) user interaction and user interface

NASA Technical Reports Server

The Knowledge-Based Software Assistant: Beyond CASE

Author: Carozzoni Joseph A.
Publication venue
Publication date
Field of study

This paper will outline the similarities and differences between two paradigms of software development. Both support the whole software life cycle and provide automation for most of the software development process, but have different approaches. The CASE approach is based on a set of tools linked by a central data repository. This tool-based approach is data driven and views software development as a series of sequential steps, each resulting in a product. The Knowledge-Based Software Assistant (KBSA) approach, a radical departure from existing software development practices, is knowledge driven and centers around a formalized software development process. KBSA views software development as an incremental, iterative, and evolutionary process with development occurring at the specification level

NASA Technical Reports Server

Automated robotic liquid handling assembly of modular DNA devices

Author: Densmore Douglas
McCarthy Lloyd
Ortiz Luis
Pavan Marilene
Timmons Joshua
Publication venue: 'MyJove Corporation'
Publication date: 01/12/2017
Field of study

Recent advances in modular DNA assembly techniques have enabled synthetic biologists to test significantly more of the available "design space" represented by "devices" created as combinations of individual genetic components. However, manual assembly of such large numbers of devices is time-intensive, error-prone, and costly. The increasing sophistication and scale of synthetic biology research necessitates an efficient, reproducible way to accommodate large-scale, complex, and high throughput device construction. Here, a DNA assembly protocol using the Type-IIS restriction endonuclease based Modular Cloning (MoClo) technique is automated on two liquid-handling robotic platforms. Automated liquid-handling robots require careful, often times tedious optimization of pipetting parameters for liquids of different viscosities (e.g. enzymes, DNA, water, buffers), as well as explicit programming to ensure correct aspiration and dispensing of DNA parts and reagents. This makes manual script writing for complex assemblies just as problematic as manual DNA assembly, and necessitates a software tool that can automate script generation. To this end, we have developed a web-based software tool, http://mocloassembly.com, for generating combinatorial DNA device libraries from basic DNA parts uploaded as Genbank files. We provide access to the tool, and an export file from our liquid handler software which includes optimized liquid classes, labware parameters, and deck layout. All DNA parts used are available through Addgene, and their digital maps can be accessed via the Boston University BDC ICE Registry. Together, these elements provide a foundation for other organizations to automate modular cloning experiments and similar protocols. The automated DNA assembly workflow presented here enables the repeatable, automated, high-throughput production of DNA devices, and reduces the risk of human error arising from repetitive manual pipetting. Sequencing data show the automated DNA assembly reactions generated from this workflow are ~95% correct and require as little as 4% as much hands-on time, compared to manual reaction preparation

Crossref

Boston University Institutional Repository (OpenBU)

Enabling security checking of automotive ECUs with formal CSP models

Author: Bryans Jeremy
Cheah Madeline
Heneghan John
Shaikh Siraj Ahmed
Wooderson Paul
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/08/2019
Field of study

Crossref

Coventry University Pure Portal