    Automating biomedical data science through tree-based pipeline optimization

    Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators, such as synthetic feature constructors, that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design. Comment: 16 pages, 5 figures, to appear in EvoBIO 2016 proceedings

    Detecting Family Resemblance: Automated Genre Classification.

    This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material for improving research. The paper compares the role of visual layout, stylistic features and language model features in clustering documents and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool of materials populated with documents of the nineteen most popular genres found in our experimental data set.

    Working Notes from the 1992 AAAI Workshop on Automating Software Design. Theme: Domain Specific Software Design

    The goal of this workshop is to identify different architectural approaches to building domain-specific software design systems and to explore issues unique to domain-specific (vs. general-purpose) software design. Some general issues that cut across particular software design domains include: (1) knowledge representation, acquisition, and maintenance; (2) specialized software design techniques; and (3) user interaction and user interface.

    Requirement-driven creation and deployment of multidimensional and ETL designs

    We present our tool for assisting designers in the error-prone and time-consuming tasks carried out at the early stages of a data warehousing project. Our tool semi-automatically produces multidimensional (MD) and ETL conceptual designs from a given set of business requirements (like SLAs) and data source descriptions. Subsequently, our tool translates both the MD and ETL conceptual designs produced into physical designs, so they can be further deployed on a DBMS and an ETL engine. In this paper, we describe the system architecture and present our demonstration proposal by means of an example. Peer Reviewed. Postprint (author's final draft)

    A method to support SMEs to optimize their manufacturing operations

    In recent decades, the gap between enterprise systems, like Enterprise Resource Planning (ERP), and process control systems has been filled by the development of software systems commonly referred to as Manufacturing Operations Management (MOM). The ISA-95 standard provides a detailed functional description of this intermediate layer in the CIM pyramid. This standard supports manufacturing companies, system integrators and software vendors by giving them the same terminology for communicating about the integration of their enterprise and control systems. Most of the time, these software systems target larger companies, which are convinced of the strategic advantages of their MOM projects: reduction of risks, costs and errors. This paper introduces an analysis and justification method that reduces the barriers to adoption of MOM systems for small and medium enterprises (SMEs). By applying the method, an SME gets an idea of the possible improvements to the materials and information flow required for the production of goods or services.

    A Factor Graph Approach to Automated Design of Bayesian Signal Processing Algorithms

    The benefits of automating design cycles for Bayesian inference-based algorithms are becoming increasingly recognized by the machine learning community. As a result, interest in probabilistic programming frameworks has increased considerably over the past few years. This paper explores a specific probabilistic programming paradigm, namely message passing in Forney-style factor graphs (FFGs), in the context of automated design of efficient Bayesian signal processing algorithms. To this end, we developed "ForneyLab" (https://github.com/biaslab/ForneyLab.jl) as a Julia toolbox for message passing-based inference in FFGs. We show by example how ForneyLab enables automatic derivation of Bayesian signal processing algorithms, including algorithms for parameter estimation and model comparison. Crucially, due to the modular makeup of the FFG framework, both the model specification and inference methods are readily extensible in ForneyLab. In order to test this framework, we compared variational message passing as implemented by ForneyLab with automatic differentiation variational inference (ADVI) and Monte Carlo methods as implemented by state-of-the-art tools "Edward" and "Stan". In terms of performance, extensibility and stability, ForneyLab appears to enjoy an edge relative to its competitors for automated inference in state-space models. Comment: Accepted for publication in the International Journal of Approximate Reasoning

    Treatment of palm oil mill secondary effluent (POMSE) using ultrafiltration and nanofiltration membranes

    The Malaysian palm oil industry has grown rapidly over the last few decades to become the world’s largest producer and exporter of palm oil. This success, however, brings greater challenges and demands comparable sacrifices to sustain the pace. In 2004, 26.7 million tons of solid biomass and approximately 30 million tons of palm oil mill effluent (POME) were generated by 381 palm oil mills in Malaysia [1]. Although palm oil mills generate several kinds of waste, POME is perceived as the most harmful because of the damage it causes if discharged into the environment untreated [2]. POME is a colloidal suspension originating from a mixture of sterilizer condensate, separator sludge and hydrocyclone wastewater in a ratio of 9:15:1, respectively [3]. It is a thick, brownish liquid containing high amounts of oil, grease and solids, with high Chemical Oxygen Demand (COD) and Biological Oxygen Demand (BOD) values. Table 15.1 describes the characteristics of POME obtained from Malaysian Palm Oil Board