Automating biomedical data science through tree-based pipeline optimization
Over the past decade, data science and machine learning have grown from a
mysterious art form to a staple tool across a variety of fields in academia,
business, and government. In this paper, we introduce the concept of tree-based
pipeline optimization for automating one of the most tedious parts of machine
learning---pipeline design. We implement a Tree-based Pipeline Optimization
Tool (TPOT) and demonstrate its effectiveness on a series of simulated and
real-world genetic data sets. In particular, we show that TPOT can build
machine learning pipelines that achieve competitive classification accuracy and
discover novel pipeline operators---such as synthetic feature
constructors---that significantly improve classification accuracy on these data
sets. We also highlight the current challenges to pipeline optimization, such
as the tendency to produce pipelines that overfit the data, and suggest future
research paths to overcome these challenges. As such, this work represents an
early step toward fully automating machine learning pipeline design.
Comment: 16 pages, 5 figures, to appear in EvoBIO 2016 proceedings
Detecting Family Resemblance: Automated Genre Classification.
This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material for improving research. The paper compares the role of visual layout, stylistic features and language model features in clustering documents and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool of materials populated with documents of the nineteen most popular genres found in our experimental data set.
Working Notes from the 1992 AAAI Workshop on Automating Software Design. Theme: Domain Specific Software Design
The goal of this workshop is to identify different architectural approaches to building domain-specific software design systems and to explore issues unique to domain-specific (vs. general-purpose) software design. Some general issues that cut across the particular software design domain include: (1) knowledge representation, acquisition, and maintenance; (2) specialized software design techniques; and (3) user interaction and user interface
Requirement-driven creation and deployment of multidimensional and ETL designs
We present our tool for assisting designers in the error-prone and time-consuming tasks carried out at the early stages of a data warehousing project. Our tool semi-automatically produces multidimensional (MD) and ETL conceptual designs from a given set of business requirements (like SLAs) and data source descriptions. Subsequently, our tool translates both the MD and ETL conceptual designs produced into physical designs, so they can be further deployed on a DBMS and an ETL engine. In this paper, we describe the system architecture and present our demonstration proposal by means of an example. Peer Reviewed. Postprint (author's final draft)
A method to support SMEs to optimize their manufacturing operations
In the last decades the gap between enterprise systems, like Enterprise Resource Planning (ERP), and process control systems has been filled with the development of software systems, commonly referred to as Manufacturing Operations Management (MOM). The ISA-95 standard provides a detailed functional description of this intermediate layer in the CIM pyramid. This standard supports manufacturing companies, system integrators and software vendors by using the same terminology in their communication for integrating their enterprise and control systems. Most of the time, these software systems address bigger companies which are convinced of the strategic advantages for their MOM projects: reduction of risks, costs and errors. This paper introduces an analysis and justification method that reduces the barriers to adoption of MOM systems for small and medium enterprises (SMEs). By applying the method an SME gets an idea of the possible improvements for the materials and information flow required for the production of goods or services
A Factor Graph Approach to Automated Design of Bayesian Signal Processing Algorithms
The benefits of automating design cycles for Bayesian inference-based
algorithms are becoming increasingly recognized by the machine learning
community. As a result, interest in probabilistic programming frameworks has
increased considerably over the past few years. This paper explores a specific
probabilistic programming paradigm, namely message passing in Forney-style
factor graphs (FFGs), in the context of automated design of efficient Bayesian
signal processing algorithms. To this end, we developed "ForneyLab"
(https://github.com/biaslab/ForneyLab.jl) as a Julia toolbox for message
passing-based inference in FFGs. We show by example how ForneyLab enables
automatic derivation of Bayesian signal processing algorithms, including
algorithms for parameter estimation and model comparison. Crucially, due to the
modular makeup of the FFG framework, both the model specification and inference
methods are readily extensible in ForneyLab. In order to test this framework,
we compared variational message passing as implemented by ForneyLab with
automatic differentiation variational inference (ADVI) and Monte Carlo methods
as implemented by state-of-the-art tools "Edward" and "Stan". In terms of
performance, extensibility and stability issues, ForneyLab appears to enjoy an
edge relative to its competitors for automated inference in state-space models.
Comment: Accepted for publication in the International Journal of Approximate
Reasoning
Automatic synthesis of analog layout : a survey
A review of recent research in the automatic synthesis of physical geometry for analog integrated circuits is presented. The introduction explains the difficulties involved in analog layout as opposed to digital layout. A review of the literature then follows. Emphasis is placed on the exposition of general methods for addressing problems specific to analog layout, with the details of specific systems given only when they serve to illustrate these methods well. The conclusion discusses the remaining problems and offers a prediction as to how technology will evolve to solve them. It is argued that although progress has been and will continue to be made in the automation of analog IC layout, fundamental differences between analog and digital IC design mean that the level of automation of the former should not be expected to reach that of the latter any time soon
Treatment of palm oil mill secondary effluent (POMSE) using ultrafiltration and nanofiltration membranes
The Malaysian palm oil industry has grown rapidly over the last few decades to become the world’s largest producer and exporter of palm oil. This success, however, brings greater challenges and demands equal sacrifices to sustain the pace. In 2004, 26.7 million tons of solid biomass and approximately 30 million tons of palm oil mill effluent (POME) were generated from 381 palm oil mills in Malaysia [1]. Although different kinds of waste are generated in palm oil mills, POME is perceived as the most harmful because of the damage it causes if discharged into the environment untreated [2]. POME is a colloidal suspension originating from a mixture of sterilizer condensate, separator sludge and hydrocyclone wastewater in a ratio of 9:15:1, respectively [3]. It is a thick, brownish liquid containing high amounts of oil, grease, and solids, with high Chemical Oxygen Demand (COD) and Biological Oxygen Demand (BOD) values. Table 15.1 describes the characteristics of POME obtained from the Malaysian Palm Oil Board