6,861 research outputs found

    Next challenges for adaptive learning systems

    Get PDF
    Learning from evolving streaming data has become a 'hot' research topic in the last decade and many adaptive learning algorithms have been developed. This research was stimulated by rapidly growing amounts of industrial, transactional, sensor and other business data that arrives in real time and needs to be mined in real time. Under such circumstances, constant manual adjustment of models is in-efficient and with increasing amounts of data is becoming infeasible. Nevertheless, adaptive learning models are still rarely employed in business applications in practice. In the light of rapidly growing structurally rich 'big data', new generation of parallel computing solutions and cloud computing services as well as recent advances in portable computing devices, this article aims to identify the current key research directions to be taken to bring the adaptive learning closer to application needs. We identify six forthcoming challenges in designing and building adaptive learning (pre-diction) systems: making adaptive systems scalable, dealing with realistic data, improving usability and trust, integrat-ing expert knowledge, taking into account various application needs, and moving from adaptive algorithms towards adaptive tools. Those challenges are critical for the evolving stream settings, as the process of model building needs to be fully automated and continuous.</jats:p

    ALOJA: A benchmarking and predictive platform for big data performance analysis

    Get PDF
    The main goals of the ALOJA research project from BSC-MSR, are to explore and automate the characterization of cost-effectivenessof Big Data deployments. The development of the project over its first year, has resulted in a open source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools to gather insights about system's cost-performance1. This article describes the evolution of the project's focus and research lines from over a year of continuously benchmarking Hadoop under dif- ferent configuration and deployments options, presents results, and dis cusses the motivation both technical and market-based of such changes. During this time, ALOJA's target has evolved from a previous low-level profiling of Hadoop runtime, passing through extensive benchmarking and evaluation of a large body of results via aggregation, to currently leveraging Predictive Analytics (PA) techniques. Modeling benchmark executions allow us to estimate the results of new or untested configu- rations or hardware set-ups automatically, by learning techniques from past observations saving in benchmarking time and costs.This work is partially supported the BSC-Microsoft Research Centre, the Span- ish Ministry of Education (TIN2012-34557), the MINECO Severo Ochoa Research program (SEV-2011-0067) and the Generalitat de Catalunya (2014-SGR-1051).Peer ReviewedPostprint (author's final draft

    Engineering a machine learning pipeline for automating metadata extraction from longitudinal survey questionnaires

    Get PDF
    Data Documentation Initiative-Lifecycle (DDI-L) introduced a robust metadata model to support the capture of questionnaire content and flow, and encouraged through support for versioning and provenancing, objects such as BasedOn for the reuse of existing question items. However, the dearth of questionnaire banks including both question text and response domains has meant that an ecosystem to support the development of DDI ready Computer Assisted Interviewing (CAI) tools has been limited. Archives hold the information in PDFs associated with surveys but extracting that in an efficient manner into DDI-Lifecycle is a significant challenge.&#x0D; While CLOSER Discovery has been championing the provision of high-quality questionnaire metadata in DDI-Lifecycle, this has primarily been done manually. More automated methods need to be explored to ensure scalable metadata annotation and uplift.&#x0D; This paper presents initial results in engineering a machine learning (ML) pipeline to automate the extraction of questions from survey questionnaires as PDFs. Using CLOSER Discovery as a ‘training and test dataset’, a number of machine learning approaches have been explored to classify parsed text from questionnaires to be output as valid DDI items for inclusion in a DDI-L compliant repository.&#x0D; The developed ML pipeline adopts a continuous build and integrate approach, with processes in place to keep track of various combinations of the structured DDI-L input metadata, ML models and model parameters against the defined evaluation metrics, thus enabling reproducibility and comparative analysis of the experiments.  Tangible outputs include a map of the various metadata and model parameters with the corresponding evaluation metrics’ values, which enable model tuning as well as transparent management of data and experiments

    Engineering a machine learning pipeline for automating metadata extraction from longitudinal survey questionnaires

    Get PDF
    Data Documentation Initiative-Lifecycle (DDI-L) introduced a robust metadata model to support the capture of questionnaire content and flow, and encouraged through support for versioning and provenancing, objects such as BasedOn for the reuse of existing question items. However, the dearth of questionnaire banks including both question text and response domains has meant that an ecosystem to support the development of DDI ready Computer Assisted Interviewing (CAI) tools has been limited. Archives hold the information in PDFs associated with surveys but extracting that in an efficient manner into DDI-Lifecycle is a significant challenge.&#x0D; While CLOSER Discovery has been championing the provision of high-quality questionnaire metadata in DDI-Lifecycle, this has primarily been done manually. More automated methods need to be explored to ensure scalable metadata annotation and uplift.&#x0D; This paper presents initial results in engineering a machine learning (ML) pipeline to automate the extraction of questions from survey questionnaires as PDFs. Using CLOSER Discovery as a ‘training and test dataset’, a number of machine learning approaches have been explored to classify parsed text from questionnaires to be output as valid DDI items for inclusion in a DDI-L compliant repository.&#x0D; The developed ML pipeline adopts a continuous build and integrate approach, with processes in place to keep track of various combinations of the structured DDI-L input metadata, ML models and model parameters against the defined evaluation metrics, thus enabling reproducibility and comparative analysis of the experiments.  Tangible outputs include a map of the various metadata and model parameters with the corresponding evaluation metrics’ values, which enable model tuning as well as transparent management of data and experiments

    Automating biomedical data science through tree-based pipeline optimization

    Full text link
    Over the past decade, data science and machine learning has grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning---pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators---such as synthetic feature constructors---that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design.Comment: 16 pages, 5 figures, to appear in EvoBIO 2016 proceeding

    A System for Accessible Artificial Intelligence

    Full text link
    While artificial intelligence (AI) has become widespread, many commercial AI systems are not yet accessible to individual researchers nor the general public due to the deep knowledge of the systems required to use them. We believe that AI has matured to the point where it should be an accessible technology for everyone. We present an ongoing project whose ultimate goal is to deliver an open source, user-friendly AI system that is specialized for machine learning analysis of complex data in the biomedical and health care domains. We discuss how genetic programming can aid in this endeavor, and highlight specific examples where genetic programming has automated machine learning analyses in previous projects.Comment: 14 pages, 5 figures, submitted to Genetic Programming Theory and Practice 2017 worksho

    Elements of Design for Containers and Solutions in the LinBox Library

    Get PDF
    We describe in this paper new design techniques used in the \cpp exact linear algebra library \linbox, intended to make the library safer and easier to use, while keeping it generic and efficient. First, we review the new simplified structure for containers, based on our \emph{founding scope allocation} model. We explain design choices and their impact on coding: unification of our matrix classes, clearer model for matrices and submatrices, \etc Then we present a variation of the \emph{strategy} design pattern that is comprised of a controller--plugin system: the controller (solution) chooses among plug-ins (algorithms) that always call back the controllers for subtasks. We give examples using the solution \mul. Finally we present a benchmark architecture that serves two purposes: Providing the user with easier ways to produce graphs; Creating a framework for automatically tuning the library and supporting regression testing.Comment: 8 pages, 4th International Congress on Mathematical Software, Seoul : Korea, Republic Of (2014
    • …
    corecore