    Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

    As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design.
    Comment: 8 pages, 5 figures; preprint to appear in GECCO 2016; edits from reviewer comments not yet made
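
    As a minimal usage sketch of TPOT's published Python API (the dataset, parameter values, and output file name below are illustrative choices, not taken from the paper):

        # Evolve a classification pipeline with TPOT and export it as Python code.
        from tpot import TPOTClassifier
        from sklearn.datasets import load_digits
        from sklearn.model_selection import train_test_split

        digits = load_digits()
        X_train, X_test, y_train, y_test = train_test_split(
            digits.data, digits.target, test_size=0.25, random_state=42)

        # Small generation/population budget so the sketch finishes quickly.
        tpot = TPOTClassifier(generations=5, population_size=20,
                              verbosity=2, random_state=42)
        tpot.fit(X_train, y_train)                # genetic programming search
        print(tpot.score(X_test, y_test))         # accuracy of the best pipeline
        tpot.export('tpot_digits_pipeline.py')    # write the best pipeline to a file

    The exported file contains plain scikit-learn code for the winning pipeline, which is how the tool hands its result back to a non-expert user.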

    Expressiveness modulo Bisimilarity of Regular Expressions with Parallel Composition (Extended Abstract)

    The languages accepted by finite automata are precisely the languages denoted by regular expressions. In contrast, finite automata may exhibit behaviours that cannot be described by regular expressions up to bisimilarity. In this paper, we consider extensions of the theory of regular expressions with various forms of parallel composition and study the effect on expressiveness. First we prove that adding pure interleaving to the theory of regular expressions strictly increases its expressiveness up to bisimilarity. Then, we prove that replacing the operation for pure interleaving by ACP-style parallel composition gives a further increase in expressiveness. Finally, we prove that the theory of regular expressions with ACP-style parallel composition and encapsulation is expressive enough to express all finite automata up to bisimilarity. Our results extend the expressiveness results obtained by Bergstra, Bethke and Ponse for process algebras with (the binary variant of) Kleene's star operation.
    Comment: In Proceedings EXPRESS'10, arXiv:1011.601
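
    For intuition, here is a small worked example in standard ACP notation (merge \parallel for parallel composition, a communication function \gamma, encapsulation \partial_H); the paper's exact conventions may differ in detail. For atomic actions a and b:

        \begin{align*}
          a \parallel b &= a \cdot b + b \cdot a
            && \text{(pure interleaving)} \\
          a \parallel b &= a \cdot b + b \cdot a + c
            && \text{(ACP merge with communication } \gamma(a, b) = c \text{)} \\
          \partial_{\{a, b\}}(a \parallel b) &= c
            && \text{(encapsulation blocks the unsynchronized interleavings)}
        \end{align*}

    Encapsulation is what forces synchronization between components, which is the extra expressive ingredient the third result combines with ACP-style parallel composition.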

    Algorithms for Hyper-Parameter Optimization

    Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyper-parameter optimization has been the job of humans because they can be very efficient in regimes where only a few trials are possible. Presently, computer clusters and GPU processors make it possible to run more trials and we show that algorithmic approaches can find better results. We present hyper-parameter optimization results on tasks of training neural networks and deep belief networks (DBNs). We optimize hyper-parameters using random search and two new greedy sequential methods based on the expected improvement criterion. Random search has been shown to be sufficiently efficient for learning neural networks for several datasets, but we show it is unreliable for training DBNs. The sequential algorithms are applied to the most difficult DBN learning problems from [1] and find significantly better results than the best previously reported. This work contributes novel techniques for making response surface models P(y|x) in which many elements of hyper-parameter assignment (x) are known to be irrelevant given particular values of other elements.
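
    As a rough sketch of the two ingredients the abstract names, the snippet below pairs naive random search with the expected improvement (EI) criterion computed from a Gaussian surrogate. The objective function, search space, and surrogate values here are illustrative stand-ins, not the authors' DBN setup.

        import numpy as np
        from scipy.stats import norm

        rng = np.random.default_rng(0)

        def objective(lr, n_hidden):
            # Stand-in validation score; a real run would train a network here.
            return -((np.log10(lr) + 2.5) ** 2) - ((n_hidden - 128) / 256) ** 2

        # Random search: draw configurations independently from a prior.
        trials = [(10 ** rng.uniform(-5, 0), int(rng.integers(16, 513)))
                  for _ in range(50)]
        scores = [objective(lr, nh) for lr, nh in trials]
        best = max(scores)
        print("best random-search score:", best)

        # Expected improvement of a candidate under a Gaussian posterior
        # N(mu, sigma^2), maximization convention.
        def expected_improvement(mu, sigma, best_so_far):
            if sigma <= 0:
                return max(mu - best_so_far, 0.0)
            z = (mu - best_so_far) / sigma
            return (mu - best_so_far) * norm.cdf(z) + sigma * norm.pdf(z)

        print(expected_improvement(mu=best + 0.1, sigma=0.2, best_so_far=best))

    A sequential method would fit the surrogate to the trials run so far and pick the next configuration by maximizing EI, rather than sampling blindly as random search does.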

    Actors, actions, and initiative in normative system specification

    The logic of norms, called deontic logic, has been used to specify normative constraints for information systems. For example, one can specify in deontic logic the constraints that a book borrowed from a library should be returned within three weeks, and that if it is not returned, the library should send a reminder. Thus, the notion of obligation to perform an action arises naturally in system specification. Intuitively, deontic logic presupposes the concept of an actor who undertakes actions and is responsible for fulfilling obligations. However, the concept of an actor has not been formalized until now in deontic logic. We present a formalization in dynamic logic, which allows us to express the actor who initiates actions or choices. This is then combined with a formalization, presented earlier, of deontic logic in dynamic logic, which allows us to specify obligations, permissions, and prohibitions to perform an action. The addition of actors allows us to express who has the responsibility to perform an action. In addition to the application of the concept of an actor in deontic logic, we discuss two other applications of actors. First, we show how to generalize an approach taken up by De Nicola and Hennessy, who eliminate τ from CCS in favor of internal and external choice. We show that our generalization allows a more accurate specification of system behavior than is possible without it. Second, we show that actors can be used to resolve a long-standing paradox of deontic logic, called the paradox of free-choice permission. Towards the end of the paper, we discuss whether the concept of an actor can be combined with that of an object to formalize the concept of active objects.
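
    As background for the combination described above, the reduction of deontic logic to dynamic logic in the style of Meyer introduces a designated violation atom V and defines the deontic modalities from dynamic-logic ones. The schema below is one common form of that reduction; conventions for the action complement \overline{\alpha} ("anything other than α") vary between papers.

        \begin{align*}
          F(\alpha) &\equiv [\alpha]V
            && \text{(forbidden: every way of doing } \alpha \text{ leads to violation)} \\
          P(\alpha) &\equiv \neg F(\alpha) \equiv \langle \alpha \rangle \neg V
            && \text{(permitted: some way of doing } \alpha \text{ avoids violation)} \\
          O(\alpha) &\equiv F(\overline{\alpha}) \equiv [\overline{\alpha}]V
            && \text{(obligated: refraining from } \alpha \text{ leads to violation)}
        \end{align*}

    Adding actors then amounts to indexing actions by who performs or initiates them, so that obligations can be attributed rather than left impersonal.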

    Introducing a framework to assess newly created questions with Natural Language Processing

    Statistical models such as those derived from Item Response Theory (IRT) enable the assessment of students on a specific subject, which can be useful for several purposes (e.g., learning path customization, drop-out prediction). However, the questions have to be assessed as well and, although it is possible to estimate with IRT the characteristics of questions that have already been answered by several students, this technique cannot be used on newly generated questions. In this paper, we propose a framework to train and evaluate models for estimating the difficulty and discrimination of newly created Multiple Choice Questions by extracting meaningful features from the text of the question and of the possible choices. We implement one model using this framework and test it on a real-world dataset provided by CloudAcademy, showing that it outperforms previously proposed models, reducing by 6.7% the RMSE for difficulty estimation and by 10.8% the RMSE for discrimination estimation. We also present the results of an ablation study performed to support our choice of features and to show the effects of different characteristics of the questions' text on difficulty and discrimination.
    Comment: Accepted at the International Conference on Artificial Intelligence in Education
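
    A minimal sketch of the kind of pipeline the framework describes, assuming TF-IDF features over the question and choice texts and a generic regressor; the questions, labels, and model below are synthetic stand-ins, and the real system's features are richer than this.

        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.metrics import mean_squared_error
        from sklearn.model_selection import train_test_split

        # Purely synthetic data: question stem joined with its choices, paired
        # with an IRT-style difficulty label estimated from historical answers.
        texts = [
            "What is 2 + 2? | 3 | 4 | 5",
            "Which OSI layer handles routing? | 2 | 3 | 4",
            "What does SQL SELECT do? | filters | projects | retrieves rows",
            "Define eventual consistency | always | never | eventually",
        ] * 25
        difficulty = np.tile([-1.2, 0.3, -0.4, 1.5], 25)

        features = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(texts)
        X_tr, X_te, y_tr, y_te = train_test_split(features, difficulty,
                                                  random_state=0)

        model = RandomForestRegressor(n_estimators=200,
                                      random_state=0).fit(X_tr, y_tr)
        rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
        print("difficulty RMSE:", rmse)

    Discrimination estimation follows the same pattern with a second regression target per question.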

    Batalin-Vilkovisky Integrals in Finite Dimensions

    The Batalin-Vilkovisky method (BV) is the most powerful method to analyze functional integrals with (infinite-dimensional) gauge symmetries presently known. It has been invented to fix gauges associated with symmetries that do not close off-shell. Homological Perturbation Theory is introduced and used to develop the integration theory behind BV and to describe the BV quantization of a Lagrangian system with symmetries. Localization (illustrated in terms of Duistermaat-Heckman localization) as well as anomalous symmetries are discussed in the framework of BV.
    Comment: 35 pages
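
    For orientation, the finite-dimensional objects the BV formalism works with can be sketched as follows; conventions for signs and factors of \hbar vary across the literature, so take this as one common normalization rather than the paper's.

        \begin{align*}
          \Delta &= \sum_i \frac{\partial}{\partial x^i}\,\frac{\partial}{\partial x^*_i}
            && \text{(odd BV Laplacian on fields } x^i \text{ and antifields } x^*_i\text{)} \\
          (F, G) &= \sum_i \left(
              \frac{\partial F}{\partial x^i}\frac{\partial G}{\partial x^*_i}
            - \frac{\partial F}{\partial x^*_i}\frac{\partial G}{\partial x^i}
            \right)
            && \text{(antibracket)} \\
          \Delta\, e^{iS/\hbar} = 0
            &\;\Longleftrightarrow\;
            \tfrac{1}{2}(S, S) = i\hbar\, \Delta S
            && \text{(quantum master equation)}
        \end{align*}

    Gauge fixing then amounts to restricting the integral to a Lagrangian submanifold of this odd symplectic space, and the master equation guarantees the result is independent of that choice.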

    Ten years of METEOR (an international rheumatoid arthritis registry): development, research opportunities and future perspectives

    OBJECTIVES: Ten years ago, the METEOR tool was developed to simulate treatment-to-target and create an international research database. The development of the METEOR tool and database, research opportunities and future perspectives are described. METHODS: The METEOR tool is a free, online, internationally available tool in which daily practice visits of all rheumatoid arthritis patients visiting a rheumatologist can be registered. In the tool, disease characteristics, patient- and physician-reported outcomes and prescribed treatment can be entered. These can be subsequently displayed in powerful graphics, facilitating treatment decisions and patient-physician interactions. An upload facility is also available, by which data from local electronic health record systems or registries can be integrated into the METEOR database. This is currently being actively used in, among other countries, the Netherlands, Portugal and India. RESULTS: Since an increasing number of hospitals use electronic health record systems, the upload facility is being actively used by an increasing number of sites, enabling them to benefit from the benchmark and research opportunities of METEOR. Enabling a connection between local registries and METEOR is a well-established but time-consuming process for which IT specialists from both METEOR and the local registry are needed. However, once this process is complete, data can be uploaded regularly and relatively easily according to a pre-specified format. The METEOR database currently contains data from >39,000 patients and >200,000 visits from 32 different countries, and is ever increasing. Continuous efforts are being undertaken to increase the quality of data in the database. CONCLUSIONS: Since METEOR was founded 10 years ago, many rheumatologists worldwide have used the METEOR tool to follow up their patients and improve the quality of care they provide. Combined with uploaded data, this has led to an extensive growth of the database. It now offers a unique opportunity to study daily practice care and to perform research regarding cross-country differences in a large, worldwide setting, which could provide important knowledge about disease and its treatment in different geographic and clinical settings.
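
    To make the "pre-specified format" concrete, a per-visit upload record might look roughly like the following; this is a purely hypothetical illustration, as the abstract does not describe METEOR's actual exchange format, and every field name here is an assumption.

        # Hypothetical shape of one uploaded visit record (illustrative only).
        visit_record = {
            "patient_pseudonym": "NL-0042",       # pseudonymized patient ID
            "visit_date": "2015-03-18",
            "das28": 3.2,                          # disease activity score (28 joints)
            "patient_reported": {"haq": 0.875},    # e.g. HAQ disability index
            "physician_reported": {"global_vas": 25},
            "treatment": ["methotrexate 15 mg/week"],
        }

    Agreeing such a schema once per site is what makes the subsequent regular uploads relatively easy.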