Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science
As the field of data science continues to grow, there will be an
ever-increasing demand for tools that make machine learning accessible to
non-experts. In this paper, we introduce the concept of tree-based pipeline
optimization for automating one of the most tedious parts of machine
learning---pipeline design. We implement an open source Tree-based Pipeline
Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a
series of simulated and real-world benchmark data sets. In particular, we show
that TPOT can design machine learning pipelines that provide a significant
improvement over a basic machine learning analysis while requiring little to no
input or prior knowledge from the user. We also address the tendency for TPOT
to design overly complex pipelines by integrating Pareto optimization, which
produces compact pipelines without sacrificing classification accuracy. As
such, this work represents an important step toward fully automating machine
learning pipeline design.

Comment: 8 pages, 5 figures; preprint to appear in GECCO 2016; reviewer comments not yet incorporated
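A minimal usage sketch of the workflow the abstract describes, using the released open-source tpot package's scikit-learn-style interface (the dataset and search-budget values here are illustrative, not the paper's benchmarks):

```python
# Sketch: evolve a classification pipeline with TPOT's genetic programming
# search, then export the best pipeline as plain scikit-learn code.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# generations/population_size control the evolutionary search budget;
# Pareto selection favors pipelines that are accurate *and* compact.
tpot = TPOTClassifier(generations=5, population_size=20,
                      verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

# Export the winning pipeline as a standalone Python script.
tpot.export('best_pipeline.py')
```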
Expressiveness modulo Bisimilarity of Regular Expressions with Parallel Composition (Extended Abstract)
The languages accepted by finite automata are precisely the languages denoted
by regular expressions. In contrast, finite automata may exhibit behaviours
that cannot be described by regular expressions up to bisimilarity. In this
paper, we consider extensions of the theory of regular expressions with various
forms of parallel composition and study the effect on expressiveness. First we
prove that adding pure interleaving to the theory of regular expressions
strictly increases its expressiveness up to bisimilarity. Then, we prove that
replacing the operation for pure interleaving by ACP-style parallel composition
gives a further increase in expressiveness. Finally, we prove that the theory
of regular expressions with ACP-style parallel composition and encapsulation is
expressive enough to express all finite automata up to bisimilarity. Our
results extend the expressiveness results obtained by Bergstra, Bethke and
Ponse for process algebras with (the binary variant of) Kleene's star
operation.

Comment: In Proceedings EXPRESS'10, arXiv:1011.601
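For illustration (standard ACP notation assumed; the paper's exact syntax may differ), pure interleaving of two atomic actions unfolds into a choice between the two execution orders, while ACP-style parallel composition additionally admits a communication summand:

```latex
% Pure interleaving of atomic actions a and b: both orders, no interaction.
\[ a \parallel b \;=\; a \cdot b \,+\, b \cdot a \]
% ACP-style merge: left-merge terms (written here with \lfloor) plus a
% communication-merge summand x | y.
\[ x \parallel y \;=\; x \mathbin{\lfloor} y \,+\, y \mathbin{\lfloor} x \,+\, x \mid y \]
```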
Algorithms for Hyper-Parameter Optimization
Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyper-parameter optimization has been the job of humans because they can be very efficient in regimes where only a few trials are possible. Presently, computer clusters and GPU processors make it possible to run more trials and we show that algorithmic approaches can find better results. We present hyper-parameter optimization results on tasks of training neural networks and deep belief networks (DBNs). We optimize hyper-parameters using random search and two new greedy sequential methods based on the expected improvement criterion. Random search has been shown to be sufficiently efficient for learning neural networks for several datasets, but we show it is unreliable for training DBNs. The sequential algorithms are applied to the most difficult DBN learning problems from [1] and find significantly better results than the best previously reported. This work contributes novel techniques for making response surface models P(y|x) in which many elements of hyper-parameter assignment (x) are known to be irrelevant given particular values of other elements.
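A minimal sketch of the expected-improvement criterion that drives such sequential methods (a generic Gaussian-posterior formulation, not the paper's specific estimators; the surrogate model and names are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, y_best):
    """EI of a candidate whose objective (to be maximized) has a Gaussian
    posterior N(mu, sigma^2), relative to the incumbent best value y_best."""
    sigma = np.maximum(sigma, 1e-12)           # guard against zero variance
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)

def suggest(candidates, surrogate, y_best):
    """Greedy sequential step: score candidate hyper-parameter settings by EI
    under a surrogate model (assumed to return posterior mean and std) and
    pick the most promising one to evaluate next."""
    mu, sigma = surrogate.predict(candidates)
    return candidates[np.argmax(expected_improvement(mu, sigma, y_best))]
```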
Actors, actions, and initiative in normative system specification
The logic of norms, called deontic logic, has been used to specify normative constraints for information systems. For example, one can specify in deontic logic the constraints that a book borrowed from a library should be returned within three weeks, and that if it is not returned, the library should send a reminder. Thus, the notion of obligation to perform an action arises naturally in system specification. Intuitively, deontic logic presupposes the concept of an actor who undertakes actions and is responsible for fulfilling obligations. However, the concept of an actor has not been formalized until now in deontic logic. We present a formalization in dynamic logic, which allows us to express the actor who initiates actions or choices. This is then combined with a formalization, presented earlier, of deontic logic in dynamic logic, which allows us to specify obligations, permissions, and prohibitions to perform an action. The addition of actors allows us to express who has the responsibility to perform an action. In addition to the application of the concept of an actor in deontic logic, we discuss two other applications of actors. First, we show how to generalize an approach taken by De Nicola and Hennessy, who eliminate the silent action τ from CCS in favor of internal and external choice. We show that our generalization allows a more accurate specification of system behavior than is possible without it. Second, we show that actors can be used to resolve a long-standing paradox of deontic logic, called the paradox of free-choice permission. Towards the end of the paper, we discuss whether the concept of an actor can be combined with that of an object to formalize the concept of active objects.
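For concreteness, the earlier reduction of deontic logic to dynamic logic referred to above is, we assume, the Meyer-style one, which introduces a designated violation atom V and defines the deontic notions through the action modality (notation assumed, not quoted from the paper):

```latex
% V is a designated "violation" atom; \overline{\alpha} abbreviates
% "doing anything other than \alpha" (Meyer-style action negation).
\[ F(\alpha) \;\equiv\; [\alpha]V \quad\text{(doing } \alpha \text{ leads to a violation)} \]
\[ P(\alpha) \;\equiv\; \neg F(\alpha) \;=\; \langle\alpha\rangle \neg V \]
\[ O(\alpha) \;\equiv\; F(\overline{\alpha}) \;=\; [\overline{\alpha}]V \]
```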
Introducing a framework to assess newly created questions with Natural Language Processing
Statistical models such as those derived from Item Response Theory (IRT)
enable the assessment of students on a specific subject, which can be useful
for several purposes (e.g., learning path customization, drop-out prediction).
However, the questions have to be assessed as well and, although it is possible
to estimate with IRT the characteristics of questions that have already been
answered by several students, this technique cannot be used on newly generated
questions. In this paper, we propose a framework to train and evaluate models
for estimating the difficulty and discrimination of newly created Multiple
Choice Questions by extracting meaningful features from the text of the
question and of the possible choices. We implement one model using this
framework and test it on a real-world dataset provided by CloudAcademy, showing
that it outperforms previously proposed models, reducing by 6.7% the RMSE for
difficulty estimation and by 10.8% the RMSE for discrimination estimation. We
also present the results of an ablation study performed to support our choice
of features and to show the effects of different characteristics of the questions'
text on difficulty and discrimination.

Comment: Accepted at the International Conference of Artificial Intelligence in Education
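A minimal sketch of the kind of model such a framework trains (the TF-IDF features, random-forest regressor, and names are illustrative assumptions, not the paper's configuration): extract features from the question and choice texts, then regress IRT difficulty or discrimination values calibrated on questions that already have answer data.

```python
# Sketch: predict IRT difficulty/discrimination of new questions from text.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def make_features(questions, choices):
    # Concatenate each question stem with its possible choices.
    texts = [q + " " + " ".join(c) for q, c in zip(questions, choices)]
    return TfidfVectorizer(max_features=5000).fit_transform(texts)

def train_and_eval(X, y):
    # y: IRT parameters (difficulty or discrimination) estimated from
    # historical answers; these serve as supervision targets.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
    rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
    return model, rmse
```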
Batalin-Vilkovisky Integrals in Finite Dimensions
The Batalin-Vilkovisky (BV) method is the most powerful method presently known
for analyzing functional integrals with (infinite-dimensional) gauge
symmetries. It was invented to fix gauges associated with symmetries that do
not close off-shell. Homological Perturbation Theory is introduced and used to
develop the integration theory behind BV and to describe the BV quantization of
a Lagrangian system with symmetries. Localization (illustrated in terms of
Duistermaat-Heckman localization) as well as anomalous symmetries are discussed
in the framework of BV.Comment: 35 page
Ten years of METEOR (an international rheumatoid arthritis registry): development, research opportunities and future perspectives
OBJECTIVES: Ten years ago, the METEOR tool was developed to simulate treatment-to-target and to create an international research database. The development of the METEOR tool and database, research opportunities and future perspectives are described.

METHODS: The METEOR tool is a free, online, internationally available tool in which daily-practice visits of all rheumatoid arthritis patients visiting a rheumatologist can be registered. Disease characteristics, patient- and physician-reported outcomes and prescribed treatment can be entered into the tool. These can subsequently be displayed in powerful graphics, facilitating treatment decisions and patient-physician interactions. An upload facility is also available, by which data from local electronic health record systems or registries can be integrated into the METEOR database. This is currently being actively used in, among other countries, the Netherlands, Portugal and India.

RESULTS: Since an increasing number of hospitals use electronic health record systems, the upload facility is being actively used by an increasing number of sites, enabling them to benefit from the benchmarking and research opportunities of METEOR. Establishing a connection between a local registry and METEOR is a well-established but time-consuming process that requires an IT specialist from both METEOR and the local registry. However, once this process has been completed, data can be uploaded regularly and relatively easily according to a pre-specified format. The METEOR database currently contains data from >39,000 patients and >200,000 visits from 32 different countries, and is ever increasing. Continuous efforts are being undertaken to increase the quality of the data in the database.

CONCLUSIONS: Since METEOR was founded 10 years ago, many rheumatologists worldwide have used the METEOR tool to follow up their patients and improve the quality of care they provide. Combined with uploaded data, this has led to extensive growth of the database. It now offers a unique opportunity to study daily-practice care and to perform research on cross-country differences in a large, worldwide setting, which could provide important knowledge about the disease and its treatment in different geographic and clinical settings.