9 research outputs found
Recommending Learning Algorithms and Their Associated Hyperparameters
The success of machine learning on a given task dependson, among other
things, which learning algorithm is selected and its associated
hyperparameters. Selecting an appropriate learning algorithm and setting its
hyperparameters for a given data set can be a challenging task, especially for
users who are not experts in machine learning. Previous work has examined using
meta-features to predict which learning algorithm and hyperparameters should be
used. However, choosing a set of meta-features that are predictive of algorithm
performance is difficult. Here, we propose to apply collaborative filtering
techniques to learning algorithm and hyperparameter selection, and find that
doing so avoids determining which meta-features to use and outperforms
traditional meta-learning approaches in many cases.Comment: Short paper--2 pages, 2 table
An Easy to Use Repository for Comparing and Improving Machine Learning Algorithm Usage
The results from most machine learning experiments are used for a specific
purpose and then discarded. This results in a significant loss of information
and requires rerunning experiments to compare learning algorithms. This also
requires implementation of another algorithm for comparison, that may not
always be correctly implemented. By storing the results from previous
experiments, machine learning algorithms can be compared easily and the
knowledge gained from them can be used to improve their performance. The
purpose of this work is to provide easy access to previous experimental results
for learning and comparison. These stored results are comprehensive -- storing
the prediction for each test instance as well as the learning algorithm,
hyperparameters, and training set that were used. Previous results are
particularly important for meta-learning, which, in a broad sense, is the
process of learning from previous machine learning results such that the
learning process is improved. While other experiment databases do exist, one of
our focuses is on easy access to the data. We provide meta-learning data sets
that are ready to be downloaded for meta-learning experiments. In addition,
queries to the underlying database can be made if specific information is
desired. We also differ from previous experiment databases in that our
databases is designed at the instance level, where an instance is an example in
a data set. We store the predictions of a learning algorithm trained on a
specific training set for each instance in the test set. Data set level
information can then be obtained by aggregating the results from the instances.
The instance level information can be used for many tasks such as determining
the diversity of a classifier or algorithmically determining the optimal subset
of training instances for a learning algorithm.Comment: 7 pages, 1 figure, 6 table
OpenML: networked science in machine learning
Many sciences have made significant breakthroughs by adopting online tools
that help organize, structure and mine information that is too detailed to be
printed in journals. In this paper, we introduce OpenML, a place for machine
learning researchers to share and organize data in fine detail, so that they
can work more effectively, be more visible, and collaborate with others to
tackle harder problems. We discuss how OpenML relates to other examples of
networked science and what benefits it brings for machine learning research,
individual scientists, as well as students and practitioners.Comment: 12 pages, 10 figure
Semantic descriptor for intelligence services
The exposition and discovery of intelligence especially for connected devices and autonomous systems have become an important area of the research towards an all-intelligent world. In this article, it a semantic description of functions is proposed and used to provide intelligence services mainly for networked devices. The semantic descriptors aim to provide interoperability between multiple domains' vocabularies, data models, and ontologies, so that device applications become able to deploy them autonomously once they are onboarded in the device or system platform. The proposed framework supports the discovery, onboarding, and updating of the services by providing descriptions of their execution environment, software dependencies, policies and data inputs required, as well as the outputs produced, to enable application decoupling from the AI functions
ASlib: A Benchmark Library for Algorithm Selection
The task of algorithm selection involves choosing an algorithm from a set of
algorithms on a per-instance basis in order to exploit the varying performance
of algorithms over a set of instances. The algorithm selection problem is
attracting increasing attention from researchers and practitioners in AI. Years
of fruitful applications in a number of domains have resulted in a large amount
of data, but the community lacks a standard format or repository for this data.
This situation makes it difficult to share and compare different approaches
effectively, as is done in other, more established fields. It also
unnecessarily hinders new researchers who want to work in this area. To address
this problem, we introduce a standardized format for representing algorithm
selection scenarios and a repository that contains a growing number of data
sets from the literature. Our format has been designed to be able to express a
wide variety of different scenarios. Demonstrating the breadth and power of our
platform, we describe a set of example experiments that build and evaluate
algorithm selection models through a common interface. The results display the
potential of algorithm selection to achieve significant performance
improvements across a broad range of problems and algorithms.Comment: Accepted to be published in Artificial Intelligence Journa
Ontology of core data mining entities
In this article, we present OntoDM-core, an ontology of core data mining
entities. OntoDM-core defines themost essential datamining entities in a three-layered
ontological structure comprising of a specification, an implementation and an application
layer. It provides a representational framework for the description of mining
structured data, and in addition provides taxonomies of datasets, data mining tasks,
generalizations, data mining algorithms and constraints, based on the type of data.
OntoDM-core is designed to support a wide range of applications/use cases, such as
semantic annotation of data mining algorithms, datasets and results; annotation of
QSAR studies in the context of drug discovery investigations; and disambiguation of
terms in text mining. The ontology has been thoroughly assessed following the practices
in ontology engineering, is fully interoperable with many domain resources and
is easy to extend
Experiment databases: A new way to share, organize and learn from experiments
Thousands of machine learning research papers contain extensive experimental comparisons. However, the details of those experiments are often lost after publication, making it impossible to reuse these experiments in further research, or reproduce them to verify the claims made. In this paper, we present a collaboration framework designed to easily share machine learning experiments with the community, and automatically organize them in public databases. This enables immediate reuse of experiments for subsequent, possibly much broader investigation and offers faster and more thorough analysis based on a large set of varied results. We describe how we designed such an experiment database, currently holding over 650,000 classification experiments, and demonstrate its use by answering a wide range of interesting research questions and by verifying a number of recent studies