A Very Brief and Critical Discussion on AutoML
This contribution presents a very brief and critical discussion on automated
machine learning (AutoML), which is categorized here into two classes, referred
to as narrow AutoML and generalized AutoML, respectively. The conclusions
yielded from this discussion can be summarized as follows: (1) most existing
research on AutoML belongs to the class of narrow AutoML; (2) advances in
narrow AutoML are mainly motivated by commercial needs, and any benefit
obtained comes at the cost of an increase in computational burden; (3) the
concept of generalized AutoML is closely tied in spirit to artificial general
intelligence (AGI), also called "strong AI", for which obstacles to pivotal
progress abound.
Comment: 5 pages
autoBagging: Learning to Rank Bagging Workflows with Metalearning
Machine Learning (ML) has been successfully applied to a wide range of
domains and applications. One of the techniques behind most of these successful
applications is Ensemble Learning (EL), the field of ML that gave birth to
methods such as Random Forests or Boosting. The complexity of applying these
techniques, together with the scarcity of ML experts in the market, has created
the need for systems that enable a fast and easy drop-in replacement for ML
libraries. Automated machine learning (autoML) is the field of ML that attempts
to answer these needs. Typically, these systems rely on optimization
techniques such as Bayesian optimization to guide the search for the best model.
Our approach differs from these systems by making use of the most recent
advances in metalearning and a learning-to-rank approach to learn from
metadata. We propose autoBagging, an autoML system that automatically ranks 63
bagging workflows by exploiting past performance and dataset characterization.
Results on 140 classification datasets from the OpenML platform show that
autoBagging can yield better performance than the Average Rank method and
achieve results that are not statistically different from an ideal model that
systematically selects the best workflow for each dataset. For the purpose of
reproducibility and generalizability, autoBagging is publicly available as an R
package on CRAN.
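The metalearning idea behind autoBagging can be illustrated with a minimal sketch: describe each past dataset by metafeatures, record how each bagging workflow performed on it, and rank workflows for a new dataset by transferring the ranking from the most similar past dataset. This is a deliberate simplification (a nearest-neighbor stand-in for the paper's learning-to-rank model); the workflow names, metafeatures, and scores below are all invented for illustration.

```python
# Toy metalearning-based workflow ranking, in the spirit of autoBagging:
# recommend workflows for a new dataset by transferring the ranking from
# the most similar previously seen dataset (nearest-neighbor simplification).

def euclidean(a, b):
    # Distance between two metafeature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def rank_workflows(past, new_metafeatures):
    """past: list of (metafeatures, {workflow: score}) from prior runs."""
    nearest = min(past, key=lambda d: euclidean(d[0], new_metafeatures))
    scores = nearest[1]
    # Return workflows ordered best-first by their score on the nearest dataset.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical metadata: two past datasets, metafeatures = (n_rows, n_features).
past_runs = [
    ((1000, 10), {"bag-dt": 0.91, "bag-knn": 0.85, "bag-svm": 0.88}),
    ((50, 200), {"bag-dt": 0.70, "bag-knn": 0.60, "bag-svm": 0.82}),
]
print(rank_workflows(past_runs, (900, 12)))
```

A large, tall dataset lands near the first past dataset, so its workflow ranking is reused; a small, wide one inherits the second dataset's ranking instead.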
Taking Human out of Learning Applications: A Survey on Automated Machine Learning
Machine learning techniques have become deeply rooted in our everyday life.
However, since pursuing good learning performance is knowledge- and
labor-intensive, human experts are heavily involved in every aspect of machine
learning. In
order to make machine learning techniques easier to apply and reduce the demand
for experienced human experts, automated machine learning (AutoML) has emerged
as a hot topic with both industrial and academic interest. In this paper, we
provide an up-to-date survey on AutoML. First, we introduce and define the
AutoML problem, with inspiration from both realms of automation and machine
learning. Then, we propose a general AutoML framework that not only covers most
existing approaches to date but also can guide the design for new methods.
Subsequently, we categorize and review the existing works from two aspects,
i.e., the problem setup and the employed techniques. Finally, we provide a
detailed analysis of AutoML approaches and explain the reasons underlying their
successful applications. We hope this survey can serve not only as an
insightful guideline for AutoML beginners but also as an inspiration for future
research.
Comment: This is a preliminary version and will be kept updated
HARK Side of Deep Learning -- From Grad Student Descent to Automated Machine Learning
Recent advancements in machine learning research, i.e., deep learning, have
introduced methods that outperform conventional algorithms as well as humans in
several complex tasks, ranging from the detection of objects in images and
speech recognition to playing difficult strategic games. However, the current
methodology of machine learning research, and consequently the real-world
implementations of such algorithms, seem to have a recurring
HARKing (Hypothesizing After the Results are Known) issue. In this work, we
elaborate on the algorithmic, economic and social reasons and consequences of
this phenomenon. We present examples from current common practices of
conducting machine learning research (e.g., the avoidance of reporting negative
results) and the failure of the proposed algorithms and datasets to generalize
in actual real-life usage. Furthermore, a potential future trajectory
of machine learning research and development from the perspective of
accountable, unbiased, ethical and privacy-aware algorithmic decision making is
discussed. We would like to emphasize that with this discussion we neither
claim to provide an exhaustive argumentation nor blame any specific institution
or individual for the raised issues. This is simply a discussion put forth by
us, insiders of the machine learning field, reflecting on ourselves.
Comment: 13 pages
Automated Machine Learning -- a brief review at the end of the early years
Automated machine learning (AutoML) is the sub-field of machine learning that
aims at automating, to some extent, all stages of the design of a machine
learning system. In the context of supervised learning, AutoML is concerned
with feature extraction, preprocessing, model design and postprocessing.
Major contributions and achievements in AutoML have taken place during
the recent decade. It is therefore the perfect time to look back and take
stock of what we have learned. This chapter aims to summarize the main findings
in the early years of AutoML. More specifically, it provides an introduction to
AutoML for supervised learning and a historical review of progress in this
field. Likewise, the main paradigms of AutoML are described and research
opportunities are outlined.
Comment: Preprint submitted to Springer
Demystifying a Dark Art: Understanding Real-World Machine Learning Model Development
It is well-known that the process of developing machine learning (ML)
workflows is a dark art; even experts struggle to find an optimal workflow
leading to a high-accuracy model. Users currently rely on empirical
trial-and-error to obtain their own set of battle-tested guidelines to inform
their modeling decisions. In this study, we aim to demystify this dark art by
understanding how people iterate on ML workflows in practice. We analyze over
475k user-generated workflows on OpenML, an open-source platform for tracking
and sharing ML workflows. We find that users often adopt a manual, automated,
or mixed approach when iterating on their workflows. We observe that manual
approaches result in fewer wasted iterations compared to automated approaches.
Yet, automated approaches often explore more preprocessing and hyperparameter
options, resulting in higher overall performance, suggesting potential
benefits for a human-in-the-loop ML system that appropriately recommends a
clever combination of the two strategies.
A Level-wise Taxonomic Perspective on Automated Machine Learning to Date and Beyond: Challenges and Opportunities
Automated machine learning (AutoML) is essentially automating the process of
applying machine learning to real-world problems. The primary goals of AutoML
tools are to provide methods and processes to make Machine Learning available
for non-Machine Learning experts (domain experts), to improve efficiency of
Machine Learning and to accelerate research on Machine Learning. Although
automation and efficiency are some of AutoML's main selling points, the process
still requires a surprising level of human involvement. A number of vital steps
of the machine learning pipeline, including understanding the attributes of
domain-specific data, defining prediction problems, creating a suitable
training data set, etc., still tend to be done manually by a data scientist on
an ad-hoc basis. Often, this process requires a lot of back-and-forth between the
data scientist and domain experts, making the whole process more difficult and
inefficient. Altogether, AutoML systems are still far from a "real automatic
system". In this review article, we present a level-wise taxonomic perspective
on AutoML systems to-date and beyond, i.e., we introduce a new classification
system with seven levels to distinguish AutoML systems based on their level of
autonomy. We first discuss what an end-to-end machine learning pipeline
actually looks like and which of its sub-tasks have indeed been automated so
far. Next, we highlight the sub-tasks which are still done manually by a data
scientist in most cases and how that limits a domain expert's access to machine
learning. Then, we introduce the novel level-based taxonomy of AutoML systems
and define each level according to its scope of automation support. Finally, we
provide a road-map for future research endeavors in the area of AutoML and
discuss some important challenges in achieving this ambitious goal.
Comment: 35 pages, survey article, 3 figures
Automatic Model Selection for Neural Networks
Neural networks and deep learning are changing the way that artificial
intelligence is done. Efficiently choosing a suitable network
architecture and fine-tuning its hyper-parameters for a specific dataset is a
time-consuming task given the staggering number of possible alternatives. In
this paper, we address the problem of model selection by means of a fully
automated framework for efficiently selecting a neural network model for a
given task: classification or regression. The algorithm, named Automatic Model
Selection (AMS), is a modified micro-genetic algorithm that automatically and
efficiently finds the most suitable neural network model for a given dataset.
The main contributions of this method are a simple list-based encoding of
neural networks as genotypes in an evolutionary algorithm, new crossover and
mutation operators, the introduction of a fitness function that considers both
the accuracy of the model and its complexity, and a method to measure the
similarity between two neural networks. AMS is evaluated on two different
datasets. By comparing some models obtained with AMS to state-of-the-art models
for each dataset, we show that AMS can automatically find efficient neural
network models. Furthermore, AMS is computationally efficient and can make use
of distributed computing paradigms to further boost its performance.
Comment: 31 pages, 6 figures. Preprint submitted to Elsevier Neural Networks
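The evolutionary search that the AMS abstract describes can be sketched in a few lines: genotypes are lists of hidden-layer widths, fitness rewards accuracy while penalizing complexity, and a micro-genetic loop keeps a tiny population around an elite individual. This is only an illustrative toy, not the paper's implementation: the accuracy function is mocked (no network is trained), and all parameter names and values are assumptions.

```python
# Toy micro-genetic search over list-encoded network architectures, in the
# spirit of AMS. A genotype is a list of hidden-layer widths; fitness trades
# off a (mocked) accuracy estimate against a parameter-count proxy.
import random

def fitness(genotype, accuracy_fn, complexity_weight=0.001):
    # Reward accuracy, penalize total width as a crude complexity proxy.
    return accuracy_fn(genotype) - complexity_weight * sum(genotype)

def crossover(a, b):
    # One-point crossover over the two layer-width lists.
    cut = random.randint(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]

def mutate(genotype, width_choices=(8, 16, 32, 64)):
    # Replace one randomly chosen layer width.
    g = list(genotype)
    g[random.randrange(len(g))] = random.choice(width_choices)
    return g

def micro_ga(accuracy_fn, generations=30, pop_size=5, seed=0):
    random.seed(seed)
    pop = [[random.choice((8, 16, 32, 64)) for _ in range(2)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, accuracy_fn), reverse=True)
        elite = pop[0]  # a micro-GA keeps a tiny population around the elite
        pop = [elite] + [mutate(crossover(elite, random.choice(pop)))
                         for _ in range(pop_size - 1)]
    return max(pop, key=lambda g: fitness(g, accuracy_fn))

# Mock accuracy: pretend a [32, 16] architecture is ideal for this dataset.
def mock_accuracy(g):
    target = [32, 16]
    return 1.0 - 0.01 * sum(abs(x - y) for x, y in zip(g, target))

best = micro_ga(mock_accuracy)
print(best)
```

Keeping the elite unchanged each generation makes the search monotone in fitness, which is the usual justification for the very small populations that give micro-genetic algorithms their name.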
A Brief Survey of Associations Between Meta-Learning and General AI
This paper briefly reviews the history of meta-learning and describes its
contribution to general AI. Meta-learning improves model generalization
capacity and potentially devises general algorithms applicable to both
in-distribution and out-of-distribution tasks. General AI replaces
task-specific models with general algorithmic systems, introducing a higher
level of automation in solving diverse tasks using AI. We summarize the main
contributions of meta-learning to developments in general AI, including memory
modules, meta-learners, coevolution, curiosity, forgetting and AI-generating
algorithms. We present connections between meta-learning and general AI and
discuss how meta-learning can be used to formulate general AI algorithms.
Towards A Domain-Customized Automated Machine Learning Framework For Networks and Systems
Clouds gather a vast volume of telemetry from their networked systems, which
contains valuable information that can help solve many of the problems that
continue to plague them. However, it is hard to extract useful information from
such raw data. Machine Learning (ML) models are useful tools that enable
operators to either leverage this data to solve such problems or develop
intuition about whether/how they can be solved. Building practical ML models is
time-consuming and requires experts in both ML and networked systems to tailor
the model to the system/network (a.k.a. "domain-customize" it). The number of
applications we deploy exacerbates the problem. The speed with which our
systems evolve and with which new monitoring systems are deployed (or
deprecated) means these models often need to be adapted to keep up. Today, the
lack of individuals with both sets of expertise is becoming one of the
bottlenecks to adopting ML in cloud operations. This paper argues it is
possible to build a domain-customized automated ML framework for networked
systems that can help save valuable operator time and effort.