Algorithm Selection Framework for Cyber Attack Detection
The number of cyber threats against both wired and wireless computer systems
and other components of the Internet of Things continues to increase annually.
In this work, an algorithm selection framework is employed on the NSL-KDD data
set and a novel paradigm of machine learning taxonomy is presented. The
framework uses a combination of user input and meta-features to select the best
algorithm to detect cyber attacks on a network. Performance is compared between
a rule-of-thumb strategy and a meta-learning strategy. The framework removes
the conjecture of the common trial-and-error algorithm selection method. The
framework recommends five algorithms from the taxonomy. Both strategies
recommend a high-performing algorithm, though not the best-performing one. The work
demonstrates the close connection between algorithm selection and the
taxonomy on which it is premised.
Comment: 6 pages, 7 figures, 1 table, accepted to WiseML '2
Toward Intelligent Machine Learning Algorithms
Coordinated Science Laboratory was formerly known as Control Systems LaboratoryNational Science Foundation / NSF IST-85-11170Office of Naval Research / N00014-82-K-0186Defense Advanced Research Projects Agency / N00014-87-K-0874Texas Instruments, Inc
A Recommendation System for Meta-modeling: A Meta-learning Based Approach
Various meta-modeling techniques have been developed to replace computationally expensive simulation models. The performance of these meta-modeling techniques varies across models, which makes existing model selection/recommendation approaches (e.g., trial-and-error, ensemble) problematic. To address these research gaps, we propose a general meta-modeling recommendation system using meta-learning, which can automate the meta-modeling recommendation process by intelligently adapting the learning bias to problem characterizations. The proposed intelligent recommendation system includes four modules: (1) a problem module; (2) a meta-feature module, which includes a comprehensive set of meta-features to characterize the geometrical properties of problems; (3) a meta-learner module, which compares the performance of instance-based and model-based learning approaches for optimal framework design; and (4) a performance evaluation module, which introduces two criteria, Spearman's ranking correlation coefficient and hit ratio, to evaluate the system on the accuracy of model ranking prediction and the precision of the best model recommendation, respectively. To further improve the performance of meta-learning for meta-modeling recommendation, different types of feature reduction techniques, including singular value decomposition, stepwise regression, and ReliefF, are studied. Experiments show that our proposed framework is able to achieve 94% correlation on model rankings and a 91% hit ratio on best model recommendation. Moreover, the computational cost of meta-modeling recommendation is significantly reduced from the order of minutes to seconds compared to the traditional trial-and-error and ensemble processes. The proposed framework can significantly advance research in meta-modeling recommendation and can be applied to data-driven system modeling.
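The two evaluation criteria can be illustrated with a small sketch; the rankings and meta-model names below are made up for demonstration, not the paper's data:

```python
# Illustrative sketch (not the paper's code) of the two evaluation criteria:
# Spearman's rank correlation between predicted and true model rankings,
# and hit ratio (did the top predicted model match the true best?).

def spearman(rank_pred, rank_true):
    """Spearman's rho for two rankings with no ties."""
    n = len(rank_pred)
    d2 = sum((p - t) ** 2 for p, t in zip(rank_pred, rank_true))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def hit_ratio(predicted, true):
    """Fraction of problems where the predicted best model is the true best."""
    hits = sum(p[0] == t[0] for p, t in zip(predicted, true))
    return hits / len(predicted)

# Rankings of four meta-models (1 = best) on one test problem.
print(round(spearman([1, 2, 3, 4], [1, 3, 2, 4]), 3))  # -> 0.8

# Two test problems: predicted vs. true orderings of meta-model names.
print(hit_ratio([["kriging", "rbf"], ["svr", "rbf"]],
                [["kriging", "svr"], ["rbf", "svr"]]))  # -> 0.5
```

Spearman's rho rewards getting the whole ordering right, while hit ratio only scores the single top recommendation, which is why the paper reports both.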
Algorithm Selection Framework: A Holistic Approach to the Algorithm Selection Problem
A holistic approach to the algorithm selection problem is presented. The algorithm selection framework uses a combination of user input and meta-data to streamline algorithm selection for any data analysis task. The framework removes the conjecture of the common trial-and-error strategy and generates a preference-ranked list of recommended analysis techniques. The framework is applied to nine analysis problems, and each of the recommended analysis techniques is implemented on the corresponding data sets. Algorithm performance is assessed using the primary metric of recall and the secondary metric of run time. In six of the problems, the recall of the top-ranked recommendation is considered excellent, achieving at least 95 percent of the best observed recall; the average of this metric is 79 percent due to two poorly performing recommendations. The top recommendation is Pareto efficient for three of the problems. The framework measures well against an a priori set of criteria. The framework provides value by filtering the candidate set of analytic techniques and, often, selecting a high-performing technique as the top-ranked recommendation. The user input and meta-data used by the framework contain information with high potential for effective algorithm selection. Future work should optimize the recommendation logic and expand the scope of techniques to other types of analysis problems. Further, the results of this proposed study should be leveraged to better understand the behavior of meta-learning models.
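Pareto efficiency on the (recall, run time) trade-off can be checked with a short sketch. The technique names and numbers below are illustrative, not the paper's results:

```python
# Hedged sketch: which recommendations are Pareto efficient when
# higher recall is better and lower run time is better?

def pareto_efficient(candidates):
    """Return the names of techniques not dominated on (recall, runtime)."""
    front = []
    for name, recall, runtime in candidates:
        dominated = any(
            r2 >= recall and t2 <= runtime and (r2 > recall or t2 < runtime)
            for _, r2, t2 in candidates
        )
        if not dominated:
            front.append(name)
    return front

results = [  # (technique, recall, run time in seconds) -- made-up values
    ("svm", 0.97, 12.0),
    ("random_forest", 0.95, 3.0),
    ("naive_bayes", 0.80, 0.5),
    ("knn", 0.90, 8.0),  # dominated by random_forest (lower recall, slower)
]
print(pareto_efficient(results))  # -> ['svm', 'random_forest', 'naive_bayes']
```

A top recommendation is "Pareto efficient" in this sense when no other candidate beats it on one metric without losing on the other.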
An intelligent decision support system for machine learning algorithms recommendation
Machine learning is a central topic in Artificial Intelligence, and in
computer science in general, and its use in Big Data problems is now well
established. However, just as big data problems, and machine learning problems
in general, are varied and call for different kinds of solutions, there are
also many different machine learning methods that can be used. In this
work, we propose an application that helps decide which machine
learning methods a user needs for a specified problem.
The application is an Intelligent Decision Support System for Machine
Learning Algorithm Recommendation. We present its design, which
is centered around the combined use of Case-Based Reasoning and Rule-Based
Reasoning for the recommendation process, while also aiming to make
the system easy to use and manage. We present a prototype of such a system,
along with the implementation details of the two recommender algorithms.
Preliminary testing of the prototype shows it to be a promising tool.
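One plausible way the two recommenders could combine is sketched below: rules filter the candidate methods, then case-based reasoning ranks the survivors by similarity to previously solved problems. The rules, cases, and method names are hypothetical, not taken from the system described:

```python
# Hypothetical sketch of a rule-based filter plus case-based ranking.
# All rules, cases, and method names are illustrative.

CASES = [  # (problem description, method that worked on it)
    ({"task": "classification", "size": "large"}, "random_forest"),
    ({"task": "classification", "size": "small"}, "svm"),
    ({"task": "clustering", "size": "large"}, "kmeans"),
]

RULES = {  # rule-based reasoning: task type -> methods the rules allow
    "classification": {"random_forest", "svm", "naive_bayes"},
    "clustering": {"kmeans", "dbscan"},
}

def recommend_method(problem):
    allowed = RULES[problem["task"]]          # rule-based filter
    def similarity(case):                     # case-based ranking
        desc, _ = case
        return sum(desc[k] == problem.get(k) for k in desc)
    for _, method in sorted(CASES, key=similarity, reverse=True):
        if method in allowed:
            return method

print(recommend_method({"task": "classification", "size": "large"}))
# -> random_forest
```

The appeal of the hybrid is that rules encode hard constraints cheaply, while cases capture experience the rules cannot articulate.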
On the predictive power of meta-features in OpenML
The demand for performing data analysis is steadily rising. As a consequence, people with different profiles (i.e., non-experienced users) have started to analyze their data. However, this is challenging for them. A key step that poses difficulties and determines the success of the analysis is data mining (the model/algorithm selection problem). Meta-learning is a technique used to assist non-expert users in this step. The effectiveness of meta-learning, however, depends largely on the description/characterization of datasets (i.e., the meta-features used for meta-learning). There is a need to improve the effectiveness of meta-learning by identifying and designing more predictive meta-features. In this work, we use a method from exploratory factor analysis to study the predictive power of different meta-features collected in OpenML, a collaborative machine learning platform designed to store and organize meta-data about datasets, data mining algorithms, models, and their evaluations. We first use the method to extract latent features: abstract concepts that group together meta-features with common characteristics. Then, we study and visualize the relationship of the latent features with three different performance measures of four classification algorithms on hundreds of datasets available in OpenML, and we select the latent features with the highest predictive power. Finally, we use the selected latent features to perform meta-learning and show that our method improves the meta-learning process. Furthermore, we design an easy-to-use application for retrieving different meta-data from OpenML, the biggest source of data in this domain.
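The latent-feature idea can be sketched with synthetic data. Here a truncated SVD stands in for the exploratory factor analysis used in the paper, and the meta-features are simulated rather than drawn from OpenML:

```python
# Hedged sketch: compress correlated dataset meta-features into a few
# latent features. SVD is used here as a simple stand-in for exploratory
# factor analysis; the data are simulated, not real OpenML meta-features.
import numpy as np

rng = np.random.default_rng(0)
# 100 "datasets", each described by 6 correlated meta-features
# generated from 2 underlying latent concepts plus noise.
concepts = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 6))   # how concepts express as meta-features
meta = concepts @ loadings + 0.1 * rng.normal(size=(100, 6))

# Extract 2 latent features via truncated SVD of the centered matrix.
centered = meta - meta.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
latent = centered @ vt[:2].T         # 100 datasets x 2 latent features
print(latent.shape)                  # -> (100, 2)
```

The recovered latent features then replace the raw meta-features as inputs to the meta-learner, which is the substitution the paper evaluates.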
A Model of Inductive Bias Learning
A major problem in machine learning is that of inductive bias: how to choose
a learner's hypothesis space so that it is large enough to contain a solution
to the problem being learnt, yet small enough to ensure reliable generalization
from reasonably-sized training sets. Typically such bias is supplied by hand
through the skill and insights of experts. In this paper a model for
automatically learning bias is investigated. The central assumption of the
model is that the learner is embedded within an environment of related learning
tasks. Within such an environment the learner can sample from multiple tasks,
and hence it can search for a hypothesis space that contains good solutions to
many of the problems in the environment. Under certain restrictions on the set
of all hypothesis spaces available to the learner, we show that a hypothesis
space that performs well on a sufficiently large number of training tasks will
also perform well when learning novel tasks in the same environment. Explicit
bounds are also derived demonstrating that learning multiple tasks within an
environment of related tasks can potentially give much better generalization
than learning a single task.