Classifier selection with permutation tests
This work presents a content-based recommender system for machine learning classifier algorithms. Given a new data set, it recommends the classifier likely to perform best, based on classifier performance over similar known data sets. Similarity is measured according to a data set characterization that includes several state-of-the-art metrics taking into account physical structure, statistics, and information theory. A novelty with respect to prior work is the use of a robust approach based on permutation tests to directly assess whether a given learning algorithm is able to exploit the attributes in a data set to predict class labels; this approach is compared with the more commonly used F-score metric for evaluating classifier performance. To evaluate our approach, we conducted extensive experiments covering 8 of the main machine learning classification methods with varying configurations and 65 binary data sets, leading to over 2331 experiments. Our results show that using the information from the permutation test clearly improves the quality of the recommendations. Peer Reviewed. Postprint (author's final draft)
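As an illustration of the general idea behind a label permutation test, the sketch below checks whether a classifier scores better on the true labels than on randomly shuffled ones. It is not the authors' implementation: the nearest-centroid classifier, the leave-one-out accuracy score, the synthetic two-cluster data, and all function names are assumptions chosen to keep the example self-contained.

```python
import random


def nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (labels 0/1)."""
    n_features = len(X[0])
    correct = 0
    for i in range(len(X)):
        # Build class centroids from every point except the held-out one.
        sums = {0: [0.0] * n_features, 1: [0.0] * n_features}
        counts = {0: 0, 1: 0}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            counts[label] += 1
            for k, value in enumerate(row):
                sums[label][k] += value
        # Predict the label of the closest centroid (squared Euclidean distance).
        best_label, best_dist = None, float("inf")
        for label in (0, 1):
            if counts[label] == 0:
                continue
            centroid = [s / counts[label] for s in sums[label]]
            dist = sum((a - b) ** 2 for a, b in zip(X[i], centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)


def permutation_test(X, y, score_fn, n_permutations=200, seed=0):
    """Return (observed score, p-value) under the null that labels are unrelated to X."""
    rng = random.Random(seed)
    observed = score_fn(X, y)
    at_least_as_good = 0
    shuffled = list(y)
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between attributes and labels
        if score_fn(X, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling itself, so the p-value is never 0.
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic, well-separated binary data set (an assumption for the demo).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(20)]
X += [[data_rng.gauss(6, 1), data_rng.gauss(6, 1)] for _ in range(20)]
y = [0] * 20 + [1] * 20

observed, p_value = permutation_test(X, y, nearest_centroid_accuracy)
print(f"observed accuracy = {observed:.3f}, p-value = {p_value:.4f}")
```

A small p-value suggests the classifier is exploiting real structure in the attributes rather than chance; a recommender could use such a value as one signal alongside a score like the F-score.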
Algorithm Selection Framework for Cyber Attack Detection
The number of cyber threats against both wired and wireless computer systems
and other components of the Internet of Things continues to increase annually.
In this work, an algorithm selection framework is employed on the NSL-KDD data
set and a novel paradigm of machine learning taxonomy is presented. The
framework uses a combination of user input and meta-features to select the best
algorithm to detect cyber attacks on a network. Performance is compared between
a rule-of-thumb strategy and a meta-learning strategy. The framework removes
the conjecture of the common trial-and-error algorithm selection method. The
framework recommends five algorithms from the taxonomy. Both strategies
recommend a high-performing algorithm, though not the best performing. The work
demonstrates the close connection between algorithm selection and the
taxonomy on which it is premised. Comment: 6 pages, 7 figures, 1 table, accepted to WiseML '2
Knowledge Graph semantic enhancement of input data for improving AI
Intelligent systems designed using machine learning algorithms require a
large amount of labeled data. Background knowledge provides complementary,
real-world factual information that can augment the limited labeled data used
to train a machine learning algorithm. The term Knowledge Graph (KG) is in
vogue because, for many practical applications, it is convenient and useful to organize this
background knowledge in the form of a graph. Recent academic research and
implemented industrial intelligent systems have shown promising performance for
machine learning algorithms that combine training data with a knowledge graph.
In this article, we discuss the use of relevant KGs to enhance input data for
two applications that use machine learning -- recommendation and community
detection. The KG improves both accuracy and explainability.
Recommender Systems
The ongoing rapid expansion of the Internet greatly increases the necessity
of effective recommender systems for filtering the abundant information.
Extensive research on recommender systems is conducted by a broad range of
communities including social and computer scientists, physicists, and
interdisciplinary researchers. Despite substantial theoretical and practical
achievements, unification and comparison of different approaches are lacking,
which impedes further advances. In this article, we review recent developments
in recommender systems and discuss the major challenges. We compare and
evaluate available algorithms and examine their roles in future
developments. In addition to algorithms, physical aspects are described to
illustrate macroscopic behavior of recommender systems. Potential impacts and
future directions are discussed. We emphasize that recommendation has great
scientific depth and combines diverse research fields, which makes it of
interest to physicists as well as interdisciplinary researchers. Comment: 97 pages, 20 figures (To appear in Physics Reports)
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers, since
the class boundary must be defined using only knowledge of the positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data, the
algorithms used, and the application domains. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures