329 research outputs found
Random Prism: An Alternative to Random Forests.
Ensemble learning techniques generate multiple classifiers, so-called base classifiers, whose combined classification results are used to increase the overall classification accuracy. In most ensemble classifiers the base classifiers are built with the Top Down Induction of Decision Trees (TDIDT) approach. An alternative approach for inducing rule-based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism classification rulesets achieve comparable and sometimes higher classification accuracy than decision tree classifiers if the data is noisy and large. Yet Prism still suffers from overfitting on noisy and large datasets. In practice, ensemble techniques tend to reduce overfitting; however, no ensemble learner exists for modular classification rule inducers such as the Prism family of algorithms. This article describes the first development of an ensemble learner based on the Prism family of algorithms, with the aim of enhancing Prism’s classification accuracy by reducing overfitting.
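The idea of combining base classifier results can be sketched as follows. This is a minimal illustration that assumes simple majority voting; the abstract does not state which combination scheme Random Prism uses, and the names `majority_vote` and `ensemble_predict` are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among the base classifiers' predictions."""
    if not predictions:
        raise ValueError("need at least one prediction")
    # most_common(1) yields a single (label, count) pair; ties are resolved
    # in favour of the label that was seen first.
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(base_classifiers, instance):
    """Ask every base classifier for a label, then combine the labels by voting.

    Each base classifier is assumed to be a callable mapping an instance to a
    label (an assumption made for this sketch).
    """
    return majority_vote([classify(instance) for classify in base_classifiers])
```

Any rule set or tree can be plugged in as a base classifier as long as it is wrapped in a callable that returns a label.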
A Machine Learning Based Analytical Framework for Semantic Annotation Requirements
The Semantic Web is an extension of the current web in which information is
given well-defined meaning. The aim of the Semantic Web is to improve the
quality and intelligence of the current web by converting its contents into
machine-understandable form. Therefore, semantic-level information is one of
the cornerstones of the Semantic Web. The process of adding semantic metadata
to web resources is called Semantic Annotation. Semantic Annotation faces many
obstacles, such as multilinguality, scalability, and
issues related to the diversity and inconsistency of content across different
web pages. Because of the wide range of domains and the dynamic environments in
which Semantic Annotation systems must operate, automating the
annotation process is one of the significant challenges in this domain. To
overcome this problem, different machine learning approaches have been used,
including supervised learning, unsupervised learning, and more recent ones such
as semi-supervised learning and active learning. In this paper we present an
inclusive layered classification of Semantic Annotation challenges and discuss
the most important issues in this field. We also review and analyze machine
learning applications for solving semantic annotation problems. To this end,
the article closely studies and categorizes related research for better
understanding and to reach a framework that can map machine learning techniques
onto the Semantic Annotation challenges and requirements.
Automated Construction of Relational Attributes ACORA: A Progress Report
Data mining research has not only developed a large number of algorithms, but also
enhanced our knowledge and understanding of their applicability and performance.
However, the application of data mining technology in business environments is still not
very common, despite the fact that organizations have access to large amounts of data
and make decisions that could profit from data mining on a daily basis. One of the
reasons is the mismatch between data representation for data storage and data analysis.
Data are most commonly stored in multi-table relational databases whereas data mining
methods require that the data be represented as a simple feature vector. This work
presents a general framework for feature construction from multiple relational tables for
data mining applications. The second part describes our prototype implementation
ACORA (Automated Construction of Relational Features).
Information Systems Working Papers Series
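The mismatch between multi-table storage and single feature vectors can be illustrated with a small sketch. It assumes a hypothetical two-table schema (customers and their transactions) and uses plain count, sum, and mean aggregates; it is not ACORA's method, whose details the abstract does not give.

```python
from collections import defaultdict


def build_feature_vectors(customers, transactions):
    """Flatten a one-to-many relation into one feature vector per customer.

    The table layout and field names ("customer_id", "age", "amount") are
    assumptions made for this example only.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    # Aggregate the child table by its foreign key.
    for row in transactions:
        totals[row["customer_id"]] += row["amount"]
        counts[row["customer_id"]] += 1

    vectors = {}
    for customer in customers:
        cid = customer["customer_id"]
        n = counts[cid]
        mean = totals[cid] / n if n else 0.0
        # Vector layout: [age, number of transactions, total amount, mean amount]
        vectors[cid] = [customer["age"], n, totals[cid], mean]
    return vectors
```

Each customer ends up with a fixed-length vector regardless of how many transaction rows refer to it, which is the representation most data mining methods expect.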
Meta learning of bounds on the Bayes classifier error
Meta learning uses information from base learners (e.g. classifiers or
estimators) as well as information about the learning problem to improve upon
the performance of a single base learner. For example, the Bayes error rate of
a given feature space, if known, can be used to aid in choosing a classifier,
as well as in feature selection and model selection for the base classifiers
and the meta classifier. Recent work in the field of f-divergence functional
estimation has led to the development of simple and rapidly converging
estimators that can be used to estimate various bounds on the Bayes error. We
estimate multiple bounds on the Bayes error using an estimator that applies
meta learning to slowly converging plug-in estimators to obtain the parametric
convergence rate. We compare the estimated bounds empirically on simulated data
and then estimate the tighter bounds on features extracted from an image patch
analysis of sunspot continuum and magnetogram images.
Comment: 6 pages, 3 figures, to appear in proceedings of 2015 IEEE Signal
Processing and SP Education Workshop
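The abstract does not say which bounds are estimated. As one classical example of a divergence-based bound on the Bayes error, the Bhattacharyya bound for two classes is shown below; the symbols $p$, $q$ (class priors with $p + q = 1$) and $f_1$, $f_2$ (class-conditional densities) are notation assumed for this illustration.

```latex
% Bayes error for two classes with priors p, q and densities f_1, f_2
P_e = \int \min\bigl(p\,f_1(x),\; q\,f_2(x)\bigr)\,dx

% Bhattacharyya coefficient
\mathrm{BC}(f_1, f_2) = \int \sqrt{f_1(x)\,f_2(x)}\,dx

% Lower and upper bounds on the Bayes error
\tfrac{1}{2} - \tfrac{1}{2}\sqrt{1 - 4pq\,\mathrm{BC}(f_1, f_2)^2}
\;\le\; P_e \;\le\;
\sqrt{pq}\;\mathrm{BC}(f_1, f_2)
```

The upper bound follows from $\min(a, b) \le \sqrt{ab}$ for nonnegative $a$ and $b$, so estimating the coefficient from data yields an estimate of the bound without estimating $P_e$ directly.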
Detecting Phishing E-mails by Heterogeneous Classification
This paper presents a system for classifying e-mails into two
categories, legitimate and fraudulent. This classifier system is based on the
serial application of three filters: a Bayesian filter that classifies the textual
content of e-mails, a rule-based filter that classifies the non-grammatical
content of e-mails and, finally, a filter based on an emulator of fictitious
accesses which classifies the responses from websites referenced by links
contained in e-mails. This system is based on an approach that is hybrid,
because it uses different classification methods, and also integrated, because it
takes into account all kinds of data and information contained in e-mails. This
approach aims to provide an effective and efficient classification. The system
first applies fast and reliable classification methods, and only when the resulting
classification decision is imprecise does the system apply more complex
analysis and classification methods.
Peer reviewed
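The serial arrangement described above, where costlier filters run only if the cheaper ones are undecided, can be sketched as a cascade. This is a minimal sketch under stated assumptions: each filter is a hypothetical callable returning a `(label, confidence)` pair, and the `threshold` that defines an "imprecise" decision is invented for the example.

```python
def cascade_classify(email, filters, threshold=0.9):
    """Apply filters in order, stopping at the first sufficiently confident one.

    `filters` is an ordered list of callables, cheapest first. Each returns
    (label, confidence) with confidence in [0, 1]. Both the interface and the
    default threshold are assumptions, not taken from the paper.
    """
    label, confidence = None, 0.0
    for classify in filters:
        label, confidence = classify(email)
        if confidence >= threshold:
            break  # decision is precise enough; skip the costlier filters
    # If no filter was confident, the last filter's verdict is returned.
    return label, confidence
```

In the system described, the three stages would be the Bayesian text filter, the rule-based filter, and the emulator of fictitious accesses, in that order.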