On the design of an ECOC-compliant genetic algorithm
Genetic Algorithms (GA) have previously been applied to Error-Correcting Output Codes (ECOC) in state-of-the-art works in order to find a suitable coding matrix. Nevertheless, none of the presented techniques directly takes into account the properties of the ECOC matrix. As a result, the considered search space is unnecessarily large. In this paper, a novel genetic strategy to optimize the ECOC coding step is presented. This strategy redefines the usual crossover and mutation operators in order to take into account the theoretical properties of the ECOC framework. Thus, it reduces the search space and lets the algorithm converge faster. In addition, a novel operator that is able to enlarge the code in a smart way is introduced. The methodology is tested on several UCI datasets and four challenging computer vision problems. Furthermore, an analysis of the results in terms of performance, code length, and number of Support Vectors (SVs) shows that the optimization process is able to find very efficient codes in terms of the trade-off between classification performance and the number of classifiers. Finally, the results on classification performance per dichotomizer show that the proposal obtains similar or even better results while defining a more compact set of dichotomies and SVs than state-of-the-art approaches.
Chromatic PAC-Bayes Bounds for Non-IID Data: Applications to Ranking and Stationary β-Mixing Processes
Pac-Bayes bounds are among the most accurate generalization bounds for
classifiers learned from independently and identically distributed (IID) data,
and it is particularly so for margin classifiers: there have been recent
contributions showing how practical these bounds can be either to perform model
selection (Ambroladze et al., 2007) or even to directly guide the learning of
linear classifiers (Germain et al., 2009). However, there are many practical
situations where the training data show some dependencies and where the
traditional IID assumption does not hold. Stating generalization bounds for
such frameworks is therefore of the utmost interest, both from theoretical and
practical standpoints. In this work, we propose the first - to the best of our
knowledge - Pac-Bayes generalization bounds for classifiers trained on data
exhibiting interdependencies. The approach undertaken to establish our results
is based on the decomposition of a so-called dependency graph that encodes the
dependencies within the data, in sets of independent data, thanks to graph
fractional covers. Our bounds are very general, since being able to find an
upper bound on the fractional chromatic number of the dependency graph is
sufficient to get new Pac-Bayes bounds for specific settings. We show how our
results can be used to derive bounds for ranking statistics (such as Auc) and
classifiers trained on data distributed according to a stationary β-mixing
process. Along the way, we show how our approach seamlessly allows us to deal with
U-processes. As a side note, we also provide a Pac-Bayes generalization bound
for classifiers learned on data from stationary φ-mixing distributions.
Comment: Long version of the AISTATS 09 paper:
http://jmlr.csail.mit.edu/proceedings/papers/v5/ralaivola09a/ralaivola09a.pd
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.
Comment: Accepted for publication in ACM Computing Surveys
Analysis of group evolution prediction in complex networks
In a world in which acceptance and identification with social
communities are highly desired, the ability to predict evolution of groups over
time appears to be a vital but very complex research problem. Therefore, we
propose a new, adaptable, generic and multi-stage method for Group Evolution
Prediction (GEP) in complex networks that facilitates reasoning about the
future states of the recently discovered groups. The precise GEP modularity
enabled us to carry out extensive and versatile empirical studies on many
real-world complex / social networks to analyze the impact of numerous setups
and parameters like time window type and size, group detection method,
evolution chain length, prediction models, etc. Additionally, many new
predictive features reflecting the group state at a given time have been
identified and tested. Some other research problems like enriching learning
evolution chains with external data have been analyzed as well.