Improved Second-Order Bounds for Prediction with Expert Advice
This work studies external regret in sequential prediction games with both
positive and negative payoffs. External regret measures the difference between
the payoff obtained by the forecasting strategy and the payoff of the best
action. In this setting, we derive new and sharper regret bounds for the
well-known exponentially weighted average forecaster and for a new forecaster
with a different multiplicative update rule. Our analysis has two main
advantages: first, no preliminary knowledge about the payoff sequence is
needed, not even its range; second, our bounds are expressed in terms of sums
of squared payoffs, replacing larger first-order quantities appearing in
previous bounds. In addition, our most refined bounds have the natural and
desirable property of being stable under rescalings and general translations of
the payoff sequence.
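As a point of reference, the exponentially weighted average forecaster named above is the classical multiplicative-weights algorithm; a minimal sketch with a fixed learning rate eta is given below. The paper's contribution is precisely the sharper second-order, range-free analysis and a modified multiplicative update, neither of which is reproduced here.

import numpy as np

def exponentially_weighted_forecaster(payoffs, eta=0.1):
    # Minimal sketch of the classical exponentially weighted average
    # forecaster with a fixed learning rate; payoffs has shape (T, N),
    # one row per round and one column per expert/action.
    payoffs = np.asarray(payoffs, dtype=float)
    T, N = payoffs.shape
    cum = np.zeros(N)                        # cumulative payoff of each expert
    forecaster_payoff = 0.0
    for t in range(T):
        w = np.exp(eta * (cum - cum.max()))  # shifted for numerical stability
        p = w / w.sum()                      # weights used at round t
        forecaster_payoff += p @ payoffs[t]  # payoff of the weighted average
        cum += payoffs[t]
    return forecaster_payoff, cum.max()      # forecaster vs. best single action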
Hierarchical cost-sensitive algorithms for genome-wide gene function prediction
In this work we propose new ensemble methods for the hierarchical classification of gene functions. Our methods exploit the hierarchical relationships between the classes in different ways: each ensemble node is trained “locally”, according to its position in the hierarchy; moreover, in the evaluation phase the set of predicted annotations is built so as to minimize a global loss function defined over the hierarchy. We also address the problem of sparsity of annotations by introducing a cost-sensitive parameter that allows the precision-recall trade-off to be controlled. Experiments with the model organism S. cerevisiae, using the FunCat taxonomy and 7 biomolecular data sets, reveal a significant advantage of our techniques over “flat” and cost-insensitive hierarchical ensembles.
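For illustration only (this is not the paper's exact ensemble construction), a simple top-down decision rule with a cost-sensitive threshold conveys the two ingredients named above: hierarchy-consistent predictions and a tunable precision-recall trade-off.

def hierarchy_depth(c, parent):
    # Number of ancestors of class c (roots have depth 0).
    d = 0
    while parent[c] is not None:
        c, d = parent[c], d + 1
    return d

def hierarchical_predict(scores, parent, threshold=0.5):
    # Illustrative top-down rule: a class is predicted only if its score
    # passes a cost-sensitive threshold AND its parent has already been
    # predicted, so the annotation set is consistent with the hierarchy.
    # Lowering the threshold trades precision for recall on sparse classes.
    # scores: dict class -> score from the local ("node-level") classifier
    # parent: dict class -> parent class, with roots mapped to None
    predicted = set()
    for c in sorted(scores, key=lambda c: hierarchy_depth(c, parent)):
        parent_ok = parent[c] is None or parent[c] in predicted
        if parent_ok and scores[c] >= threshold:
            predicted.add(c)
    return predicted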
Competing with stationary prediction strategies
In this paper we introduce the class of stationary prediction strategies and
construct a prediction algorithm that asymptotically performs as well as the
best continuous stationary strategy. We make mild compactness assumptions but
no stochastic assumptions about the environment. In particular, no assumption
of stationarity is made about the environment, and the stationarity of the
considered strategies only means that they do not depend explicitly on time; we
argue that it is natural to consider only stationary strategies even for highly
non-stationary environments.
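One natural way to formalize "not depending explicitly on time" (an assumed reading for illustration; the paper's precise definitions may differ) is to require that a single fixed map $S$ produce every prediction from the observed history:

\gamma_t \;=\; S(\omega_{t-1}, \omega_{t-2}, \dots), \qquad t = 1, 2, \dots,

with the same map $S$ applied at every round, as opposed to a family of time-indexed maps $S_t$.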
Functional inference in FunCat through the combination of hierarchical ensembles with data fusion methods
The multi-label hierarchical prediction of
gene functions at genome and ontology-wide
level is a central problem in bioinformatics, and raises challenging questions from a machine learning standpoint. In this context, multi-label hierarchical ensemble methods that take into account the hierarchical relationships between functional classes have been recently proposed. Various studies also showed that the integration of multiple sources of data is one of the key issues
to significantly improve gene function prediction. We propose an integrated approach
that combines local data fusion strategies
with global hierarchical multi-label methods.
The label imbalance typically occurring in
gene functional classes is taken into account
through the use of cost-sensitive techniques.
Ontology-wide results with the yeast model
organism, using the FunCat taxonomy, show
the effectiveness of the proposed methodological approach.
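A hedged sketch of the "local" fusion step is given below; the simple weighted averaging and the function name are assumptions for illustration, not the paper's specific fusion strategies. The fused scores would then be passed to a hierarchy-aware, cost-sensitive decision rule such as the top-down sketch shown earlier.

import numpy as np

def local_fusion(per_source_scores, weights=None):
    # Illustrative "local" data fusion: a (weighted) average, per functional
    # class, of the scores obtained from each biomolecular data source.
    # per_source_scores: list of dicts (one per data source), class -> score
    # weights: optional per-source reliabilities; defaults to uniform
    n = len(per_source_scores)
    if weights is None:
        w = np.full(n, 1.0 / n)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
    return {c: float(sum(wi * s[c] for wi, s in zip(w, per_source_scores)))
            for c in per_source_scores[0]}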
A correlation clustering approach to link classification in signed networks
Motivated by social balance theory, we develop a theory of link classification in signed networks using the correlation clustering index as a measure of label regularity. We derive learning bounds in terms of correlation clustering within three fundamental transductive learning settings: online, batch, and active. Our main algorithmic contribution is in the active setting, where we introduce a new family of efficient link classifiers based on covering the input graph with small circuits. These are the first active algorithms for link classification with mistake bounds that hold for arbitrary signed networks.
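The correlation clustering index mentioned above is the minimum number of edge-sign disagreements over all clusterings of the nodes; the helper below (an illustrative sketch, not the paper's classifiers) evaluates that disagreement count for one candidate clustering.

def disagreements(edges, clustering):
    # Disagreement count of a signed graph with respect to one clustering:
    # a +1 edge disagrees when its endpoints fall in different clusters,
    # a -1 edge disagrees when they fall in the same cluster. The
    # correlation clustering index is the minimum of this count over all
    # clusterings (finding that minimum exactly is NP-hard; this helper
    # only evaluates a single candidate clustering).
    # edges: iterable of (u, v, sign) triples with sign in {+1, -1}
    # clustering: dict node -> cluster id
    bad = 0
    for u, v, sign in edges:
        same = clustering[u] == clustering[v]
        if (sign > 0) != same:
            bad += 1
    return bad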
Leading strategies in competitive on-line prediction
We start from a simple asymptotic result for the problem of on-line
regression with the quadratic loss function: the class of continuous
limited-memory prediction strategies admits a "leading prediction strategy",
which not only asymptotically performs at least as well as any continuous
limited-memory strategy but also satisfies the property that the excess loss of
any continuous limited-memory strategy is determined by how closely it imitates
the leading strategy. More specifically, for any class of prediction strategies
constituting a reproducing kernel Hilbert space we construct a leading
strategy, in the sense that the loss of any prediction strategy whose norm is
not too large is determined by how closely it imitates the leading strategy.
This result is extended to the loss functions given by Bregman divergences and
by strictly proper scoring rules. A conference version is to appear in the ALT'2006 proceedings.
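Only as one informal reading of the imitation property for square loss (an assumption for illustration; the paper's exact statement also involves the norm of the competing strategy in the reproducing kernel Hilbert space and lower-order terms), a leading strategy with predictions $\phi_t$ would satisfy, for any strategy $F$ with predictions $f_t$ and moderate norm,

\sum_{t=1}^{T} (y_t - f_t)^2 \;\approx\; \sum_{t=1}^{T} (y_t - \phi_t)^2 \;+\; \sum_{t=1}^{T} (f_t - \phi_t)^2,

so that the excess loss of $F$ is governed by how closely its predictions track those of the leading strategy.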