The zero exemplar distance problem
Given two genomes with duplicate genes, \textsc{Zero Exemplar Distance} is
the problem of deciding whether the two genomes can be reduced to the same
genome without duplicate genes by deleting all but one copy of each gene in
each genome. Blin, Fertin, Sikora, and Vialette recently proved that
\textsc{Zero Exemplar Distance} for monochromosomal genomes is NP-hard even if
each gene appears at most two times in each genome, thereby settling an
important open question on genome rearrangement in the exemplar model. In this
paper, we give a very simple alternative proof of this result. We also study
the problem \textsc{Zero Exemplar Distance} for multichromosomal genomes
without gene order, and prove the analogous result that it is also NP-hard even
if each gene appears at most two times in each genome. For the positive
direction, we show that both variants of \textsc{Zero Exemplar Distance} admit
polynomial-time algorithms if each gene appears exactly once in one genome and
at least once in the other genome. In addition, we present a polynomial-time
algorithm for the related problem \textsc{Exemplar Longest Common Subsequence}
in the special case that each mandatory symbol appears exactly once in one
input sequence and at least once in the other input sequence. This answers an
open question of Bonizzoni et al. We also show that \textsc{Zero Exemplar
Distance} for multichromosomal genomes without gene order is fixed-parameter
tractable if the parameter is the maximum number of chromosomes in each genome.
Comment: Strengthened and reorganize
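The decision problem defined in this abstract can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not taken from the paper: genomes are unsigned, linear, monochromosomal strings of gene labels, and two exemplars count as the same genome only if they are identical sequences. The function names are hypothetical. The brute-force checker enumerates every exemplar and is exponential in general, in line with the NP-hardness result. Under these assumptions, the special case (each gene exactly once in one genome, at least once in the other) reduces to a linear-time subsequence test, which is one simple way to see why that case is easy.

```python
from itertools import product


def exemplars(genome):
    """Yield every exemplar of `genome`: keep exactly one copy of each gene."""
    positions = {}
    for index, gene in enumerate(genome):
        positions.setdefault(gene, []).append(index)
    # Choose one position per gene, then read the kept genes in genome order.
    for choice in product(*positions.values()):
        yield tuple(genome[i] for i in sorted(choice))


def zero_exemplar_distance_bruteforce(g1, g2):
    """Exponential-time check: do g1 and g2 share at least one exemplar?"""
    return not set(exemplars(g1)).isdisjoint(exemplars(g2))


def zero_exemplar_distance_special(g1, g2):
    """Special case: every gene occurs exactly once in g1.

    g1 is then its own unique exemplar, so the answer is yes exactly when
    both genomes have the same gene set and g1 is a subsequence of g2.
    """
    if len(set(g1)) != len(g1):
        raise ValueError("g1 must contain each gene exactly once")
    if set(g1) != set(g2):
        return False
    remaining = iter(g2)
    # `gene in remaining` consumes the iterator, so order is respected.
    return all(gene in remaining for gene in g1)


yes_case = zero_exemplar_distance_special("abc", "abacb")
no_case = zero_exemplar_distance_special("abc", "cbacb")
```

On these toy inputs the special-case test and the brute-force checker should agree, since both decide the same question under the stated assumptions.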
Naive Bayes and Exemplar-Based approaches to Word Sense Disambiguation Revisited
This paper describes an experimental comparison between two standard
supervised learning methods, namely Naive Bayes and Exemplar-based
classification, on the Word Sense Disambiguation (WSD) problem. The aim of the
work is twofold. First, it attempts to clarify some confusing
information in the related literature about the comparison between the two
methods. In doing so, several directions have been explored, including
testing several modifications of the basic learning algorithms and varying the
feature space. Second, an improvement to both algorithms is proposed in
order to deal with large attribute sets. This modification, which basically
consists of using only the positive information appearing in the examples,
greatly improves the efficiency of the methods with no loss in
accuracy. The experiments have been performed on the largest sense-tagged
corpus available containing the most frequent and ambiguous English words.
Results show that the Exemplar-based approach to WSD is generally superior to
the Bayesian approach, especially when a specific metric for dealing with
symbolic attributes is used.
Comment: 5 page
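The two classifiers compared in this abstract can be sketched on toy data as follows. This is an illustration under assumptions, not the authors' implementation: each example is represented as the set of features that are present (one reading of "using only the positive information"), the Naive Bayes variant uses add-one smoothing, and the exemplar-based variant is a k-nearest-neighbour vote ranked by feature overlap. The sense labels, feature names, and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Count senses and, per sense, how often each feature is present."""
    sense_counts = Counter(sense for _, sense in examples)
    feature_counts = defaultdict(Counter)
    for features, sense in examples:
        feature_counts[sense].update(features)
    return sense_counts, feature_counts


def classify_nb(model, features):
    """Naive Bayes that scores only the features present in the query."""
    sense_counts, feature_counts = model
    total = sum(sense_counts.values())

    def score(sense):
        n = sense_counts[sense]
        log_prob = math.log(n / total)
        for f in features:  # absent features are never visited
            log_prob += math.log((feature_counts[sense][f] + 1) / (n + 2))
        return log_prob

    return max(sense_counts, key=score)


def classify_knn(examples, features, k=3):
    """Exemplar-based vote among the k examples sharing the most features."""
    ranked = sorted(examples, key=lambda ex: len(ex[0] & features), reverse=True)
    votes = Counter(sense for _, sense in ranked[:k])
    return votes.most_common(1)[0][0]


train = [
    ({"river", "water", "shore"}, "bank/river"),
    ({"river", "fishing"}, "bank/river"),
    ({"money", "loan", "account"}, "bank/finance"),
    ({"money", "deposit"}, "bank/finance"),
]
query = {"loan", "money"}
nb_sense = classify_nb(train_nb(train), query)
knn_sense = classify_knn(train, query)
```

Because both functions iterate only over the features that occur in an example, their cost scales with the number of active features rather than with the size of the full attribute set.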
Clustering by soft-constraint affinity propagation: Applications to gene-expression data
Motivation: Similarity-measure based clustering is a crucial problem
appearing throughout scientific data analysis. Recently, a powerful new
algorithm called Affinity Propagation (AP) based on message-passing techniques
was proposed by Frey and Dueck \cite{Frey07}. In AP, each cluster is identified
by a common exemplar to which all other data points of the same cluster refer, and
exemplars have to refer to themselves. Despite its proven power, AP in its
present form suffers from a number of drawbacks. The hard constraint of having
exactly one exemplar per cluster restricts AP to classes of regularly shaped
clusters, and leads to suboptimal performance, {\it e.g.}, in analyzing gene
expression data. Results: This limitation can be overcome by relaxing the AP
hard constraints. A new parameter controls the importance of the constraints
compared to the aim of maximizing the overall similarity, and allows one to
interpolate between the simple case where each data point selects its closest
neighbor as an exemplar and the original AP. The resulting soft-constraint
affinity propagation (SCAP) is more informative and accurate, and leads to
more stable clustering. Even though a new {\it a priori} free parameter is
introduced, the overall dependence of the algorithm on external tuning is
reduced, as robustness is increased and an optimal strategy for parameter
selection emerges more naturally. SCAP is tested on biological benchmark data,
including in particular microarray data related to various cancer types. We
show that the algorithm efficiently unveils the hierarchical cluster structure
present in the data sets. Furthermore, it allows the extraction of sparse gene
expression signatures for each cluster.
Comment: 11 pages, supplementary material:
http://isiosf.isi.it/~weigt/scap_supplement.pd
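For orientation, the original hard-constraint AP that this abstract takes as its starting point can be sketched in a few lines of pure Python. The sketch uses the standard responsibility and availability updates of Frey and Dueck with damping. It does not implement SCAP, whose relaxed update rules are not given in the abstract. The toy data, the choice of negative squared distance as similarity, and the median preference are assumptions made for illustration.

```python
import statistics


def affinity_propagation(S, damping=0.5, iters=200):
    """Standard AP on a similarity matrix S; S[k][k] holds the preference.

    Returns, for each point i, the index of its chosen exemplar.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(v for j, v in enumerate(vals) if j != k)
                new = S[i][k] - best_other
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive support from points other than k
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy one-dimensional data with two well-separated groups (assumed for the demo).
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
n = len(points)
S = [[-(a - b) ** 2 for b in points] for a in points]
preference = statistics.median(
    S[i][k] for i in range(n) for k in range(n) if i != k
)
for i in range(n):
    S[i][i] = preference
labels = affinity_propagation(S)
```

Each entry of `labels` is the exemplar index a point refers to, so points in the same cluster share a label and an exemplar's label is its own index, which is the hard constraint the abstract proposes to relax.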
Learning feed-forward one-shot learners
One-shot learning is usually tackled by using generative models or
discriminative embeddings. Discriminative methods based on deep learning, which
are very effective in other learning scenarios, are ill-suited for one-shot
learning as they need large amounts of training data. In this paper, we propose
a method to learn the parameters of a deep model in one shot. We construct the
learner as a second deep network, called a learnet, which predicts the
parameters of a pupil network from a single exemplar. In this manner we obtain
an efficient feed-forward one-shot learner, trained end-to-end by minimizing a
one-shot classification objective in a learning to learn formulation. In order
to make the construction feasible, we propose a number of factorizations of the
parameters of the pupil network. We demonstrate encouraging results by learning
characters from single exemplars in Omniglot, and by tracking visual objects
from a single initial exemplar in the Visual Object Tracking benchmark.
Comment: The first three authors contributed equally, and are listed in
alphabetical orde
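The idea of a learnet predicting factorized pupil parameters can be made concrete with a small, untrained sketch. The abstract does not specify the factorizations, so the diagonal form used here, a pupil layer computing `M_out · diag(w(z)) · M_in · x`, is an assumption, as are the linear learnet and all names and sizes. The point is only the parameter count: the learnet predicts d numbers per exemplar instead of a full d × d weight matrix.

```python
import random


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def learnet(A, z):
    """Hypothetical linear learnet: maps exemplar z to d diagonal weights w(z)."""
    return matvec(A, z)


def pupil_layer(M_in, w, M_out, x):
    """Factorized pupil layer: M_out @ diag(w) @ M_in @ x, without forming diag(w)."""
    hidden = matvec(M_in, x)
    scaled = [wi * hi for wi, hi in zip(w, hidden)]
    return matvec(M_out, scaled)


rng = random.Random(0)
d = 4
M_in = random_matrix(d, d, rng)   # shared across exemplars, learned offline
M_out = random_matrix(d, d, rng)  # shared across exemplars, learned offline
A = random_matrix(d, d, rng)      # learnet parameters (random here, untrained)

exemplar = [rng.uniform(-1, 1) for _ in range(d)]
x = [rng.uniform(-1, 1) for _ in range(d)]

w = learnet(A, exemplar)          # one feed-forward pass, no gradient steps
y = pupil_layer(M_in, w, M_out, x)
```

In a trained system the three matrices would be fitted end-to-end on a one-shot objective; here they are random, so the output only demonstrates the data flow.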
Secondary generalisation in categorisation: an exemplar-based account
The parallel rule activation and rule synthesis (PRAS) model is a computational model for generalisation in category learning, proposed by Vandierendonck (1995). An important concept underlying the PRAS model is the distinction between primary and secondary generalisation. In Vandierendonck (1995), an empirical study is reported that provides support for the concept of secondary generalisation. In this paper, we re-analyse the data reported by Vandierendonck (1995) by fitting three different variants of the Generalised Context Model (GCM) which do not rely on secondary generalisation. Although some of the GCM variants outperformed the PRAS model in terms of global fit, they all have difficulty in providing a qualitatively good fit of a specific critical pattern
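As background, the basic form of the Generalised Context Model can be sketched as below. This is the textbook formulation (attention-weighted distance, exponential similarity decay, and a biased choice rule over summed similarities), not any of the three specific variants fitted in the paper, which the abstract does not describe. The stimuli, parameter values, and function name are assumptions.

```python
import math


def gcm_probabilities(stimulus, exemplars, c=2.0, r=1.0, weights=None, bias=None):
    """Return P(category | stimulus) under a basic GCM.

    exemplars: list of (feature_tuple, category) pairs stored in memory.
    c: sensitivity, r: distance metric (1 = city-block, 2 = Euclidean).
    """
    dims = len(stimulus)
    weights = weights or [1.0 / dims] * dims  # equal attention by default
    summed = {}
    for features, category in exemplars:
        distance = sum(
            w * abs(a - b) ** r for w, a, b in zip(weights, stimulus, features)
        ) ** (1.0 / r)
        similarity = math.exp(-c * distance)
        summed[category] = summed.get(category, 0.0) + similarity
    bias = bias or {category: 1.0 for category in summed}
    total = sum(bias[cat] * s for cat, s in summed.items())
    return {cat: bias[cat] * s / total for cat, s in summed.items()}


memory = [((0, 0), "A"), ((0, 1), "A"), ((1, 1), "B"), ((1, 0), "B")]
probs = gcm_probabilities((0, 0), memory)
```

Classification here depends only on similarity to stored exemplars, which is why such a model involves no separate secondary-generalisation mechanism.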