25,151 research outputs found
MBT: A Memory-Based Part of Speech Tagger-Generator
We introduce a memory-based approach to part of speech tagging. Memory-based
learning is a form of supervised learning based on similarity-based reasoning.
The part of speech tag of a word in a particular context is extrapolated from
the most similar cases held in memory. Supervised learning approaches are
useful when a tagged corpus is available as an example of the desired output of
the tagger. Based on such a corpus, the tagger-generator automatically builds a
tagger which is able to tag new text the same way, diminishing development time
for the construction of a tagger considerably. Memory-based tagging shares this
advantage with other statistical or machine learning approaches. Additional
advantages specific to a memory-based approach include (i) the relatively small
tagged corpus size sufficient for training, (ii) incremental learning, (iii)
explanation capabilities, (iv) flexible integration of information in case
representations, (v) its non-parametric nature, (vi) reasonably good results on
unknown words without morphological analysis, and (vii) fast learning and
tagging. In this paper we show that a large-scale application of the
memory-based approach is feasible: we obtain a tagging accuracy that is on a
par with that of known statistical approaches, and with attractive space and
time complexity properties when using {\em IGTree}, a tree-based formalism for
indexing and searching huge case bases.} The use of IGTree has as additional
advantage that optimal context size for disambiguation is dynamically computed.Comment: 14 pages, 2 Postscript figure
Statistical Mechanical Development of a Sparse Bayesian Classifier
The demand for extracting rules from high dimensional real world data is
increasing in various fields. However, the possible redundancy of such data
sometimes makes it difficult to obtain a good generalization ability for novel
samples. To resolve this problem, we provide a scheme that reduces the
effective dimensions of data by pruning redundant components for bicategorical
classification based on the Bayesian framework. First, the potential of the
proposed method is confirmed in ideal situations using the replica method.
Unfortunately, performing the scheme exactly is computationally difficult. So,
we next develop a tractable approximation algorithm, which turns out to offer
nearly optimal performance in ideal cases when the system size is large.
Finally, the efficacy of the developed classifier is experimentally examined
for a real world problem of colon cancer classification, which shows that the
developed method can be practically useful.Comment: 13 pages, 6 figure
Ensembles of wrappers for automated feature selection in fish age classification
In feature selection, the most important features must be chosen so as to decrease the number thereof while retaining their discriminatory information. Within this context, a novel feature selection method based on an ensemble of wrappers is proposed and applied for automatically select features in fish age classification. The effectiveness of this procedure using an Atlantic cod database has been tested for different powerful statistical learning classifiers. The subsets based on few features selected, e.g. otolith weight and fish weight, are particularly noticeable given current biological findings and practices in fishery research and the classification results obtained with them outperforms those of previous studies in which a manual feature selection was performed.Peer ReviewedPostprint (author's final draft
Deep Gaussian Processes
In this paper we introduce deep Gaussian process (GP) models. Deep GPs are a
deep belief network based on Gaussian process mappings. The data is modeled as
the output of a multivariate GP. The inputs to that Gaussian process are then
governed by another GP. A single layer model is equivalent to a standard GP or
the GP latent variable model (GP-LVM). We perform inference in the model by
approximate variational marginalization. This results in a strict lower bound
on the marginal likelihood of the model which we use for model selection
(number of layers and nodes per layer). Deep belief networks are typically
applied to relatively large data sets using stochastic gradient descent for
optimization. Our fully Bayesian treatment allows for the application of deep
models even when data is scarce. Model selection by our variational bound shows
that a five layer hierarchy is justified even when modelling a digit data set
containing only 150 examples.Comment: 9 pages, 8 figures. Appearing in Proceedings of the 16th
International Conference on Artificial Intelligence and Statistics (AISTATS)
201
- …