Advances in Hyperspectral Image Classification: Earth monitoring with statistical learning methods
Hyperspectral images show similar statistical properties to natural grayscale
or color photographic images. However, the classification of hyperspectral
images is more challenging because of the very high dimensionality of the
pixels and the small number of labeled examples typically available for
learning. These peculiarities lead to particular signal processing problems,
mainly characterized by indetermination and complex manifolds. The framework of
statistical learning has gained popularity in the last decade. New methods have
been presented to account for the spatial homogeneity of images, to include
user's interaction via active learning, to take advantage of the manifold
structure with semisupervised learning, to extract and encode invariances, or
to adapt classifiers and image representations to unseen yet similar scenes.
This tutorial reviews the main advances for hyperspectral remote sensing image
classification through illustrative examples.
Comment: IEEE Signal Processing Magazine, 201
Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition from Continuous Speech Signals
Human infants can discover words directly from unsegmented speech signals
without any explicitly labeled data. In this paper, we develop a novel machine
learning method called nonparametric Bayesian double articulation analyzer
(NPB-DAA) that can directly acquire language and acoustic models from observed
continuous speech signals. For this purpose, we propose an integrative
generative model that combines a language model and an acoustic model into a
single generative model called the "hierarchical Dirichlet process hidden
language model" (HDP-HLM). The HDP-HLM is obtained by extending the
hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) proposed by
Johnson et al. An inference procedure for the HDP-HLM is derived using the
blocked Gibbs sampler originally proposed for the HDP-HSMM. This procedure
enables the simultaneous and direct inference of language and acoustic models
from continuous speech signals. Based on the HDP-HLM and its inference
procedure, we developed a novel double articulation analyzer. By assuming
HDP-HLM as a generative model of observed time series data, and by inferring
latent variables of the model, the method can analyze latent double
articulation structure, i.e., hierarchically organized latent words and
phonemes, of the data in an unsupervised manner. The novel unsupervised double
articulation analyzer is called NPB-DAA.
The NPB-DAA can automatically estimate double articulation structure embedded
in speech signals. We also carried out two evaluation experiments using
synthetic data and actual human continuous speech signals representing Japanese
vowel sequences. In the word acquisition and phoneme categorization tasks, the
NPB-DAA outperformed a conventional double articulation analyzer (DAA) and a
baseline automatic speech recognition system whose acoustic model was trained
in a supervised manner.
Comment: 15 pages, 7 figures, Draft submitted to IEEE Transactions on
Autonomous Mental Development (TAMD
Morphological Analysis of the Dravidian Language Family
The Dravidian family is one of the most widely spoken sets of languages in the
world, yet there are very few annotated resources available to NLP researchers.
To remedy this, we create DravMorph, a corpus annotated for morphological
segmentation and part of speech. We also exploit novel features and
higher-order models to achieve promising results on these corpora on both
tasks, beating techniques proposed in the literature by as much as 4 points in
segmentation F1.
Unsupervised learning of allomorphs in Turkish
© 2017 The Author. Published by The Scientific and Technological Research Council of Turkey. This is an open access article available under a Creative Commons licence.
The published version can be accessed at the following link on the publisher’s website: https://journals.tubitak.gov.tr/elektrik/issues/elk-17-25-4/elk-25-4-57-1605-216.pdf
One morpheme may have several surface forms that correspond to allomorphs. In English, ed and d are
surface forms of the past tense morpheme, and s, es, and ies are surface forms of the plural or present tense morpheme.
Turkish has a large number of allomorphs due to its morphophonemic processes. One morpheme can have tens of different
surface forms in Turkish. This leads to a sparsity problem in natural language processing tasks in Turkish. Detection
of allomorphs has not been studied much because of its difficulty. For example, tü and di are Turkish allomorphs (of the
past tense morpheme), but all of their letters are different. This paper presents an unsupervised model to extract the
allomorphs in Turkish. We are able to obtain an F-measure of 73.71% in the detection of allomorphs, and our model
outperforms previous unsupervised models on morpheme clustering.
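To illustrate why tü and di can belong to the same morpheme while sharing no letters, the sketch below picks a past tense surface form from the stem using the standard Turkish vowel harmony and consonant voicing rules. It is a minimal rule-based illustration, not the unsupervised model described in the abstract; the function name `past_tense_suffix` and the two lookup tables are hypothetical names introduced here, and the input is assumed to be a lowercase verb stem.

```python
# Illustrative only: rule-based selection of the Turkish past tense allomorph.
# This is NOT the paper's unsupervised method; names here are hypothetical.

# Stem-final voiceless consonants trigger the "t" variants (ti, tı, tu, tü).
VOICELESS = set("çfhkpsşt")

# Four-way vowel harmony: the last stem vowel selects the suffix vowel.
HARMONY = {
    "a": "ı", "ı": "ı",
    "e": "i", "i": "i",
    "o": "u", "u": "u",
    "ö": "ü", "ü": "ü",
}


def past_tense_suffix(stem: str) -> str:
    """Return the past tense surface form for a lowercase Turkish verb stem."""
    vowels = [ch for ch in stem if ch in HARMONY]
    if not vowels:
        raise ValueError(f"no vowel found in stem: {stem!r}")
    consonant = "t" if stem[-1] in VOICELESS else "d"
    return consonant + HARMONY[vowels[-1]]


if __name__ == "__main__":
    for stem in ["gel", "öp", "bak", "oku"]:
        print(stem, "+", past_tense_suffix(stem))
```

Under these rules the stems gel and öp receive di and tü respectively, the pair mentioned in the abstract. An unsupervised system has to discover such groupings without being given the rules, which is what makes the detection task difficult.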
Paradigm Completion for Derivational Morphology
The generation of complex derived word forms has been an overlooked problem
in NLP; we fill this gap by applying neural sequence-to-sequence models to the
task. We overview the theoretical motivation for a paradigmatic treatment of
derivational morphology, and introduce the task of derivational paradigm
completion as a parallel to inflectional paradigm completion. State-of-the-art
neural models, adapted from the inflection task, are able to learn a range of
derivation patterns, and outperform a non-neural baseline by 16.4%. However,
due to semantic, historical, and lexical considerations involved in
derivational morphology, future work will be needed to achieve performance
parity with inflection-generating systems.
Comment: EMNLP 201