135,387 research outputs found
SU-RUG at the CoNLL-SIGMORPHON 2017 shared task: Morphological Inflection with Attentional Sequence-to-Sequence Models
This paper describes the Stockholm University/University of Groningen
(SU-RUG) system for the SIGMORPHON 2017 shared task on morphological
inflection. Our system is based on an attentional sequence-to-sequence neural
network model using Long Short-Term Memory (LSTM) cells, with joint training of
morphological inflection and the inverse transformation, i.e. lemmatization and
morphological analysis. Our system outperforms the baseline with a large
margin, and our submission ranks as the 4th best team for the track we
participate in (task 1, high-resource).Comment: 4 pages, to appear at CoNLL-SIGMORPHON 201
MBT: A Memory-Based Part of Speech Tagger-Generator
We introduce a memory-based approach to part of speech tagging. Memory-based
learning is a form of supervised learning based on similarity-based reasoning.
The part of speech tag of a word in a particular context is extrapolated from
the most similar cases held in memory. Supervised learning approaches are
useful when a tagged corpus is available as an example of the desired output of
the tagger. Based on such a corpus, the tagger-generator automatically builds a
tagger which is able to tag new text the same way, diminishing development time
for the construction of a tagger considerably. Memory-based tagging shares this
advantage with other statistical or machine learning approaches. Additional
advantages specific to a memory-based approach include (i) the relatively small
tagged corpus size sufficient for training, (ii) incremental learning, (iii)
explanation capabilities, (iv) flexible integration of information in case
representations, (v) its non-parametric nature, (vi) reasonably good results on
unknown words without morphological analysis, and (vii) fast learning and
tagging. In this paper we show that a large-scale application of the
memory-based approach is feasible: we obtain a tagging accuracy that is on a
par with that of known statistical approaches, and with attractive space and
time complexity properties when using {\em IGTree}, a tree-based formalism for
indexing and searching huge case bases.} The use of IGTree has as additional
advantage that optimal context size for disambiguation is dynamically computed.Comment: 14 pages, 2 Postscript figure
A novel hybrid algorithm for morphological analysis: artificial Neural-Net-XMOR
In this study, we present a novel algorithm that combines a rule-based approach and an artificial neural network-based approach in morphological analysis. The usage of hybrid models including both techniques is evaluated for performance improvements. The proposed hybrid algorithm is based on the idea of the dynamic generation of an artificial neural network according to two-level phonological rules. In this study, the combination of linguistic parsing, a neural network-based error correction model, and statistical filtering is utilized to increase the coverage of pure morphological analysis. We experimented hybrid algorithm applying rule-based and long short-term memory-based (LSTM-based) techniques, and the results show that we improved the morphological analysis performance for optical character recognizer (OCR) and social media data. Thus, for the new hybrid algorithm with LSTM, the accuracy reached 99.91% for the OCR dataset and 99.82% for social media data. © TÜBİTAK
Distributed Component Forests in 2-D:Hierarchical Image Representations Suitable for Tera-Scale Images
The standard representations known as component trees, used in morphological connected attribute filtering and multi-scale analysis, are unsuitable for cases in which either the image itself or the tree do not fit in the memory of a single compute node. Recently, a new structure has been developed which consists of a collection of modified component trees, one for each image tile. It has to-date only been applied to fairly simple image filtering based on area. In this paper, we explore other applications of these distributed component forests, in particular to multi-scale analysis such as pattern spectra, and morphological attribute profiles and multi-scale leveling segmentations
Morphological Analysis as Classification: an Inductive-Learning Approach
Morphological analysis is an important subtask in text-to-speech conversion,
hyphenation, and other language engineering tasks. The traditional approach to
performing morphological analysis is to combine a morpheme lexicon, sets of
(linguistic) rules, and heuristics to find a most probable analysis. In
contrast we present an inductive learning approach in which morphological
analysis is reformulated as a segmentation task. We report on a number of
experiments in which five inductive learning algorithms are applied to three
variations of the task of morphological analysis. Results show (i) that the
generalisation performance of the algorithms is good, and (ii) that the lazy
learning algorithm IB1-IG performs best on all three tasks. We conclude that
lazy learning of morphological analysis as a classification task is indeed a
viable approach; moreover, it has the strong advantages over the traditional
approach of avoiding the knowledge-acquisition bottleneck, being fast and
deterministic in learning and processing, and being language-independent.Comment: 11 pages, 5 encapsulated postscript figures, uses non-standard NeMLaP
proceedings style nemlap.sty; inputs ipamacs (international phonetic
alphabet) and epsf macro
До відома авторів
We describe TADPOLE, a modular memory-based morphosyntactic tagger and dependency
parser for Dutch. Though primarily aimed at being accurate, the design of the system is
also driven by optimizing speed and memory usage, using a trie-based approximation of k-nearest neighbor classification as the basis of each module. We perform an evaluation
of its three main modules: a part-of-speech tagger, a morphological analyzer, and a dependency
parser, trained on manually annotated material available for Dutch – the parser is
additionally trained on automatically parsed data. A global analysis of the system shows
that it is able to process text in linear time close to an estimated 2,500 words per second,
while maintaining sufficient accuracy
- …