Memory-based morphological analysis

Authors: Daelemans W.; van den Bosch A.
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/1999

SU-RUG at the CoNLL-SIGMORPHON 2017 shared task: Morphological Inflection with Attentional Sequence-to-Sequence Models

Authors: Bjerva Johannes; Östling Robert
Publication date: 01/01/2017

This paper describes the Stockholm University/University of Groningen (SU-RUG) system for the SIGMORPHON 2017 shared task on morphological inflection. Our system is based on an attentional sequence-to-sequence neural network model using Long Short-Term Memory (LSTM) cells, with joint training of morphological inflection and the inverse transformation, i.e. lemmatization and morphological analysis. Our system outperforms the baseline by a large margin, and our submission ranks as the 4th best team in the track we participate in (task 1, high-resource).

Comment: 4 pages, to appear at CoNLL-SIGMORPHON 201

arXiv.org e-Print Archive

Crossref

Proceedings - University of Groningen

Publikationer från Stockholms universitet

University of Groningen

ARTS repository - University of Groningen

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Dissertations of the University of Groningen

MBT: A Memory-Based Part of Speech Tagger-Generator

Authors: Berck Peter; Daelemans Walter; Gillis Steven; Zavrel Jakub
Publication date: 01/01/1996

We introduce a memory-based approach to part of speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning. The part of speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. Based on such a corpus, the tagger-generator automatically builds a tagger which is able to tag new text the same way, considerably reducing the development time needed to construct a tagger. Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus size sufficient for training, (ii) incremental learning, (iii) explanation capabilities, (iv) flexible integration of information in case representations, (v) its non-parametric nature, (vi) reasonably good results on unknown words without morphological analysis, and (vii) fast learning and tagging. In this paper we show that a large-scale application of the memory-based approach is feasible: we obtain a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases. The use of IGTree has the additional advantage that the optimal context size for disambiguation is computed dynamically.

Comment: 14 pages, 2 Postscript figure

arXiv.org e-Print Archive

CiteSeerX

Institutional Repository Universiteit Antwerpen

Tilburg University Repository
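
The core idea in the MBT abstract above, that a word's tag is extrapolated from the most similar cases held in memory, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the feature layout (previous tag, focus word, next word), the unweighted overlap distance, the toy corpus, and the function names are all hypothetical, and the sketch uses a flat nearest-neighbour search in place of IGTree.

```python
from collections import Counter

def overlap_distance(a, b):
    """Count the feature positions on which two cases disagree."""
    return sum(x != y for x, y in zip(a, b))

def make_cases(tagged_sentence):
    """Turn one tagged sentence into (features, tag) cases.

    Features (an illustrative choice): previous tag, focus word, next word.
    """
    words = [w for w, _ in tagged_sentence]
    tags = [t for _, t in tagged_sentence]
    cases = []
    for i, word in enumerate(words):
        prev_tag = tags[i - 1] if i > 0 else "<s>"
        next_word = words[i + 1] if i + 1 < len(words) else "</s>"
        cases.append(((prev_tag, word, next_word), tags[i]))
    return cases

def classify(memory, features, k=1):
    """Return the majority tag among the k nearest stored cases."""
    ranked = sorted(memory, key=lambda case: overlap_distance(case[0], features))
    votes = Counter(tag for _, tag in ranked[:k])
    return votes.most_common(1)[0][0]

def tag(memory, words, k=1):
    """Tag a sentence left to right, feeding each predicted tag forward."""
    tags = []
    for i, word in enumerate(words):
        prev_tag = tags[-1] if tags else "<s>"
        next_word = words[i + 1] if i + 1 < len(words) else "</s>"
        tags.append(classify(memory, (prev_tag, word, next_word), k))
    return tags

# Toy tagged corpus (hypothetical data); "training" is just storing the cases.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
memory = [case for sentence in corpus for case in make_cases(sentence)]

print(tag(memory, ["the", "bird", "sleeps"]))
```

Because the context features still match stored cases, the unseen word "bird" receives a tag without any morphological analysis, which is the kind of behaviour point (vi) of the abstract refers to. Adding a new tagged sentence only appends cases to `memory`, which loosely mirrors the incremental-learning property in point (ii).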

A novel hybrid algorithm for morphological analysis: artificial Neural-Net-XMOR

Authors: Kayabaş Ayla; Kılıç Özkan; Topçu Ahmet E.
Publication venue: The Scientific and Technological Research Council of Turkey (TUBITAK-ULAKBIM) - DIGITAL COMMONS JOURNALS
Publication date: 01/01/2022

In this study, we present a novel algorithm that combines a rule-based approach and an artificial neural network-based approach in morphological analysis. The use of hybrid models that include both techniques is evaluated for performance improvements. The proposed hybrid algorithm is based on the idea of dynamically generating an artificial neural network according to two-level phonological rules. A combination of linguistic parsing, a neural network-based error correction model, and statistical filtering is used to increase the coverage of pure morphological analysis. We experimented with the hybrid algorithm, applying rule-based and long short-term memory-based (LSTM-based) techniques, and the results show that we improved morphological analysis performance on optical character recognition (OCR) and social media data. With LSTM, the new hybrid algorithm reached an accuracy of 99.91% on the OCR dataset and 99.82% on the social media data. © TÜBİTAK

Kırşehir Ahi Evran University Institutional Repository

Memory-based morphological analysis generation and part-of-speech tagging of Arabic

Authors: Marsi E.C.; Soudi A.; van den Bosch A.
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2005

Tilburg University Repository

Distributed Component Forests in 2-D: Hierarchical Image Representations Suitable for Tera-Scale Images

Authors: Gazagnes Simon; Wilkinson Michael H. F.
Publication venue: World Scientific Pub Co Pte Lt
Publication date: 01/10/2019

The standard representations known as component trees, used in morphological connected attribute filtering and multi-scale analysis, are unsuitable for cases in which either the image itself or the tree does not fit in the memory of a single compute node. Recently, a new structure has been developed which consists of a collection of modified component trees, one for each image tile. To date, it has only been applied to fairly simple image filtering based on area. In this paper, we explore other applications of these distributed component forests, in particular to multi-scale analysis such as pattern spectra, morphological attribute profiles, and multi-scale leveling segmentations.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Morphological Analysis as Classification: an Inductive-Learning Approach

Authors: Bosch Antal van den; Daelemans Walter; Weijters Ton
Publication date: 01/01/1996

Morphological analysis is an important subtask in text-to-speech conversion, hyphenation, and other language engineering tasks. The traditional approach to performing morphological analysis is to combine a morpheme lexicon, sets of (linguistic) rules, and heuristics to find the most probable analysis. In contrast, we present an inductive learning approach in which morphological analysis is reformulated as a segmentation task. We report on a number of experiments in which five inductive learning algorithms are applied to three variations of the task of morphological analysis. Results show (i) that the generalisation performance of the algorithms is good, and (ii) that the lazy learning algorithm IB1-IG performs best on all three tasks. We conclude that lazy learning of morphological analysis as a classification task is indeed a viable approach; moreover, it has strong advantages over the traditional approach: it avoids the knowledge-acquisition bottleneck, is fast and deterministic in learning and processing, and is language-independent.

Comment: 11 pages, 5 encapsulated postscript figures, uses non-standard NeMLaP proceedings style nemlap.sty; inputs ipamacs (international phonetic alphabet) and epsf macro

arXiv.org e-Print Archive

CiteSeerX

Institutional Repository Universiteit Antwerpen

Tilburg University Repository
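
The reformulation described in the abstract above, morphological analysis as a segmentation task solved by classification, can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's setup: the `+` boundary notation, the window width, the binary "starts a new morpheme" label, the toy training words, and all function names are hypothetical, and the classifier is a plain 1-nearest-neighbour with unweighted overlap distance in place of IB1-IG's information-gain weighting.

```python
def windows(word, width=2, pad="_"):
    """Return one fixed-width character window per letter of `word`."""
    padded = pad * width + word + pad * width
    return [tuple(padded[i:i + 2 * width + 1]) for i in range(len(word))]

def make_instances(segmented, width=2):
    """Convert a segmented word such as 'walk+ed' into (window, label) pairs.

    Label 1 marks a letter that starts a non-initial morpheme, 0 otherwise.
    """
    word = segmented.replace("+", "")
    labels = []
    for m_idx, morpheme in enumerate(segmented.split("+")):
        for c_idx, _ in enumerate(morpheme):
            labels.append(1 if (c_idx == 0 and m_idx > 0) else 0)
    return list(zip(windows(word, width), labels))

def nearest_label(memory, window):
    """Return the label of the stored instance with the fewest mismatches."""
    def distance(case):
        return sum(a != b for a, b in zip(case[0], window))
    return min(memory, key=distance)[1]

def segment(memory, word, width=2):
    """Classify every letter and insert '+' before predicted morpheme starts."""
    out = []
    for ch, window in zip(word, windows(word, width)):
        if nearest_label(memory, window) == 1:
            out.append("+")
        out.append(ch)
    return "".join(out)

# Toy training material (hypothetical); lazy learning simply stores instances.
training = ["walk+ed", "talk+ing", "jump+ed", "jump+ing"]
memory = [inst for item in training for inst in make_instances(item)]

print(segment(memory, "walking"))
```

Each letter becomes one classification instance, so no morpheme lexicon or hand-written rules are involved; the segmentation of an unseen word is assembled from per-letter decisions.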

For the Attention of Authors (До відома авторів)

Authors: Bosch A. van den; Busser G.J.; Canisius S.V.M.; Daelemans W.
Publication venue: Інститут проблем штучного інтелекту МОН України та НАН України
Publication date: 01/01/2007

We describe TADPOLE, a modular memory-based morphosyntactic tagger and dependency parser for Dutch. Though primarily aimed at being accurate, the design of the system is also driven by optimizing speed and memory usage, using a trie-based approximation of k-nearest neighbor classification as the basis of each module. We evaluate its three main modules: a part-of-speech tagger, a morphological analyzer, and a dependency parser, all trained on manually annotated material available for Dutch; the parser is additionally trained on automatically parsed data. A global analysis of the system shows that it processes text in linear time, at close to an estimated 2,500 words per second, while maintaining sufficient accuracy.

Наукова електронна бібліотека періодичних видань НАН України (Vernadsky National Library of Ukraine)

Institutional Repository Universiteit Antwerpen

Utrecht University Repository

Tilburg University Repository