53 research outputs found

A Case Study of Algorithms for Morphosyntactic Tagging of Polish Language

Author: Chrzaszcz Paweł
Kitowski Jacek
Kuta Marcin
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 30/01/2012

The paper evaluates several part-of-speech taggers, representing the main tagging algorithms, applied to the corpus of the frequency dictionary of contemporary Polish. We report results for two tagging schemes: the IPI PAN positional tagset and its simplified version. Tagging accuracy is calculated for different training sets and broken down into several subcategories (accuracy on known and unknown tokens, word segments, sentences, etc.). The results are compared with those for other inflecting and analytic languages. Performance aspects (time demands) of the tagging tools used are also discussed.
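The known/unknown token breakdown mentioned in this abstract can be illustrated with a minimal sketch. It assumes that "known" means the token occurred in the training data. The function name `tagging_accuracy` and the toy tags are hypothetical and are not taken from the paper.

```python
def tagging_accuracy(train_tokens, gold, predicted):
    """Return accuracy overall and separately for known and unknown tokens.

    train_tokens: iterable of tokens seen during training (assumed definition of "known")
    gold:         list of (token, gold_tag) pairs from the test set
    predicted:    list of predicted tags, aligned with gold
    """
    known_vocab = set(train_tokens)
    # Each bucket holds [number correct, number seen].
    counts = {"all": [0, 0], "known": [0, 0], "unknown": [0, 0]}
    for (token, gold_tag), pred_tag in zip(gold, predicted):
        bucket = "known" if token in known_vocab else "unknown"
        for key in ("all", bucket):
            counts[key][0] += int(gold_tag == pred_tag)
            counts[key][1] += 1
    # An empty bucket yields None rather than a division by zero.
    return {k: (c / n if n else None) for k, (c, n) in counts.items()}


# Toy usage with made-up tags.
train = ["kot", "pies"]
gold = [("kot", "subst"), ("pies", "subst"), ("biega", "fin"), ("szybko", "adv")]
pred = ["subst", "subst", "fin", "subst"]
scores = tagging_accuracy(train, gold, pred)
```

Reporting the two buckets separately shows how much of a tagger's error comes from words it never saw in training, which the overall figure hides.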

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Learning morphology with Morfette

Author: Chrupała Grzegorz
Dinu Georgiana
van Genabith Josef
Publication date: 01/01/2008

Morfette is a modular, data-driven, probabilistic system which learns to perform joint morphological tagging and lemmatization from morphologically annotated corpora. The system is composed of two learning modules which are trained to predict morphological tags and lemmas using the Maximum Entropy classifier. The third module dynamically combines the predictions of the Maximum Entropy models and outputs a probability distribution over tag-lemma pair sequences. The lemmatization module exploits the idea of recasting lemmatization as a classification task by using class labels which encode mappings from wordforms to lemmas. Experimental evaluation results and error analysis on three morphologically rich languages show that the system achieves high accuracy with no language-specific feature engineering or additional resources.
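The idea of class labels that encode wordform-to-lemma mappings can be illustrated with a simplified sketch. The suffix-rewrite label used here (number of trailing characters to strip, plus a suffix to append) is an assumption made for illustration and is not necessarily the encoding Morfette uses. The function names are hypothetical.

```python
def lemma_class(wordform, lemma):
    """Encode the wordform -> lemma mapping as a (strip_count, suffix) label."""
    # Find the length of the longest common prefix.
    i = 0
    while i < min(len(wordform), len(lemma)) and wordform[i] == lemma[i]:
        i += 1
    # Strip everything after the shared prefix, then append the rest of the lemma.
    return (len(wordform) - i, lemma[i:])


def apply_lemma_class(wordform, label):
    """Apply a (strip_count, suffix) label to a wordform to recover a lemma."""
    strip_count, suffix = label
    stem = wordform[:len(wordform) - strip_count]
    return stem + suffix


# Labels derived from training pairs generalise to unseen wordforms.
label_ing = lemma_class("walking", "walk")   # (3, "")
label_ies = lemma_class("cities", "city")    # (3, "y")
lemma_a = apply_lemma_class("talking", label_ing)
lemma_b = apply_lemma_class("parties", label_ies)
```

Because many wordforms share the same label, a classifier can predict the label from context features and produce lemmas for words absent from the training data.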

CiteSeerX

DCU Online Research Access Service

A free/open-source hybrid morphological disambiguation tool for Kazakh

Author: Abduali Balzhan
Amirova Dina
Assylbekov Zhenisbek
Karibayeva Aidana
Nurkas Assulan
Sundetova Aida
Tyers Francis
Washington Jonathan
Publication venue: DOI: 10.13140/RG.2.2.12467.43045
Publication date: 01/04/2016

This paper presents the results of developing a morphological disambiguation tool for Kazakh. Starting with a previously developed rule-based approach, we tried to cope with the complex morphology of Kazakh by breaking up lexical forms across their derivational boundaries into inflectional groups and modeling their behavior with statistical methods. A hybrid rule-based/statistical approach appears to benefit morphological disambiguation, demonstrating a per-token accuracy of 91% in running text.

Nazarbayev University Repository

Comparison of Latent Semantic Analysis and Probabilistic Latent Semantic Analysis for Documents Clustering

Author: Kitowski Jacek
Kuta Marcin
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 04/02/2015

In this paper we compare the usefulness of statistical dimensionality reduction techniques for improving the clustering of documents in Polish. We start with partitional and agglomerative algorithms applied to the Vector Space Model. Then we investigate two transformations: Latent Semantic Analysis and Probabilistic Latent Semantic Analysis. The results show an advantage of Latent Semantic Analysis over the probabilistic model. We also analyse the time and memory consumption of these transformations and present runtime details for an IBM BladeCenter HS21 machine.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Voted Approach for Part of Speech Tagging in Bengali

Author: Bandyopadhyay Sivaji
Ekbal Asif
Hasanuzzaman Md.
Publication venue: City University of Hong Kong
Publication date: 01/01/2009

PACLIC 23 / City University of Hong Kong / 3-5 December 200

Waseda University Repository

Al-Farahidi Arabic Diacrizer System

Author: Iyad Ahmad Mahmoud Abusamra
إياد أحمد محمود أبوسمرة
Publication venue: Al-Quds University (جامعة القدس)

Al-Quds University Digital Repository

Methods and algorithms for unsupervised learning of morphology

Author: A. Gelbukh
A. Gispert de
B. Can
C. Monson
D. Blackwell
D. Harman
D.R. Morrison
E. Arısoy
E. Minkov
H. Ishwaran
H. Poon
H. Poon
J. Goldsmith
K. Järvelin
K. Kettunen
K. Kirchhoff
K. Sirts
K. Toutanova
L. Aunimo
M. Creutz
M. Kurimo
M.A. Hafer
M.R. Brent
N.A. Smith
P.F. Brown
R. Krovetz
S. Bordag
S. Manandhar
S. Neuvel
Z.S. Harris
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

This is an accepted manuscript of a chapter published by Springer in Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403 in 2014, available online: https://doi.org/10.1007/978-3-642-54906-9_15. The accepted version of the publication may differ from the final published version.

This paper is a survey of methods and algorithms for unsupervised learning of morphology. We provide a description of the methods and algorithms used for morphological segmentation from a computational linguistics point of view. We survey morphological segmentation methods covering methods based on MDL (minimum description length), MLE (maximum likelihood estimation), MAP (maximum a posteriori), parametric and non-parametric Bayesian approaches. A review of the evaluation schemes for unsupervised morphological segmentation is also provided along with a summary of evaluation results on the Morpho Challenge evaluations.

Crossref

Wolverhampton Intellectual Repository and E-theses