Search CORE

A Case Study of Algorithms for Morphosyntactic Tagging of Polish Language

Authors: Chrzaszcz Paweł, Kitowski Jacek, Kuta Marcin
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 30/01/2012

The paper evaluates several part-of-speech taggers, representing the main tagging algorithms, applied to the corpus of the Frequency Dictionary of Contemporary Polish. We report results for two tagging schemes: the IPI PAN positional tagset and its simplified version. Tagging accuracy is calculated for different training sets and broken down into many subcategories (accuracy on known and unknown tokens, word segments, sentences, etc.). The results are compared with those reported for other inflecting and analytic languages. Performance aspects (time demands) of the tagging tools used are also discussed.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Benchmarking High Performance Architectures With Natural Language Processing Algorithms

Authors: Jacek Kitowski, Marcin Kuta
Publication venue: AGH University of Science and Technology Press
Publication date: 01/01/2011

Natural Language Processing algorithms are resource-demanding, especially when they must be tuned to an inflective language like Polish. The paper presents the time and memory requirements of part-of-speech tagging and clustering algorithms applied to two corpora of the Polish language. The algorithms are benchmarked on three high-performance platforms of different architectures. Additionally, sequential and OpenMP implementations of the clustering algorithms are compared.

AGH (Akademia Górniczo-Hutnicza) University of Science and Technology: Journals

Computer Science Journal (AGH University of Science and Technology, Krakow)

Directory of Open Access Journals

Application of Weighted Voting Taggers to Languages Described with Large Tagsets

Authors: Kitowski Jacek, Kuta Marcin, Wojcik Wojciech, Wrzeszcz Michał
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 26/01/2012

The paper presents baseline and complex part-of-speech taggers applied to the modified corpus of the Frequency Dictionary of Contemporary Polish, annotated with a large tagset. First, the paper examines the accuracy of six baseline part-of-speech taggers. The main part of the work presents simple weighted voting and complex voting taggers, with special attention paid to lexical voting methods and to the handling of ties and fallbacks. The TagPair and WPDV voting methods achieve the highest accuracy among all methods considered. The error reduction of 10.8% with respect to the best baseline tagger for the large tagset is comparable with other authors' results for small tagsets.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Increasing Quality of the Corpus of Frequency Dictionary of Contemporary Polish for Morphosyntactic Tagging of the Polish Language

Authors: Chrzaszcz Paweł, Kitowski Jacek, Kuta Marcin
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 26/01/2012

The paper addresses the correction of the erroneous and ambiguous corpus of the Frequency Dictionary of Contemporary Polish (FDCP) and its application to morphosyntactic tagging of the Polish language. Several stages of corpus transformation are presented, and baseline part-of-speech tagging algorithms are evaluated.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

An audio-visual corpus for multimodal automatic speech recognition

Publication venue: Springer
Publication date: 07/01/2017

Springer - Publisher Connector

Structure of pauses in speech in the context of speaker verification and classification of speech type

Authors: Bartosz Ziółko, Magdalena Igras-Cybulska, Marcin Witkowski, Piotr Żelasko
Publication venue: Springer Nature
Publication date: 01/01/2016

Springer - Publisher Connector

Comparison of Latent Semantic Analysis and Probabilistic Latent Semantic Analysis for Documents Clustering

Authors: Kitowski Jacek, Kuta Marcin
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 04/02/2015

In this paper we compare the usefulness of statistical dimensionality-reduction techniques for improving the clustering of documents in Polish. We start with partitional and agglomerative algorithms applied to the Vector Space Model. We then investigate two transformations: Latent Semantic Analysis and Probabilistic Latent Semantic Analysis. The results show an advantage of Latent Semantic Analysis over the probabilistic model. We also analyse the time and memory consumption of these transformations and present runtime details for an IBM BladeCenter HS21 machine.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

N-Grams Model for Polish

Authors: Bartosz Ziolko, Dawid Skurzok
Publication venue: IntechOpen
Publication date: 21/06/2011

IntechOpen

Improving Gossip Learning via Limited Model Merging

Authors: Danner Gábor, Hegedűs István, Jelasity Márk
Publication venue: Springer Nature Switzerland
Publication date: 01/01/2023

SZTE Publicatio Repozitórium - SZTE - Repository of Publications