Babel Treebank of Public Messages in Croatian
The paper presents the process of constructing a publicly available treebank of public messages written in Croatian. The messages were collected from various electronic sources (e-mail, blog, Facebook and SMS) and published on the Zagreb Museum of Contemporary Art LED facade within the Babel art project. The project aimed to use the facade as an open-space blog or social interface enabling citizens to publicly express their views. The construction and current state of the treebank are presented along with future work plans. A comparison of the Babel Treebank with the Croatian Dependency Treebank and the SETimes.HR treebank regarding differing domains and annotation schemes is briefly sketched. The treebank is used as a test platform for introducing a new standard for syntactic annotation of Croatian texts. An experiment with morphosyntactic tagging and dependency parsing of the treebank is conducted, providing a first insight into the computational processing of non-standard text in Croatian.
An Experiment in Verb Valency Frame Extraction from Croatian Dependency Treebank
The paper presents an approach to semi-automatic verb valency frame extraction from the Croatian Dependency Treebank. Our algorithm extracted 1923 verb valency frames for 594 different verbs. We discuss the applicability of our method to semi-automatic verb valency lexicon creation and refinement, along with the possibilities of utilizing it in the task of parsing Croatian texts.
Tagset Reductions in Morphosyntactic Tagging of Croatian Texts
Morphosyntactic tagging of Croatian texts is performed with stochastic taggers by using a language model built on a manually annotated corpus implementing the Multext-East version 3 specifications for Croatian. Tagging accuracy in this framework is basically predefined, i.e. proportionally dependent on two things: the size of the training corpus and the number of different morphosyntactic tags encompassed by that corpus. Since the 100 kw Croatia Weekly newspaper corpus by definition makes a rather small language model in terms of stochastic tagging of free domain texts, the paper presents an approach dealing with tagset reductions. Several meaningful subsets of the Croatian Multext-East version 3 morphosyntactic tagset specifications are created and applied to Croatian texts with the CroTag stochastic tagger, measuring overall tagging accuracy and F1-measures. Obtained results are discussed in terms of applying different reductions in different natural language processing systems and specific tasks defined by specific user requirements.
Error Analysis in Croatian Morphosyntactic Tagging
In this paper, we provide detailed insight into the properties of errors generated by a stochastic morphosyntactic tagger assigning Multext-East morphosyntactic descriptions to Croatian texts. Tagging the Croatia Weekly newspaper corpus with the CroTag tagger in stochastic mode revealed that approximately 85 percent of all tagging errors occur on nouns, adjectives, pronouns and verbs. Moreover, approximately 50 percent of these are shown to be incorrect assignments of case values. We provide various other distributional properties of errors in assigning morphosyntactic descriptions for these and other parts of speech. On the basis of these properties, we propose rule-based and stochastic strategies which could be integrated in the tagging module, creating a hybrid procedure in order to raise overall tagging accuracy for Croatian.
hr500k – A Reference Training Corpus of Croatian
In this paper we present hr500k, a Croatian reference training corpus of 500 thousand tokens, segmented at document, sentence and word level, and annotated for morphosyntax, lemmas, dependency syntax, named entities, and semantic roles. We present each annotation layer via basic label statistics and describe the final encoding of the resource in CoNLL and TEI formats. We also describe the rather turbulent history of the resource and give insights into the topic and genre distribution in the corpus. Finally, we discuss further enrichments of the corpus with additional layers, which are already underway.
Cross-lingual Dependency Parsing of Related Languages with Rich Morphosyntactic Tagsets
This paper addresses cross-lingual dependency parsing using rich morphosyntactic tagsets. In our case study, we experiment with three related Slavic languages: Croatian, Serbian and Slovene. Four different dependency treebanks are used for monolingual parsing, direct cross-lingual parsing, and a recently introduced cross-lingual parsing approach that utilizes statistical machine translation and annotation projection. We argue for the benefits of using rich morphosyntactic tagsets in cross-lingual parsing and empirically support the claim by showing large improvements over an impoverished common feature representation in the form of a reduced part-of-speech tagset. In the process, we improve over the previous state-of-the-art scores in dependency parsing for all three languages.
PHYTOPHILOUS FAUNA OF A SMALL AND ARTIFICIAL URBAN LAKE
The phytophilous community on Myriophyllum spicatum was studied in a small artificial urban lake in the city of Osijek (eastern Croatia) during the spring and summer seasons of 2010. In the eutrophic conditions, macrophyte stands were well developed, and representatives of the following invertebrate taxa were found in the formed periphyton: Hydrozoa, Nematoda, Gastropoda, Cladocera, Copepoda, and Insecta larvae, including Chironomidae and Coleoptera. They displayed differences in temporal abundance patterns. Two separate phases in macrophyte colonization, with differences in invertebrate composition and abundance, were recorded. Insect larvae, particularly Chironomidae, were most abundant in the first phase, through the spring period, and Hydra oligactis (brown hydra) was most abundant in the second phase, i.e. the summer period. Concurrently, microcrustacean abundance declined towards the end of the summer. Results of the analyses indicated that water temperature and periphyton biomass were the variables exerting the main influence on the invertebrate assemblage, while, interestingly, macrophyte size and biomass were negatively correlated with the abundance of most of the fauna. On the other hand, brown hydra was negatively correlated with all other invertebrate taxa except gastropods. A larger surface area of submersed macrophytes is the main parameter supporting the increase of invertebrate abundance, because it provides protection from predators and supports the growth of periphyton, an important food source for these phytophilous organisms. Macrophyte length was positively correlated with Hydra abundance, while chironomids were more influenced by periphyton biomass. These organisms can indicate water quality conditions and a potential increase in primary and secondary production.