Croatian Language Resources for NooJ
This paper presents the Croatian module for NooJ. The module includes the novel “Posljednji Stipančići” by Vjenceslav Novak as a corpus with a fully covered dictionary (i.e. zero unknown words). Examples of morphological and syntactic grammars are presented, together with a few examples of dictionary entries and their inflectional and derivational paradigms.
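The inflectional paradigms mentioned above can be illustrated with a minimal Python sketch. It assumes a simplified strip-and-append model in which each paradigm row removes a number of characters from the end of the lemma and appends a suffix. The names `Paradigm`, `ZENA` and `inflect` are hypothetical, and this is not NooJ's own paradigm syntax. The example covers only the singular of an a-declension feminine noun such as žena (woman).

```python
from typing import List, Tuple

# Hypothetical paradigm format: (characters to strip from the lemma end, suffix, features).
Paradigm = List[Tuple[int, str, str]]

# Singular of an a-declension feminine noun such as "žena" (woman).
ZENA: Paradigm = [
    (0, "", "Nom+s"),
    (1, "e", "Gen+s"),
    (1, "i", "Dat+s"),
    (1, "u", "Acc+s"),
    (1, "o", "Voc+s"),
    (1, "i", "Loc+s"),
    (1, "om", "Ins+s"),
]

def inflect(lemma: str, paradigm: Paradigm) -> List[Tuple[str, str]]:
    """Return (form, features) pairs by stripping characters from the lemma end and appending a suffix."""
    forms = []
    for strip, suffix, feats in paradigm:
        stem = lemma[:-strip] if strip else lemma
        forms.append((stem + suffix, feats))
    return forms

for form, feats in inflect("žena", ZENA):
    print(form, feats)
```

The same rows also produce kuća, kuće, kući, kuću, kućo, kući, kućom. Nouns that undergo sibilarization (for example knjiga, dative knjizi) would need extra rules, which is one reason real paradigm formalisms offer more than plain suffixation.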
Hrvatski poredbeni idiomi: MWU pristup [Croatian Comparative Idioms: An MWU Approach]
This article presents work on describing comparative idioms in Croatian for computational processing in the NooJ linguistic environment. As part of a larger project focused on annotating and extracting various Croatian idioms as multi-word units (MWUs), this work presents an automated search for comparative idioms in any Croatian text. Using NooJ, a user can find any comparative structure in a text and use it for translation, language learning or research purposes.
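Croatian comparative idioms commonly follow the surface pattern “X kao Y” (as in gladan kao vuk, “hungry as a wolf”). The following sketch is not the NooJ grammar described above; it simply searches plain text for that pattern with a regular expression. `COMPARATIVE` and `find_comparisons` are hypothetical names.

```python
import re
from typing import List, Tuple

# Hypothetical, purely surface-level pattern "<word> kao <word>".
# A NooJ grammar could also check part of speech and agreement; this regex does not.
COMPARATIVE = re.compile(r"\b(\w+)\s+kao\s+(\w+)\b", re.IGNORECASE)

def find_comparisons(text: str) -> List[Tuple[str, str, str]]:
    """Return (left word, right word, matched span) for every 'X kao Y' candidate."""
    return [(m.group(1), m.group(2), m.group(0)) for m in COMPARATIVE.finditer(text)]

text = "Bio je gladan kao vuk, a ruke su mu bile hladne kao led."
for left, right, span in find_comparisons(text):
    print(span)
```

Because it ignores part of speech and agreement, the regex also returns literal comparisons and misses idioms built with other comparison markers; a dictionary-aware grammar can restrict both sides of the comparison.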
Comparative Analysis of Automatic Term and Collocation Extraction
Monolingual and multilingual terminology and collocation bases covering a specific domain, whether used independently or integrated with other resources, have become valuable electronic resources. Building such resources can be assisted by automatic term extraction tools that combine statistical and linguistic approaches. This paper presents research on term extraction from a monolingual corpus of publicly accessible English legislative documents. Two hybrid approaches are compared: extraction using the TermeX tool, and an automatic statistical extraction procedure followed by linguistic filtering with an open-source linguistic engineering tool. The results are evaluated using the statistical measures of precision, recall and F-measure.
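The evaluation measures named above can be computed by comparing the extracted terms against a manually validated reference list. This is a minimal sketch, assuming exact string matching between candidate and reference terms; `precision_recall_f` is a hypothetical helper, and the term lists in the example are invented for illustration, not data from the paper.

```python
from typing import Set, Tuple

def precision_recall_f(extracted: Set[str], gold: Set[str], beta: float = 1.0) -> Tuple[float, float, float]:
    """Precision, recall and F-beta of extracted terms against a reference list (exact match)."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

# Invented example lists, for illustration only.
extracted = {"legal person", "member state", "competent authority", "this regulation"}
gold = {"legal person", "member state", "competent authority", "third country", "supervisory authority"}
p, r, f = precision_recall_f(extracted, gold)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.750 R=0.600 F1=0.667
```

With `beta=1` the result is the usual F1 score, the harmonic mean of precision and recall; a larger `beta` weights recall more heavily.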
Improved Parser for Simple Croatian Sentences
In this paper, the authors present work done to improve an existing syntactic parser, continuing the work presented at the NooJ 2009 conference. We show and explain the grammar for detecting the nominal predicate in a simple sentence. In Croatian, the nominal predicate consists of the auxiliary verb “to be” and a noun phrase (NP) in the nominative case. The NP can be a complex NP made of a single noun and any number of adjectives, pronouns and numbers preceding that noun and agreeing with it in number, gender and case, but it can also be a single noun, a single pronoun, a single adjective or even an adverb. We also discuss the coordination of two or more nouns of different gender and its agreement with the main verb when the coordination is the subject of the sentence. The paper further discusses other important properties of Croatian sentence complexity. Finally, the results are evaluated through precision, recall and F-measure to show the adequacy of the model.
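The nominal-predicate rule above can be approximated over morphosyntactically tagged tokens. After a form of biti (to be), the sketch accepts an adverb, a nominative noun, or a run of nominative adjectives, pronouns or numbers that agree in case, gender and number, optionally closed by an agreeing head noun. The `Token` class, the tag names and the feature keys are hypothetical, and the sketch ignores coordination.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Token:
    form: str
    lemma: str
    pos: str  # hypothetical tags: N, A, PRO, NUM, ADV, V, PUNCT, ...
    feats: Dict[str, str] = field(default_factory=dict)  # e.g. {"case": "Nom", "gender": "m", "number": "s"}

MODIFIERS = {"A", "PRO", "NUM"}
AGREEMENT_KEYS = ("case", "gender", "number")

def agrees(a: Token, b: Token) -> bool:
    """True if both tokens share case, gender and number."""
    return all(a.feats.get(k) == b.feats.get(k) for k in AGREEMENT_KEYS)

def nominal_part(rest: List[Token]) -> List[Token]:
    """Return the nominal part that directly follows the copula, or [] if none."""
    if not rest:
        return []
    first = rest[0]
    if first.pos == "ADV":
        return [first]  # e.g. "Sada je kasno."
    if first.feats.get("case") != "Nom":
        return []
    if first.pos == "N":
        return [first]  # a single noun
    if first.pos not in MODIFIERS:
        return []
    phrase = [first]
    # Extend with agreeing modifiers; stop right after the head noun.
    for tok in rest[1:]:
        if tok.pos not in MODIFIERS | {"N"} or not agrees(tok, first):
            break
        phrase.append(tok)
        if tok.pos == "N":
            break
    return phrase

def nominal_predicate(tokens: List[Token]) -> Optional[List[Token]]:
    """Return [copula, *nominal part] for the first form of 'biti' that has one, else None."""
    for i, tok in enumerate(tokens):
        if tok.lemma == "biti":
            part = nominal_part(tokens[i + 1:])
            if part:
                return [tok] + part
    return None

m_sg_nom = {"case": "Nom", "gender": "m", "number": "s"}
sentence = [
    Token("On", "on", "PRO", dict(m_sg_nom)),
    Token("je", "biti", "V"),
    Token("dobar", "dobar", "A", dict(m_sg_nom)),
    Token("učitelj", "učitelj", "N", dict(m_sg_nom)),
    Token(".", ".", "PUNCT"),
]
print([t.form for t in nominal_predicate(sentence)])  # ['je', 'dobar', 'učitelj']
```

A sentence such as On je u kući (“He is in the house”) yields None here, because the word after the copula is a preposition rather than a nominative form or an adverb.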
AmAMorph: Finite State Morphological Analyzer for Amazighe
This paper presents AmAMorph, a morphological analyzer for the Amazighe language built with the NooJ linguistic development environment. The paper begins with the development and large-coverage formalization of Amazighe lexicons. The resulting electronic lexicons, named “NAmLex”, “VAmLex” and “PAmLex” (“Noun Amazighe Lexicon”, “Verb Amazighe Lexicon” and “Particles Amazighe Lexicon”), link inflectional, morphological and syntactic-semantic information to the list of lemmas. Automated inflectional and derivational routines are applied to each lemma, producing over inflected forms. To our knowledge, AmAMorph is the first morphological analyzer for Amazighe. It identifies the component morphemes of the forms using large-coverage morphological grammars. Along with a description of how the analyzer is implemented, the paper gives an evaluation of the analyzer.
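One simple way to see how generated inflected forms support analysis is to invert them into a form-to-analysis index, look up each token, and collect the tokens that receive no analysis. This full-form lookup is a simplification and not AmAMorph's grammar-based identification of morphemes. The function names, the feature labels and the Latin-script entries argaz/irgazen (“man”/“men”) are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Analysis = Tuple[str, str]  # (lemma, features)

def build_index(generated: Dict[str, List[Tuple[str, str]]]) -> Dict[str, List[Analysis]]:
    """Invert lemma -> [(form, features)] into form -> [(lemma, features)]."""
    index: Dict[str, List[Analysis]] = defaultdict(list)
    for lemma, forms in generated.items():
        for form, feats in forms:
            index[form].append((lemma, feats))
    return dict(index)

def analyze(tokens: List[str], index: Dict[str, List[Analysis]]) -> Tuple[Dict[str, List[Analysis]], List[str]]:
    """Look up each token (case-insensitively); return the analyses found and the unknown tokens."""
    analyses: Dict[str, List[Analysis]] = {}
    unknown: List[str] = []
    for tok in tokens:
        key = tok.lower()
        if key in index:
            analyses[tok] = index[key]
        else:
            unknown.append(tok)
    return analyses, unknown

# Illustrative entries only; hypothetical feature labels.
index = build_index({"argaz": [("argaz", "N+m+s"), ("irgazen", "N+m+p")]})
analyses, unknown = analyze(["argaz", "irgazen", "xyz"], index)
print(analyses)
print("unknown:", unknown)
```

A form that several lemmas share simply maps to several analyses, which is how ambiguity surfaces in a lookup of this kind.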
The Adventures of Hlapić in Burgenland Croatian
The paper presents the results of a digital comparative text analysis of the Croatian original and the Burgenland Croatian editions of a children’s classic, combined with research methods from Translation Studies. The Croatian children’s novel of 1913, Čudnovate zgode šegrta Hlapića [The Strange Adventures of Hlapić the Apprentice] by Ivana Brlić-Mažuranić (1874–1938), appeared in Burgenland Croatian in 1960 and again, with minor alterations, in 2000. Burgenland Croatian, the language of the Croatian minority living predominantly in Austria, is considered a regional variant of Croatian. The two languages are similar, but they differ in structural and semantic elements because they have developed separately since the 15th century. These similarities allowed for a digital comparative analysis of the linguistic aspects of the source and target texts, including their linguistic complexity. The results demonstrate that digital linguistics methodology is applicable to analyzing translated and rewritten literary texts when the source and target language idioms are similar, especially for determining stylistic differences between source and target texts. The analysis of culture-specific items rendered in the two target texts, compared to the original, indicates few differences at the linguistic level between the source and target texts; however, some discrepancies between the two editions of the Burgenland Croatian translation were detected and are explained in the historical and cultural context in which they appeared.
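The abstract does not name the complexity measures used. As an assumption, the sketch below computes two common surface indicators, type-token ratio and mean sentence length, so that a source text and a target text can be compared side by side. `complexity_profile` is a hypothetical helper, its sentence splitter is naive, and the sample strings are toy input rather than excerpts from either edition.

```python
import re
from typing import Dict

def complexity_profile(text: str) -> Dict[str, float]:
    """Surface complexity indicators: tokens, types, type-token ratio, mean sentence length."""
    # Naive sentence split on terminal punctuation; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    n_words = len(words)
    n_types = len(set(words))
    return {
        "tokens": n_words,
        "types": n_types,
        "ttr": n_types / n_words if n_words else 0.0,
        "mean_sentence_length": n_words / len(sentences) if sentences else 0.0,
    }

# Toy strings for illustration only.
source = "Toy source sentence one. Toy source sentence two!"
target = "Toy target sentence. Another toy target sentence. A third one?"
for name, text in (("source", source), ("target", target)):
    print(name, complexity_profile(text))
```

Type-token ratio depends on text length, so such comparisons are only meaningful between texts or samples of similar size.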
Towards Parsing Croatian Complex Sentences: Dependent Noun Clauses
In this paper, the authors present methods for parsing Croatian complex sentences in which a dependent clause serves as the direct object of the main verb. This research builds on resources already developed for parsing simple Croatian sentences. So far, these resources could parse sentences with a basic structure consisting of a subject, a verb, a direct and an indirect object, and adverbials of time and place. The methods presented in this paper extend this structure to the following sentence structure … and, although it is quite rare and stylistically marked, to the structure … . Our primary indicator for this type of sentence is the absence of the required direct object in the main clause together with the presence of one of the subordinating conjunctions (“da”, “kako”) or complementizers (a relative pronoun, or an adverb of place, time, cause or manner). Since this type of complex sentence is very common in Croatian, we believe this research will be a valuable contribution to the Croatian module for NooJ. At the end of the paper, we evaluate the adequacy of the model through precision, recall and F-measure.
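The primary indicator described above can be sketched as a heuristic over part-of-speech-tagged tokens: find the first subordinating conjunction or complementizer, then accept the split only if the main clause contains a transitive verb but no accusative noun or pronoun. The tag set, the `(form, pos, case)` token tuples, the transitive-verb set and the complementizer list are all hypothetical simplifications.

```python
from typing import List, Optional, Set, Tuple

Tok = Tuple[str, str, str]  # (form, pos, case); case is "" when not applicable

SUBORDINATORS = {"da", "kako"}
# Hypothetical, incomplete list of complementizers (relative pronouns and adverbs).
COMPLEMENTIZERS = {"što", "tko", "gdje", "kada", "kad", "zašto"}

def split_object_clause(tokens: List[Tok], transitive: Set[str]) -> Optional[Tuple[List[Tok], List[Tok]]]:
    """Split at the first subordinator/complementizer if the main clause lacks its direct object."""
    for i, (form, _, _) in enumerate(tokens):
        if form.lower() in SUBORDINATORS | COMPLEMENTIZERS:
            main, dependent = tokens[:i], tokens[i:]
            has_transitive_verb = any(pos == "V" and f.lower() in transitive for f, pos, _ in main)
            has_direct_object = any(pos in {"N", "PRO"} and case == "Acc" for _, pos, case in main)
            if has_transitive_verb and not has_direct_object:
                return main, dependent
            return None
    return None

transitive = {"zna", "čita"}  # hypothetical verb lexicon (finite forms, for simplicity)
sentence = [("Ivan", "N", "Nom"), ("zna", "V", ""), ("da", "CONJ", ""), ("Marija", "N", "Nom"),
            ("čita", "V", ""), ("knjigu", "N", "Acc"), (".", "PUNCT", "")]
main, dependent = split_object_clause(sentence, transitive)
print([t[0] for t in main], [t[0] for t in dependent])
# ['Ivan', 'zna'] ['da', 'Marija', 'čita', 'knjigu', '.']
```

The heuristic is deliberately crude: a sentence such as Ivan čita kada ima vremena would be wrongly split, because time adverbs also introduce adverbial clauses, so a real grammar needs the verb's subcategorization and further context.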