36 research outputs found

    Croatian Language Resources for NooJ

    This paper presents the Croatian module for NooJ. The module includes the novel “Posljednji Stipančići” by Vjenceslav Novak as a corpus with a fully covered dictionary (i.e. zero unknowns). Examples of morphological and syntactic grammars are presented, together with a few examples of dictionary entries and their inflectional and derivational paradigms.

    Hrvatski poredbeni idiomi: MWU pristup [Croatian Comparative Idioms: An MWU Approach]

    This article describes comparative idioms in the Croatian language for computational processing in the NooJ linguistic environment. As part of a larger project on annotating and extracting different Croatian idioms as multi-word units (MWUs), the work presents automated comparative idiom search in any Croatian text. Using the NooJ environment, a user can find any comparative structure in a text and use it for translation, language learning or research purposes.

    Comparative Analysis of Automatic Term and Collocation Extraction

    Monolingual and multilingual terminology and collocation bases that cover a specific domain, used independently or integrated with other resources, have become a valuable electronic resource. Building such resources can be assisted by automatic term extraction tools that combine statistical and linguistic approaches. This paper presents research on term extraction from a monolingual corpus of publicly accessible English legislative documents. The results of two hybrid approaches are compared: extraction using the TermeX tool, and an automatic statistical extraction procedure followed by linguistic filtering through the open source linguistic engineering tool. The results are assessed with the statistical measures of precision, recall, and F-measure.
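
    The evaluation measures named in this abstract can be illustrated with a minimal sketch. It assumes that extracted terms are scored as a set against a manually built gold-standard list; the function name and the sample terms below are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(extracted, gold):
    """Score a set of extracted terms against a gold-standard set.

    Both arguments are iterables of strings; duplicates are ignored.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)

    # Precision: share of extracted terms that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of gold-standard terms that were found.
    recall = true_positives / len(gold) if gold else 0.0
    # F-measure (F1): harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical sample data, for illustration only.
extracted_terms = {"member state", "data subject", "the following", "legal basis"}
gold_terms = {"member state", "data subject", "legal basis",
              "supervisory authority", "personal data"}

p, r, f = precision_recall_f1(extracted_terms, gold_terms)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

    Here three of the four extracted terms are correct (precision 0.75) and three of the five gold terms are found (recall 0.60). The paper may compute the measures differently, for example over ranked candidate lists.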

    Improved Parser for Simple Croatian Sentences

    In this paper, the authors present the work done to improve the existing syntactic parser, continuing the work presented at the NooJ 2009 conference. We show and explain the grammar for detecting the nominal predicate in a simple sentence. The nominal predicate in the Croatian language is made of the auxiliary verb ‘to be’ and an in Nominative case. The can be a complex made of a single noun and any number of adjectives, pronouns and numbers preceding that noun and agreeing with it in number, gender and case, but also a single noun, a single pronoun, a single adjective or even an adverb. We also discuss the problem of coordinating two or more nodes of different gender, and the agreement of the coordination with the main verb when the coordination is the subject of a sentence. The work further sheds light on other important properties of Croatian sentence complexity. At the end of the paper, the results are evaluated through precision, recall and F-measure to show the adequacy of the model.

    AmAMorph: Finite State Morphological Analyzer for Amazighe

    This paper presents AmAMorph, a morphological analyzer for the Amazighe language that uses a system based on the NooJ linguistic development environment. The paper begins with the development of Amazighe lexicons with large coverage formalization. The built electronic lexicons, named ‘NAmLex’, ‘VAmLex’ and ‘PAmLex’, which stand for ‘Noun Amazighe Lexicon’, ‘Verb Amazighe Lexicon’ and ‘Particles Amazighe Lexicon’, link inflectional, morphological, and syntactic-semantic information to the list of lemmas. Automated inflectional and derivational routines are applied to each lemma producing over inflected forms. To our knowledge, AmAMorph is the first morphological analyzer for Amazighe. It identifies the component morphemes of the forms using large coverage morphological grammars. Along with the description of how the analyzer is implemented, this paper gives an evaluation of the analyzer.

    The Adventures of Hlapić in Burgenland Croatian

    The paper presents the results of a digital comparative text analysis of the Croatian original and the Burgenland editions of a children’s classic, performed in combination with research methods of Translation Studies. The Croatian children’s novel of 1913, Čudnovate zgode šegrta Hlapića [The Strange Adventures of Hlapić the Apprentice] by Ivana Brlić-Mažuranić (1874–1938), appeared in Burgenland Croatian in 1960 and again, with minor alterations, in 2000. Burgenland Croatian is the language of the Croatian minority predominantly positioned in Austria, considered to be a regional variant of Croatian. These two languages are similar, but they still differ in structural and semantic elements as they have been developing separately since the 15th century. The similarities allowed for a digital comparative text analysis of the linguistic aspects of source and target texts, including their linguistic complexity. The results of the digital analysis demonstrate the applicability of digital linguistics methodology in analyzing translated and rewritten literary texts when source and target language idioms are similar, especially in determining the stylistic differences between source and target texts. The analysis of culture-specific items rendered in the two target texts, as compared to the original, indicates that there are few differences on the language text levels between the analyzed source and target texts. Some discrepancies between the two editions of the translation into Burgenland Croatian have nevertheless been detected, and are explained in the historical and cultural context of their appearance.

    Towards Parsing Croatian Complex Sentences: Dependent Noun Clauses

    In this paper, the authors present methods for parsing Croatian complex sentences in which a dependent clause serves as a direct object to the main verb. This research is based on the resources that have already been developed for parsing simple Croatian sentences. So far, the sentences that we were able to parse using these resources are of the basic structure consisting of a subject, verb, direct and indirect object, adverbial of time and place. Methods we shall present in this paper will extend this structure to the following sentence structure > and, although quite rare and stylistically marked, to the structure . Our primary indicator for this type of sentence will be the absence of the required direct object in the main clause as well as the presence of one of the subordinating conjunctions (‘da’, ‘kako’) or complementizers (relative pronoun, adverb of place, time, cause or manner). Since this type of complex sentence is very common in the Croatian language, we believe that this research will be a valuable contribution to the Croatian module for NooJ. At the end of the paper, we will evaluate the adequacy of the model through precision, recall and F-measure.