
9 research outputs found

A Comparison Between BLEU and METEOR Metrics Used for Assessing Students within an Informatics Discipline Course

Author: Dobre Iuliana
Publication venue: The Authors. Published by Elsevier Ltd.
Publication date: 05/05/2015

Abstract. This paper presents a very brief review of the general structure of Intelligent Tutoring Systems and of the use of Natural Language Processing techniques within such systems. It also reviews the author's comparison of two Natural Language Processing metrics, BLEU and METEOR, as used in the student knowledge assessment module of an Intelligent Tutoring System developed by the author for teaching, learning, and assessing students in the course Programming of Computers and C Language.

Elsevier - Publisher Connector

CoRoLa Starts Blooming – An update on the Reference Corpus of Contemporary Romanian Language

Author: Barbu Mititelu Verginica
Bolea Cecilia
Boroș Tiberiu
Cristea Dan
Dumitrescu Ștefan Daniel
Irimia Elena
Moruz Alex
Pistol Laura
Scutelnicu Andrei
Teodorescu Horia Nicolai
Tufiș Dan
Publication venue: Mannheim : Institut für Deutsche Sprache
Publication date: 02/07/2015

This article reports on the ongoing CoRoLa project, which aims to create a reference corpus of contemporary Romanian (from 1945 onwards), open for free online use by researchers in linguistics and language processing, teachers of Romanian, and students. We are investing serious effort in persuading large publishing houses and other owners of IPR on relevant language data to join us and contribute selections of their text and speech repositories to the project. The CoRoLa project is coordinated by two Computer Science institutes of the Romanian Academy, with cooperation and consulting from professional linguists at other institutes of the Romanian Academy. We foresee a written component of more than 500 million word forms and a speech component of about 300 hours of recordings. The entire collection of texts (covering all functional styles of the language) will be pre-processed and annotated at several levels, and documented with standardized metadata. Pre-processing includes cleaning the data, harmonising the diacritics, sentence splitting, and tokenization. Annotation will include morpho-lexical tagging and lemmatization in the first stage, followed by syntactic, semantic, and discourse annotation in a later stage.
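
The pre-processing steps named in the CoRoLa abstract (harmonising diacritics, sentence splitting, tokenization) could look roughly like the sketch below. This is only an illustration and not the project's pipeline: the function names are hypothetical, the assumption that "harmonising" means mapping the legacy cedilla letters (ş, ţ) to the comma-below letters (ș, ț) is ours, and the regular expressions are deliberately naive.

```python
import re

# Assumption: harmonising means replacing cedilla forms with comma-below forms.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u015e": "\u0218",  # S with cedilla -> S with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
    "\u0162": "\u021a",  # T with cedilla -> T with comma below
})


def harmonise_diacritics(text):
    """Map cedilla variants of s/t to their comma-below counterparts."""
    return text.translate(CEDILLA_TO_COMMA)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Naive tokenizer: words (optionally hyphenated) and single punctuation marks."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", sentence)


def preprocess(text):
    """Harmonise diacritics, then return one token list per sentence."""
    clean = harmonise_diacritics(text)
    return [tokenize(s) for s in split_sentences(clean)]
```

A real corpus pipeline would need abbreviation handling, clitic splitting, and language-specific rules that this sketch leaves out.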

Publikationsserver des Instituts für Deutsche Sprache

When Will ITS Speak Your Language?

Author: Gulyás András
Niculescu Mihai
Tadić Marko
Váradi Tamás
Publication venue
Publication date: 01/01/2015

Repository of the Academy's Library

Dependency-based translation equivalents for factored machine translation

Author: Alexandru Ceauşu
Irimia Elena
Publication venue
Publication date: 23/04/2020

Abstract. One of the major concerns of machine translation practitioners is to create good translation models: correctly extracted translation equivalents and a reduced translation table size are the most important evaluation criteria. This paper presents a method for extracting translation examples using the dependency linkage of both the source and the target sentence. To decompose the source/target sentence into fragments, we identified two types of dependency link structures, super-links and chains, and used these structures to set the translation example borders. The choice of the dependency-linked n-gram approach is based on the assumption that decomposing the sentence into coherent segments, which have complete syntactic structure and account for extra-phrasal syntactic dependencies, would guarantee "better" translation examples and make better use of the storage space. The performance of the dependency-based approach is measured with the BLEU-NIST score and compared with a baseline system.

CiteSeerX

RSS-TOBI - a Prosodically Enhanced Romanian Speech Corpus

Author: Boroș Tiberiu
Dumitrescu Stefan Daniel
Stan Adriana
Watts Oliver
Publication venue
Publication date: 01/05/2014

This paper introduces a recent extension of a Romanian speech corpus that adds prosodic annotations of the speech data in the form of ToBI labels. We describe the methodology for determining the pitch patterns that are common in Romanian, annotate the speech resource, and then compare two text-to-speech synthesis systems to establish the benefits of adding this type of information to our speech resource. The result is a publicly available speech dataset which can be used to further develop speech synthesis systems or to automatically learn the prediction of ToBI labels from Romanian text.

CiteSeerX

Edinburgh Research Explorer

Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan languages

Author
Publication venue: Croatian Language Technologies Society, Faculty of Humanities and Social Science
Publication date: 01/01/2008

Proceedings of the Sixth International Conference Formal Approaches to South Slavic and Balkan Languages publishes 22 papers that were presented at the conference organised in Dubrovnik, Croatia, 25-28 September 2008.

Repozitorij Filozofskog fakulteta u Zagrebu at University of Zagreb

Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan languages

Author
Publication venue: Croatian Language Technologies Society, Faculty of Humanities and Social Science
Publication date: 01/01/2010

Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages publishes 17 papers that were presented at the conference organised in Dubrovnik, Croatia, 4-6 October 2010.

Repozitorij Filozofskog fakulteta u Zagrebu at University of Zagreb

ABC - Language Identifier

Author
Publication venue: Research Institute for Artificial Intelligence, Romanian Academy of Sciences
Publication date: 30/07/2014

The application, developed in C#, automatically identifies the language of a text written in one of the 21 European Union languages. Using training texts in different languages (approx. 1.5 MB of text per language), a training module counts the prefixes (the first 3 characters) and the suffixes (the last 4 characters) of all the words in the texts, for each language. For every language two models are constructed, containing the weights (percentages) of the prefixes and suffixes in the texts representing that language. In the prediction phase, two models are built on the fly in the same manner for the new text. These models are then compared with the stored models of each language for which the application was trained. Using comparison functions, the best-matching model is chosen. More detailed descriptions are available in the following papers (http://www.racai.ro/~tufis/papers):
-- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2008). RACAI's Linguistic Web Services. In Proceedings of the 6th Language Resources and Evaluation Conference - LREC 2008, Marrakech, Morocco, May 2008. ELRA - European Language Resources Association. ISBN 2-9517408-4-0.
-- Dan Tufiş and Alexandru Ceauşu (2007). Diacritics Restoration in Romanian Texts. In Elena Paskaleva and Milena Slavcheva (eds.), A Common Natural Language Processing Paradigm for Balkan Languages - RANLP 2007 Workshop Proceedings, pp. 49-56, Borovets, Bulgaria, September 2007. INCOMA Ltd., Shoumen, Bulgaria. ISBN 978-954-91743-8-0.
-- Dan Tufiş and Adrian Chiţu (1999). Automatic Insertion of Diacritics in Romanian Texts. In Ferenc Kiefer, Gábor Kiss, and Júlia Pajzs (eds.), Proceedings of the 5th International Workshop on Computational Lexicography (COMPLEX 1999), pp. 185-194, Pecs, Hungary, May 1999. Linguistics Institute, Hungarian Academy of Sciences.
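
The prefix/suffix scheme described for ABC - Language Identifier can be sketched as follows. The original application is written in C#; this Python version is only a minimal illustration under our own assumptions. In particular, the abstract does not say which comparison functions are used, so the `overlap` function (shared percentage mass between two models) is a stand-in, and all function names are hypothetical.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()[]"


def affix_model(text, kind):
    """Percentage weights of 3-character prefixes or 4-character suffixes."""
    counts = Counter()
    for word in text.lower().split():
        word = word.strip(PUNCTUATION)
        if kind == "prefix" and len(word) >= 3:
            counts[word[:3]] += 1
        elif kind == "suffix" and len(word) >= 4:
            counts[word[-4:]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {affix: 100.0 * n / total for affix, n in counts.items()}


def overlap(model_a, model_b):
    """Assumed comparison function: percentage mass shared by two models."""
    return sum(min(w, model_b[a]) for a, w in model_a.items() if a in model_b)


def train(samples):
    """Build a (prefix model, suffix model) pair for each language."""
    return {
        lang: (affix_model(text, "prefix"), affix_model(text, "suffix"))
        for lang, text in samples.items()
    }


def identify(text, models):
    """Build both models for the new text and pick the closest language."""
    prefixes = affix_model(text, "prefix")
    suffixes = affix_model(text, "suffix")
    return max(
        models,
        key=lambda lang: overlap(prefixes, models[lang][0])
        + overlap(suffixes, models[lang][1]),
    )


# Toy training data; the abstract mentions about 1.5 MB of text per language.
samples = {
    "en": "the quick brown fox jumps over the lazy dog while "
          "the children are playing in the garden",
    "ro": "copiii se joacă în grădină iar câinele aleargă după "
          "pisică prin curtea casei noastre",
}
models = train(samples)
print(identify("the children are playing", models))
print(identify("copiii aleargă prin grădină", models))
```

With training texts this small the result is fragile; the sketch is meant to show the shape of the two-model comparison, not its accuracy.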

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University