38 research outputs found

    Dynamic Vs Static Term-Expansion using Semantic Resources in Information Retrieval

    Information retrieval in Telugu, one of the recognized Indian languages, is an emerging area of research. We present a novel approach to reformulating item terms at crawling and indexing time. The idea itself is not new, but the use of synsets and other lexical resources for Indian languages is limited by the unavailability of language resources. During indexing, we prepared a synset for 1,43,001 root words out of 4,83,670 unique words from a training corpus of 3500 documents. Index-time document expansion improved recall compared with the baseline approach, i.e. simple information retrieval without term expansion at either end. We also studied query term expansion at search time using the synset and compared it with simple information retrieval without expansion; recall improved markedly. We further extended this work by expanding terms on both sides and plotted the results, which show a similar growth in recall. Surprisingly, all expansions improve recall with only a small fall in precision. We argue that expanding terms at any level may adversely affect precision. Care is required when expanding documents or queries with language resources such as Synset, WordNet and other resources.
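The difference between index-time and query-time expansion described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `SYNSETS` table, the English stand-in words, the toy documents, and the helper names `expand`, `build_index` and `search` are hypothetical and are not taken from the paper, which works on a Telugu corpus with its own synset resource.

```python
from collections import defaultdict

# Hypothetical toy synset table (English stand-ins, not the paper's resource).
SYNSETS = {
    "car": {"automobile"},
    "automobile": {"car"},
    "film": {"movie"},
    "movie": {"film"},
}

def expand(terms):
    """Return the given terms plus every synonym listed in SYNSETS."""
    expanded = set(terms)
    for term in terms:
        expanded |= SYNSETS.get(term, set())
    return expanded

def build_index(docs, expand_docs=False):
    """Build an inverted index; optionally expand document terms at index time."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        terms = set(text.lower().split())
        if expand_docs:
            terms = expand(terms)
        for term in terms:
            index[term].add(doc_id)
    return index

def search(index, query, expand_query=False):
    """Return ids of documents matching any query term (OR semantics)."""
    terms = set(query.lower().split())
    if expand_query:
        terms = expand(terms)
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits

DOCS = {
    1: "red car for sale",
    2: "used automobile dealer",
    3: "new film review",
}

baseline = search(build_index(DOCS), "car")                       # no expansion
query_time = search(build_index(DOCS), "car", expand_query=True)  # expand the query
index_time = search(build_index(DOCS, expand_docs=True), "car")   # expand documents
```

In this toy setup the baseline retrieves only document 1, while either form of expansion also retrieves document 2, which is the recall gain the abstract reports. The sketch does not model the precision drop; that would appear once a listed synonym also matched off-topic documents.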

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on parallel data, and phrases that are not present in the training data are not translated correctly. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data, using external morphological resources instead. A set of new phrase associations is added to the translation and reordering models; each corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translation, and the results showed improved automatic scores (BLEU and Meteor) and fewer out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model. JRC.G.2-Global security and crisis managemen

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    Analyzing Text Complexity and Text Simplification: Connecting Linguistics, Processing and Educational Applications

    Reading plays an important role in the process of learning and knowledge acquisition for both children and adults. However, not all texts are accessible to every prospective reader. Reading difficulties can arise when there is a mismatch between a reader’s language proficiency and the linguistic complexity of the text they read. In such cases, simplifying the text in its linguistic form while retaining all the content could aid reader comprehension. In this thesis, we study text complexity and simplification from a computational linguistic perspective. We propose a new approach to automatically predict text complexity using a wide range of word-level and syntactic features of the text. We show that this approach results in accurate, generalizable models of text readability that work across multiple corpora, genres and reading scales. Moving from documents to sentences, we show that our text complexity features also accurately distinguish different versions of the same sentence in terms of the degree of simplification performed. This is useful for evaluating the quality of a simplification produced by a human expert or a machine, and for choosing targets to simplify in a difficult text. We also show experimentally, through an eye-tracking experiment, the effect of text complexity on readers’ performance outcomes and cognitive processing. Turning from analyzing text complexity and identifying sentential simplifications to generating simplified text, one can view automatic text simplification as a process of translation from English to simple English. In this thesis, we propose a statistical machine translation based approach to text simplification, exploring the role of focused training data and language models in the process. Exploring the linguistic complexity analysis further, we show that our text complexity features can be useful in assessing the language proficiency of English learners. Finally, we analyze German school textbooks in terms of their linguistic complexity across various grade levels, school types and publishers by applying a pre-existing set of text complexity features developed for German.

    Lexicography of coronavirus-related neologisms

    This volume brings together contributions by international experts reflecting on Covid19-related neologisms and their lexicographic processing and representation. The papers analyze new words, new meanings of existing words, and new multiword units, where they come from, how they are transmitted (or differ) across languages, and how their use and meaning are reflected in dictionaries of all sorts. Recent trends in as many as ten languages are considered, including general and specialized language, monolingual as well as bilingual and printed as well as online dictionaries

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    Lexicography of Coronavirus-related Neologisms

    This volume brings together contributions by international experts reflecting on Covid19-related neologisms and their lexicographic processing and representation. The papers analyze new words, new meanings of existing words, and new multiword units in as many as ten languages, considering both specialized and general language, monolingual as well as bilingual and printed as well as online dictionaries