6 research outputs found

    Evaluation of a cross-lingual Romanian-English multi-document summariser

    The rapid growth of the Internet means that more information is available than ever before. Multilingual multi-document summarisation offers a way to access this information even when it is not in a language the reader speaks, by extracting the gist from related documents and translating it automatically. This paper presents an experiment in which Maximal Marginal Relevance (MMR), a well-known multi-document summarisation method, is used to produce summaries from Romanian news articles. A task-based evaluation performed on both the original summaries and their automatically translated versions reveals that they still contain a significant portion of the important information from the original texts. However, direct evaluation of the automatically translated summaries shows that they are not very legible, which can put off readers who want to find out more about a topic.
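For readers unfamiliar with MMR, the greedy selection it performs can be sketched as follows. This is a minimal illustration, not the system evaluated in the paper: the bag-of-words cosine similarity, the function names `cosine` and `mmr_select`, and the parameters `k` and `lam` are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mmr_select(sentences, query, k=2, lam=0.7):
    """Greedily pick k sentences, trading relevance against redundancy.

    Each step scores a candidate as
        lam * sim(candidate, query) - (1 - lam) * max sim(candidate, chosen)
    and keeps the highest-scoring one. Whitespace tokenisation is a
    simplifying assumption; a real system would use proper preprocessing.
    """
    vecs = [Counter(s.lower().split()) for s in sentences]
    qvec = Counter(query.lower().split())
    selected = []
    candidates = list(range(len(sentences)))

    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(vecs[i], qvec)
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return [sentences[i] for i in selected]


if __name__ == "__main__":
    docs = [
        "floods hit the city",
        "floods hit the city",
        "rescue teams arrived quickly",
    ]
    print(mmr_select(docs, "floods city rescue", k=2, lam=0.5))
```

With `lam` close to 1 the selection favours relevance to the query; lower values penalise sentences that repeat what has already been chosen, which matters when several source documents report the same event.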

    A minority language in the globalizing world: The Buryat language on the Internet

    The situation of languages on the Internet seems to reproduce their situation in offline reality: dominant languages with a large number of users and support from state and society are more widespread. This does not mean, however, that minority languages are absent from the Internet. In this paper, using the example of the Buryat language, we show that websites and webpages in minority languages are created not only for instrumental but also for autotelic reasons. Buryats make efforts to preserve their own language and culture; they are driven by a desire to emphasize their activity or by comparison with other nations that have websites in their own languages. An important issue in our discussion is the relationship between efforts aimed at preserving and developing the ethnic language in spoken and written form and the development of web content in that language. We thus show the relationship between the "on-line" and "off-line" problems faced by Buryats today.
    Polish title: Język mniejszości w globalizującym się świecie: język buriacki w Internecie

    Cross-Language Text Summarization using Sentence and Multi-Sentence Compression

    Cross-Language Automatic Text Summarization produces a summary in a language different from the language of the source documents. In this paper, we propose a French-to-English cross-lingual summarization framework that analyzes the information in both languages to identify the most relevant sentences. In order to generate more informative cross-lingual summaries, we introduce the use of chunks and two compression methods at the sentence and multi-sentence levels. Experimental results on the MultiLing 2011 dataset show that our framework improves on the results obtained by state-of-the-art approaches according to ROUGE metrics.

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on parallel data, and phrases that are not present in the training data are not translated correctly. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data, using external morphological resources instead. A set of new phrase associations is added to the translation and reordering models; each of them corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translations, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction in out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model.