4 research outputs found

    Boosting bitext compression

    Bilingual parallel corpora, also known as bitexts, convey the same information in two different languages. This implies that when modelling bitexts one can take advantage of the relationship that exists between both texts; the text alignment task makes it possible to establish such a relationship. In this paper we propose different approaches that use words and biwords (pairs made of two words, each one from a different text) as symbolic representation units. The properties of these approaches are analysed from a statistical point of view and tested as a preprocessing step for general-purpose compressors. The results obtained suggest interesting conclusions concerning the use of both words and biwords. When the encoded models are used as compression boosters we achieve compression ratios that improve on state-of-the-art compressors by up to 6.5 percentage points, while being up to 40% faster. Work supported by the Spanish Government through projects TIN2009-14009-C02-01 and TIN2009-14009-C02-02, and by the Millennium Institute for Cell Dynamics and Biotechnology (ICDB) (Grant ICM P05-001-F).
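
    To illustrate the kind of preprocessing the abstract describes, the sketch below joins word-aligned pairs into single biword symbols and hands the resulting stream to a general-purpose compressor. It is a minimal illustration under stated assumptions, not the authors' implementation: the alignment is assumed to be given as one-to-one word pairs, and the U+241F separator and zlib back end are arbitrary choices.

import zlib

def biword_encode(left_words, right_words):
    # Join each aligned (left, right) pair into a single biword symbol;
    # U+241F ("symbol for unit separator") is an arbitrary pair delimiter.
    return [f"{l}\u241f{r}" for l, r in zip(left_words, right_words)]

def compress(units):
    # Hand the symbol stream to a general-purpose compressor (zlib here).
    return zlib.compress(" ".join(units).encode("utf-8"))

left = "the house is red".split()
right = "la casa es roja".split()          # toy word-for-word alignment

biword_stream = biword_encode(left, right)
print(len(compress(biword_stream)))                 # size of the joint (biword) encoding
print(len(compress(left)) + len(compress(right)))   # size of compressing each text separately

    On a toy example the comparison is meaningless, but on real bitexts aligned pairs repeat heavily, so a biword stream exposes redundancy across the two languages that the downstream compressor can exploit.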

    5th International Conference on Technological Ecosystems for Enhancing Multiculturality (TEEM 2017)

    Google Classroom (GC) is gaining momentum in the educational milieu, but its functionalities are limited. Learning analytics applications integrated with GC can help to overcome these limitations, but to reach this aim, developers need access to the data generated by GC's users. This paper reports on the results of an analysis of the existing alternatives for collecting data from GC. The study is based on the analysis of the documentation provided by the tools involved. The analysis shows that GC's API is a potential source of data about the activity of users in GC-enabled settings, but that the information it provides is limited. Further work is needed to explore whether Chrome OS synchronization functions can deliver more detailed information about GC usage, thus enabling more advanced learning analytics applications. Ministerio de Economía, Industria y Competitividad (projects TIN2014-53199-C3-2-R and TIN2015-71669-REDT); Junta de Castilla y León (research project support programme, Ref. VA082U16).
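
    As a concrete example of the data-collection route the abstract discusses, the hedged sketch below lists courses through the public Google Classroom REST API (v1) via google-api-python-client. It assumes OAuth user credentials have already been obtained and stored in a hypothetical token.json file; the read-only courses scope shown is one of the scopes the API documents.

from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/classroom.courses.readonly"]

def list_courses(token_file="token.json"):
    # Load previously authorized user credentials (token.json is assumed to exist).
    creds = Credentials.from_authorized_user_file(token_file, SCOPES)
    service = build("classroom", "v1", credentials=creds)
    # courses().list() returns basic course metadata, one page at a time.
    response = service.courses().list(pageSize=10).execute()
    for course in response.get("courses", []):
        print(course["id"], course["name"])

if __name__ == "__main__":
    list_courses()

    Even this simple call illustrates the limitation noted above: the API exposes course, roster and coursework metadata, not fine-grained user activity.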

    Modelling parallel texts for boosting compression

    Bilingual parallel corpora, also known as bitexts, convey the same information in two different languages. This implies that to model a bitext we can take advantage of the translation relationship that exists between the two texts; the text alignment task makes it possible to establish such a translation relationship. A biword is defined as a pair of words, each from a different text, that are mutual translations in the bitext; the use of biwords allows both texts in the bitext to be represented in a single model. Several biword-based schemes have been proposed, leading to good compression ratios. Bearing in mind Melamed's affirmation that "the translation of a text into another language can be viewed as a detailed annotation of what that text means", we propose a new model for bitexts in agreement with this affirmation, dubbed MAR. The idea is to represent the words in the right text with respect to the preceding word in the left text; thus, a first-order model based on alignment relationships is proposed. Work supported by Spanish projects TIN2009-14009-C02-01 and TIN2009-14009-C02-02. Miguel A. Martínez-Prieto is funded by a grant from JCyL and the ESF.
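
    The first-order idea can be made concrete with a small sketch (not the authors' MAR implementation): each right-text word is counted in the context of the preceding aligned left-text word, mirroring the "right word given preceding left word" structure the abstract outlines. The aligned_pairs input format is an assumption made for illustration.

from collections import defaultdict, Counter

def first_order_counts(aligned_pairs):
    # aligned_pairs: (left_word, right_word) tuples in left-text order
    # (an assumed input format; real alignments need not be one-to-one).
    model = defaultdict(Counter)
    prev_left = "<s>"                       # sentence-start context
    for left_word, right_word in aligned_pairs:
        model[prev_left][right_word] += 1   # right word in the context of the preceding left word
        prev_left = left_word
    return model

pairs = [("the", "la"), ("house", "casa"), ("is", "es"), ("red", "roja")]
model = first_order_counts(pairs)
print(dict(model["<s>"]))   # {'la': 1}
print(dict(model["the"]))   # {'casa': 1}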