151 research outputs found

    Multilingual Unsupervised Sentence Simplification

    Progress in Sentence Simplification has been hindered by the lack of supervised data, particularly in languages other than English. Previous work has aligned sentences from original and simplified corpora such as English Wikipedia and Simple English Wikipedia, but this limits corpus size, domain, and language. In this work, we propose using unsupervised mining techniques to automatically create training corpora for simplification in multiple languages from raw Common Crawl web data. When coupled with a controllable generation mechanism that can flexibly adjust attributes such as length and lexical complexity, these mined paraphrase corpora can be used to train simplification systems in any language. We further incorporate multilingual unsupervised pretraining methods to create even stronger models and show that by training on mined data rather than supervised corpora, we outperform the previous best results. We evaluate our approach on English, French, and Spanish simplification benchmarks and reach state-of-the-art performance with a totally unsupervised approach. We will release our models and code to mine the data in any language included in Common Crawl

    Producció d'un programa de televisió multimèdia en català sobre videojocs

    This project aims to create a multimedia show (TV and internet) in Catalan about video games. To that end, it analyses the evolution of the video game cultural industry, reviews the antecedents, sets out the material and human resources needed to launch the show, and structures the programme into its sections and contents. It also covers the programme's style guide and the economic amortization model to follow.

    Controllable Sentence Simplification

    Due to the COVID-19 pandemic, the 12th edition is cancelled. The LREC 2020 Proceedings are available at http://www.lrec-conf.org/proceedings/lrec2020/index.html
    Text simplification aims at making a text easier to read and understand by simplifying grammar and structure while keeping the underlying information identical. It is often considered an all-purpose generic task where the same simplification is suitable for all; however, multiple audiences can benefit from simplified text in different ways. We adapt a discrete parametrization mechanism that provides explicit control on simplification systems based on Sequence-to-Sequence models. As a result, users can condition the simplifications returned by a model on attributes such as length, amount of paraphrasing, lexical complexity and syntactic complexity. We also show that carefully chosen values of these attributes allow out-of-the-box Sequence-to-Sequence models to outperform their standard counterparts on simplification benchmarks. Our model, which we call ACCESS (as shorthand for AudienCe-CEntric Sentence Simplification), establishes the state of the art at 41.87 SARI on the WikiLarge test set, a +1.42 improvement over the best previously reported score.
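    The discrete parametrization described in the abstract above can be pictured as special tokens prepended to the source sentence before it reaches a Sequence-to-Sequence model. The following is a minimal sketch under stated assumptions, not the ACCESS implementation: the function name, the token format, and the 0.05 bin width are hypothetical choices made for illustration.

```python
# Hypothetical illustration of discrete control tokens; the names, token
# format, and bin width are assumptions, not taken from the listed paper.

def add_control_tokens(source: str, controls: dict, step: float = 0.05) -> str:
    """Prepend one discretized control token per attribute to a source sentence."""
    tokens = []
    for name, value in controls.items():
        # Snap the requested value to the nearest bin so that only a small,
        # fixed set of control tokens can ever appear in the vocabulary.
        binned = round(value / step) * step
        tokens.append(f"<{name}_{binned:.2f}>")
    return " ".join(tokens + [source])


if __name__ == "__main__":
    sentence = "The cat sat on the mat."
    # Ask for an output about 80% as long as the input, with lower lexical complexity.
    print(add_control_tokens(sentence, {"length": 0.8, "lexical_complexity": 0.75}))
```

    In a setup like this, the model would see the same tokens during training (computed from each source and target pair), and a user would pick their values at inference time.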

    MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases

    Progress in sentence simplification has been hindered by a lack of labeled parallel simplification data, particularly in languages other than English. We introduce MUSS, a Multilingual Unsupervised Sentence Simplification system that does not require labeled simplification data. MUSS uses a novel approach to sentence simplification that trains strong models using sentence-level paraphrase data instead of proper simplification data. These models leverage unsupervised pretraining and controllable generation mechanisms to flexibly adjust attributes such as length and lexical complexity at inference time. We show that this paraphrase data can be mined in any language from Common Crawl using semantic sentence embeddings, thus removing the need for labeled data. We evaluate our approach on English, French, and Spanish simplification benchmarks and closely match or outperform the previous best supervised results, despite not using any labeled simplification data. We push the state of the art further by incorporating labeled simplification data.

    Reference-less Quality Estimation of Text Simplification Systems

    The evaluation of text simplification (TS) systems remains an open challenge. As the task has common points with machine translation (MT), TS is often evaluated using MT metrics such as BLEU. However, such metrics require high-quality reference data, which is rarely available for TS. TS has the advantage over MT of being a monolingual task, which allows for direct comparisons to be made between the simplified text and its original version. In this paper, we compare multiple approaches to reference-less quality estimation of sentence-level text simplification systems, based on the dataset used for the QATS 2016 shared task. We distinguish three different dimensions: grammaticality, meaning preservation and simplicity. We show that n-gram-based MT metrics such as BLEU and METEOR correlate the most with human judgment of grammaticality and meaning preservation, whereas simplicity is best evaluated by basic length-based metrics.
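    Because simplification is monolingual, a length-based, reference-less signal of the kind mentioned in the abstract above can be computed directly from the original sentence and the system output. The sketch below is illustrative only: the abstract does not say which length features were used, so the two ratios and the function names here are assumptions.

```python
# Hypothetical length-based features comparing a system output with its
# source sentence; no human reference simplification is needed.

def char_compression_ratio(original: str, simplified: str) -> float:
    """Character length of the output divided by that of the source."""
    if not original:
        raise ValueError("original sentence must be non-empty")
    return len(simplified) / len(original)


def word_compression_ratio(original: str, simplified: str) -> float:
    """Whitespace-token count of the output divided by that of the source."""
    original_words = original.split()
    if not original_words:
        raise ValueError("original sentence must contain at least one word")
    return len(simplified.split()) / len(original_words)


if __name__ == "__main__":
    src = "The committee subsequently ratified the aforementioned proposal."
    out = "The committee then approved the proposal."
    print(char_compression_ratio(src, out), word_compression_ratio(src, out))
```

    Ratios below 1 indicate a shorter output. Whether such a feature tracks human simplicity judgments on a given dataset would have to be checked by correlating it against those judgments.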

    Avances y retos en la cooperación en materia de gestión de aguas transfronterizas en los países del ámbito iberoamericano : (ODS Nº 6. Meta 6.5. Gestión de los recursos hídricos)

    Title preceded by: Analysis of indicator 6.5.2
    The 2030 Agenda, which establishes the Sustainable Development Goals (SDGs), was adopted by the 193 members of the United Nations in September 2015. The agenda comprises 17 goals and 169 integrated and indivisible targets spanning the economic, social and environmental spheres, and these have been setting the roadmap at various levels. The indicators for each SDG target reflect the degree of progress in developing the public policies needed to achieve each goal. A detailed analysis of the SDG 6 indicators is therefore a good reflection of how far water resources policies have been implemented.

    Following a foraging fish-finder : diel habitat use of Blainville's beaked whales revealed by echolocation

    © The Author(s), 2011. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in PLoS One 6 (2011): e28353, doi:10.1371/journal.pone.0028353
    Simultaneous high resolution sampling of predator behavior and habitat characteristics is often difficult to achieve despite its importance in understanding the foraging decisions and habitat use of predators. Here we tap into the biosonar system of Blainville's beaked whales, Mesoplodon densirostris, using sound and orientation recording tags to uncover prey-finding cues available to echolocating predators in the deep-sea. Echolocation sounds indicate where whales search and encounter prey, as well as the altitude of whales above the sea-floor and the density of organisms around them, providing a link between foraging activity and the bio-physical environment. Tagged whales (n = 9) hunted exclusively at depth, investing most of their search time either in the lower part of the deep scattering layer (DSL) or near the sea-floor with little diel change. At least 43% (420/974) of recorded prey-capture attempts were performed within the benthic boundary layer despite a wide range of dive depths, and many dives included both meso- and bentho-pelagic foraging. Blainville's beaked whales only initiate searching when already deep in the descent and encounter prey suitable for capture within 2 min of the start of echolocation, suggesting that these whales are accessing prey in reliable vertical strata. Moreover, these prey resources are sufficiently dense to feed the animals in what is effectively four hours of hunting per day, enabling a strategy in which long dives to exploit numerous deep-prey with low nutritional value require protracted recovery periods (average 1.5 h) between dives. This apparent searching efficiency may be aided by inhabiting steep undersea slopes with access to both the DSL and the sea-floor over small spatial scales. Aggregations of prey in these biotopes are located using biosonar-derived landmarks and represent stable and abundant resources for Blainville's beaked whales in the otherwise food-limited deep-ocean.
    The work was funded by the Office of Naval Research and the National Ocean Partnership Program (US), by a consortium consisting of the Canary Islands Government, the Spanish Ministry of Environment and the Spanish Ministry of Defense, and by the European environmental funding LIFE-INDEMARES program for the inventory and designation of the Natura 2000 network in marine areas of the Spanish territory, headed by Fundacion Biodiversidad, with additional support from the Cabildo Insular of El Hierro. PA is currently supported by the National Research Project: Cetacean, Oceanography and Biodiversity from La Palma and El Hierro (CGL2009-13112) of the Spanish Ministry of Science and NAS by a Marie Curie fellowship from the 7th European Frame Program. MJ was supported by grants from the Strategic Environmental Research Development Program and from the National Ocean Partnership Program. PTM was supported by frame grants from the National Danish Science Foundation.

    Ramification of lithic production and the search of small tools in Iberian Peninsula Middle Paleolithic

    The notion of recycling and its relationship with ramified productions and small tool production in the Late Middle Paleolithic of the Iberian Peninsula are investigated. Results from Amalda, Axlor, Peña Miel, and Quebrada show that the production of small tools is one of the principal objectives of lithic provisioning at these sites. Whereas in Axlor and Amalda this is achieved through the ramification of production, due to the remoteness of flint sources, in Quebrada, where raw material sources are closer, small flakes are obtained at the end of Levallois production. The implications of this small tool production for Neandertal social organization are discussed, and its evolution is observed from a diachronic perspective.