137 research outputs found

    Process of Expropriation of Buildings for the Construction of the New Dock of the Port of Cartagena (Spain) in the 18th Century

    The Bourbon monarchy's need to build a naval base in the Bay of Cartagena (Spain) during the eighteenth century required various interventions in the environment to allow construction of the new dock. One of the priority actions was the transformation of the watershed of the streams that flowed into Mandaraches's sea. For this reason, a dike was designed and constructed in the northern part of the city. The design of this major work, conceived as a fortification of the city, was subject to considerable uncertainty. Its proximity to the city involved the demolition of several buildings in the San Roque neighborhood. The number of affected buildings and the value of fair compensation for their expropriation became decisive factors in determining whether the work was viable for the Royal Estate.

    Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages

    Large, curated, web-crawled corpora play a vital role in training language models (LMs). They form the lion's share of the training data in virtually all recent LMs, such as the well-known GPT, LLaMA and XLM-RoBERTa models. However, despite this importance, relatively little attention has been given to the quality of these corpora. In this paper, we compare four of the currently most relevant large, web-crawled corpora (CC100, MaCoCu, mC4 and OSCAR) across eleven lower-resourced European languages. Our approach is two-fold: first, we carry out an intrinsic evaluation in which humans judge the quality of samples taken from the different corpora; then, we assess the practical impact of the qualitative differences by training specific LMs on each of the corpora and evaluating their performance on downstream tasks. We find that there are clear differences in quality of the corpora, with MaCoCu and OSCAR obtaining the best results. However, during the extrinsic evaluation, we actually find that the CC100 corpus achieves the highest scores. We conclude that, in our experiments, the quality of the web-crawled corpora does not seem to play a significant role when training LMs. Comment: Accepted to LREC-COLING 2024 (long

    Plataforma digital interactiva como modelo de gestión en el campo de la arquitectura e ingeniería: ecosistema parque natural de 'El Hondo'


    Machine translation for everyone: Empowering users in the age of artificial intelligence

    Language learning and translation have always been complementary pillars of multilingualism in the European Union. Both have been affected by the increasing availability of machine translation (MT): language learners now make use of free online MT to help them both understand and produce texts in a second language, but there are fears that uninformed use of the technology could undermine effective language learning. At the same time, MT is promoted as a technology that will change the face of professional translation, but the technical opacity of contemporary approaches, and the legal and ethical issues they raise, can make the participation of human translators in contemporary MT workflows particularly complicated. Against this background, this book attempts to promote teaching and learning about MT among a broad range of readers, including language learners, language teachers, trainee translators, translation teachers, and professional translators. It presents a rationale for learning about MT, and provides both a basic introduction to contemporary machine-learning based MT, and a more advanced discussion of neural MT. It explores the ethical issues that increased use of MT raises, and provides advice on its application in language learning. It also shows how users can make the most of MT through pre-editing, post-editing and customization of the technology

    Apertium: a free/open-source platform for rule-based machine translation

    Apertium is a free/open-source platform for rule-based machine translation. It is widely used to build machine translation systems for a variety of language pairs, especially in cases (mainly related-language pairs) where shallow transfer suffices to produce good-quality translations, although it has also proven useful in assimilation scenarios involving more distant pairs. This article summarises the Apertium platform: the translation engine, the encoding of linguistic data, and the tools developed around the platform. The present limitations of the platform and the challenges posed for the coming years are also discussed. Finally, evaluation results for some of the most active language pairs are presented. An appendix describes Apertium as a free/open-source project. We thank the Spanish Ministry of Science and Innovation for its support through project TIN2009-14009-C02-01. Apertium has been mainly funded by the Ministries of Industry, Tourism and Commerce, of Education and Science, and of Science and Technology of Spain, the Government of Catalonia, the Ministry of Foreign Affairs of Romania, the Universitat d’Alacant, the Universidade de Vigo, Ofis ar Brezhoneg and Google Summer of Code (2009, 2010 and 2011 editions). Many companies have also invested in it: Prompsit Language Engineering, ABC Enciklopedioj, Eleka Ingeniaritza Linguistikoa, imaxin|software, etc.

    Análisis de la gestión del turismo sostenible en las Comunidades Autónomas costeras del litoral mediterráneo español en el s. XXI.

    Blue growth has become a fundamental priority for the European Union, as can be seen in various documents published this decade (Comisión Europea, 2014a; Comisión Europea, 2014b; Comisión Europea, 2014c; Parlamento Europeo, 2015; Conecturmed, 2104). All of them affirm the need for a new tourism model grounded in the principles of environmental, economic, cultural and social sustainability. For this reason, and taking advantage of the publication of the system of sustainable tourism development indicators for Andalusia (Consejería de Turismo y Comercio, 2015), this work analyzes tourism management in the coastal autonomous communities of the Spanish Mediterranean. The analysis is carried out using a selection of indicators for territory, quality, diversification and environment. The work reveals the differences in management among the Mediterranean coastal autonomous communities and the lack of policies addressing the consequences of climate change. It also identifies a lack of participation and coordination instruments in many of the territories analyzed, which shows the absence of integrated management of tourist destinations covering both land and marine areas. Finally, the challenge facing these autonomous communities is to transform a mature sun-and-beach tourism model into a model that could be called blue tourism, one that includes and integrates all the resources of coastal areas, as reflected in their natural and cultural heritage.

    OpusCleaner and OpusTrainer, open source toolkits for training Machine Translation and Large language models

    Developing high-quality machine translation systems is a labour-intensive, challenging and confusing process for newcomers to the field. We present a pair of tools, OpusCleaner and OpusTrainer, that aim to simplify the process, reduce the amount of work and lower the entry barrier for newcomers. OpusCleaner is a data downloading, cleaning, and preprocessing toolkit. It is designed to allow researchers to quickly download, visualise and preprocess bilingual (or monolingual) data that comes from many different sources, each with its own quality, issues, and unique filtering/preprocessing requirements. OpusTrainer is a data scheduling and data augmenting tool aimed at building large-scale, robust machine translation systems and large language models. It features deterministic data mixing from many different sources, on-the-fly data augmentation and more. Using these tools, we showcase how they can be used to create high-quality machine translation models robust to noisy user input, multilingual models, and terminology-aware models. Comment: Code on Github: https://github.com/hplt-project/OpusCleaner and https://github.com/hplt-project/OpusTraine