8,899 research outputs found

    Say Oui to We : A Longitudinal Analysis of Pronouns and Articles in French and English

    Get PDF
    Modern English only uses gender in personal, reflexive, and possessive third person singular pronouns. Modern English also does not use gendered articles, which extends to not assigning an arbitrary gender to inanimate objects. This study examines how recent this aspect of grammar is, and to what degree did cultural interaction with the French throughout history influence the use of gendered pronouns. Two written texts in British English (one in Old English, one in Modern English) and one written text in French are analyzed for elements of grammatical gender embedded within articles, pronouns, and possessive adjectives. The geopolitical influences on incorporating gender into language were also considered. This study found that gender is altogether more present in Old English and Modern French. Old English is found to have more gendered articles and pronouns than other later evolutions of English. Interactions between Norman pirates and Celtic Britons up through French words being fashionably borrowed by English nobles are evidence of geopolitical and international relations impacting the evolution of the English language

    Improving the translation environment for professional translators

    Get PDF
    When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project

    Robust input representations for low-resource information extraction

    Get PDF
    Recent advances in the field of natural language processing were achieved with deep learning models. This led to a wide range of new research questions concerning the stability of such large-scale systems and their applicability beyond well-studied tasks and datasets, such as information extraction in non-standard domains and languages, in particular, in low-resource environments. In this work, we address these challenges and make important contributions across fields such as representation learning and transfer learning by proposing novel model architectures and training strategies to overcome existing limitations, including a lack of training resources, domain mismatches and language barriers. In particular, we propose solutions to close the domain gap between representation models by, e.g., domain-adaptive pre-training or our novel meta-embedding architecture for creating a joint representations of multiple embedding methods. Our broad set of experiments demonstrates state-of-the-art performance of our methods for various sequence tagging and classification tasks and highlight their robustness in challenging low-resource settings across languages and domains.Die jüngsten Fortschritte auf dem Gebiet der Verarbeitung natürlicher Sprache wurden mit Deep-Learning-Modellen erzielt. Dies führte zu einer Vielzahl neuer Forschungsfragen bezüglich der Stabilität solcher großen Systeme und ihrer Anwendbarkeit über gut untersuchte Aufgaben und Datensätze hinaus, wie z. B. die Informationsextraktion für Nicht-Standardsprachen, aber auch Textdomänen und Aufgaben, für die selbst im Englischen nur wenige Trainingsdaten zur Verfügung stehen. In dieser Arbeit gehen wir auf diese Herausforderungen ein und leisten wichtige Beiträge in Bereichen wie Repräsentationslernen und Transferlernen, indem wir neuartige Modellarchitekturen und Trainingsstrategien vorschlagen, um bestehende Beschränkungen zu überwinden, darunter fehlende Trainingsressourcen, ungesehene Domänen und Sprachbarrieren. Insbesondere schlagen wir Lösungen vor, um die Domänenlücke zwischen Repräsentationsmodellen zu schließen, z.B. durch domänenadaptives Vortrainieren oder unsere neuartige Meta-Embedding-Architektur zur Erstellung einer gemeinsamen Repräsentation mehrerer Embeddingmethoden. Unsere umfassende Evaluierung demonstriert die Leistungsfähigkeit unserer Methoden für verschiedene Klassifizierungsaufgaben auf Word und Satzebene und unterstreicht ihre Robustheit in anspruchsvollen, ressourcenarmen Umgebungen in verschiedenen Sprachen und Domänen

    Neurocognitive Informatics Manifesto.

    Get PDF
    Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. In this position paper examples of neurocognitive inspirations and promising directions in this area are given

    The Application of Translation Strategies and Predicate Transformations in Teaching Written Translation

    Get PDF
    The aim of the article was to investigate the impact of the context and instructions on the situation during written translation. In particular, there was an objective to clarify two aspects: pragmatic (preservation of the pragmatics of the original in the translation, taking into account the task) and syntactic (the impact of changing the syntax on the correctness of the translation). The following methods were used: contextual analysis of discourse, description, classification, observation. Experimental research has shown that when translating without context and situation, the translator can only rely on background knowledge and internal context. The conclusions presented in the paper can be used to build theoretical models, to conduct future experimental research to study the translation process or directly in the practice of teaching translation. Besides, the constructed and substantiated method of experimental study of translation can be studied in more detail in future works, supplemented and expanded, with the involvement of texts from various scientific subjects (natural sciences, humanities, social sciences, etc.). It will also be effective during the practical application of scientific, journalistic, artistic, official-business styles and their substyles, as well as various target audiences

    Lexical simplification for the systematic support of cognitive accessibility guidelines

    Get PDF
    The Internet has come a long way in recent years, contributing to the proliferation of large volumes of digitally available information. Through user interfaces we can access these contents, however, they are not accessible to everyone. The main users affected are people with disabilities, who are already a considerable number, but accessibility barriers affect a wide range of user groups and contexts of use in accessing digital information. Some of these barriers are caused by language inaccessibility when texts contain long sentences, unusual words and complex linguistic structures. These accessibility barriers directly affect people with cognitive disabilities. For the purpose of making textual content more accessible, there are initiatives such as the Easy Reading guidelines, the Plain Language guidelines and some of the languagespecific Web Content Accessibility Guidelines (WCAG). These guidelines provide documentation, but do not specify methods for meeting the requirements implicit in these guidelines in a systematic way. To obtain a solution, methods from the Natural Language Processing (NLP) discipline can provide support for achieving compliance with the cognitive accessibility guidelines for the language. The task of text simplification aims at reducing the linguistic complexity of a text from a syntactic and lexical perspective, the latter being the main focus of this Thesis. In this sense, one solution space is to identify in a text which words are complex or uncommon, and in the case that there were, to provide a more usual and simpler synonym, together with a simple definition, all oriented to people with cognitive disabilities. With this goal in mind, this Thesis presents the study, analysis, design and development of an architecture, NLP methods, resources and tools for the lexical simplification of texts for the Spanish language in a generic domain in the field of cognitive accessibility. To achieve this, each of the steps present in the lexical simplification processes is studied, together with methods for word sense disambiguation. As a contribution, different types of word embedding are explored and created, supported by traditional and dynamic embedding methods, such as transfer learning methods. In addition, since most of the NLP methods require data for their operation, a resource in the framework of cognitive accessibility is presented as a contribution.Internet ha avanzado mucho en los últimos años contribuyendo a la proliferación de grandes volúmenes de información disponible digitalmente. A través de interfaces de usuario podemos acceder a estos contenidos, sin embargo, estos no son accesibles a todas las personas. Los usuarios afectados principalmente son las personas con discapacidad siendo ya un número considerable, pero las barreras de accesibilidad afectan a un gran rango de grupos de usuarios y contextos de uso en el acceso a la información digital. Algunas de estas barreras son causadas por la inaccesibilidad al lenguaje cuando los textos contienen oraciones largas, palabras inusuales y estructuras lingüísticas complejas. Estas barreras de accesibilidad afectan directamente a las personas con discapacidad cognitiva. Con el fin de hacer el contenido textual más accesible, existen iniciativas como las pautas de Lectura Fácil, las pautas de Lenguaje Claro y algunas de las pautas de Accesibilidad al Contenido en la Web (WCAG) específicas para el lenguaje. Estas pautas proporcionan documentación, pero no especifican métodos para cumplir con los requisitos implícitos en estas pautas de manera sistemática. Para obtener una solución, los métodos de la disciplina del Procesamiento del Lenguaje Natural (PLN) pueden dar un soporte para alcanzar la conformidad con las pautas de accesibilidad cognitiva relativas al lenguaje La tarea de la simplificación de textos del PLN tiene como objetivo reducir la complejidad lingüística de un texto desde una perspectiva sintáctica y léxica, siendo esta última el enfoque principal de esta Tesis. En este sentido, un espacio de solución es identificar en un texto qué palabras son complejas o poco comunes, y en el caso de que sí hubiera, proporcionar un sinónimo más usual y sencillo, junto con una definición sencilla, todo ello orientado a las personas con discapacidad cognitiva. Con tal meta, en esta Tesis, se presenta el estudio, análisis, diseño y desarrollo de una arquitectura, métodos PLN, recursos y herramientas para la simplificación léxica de textos para el idioma español en un dominio genérico en el ámbito de la accesibilidad cognitiva. Para lograr esto, se estudia cada uno de los pasos presentes en los procesos de simplificación léxica, junto con métodos para la desambiguación del sentido de las palabras. Como contribución, diferentes tipos de word embedding son explorados y creados, apoyados por métodos embedding tradicionales y dinámicos, como son los métodos de transfer learning. Además, debido a que gran parte de los métodos PLN requieren datos para su funcionamiento, se presenta como contribución un recurso en el marco de la accesibilidad cognitiva.Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de MadridPresidente: José Antonio Macías Iglesias.- Secretario: Israel González Carrasco.- Vocal: Raquel Hervás Ballestero

    Proceedings of the 17th Annual Conference of the European Association for Machine Translation

    Get PDF
    Proceedings of the 17th Annual Conference of the European Association for Machine Translation (EAMT
    corecore