
    A framework for interactive predictive parsing applied to the Spanish language

    The Interactive Predictive Parsing (IPP) framework allows the construction of interactive tree annotation systems. These can help human annotators create error-free parse trees with little effort (compared to manually post-editing the trees obtained from a completely automatic parser). In this paper we adapt the IPP framework and the IPP-Ann annotation tool to parsing the Spanish language, using models obtained from the UAM Spanish Treebank. We performed user-simulation experiments and obtained objective evaluation metrics. The results establish that the IPP framework applied to the UAM Spanish Treebank yields a substantial reduction in user effort, comparable to the gains obtained when applying IPP to the English language on the Penn Treebank. Work supported by the EC (FEDER, FSE), the Spanish Government and Generalitat Valenciana (MICINN, "Plan E", under grants MIPRCV "Consolider Ingenio 2010" CSD2007-00018, MIT-TRAL TIN2009-14633-C03-01, ALMPR Prometeo/2009/014 and FPU AP2006-01363).
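    As a rough illustration of how user-simulation experiments of this kind can work, the sketch below runs an IPP-style annotation loop in Python: the simulated user checks the predicted constituents in a fixed order, replaces the first erroneous one with the gold constituent, and the parser re-predicts the remainder; the number of replacements is the effort measure. The `simulate_session` and `toy_predict` functions and the toy constituent spans are invented for illustration and are not the IPP-Ann implementation or its evaluation metrics.

```python
# Minimal sketch of an IPP-style user simulation (illustrative only; the
# function names and toy data below are hypothetical, not the IPP-Ann API).

def simulate_session(gold, predict):
    """Simulate an annotator building one tree with an IPP-like system.

    gold:    list of constituents in a fixed traversal order,
             each a (label, start, end) tuple.
    predict: callable(prefix, length) -> list of constituents that
             completes the validated prefix up to `length` items.
    Returns the number of corrections the simulated user had to make.
    """
    corrections = 0
    prefix = []                      # constituents already validated/corrected
    hyp = predict(prefix, len(gold))
    while prefix != gold:
        i = len(prefix)
        if hyp[i] == gold[i]:
            prefix.append(hyp[i])    # correct prediction: validated for free
        else:
            prefix.append(gold[i])   # wrong: user replaces it (one correction)
            corrections += 1
            hyp = predict(prefix, len(gold))   # parser re-predicts the suffix
    return corrections

# Toy "parser": echoes the validated prefix and uses fixed guesses otherwise.
def toy_predict(prefix, length):
    guesses = [("S", 0, 3), ("NP", 0, 1), ("NP", 1, 3)]
    return list(prefix) + guesses[len(prefix):length]

gold_tree = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)]
print(simulate_session(gold_tree, toy_predict))  # -> 1 (one of three needed fixing)
```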

    Introduction to the special issue on cross-language algorithms and applications

    With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.

    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
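    As one concrete (and deliberately simplified) illustration of the fuzzy-matching stage mentioned above, the sketch below ranks translation-memory entries by a token-level similarity ratio. It is not the SCATE matcher; the `fuzzy_matches` function, the threshold, and the example memory entries are assumptions made purely for this example.

```python
# Illustrative sketch of fuzzy matching against a translation memory,
# using a simple token-level similarity ratio (not the SCATE matcher).
from difflib import SequenceMatcher

def fuzzy_matches(source, memory, threshold=0.7):
    """Return TM entries whose source side is similar to `source`.

    memory: list of (source_segment, target_segment) pairs.
    Similarity is a token-level SequenceMatcher ratio in [0, 1].
    """
    src_tokens = source.lower().split()
    hits = []
    for tm_src, tm_tgt in memory:
        score = SequenceMatcher(None, src_tokens, tm_src.lower().split()).ratio()
        if score >= threshold:
            hits.append((score, tm_src, tm_tgt))
    return sorted(hits, reverse=True)        # best match first

tm = [("Press the red button to stop the machine.",
       "Druk op de rode knop om de machine te stoppen."),
      ("Press the green button to start the machine.",
       "Druk op de groene knop om de machine te starten.")]
print(fuzzy_matches("Press the red button to restart the machine.", tm))
```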

    Novel statistical approaches to text classification, machine translation and computer-assisted translation

    This thesis presents several contributions to the fields of automatic text classification, machine translation and computer-assisted translation within the statistical framework. In automatic text classification, a new application called bilingual text classification is proposed, together with a series of models aimed at capturing this bilingual information. Two approaches to this application are presented: the first is based on a naive assumption of independence between the two languages involved, while the second, more sophisticated, considers the existence of a correlation between words in different languages. The first approach led to the development of five models based on unigram models and smoothed n-gram models. These models were evaluated on three tasks of increasing complexity, the most complex of which was analysed from the point of view of a document indexing assistance system. The second approach is characterised by translation models capable of capturing correlation between words in different languages. In our case, the chosen translation model was Model 1 (M1) together with a unigram model. This model was evaluated on the two simplest tasks, outperforming the naive approach that assumes independence between words in different languages in bilingual texts. In machine translation, the word-based statistical translation models M1, M2 and HMM are extended under the mixture-modelling framework in order to define context-dependent translation models. An iterative dynamic-programming search algorithm, originally designed for model M2, is likewise extended to the case of mixtures of M2 models. Civera Saiz, J. (2008). Novel statistical approaches to text classification, machine translation and computer-assisted translation [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/2502
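    To make the "naive" bilingual approach mentioned above more concrete, the sketch below scores the two language sides of a bilingual document independently with smoothed per-class unigram models, reflecting the independence assumption described in the abstract. The `NaiveBilingualClassifier` interface, class labels and example tokens are invented for illustration and do not reproduce the thesis models or data.

```python
# Minimal sketch of a naive bilingual text classifier: one smoothed unigram
# model per language and per class, with the two sides scored independently.
import math
from collections import Counter, defaultdict

class NaiveBilingualClassifier:
    def __init__(self, alpha=1.0):
        self.alpha = alpha                              # add-alpha smoothing
        self.counts = defaultdict(lambda: (Counter(), Counter()))
        self.priors = Counter()
        self.vocab = (set(), set())

    def fit(self, docs):
        # docs: iterable of (class_label, tokens_lang1, tokens_lang2)
        for label, d1, d2 in docs:
            self.priors[label] += 1
            self.counts[label][0].update(d1)
            self.counts[label][1].update(d2)
            self.vocab[0].update(d1)
            self.vocab[1].update(d2)

    def _log_unigram(self, label, side, tokens):
        c = self.counts[label][side]
        total = sum(c.values()) + self.alpha * len(self.vocab[side])
        return sum(math.log((c[w] + self.alpha) / total) for w in tokens)

    def predict(self, d1, d2):
        n = sum(self.priors.values())
        return max(self.priors,
                   key=lambda lab: math.log(self.priors[lab] / n)
                   + self._log_unigram(lab, 0, d1)       # language-1 side
                   + self._log_unigram(lab, 1, d2))      # language-2 side

clf = NaiveBilingualClassifier()
clf.fit([("sports", ["goal", "match"], ["gol", "partido"]),
         ("economy", ["market", "stock"], ["mercado", "bolsa"])])
print(clf.predict(["match", "goal"], ["partido"]))       # -> 'sports'
```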

    Reassessing second language reading comprehension: Insights from the psycholinguistics notion of sentence processing

    Theories and practices in second language reading pedagogy often overlook the description of sentence processing offered by psycholinguistics. Second language reading comprehension is easily associated with vocabulary learning or discourse strategies. Yet such activities can lead to an unnatural way of reading, such as translating individual words or picking out information as required. Meanwhile, an authentic way of reading should encourage a natural stream of ideas to be interpreted from sentence to sentence. As suggested by the psycholinguistic notion of sentence processing, syntax appears to be the key to effective and authentic reading, as opposed to the general belief that semantic or discourse information is the primary concern. This article argues that understanding the architecture of sentence processing, with syntactic parsing at the core of the underlying mechanism, can offer insights into second language reading pedagogy. The concepts of syntactic parsing, reanalysis, and sentence processing models are described to give an idea of how sentence processing works. Additionally, a critical review of the differences between L1 and L2 sentence processing is presented, considering the recent debate on individual differences as significant indicators of nativelike L2 sentence processing. Lastly, implications for L2 reading pedagogy and potential implementation in instructional settings are discussed.

    GALENA: tabular DCG parsing for natural languages

    [Abstract] We present a definite clause based parsing environment for natural languages, whose operational model is the dynamic interpretation of logical push-down automata. We attempt to briefly explain our design decisions in terms of a set of properties that practical natural language processing systems should incorporate. The aim is to show both the advantages and the drawbacks of our approach. España. Gobierno; HF96-36. Xunta de Galicia; XUGA10505B96. Xunta de Galicia; XUGA20402B9
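    The sketch below illustrates the general idea of tabular parsing (memoising parse items so that shared sub-derivations are computed only once) with a toy memoised recogniser for a tiny CFG. It is not the GALENA system, which interprets definite clause grammars through logical push-down automata in a logic-programming setting; the grammar, lexicon, and the `parse`/`derive` functions are invented for this example.

```python
# Toy tabular recogniser: parse items (symbol, start position) are memoised,
# so repeated sub-derivations are looked up in the table instead of recomputed.
from functools import lru_cache

GRAMMAR = {                     # tiny CFG: nonterminal -> list of expansions
    "S":  [("NP", "VP")],
    "NP": [("Det", "N"), ("PN",)],
    "VP": [("V", "NP")],
}
LEXICON = {"the": "Det", "cat": "N", "dog": "N", "saw": "V", "maria": "PN"}

def parse(tokens, start="S"):
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)            # the "table": one entry per (symbol, i)
    def derive(symbol, i):
        """Return the set of end positions j such that symbol spans i..j."""
        ends = set()
        if i < len(tokens) and LEXICON.get(tokens[i]) == symbol:
            ends.add(i + 1)
        for rhs in GRAMMAR.get(symbol, []):
            frontier = {i}
            for child in rhs:           # thread end positions through the RHS
                frontier = {j for k in frontier for j in derive(child, k)}
            ends |= frontier
        return frozenset(ends)

    return len(tokens) in derive(start, 0)

print(parse("the cat saw maria".split()))   # -> True
print(parse("cat the saw".split()))         # -> False
```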

    An integrated theory of language production and comprehension

    Currently, production and comprehension are regarded as quite distinct in accounts of language processing. In rejecting this dichotomy, we instead assert that producing and understanding are interwoven, and that this interweaving is what enables people to predict themselves and each other. We start by noting that production and comprehension are forms of action and action perception. We then consider the evidence for interweaving in action, action perception, and joint action, and explain such evidence in terms of prediction. Specifically, we assume that actors construct forward models of their actions before they execute those actions, and that perceivers of others' actions covertly imitate those actions, then construct forward models of those actions. We use these accounts of action, action perception, and joint action to develop accounts of production, comprehension, and interactive language. Importantly, they incorporate well-defined levels of linguistic representation (such as semantics, syntax, and phonology). We show (a) how speakers and comprehenders use covert imitation and forward modeling to make predictions at these levels of representation, (b) how they interweave production and comprehension processes, and (c) how they use these predictions to monitor the upcoming utterances. We show how these accounts explain a range of behavioral and neuroscientific data on language processing and discuss some of the implications of our proposal.
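    Purely as a schematic illustration of the prediction-and-monitoring idea described above (a fast forward model predicts the upcoming utterance at each linguistic level, and a monitor compares that prediction with what is actually produced), the toy sketch below flags the level at which a mismatch occurs. All functions and data here are invented for illustration; this is not the authors' model.

```python
# Schematic sketch: a forward model predicts the utterance at each linguistic
# level before execution, and a monitor compares prediction with production.

LEVELS = ("semantics", "syntax", "phonology")

def forward_model(intention):
    # Cheap, fast prediction of the upcoming utterance at each level.
    return {"semantics": intention,
            "syntax": ("NP", "VP"),
            "phonology": intention.replace(" ", "")}

def production_system(intention, slip=False):
    # The slower "full" production process (here just a lookup for the toy).
    utterance = {"semantics": intention,
                 "syntax": ("NP", "VP"),
                 "phonology": intention.replace(" ", "")}
    if slip:
        utterance["phonology"] = utterance["phonology"][::-1]  # speech error
    return utterance

def monitor(intention, slip=False):
    predicted = forward_model(intention)             # available before execution
    produced = production_system(intention, slip)    # actual output
    return [lvl for lvl in LEVELS if predicted[lvl] != produced[lvl]]

print(monitor("greet listener"))              # -> [] (no mismatch detected)
print(monitor("greet listener", slip=True))   # -> ['phonology']
```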