2,691 research outputs found

    Automatic reconstruction of itineraries from descriptive texts

    This thesis is part of the PERDIDO project, whose objectives are the extraction and reconstruction of itineraries from textual documents. The work was carried out in collaboration between the LIUPPA laboratory of the Université de Pau et des Pays de l'Adour (France), the Advanced Information Systems group (IAAA) of the Universidad de Zaragoza, and the COGIT laboratory of the IGN (France). The aim of this thesis is to design an automatic system that extracts movements from travel guides or itinerary descriptions and represents them on a map. We propose an approach for the automatic representation of itineraries described in natural language. Our proposal is divided into two main tasks. The first aims to identify and extract, from texts describing itineraries, information such as spatial entities and expressions of movement or perception. The objective of the second task is the reconstruction of the itinerary. Our proposal combines local information extracted through natural language processing with data extracted from external geographic sources (for example, gazetteers). The spatial information annotation stage uses an approach that combines part-of-speech tagging and lexico-syntactic patterns (a cascade of transducers) in order to annotate spatial named entities and expressions of movement and perception. A first contribution to the first task is toponym disambiguation, a problem that remains poorly solved within Named Entity Recognition (NER) and is essential in geographic information retrieval. 
We propose an unsupervised georeferencing algorithm, based on a clustering technique, that can disambiguate toponyms found in external geographic resources and, at the same time, locate unreferenced toponyms. We propose a generic graph model for the automatic reconstruction of itineraries, where each node represents a place and each edge represents a path linking two places. The originality of our model is that, in addition to the usual elements (paths and waypoints), it can represent other elements involved in the description of an itinerary, such as visual landmarks. A minimum spanning tree is computed from a weighted graph to automatically obtain an itinerary in the form of a graph. Each edge of the initial graph is weighted using a multi-criteria analysis method that combines qualitative and quantitative criteria. The value of these criteria is determined from information extracted from the text and information from external geographic resources. For example, information generated by natural language processing, such as spatial relations describing an orientation (e.g., heading south), is combined with the geographic coordinates of places found in the resources to determine the value of the "spatial relation" criterion. In addition, starting from the definition of the concept of itinerary and from the information used in language to describe an itinerary, we have modeled a spatial information annotation language suited to the description of movements, building on the recommendations of the TEI (Text Encoding Initiative) consortium. 
Finally, the different stages of our approach have been implemented and evaluated on a multilingual corpus of descriptions of trails and hikes (French, Spanish, Italian).
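
The graph step described in the abstract above (edges weighted by a multi-criteria score, then a minimum spanning tree) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the weighted-sum aggregation, Kruskal's algorithm, the criterion names, and all weights and place names are hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (place_a, place_b, cost)


def combine_criteria(scores: Dict[str, float], importance: Dict[str, float]) -> float:
    """Aggregate per-criterion costs into a single edge cost.

    Assumption: a normalized weighted sum; the abstract does not say which
    multi-criteria method is actually used. Lower cost = more plausible link.
    """
    total = sum(importance.values())
    return sum(importance[name] * scores[name] for name in importance) / total


def minimum_spanning_tree(nodes: List[str], edges: List[Edge]) -> List[Edge]:
    """Kruskal's algorithm with a small union-find structure."""
    parent = {n: n for n in nodes}

    def find(n: str) -> str:
        # Follow parent links to the set representative, halving the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree: List[Edge] = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # adding this edge creates no cycle
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


# Hypothetical places and costs, for illustration only.
places = ["A", "B", "C", "D"]
candidate_edges = [
    ("A", "B", 0.2),
    ("B", "C", 0.3),
    ("A", "C", 0.9),
    ("C", "D", 0.4),
    ("B", "D", 0.8),
]
itinerary = minimum_spanning_tree(places, candidate_edges)
```

In this sketch, a candidate link that agrees with the text (for instance, a place lying to the south after "heading south") would receive a low cost on the hypothetical "spatial relation" criterion and would therefore be more likely to survive in the spanning tree.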

    The Interplay Of Syntactic Parsing Strategies And Prosodic Phrase Lengths In Processing Turkish Sentences

    Many experiments have shown that the prosody (rhythm and melody) with which a sentence is uttered can provide a listener with cues to its syntactic structure (Lehiste, 1973, and since). A few studies have observed in addition that an inappropriate prosodic contour can mislead the syntactic parsing routines, resulting in a prosody-induced garden-path. These include, among others, Speer et al. (1996) and Kjelgaard and Speer (1999) for English. The studies by Speer et al. and Kjelgaard and Speer (SKS) showed that misplaced prosodic cues caused more processing difficulty in sentences with early closure of a clause (EC syntax) than in ones with late closure of a clause (LC syntax). One possible explanation for these results is that when prosody is misleading about the syntactic structure, the parser may ignore it and resort to a syntactic Late Closure strategy, as it does in reading where there is no overt prosodic boundary to inform the parser about the syntactic structure of the sentence. Augurzky's (2006) observation of an LC syntax advantage for prosody-syntax mismatch conditions in her investigation of German relative clause attachment ambiguities provides support for this explanation. An alternative explanation considers the possibility that constituent lengths could have influenced the perceived informativeness of overt prosodic cues in these studies, as proposed in the Rational Speaker Hypothesis of Clifton et al. (2002, 2006). The Rational Speaker Hypothesis (RSH) maintains that prosodic breaks flanking shorter constituents are taken more seriously as indicators of syntactic structure than prosodic breaks flanking longer constituents, because the former cannot be justified as motivated by optimal length considerations. To test these two alternative hypotheses, four listening experiments were conducted. 
There was an additional reading experiment preceding the listening experiments to explore potential effects of the Late Closure strategy and constituent lengths in reading where there is no overt prosody. In all cases the target materials were temporarily ambiguous Turkish sentences which could be morphologically resolved as either LC or EC syntactic constructions. Constituent lengths were systematically manipulated in all target materials, such that the length-optimal prosodic phrasing was associated with LC syntax in one condition, and with EC syntax in the other. Experiment 1 employed a missing morpheme task developed for this study. In the missing morpheme task, underscores (length-averaged) replaced the disambiguating morphemes and participants had to insert them as they read the sentences aloud. Results revealed significant effects of phrase lengths in readers' syntactic interpretations as indicated by the morphemes they inserted and the prosodic breaks they produced. Experiments 2A and 2B employed an end-of-sentence 'got it' task (Frazier et al., 1983), in which participants listened to spoken sentences and indicated after each one whether they understood or did not understand it. Sentences in Experiment 2A had phrase length distribution similar to the SKS English materials. Experiment 2B manipulated lengths in reverse. The stimuli had cooperating, conflicting or neutral prosody. Response time data supported an interplay of both syntactic Late Closure and RSH. Thus it was concluded that constituent lengths can indeed have a significant effect on listeners' parsing decisions, in addition to the familiar syntactic parsing biases and prosodic influences. Experiments 3A and 3B used a lexical probe version of the phoneme restoration paradigm employed by Stoyneshka et al. (2010). In the phoneme restoration paradigm, the disambiguating phonemes (in the verb, in these materials) are replaced with noise (in this study, pink noise). 
In the lexical probe version of this paradigm (developed for this study) participants listened to the sentences with LC, EC or neutral prosody, and at the end of the sentence they were presented with a visual probe (one of the two possible disambiguating verbs, complete with all phonemes) that was congruent or incongruent or compatible with the prosody of the sentence they had heard. Their task was to respond to the visual probe either 'yes' (i.e., 'I heard this word in the sentence I have just listened to') or 'no' (i.e., 'I didn't hear this word'). Response time to the probe word indirectly taps which of the disambiguating morphemes on the verb the listener mentally supplies when it has been replaced by noise. The materials for Experiments 3A and 3B were identical to those used in Experiments 2A and 2B respectively except that the disambiguating phonemes were noise-replaced. Results of Experiments 3A and 3B showed that listeners were highly sensitive to the sentential prosody as revealed by their phoneme restoration responses and response time data, confirming Stoyneshka et al.'s findings establishing the reliability of the phoneme restoration paradigm in investigating effects of prosody in ambiguity resolution. Response time data showed a pattern similar to what SKS observed for English (except for one condition in Experiment 3A, with incongruent probes): despite the phrase length reversal in Experiment 3B, there was no influence of phrase length distribution on ambiguity resolution. This has a natural explanation in light of the difference between the 'got it' task with disambiguating morphology within the sentence stimulus, and the phoneme restoration task in which the listener can project onto the verb whatever morphology is compatible with the heard prosody. LC and EC were processed equally well for congruent probes, and there was an LC advantage in the incongruent and compatible probe conditions. 
Overall results support the hypothesis that syntactic Late Closure becomes evident in listening when prosody is absent or misleading, and also that phrase lengths can play a significant role.

    Accepting Preposition-Stranding under Sluicing Cross-linguistically; a Noisy-Channel Approach

    This thesis investigates the representation and processing of sluicing, a type of ellipsis where an interrogative CP is reduced to its initial wh-element (the remnant), e.g. Mary danced with someone, but I can't remember (with) who. It is debated whether remnants from within a PP (with who) must appear with this P or whether they can appear without it ('P-stranding'). Existing theoretical literature (Merchant, 2001; a.o.) argues that only languages allowing overt CPs to move wh-elements without their embedding P will allow P-stranding remnants (P-Stranding Generalisation/PSG). Anecdotally, many languages appear to defy this pattern, allowing P-stranding remnants despite disallowing P-stranding overtly. None of these examples, however, are supported by adequate experimental evidence, nor do they offer a cross-linguistically generalisable explanation. This thesis addresses both these issues. Novel large-scale acceptability data show that both Greek and German, previously proposed robust PSG-examples, do indeed defy it. This behaviour is explained by proposing that ellipsis is a type of 'noisy channel' (Shannon, 1948; Gibson, Bergen & Piantadosi, 2013), through which the parser must estimate the probability of the intended (elided) message. The parser simultaneously considers the prior likelihood of the intended message (a remnant as part of a full PP) as well as the likelihood of this message being corrupted through 'noise' (a deleted P). P-stranding is thus considered a form of deletion, given deletion has been shown to be a likely corruption in noisy channels. A series of reading time studies aimed at supporting this noisy channel model in online processing found results overall consistent with this approach, but also discovered previous work on the processing of sluicing was inaccurate in concluding its active prediction by the parser. 
Collectively, the work argues for a theory of sluicing involving syntactic structure at the e-site together with sluicing being treated as a noisy channel by the parser.
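
The noisy-channel inference mentioned in this abstract is usually written as a Bayesian update. The formulation below is a sketch in assumed notation (the symbols $s_i$ for the intended message and $s_p$ for the perceived one do not appear in the abstract), and the thesis itself may state it differently.

```latex
\[
  P(s_i \mid s_p) \;\propto\; \underbrace{P(s_i)}_{\text{prior on the intended message}}
  \;\times\; \underbrace{P(s_p \mid s_i)}_{\text{likelihood of the corruption (noise)}}
\]
```

Under this reading, a P-less remnant such as "who" would be the perceived form $s_p$, the full PP "with who" a candidate intended form $s_i$, and the noise term the probability that the preposition was deleted.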

    Complexity, efficiency, and language contact: Pronoun omission in World Englishes

    The book provides an assessment of the contribution of pronoun omission to the complexity and efficiency of varieties of English and the influence of language contact on its attestation and pervasiveness. On the one hand, omitted pronouns result in simpler and more efficient structures, provided their antecedents are retrievable from the context. On the other hand, the choice between overt and omitted pronouns depends on several grammatical constraints, which in turn may entail an increase in system complexity. Two methodologically different but complementary case studies are presented, which contribute new findings to the literature at the crossroads of research on World Englishes, complexity, efficiency, and pronoun omission. Funding: European Regional Development Fund and the following institutions: Regional Government of Galicia (Directorate General for Scientific and Technological Promotion, grants ED431B 2017/12 and ED431D 2017/09); Spanish Ministry of Innovation, Science and Universities (grants FFI2017-86884-P, FFI2014-52188-P and BES-2015-071233).

    Conjunctive Queries for Logic-Based Information Extraction

    This thesis offers two logic-based approaches to conjunctive queries in the context of information extraction. The first and main approach is the introduction of conjunctive query fragments of the logics FC and FC[REG], denoted as FC-CQ and FC[REG]-CQ respectively. FC is a first-order logic based on word equations, where the semantics are defined by limiting the universe to the factors of some finite input word. FC[REG] is FC extended with regular constraints. The second approach is to consider the dynamic complexity of FC. Comment: Based on the author's PhD thesis; contains work from two conference publications (arXiv:2104.04758, arXiv:1909.10869) which are joint work with Dominik D. Freydenberge

    Studies in the linguistic sciences. 08 (1978)

    MLA international bibliography of books and articles on the modern languages and literatures (Complete edition) 0024-821

    NASA publications manual 1974

    The various types of NASA publications are described, including formal series, contributions to external publications, informal papers, and supplementary report material. The physical appearance and reproduction procedures for the format of the NASA formal series are discussed, and samples are provided. Matters relating to organization, content, and general style are also considered.

    The current approaches in pattern recognition
