293 research outputs found

    Large-Scale Pattern-Based Information Extraction from the World Wide Web

    Extracting information from text is the task of obtaining structured, machine-processable facts from information that is mentioned in an unstructured manner. It thus allows systems to automatically aggregate information for further analysis, efficient retrieval, automatic validation, or appropriate visualization. This work explores the potential of using textual patterns for Information Extraction from the World Wide Web.

    An efficient graph algorithm for dominance constraints

    Dominance constraints are logical descriptions of trees that are widely used in computational linguistics. Their general satisfiability problem is known to be NP-complete. Here we identify normal dominance constraints and present an efficient graph algorithm for testing their satisfiability in deterministic polynomial time. Previously, no polynomial-time algorithm was known.

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures adopted in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 table.

    An Investigation of Reading Development Through Sensitivity to Sublexical Units

    The present dissertation provides a novel perspective to the study of reading, focusing on sensitivity to sublexical units across reading development. Work towards this thesis has been conducted at SISSA and Macquarie University. The first study is an eye tracking experiment on natural reading, with 140 developing readers and 33 adult participants, who silently read multiline passages from story books in Italian. A developmental database of eye tracking during natural reading was created, filling a gap in the literature. We replicated well-documented developmental trends of reading behavior (e.g., reading rate and skipping rate increasing with age) and effects of word length and frequency on eye tracking measures. The second study, in collaboration with Dr Jon Carr, is a methodological paper presenting algorithms for accuracy enhancement of eye tracking recordings in multiline reading. Using the above-mentioned dataset and computational simulations, we assessed the performance of several algorithms (including two novel methods that we proposed) on the correction of vertical drift, the progressive displacement of fixation registrations on the vertical axis over time. We provided guidance for eye tracking researchers in the application of these methods, and one of the novel algorithms (based on Dynamic Time Warping) proved particularly promising in realigning fixations, especially in child recordings. This manuscript has recently been accepted for publication in Behavior Research Methods. In the third study, I examined sensitivity to statistical regularities in letter co-occurrence throughout reading development, by analysing the effects of n-gram frequency metrics on eye-tracking measures. To this end, the EyeReadIt eye-tracking corpus (presented in the first study) was used. 
Our results suggest that n-gram frequency effects (in particular those related to maximum/average frequency metrics) are present even in developing readers, indicating that sensitivity to sublexical orthographic regularities in reading is present as soon as the developing reading system can pick it up – in the case of this study, as early as third grade. The results bear relevant implications for extant theories of learning to read, which largely overlook the contribution of statistical learning to reading acquisition. The fourth study is a magnetoencephalography experiment conducted at Macquarie University, in collaboration with Dr Lisi Beyersmann, Prof Paul Sowman, and Prof Anne Castles, on 28 adults and 17 children (5th and 6th grade). We investigated selective neural responses to morphemes at different stages of reading development, using Fast Periodic Visual Stimulation (FPVS) combined with an oddball design. Participants were presented with rapid sequences (6 Hz) of pseudoword combinations of stem/nonstem and suffix/nonsuffix components. Interleaved in this stream, oddball stimuli appeared periodically every 5 items (1.2 Hz) and were specifically designed to examine stem or suffix detection (e.g., stem+suffix oddballs, such as softity, were embedded in a sequence of nonstem+suffix base items, such as terpity). We predicted that neural responses at the oddball stimulation frequency (1.2 Hz) would reflect the detection of morphemes in the oddball stimuli. Sensor-level analysis revealed a selective response in a left occipito-temporal region of interest when the oddball stimuli were fully decomposable pseudowords. This response emerged for adults and children alike, showing that automatic morpheme identification occurs at relatively early stages of reading development, in line with major accounts of morphological decomposition. Critically, these findings also suggest that morpheme identification is modulated by the context in which the morphemes appear.

    CLiFF Notes: Research In Natural Language Processing at the University of Pennsylvania

    The Computational Linguistics Feedback Forum (CLIFF) is a group of students and faculty who gather once a week to discuss the members' current research. As the word feedback suggests, the group's purpose is the sharing of ideas. The group also promotes interdisciplinary contacts between researchers who share an interest in Cognitive Science. There is no single theme describing the research in Natural Language Processing at Penn. There is work done in CCG, Tree adjoining grammars, intonation, statistical methods, plan inference, instruction understanding, incremental interpretation, language acquisition, syntactic parsing, causal reasoning, free word order languages, ... and many other areas. With this in mind, rather than trying to summarize the varied work currently underway here at Penn, we suggest reading the following abstracts to see how the students and faculty themselves describe their work. Their abstracts illustrate the diversity of interests among the researchers, explain the areas of common interest, and describe some very interesting work in Cognitive Science. This report is a collection of abstracts from both faculty and graduate students in Computer Science, Psychology and Linguistics. We pride ourselves on the close working relations between these groups, as we believe that the communication among the different departments and the ongoing inter-departmental research not only improves the quality of our work, but makes much of that work possible.
