Script acquisition : a crowdsourcing and text mining approach
According to Grice’s (1975) theory of pragmatics, people tend to omit basic information when participating in a conversation (or writing a narrative), under the assumption that the omitted details are already known or can be inferred from commonsense knowledge by the hearer (or reader). Writing and understanding texts make particular use of a specific kind of commonsense knowledge, referred to as script knowledge. Schank and Abelson (1977) proposed scripts as a model of human knowledge represented in memory that stores frequent habitual activities, called scenarios (e.g., eating in a fast-food restaurant), and the different courses of action in those routines. This thesis addresses measures to provide a sound empirical basis for high-quality script models. We work on three key areas related to script modeling: script knowledge acquisition, script induction, and script identification in text. We extend the existing repository of script knowledge bases in two ways. First, we crowdsource a corpus of 40 scenarios with 100 event sequence descriptions (ESDs) each, thus going beyond the size of previous script collections. Second, the corpus is enriched with partial alignments of ESDs, produced by human annotators. The crowdsourced partial alignments are used as prior knowledge to guide the semi-supervised script-induction algorithm proposed in this dissertation. We further present a semi-supervised clustering approach that induces script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets and inducing their temporal order. The proposed semi-supervised clustering model handles order variation in scripts better and extends the script representation formalism, Temporal Script Graphs, by incorporating "arbitrary order" equivalence classes that allow for the flexible event order inherent in scripts.
In the third part of this dissertation, we introduce the task of scenario detection, in which we identify references to scripts in narrative texts. We curate a benchmark dataset of annotated narrative texts, with segments labeled according to the scripts they instantiate. The dataset is the first of its kind. The analysis of the annotation shows that one can identify scenario references in text with reasonable reliability. Subsequently, we propose a benchmark model that automatically segments and identifies text fragments referring to given scenarios. The proposed model achieves promising results and therefore opens up research on script parsing and wide-coverage script acquisition.
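The idea of inducing temporal order with "arbitrary order" classes can be illustrated with a minimal sketch. It is not the semi-supervised clustering algorithm of the thesis: it assumes that event descriptions have already been grouped into paraphrase sets (here, plain string labels), and the function name `induce_order` and the `threshold` parameter are hypothetical choices made for illustration only.

```python
from collections import Counter
from itertools import combinations


def induce_order(sequences, threshold=0.7):
    """Derive precedence edges and arbitrary-order pairs from labeled ESDs.

    sequences: list of event sequences, each a list of paraphrase-set labels
               (assumed to be given; the thesis induces them by clustering).
    threshold: hypothetical cut-off; a pair is ordered only if one direction
               accounts for at least this share of the observations.
    """
    before = Counter()
    for seq in sequences:
        # Position of the first mention of each label in this sequence.
        pos = {}
        for i, label in enumerate(seq):
            pos.setdefault(label, i)
        for a, b in combinations(sorted(pos), 2):
            if pos[a] < pos[b]:
                before[(a, b)] += 1
            else:
                before[(b, a)] += 1

    labels = sorted({label for seq in sequences for label in seq})
    edges, arbitrary = set(), set()
    for a, b in combinations(labels, 2):
        ab, ba = before[(a, b)], before[(b, a)]
        total = ab + ba
        if total == 0:
            continue  # the two events never occur in the same sequence
        if ab / total >= threshold:
            edges.add((a, b))
        elif ba / total >= threshold:
            edges.add((b, a))
        else:
            # Neither direction dominates: treat the pair as freely ordered.
            arbitrary.add(frozenset((a, b)))
    return edges, arbitrary


# Toy "baking a cake" descriptions (invented for illustration).
esds = [
    ["get_ingredients", "add_flour", "add_sugar", "bake"],
    ["get_ingredients", "add_sugar", "add_flour", "bake"],
    ["get_ingredients", "add_flour", "add_sugar", "bake"],
    ["get_ingredients", "add_sugar", "add_flour", "bake"],
]
edges, arbitrary = induce_order(esds)
```

In this toy input, adding flour and adding sugar are observed equally often in both orders, so the pair ends up in `arbitrary`, while every other pair yields a precedence edge.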
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
Computer vision has great potential to help in our daily lives by searching
for lost keys, watering flowers or reminding us to take a pill. To succeed with
such tasks, computer vision methods need to be trained from real and diverse
examples of our daily dynamic scenes. While most of such scenes are not
particularly exciting, they typically do not appear on YouTube, in movies or TV
broadcasts. So how do we collect sufficiently many diverse but boring samples
representing our lives? We propose a novel Hollywood in Homes approach to
collect such data. Instead of shooting videos in the lab, we ensure diversity
by distributing and crowdsourcing the whole process of video creation from
script writing to video recording and annotation. Following this procedure we
collect a new dataset, Charades, with hundreds of people recording videos in
their own homes, acting out casual everyday activities. The dataset is composed
of 9,848 annotated videos with an average length of 30 seconds, showing
activities of 267 people from three continents. Each video is annotated by
multiple free-text descriptions, action labels, action intervals and classes of
interacted objects. In total, Charades provides 27,847 video descriptions,
66,500 temporally localized intervals for 157 action classes and 41,104 labels
for 46 object classes. Using this rich data, we evaluate and provide baseline
results for several tasks including action recognition and automatic
description generation. We believe that the realism, diversity, and casual
nature of this dataset will present unique challenges and new opportunities for
the computer vision community.
Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding
Narrative understanding involves capturing the author's cognitive processes,
providing insights into their knowledge, intentions, beliefs, and desires.
Although large language models (LLMs) excel in generating grammatically
coherent text, their ability to comprehend the author's thoughts remains
uncertain. This limitation hinders the practical applications of narrative
understanding. In this paper, we conduct a comprehensive survey of narrative
understanding tasks, thoroughly examining their key features, definitions,
taxonomy, associated datasets, training objectives, evaluation metrics, and
limitations. Furthermore, we explore the potential of expanding the
capabilities of modularized LLMs to address novel narrative understanding
tasks. By framing narrative understanding as the retrieval of the author's
imaginative cues that outline the narrative structure, our study introduces a
fresh perspective on enhancing narrative comprehension.
A Systematic Survey of ML Datasets for Prime CV Research Areas-Media and Metadata
The ever-growing capabilities of computers have enabled the pursuit of Computer Vision through Machine Learning (i.e., MLCV). ML tools require large amounts of information to learn from (ML datasets). These are costly to produce but have received little attention regarding standardization. This prevents the cooperative production and exploitation of these resources, impedes countless synergies, and hinders ML research. No global view exists of the MLCV dataset tissue. Acquiring it is fundamental to enable standardization. We provide an extensive survey of the evolution and current state of MLCV datasets (1994 to 2019) for a set of specific CV areas, as well as a quantitative and qualitative analysis of the results. Data were gathered from online scientific databases (e.g., Google Scholar, CiteSeerX). We reveal the heterogeneous plethora that comprises the MLCV dataset tissue; their continuous growth in volume and complexity; the specificities of the evolution of their media and metadata components regarding a range of aspects; and that MLCV progress requires the construction of a global standardized (structuring, manipulating, and sharing) MLCV "library". Accordingly, we formulate a novel interpretation of this dataset collective as a global tissue of synthetic cognitive visual memories and define the immediately necessary steps to advance its standardization and integration.
Advances in statistical script learning
When humans encode information into natural language, they do so with the
clear assumption that the reader will be able to seamlessly make inferences
based on world knowledge. For example, given the sentence "Mrs. Dalloway said
she would buy the flowers herself," one can make a number of probable
inferences based on event co-occurrences: she bought flowers, she went to a
store, she took the flowers home, and so on.
Observing this, it is clear that many different useful natural language
end-tasks could benefit from models of events as they typically co-occur
(so-called script models).
Robust question-answering systems must be able to infer highly-probable implicit
events from what is explicitly stated in a text, as must robust
information-extraction systems that map from unstructured text to formal
assertions about relations expressed in the text. Coreference resolution
systems, semantic role labeling, and even syntactic parsing systems could, in
principle, benefit from event co-occurrence models.
To this end, we present a number of contributions related to statistical
event co-occurrence models. First, we investigate a method of incorporating
multiple entities into events in a count-based co-occurrence model. We find that
modeling multiple entities interacting across events allows for improved
empirical performance on the task of modeling sequences of events in documents.
Second, we give a method of applying Recurrent Neural Network sequence models
to the task of predicting held-out predicate-argument structures from documents.
This model allows us to easily incorporate entity noun information, and can
allow for more complex, higher-arity events than a count-based co-occurrence
model. We find the neural model improves performance considerably over the
count-based co-occurrence model.
Third, we investigate the performance of a sequence-to-sequence encoder-decoder
neural model on the task of predicting held-out predicate-argument events from
text. This model does not explicitly model any external syntactic information,
and does not require a parser. We find the text-level model to be competitive in
predictive performance with an event-level model directly mediated by an
external syntactic analysis.
Finally, motivated by this result, we investigate incorporating features derived
from these models into a baseline noun coreference resolution system. We find
that, while our additional features do not appreciably improve top-level
performance, we can nonetheless provide empirical improvement on a number of
restricted classes of difficult coreference decisions.
Computer Science
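A count-based event co-occurrence model of the kind mentioned in this abstract can be sketched in a few lines. The version below is a deliberately simplified illustration, not the model from the dissertation: it treats each event as an atomic string (the abstract's model incorporates multiple entities per event), and the class name `CooccurrenceScriptModel`, its methods, and the toy documents are all assumptions introduced here.

```python
from collections import Counter
from itertools import permutations


class CooccurrenceScriptModel:
    """Toy count-based model: scores a candidate event by how often it
    co-occurs with the observed context events in training documents."""

    def __init__(self):
        self.pair_counts = Counter()   # (event, other_event) -> number of documents
        self.event_counts = Counter()  # event -> number of documents

    def fit(self, documents):
        for events in documents:
            unique = set(events)
            self.event_counts.update(unique)
            # Count every ordered pair of distinct events in the document.
            self.pair_counts.update(permutations(unique, 2))
        return self

    def score(self, context, candidate):
        # Sum of estimated P(candidate | e) over the context events e.
        return sum(
            self.pair_counts[(e, candidate)] / self.event_counts[e]
            for e in context
            if self.event_counts[e]
        )

    def predict(self, context, k=3):
        # Rank all known events outside the context; ties break alphabetically.
        candidates = [e for e in self.event_counts if e not in context]
        ranked = sorted(candidates, key=lambda c: (-self.score(context, c), c))
        return ranked[:k]


# Invented toy documents, each reduced to a list of event labels.
docs = [
    ["go_to_store", "buy_flowers", "take_home"],
    ["go_to_store", "buy_flowers", "pay"],
    ["go_to_store", "buy_milk", "pay"],
    ["plant_seeds", "water_flowers"],
]
model = CooccurrenceScriptModel().fit(docs)
best = model.predict(["go_to_store", "take_home"], k=1)
```

Given the context of going to a store and taking something home, the held-out event ranked highest in this toy corpus is `buy_flowers`, which mirrors the kind of implicit-event inference the abstract motivates with the Mrs. Dalloway sentence.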
Citizen Science and Geospatial Capacity Building
This book is a collection of the articles published in the Special Issue of ISPRS International Journal of Geo-Information on “Citizen Science and Geospatial Capacity Building”. The articles cover a wide range of topics regarding the applications of citizen science from a geospatial technology perspective. Several applications covered in the book demonstrate the importance and capabilities of Citizen Science (CitSci) and volunteered geographic information (VGI) in various stages of geodata collection, processing, analysis, and visualization. Particular emphasis is given to various problems encountered in CitSci and VGI projects with a geospatial aspect, such as platform, tool, and interface design, ontology development, spatial analysis, and data quality assessment. The book also points out the needs and future research directions in these subjects, such as: (a) data quality issues, especially in the light of big data; (b) ontology studies for geospatial data suited for diverse user backgrounds, data integration, and sharing; (c) development of machine learning and artificial intelligence based online tools for pattern recognition and object identification using existing repositories of CitSci and VGI projects; and (d) open science and open data practices for increasing efficiency, decreasing redundancy, and acknowledging all stakeholders.