
    A Provably Improved Algorithm for Crowdsourcing with Hard and Easy Tasks

    Crowdsourcing is a popular method for estimating ground-truth labels by collecting noisy labels from workers. In this work, we are motivated by crowdsourcing applications where each worker can exhibit two levels of accuracy depending on a task's type. Applying algorithms designed for the traditional Dawid-Skene model to such a scenario results in performance that is limited by the hard tasks. Therefore, we first extend the model to allow worker accuracy to vary depending on a task's unknown type. Then we propose a spectral method to partition tasks by type. After separating tasks by type, any Dawid-Skene algorithm (i.e., any algorithm designed for the Dawid-Skene model) can be applied independently to each type to infer the truth values. We theoretically prove that when crowdsourced data contain tasks with varying levels of difficulty, our algorithm infers the true labels with higher accuracy than any Dawid-Skene algorithm. Experiments show that our method is effective in practical applications.
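
A minimal sketch of the two-stage idea under simplifying assumptions (binary ±1 labels, two task types, synthetic data). The SVD-based split and the one-step weighted vote below are illustrative stand-ins, not the paper's exact algorithms:

```python
import numpy as np

def partition_tasks(labels):
    """Split tasks into two putative types using the top right singular
    vector of the worker-task label matrix: easier tasks carry a stronger
    signal, so their loadings have larger magnitude. A 1-D two-means
    split on those magnitudes separates the types."""
    _, _, vt = np.linalg.svd(labels, full_matrices=False)
    x = np.abs(vt[0])
    lo, hi = x.min(), x.max()
    for _ in range(20):                       # tiny Lloyd iteration
        mid = (lo + hi) / 2
        lo, hi = x[x <= mid].mean(), x[x > mid].mean()
    return x > (lo + hi) / 2

def weighted_vote(labels):
    """One-step Dawid-Skene-flavored aggregator: estimate each worker's
    accuracy from agreement with the plain majority vote, then
    re-aggregate with log-odds weights."""
    mv = np.sign(labels.sum(axis=0))
    p = (labels == mv).mean(axis=1).clip(0.01, 0.99)
    w = np.log(p / (1 - p))
    return np.sign(w @ labels)

# Synthetic data: every worker is accurate on easy tasks, noisy on hard ones.
rng = np.random.default_rng(0)
n_workers, n_tasks = 31, 400                  # odd worker count avoids ties
truth = rng.choice([-1, 1], size=n_tasks)
hard = rng.random(n_tasks) < 0.5              # unknown task types
acc = np.where(hard,
               rng.uniform(0.55, 0.70, n_workers)[:, None],  # hard accuracy
               rng.uniform(0.80, 0.95, n_workers)[:, None])  # easy accuracy
correct = rng.random((n_workers, n_tasks)) < acc
labels = np.where(correct, truth, -truth)

# Partition tasks by inferred type, then aggregate each side separately.
est = np.empty(n_tasks)
side = partition_tasks(labels)
for mask in (side, ~side):
    est[mask] = weighted_vote(labels[:, mask])
print("accuracy:", (est == truth).mean())
```

The partition matters because the aggregator estimates worker reliability; estimating it within one task type avoids averaging a worker's easy-task and hard-task accuracies together.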

    Script acquisition: a crowdsourcing and text mining approach

    According to Grice's (1975) theory of pragmatics, people tend to omit basic information when participating in a conversation (or writing a narrative), under the assumption that the omitted details are already known or can be inferred from commonsense knowledge by the hearer (or reader). The writing and understanding of texts makes particular use of a specific kind of commonsense knowledge, referred to as script knowledge. Schank and Abelson (1977) proposed scripts as a model of human knowledge, represented in memory, that stores frequent habitual activities, called scenarios (e.g., eating in a fast food restaurant), and the different courses of action in those routines. This thesis addresses measures to provide a sound empirical basis for high-quality script models. We work on three key areas of script modeling: script knowledge acquisition, script induction, and script identification in text. We extend the existing repository of script knowledge bases in two ways. First, we crowdsource a corpus of 40 scenarios with 100 event sequence descriptions (ESDs) each, going beyond the size of previous script collections. Second, the corpus is enriched with partial alignments of ESDs produced by human annotators. The crowdsourced partial alignments are used as prior knowledge to guide the semi-supervised script-induction algorithm proposed in this dissertation. We further present a semi-supervised clustering approach that induces script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets and inducing their temporal order. The proposed semi-supervised clustering model better handles order variation in scripts and extends the script representation formalism of Temporal Script Graphs by incorporating "arbitrary order" equivalence classes, allowing for the flexible event order inherent in scripts. In the third part of this dissertation, we introduce the task of scenario detection, in which we identify references to scripts in narrative texts. We curate a benchmark dataset of annotated narrative texts, with segments labeled according to the scripts they instantiate; the dataset is the first of its kind. The analysis of the annotation shows that scenario references in text can be identified with reasonable reliability. Subsequently, we propose a benchmark model that automatically segments and identifies text fragments referring to given scenarios. The proposed model achieves promising results and thereby opens up research on script parsing and wide-coverage script acquisition.
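
A purely illustrative sketch of the unsupervised core of script induction: grouping event descriptions into paraphrase sets and ordering the groups by their typical position in the sequences. Token-overlap similarity and greedy clustering stand in for the thesis's semi-supervised model, and the toy ESDs are invented:

```python
# Toy event sequence descriptions (ESDs) for one scenario.
esds = [
    ["enter the restaurant", "order food at the counter", "pay", "eat the meal"],
    ["walk in", "order your food", "pay the cashier", "eat"],
    ["enter restaurant", "order food", "pay for the food", "eat meal", "leave"],
]

def jaccard(a, b):
    """Token-overlap similarity; a real system would use a learned
    paraphrase model instead."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

# Greedy paraphrase clustering: attach each description to the first
# cluster containing a similar-enough member, else start a new cluster.
clusters, positions = [], []      # positions: normalized slot in the ESD
for esd in esds:
    for i, desc in enumerate(esd):
        pos = i / (len(esd) - 1)
        for c, ps in zip(clusters, positions):
            if max(jaccard(desc, m) for m in c) >= 0.3:
                c.append(desc); ps.append(pos)
                break
        else:
            clusters.append([desc]); positions.append([pos])

# Induce a temporal order from the mean normalized position per cluster.
order = sorted(zip(clusters, positions), key=lambda cp: sum(cp[1]) / len(cp[1]))
for c, ps in order:
    print(round(sum(ps) / len(ps), 2), c)
```

Clusters whose mean positions are statistically indistinguishable would be the natural candidates for the "arbitrary order" equivalence classes the thesis introduces.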

    Scalable Methods to Collect and Visualize Sidewalk Accessibility Data for People with Mobility Impairments

    Poorly maintained sidewalks pose considerable accessibility challenges for people with mobility impairments. Despite comprehensive civil rights legislation such as the Americans with Disabilities Act, many city streets and sidewalks in the U.S. remain inaccessible. The problem is not just that sidewalk accessibility fundamentally affects where and how people travel in cities, but also that there are few, if any, mechanisms to determine accessible areas of a city a priori. To address this problem, my Ph.D. dissertation introduces and evaluates new scalable methods for collecting data about street-level accessibility using a combination of crowdsourcing, automated methods, and Google Street View (GSV). My dissertation has four research threads. First, we conduct a formative interview study to establish a better understanding of how people with mobility impairments currently assess accessibility in the built environment and the role of emerging location-based technologies therein. The study uncovers existing methods for assessing the accessibility of the physical environment and identifies useful features of future assistive technologies. Second, we develop and evaluate scalable crowdsourced methods for collecting accessibility data. We show that paid crowd workers recruited from an online labor marketplace can find and label accessibility attributes in GSV with an accuracy of 81%. This accuracy improves to 93% with quality control mechanisms such as majority vote. Third, we design a system that combines crowdsourcing and automated methods to increase data collection efficiency. Our work shows that by combining crowdsourcing and automated methods, we can increase data collection efficiency by 13% without sacrificing accuracy. Fourth, we develop and deploy a web tool that lets volunteers help us collect street-level accessibility data for Washington, D.C. As of this writing, we have collected accessibility data for 20% of the streets in D.C., and we conduct a preliminary evaluation of how the web tool is used. Finally, we implement proof-of-concept accessibility-aware applications using the accessibility data collected with the help of volunteers. My dissertation contributes to the accessibility, computer science, and HCI communities by: (i) extending the knowledge of how people with mobility impairments interact with technology to navigate in cities; (ii) introducing the first work demonstrating that GSV is a viable source for learning about the accessibility of the physical world; (iii) introducing the first method that combines crowdsourcing and automated methods to remotely collect accessibility information; (iv) deploying interactive web tools that allow volunteers to help populate the world's largest dataset on street-level accessibility; and (v) demonstrating accessibility-aware applications that empower people with mobility impairments.
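
A minimal sketch of the majority-vote quality control mentioned above, with invented location IDs and label names (the actual system aggregates worker labels placed on GSV imagery):

```python
from collections import Counter

# Hypothetical redundant labels from paid crowd workers, keyed by the
# GSV location each worker examined.
labels = {
    "loc_001": ["curb_ramp_missing", "curb_ramp_missing", "no_problem"],
    "loc_002": ["obstacle", "surface_problem", "obstacle"],
    "loc_003": ["no_problem", "no_problem", "no_problem"],
}

def majority_label(votes, min_agreement=2):
    """Keep a label only when at least `min_agreement` workers agree;
    redundancy plus agreement thresholds is the kind of quality control
    that lifted single-worker accuracy (81%) to 93% in the dissertation."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agreement else None

for loc, votes in labels.items():
    print(loc, "->", majority_label(votes))
```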

    Hyperloop Transportation Technologies: Practices for Open Organizing Across VUCA Context

    Open organizations are structures in which members of the public engage in work for the organization. Examples include open-source software, Amnesty International, Wikipedia, and Lego communities. Much research focuses on the structural design characteristics of open organizations, such as pre-specified task divisions and integration teams. These practices require the organization to structure itself a priori in response to its mission. Increasingly, however, open organizations like CrowdDoing and Hyperloop Transportation Technologies (HyperloopTT) require public involvement across volatile, uncertain, complex, and ambiguous (VUCA) contexts. These open organizations must respond to changing political, competitive, and socio-economic events. Structural clarity is more difficult, and contributors may participate in the creative development of new technologies, new policies, and new sources of funding. Working from practices that support participant engagement in more stable environments, we qualitatively observe HyperloopTT to understand internal practices for open organizing in more VUCA contexts. We observe four practices that allow for the flexibility, versatility, and accommodation needed for open organizing in such settings. The HyperloopTT practices allow more porosity and self-determination, not simply in how people divide and integrate tasks but also in the exploration and experimentation of the work itself. More than task workers, we see a new class of open organizing participants: creative work designers.

    FATREC Workshop on Responsible Recommendation Proceedings

    With this workshop, we sought to foster a discussion of various topics that fall under the general umbrella of responsible recommendation: ethical considerations in recommendation, bias and discrimination in recommender systems, transparency and accountability, social impact of recommenders, user privacy, and other related concerns. Our goal was to encourage the community to think about how we build and study recommender systems in a socially responsible manner. Recommender systems increasingly influence people's decisions in different walks of life, including commerce, employment, dating, health, education, and governance. As the impact and scope of recommendations increase, developing systems that tackle issues of fairness, transparency, and accountability becomes important. This workshop was held in the spirit of FATML (Fairness, Accountability, and Transparency in Machine Learning), DAT (Data and Algorithmic Transparency), and similar workshops in related communities. With Responsible Recommendation, we brought that conversation to RecSys.

    Changing the focus: worker-centric optimization in human-in-the-loop computations

    A myriad of emerging applications, from simple to complex, involve human cognizance in the computation loop. Using the wisdom of human workers, researchers have solved a variety of problems, termed "micro-tasks", such as captcha recognition, sentiment analysis, image categorization, and query processing, as well as "complex tasks" that are often collaborative, such as classifying craters on planetary surfaces, discovering new galaxies (Galaxyzoo), and performing text translation. The current view of "humans-in-the-loop" tends to see humans as machines, robots, or low-level agents used or exploited in the service of broader computation goals. This dissertation shifts the focus back to humans, studying different data analytics problems by recognizing the characteristics of human workers and incorporating those characteristics in a principled fashion inside the computation loop. The first contribution of this dissertation is an optimization framework and a real-world system that personalize workers' behavior by developing a worker model and using it to better understand and estimate task completion time. The framework judiciously frames questions and solicits worker feedback on them to update the worker model. Next, improving workers' skills through peer interaction during collaborative task completion is studied. A suite of optimization problems is identified in that context, considering the collaborativeness between members, as it plays a major role in peer learning. Finally, "diversified" sequences of work sessions are designed for human workers to improve worker satisfaction and engagement while completing tasks.
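
As a rough illustration of the first contribution, the hypothetical sketch below maintains a per-worker, per-task-type estimate of completion time and updates it as feedback arrives. The dissertation's actual worker model is richer; this is only an exponentially weighted average under assumed names and parameters:

```python
from collections import defaultdict

class WorkerModel:
    """Illustrative completion-time model: an exponentially weighted
    average per (worker, task type), updated from observed feedback."""

    def __init__(self, alpha=0.3, prior=60.0):
        self.alpha = alpha                     # weight on new observations
        self.est = defaultdict(lambda: prior)  # seconds, per (worker, type)

    def update(self, worker, task_type, seconds):
        key = (worker, task_type)
        self.est[key] += self.alpha * (seconds - self.est[key])

    def predict(self, worker, task_type):
        return self.est[(worker, task_type)]

model = WorkerModel()
for t in (50, 42, 38):                         # observed completion times
    model.update("w1", "image_categorization", t)
print(model.predict("w1", "image_categorization"))  # ~48s after three updates
```

An optimizer sitting on top of such a model could then assign tasks to the workers predicted to complete them fastest, or schedule "diversified" session sequences against the same estimates.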

    Knowledge Graphs Evolution and Preservation -- A Technical Report from ISWS 2019

    One of the grand challenges discussed during the Dagstuhl Seminar "Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web" and described in its report is that of a "Public FAIR Knowledge Graph of Everything": "We increasingly see the creation of knowledge graphs that capture information about the entirety of a class of entities. [...] This grand challenge extends this further by asking if we can create a knowledge graph of 'everything', ranging from common sense concepts to location-based entities. This knowledge graph should be 'open to the public' in a FAIR manner, democratizing this mass amount of knowledge." Although linked open data (LOD) is just one knowledge graph, it is the closest realisation (and probably the only one) of a public FAIR Knowledge Graph (KG) of everything, and it provides a unique testbed for experimenting with and evaluating research hypotheses on open and FAIR KGs. One of the most neglected FAIR issues concerning KGs is their ongoing evolution and long-term preservation. We want to investigate this problem, that is, to understand what preserving and supporting the evolution of KGs means and how these problems can be addressed. Clearly, the problem can be approached from different perspectives and may require the development of different approaches, including new theories, ontologies, metrics, strategies, procedures, etc. This document reports on a collaborative effort by nine teams of students, each guided by a senior researcher as their mentor, attending the International Semantic Web Research School (ISWS 2019). Each team provides a different perspective on the problem of knowledge graph evolution, substantiated by a set of research questions as the main subject of their investigation. In addition, each team provides its working definition of KG preservation and evolution.
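
To make "evolution and preservation" concrete, one minimal reading is diffing two snapshots of a KG represented as subject-predicate-object triples. The snapshot contents below are invented for illustration; real LOD data would come from RDF dumps or SPARQL endpoints:

```python
# Two hypothetical snapshots of a tiny KG fragment.
snapshot_2018 = {
    ("dbr:Berlin", "dbo:populationTotal", "3575000"),
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
}
snapshot_2019 = {
    ("dbr:Berlin", "dbo:populationTotal", "3644826"),
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Berlin", "dbo:mayor", "dbr:Michael_Müller"),
}

# Evolution as a set difference between snapshots.
print("added:", snapshot_2019 - snapshot_2018)
print("removed:", snapshot_2018 - snapshot_2019)
# Preservation would additionally require keeping removed triples
# queryable, e.g. with validity time-stamps, rather than discarding them.
```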