9 research outputs found

    Wikification of Concept Mentions within Spoken Dialogues Using Domain Constraints from Wikipedia

    While most previous work on Wikification has focused on written texts, this paper presents a Wikification approach for spoken dialogues. A set of analyzers is proposed to learn dialogue-specific properties along with domain knowledge of conversations from Wikipedia. The analyzed properties are then used as constraints for generating candidates, and the candidates are ranked to find the appropriate links. The experimental results show that the proposed approach can significantly improve performance on the task in human-human dialogues.
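    The constrain-then-rank pipeline described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Candidate` fields, the toy candidate table, the domain labels, and the use of a simple prior score for ranking are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    title: str    # article title (toy data, hypothetical)
    domain: str   # coarse domain label assumed for the article
    prior: float  # assumed prior score of linking the mention to this title


# Hypothetical candidate table keyed by lower-cased mention text.
CANDIDATES = {
    "orchard": [
        Candidate("Orchard", "agriculture", 0.6),
        Candidate("Orchard Road", "travel", 0.3),
        Candidate("Orchard (album)", "music", 0.1),
    ],
}


def generate_candidates(mention, allowed_domains):
    """Keep only candidates whose domain satisfies the dialogue-level constraint."""
    pool = CANDIDATES.get(mention.lower(), [])
    return [c for c in pool if c.domain in allowed_domains]


def rank_candidates(candidates):
    """Order candidates by prior score, highest first (a stand-in ranker)."""
    return sorted(candidates, key=lambda c: c.prior, reverse=True)


def wikify(mention, allowed_domains):
    """Return the top-ranked title for a mention, or None if nothing survives."""
    ranked = rank_candidates(generate_candidates(mention, allowed_domains))
    return ranked[0].title if ranked else None
```

    In this sketch, a travel-related dialogue would pass `{"travel"}` as the constraint, so the mention "Orchard" resolves to the travel candidate even though another candidate has a higher prior.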

    Coreference resolution with and for Wikipedia

    Wikipedia is a resource of choice exploited in many NLP applications, yet we are not aware of recent attempts to adapt coreference resolution to this resource, a preliminary step toward understanding Wikipedia texts. The first part of this master's thesis builds an English coreference corpus in which all documents come from the English version of Wikipedia. We annotated each markable with its coreference type, its mention type, and, where possible, the equivalent Freebase topic. The corpus places no restriction on the topics of the annotated documents, and documents of various sizes were considered for annotation. Our annotation scheme follows that of OntoNotes, with a few differences. In the second part, we propose a testbed for evaluating coreference systems on a simple task: identifying the mentions of the concept described in a Wikipedia page (e.g., the mentions of President Obama in the Wikipedia page dedicated to that person). We show that by exploiting the Wikipedia markup of a document (categories, redirects, infoboxes, etc.), as well as links to external knowledge bases such as Freebase (information on gender, number, type of relationship with other entities, etc.), we can acquire useful information on entities that helps to classify mentions as coreferent or not.
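    As a rough illustration of how page-level information such as redirects, gender, and number might help decide whether a mention refers to the main concept of a page, here is a minimal rule-based sketch. It is not the thesis's classifier: the `PageInfo` structure, the pronoun table, and the matching rules are assumptions introduced for this example only.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class PageInfo:
    title: str                                        # page title
    redirects: Set[str] = field(default_factory=set)  # assumed alias list, e.g. from redirects
    gender: Optional[str] = None                      # assumed attribute from a knowledge base
    number: str = "singular"                          # "singular" or "plural"


# Hypothetical pronoun table keyed by (gender, number).
PRONOUNS = {
    ("male", "singular"): {"he", "him", "his"},
    ("female", "singular"): {"she", "her", "hers"},
    (None, "plural"): {"they", "them", "their"},
}


def is_main_concept_mention(mention: str, page: PageInfo) -> bool:
    """Heuristically flag a mention as referring to the page's main concept."""
    text = mention.strip().lower()
    aliases = {page.title.lower()} | {r.lower() for r in page.redirects}
    if text in aliases:
        return True
    # A pronoun compatible with the entity's gender and number is a candidate signal.
    key = (page.gender if page.number == "singular" else None, page.number)
    return text in PRONOUNS.get(key, set())
```

    A compatible pronoun is only weak evidence, since it may refer to another entity in the text; a real system would combine such signals as features in a classifier rather than treat them as decisive.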

    CLARIN

    The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium.

    Mapping Crisis

    The digital age has thrown questions of representation, participation and humanitarianism back to the fore, as machine learning, algorithms and big data centres take over the process of mapping the subjugated and subaltern. Since the rise of Google Earth in 2005, there has been an explosion in the use of mapping tools to quantify and assess the needs of those in crisis, including those affected by climate change and the wider neo-liberal agenda. Yet, while there has been a huge upsurge in the data produced around these issues, the representation of people remains questionable. Some have argued that representation has diminished in humanitarian crises as people are increasingly reduced to data points. In turn, this data has become ever more difficult to analyse without vast computing power, leading to a dependency on the old colonial powers to refine the data collected from people in crisis, before selling it back to them. This book brings together critical perspectives on the role that mapping people, knowledges and data now plays in humanitarian work, both in cartographic terms and through data visualisations, and questions whether, as we map crises, it is the map itself that is in crisis.

    Snap, pan, zoom, click, grab, and the embodied archive of geographic information systems

    The aim of this thesis is to critically interrogate the question of ‘what is’ Geographical Information Systems (GIS) from an arts and humanities perspective, and to contribute to the emergence of what scholars have called a ‘third stage’, or ‘creative’ GIS. A significant element of this thesis is a practice-based research component that allowed for unpredictable avenues to emerge as the research unfolded, and the cultivation of an experimental approach that ‘tinkered’ with objects of inquiry regardless of preconceived outcomes. I begin with a critical assessment of the conceptual heritage of GIS, and related debates that situate GIS in the context of digital technologies and objects, structuralist, humanist and post-humanist geographic literatures on practice, and creativity as a productive geographic practice, before offering the notion of the ‘archive’ as a productive means of framing and interrogating GIS. In order to understand the doing of GIS, field studies were conducted to investigate what it means to learn and become immersed in GIS. I deployed more established social science methods at several sites, such as interviewing and participant observation, supplemented with auto-ethnographic accounts. From here, I sought to investigate how my own creative practice brought something new to the study of GIS, working through an abundance of materials, insights, and feelings amassed over the course of the PhD. Several artworks were created to tease out, distil, and probe the aesthetic qualities of GIS that had become known to me throughout the PhD. This was a matter of ‘interfacing’ between GIS as a broad discipline and my creative and aesthetic sensibilities, and of determining how my singular approach could recast our understanding of what GIS indeed is. This thesis renders GIS not only as a tool, a means of producing geographic knowledge according to ontologies past and present, but also as a set of practices in which users take part and assert their agency, while also surrendering themselves (at least in part) to the agency manifest through GIS as a historically, socially, and technologically produced mechanism. The practices involved in GIS are not just productive to particular ends, such as map making. The emotional dispositions, frustrations, anxieties, and affective atmospheres of GIS practice produce a material and embodied residue that must be taken into account when we consider what GIS is. The thesis thus concludes with a proposal for a curated exhibition to ‘open up’ the dissemination of the thesis beyond the page and provide some sense of the what of GIS via other mediums. This curated installation offers a moment of closure for the project, as a culmination, a coming together of many of the materials built up and collected during the project.