15,365 research outputs found

    Event-based Access to Historical Italian War Memoirs

    Full text link
    The progressive digitization of historical archives provides new, often domain specific, textual resources that report on facts and events which have happened in the past; among these, memoirs are a very common type of primary source. In this paper, we present an approach for extracting information from Italian historical war memoirs and turning it into structured knowledge. This is based on the semantic notions of events, participants and roles. We evaluate quantitatively each of the key-steps of our approach and provide a graph-based representation of the extracted knowledge, which allows to move between a Close and a Distant Reading of the collection.Comment: 23 pages, 6 figure

    Mining semantics for culturomics: towards a knowledge-based approach

    Get PDF
    The massive amounts of text data made available through the Google Books digitization project have inspired a new field of big-data textual research. Named culturomics, this field has attracted the attention of a growing number of scholars over recent years. However, initial studies based on these data have been criticized for not referring to relevant work in linguistics and language technology. This paper provides some ideas, thoughts and first steps towards a new culturomics initiative, based this time on Swedish data, which pursues a more knowledge-based approach than previous work in this emerging field. The amount of new Swedish text produced daily and older texts being digitized in cultural heritage projects grows at an accelerating rate. These volumes of text being available in digital form have grown far beyond the capacity of human readers, leaving automated semantic processing of the texts as the only realistic option for accessing and using the information contained in them. The aim of our recently initiated research program is to advance the state of the art in language technology resources and methods for semantic processing of Big Swedish text and focus on the theoretical and methodological advancement of the state of the art in extracting and correlating information from large volumes of Swedish text using a combination of knowledge-based and statistical methods

    Query Resolution for Conversational Search with Limited Supervision

    Get PDF
    In this work we focus on multi-turn passage retrieval as a crucial component of conversational search. One of the key challenges in multi-turn passage retrieval comes from the fact that the current turn query is often underspecified due to zero anaphora, topic change, or topic return. Context from the conversational history can be used to arrive at a better expression of the current turn query, defined as the task of query resolution. In this paper, we model the query resolution task as a binary term classification problem: for each term appearing in the previous turns of the conversation decide whether to add it to the current turn query or not. We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers. We propose a distant supervision method to automatically generate training data by using query-passage relevance labels. Such labels are often readily available in a collection either as human annotations or inferred from user interactions. We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC. We incorporate QuReTeC in a multi-turn, multi-stage passage retrieval architecture and demonstrate its effectiveness on the TREC CAsT dataset.Comment: SIGIR 2020 full conference pape

    NarDis:Narrativizing Disruption -How exploratory search can support media researchers to interpret ‘disruptive’ media events as lucid narratives

    Get PDF
    This project investigates how CLARIAH’s exploratory search and linked open data (LO D) browser DIVE+ supports media researchers to construct narratives about events, especially ‘disruptive’ events such as terrorist attacks and natural disasters. This project approaches this question by conducting user studies to examine how researchers use and create narratives with exploratory search tools, particularly DIVE+, to understand media events. These user studies were organized as workshops (using co-creation as an iterative approach to map search practices and storytelling data, including: focus groups & interviews; tasks & talk aloud protocols; surveys/questionnaires; and research diaries) and included more than 100 (digital) humanities researchers across Europe. Insights from these workshops show that exploratory search does facilitate the development of new research questions around disruptive events. DIVE+ triggers academic curiosity, by suggesting alternative connections between entities. Beside learning about research practices of (digital) humanities researchers and how these can be supported with digital tools, the pilot also culminated in improvements to the DIVE+ browser. The pilot helped optimize the browser’s functionalities, making it possible for users to annotate paths of search narratives, and save these in CLARIAH’s overarching, personalised, user space. The pilot was widely promoted at (inter)national conferences, and DIVE+ won the international LO DLAM (Linked Open Data in Libraries, Archives and Museums) Challenge Grand Prize in Venice (2017)
    • …
    corecore