Event-based Access to Historical Italian War Memoirs
The progressive digitization of historical archives provides new, often
domain-specific, textual resources that report on facts and events that
happened in the past; among these, memoirs are a very common type of primary
source. In this paper, we present an approach for extracting information from
Italian historical war memoirs and turning it into structured knowledge. The
approach is based on the semantic notions of events, participants and roles. We
quantitatively evaluate each of the key steps of our approach and provide a
graph-based representation of the extracted knowledge, which allows users to
move between a Close and a Distant Reading of the collection.
Comment: 23 pages, 6 figures
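To make the idea of an event-centred graph concrete, the sketch below links event nodes to participant nodes through role-labelled edges, so that the same structure supports both reading a single event closely and aggregating over all events a participant is involved in. It is a minimal illustration under stated assumptions: the event records, participant names, role labels, and the helper functions `build_graph` and `events_of` are all hypothetical and are not taken from the paper's pipeline or data.

```python
from collections import defaultdict

# Hypothetical extracted events (invented for illustration, not from the paper):
# each event has an id, a type, and a list of (participant, role) pairs.
events = [
    {"id": "e1", "type": "Attack",
     "participants": [("battalion", "Attacker"), ("village", "Place")]},
    {"id": "e2", "type": "Retreat",
     "participants": [("battalion", "Theme"), ("river", "Goal")]},
]

def build_graph(events):
    """Return an undirected adjacency map: event <-> participant, edges labelled by role."""
    graph = defaultdict(list)
    for event in events:
        for participant, role in event["participants"]:
            graph[event["id"]].append((participant, role))
            graph[participant].append((event["id"], role))
    return graph

def events_of(graph, participant):
    """Distant-reading style query: ids of all events a participant takes part in."""
    return sorted(node for node, _ in graph.get(participant, []))

graph = build_graph(events)
print(events_of(graph, "battalion"))  # ['e1', 'e2']
print(graph["e1"])                    # close reading of one event with its roles
```

Keeping role labels on the edges rather than on the nodes lets one participant appear in different roles across events without duplicating the participant node.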
Mining semantics for culturomics: towards a knowledge-based approach
The massive amounts of text data made available through the Google Books digitization project have inspired a new field of big-data textual research. Named culturomics, this field has attracted the attention of a growing number of scholars in recent years. However, initial studies based on these data have been criticized for not referring to relevant work in linguistics and language technology. This paper provides some ideas, thoughts and first steps towards a new culturomics initiative, based this time on Swedish data, which pursues a more knowledge-based approach than previous work in this emerging field. The volume of new Swedish text produced daily, together with older texts being digitized in cultural heritage projects, is growing at an accelerating rate. The amount of text available in digital form has grown far beyond the capacity of human readers, leaving automated semantic processing as the only realistic option for accessing and using the information it contains. The aim of our recently initiated research program is to advance the state of the art in language technology resources and methods for semantic processing of Big Swedish text, and to focus on the theoretical and methodological advancement of the state of the art in extracting and correlating information from large volumes of Swedish text using a combination of knowledge-based and statistical methods.
Query Resolution for Conversational Search with Limited Supervision
In this work we focus on multi-turn passage retrieval as a crucial component
of conversational search. One of the key challenges in multi-turn passage
retrieval is that the current-turn query is often underspecified due to zero
anaphora, topic change, or topic return. Context from the conversational
history can be used to arrive at a better expression of the current-turn
query; we refer to this as the task of query resolution. In this paper, we
model query resolution as a binary term classification problem: for each term
appearing in the previous turns of the conversation, decide whether to add it
to the current-turn query or not. We propose QuReTeC (Query Resolution by Term
Classification), a neural query resolution model based on bidirectional
transformers. We also propose a distant supervision method that automatically
generates training data from query-passage relevance labels. Such labels are
often readily available in a collection, either as human annotations or
inferred from user interactions. We show that QuReTeC outperforms
state-of-the-art models, and furthermore, that our distant supervision method
can be used to substantially reduce the amount of human-curated data required
to train QuReTeC. We incorporate QuReTeC in a multi-turn, multi-stage passage
retrieval architecture and demonstrate its effectiveness on the TREC CAsT
dataset.
Comment: SIGIR 2020 full conference paper
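The framing of query resolution as per-term binary classification, with labels derived from relevance judgments, can be illustrated with a toy sketch. It assumes one plausible reading of the distant supervision idea: a history term is labelled positive if it appears in a passage judged relevant to the current turn and is not already in the current query. The naive tokenizer, the stopword list, and the functions `distant_labels` and `resolve` are hypothetical; QuReTeC itself uses a bidirectional transformer classifier, and here the distant labels simply stand in for its predictions.

```python
import re

# Hypothetical stopword list; a real system would use a proper one.
STOPWORDS = {"what", "is", "it", "the", "a", "an", "of", "about", "tell", "me"}

def tokenize(text):
    """Naive lowercase word tokenizer (an assumption; not the model's tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def distant_labels(history, current_query, relevant_passage):
    """Label each non-stopword history term: 1 if it occurs in the relevant
    passage and is not already in the current query, else 0."""
    query_terms = set(tokenize(current_query))
    passage_terms = set(tokenize(relevant_passage))
    labels = {}
    for turn in history:
        for term in tokenize(turn):
            if term in STOPWORDS or term in query_terms:
                continue
            labels[term] = int(term in passage_terms)
    return labels

def resolve(current_query, labels):
    """Expand the current-turn query with terms labelled (or predicted) as 1."""
    added = [term for term, y in labels.items() if y == 1]
    return current_query + " " + " ".join(added) if added else current_query

history = ["What is throat cancer?"]
labels = distant_labels(history, "Is it treatable?",
                        "Throat cancer is often treatable with radiation therapy.")
print(labels)                               # {'throat': 1, 'cancer': 1}
print(resolve("Is it treatable?", labels))  # Is it treatable? throat cancer
```

Because the labels come only from passage relevance, no one has to annotate which history terms belong in the resolved query, which is the point of the distant supervision step.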
NarDis: Narrativizing Disruption - How exploratory search can support media researchers to interpret ‘disruptive’ media events as lucid narratives
This project investigates how CLARIAH’s exploratory search and linked open data (LOD) browser DIVE+ supports media researchers in constructing narratives about events, especially ‘disruptive’ events such as terrorist attacks and natural disasters. The project approaches this question by conducting user studies to examine how researchers use and create narratives with exploratory search tools, particularly DIVE+, to understand media events. These user studies were organized as workshops (using co-creation as an iterative approach to map search practices and storytelling data, including: focus groups & interviews; tasks & think-aloud protocols; surveys/questionnaires; and research diaries) and included more than 100 (digital) humanities researchers across Europe. Insights from these workshops show that exploratory search does facilitate the development of new research questions around disruptive events. DIVE+ triggers academic curiosity by suggesting alternative connections between entities. Besides yielding insight into the research practices of (digital) humanities researchers and how these can be supported with digital tools, the pilot also culminated in improvements to the DIVE+ browser. The pilot helped optimize the browser’s functionalities, making it possible for users to annotate paths of search narratives and save these in CLARIAH’s overarching, personalised user space. The pilot was widely promoted at (inter)national conferences, and DIVE+ won the international LODLAM (Linked Open Data in Libraries, Archives and Museums) Challenge Grand Prize in Venice (2017).