
    Building Entity-Centric Event Collections

    Web archives preserve an unprecedented abundance of material on major events and transformations in our society. In this paper, we present an approach for building event-centric sub-collections from such large archives. These sub-collections include not only the core documents about the event itself but also, and even more importantly, documents describing related aspects (e.g., premises and consequences). This is achieved by 1) identifying relevant concepts and entities in a knowledge base, and 2) detecting their mentions in documents, which are interpreted as indicators of relevance. We extensively evaluate our system on two diachronic corpora, the New York Times Corpus and the US Congressional Record, and we test its performance on the TREC KBA Stream corpus, a large, publicly available web archive.
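The two steps above (entities from a knowledge base, then their mentions as relevance indicators) can be illustrated with a minimal Python sketch. It assumes the entity labels and their weights have already been selected; the functions `score_documents` and `build_collection`, the weights, and the threshold are hypothetical and do not reproduce the authors' method, which the abstract describes only at this level. Matching here is plain case-insensitive whole-phrase matching with regular expressions, with no entity linking or disambiguation.

```python
import re
from typing import Dict, List, Tuple


def score_documents(
    docs: Dict[str, str], entity_weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Rank documents by a weighted count of entity mentions (hypothetical scoring)."""
    # Case-insensitive, whole-phrase pattern for each entity label.
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in entity_weights
    }
    scored = []
    for doc_id, text in docs.items():
        # Each mention of an entity adds that entity's weight to the score.
        score = sum(
            weight * len(patterns[name].findall(text))
            for name, weight in entity_weights.items()
        )
        scored.append((doc_id, score))
    # Most relevant documents first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


def build_collection(
    docs: Dict[str, str], entity_weights: Dict[str, float], threshold: float
) -> List[str]:
    """Keep the documents whose mention score reaches a hypothetical threshold."""
    return [
        doc_id
        for doc_id, score in score_documents(docs, entity_weights)
        if score >= threshold
    ]


docs = {
    "a": "Lehman Brothers filed for bankruptcy; later, TARP was signed.",
    "b": "Sunny weather is expected this weekend.",
}
weights = {"Lehman Brothers": 1.0, "TARP": 0.5}  # hypothetical KB entities
print(build_collection(docs, weights, threshold=1.0))  # ['a']
```

A weighted count keeps the sketch short; a real system would more likely learn how strongly each entity signals relevance to the event.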

    TopExNet: Entity-Centric Network Topic Exploration in News Streams

    The recent introduction of entity-centric implicit network representations of unstructured text offers new ways to explore entity relations in document collections and streams efficiently and interactively. Here, we present TopExNet, a tool for exploring entity-centric network topics in streams of news articles. The application is available as a web service at https://topexnet.ifi.uni-heidelberg.de/.
    Comment: Published in Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, WSDM 2019, Melbourne, VIC, Australia, February 11-15, 2019
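Implicit entity networks are built from co-occurrences of entity mentions in text rather than from explicitly extracted relations. The following is a simplified Python illustration of that idea, not TopExNet's implementation. It assumes that mentions are already tagged with sentence indices, and it weights each co-occurrence within a hypothetical `window` of sentences by exp(-distance), a decay chosen here purely for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# (entity name, sentence index) pairs, assumed to come from an upstream NER step.
Mention = Tuple[str, int]


def build_entity_network(
    mentions: List[Mention], window: int = 2
) -> Dict[FrozenSet[str], float]:
    """Undirected, weighted entity co-occurrence network (simplified sketch)."""
    edges: Dict[FrozenSet[str], float] = defaultdict(float)
    for (e1, s1), (e2, s2) in combinations(mentions, 2):
        if e1 == e2:
            continue  # skip self-loops
        distance = abs(s1 - s2)
        if distance <= window:
            # Closer mentions contribute more; exp(-distance) is an assumed decay.
            edges[frozenset((e1, e2))] += math.exp(-distance)
    return dict(edges)


mentions = [("Merkel", 0), ("Macron", 0), ("Paris", 1), ("Berlin", 5)]
network = build_entity_network(mentions)
for pair, weight in sorted(network.items(), key=lambda kv: -kv[1]):
    print(sorted(pair), round(weight, 3))
```

For a news stream, one option would be to build such a network for each time slice and compare edge weights across slices. The abstract does not say whether TopExNet works this way.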

    Event-based Access to Historical Italian War Memoirs

    The progressive digitization of historical archives provides new, often domain-specific, textual resources that report on past facts and events; among these, memoirs are a very common type of primary source. In this paper, we present an approach for extracting information from Italian historical war memoirs and turning it into structured knowledge. The approach is based on the semantic notions of events, participants, and roles. We quantitatively evaluate each key step of our approach and provide a graph-based representation of the extracted knowledge, which allows users to move between close reading and distant reading of the collection.
    Comment: 23 pages, 6 figures
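The representation described above rests on events, participants, and roles. The following minimal Python sketch assumes that events have already been extracted (the extraction steps are not shown) and links each participant to the events it takes part in. The `Event` class, the role labels, and the example sentences are hypothetical. Keeping the source sentence on each event is what would let a reader step from the aggregated graph (distant reading) back to the text (close reading).

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Event:
    event_id: str
    trigger: str   # word in the text that evokes the event
    sentence: str  # source sentence, kept for close reading
    roles: Dict[str, str] = field(default_factory=dict)  # role label -> participant


def build_participant_graph(events: List[Event]) -> Dict[str, List[Tuple[str, str]]]:
    """Bipartite participant-event graph: participant -> [(event_id, role), ...]."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for event in events:
        for role, participant in event.roles.items():
            graph[participant].append((event.event_id, role))
    return dict(graph)


events = [
    Event("e1", "attacked", "The battalion attacked the ridge at dawn.",
          {"Attacker": "the battalion", "Target": "the ridge"}),
    Event("e2", "retreated", "At noon the battalion retreated.",
          {"Theme": "the battalion"}),
]
graph = build_participant_graph(events)
by_id = {e.event_id: e for e in events}

# Distant reading: every event a participant takes part in, with its role.
for event_id, role in graph["the battalion"]:
    # Close reading: jump back to the sentence the event came from.
    print(role, "->", by_id[event_id].sentence)
```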

    Towards Better Understanding Researcher Strategies in Cross-Lingual Event Analytics

    With an increasing amount of information on globally important events, there is a growing demand for efficient analytics of multilingual, event-centric information. Such analytics is particularly challenging due to the large volume of content, the dynamics of events, and the language barrier. Although memory institutions increasingly collect event-centric Web content in different languages, very little is known about the strategies of researchers who analyze such content. In this paper, we present researchers' strategies for content, method, and feature selection in cross-lingual event-centric analytics, as observed in two case studies on multilingual Wikipedia. We discuss the factors influencing these strategies, the findings enabled by the adopted methods, and their current limitations, and we provide recommendations for services that support researchers in cross-lingual event-centric analytics.
    Comment: In Proceedings of the International Conference on Theory and Practice of Digital Libraries 201

    Online event-based conservation documentation: A case study from the IIC website

    A wealth of conservation-related resources is published online on institutional and personal websites. Searching across these websites would be valuable, but it is currently impossible because the published data do not conform to any universal standard. This paper begins by reviewing the types of classification used for conservation content on several conservation websites. It then analyzes these classifications and identifies some of their limitations, which stem from the lack of a conceptual basis for the classification terms used. Next, the paper draws parallels with similar problems in other professional fields, investigates the technologies used to resolve them, and describes solutions developed in computer science and knowledge organization. It continues with a survey of two important cultural heritage resources, the ICOM-CIDOC-CRM and the Getty vocabularies, and explains how they can be combined in conservation documentation to support a common publication framework across different resources. A case study of the proposed implementation is then presented, based on recent work on the IIC website. The paper concludes with a summary of the benefits of the recommended approach. An appendix lists a selection of classification terms with reasonable coverage of conservation content.