17 research outputs found

    Diachronic Variation of Temporal Expressions in Scientific Writing Through the Lens of Relative Entropy

    The abundance of temporal information in documents has led to increased interest within the NLP community in processing such information through temporal expressions. Besides domain adaptation, knowing how temporal expressions vary over time is relevant for improving automatic processing. So far, frequency-based accounts dominate the investigation of specific temporal expressions. We present an approach to investigating diachronic changes in temporal expressions based on relative entropy, which has the advantage of using conditioned probabilities rather than mere frequency. While we focus on scientific writing, our approach generalizes to other domains and is of interest not only in NLP but also in the humanities. This work is partially funded by Deutsche Forschungsgemeinschaft (DFG) under grant SFB 1102: Information Density and Linguistic Encoding (www.sfb1102.uni-saarland.de).
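
    For readers unfamiliar with relative entropy (Kullback-Leibler divergence), the following is a minimal generic sketch and not the authors' implementation. It compares smoothed distributions of temporal expressions from two periods. The toy data, the additive smoothing, and the function names are assumptions made for illustration; the abstract's conditioned probabilities are not modelled here.

```python
import math
from collections import Counter


def distribution(tokens, vocab, alpha=1.0):
    """Additively smoothed probability of each vocabulary item in tokens."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def relative_entropy(p, q):
    """D(P || Q) in bits: sum over x of P(x) * log2(P(x) / Q(x))."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


# Hypothetical toy data: temporal expressions sampled from two time periods.
period_a = ["anno 1700", "last year", "last year", "of late"]
period_b = ["in 2010", "last year", "recently", "recently"]
vocab = sorted(set(period_a) | set(period_b))

p = distribution(period_a, vocab)
q = distribution(period_b, vocab)

# Relative entropy is asymmetric, so both directions are computed.
d_ab = relative_entropy(p, q)
d_ba = relative_entropy(q, p)
print(f"D(A||B) = {d_ab:.3f} bits, D(B||A) = {d_ba:.3f} bits")
```

    Smoothing keeps every probability in Q non-zero, so the logarithm is always defined even when an expression occurs in only one period.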

    Real-time timeline summarisation for high-impact events in Twitter.

    Twitter has become a valuable source of event-related information, namely breaking news and local event reports. Because it transmits information in real time, Twitter is increasingly exploited for timeline summarisation of high-impact events such as protests, accidents, natural disasters or disease outbreaks. Such summaries can serve as important event digests when users urgently need information, especially if they are directly affected by the events. In this paper, we study the problem of timeline summarisation for high-impact events, where summaries must be generated in real time. Our proposed approach has four stages: classification of tweets reporting real-world events, online incremental clustering, post-processing, and sub-event summarisation. We conduct a comprehensive evaluation of the different stages on the “Ebola outbreak” tweet stream and compare our approach with several baselines to demonstrate its effectiveness. Our approach can replace a manually generated timeline and provide early alarms for disaster surveillance.
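
    The online incremental clustering stage named in the abstract can be illustrated with a generic single-pass sketch. This is not the paper's method: the Jaccard similarity over word sets, the 0.3 threshold, the function names, and the sample tweets are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def incremental_cluster(tweets, threshold=0.3):
    """Assign each incoming tweet to the most similar cluster, or open a new one."""
    clusters = []  # each cluster: {"terms": set of words, "tweets": list of texts}
    for text in tweets:
        terms = set(text.lower().split())
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(terms, cluster["terms"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["tweets"].append(text)
            best["terms"] |= terms  # grow the cluster vocabulary
        else:
            clusters.append({"terms": set(terms), "tweets": [text]})
    return clusters


# Hypothetical stream of already-classified event tweets.
stream = [
    "ebola outbreak reported in guinea",
    "new ebola outbreak in guinea confirmed",
    "vaccine trial starts next month",
]
clusters = incremental_cluster(stream)
for i, cluster in enumerate(clusters):
    print(i, cluster["tweets"])
```

    Each tweet is processed once as it arrives, which is what makes this style of clustering suitable for a stream; a real system would also need to expire stale clusters.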

    Learning to Predict a Time-aware Ranking Method

    No full text

    Insights into Entity Name Evolution on Wikipedia

    No full text

    An event extraction model based on timeline and user analysis in Latent Dirichlet allocation

    No full text

    Temporal information retrieval

    No full text
    Temporal dynamics and their impact on various components of information retrieval (IR) systems have received a large share of attention in the last decade. In particular, the study of relevance in information retrieval can now be framed within the so-called temporal IR approaches, which explain how user behavior, document content and scale vary with time, and how these variations can be used to improve retrieval effectiveness. This survey provides a comprehensive overview of temporal IR approaches, centered on the following questions: what are temporal dynamics, why do they occur, and when and how should temporal information be leveraged throughout the search cycle and architecture? We first explain the general aspects of temporal dynamics by focusing on the web domain, from content and structural changes to variations in user behavior and interactions. Next, we pinpoint several research issues and the impact of such temporal characteristics on search, essentially regarding the processing of dynamic content, temporal query analysis and time-aware ranking. We also address particular aspects of temporal information extraction (for instance, how to timestamp documents and generate temporal profiles of text). Finally, we present existing temporal search engines and applications in related research areas, e.g., exploration, summarization, and clustering of search results, as well as future event retrieval and prediction, where the time dimension also plays an important role.

    Retrieving Time from Scanned Books

    No full text
    While millions of scanned books have become available in recent years, this vast collection of data remains under-utilized. Book search is often limited to summaries or metadata, and connecting information to primary sources can be a challenge. Even though digital books provide rich historical information on all subjects, leveraging this data is difficult. To explore how we can access this historical information, we study the problem of identifying relevant times for a given query. That is, given a user query or a description of an event, we attempt to use historical sources to locate that event in time. We use state-of-the-art NLP tools to identify and extract mentions of times present in our corpus, and then propose a number of models for organizing this historical information. Since no ground-truth data is readily available for our task, we automatically derive dated event descriptions from Wikipedia, leveraging both the wisdom of the crowd and the wisdom of experts. Using 15,000 events from between the years 1000 and 1925 as queries, we evaluate our approach on a collection of 50,000 books from the Internet Archive. We discuss the tradeoffs between context, retrieval performance, and efficiency.

    GTE-Rank

    No full text

    Leveraging Dynamic Query Subtopics for Time-Aware Search Result Diversification

    No full text

    Determining Time of Queries for Re-ranking Search Results

    No full text