25 research outputs found

    Topic Retrospection with Storyline-based Summarization on News Reports

    Get PDF
    Electronic newspapers have become a main source for online news readers. When facing the numerous stories of a series of events, readers need support to review a topic efficiently. Beyond identifying events and presenting search results with news titles and keywords, as Topic Detection and Tracking (TDT) does, general news readers need a summarized text that presents event evolution in order to review the events under a news topic. This paper proposes a topic retrospection process and implements the SToRe system, which identifies the various events under a news topic and composes a summary from which news readers can get a sketch of the event evolution in the topic. It consists of three main functions: event identification, main storyline construction, and storyline-based summarization. The constructed main storyline removes irrelevant events and presents a main theme. The summarization extracts representative sentences and takes the main theme as the template for composing the summary. The summary not only provides enough information to comprehend the development of a topic, but also serves as an index to help readers find more detailed information. A lab experiment was conducted to evaluate the SToRe system in a question-and-answer (Q&A) setting. The experimental results show that the SToRe system enables news readers to capture the evolution of a news topic effectively and efficiently.
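    The pipeline described above (identify events, keep the on-theme ones, extract a representative sentence per event) can be sketched minimally as follows. This is not the SToRe implementation: here publication dates stand in for event identification, and theme-term overlap stands in for its sentence scoring; both are illustrative assumptions.

```python
from collections import defaultdict

def storyline_summary(reports, theme_terms, top_k=1):
    """Minimal extractive sketch of a storyline-based summary.

    `reports` is a list of (date, sentence) pairs; sentences sharing a
    date stand in for one "event". Events with no theme-term overlap are
    dropped, mirroring the removal of irrelevant events from the storyline.
    """
    theme = {t.lower() for t in theme_terms}
    events = defaultdict(list)
    for date, sentence in reports:
        events[date].append(sentence)

    summary = []
    for date in sorted(events):
        # score each sentence by its overlap with the theme terms
        scored = sorted(
            events[date],
            key=lambda s: len(theme & set(s.lower().split())),
            reverse=True,
        )
        if theme & set(scored[0].lower().split()):  # drop off-theme events
            summary.extend(scored[:top_k])
    return " ".join(summary)
```

    The chronological ordering of extracted sentences is what makes the result read as a storyline rather than a bag of highlights.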

    Individualized Storyline-based News Topic Retrospection

    Get PDF
    It takes great effort for common news readers to track events promptly, let alone to recall them precisely long after they occurred. Although topic detection and tracking techniques have been developed to promptly identify and track similar events in a topic and monitor their progress, the cognitive load of digesting these reports remains with the reader. A storyline-based summarization can help readers recall the events in a topic by extracting informative sentences from news reports to compose a concise summary of the essential episodes. This paper proposes SToRe (Storyline-based Topic Retrospection), which identifies events from news reports and composes a storyline summary that portrays the event evolution in a topic. It consists of three main functions: event identification, main storyline construction, and storyline-based summarization. The main storyline guides the extraction of representative sentences from news articles to summarize the events that occurred. This study demonstrates that different topic term sets result in different storylines and, in turn, different summaries. This adaptation lets users review past news topics through different storylines.

    A Survey on Event-based News Narrative Extraction

    Full text link
    Narratives are fundamental to our understanding of the world, providing us with a natural structure for knowledge representation over time. Computational narrative extraction is a subfield of artificial intelligence that makes heavy use of information retrieval and natural language processing techniques. Despite the importance of computational narrative extraction, relatively little scholarly work exists on synthesizing previous research and strategizing future research in the area. In particular, this article focuses on extracting news narratives from an event-centric perspective. Extracting narratives from news data has multiple applications in understanding the evolving information landscape. This survey presents an extensive study of research in the area of event-based news narrative extraction. In particular, we screened over 900 articles that yielded 54 relevant articles. These articles are synthesized and organized by representation model, extraction criteria, and evaluation approaches. Based on the reviewed studies, we identify recent trends, open challenges, and potential research lines. Comment: 37 pages, 3 figures, to be published in the journal ACM CSU

    Data mining techniques for complex application domains

    Get PDF
    The emergence of advanced communication techniques has increased the availability of large collections of data in electronic form in a number of application domains, including healthcare, e-business, and e-learning. Every day a large number of records are stored electronically. However, finding useful information in such a large data collection is a challenging issue. Data mining technology aims to automatically extract hidden knowledge from large data repositories by exploiting sophisticated algorithms. The hidden knowledge in electronic data can potentially be utilized to improve the procedures, productivity, and reliability of several application domains. The PhD activity has been focused on novel and effective data mining approaches to tackle the complex data coming from two main application domains: healthcare data analysis and textual data analysis. The research activity, in the context of healthcare data, addressed the application of different data mining techniques to discover valuable knowledge from real exam-log data of patients. In particular, efforts have been devoted to the extraction of medical pathways, which can be exploited to analyze the actual treatments followed by patients. The derived knowledge not only provides useful information on treatment procedures but may also play an important role in future predictions of potential patient risks associated with medical treatments. The research effort in textual data analysis is twofold. On the one hand, a novel approach to the discovery of succinct summaries of large document collections has been proposed. On the other hand, the suitability of an established descriptive data mining technique to support domain experts in making decisions has been investigated.
Both research activities focus on applying widely used exploratory data mining techniques to textual data analysis, which requires overcoming the intrinsic limitations of traditional algorithms in handling textual documents efficiently and effectively.

    Extracting Causal Relations between News Topics from Distributed Sources

    Get PDF
    The overwhelming amount of online news presents a challenge known as news information overload. To mitigate this challenge we propose a system that generates a causal network of news topics. To extract this information from distributed news sources, a system called Forest was developed. Forest retrieves documents that potentially contain causal information regarding a news topic. The documents are processed at the sentence level to extract causal relations and news topic references. Forest uses a machine learning approach to classify causal sentences, and then extracts the potential cause and effect of each sentence. The potential cause and effect are then classified as news topic references, the phrases used to refer to a news topic, such as “The World Cup” or “The Financial Meltdown”. Both classifiers use an algorithm developed within our working group, which performs better than several well-known classification algorithms on the aforementioned tasks. In our evaluations we found that participants consider causal information useful for understanding the news, and that while we cannot extract causal information for all news topics, it is highly likely that we can extract causal relations for the most popular ones. To evaluate the accuracy of the extractions made by Forest, we completed a user survey. We found that by providing the top-ranked results, we obtained high accuracy in extracting causal relations between news topics.
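    The cause/effect extraction step can be illustrated with a simplified sketch. Forest itself uses a learned classifier; the cue-pattern approach and the specific cue list below are stand-in assumptions, not the system's actual method.

```python
import re

# Illustrative causal cue patterns; a real system would learn these signals.
CAUSAL_CUES = [
    r"(?P<cause>.+?)\s+(?:caused|led to|resulted in|triggered)\s+(?P<effect>.+)",
    r"(?P<effect>.+?)\s+(?:because of|due to|as a result of)\s+(?P<cause>.+)",
]

def extract_causal_pair(sentence):
    """Return a (cause, effect) phrase pair if the sentence matches a
    causal cue, otherwise None. The extracted phrases would then be
    checked against known news topic references."""
    for pattern in CAUSAL_CUES:
        m = re.match(pattern, sentence.strip().rstrip("."))
        if m:
            return m.group("cause"), m.group("effect")
    return None
```

    Linking the extracted phrases to topic references like “The Financial Meltdown” is what turns isolated sentence-level pairs into edges of a causal network between topics.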

    Towards Personalized and Human-in-the-Loop Document Summarization

    Full text link
    The ubiquitous availability of computing devices and the widespread use of the internet continuously generate a large amount of data. As a result, the amount of available information on any given topic is far beyond humans' capacity to process, causing what is known as information overload. To cope efficiently with large amounts of information and generate content of significant value to users, we need to identify, merge, and summarise information. Data summaries can gather related information into a shorter format that enables answering complicated questions, gaining new insight, and discovering conceptual boundaries. This thesis focuses on three main challenges in alleviating information overload through novel summarisation techniques. It further intends to facilitate the analysis of documents to support personalised information extraction. The thesis separates the research issues into four areas, covering (i) feature engineering in document summarisation, (ii) traditional static and inflexible summaries, (iii) traditional generic summarisation approaches, and (iv) the need for reference summaries. We propose novel approaches to tackle these challenges by: (i) enabling automatic intelligent feature engineering, (ii) enabling flexible and interactive summarisation, and (iii) utilising intelligent and personalised summarisation approaches. The experimental results prove the efficiency of the proposed approaches compared to other state-of-the-art models. We further propose solutions to the information overload problem in different domains through summarisation, covering network traffic data, health data, and business process data. Comment: PhD thesis

    Facilitating Reading through a Theme-Driven Approach

    Get PDF
    Readers often need to explore a document only for a specific point of interest. We call this phenomenon of approaching a narrative not for its entirety, but for a thread on a particular topic, thematic reading. Present reading tools and information retrieval techniques provide only limited assistance to readers in such a situation. Our research centers on this phenomenon. We investigated both human behavior and machine automation, with the goal of better meeting the requirements of thematic reading. To observe readers' behavior and understand their expectations, we implemented a reader's interface with designs targeting the predicted needs of thematic readers. We conducted user studies using both this system and Microsoft Word. We showed that thematic reading can achieve the goal of understanding a specific topic, at least to a degree that suffices for topic-wise tasks. We also derived guidelines for designing future reading platforms in major aspects such as view, navigation, and contextual awareness. As for machine automation, we investigated the potential to automatically locate thematically relevant excerpts, an investigation inspired by the editorial compilation of a textbook index. To increase search performance, we proposed a two-step methodology that first expands the query and then filters the intermediate results by checking term-occurrence proximity. For query expansion, we compared expansion with WordNet, with morphological inflections, and with both together. Our results show that, in the context of our study, WordNet made almost no contribution to the enhancement of recall, while expansion with inflectional variants turned out to be a successful and essential scheme.
For the refinement step, the results show that a proximity check on the alternative phrases formed after inflectional expansion can effectively increase the precision of the previously acquired results. We further tested a different scheme, using sliding windows, for defining target and verification units in the methodology. Our findings show that structural delimitations (sentences and chapters) outperformed sliding windows: the first scheme achieved consistently desirable results, while the results from the second were inconclusive.
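    The two-step methodology (inflectional expansion, then a proximity check) can be sketched as follows. The toy `inflect` rules and the window size are illustrative assumptions; the study used a proper morphological analyzer and its own unit definitions.

```python
import re
from itertools import product

def inflect(term):
    """Naive inflectional expansion (a stand-in for real morphology)."""
    variants = {term}
    variants.add(term[:-1] + "ies" if term.endswith("y") else term + "s")
    variants.add(term + "d" if term.endswith("e") else term + "ed")
    variants.add((term[:-1] if term.endswith("e") else term) + "ing")
    return variants

def proximity_match(sentence, query_terms, window=8):
    """Step 1: require every query term (or an inflected variant) to occur.
    Step 2: require one match per term to fall within `window` tokens of
    the others, filtering out accidental co-occurrences."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    positions = []
    for term in query_terms:
        hits = [i for i, tok in enumerate(tokens) if tok in inflect(term)]
        if not hits:
            return False
        positions.append(hits)
    return any(max(combo) - min(combo) < window for combo in product(*positions))
```

    Using a sentence as the verification unit, as here, corresponds to the structural-delimitation scheme that the study found to outperform sliding windows.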

    Modeling the Discourse Structure of Lyrics (歌詞の談話構造のモデル化)

    Get PDF
    Tohoku University, Kentaro Inui