    Social Network Extraction and Exploration of Historic Correspondences

    Historic correspondences, in the form of letters, provide a scenario in which historic figures and events are reflected and thus play a ubiquitous role in the study of history. Confronted with the digitization of thousands of historic letters and motivated by the potentially valuable insights into history and intuitive quantitative relations between historic persons, researchers have recently focused on the network analysis of historic correspondences. However, most related research constructs the correspondence networks only based on the sender-recipient relation with the objective of visualization. Very few of them have proceeded beyond the above stage to exploit the detailed modeling of correspondence networks, let alone to develop novel concepts and algorithms derived from network analysis or formal approaches to the data uncertainty issue in historic correspondence. In the context of this dissertation, we develop a comprehensive correspondence network model, which integrates the personal, temporal, geographical, and topic information extracted from letter metadata and letter content into a hypergraph structure. Based on our correspondence network model, we analyze three types of person-person relations (sender-recipient, co-sender, and co-recipient) and two types of person-topic relations (author-topic and sender-recipient-topic) statically and dynamically. We develop multiple measurements, such as local and global reciprocity for quantifying reciprocal behavior in weighted networks, and the topic participation score for quantifying interests or the focus of individuals or real-life communities. We investigate the rising and the fading trends of topics in order to find correlations among persons, topics, and historic events. Furthermore, we develop a novel probabilistic framework for refinement of uncertain person names, geographical location names, and temporal expressions in the metadata of historic letters. 
We conduct extensive experiments using letter collections to validate and evaluate the models and measurements proposed in this dissertation. A thorough discussion of the experimental results shows the effectiveness, applicability, and advantages of our models and approaches.

    Implicit Entity Networks: A Versatile Document Model

    The time in which we live is often referred to as the Information Age. However, it can also aptly be characterized as an age of constant information overload. Nowhere is this more present than on the Web, which serves as an endless source of news articles, blog posts, and social media messages. Of course, this overload is even greater in professions that handle the creation or extraction of information and knowledge, such as journalists, lawyers, researchers, clerks, or medical professionals. The volume of available documents and the interconnectedness of their contents are both a blessing and a curse for the contemporary information consumer. On the one hand, they provide near limitless information, but on the other hand, their consumption and comprehension require an amount of time that many of us cannot spare. As a result, automated extraction, aggregation, and summarization techniques have risen in popularity, even though they are a long way from being comprehensive. When we, as humans, are faced with an overload of information, we tend to look for patterns that bring order into the chaos. In news, we might identify familiar political figures or celebrities, whereas we might look for expressive symptoms in medicine, or precedential cases in law. In other words, we look for known entities as reference points, and then explore the content along the lines of their relations to other entities. Unfortunately, this approach is not reflected in current document models, which do not provide a similar focus on entities. As a direct result, the retrieval of entity-centric knowledge and relations from a flood of textual information becomes more difficult than it has to be, and the inclusion of external knowledge sources is impeded. In this thesis, we introduce implicit entity networks as a comprehensive document model that addresses this shortcoming and provides a holistic representation of document collections and document streams.
Based on the premise of modelling the co-occurrence relations between terms and entities as first-class citizens, we investigate how the resulting network structure facilitates efficient and effective entity-centric search, and demonstrate the extraction of complex entity relations, as well as their summarization. We show that the implicit network model is fully compatible with dynamic streams of documents. Furthermore, we introduce document aggregation methods that are sensitive to the context of entity mentions and can be used to distinguish between different entity relations. Beyond the relations of individual entities, we introduce network topics as a novel and scalable method for the extraction of topics from collections and streams of documents. Finally, we combine the insights gained from these applications in a versatile hypergraph document model that bridges the gap between unstructured text and structured knowledge sources.