324 research outputs found

    Explicit diversification of event aspects for temporal summarization

    Get PDF
    During major events, such as emergencies and disasters, a large volume of information is reported on newswire and social media platforms. Temporal summarization (TS) approaches are used to automatically produce concise overviews of such events by extracting text snippets from related articles over time. Current TS approaches rely on a combination of event relevance and textual novelty for snippet selection. However, for events that span multiple days, textual novelty is often a poor criterion for selecting snippets, since many snippets are textually unique but are semantically redundant or non-informative. In this article, we propose a framework for the diversification of snippets using explicit event aspects, building on recent works in search result diversification. In particular, we first propose two techniques to identify explicit aspects that a user might want to see covered in a summary for different types of event. We then extend a state-of-the-art explicit diversification framework to maximize the coverage of these aspects when selecting summary snippets for unseen events. Through experimentation over the TREC TS 2013, 2014, and 2015 datasets, we show that explicit diversification for temporal summarization significantly outperforms classical novelty-based diversification, as the use of explicit event aspects reduces the amount of redundant and off-topic snippets returned, while also increasing summary timeliness

    Creating and validating a scholarly knowledge graph using natural language processing and microtask crowdsourcing

    Get PDF
    Due to the growing number of scholarly publications, finding relevant articles becomes increasingly difficult. Scholarly knowledge graphs can be used to organize the scholarly knowledge presented within those publications and represent them in machine-readable formats. Natural language processing (NLP) provides scalable methods to automatically extract knowledge from articles and populate scholarly knowledge graphs. However, NLP extraction is generally not sufficiently accurate and, thus, fails to generate high granularity quality data. In this work, we present TinyGenius, a methodology to validate NLP-extracted scholarly knowledge statements using microtasks performed with crowdsourcing. TinyGenius is employed to populate a paper-centric knowledge graph, using five distinct NLP methods. We extend our previous work of the TinyGenius methodology in various ways. Specifically, we discuss the NLP tasks in more detail and include an explanation of the data model. Moreover, we present a user evaluation where participants validate the generated NLP statements. The results indicate that employing microtasks for statement validation is a promising approach despite the varying participant agreement for different microtasks

    Symbiosis between the TRECVid benchmark and video libraries at the Netherlands Institute for Sound and Vision

    Get PDF
    Audiovisual archives are investing in large-scale digitisation efforts of their analogue holdings and, in parallel, ingesting an ever-increasing amount of born- digital files in their digital storage facilities. Digitisation opens up new access paradigms and boosted re-use of audiovisual content. Query-log analyses show the shortcomings of manual annotation, therefore archives are complementing these annotations by developing novel search engines that automatically extract information from both audio and the visual tracks. Over the past few years, the TRECVid benchmark has developed a novel relationship with the Netherlands Institute of Sound and Vision (NISV) which goes beyond the NISV just providing data and use cases to TRECVid. Prototype and demonstrator systems developed as part of TRECVid are set to become a key driver in improving the quality of search engines at the NISV and will ultimately help other audiovisual archives to offer more efficient and more fine-grained access to their collections. This paper reports the experiences of NISV in leveraging the activities of the TRECVid benchmark

    Thinking outside the graph: scholarly knowledge graph construction leveraging natural language processing

    Get PDF
    Despite improved digital access to scholarly knowledge in recent decades, scholarly communication remains exclusively document-based. The document-oriented workflows in science publication have reached the limits of adequacy as highlighted by recent discussions on the increasing proliferation of scientific literature, the deficiency of peer-review and the reproducibility crisis. In this form, scientific knowledge remains locked in representations that are inadequate for machine processing. As long as scholarly communication remains in this form, we cannot take advantage of all the advancements taking place in machine learning and natural language processing techniques. Such techniques would facilitate the transformation from pure text based into (semi-)structured semantic descriptions that are interlinked in a collection of big federated graphs. We are in dire need for a new age of semantically enabled infrastructure adept at storing, manipulating, and querying scholarly knowledge. Equally important is a suite of machine assistance tools designed to populate, curate, and explore the resulting scholarly knowledge graph. In this thesis, we address the issue of constructing a scholarly knowledge graph using natural language processing techniques. First, we tackle the issue of developing a scholarly knowledge graph for structured scholarly communication, that can be populated and constructed automatically. We co-design and co-implement the Open Research Knowledge Graph (ORKG), an infrastructure capable of modeling, storing, and automatically curating scholarly communications. Then, we propose a method to automatically extract information into knowledge graphs. With Plumber, we create a framework to dynamically compose open information extraction pipelines based on the input text. Such pipelines are composed from community-created information extraction components in an effort to consolidate individual research contributions under one umbrella. We further present MORTY as a more targeted approach that leverages automatic text summarization to create from the scholarly article's text structured summaries containing all required information. In contrast to the pipeline approach, MORTY only extracts the information it is instructed to, making it a more valuable tool for various curation and contribution use cases. Moreover, we study the problem of knowledge graph completion. exBERT is able to perform knowledge graph completion tasks such as relation and entity prediction tasks on scholarly knowledge graphs by means of textual triple classification. Lastly, we use the structured descriptions collected from manual and automated sources alike with a question answering approach that builds on the machine-actionable descriptions in the ORKG. We propose JarvisQA, a question answering interface operating on tabular views of scholarly knowledge graphs i.e., ORKG comparisons. JarvisQA is able to answer a variety of natural language questions, and retrieve complex answers on pre-selected sub-graphs. These contributions are key in the broader agenda of studying the feasibility of natural language processing methods on scholarly knowledge graphs, and lays the foundation of which methods can be used on which cases. Our work indicates what are the challenges and issues with automatically constructing scholarly knowledge graphs, and opens up future research directions

    From Compute to Data: Across-the-Stack System Design for Intelligent Applications

    Full text link
    Intelligent applications such as Apple Siri, Google Assistant and Amazon Alexa have gained tremendous popularity in recent years. With human-like understanding capabilities and natural language interface, this class of applications is quickly becoming peopleā€™s preferred way of interacting with their mobile, wearable and smart home devices. There have been considerable advancement in machine learning research that aim to further enhance the understanding capability of intelligent applications, however there exist significant roadblocks in applying state-of-the-art algorithms and techniques to a real-world use case. First, as machine learning algorithms becomes more sophisticated, it imposes higher computation requirements for the underlying software and hardware system to process intelligent application request efficiently. Second, state-of-the-art algorithms and techniques is not guaranteed to provide the same level of prediction and classification accuracy when applied to tasks required in real-world intelligent applications, which are often different and more complex than what are studied in a research environment. This dissertation addresses these roadblocks by investigating the key challenges across multiple components in an intelligent application system. Specifically, we identify the key compute and data challenges and presents system design and techniques. To improve the computational performance of the hardware and software system, we challenge the status-quo approach of cloud-only intelligent application processing and propose computation partitioning strategies that effectively leverage both the cycles in the cloud and on the mobile device to achieve low latency, low energy consumption and high datacenter throughput. We characterize and taxonomize state-of-the- art deep learning based natural language processing (NLP) applications to identify the algorithmic design elements and computational patterns that render conventional GPU acceleration techniques ineffective on this class of applications. Leveraging their unique characteristics, we design and implement a novel fine-grain cross-input batching techniques for providing GPU acceleration to a number of state-of-the-art NLP applications. For the data component, large scale and effective training data, in addition to algorithm, is necessary to achieve high prediction accuracy. We investigate the challenge of effective large-scale training data collection via crowdsourcing. We propose novel metrics to evaluate the quality of training data for building real-word intelligent application systems. We leverage this methodology to study the trade-off of multiple crowdsourcing methods and provide recommendations on best training data crowdsourcing practices.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/145886/1/ypkang_1.pd

    Modeling Crowd Feedback in the Mobile App Market

    Get PDF
    Mobile application (app) stores, such as Google Play and the Apple App Store, have recently emerged as a new model of online distribution platform. These stores have expanded in size in the past five years to host millions of apps, offering end-users of mobile software virtually unlimited options to choose from. In such a competitive market, no app is too big to fail. In fact, recent evidence has shown that most apps lose their users within the first 90 days after initial release. Therefore, app developers have to remain up-to-date with their end-usersā€™ needs in order to survive. Staying close to the user not only minimizes the risk of failure, but also serves as a key factor in achieving market competitiveness as well as managing and sustaining innovation. However, establishing effective communication channels with app users can be a very challenging and demanding process. Specifically, users\u27 needs are often tacit, embedded in the complex interplay between the user, system, and market components of the mobile app ecosystem. Furthermore, such needs are scattered over multiple channels of feedback, such as app store reviews and social media platforms. To address these challenges, in this dissertation, we incorporate methods of requirements modeling, data mining, domain engineering, and market analysis to develop a novel set of algorithms and tools for automatically classifying, synthesizing, and modeling the crowd\u27s feedback in the mobile app market. Our analysis includes a set of empirical investigations and case studies, utilizing multiple large-scale datasets of mobile user data, in order to devise, calibrate, and validate our algorithms and tools. The main objective is to introduce a new form of crowd-driven software models that can be used by app developers to effectively identify and prioritize their end-users\u27 concerns, develop apps to meet these concerns, and uncover optimized pathways of survival in the mobile app ecosystem

    JURI SAYS:An Automatic Judgement Prediction System for the European Court of Human Rights

    Get PDF
    In this paper we present the web platform JURI SAYS that automatically predicts decisions of the European Court of Human Rights based on communicated cases, which are published by the court early in the proceedings and are often available many years before the final decision is made. Our system therefore predicts future judgements of the court. The platform is available at jurisays.com and shows the predictions compared to the actual decisions of the court. It is automatically updated every month by including the prediction for the new cases. Additionally, the system highlights the sentences and paragraphs that are most important for the prediction (i.e. violation vs. no violation of human rights)
    • ā€¦
    corecore