
    Mining Novellas from PubMed Abstracts using a Storytelling Algorithm

    Motivation: There are now a multitude of articles, published in a diversity of journals, providing information about genes, proteins, pathways, and entire processes. Each article investigates particular subsets of a biological process, but to gain insight into the functioning of a system as a whole, we must computationally integrate information across multiple publications. This is especially important in problems such as modeling cross-talk in signaling networks, designing drug therapies for combinatorial selectivity, and unraveling the role of gene interactions in deleterious phenotypes, where the cost of performing combinatorial screens is exorbitant.

    Results: We present an automated approach to biological knowledge discovery from PubMed abstracts, suitable for unraveling combinatorial relationships. It involves the systematic application of a 'storytelling' algorithm followed by compression of the stories into 'novellas.' Given a start and an end publication, typically with little or no overlap in content, storytelling identifies a chain of intermediate publications from one to the other, such that neighboring publications have significant content similarity. Stories discovered in this way provide an argued approach to relating distant concepts through compositions of related concepts. The chains of links employed by stories are then mined to find frequently reused sub-stories, which can be compressed to yield novellas, or compact templates of connections. We demonstrate a successful application of storytelling and novella finding to modeling combinatorial relationships between the introduction of extracellular factors and downstream cellular events.

    Availability: A story visualizer, suitable for interactive exploration of the stories and novellas described in this paper, is available for demo/download at https://bioinformatics.cs.vt.edu/storytelling
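The storytelling step can be pictured as a search over a document-similarity graph. The sketch below is only an illustration of that idea, not the paper's actual algorithm: it uses a simple Jaccard word-overlap similarity and a hypothetical `find_story` helper doing breadth-first search, whereas the real system presumably uses a richer content-similarity measure.

```python
from collections import deque

def jaccard(a, b):
    """Content similarity between two abstracts: Jaccard overlap of
    word sets (a stand-in for the paper's similarity measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def find_story(docs, start, end, threshold=0.2):
    """Breadth-first search for a chain of documents from `start` to
    `end` in which every neighboring pair has similarity at least
    `threshold`. Returns the chain as a list of doc ids, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for doc in docs:
            if doc not in visited and jaccard(docs[path[-1]], docs[doc]) >= threshold:
                visited.add(doc)
                queue.append(path + [doc])
    return None
```

With three toy abstracts where the start and end share no words but an intermediate abstract overlaps both, `find_story` recovers the two-hop chain, mirroring how a story relates two publications with no direct content overlap.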

    Independent Set, Induced Matching, and Pricing: Connections and Tight (Subexponential Time) Approximation Hardnesses

    We present a series of almost-settled inapproximability results for three fundamental problems. The first in our series is the subexponential-time inapproximability of the maximum independent set problem, a question studied in the area of parameterized complexity. The second is the hardness of approximating the maximum induced matching problem on bounded-degree bipartite graphs. The last in our series is the tight hardness of approximating the k-hypergraph pricing problem, a fundamental problem arising from the area of algorithmic game theory. In particular, assuming the Exponential Time Hypothesis, our two main results are:

    - For any r larger than some constant, any r-approximation algorithm for the maximum independent set problem must run in at least 2^{n^{1-\epsilon}/r^{1+\epsilon}} time. This nearly matches the upper bound of 2^{n/r} (Cygan et al., 2008). It also improves some hardness results in the domain of parameterized complexity (e.g., Escoffier et al., 2012 and Chitnis et al., 2013).
    - For any k larger than some constant, there is no polynomial-time min(k^{1-\epsilon}, n^{1/2-\epsilon})-approximation algorithm for the k-hypergraph pricing problem, where n is the number of vertices in an input graph. This almost matches the upper bound of min(O(k), \tilde O(\sqrt{n})) (by Balcan and Blum, 2007 and an algorithm in this paper).

    We note the interesting fact that, in contrast to the n^{1/2-\epsilon} hardness for polynomial-time algorithms, the k-hypergraph pricing problem admits an n^{\delta} approximation for any \delta > 0 in quasi-polynomial time. This puts the problem in a rare approximability class in which approximability thresholds can be improved significantly by allowing algorithms to run in quasi-polynomial time.

    Comment: The full version of FOCS 201

    Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding

    Narrative understanding involves capturing the author's cognitive processes, providing insights into their knowledge, intentions, beliefs, and desires. Although large language models (LLMs) excel at generating grammatically coherent text, their ability to comprehend the author's thoughts remains uncertain. This limitation hinders the practical applications of narrative understanding. In this paper, we conduct a comprehensive survey of narrative understanding tasks, thoroughly examining their key features, definitions, taxonomy, associated datasets, training objectives, evaluation metrics, and limitations. Furthermore, we explore the potential of expanding the capabilities of modularized LLMs to address novel narrative understanding tasks. By framing narrative understanding as the retrieval of the author's imaginative cues that outline the narrative structure, our study introduces a fresh perspective on enhancing narrative comprehension.

    Efficient Mining of Heterogeneous Star-Structured Data

    Many of the real-world clustering problems arising in data mining applications are heterogeneous in nature. Heterogeneous co-clustering involves the simultaneous clustering of objects of two or more data types. While pairwise co-clustering of two data types has been well studied in the literature, research on high-order heterogeneous co-clustering is still limited. In this paper, we propose a graph-theoretical framework for addressing star-structured co-clustering problems in which a central data type is connected to all the other data types. Partitioning this graph leads to a co-clustering of all the data types under the constraints of the star structure. Although graph partitioning approaches have been adopted before to address star-structured heterogeneous problems, the main contribution of this work lies in an efficient algorithm that we propose for partitioning the star-structured graph. Computationally, our algorithm is very fast, as it requires only a simple solution to a sparse system of overdetermined linear equations. Theoretical analysis and extensive experiments performed on toy and real datasets demonstrate the quality, efficiency, and stability of the proposed algorithm.
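The abstract does not spell out the linear system itself, so the following is only a hedged sketch of the general idea: casting a two-way partition of a star-structured graph as a sparse, overdetermined least-squares problem. The function name `star_partition`, the edge encoding, and the seed-anchoring trick are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def star_partition(edges, n_central, n_peripheral, seed_pos, seed_neg):
    """Two-way co-clustering of a star-structured graph, sketched as a
    weighted least-squares labeling problem: each edge (i, j, w) between
    central object i and peripheral object j contributes the equation
    w * (x_i - y_j) = 0, and two seed objects (indices into the
    concatenated [centrals, peripherals] vector) are anchored at +1 and
    -1 with a large weight. The sign of the least-squares solution
    assigns every object to a cluster."""
    n = n_central + n_peripheral
    rows, rhs = [], []
    for i, j, w in edges:
        row = np.zeros(n)
        row[i] = w                   # central object coordinate
        row[n_central + j] = -w      # peripheral object coordinate
        rows.append(row)
        rhs.append(0.0)
    # anchor equations fixing the two seed objects to opposite labels
    for idx, val in ((seed_pos, 1.0), (seed_neg, -1.0)):
        row = np.zeros(n)
        row[idx] = 10.0
        rows.append(row)
        rhs.append(10.0 * val)
    x, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return x >= 0  # boolean cluster labels, centrals first, then peripherals
```

The appeal of this style of formulation, as the abstract notes, is that the system is sparse (one row per edge) and can be solved quickly, in contrast to eigendecomposition-based spectral co-clustering.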

    Identity and Granularity of Events in Text

    In this paper we describe a method to detect event descriptions in different news articles and to model the semantics of events and their components using RDF representations. We compare these descriptions to solve a cross-document event coreference task. Our component approach to event semantics defines identity and granularity of events at different levels. It performs close to state-of-the-art approaches on the cross-document event coreference task, while outperforming other works when assuming similar quality of event detection. We demonstrate how granularity and identity are interconnected and we discuss how semantic anomaly could be used to define differences between coreference, subevent and topical relations.

    Comment: Invited keynote speech by Piek Vossen at Cicling 201
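A component-wise notion of event identity, as described above, can be illustrated with a small sketch. The component names (`action`, `time`, `location`, `participants`) and the all-components-must-be-compatible rule are assumptions for illustration only; the paper's RDF-based model is richer.

```python
def component_match(a, b):
    """Two event components are compatible when either side is
    unspecified (None/missing) or their value sets overlap."""
    if a is None or b is None:
        return True
    return bool(set(a) & set(b))

def events_corefer(e1, e2,
                   components=("action", "time", "location", "participants")):
    """Decide cross-document event coreference component-wise: two event
    descriptions are merged only when every component is compatible.
    A mismatch in, say, time or location then signals a subevent or a
    merely topically related event rather than identity."""
    return all(component_match(e1.get(c), e2.get(c)) for c in components)
```

Under this scheme, tightening or loosening which components must match is exactly what moves the decision between coreference, subevent, and topical relatedness.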

    Visualizing the Motion Flow of Crowds

    In modern cities, dense populations cause problems such as congestion, accidents, violence, and crime. Video surveillance systems such as closed-circuit television cameras are widely used by security guards to monitor human behaviors and activities in order to manage, direct, or protect people. Given the quantity and prolonged duration of the recorded videos, examining these recordings and keeping track of activities and events requires a huge amount of human effort. In recent years, new techniques in the computer vision field have reduced the barrier to entry, allowing developers to experiment more with intelligent surveillance video systems. Unlike previous research, this dissertation does not address algorithm design concerns related to object detection or object tracking. Instead, this study focuses on the technological side, applying data visualization methodologies to build a model for detecting anomalies. It aims to provide an understanding of how to detect the behavior of pedestrians in video and find anomalies or abnormal cases using data visualization techniques.
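As a rough illustration of the raw material a motion-flow visualization consumes, the sketch below aggregates pedestrian trajectories into a coarse grid of mean displacement vectors. The function `flow_field` and its parameters are hypothetical stand-ins; a real system would obtain the tracks from tracking or optical-flow output.

```python
import numpy as np

def flow_field(tracks, width, height, cell=10):
    """Aggregate pedestrian trajectories into a coarse motion-flow field:
    each consecutive pair of track points contributes its displacement
    vector to the grid cell the step starts in; each cell stores the mean
    vector. The resulting field is what a flow visualization (arrows or
    streamlines) would render, and cells whose mean vector opposes their
    neighbors' are natural candidates for anomaly inspection."""
    gw, gh = width // cell, height // cell
    sums = np.zeros((gh, gw, 2))
    counts = np.zeros((gh, gw))
    for track in tracks:
        for (x0, y0), (x1, y1) in zip(track, track[1:]):
            cx, cy = int(x0) // cell, int(y0) // cell
            if 0 <= cx < gw and 0 <= cy < gh:
                sums[cy, cx] += (x1 - x0, y1 - y0)
                counts[cy, cx] += 1
    # mean displacement per cell; empty cells stay at zero
    return np.divide(sums, counts[..., None],
                     out=np.zeros_like(sums), where=counts[..., None] > 0)
```

Rendering this field (e.g., one arrow per cell) turns hours of footage into a single image of the crowd's dominant flows, which is the kind of summary view the dissertation's visualization approach relies on.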

    EcoAdapt Working Paper Series N°2: iModeler manual: a quick guide for fuzzy cognitive modelling

    We introduce Fuzzy Cognitive Modelling (FCM) and provide step-by-step guidance and tips for using iModeler (both qualitative and quantitative approaches), describe the use of FCM in the EcoAdapt Story and Simulation (S&S) approach based on Structured Decision Making, and briefly describe the FCM models being developed in the three study sites. This version corresponds to iModeler version 4 (January 2004).
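Fuzzy cognitive models of the kind built in iModeler are typically simulated by iterating a squashed weighted-sum update over the concept activations. The generic sketch below is not iModeler's actual update rule (the manual's specifics are not given here); the sigmoid squashing and the self-memory term are common FCM conventions adopted as assumptions.

```python
import numpy as np

def sigmoid(z, lam=1.0):
    """Squash summed influence into the (0, 1) activation range."""
    return 1.0 / (1.0 + np.exp(-lam * z))

def run_fcm(weights, state, steps=50, tol=1e-6):
    """Iterate a fuzzy cognitive map: concepts are nodes with activations
    in (0, 1), weights[i, j] in [-1, 1] is the signed influence of concept
    i on concept j, and each step squashes the concept's own value plus
    its summed incoming influence through a sigmoid. Stops early once the
    state change falls below `tol`."""
    state = np.asarray(state, dtype=float)
    for _ in range(steps):
        new_state = sigmoid(state + weights.T @ state)
        if np.max(np.abs(new_state - state)) < tol:
            return new_state
        state = new_state
    return state
```

Running such an iteration from a scenario's initial activations until it settles is the quantitative half of an S&S-style exercise: the converged activations indicate which concepts a given intervention pushes up or down.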