Mining Novellas from PubMed Abstracts using a Storytelling Algorithm
Motivation: There are now a multitude of articles published in a diversity of journals providing information about genes, proteins, pathways, and entire processes. Each article investigates particular subsets of a biological process, but to gain insight into the functioning of a system as a whole, we must computationally integrate information across multiple publications. This is especially important in problems such as modeling cross-talk in signaling networks, designing drug therapies for combinatorial selectivity, and unraveling the role of gene interactions in deleterious phenotypes, where the cost of performing combinatorial screens is exorbitant.
Results: We present an automated approach to biological knowledge discovery from PubMed abstracts, suitable for unraveling combinatorial relationships. It involves the systematic application of a `storytelling' algorithm followed by compression of the stories into `novellas.' Given a start and an end publication, typically with little or no overlap in content, storytelling identifies a chain of intermediate publications from one to the other, such that neighboring publications have significant content similarity. Stories discovered in this way provide a principled means of relating distant concepts through compositions of related concepts. The chains of links employed by stories are then mined to find frequently reused sub-stories, which are compressed to yield novellas, or compact templates of connections. We demonstrate a successful application of storytelling and novella finding to modeling combinatorial relationships between the introduction of extracellular factors and downstream cellular events.
Availability: A story visualizer, suitable for interactive exploration of stories and novellas described in this paper, is available for demo/download at https://bioinformatics.cs.vt.edu/storytelling
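The chaining idea behind storytelling can be sketched in a few lines. The sketch below is a minimal illustration, not the authors' implementation: it assumes a bag-of-words cosine similarity and a hypothetical threshold `theta`, and finds a shortest chain by breadth-first search over the induced similarity graph; the actual system's similarity measure and search strategy may differ.

```python
import math
from collections import Counter, deque

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def storytelling(docs, start, end, theta=0.3):
    """Find a chain of documents from `start` to `end` such that every
    neighboring pair has similarity >= theta (BFS gives a shortest chain)."""
    vecs = {d: Counter(text.lower().split()) for d, text in docs.items()}
    parent = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == end:
            chain = []          # walk parents back to reconstruct the story
            while cur is not None:
                chain.append(cur)
                cur = parent[cur]
            return chain[::-1]
        for nxt in docs:
            if nxt not in parent and cosine(vecs[cur], vecs[nxt]) >= theta:
                parent[nxt] = cur
                queue.append(nxt)
    return None                 # no chain meets the similarity threshold

# toy "abstracts": A and C share no terms, but B bridges them
docs = {
    "A": "growth factor binds receptor",
    "B": "receptor activates kinase cascade",
    "C": "kinase cascade triggers transcription",
}
print(storytelling(docs, "A", "C", theta=0.2))  # → ['A', 'B', 'C']
```

Here A and C have zero similarity, so the story must route through B, mirroring how the algorithm relates distant concepts through intermediate publications.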
Independent Set, Induced Matching, and Pricing: Connections and Tight (Subexponential Time) Approximation Hardnesses
We present a series of almost settled inapproximability results for three
fundamental problems. The first in our series is the subexponential-time
inapproximability of the maximum independent set problem, a question studied in
the area of parameterized complexity. The second is the hardness of
approximating the maximum induced matching problem on bounded-degree bipartite
graphs. The last in our series is the tight hardness of approximating the
k-hypergraph pricing problem, a fundamental problem arising from the area of
algorithmic game theory. In particular, assuming the Exponential Time
Hypothesis, our two main results are:
- For any r larger than some constant, any r-approximation algorithm for the
maximum independent set problem must run in at least
2^{n^{1-\epsilon}/r^{1+\epsilon}} time. This nearly matches the upper bound of
2^{n/r} (Cygan et al., 2008). It also improves some hardness results in
the domain of parameterized complexity (e.g., Escoffier et al., 2012 and
Chitnis et al., 2013).
- For any k larger than some constant, there is no polynomial time min
(k^{1-\epsilon}, n^{1/2-\epsilon})-approximation algorithm for the k-hypergraph
pricing problem, where n is the number of vertices in an input graph. This
almost matches the upper bound of min (O(k), \tilde O(\sqrt{n})) (by Balcan and
Blum, 2007 and an algorithm in this paper).
We note an interesting fact: in contrast to the n^{1/2-\epsilon} hardness
for polynomial-time algorithms, the k-hypergraph pricing problem admits an
n^{\delta}-approximation for any \delta > 0 in quasi-polynomial time. This
places the problem in a rare approximability class in which the
approximability threshold improves significantly when algorithms are allowed
to run in quasi-polynomial time.
Comment: The full version of FOCS 201
Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding
Narrative understanding involves capturing the author's cognitive processes,
providing insights into their knowledge, intentions, beliefs, and desires.
Although large language models (LLMs) excel in generating grammatically
coherent text, their ability to comprehend the author's thoughts remains
uncertain. This limitation hinders the practical applications of narrative
understanding. In this paper, we conduct a comprehensive survey of narrative
understanding tasks, thoroughly examining their key features, definitions,
taxonomy, associated datasets, training objectives, evaluation metrics, and
limitations. Furthermore, we explore the potential of expanding the
capabilities of modularized LLMs to address novel narrative understanding
tasks. By framing narrative understanding as the retrieval of the author's
imaginative cues that outline the narrative structure, our study introduces a
fresh perspective on enhancing narrative comprehension.
Efficient Mining of Heterogeneous Star-Structured Data
Many of the real-world clustering problems arising in data mining applications are heterogeneous in nature. Heterogeneous co-clustering involves simultaneous clustering of objects of two or more data types. While pairwise co-clustering of two data types has been well studied in the literature, research on high-order heterogeneous co-clustering is still limited. In this paper, we propose a graph-theoretical framework for addressing star-structured co-clustering problems in which a central data type is connected to all the other data types. Partitioning this graph leads to co-clustering of all the data types under the constraints of the star structure. Although graph partitioning approaches have been adopted before to address star-structured heterogeneous problems, the main contribution of this work lies in an efficient algorithm that we propose for partitioning the star-structured graph. Computationally, our algorithm is very fast, as it requires only a simple solution to a sparse system of overdetermined linear equations. Theoretical analysis and extensive experiments performed on toy and real datasets demonstrate the quality, efficiency and stability of the proposed algorithm.
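The abstract does not spell out the linear system, so the sketch below only illustrates the generic computational core it names: solving a small overdetermined system in the least-squares sense via the normal equations A^T A x = A^T b. All names are hypothetical, and a real implementation of the paper's method would use a sparse solver rather than dense Gaussian elimination.

```python
def solve_normal_equations(A, b):
    """Least-squares solution of an overdetermined system A x = b (more
    equations than unknowns) via the normal equations A^T A x = A^T b,
    solved by Gaussian elimination with partial pivoting."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    M = [row[:] + [rhs] for row, rhs in zip(AtA, Atb)]   # augmented matrix
    for col in range(n):                                 # forward elimination
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                       # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]   # 3 equations, 2 unknowns
b = [1.0, 2.0, 3.0]
x = solve_normal_equations(A, b)
print(x)  # → [1.0, 1.0] (the exact fit y = 1 + t)
```

In a partitioning context, the real-valued solution vector would typically be thresholded (e.g., by sign) to obtain cluster assignments; the details of that step are specific to the paper's formulation.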
Identity and Granularity of Events in Text
In this paper we describe a method to detect event descriptions in
different news articles and to model the semantics of events and their
components using RDF representations. We compare these descriptions to solve a
cross-document event coreference task. Our component approach to event
semantics defines identity and granularity of events at different levels. It
performs close to state-of-the-art approaches on the cross-document event
coreference task, while outperforming other works when assuming similar quality
of event detection. We demonstrate how granularity and identity are
interconnected and we discuss how semantic anomaly could be used to define
differences between coreference, subevent and topical relations.
Comment: Invited keynote speech by Piek Vossen at CICLing 201
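The component-wise notion of event identity can be illustrated with a toy sketch. The paper works over RDF representations; the version below simplifies each event to plain sets of component fillers and uses hypothetical component names, weights, and threshold, purely to show the idea of deciding coreference from weighted component overlap.

```python
def component_overlap(e1, e2, key):
    """Jaccard overlap between one component of two event descriptions."""
    a, b = set(e1.get(key, ())), set(e2.get(key, ()))
    return len(a & b) / len(a | b) if a and b else 0.0

# hypothetical component weights; a real system would tune these
WEIGHTS = {"action": 0.4, "time": 0.2, "location": 0.2, "participants": 0.2}

def corefer(e1, e2, threshold=0.5):
    """Decide cross-document coreference from weighted component similarity."""
    score = sum(w * component_overlap(e1, e2, k) for k, w in WEIGHTS.items())
    return score >= threshold, round(score, 2)

# two news mentions of (plausibly) the same event, differing in participants
ev1 = {"action": {"attack"}, "time": {"2015-11-13"},
       "location": {"Paris"}, "participants": {"police"}}
ev2 = {"action": {"attack"}, "time": {"2015-11-13"},
       "location": {"Paris"}, "participants": {"gunmen"}}
match, score = corefer(ev1, ev2)
print(match, score)
```

Keeping the comparison per component is what lets such a model reason separately about identity (same action, time, place) and granularity (one mention's participants subsuming the other's).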
Visualizing the Motion Flow of Crowds
In modern cities, dense populations give rise to problems such as congestion, accidents, violence, and crime. Video surveillance systems such as closed-circuit television cameras are widely used by security staff to monitor human behavior and activities in order to manage, direct, or protect people. Given the quantity and prolonged duration of the recorded videos, examining these recordings and keeping track of activities and events requires a huge amount of human effort. In recent years, new techniques in the computer vision field have lowered the barrier to entry, allowing developers to experiment more with intelligent surveillance video systems. Unlike previous research, this dissertation does not address algorithm design for object detection or object tracking. Instead, it focuses on the technological side, applying data visualization methodologies to build a model for detecting anomalies. It aims to provide an understanding of how to characterize pedestrian behavior in video and to identify anomalies or abnormal cases using data visualization techniques.
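The dissertation's exact pipeline is not given in the abstract, but one common first step in visualizing crowd motion flow is to aggregate per-pedestrian displacement vectors into a coarse grid, yielding one average motion vector per cell to draw as an arrow field. The sketch below assumes hypothetical track, grid, and frame parameters.

```python
def flow_histogram(tracks, grid=(4, 4), frame=(100.0, 100.0)):
    """Aggregate per-track displacement vectors into a coarse grid,
    returning one average (dx, dy) motion vector per occupied cell."""
    gx, gy = grid
    cells = {}                               # (cx, cy) -> (sum_dx, sum_dy, count)
    for pts in tracks:                       # pts: list of (x, y) positions
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            cx = min(int(x0 * gx / frame[0]), gx - 1)
            cy = min(int(y0 * gy / frame[1]), gy - 1)
            dx, dy, n = cells.get((cx, cy), (0.0, 0.0, 0))
            cells[(cx, cy)] = (dx + x1 - x0, dy + y1 - y0, n + 1)
    return {c: (dx / n, dy / n) for c, (dx, dy, n) in cells.items()}

# two pedestrians walking rightward through the top-left cell of the frame
tracks = [[(10.0, 10.0), (20.0, 10.0)],
          [(15.0, 12.0), (25.0, 12.0)]]
field = flow_histogram(tracks)
print(field)  # → {(0, 0): (10.0, 0.0)}
```

A cell whose average vector deviates strongly from its neighbors (or from the same cell's historical flow) is a natural candidate for the kind of anomaly such a visualization is meant to surface.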
EcoAdapt Working Paper Series N°2: iModeler manual: a quick guide for fuzzy cognitive modelling
We introduce Fuzzy Cognitive Modelling (FCM) and provide step-by-step guidance and tips for using iModeler (covering both qualitative and quantitative approaches), describe the use of FCM in the EcoAdapt Story and Simulation (S&S) approach based on Structured Decision Making, and briefly describe the FCM models being developed at the three study sites. This version corresponds to iModeler version 4 (January 2004).
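The abstract does not specify iModeler's internal update rule, so the sketch below shows a standard quantitative fuzzy-cognitive-map iteration rather than the tool's actual implementation: concept activations are repeatedly updated as a sigmoid-squashed sum of their own value and the weighted influence of the other concepts. The weight matrix, steepness `lam`, and concept names are illustrative assumptions.

```python
import math

def fcm_step(state, W, lam=1.0):
    """One FCM update: A_j(t+1) = sigmoid(A_j(t) + sum_i W[i][j] * A_i(t)),
    where W[i][j] in [-1, 1] is the causal weight of concept i on concept j."""
    def sig(x):
        return 1.0 / (1.0 + math.exp(-lam * x))
    n = len(state)
    return [sig(state[j] + sum(W[i][j] * state[i] for i in range(n)))
            for j in range(n)]

def fcm_run(state, W, iters=50, tol=1e-6):
    """Iterate until the activation vector stabilizes (a fixed point)."""
    for _ in range(iters):
        nxt = fcm_step(state, W)
        if max(abs(a - b) for a, b in zip(nxt, state)) < tol:
            return nxt
        state = nxt
    return state

# toy map: "deforestation" reinforces "erosion", which dampens "crop yield"
W = [[0.0, 0.8,  0.0],
     [0.0, 0.0, -0.7],
     [0.0, 0.0,  0.0]]
final = fcm_run([1.0, 0.0, 0.5], W)
print([round(v, 3) for v in final])
```

Reading off the fixed point gives the qualitative story a stakeholder would expect: erosion ends up highly activated and crop yield is pushed below its neutral level, which is the kind of what-if reasoning the S&S approach uses such models for.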