Mining Novellas from PubMed Abstracts using a Storytelling Algorithm
Motivation: There are now a multitude of articles published in a diversity of journals providing information about genes, proteins, pathways, and entire processes. Each article investigates particular subsets of a biological process, but to gain insight into the functioning of a system as a whole, we must computationally integrate information across multiple publications. This is especially important in problems such as modeling cross-talk in signaling networks, designing drug therapies for combinatorial selectivity, and unraveling the role of gene interactions in deleterious phenotypes, where the cost of performing combinatorial screens is exorbitant.
Results: We present an automated approach to biological knowledge discovery from PubMed abstracts, suitable for unraveling combinatorial relationships. It involves the systematic application of a `storytelling' algorithm followed by compression of the stories into `novellas.' Given a start and an end publication, typically with little or no overlap in content, storytelling identifies a chain of intermediate publications from one to the other, such that neighboring publications have significant content similarity. Stories discovered in this way provide a principled means of relating distant concepts through compositions of related concepts. The chains of links employed by stories are then mined to find frequently reused sub-stories, which are compressed to yield novellas, or compact templates of connections. We demonstrate a successful application of storytelling and novella finding to modeling combinatorial relationships between the introduction of extracellular factors and downstream cellular events.
Availability: A story visualizer, suitable for interactive exploration of stories and novellas described in this paper, is available for demo/download at https://bioinformatics.cs.vt.edu/storytelling
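The chaining idea behind storytelling can be sketched in a few lines. The sketch below is a minimal illustration, not the authors' implementation: it assumes a bag-of-words cosine similarity and a hypothetical threshold `theta`, and finds a shortest chain by breadth-first search over the induced similarity graph; the actual system's similarity measure and search strategy may differ.

```python
import math
from collections import Counter, deque

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def storytelling(docs, start, end, theta=0.3):
    """Find a chain of documents from `start` to `end` such that every
    neighboring pair has similarity >= theta (BFS gives a shortest chain)."""
    vecs = {d: Counter(text.lower().split()) for d, text in docs.items()}
    parent = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == end:
            chain = []          # walk parents back to reconstruct the story
            while cur is not None:
                chain.append(cur)
                cur = parent[cur]
            return chain[::-1]
        for nxt in docs:
            if nxt not in parent and cosine(vecs[cur], vecs[nxt]) >= theta:
                parent[nxt] = cur
                queue.append(nxt)
    return None                 # no chain meets the similarity threshold

# toy "abstracts": A and C share no terms, but B bridges them
docs = {
    "A": "growth factor binds receptor",
    "B": "receptor activates kinase cascade",
    "C": "kinase cascade triggers transcription",
}
print(storytelling(docs, "A", "C", theta=0.2))  # → ['A', 'B', 'C']
```

Here A and C have zero similarity, so the story must route through B, mirroring how the algorithm relates distant concepts through intermediate publications.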
Independent Set, Induced Matching, and Pricing: Connections and Tight (Subexponential Time) Approximation Hardnesses
We present a series of almost settled inapproximability results for three
fundamental problems. The first in our series is the subexponential-time
inapproximability of the maximum independent set problem, a question studied in
the area of parameterized complexity. The second is the hardness of
approximating the maximum induced matching problem on bounded-degree bipartite
graphs. The last in our series is the tight hardness of approximating the
k-hypergraph pricing problem, a fundamental problem arising from the area of
algorithmic game theory. In particular, assuming the Exponential Time
Hypothesis, our two main results are:
- For any r larger than some constant, any r-approximation algorithm for the
maximum independent set problem must run in at least
2^{n^{1-\epsilon}/r^{1+\epsilon}} time. This nearly matches the upper bound of
2^{n/r} (Cygan et al., 2008). It also improves some hardness results in
the domain of parameterized complexity (e.g., Escoffier et al., 2012 and
Chitnis et al., 2013).
- For any k larger than some constant, there is no polynomial time min
(k^{1-\epsilon}, n^{1/2-\epsilon})-approximation algorithm for the k-hypergraph
pricing problem, where n is the number of vertices in an input graph. This
almost matches the upper bound of min (O(k), \tilde O(\sqrt{n})) (by Balcan and
Blum, 2007 and an algorithm in this paper).
We note an interesting fact: in contrast to the n^{1/2-\epsilon} hardness
for polynomial-time algorithms, the k-hypergraph pricing problem admits an
n^{\delta}-approximation for any \delta > 0 in quasi-polynomial time. This
places the problem in a rare approximability class in which the
approximability threshold improves significantly when algorithms are allowed
to run in quasi-polynomial time.
Comment: The full version of FOCS 201
Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding
Narrative understanding involves capturing the author's cognitive processes,
providing insights into their knowledge, intentions, beliefs, and desires.
Although large language models (LLMs) excel in generating grammatically
coherent text, their ability to comprehend the author's thoughts remains
uncertain. This limitation hinders the practical applications of narrative
understanding. In this paper, we conduct a comprehensive survey of narrative
understanding tasks, thoroughly examining their key features, definitions,
taxonomy, associated datasets, training objectives, evaluation metrics, and
limitations. Furthermore, we explore the potential of expanding the
capabilities of modularized LLMs to address novel narrative understanding
tasks. By framing narrative understanding as the retrieval of the author's
imaginative cues that outline the narrative structure, our study introduces a
fresh perspective on enhancing narrative comprehension.
Efficient Mining of Heterogeneous Star-Structured Data
Many of the real-world clustering problems arising in data mining applications are heterogeneous in nature. Heterogeneous co-clustering involves simultaneous clustering of objects of two or more data types. While pairwise co-clustering of two data types has been well studied in the literature, research on high-order heterogeneous co-clustering is still limited. In this paper, we propose a graph-theoretical framework for addressing star-structured co-clustering problems in which a central data type is connected to all the other data types. Partitioning this graph leads to co-clustering of all the data types under the constraints of the star structure. Although graph partitioning approaches have been adopted before to address star-structured heterogeneous problems, the main contribution of this work lies in an efficient algorithm that we propose for partitioning the star-structured graph. Computationally, our algorithm is very fast, as it requires only a simple solution to a sparse system of overdetermined linear equations. Theoretical analysis and extensive experiments performed on toy and real datasets demonstrate the quality, efficiency and stability of the proposed algorithm.
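The abstract does not spell out the linear system, so the sketch below only illustrates the generic computational core it names: solving a small overdetermined system in the least-squares sense via the normal equations A^T A x = A^T b. All names are hypothetical, and a real implementation of the paper's method would use a sparse solver rather than dense Gaussian elimination.

```python
def solve_normal_equations(A, b):
    """Least-squares solution of an overdetermined system A x = b (more
    equations than unknowns) via the normal equations A^T A x = A^T b,
    solved by Gaussian elimination with partial pivoting."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    M = [row[:] + [rhs] for row, rhs in zip(AtA, Atb)]   # augmented matrix
    for col in range(n):                                 # forward elimination
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                       # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]   # 3 equations, 2 unknowns
b = [1.0, 2.0, 3.0]
x = solve_normal_equations(A, b)
print(x)  # → [1.0, 1.0] (the exact fit y = 1 + t)
```

In a partitioning context, the real-valued solution vector would typically be thresholded (e.g., by sign) to obtain cluster assignments; the details of that step are specific to the paper's formulation.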
Identity and Granularity of Events in Text
In this paper we describe a method to detect event descriptions in
different news articles and to model the semantics of events and their
components using RDF representations. We compare these descriptions to solve a
cross-document event coreference task. Our component approach to event
semantics defines identity and granularity of events at different levels. It
performs close to state-of-the-art approaches on the cross-document event
coreference task, while outperforming other works when assuming similar quality
of event detection. We demonstrate how granularity and identity are
interconnected and we discuss how semantic anomaly could be used to define
differences between coreference, subevent and topical relations.
Comment: Invited keynote speech by Piek Vossen at CICLing 201
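The component-wise notion of event identity can be illustrated with a toy sketch. The paper works over RDF representations; the version below simplifies each event to plain sets of component fillers and uses hypothetical component names, weights, and threshold, purely to show the idea of deciding coreference from weighted component overlap.

```python
def component_overlap(e1, e2, key):
    """Jaccard overlap between one component of two event descriptions."""
    a, b = set(e1.get(key, ())), set(e2.get(key, ()))
    return len(a & b) / len(a | b) if a and b else 0.0

# hypothetical component weights; a real system would tune these
WEIGHTS = {"action": 0.4, "time": 0.2, "location": 0.2, "participants": 0.2}

def corefer(e1, e2, threshold=0.5):
    """Decide cross-document coreference from weighted component similarity."""
    score = sum(w * component_overlap(e1, e2, k) for k, w in WEIGHTS.items())
    return score >= threshold, round(score, 2)

# two news mentions of (plausibly) the same event, differing in participants
ev1 = {"action": {"attack"}, "time": {"2015-11-13"},
       "location": {"Paris"}, "participants": {"police"}}
ev2 = {"action": {"attack"}, "time": {"2015-11-13"},
       "location": {"Paris"}, "participants": {"gunmen"}}
match, score = corefer(ev1, ev2)
print(match, score)
```

Keeping the comparison per component is what lets such a model reason separately about identity (same action, time, place) and granularity (one mention's participants subsuming the other's).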
Visualizing the Motion Flow of Crowds
In modern cities, dense populations give rise to problems such as congestion, accidents, violence, and crime. Video surveillance systems such as closed-circuit television cameras are widely used by security staff to monitor human behavior and activities in order to manage, direct, or protect people. Given the quantity and prolonged duration of the recorded videos, examining these recordings and keeping track of activities and events requires a huge amount of human effort. In recent years, new techniques in the computer vision field have lowered the barrier to entry, allowing developers to experiment more with intelligent surveillance video systems. Unlike previous research, this dissertation does not address algorithm design for object detection or object tracking. Instead, it focuses on the technological side, applying data visualization methodologies to build a model for detecting anomalies. It aims to provide an understanding of how to characterize pedestrian behavior in video and to identify anomalies or abnormal cases using data visualization techniques.
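The dissertation's exact pipeline is not given in the abstract, but one common first step in visualizing crowd motion flow is to aggregate per-pedestrian displacement vectors into a coarse grid, yielding one average motion vector per cell to draw as an arrow field. The sketch below assumes hypothetical track, grid, and frame parameters.

```python
def flow_histogram(tracks, grid=(4, 4), frame=(100.0, 100.0)):
    """Aggregate per-track displacement vectors into a coarse grid,
    returning one average (dx, dy) motion vector per occupied cell."""
    gx, gy = grid
    cells = {}                               # (cx, cy) -> (sum_dx, sum_dy, count)
    for pts in tracks:                       # pts: list of (x, y) positions
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            cx = min(int(x0 * gx / frame[0]), gx - 1)
            cy = min(int(y0 * gy / frame[1]), gy - 1)
            dx, dy, n = cells.get((cx, cy), (0.0, 0.0, 0))
            cells[(cx, cy)] = (dx + x1 - x0, dy + y1 - y0, n + 1)
    return {c: (dx / n, dy / n) for c, (dx, dy, n) in cells.items()}

# two pedestrians walking rightward through the top-left cell of the frame
tracks = [[(10.0, 10.0), (20.0, 10.0)],
          [(15.0, 12.0), (25.0, 12.0)]]
field = flow_histogram(tracks)
print(field)  # → {(0, 0): (10.0, 0.0)}
```

A cell whose average vector deviates strongly from its neighbors (or from the same cell's historical flow) is a natural candidate for the kind of anomaly such a visualization is meant to surface.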
EcoAdapt Working Paper Series N°2: iModeler manual: a quick guide for fuzzy cognitive modelling
We introduce Fuzzy Cognitive Modelling (FCM) and provide step-by-step guidance and tips for using iModeler (covering both qualitative and quantitative approaches), describe the use of FCM in the EcoAdapt Story and Simulation (S&S) approach based on Structured Decision Making, and briefly describe the FCM models being developed at the three study sites. This version corresponds to iModeler version 4 (January 2004).
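The abstract does not specify iModeler's internal update rule, so the sketch below shows a standard quantitative fuzzy-cognitive-map iteration rather than the tool's actual implementation: concept activations are repeatedly updated as a sigmoid-squashed sum of their own value and the weighted influence of the other concepts. The weight matrix, steepness `lam`, and concept names are illustrative assumptions.

```python
import math

def fcm_step(state, W, lam=1.0):
    """One FCM update: A_j(t+1) = sigmoid(A_j(t) + sum_i W[i][j] * A_i(t)),
    where W[i][j] in [-1, 1] is the causal weight of concept i on concept j."""
    def sig(x):
        return 1.0 / (1.0 + math.exp(-lam * x))
    n = len(state)
    return [sig(state[j] + sum(W[i][j] * state[i] for i in range(n)))
            for j in range(n)]

def fcm_run(state, W, iters=50, tol=1e-6):
    """Iterate until the activation vector stabilizes (a fixed point)."""
    for _ in range(iters):
        nxt = fcm_step(state, W)
        if max(abs(a - b) for a, b in zip(nxt, state)) < tol:
            return nxt
        state = nxt
    return state

# toy map: "deforestation" reinforces "erosion", which dampens "crop yield"
W = [[0.0, 0.8,  0.0],
     [0.0, 0.0, -0.7],
     [0.0, 0.0,  0.0]]
final = fcm_run([1.0, 0.0, 0.5], W)
print([round(v, 3) for v in final])
```

Reading off the fixed point gives the qualitative story a stakeholder would expect: erosion ends up highly activated and crop yield is pushed below its neutral level, which is the kind of what-if reasoning the S&S approach uses such models for.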