DDI Prediction via Heterogeneous Graph Attention Networks
Polypharmacy, defined as the use of multiple drugs together, is a standard
treatment method, especially for severe and chronic diseases. However, using
multiple drugs together may cause interactions between drugs. Drug-drug
interaction (DDI) is the activity that occurs when the impact of one drug
changes when combined with another. DDIs may obstruct, increase, or decrease
the intended effect of either drug or, in the worst-case scenario, create
adverse side effects. While it is critical to detect DDIs in time, it is
time-consuming and expensive to identify them in clinical trials because trials
are short and many possible drug pairs must be considered for testing. As a
result, computational methods are needed for predicting DDIs. In this paper, we
present a novel heterogeneous graph attention model, HAN-DDI, to predict
drug-drug interactions. We create a heterogeneous network of drugs with
different biological entities. Then, we develop a heterogeneous graph attention
network to learn DDIs using relations of drugs with other entities. It consists
of an attention-based heterogeneous graph node encoder for obtaining drug node
representations and a decoder for predicting drug-drug interactions. Further,
we conduct comprehensive experiments to evaluate our model and to compare it
with state-of-the-art models. Experimental results show that our proposed
method, HAN-DDI, outperforms the baselines significantly and accurately
predicts DDIs, even for new drugs.
Comment: 10 pages, 3 figures, 8 tables, accepted in BioKD
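The encoder/decoder idea described in this abstract can be illustrated with a minimal sketch. It is not the HAN-DDI architecture: it assumes plain dot-product attention over a drug's neighboring entities and a logistic dot-product decoder, and every name and vector below (`attend`, `interaction_score`, the toy embeddings) is hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, neighbors):
    """Encoder step: attention-weighted average of neighbor vectors.

    Assumption: attention scores are plain dot products with the query.
    """
    weights = softmax([dot(query, n) for n in neighbors])
    return [
        sum(w * n[i] for w, n in zip(weights, neighbors))
        for i in range(len(query))
    ]


def interaction_score(drug_a, drug_b):
    """Decoder step: logistic function of the dot product of two drug vectors."""
    return 1.0 / (1.0 + math.exp(-dot(drug_a, drug_b)))


# Toy example: one drug with two neighboring entities (hypothetical values).
drug_x = [1.0, 0.0]
neighbors_x = [[1.0, 0.0], [0.0, 1.0]]
encoded_x = attend(drug_x, neighbors_x)
score = interaction_score(encoded_x, [1.0, 0.0])
```

The neighbor that aligns with the query receives the larger attention weight, so `encoded_x` leans toward it. A trained model would learn the projections that produce these scores rather than use raw dot products.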
DeepGATGO: A Hierarchical Pretraining-Based Graph-Attention Model for Automatic Protein Function Prediction
Automatic protein function prediction (AFP) is classified as a large-scale
multi-label classification problem aimed at automating protein enrichment
analysis to eliminate the current reliance on labor-intensive wet-lab methods.
Currently, popular methods primarily combine protein-related information and
Gene Ontology (GO) terms to generate final functional predictions. For example,
protein sequences, structural information, and protein-protein interaction
networks are integrated as prior knowledge to fuse with GO term embeddings and
generate the ultimate prediction results. However, these methods are limited by
the difficulty in obtaining structural information or network topology
information, as well as the accuracy of such data. Therefore, more and more
methods that only use protein sequences for protein function prediction have
been proposed, which is a more reliable and computationally cheaper approach.
However, the existing methods fail to fully extract feature information from
protein sequences or label data because they do not adequately consider the
intrinsic characteristics of the data itself. Therefore, we propose a
sequence-based hierarchical prediction method, DeepGATGO, which processes
protein sequences and GO term labels hierarchically, and utilizes graph
attention networks (GATs) and contrastive learning for protein function
prediction. Specifically, we compute embeddings of the sequence and label data
using pre-trained models to reduce computational costs and improve the
embedding accuracy. Then, we use GATs to dynamically extract the structural
information of non-Euclidean data, and learn general features of the label
dataset with contrastive learning by constructing positive and negative
samples. Experimental results demonstrate that our proposed model exhibits
better scalability in GO term enrichment analysis on large-scale datasets.
Comment: Accepted in BIOKDD'2
New threats to health data privacy
Background: Along with the rapid digitalization of health data (e.g. Electronic Health Records), there is increasing concern about maintaining data privacy while garnering the benefits, especially when the data must be published for secondary use. Most current research on protecting health data privacy centers on data de-identification and data anonymization, which remove identifying information from the published health data to prevent an adversary from reasoning about the privacy of the patients. However, published health data is not the only source that adversaries can count on: with the large amount of information that people voluntarily share on the Web, sophisticated attacks on health data privacy that join disparate pieces of information from multiple sources become practical. So far, limited effort has been devoted to studying these attacks. Results: We study how patient privacy could be compromised with the help of today's information technologies. In particular, we show that private healthcare information could be collected by aggregating and associating disparate pieces of information from multiple online data sources, including online social networks, public records, and search engine results. We present a real-world case study showing that user identity and privacy are highly vulnerable to attribution, inference, and aggregation attacks. Using real data analysis, we also show that people are highly identifiable to adversaries even when the information pieces about the target are inaccurate. Conclusion: We claim that so much information has been made available electronically and online that people are very vulnerable without effective privacy protection.
Context-aware visual exploration of molecular databases
Facilitating the visual exploration of scientific data has
received increasing attention in the past decade or so. Especially
in life-science-related application areas, the amount
of available data has grown at a breathtaking pace. In this
paper we describe an approach that allows for visual inspection
of large collections of molecular compounds. In
contrast to classical visualizations of such spaces, we incorporate
a specific focus of analysis, for example the outcome
of a biological experiment such as high-throughput
screening results. The presented method uses this experimental
data to select molecular fragments of the underlying
molecules that have interesting properties and uses the
resulting space to generate a two-dimensional map based
on a singular value decomposition algorithm and a self-organizing
map. Experiments on real datasets show that
the resulting visual landscape groups molecules of similar
chemical properties in densely connected regions.
Combining active learning and semi-supervised learning techniques to extract protein interaction sentences
Background: Protein-protein interaction (PPI) extraction has been a focal point of much biomedical research and many database curation tools. Both Active Learning (AL) and semi-supervised SVMs have recently been applied to extract PPIs automatically. In this paper, we explore combining AL with semi-supervised learning (SSL) to improve performance on the PPI task. Methods: We propose a novel PPI extraction technique called PPISpotter, which combines Deterministic Annealing-based SSL with an AL technique to extract protein-protein interactions. In addition, we extract a comprehensive set of features from MEDLINE records using Natural Language Processing (NLP) techniques, which further improves the SVM classifiers. Our feature selection technique incorporates syntactic, semantic, and lexical properties of text, which boosts system performance significantly. Results: In experiments with three different PPI corpora, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs, such as Random Sampling, Clustering, and Transductive SVMs, in terms of precision, recall, and F-measure. Conclusions: Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.
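The abstract does not state which active learning criterion PPISpotter uses. As a hedged illustration of how AL is commonly paired with a linear SVM-style classifier, the sketch below applies margin-based uncertainty sampling: the unlabeled sentences whose decision value is closest to zero are sent for annotation. The classifier weights, sentence ids, and feature vectors are hypothetical.

```python
def decision_value(weights, bias, features):
    """Signed score of a linear classifier; its sign is the predicted class."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def select_most_uncertain(weights, bias, unlabeled, k):
    """Return the ids of the k unlabeled items closest to the decision boundary.

    `unlabeled` is a list of (id, feature_vector) pairs.
    """
    ranked = sorted(
        unlabeled,
        key=lambda item: abs(decision_value(weights, bias, item[1])),
    )
    return [item_id for item_id, _ in ranked[:k]]


# Hypothetical pool of candidate sentences with two features each.
pool = [
    ("s1", [2.0, 1.0]),
    ("s2", [0.1, 0.0]),
    ("s3", [-1.5, 0.5]),
    ("s4", [0.0, -0.2]),
]
weights, bias = [1.0, 1.0], 0.0

# Sentences the current model is least sure about; these go to the annotator.
picked = select_most_uncertain(weights, bias, pool, k=2)
```

In a full loop, the newly labeled sentences would be added to the training set and the classifier retrained before the next selection round.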
High performance subgraph mining in molecular compounds
Structured data represented in the form of graphs arises in
several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining
problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main
aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load-balancing
algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute's HIV-screening dataset, where the approach attains close-to-linear speedup in a network
of workstations.
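Receiver-initiated load balancing means that an idle worker asks a busy peer for work, instead of busy workers pushing work out. The sketch below is a single-process simulation of that idea on an irregular search tree. It is not the paper's peer-to-peer algorithm: the donor choice (the most loaded peer), the half-split policy, and the toy `expand` function are all assumptions made for illustration.

```python
from collections import deque


def expand(task):
    """Hypothetical irregular search tree: node n has children n-1 and n-2."""
    return [child for child in (task - 1, task - 2) if child >= 0]


def run(initial_tasks, expand_fn):
    """Process all tasks in round-robin steps with receiver-initiated stealing.

    Returns (tasks processed per worker, number of successful work requests).
    """
    queues = [deque(tasks) for tasks in initial_tasks]
    processed = [0] * len(queues)
    requests = 0
    while any(queues):
        for i, queue in enumerate(queues):
            if not queue:
                # The idle worker initiates: ask the most loaded peer for half.
                donor = max(range(len(queues)), key=lambda j: len(queues[j]))
                share = len(queues[donor]) // 2
                if share == 0:
                    continue  # nothing to share yet; stay idle this step
                for _ in range(share):
                    queue.append(queues[donor].pop())
                requests += 1
            task = queue.popleft()
            processed[i] += 1
            queue.extend(expand_fn(task))
    return processed, requests


# Three workers; only the first starts with work (the root of the tree).
processed, requests = run([[5], [], []], expand)
```

Because the size of each subtree is unknown in advance, static partitioning would leave some workers idle. Letting idle workers pull work keeps them busy without any workload prediction.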