
155 research outputs found

Data Mining in Bioinformatics (BIOKDD)

Author: CK Reddy
GA Grothaus
George Karypis
H Yang
Jiong Yang
Mohammed J Zaki
W Hwang
Y Zhang
Publication venue: BioMed Central
Publication date: 01/04/2007

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DDI Prediction via Heterogeneous Graph Attention Networks

Author: Akbas Esra
Saifuddin Khaled Mohammed
Tanvir Farhan
Publication date: 12/07/2022

Polypharmacy, defined as the use of multiple drugs together, is a standard treatment method, especially for severe and chronic diseases. However, using multiple drugs together may cause interactions between drugs. A drug-drug interaction (DDI) occurs when the effect of one drug changes when it is combined with another. DDIs may obstruct, increase, or decrease the intended effect of either drug or, in the worst-case scenario, create adverse side effects. While it is critical to detect DDIs on time, it is time-consuming and expensive to identify them in clinical trials due to their short duration and the many possible drug pairs to be considered for testing. As a result, computational methods are needed for predicting DDIs. In this paper, we present a novel heterogeneous graph attention model, HAN-DDI, to predict drug-drug interactions. We create a heterogeneous network of drugs with different biological entities. Then, we develop a heterogeneous graph attention network to learn DDIs using relations of drugs with other entities. It consists of an attention-based heterogeneous graph node encoder for obtaining drug node representations and a decoder for predicting drug-drug interactions. Further, we use comprehensive experiments to evaluate our model and to compare it with state-of-the-art models. Experimental results show that our proposed method, HAN-DDI, outperforms the baselines significantly and accurately predicts DDIs, even for new drugs. Comment: 10 pages, 3 figures, 8 tables, accepted in BioKD
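The encoder/decoder split described in the HAN-DDI abstract can be illustrated with a deliberately tiny sketch. Everything here is an assumption for illustration: the abstract does not give the attention formula or the decoder, so this uses untrained dot-product attention over neighbor vectors and a sigmoid of a dot product as the decoder. The function names `encode` and `decode` are hypothetical and are not taken from the paper.

```python
import math

def softmax(values):
    # Numerically stable softmax over a list of scores.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def encode(drug_vec, neighbor_vecs):
    # Assumed encoder: attention-weighted average of neighbor vectors,
    # with attention scores given by a plain dot product (no learned weights).
    weights = softmax([dot(drug_vec, n) for n in neighbor_vecs])
    dim = len(drug_vec)
    return [sum(w * n[i] for w, n in zip(weights, neighbor_vecs)) for i in range(dim)]

def decode(emb_a, emb_b):
    # Assumed decoder: sigmoid of the dot product, read as an interaction score.
    return 1.0 / (1.0 + math.exp(-dot(emb_a, emb_b)))

# Hypothetical toy vectors for two drugs and their neighboring entities.
drug_a = encode([1.0, 0.0], [[0.9, 0.1], [0.2, 0.8]])
drug_b = encode([0.0, 1.0], [[0.1, 0.9], [0.7, 0.3]])
score = decode(drug_a, drug_b)
```

A trained model would learn projection matrices and attention parameters per relation type; the sketch only shows how an encoder output feeds a pairwise decoder.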

arXiv.org e-Print Archive

DeepGATGO: A Hierarchical Pretraining-Based Graph-Attention Model for Automatic Protein Function Prediction

Author: Jiang Changkun
Li Jianqiang
Li Zihao
Publication date: 24/07/2023

Automatic protein function prediction (AFP) is classified as a large-scale multi-label classification problem aimed at automating protein enrichment analysis to eliminate the current reliance on labor-intensive wet-lab methods. Currently, popular methods primarily combine protein-related information and Gene Ontology (GO) terms to generate final functional predictions. For example, protein sequences, structural information, and protein-protein interaction networks are integrated as prior knowledge to fuse with GO term embeddings and generate the ultimate prediction results. However, these methods are limited by the difficulty of obtaining structural information or network topology information, as well as by the accuracy of such data. Therefore, a growing number of methods that use only protein sequences for protein function prediction have been proposed, which is a more reliable and computationally cheaper approach. However, the existing methods fail to fully extract feature information from protein sequences or label data because they do not adequately consider the intrinsic characteristics of the data itself. Therefore, we propose a sequence-based hierarchical prediction method, DeepGATGO, which processes protein sequences and GO term labels hierarchically, and utilizes graph attention networks (GATs) and contrastive learning for protein function prediction. Specifically, we compute embeddings of the sequence and label data using pre-trained models to reduce computational costs and improve the embedding accuracy. Then, we use GATs to dynamically extract the structural information of non-Euclidean data, and learn general features of the label dataset with contrastive learning by constructing positive and negative samples. Experimental results demonstrate that our proposed model exhibits better scalability in GO term enrichment analysis on large-scale datasets. Comment: Accepted in BIOKDD'2
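The abstract mentions contrastive learning over positive and negative samples but does not state which loss is used. As a hedged illustration only, the sketch below computes one common formulation, an InfoNCE-style loss over cosine similarities. The function name `info_nce`, the `temperature` value, and the toy vectors are assumptions, not details from DeepGATGO.

```python
import math

def cosine(a, b):
    # Cosine similarity between two non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def info_nce(anchor, positive, negatives, temperature=0.1):
    # Logit 0 belongs to the positive pair; the rest belong to negatives.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Stable log-sum-exp, then negative log-probability of the positive.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]

# Hypothetical label embeddings: the loss is small when the positive is
# close to the anchor and large when a negative is closer instead.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Minimizing this quantity pulls positive pairs together and pushes negatives apart in the embedding space, which is the general idea behind contrastive objectives.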

arXiv.org e-Print Archive

New threats to health data privacy

Author: Chen Jake Y
Li Fengjun
Liu Peng
Zou Xukai
Publication venue: BioMed Central
Publication date: 01/11/2011

Background: Along with the rapid digitalization of health data (e.g. Electronic Health Records), there is increasing concern about maintaining data privacy while garnering the benefits, especially when the data are required to be published for secondary use. Most of the current research on protecting health data privacy is centered around data de-identification and data anonymization, which remove the identifiable information from the published health data to prevent an adversary from reasoning about the privacy of the patients. However, published health data is not the only source that adversaries can count on: with the large amount of information that people voluntarily share on the Web, sophisticated attacks against health data privacy that join disparate pieces of information from multiple sources become practical. Only limited effort has so far been devoted to studying these attacks. Results: We study how patient privacy could be compromised with the help of today’s information technologies. In particular, we show that private healthcare information could be collected by aggregating and associating disparate pieces of information from multiple online data sources, including online social networks, public records and search engine results. We present a real-world case study showing that user identity and privacy are highly vulnerable to attribution, inference and aggregation attacks. Using real data analysis, we also show that people are highly identifiable to adversaries even when the adversaries hold inaccurate pieces of information about the target. Conclusion: We claim that so much information has been made available electronically and online that people are very vulnerable without effective privacy protection.

Directory of Open Access Journals

KU ScholarWorks

PubMed Central

Context-aware visual exploration of molecular databases

Author: Berthold Michael
Di Fatta Giuseppe
Fiannaca Antonino
Gaglio Salvatore
Rizzo Riccardo
Urso Alfonso
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2006

Facilitating the visual exploration of scientific data has received increasing attention in the past decade or so. Especially in life-science-related application areas, the amount of available data has grown at a breathtaking pace. In this paper we describe an approach that allows for visual inspection of large collections of molecular compounds. In contrast to classical visualizations of such spaces, we incorporate a specific focus of analysis, for example the outcome of a biological experiment such as high-throughput screening results. The presented method uses this experimental data to select molecular fragments of the underlying molecules that have interesting properties and uses the resulting space to generate a two-dimensional map based on a singular value decomposition algorithm and a self-organizing map. Experiments on real datasets show that the resulting visual landscape groups molecules of similar chemical properties in densely connected regions.
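To make the self-organizing map step in the abstract above more concrete, here is a minimal sketch of a single SOM update on a small grid. It is not the authors' implementation: the grid size, the Gaussian neighborhood, and the `learning_rate` and `radius` values are assumptions, and the singular value decomposition step mentioned in the abstract is not shown.

```python
import math

def best_matching_unit(grid, vector):
    # Return (row, col) of the cell whose weight vector is closest to `vector`.
    best, best_dist = None, float("inf")
    for r, row in enumerate(grid):
        for c, weights in enumerate(row):
            dist = sum((w - v) ** 2 for w, v in zip(weights, vector))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best

def som_update(grid, vector, learning_rate=0.5, radius=1.0):
    # Move every cell toward `vector`, scaled by a Gaussian of its
    # grid distance to the best matching unit. Mutates `grid` in place.
    br, bc = best_matching_unit(grid, vector)
    for r, row in enumerate(grid):
        for c, weights in enumerate(row):
            grid_dist_sq = (r - br) ** 2 + (c - bc) ** 2
            influence = math.exp(-grid_dist_sq / (2 * radius ** 2))
            row[c] = [w + learning_rate * influence * (v - w)
                      for w, v in zip(weights, vector)]
    return br, bc

# Hypothetical 2x2 map with 2-dimensional weights and one input vector.
grid = [[[0.0, 0.0], [1.0, 0.0]],
        [[0.0, 1.0], [1.0, 1.0]]]
bmu = som_update(grid, [1.0, 1.0])
```

Repeating this update over many inputs, while shrinking the learning rate and radius, is what makes similar inputs settle in neighboring cells of the map.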

Central Archive at the University of Reading

CiteSeerX

Archivio istituzionale della ricerca - Università di Palermo

Combining active learning and semi-supervised learning techniques to extract protein interaction sentences

Author: A Yakushiji
AC McCallum
B Cui
C Blaschke
C Friedman
CJC Burges
D Zhou
F Fung
G Erkan
G Schohn
G Tur
Hwanjo Yu
J Lafferty
J Pustejovsky
JM Temkin
KP Bennett
L Smith
M Huang
M Song
MC Jenkins
Min Song
O Chapelle
R Bunescu
S Kim
S Pyysalo
T Joachims
T Luo
T Mitsumori
T Ono
TK Jenssen
V Sindhwani
Wook-Shin Han
WW Chapman
X Zhu
Y Miyao
Z Yang
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Protein-protein interaction (PPI) extraction has been a focal point of many biomedical research and database curation tools. Both Active Learning (AL) and Semi-supervised SVMs have recently been applied to extract PPI automatically. In this paper, we explore combining AL with semi-supervised learning (SSL) to improve the performance of the PPI task. Methods: We propose a novel PPI extraction technique called PPISpotter by combining Deterministic Annealing-based SSL and an AL technique to extract protein-protein interaction. In addition, we extract a comprehensive set of features from MEDLINE records by Natural Language Processing (NLP) techniques, which further improve the SVM classifiers. In our feature selection technique, syntactic, semantic, and lexical properties of text are incorporated into feature selection, which boosts the system performance significantly. Results: By conducting experiments with three different PPI corpora, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs, such as Random Sampling, Clustering, and Transductive SVMs, in terms of precision, recall, and F-measure. Conclusions: Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.
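The general idea of pairing active learning with semi-supervised learning can be sketched as follows. This is not PPISpotter: in place of the Deterministic Annealing-based SSL named in the abstract, the sketch substitutes simple confidence-based pseudo-labeling, and it uses margin-based uncertainty sampling for the active learning part. The function name `split_by_margin`, the thresholds, and the sentence scores are hypothetical.

```python
def split_by_margin(scores, query_k=2, confident_margin=1.0):
    # `scores` maps an unlabeled sentence id to a signed classifier score
    # (for an SVM, the signed distance to the decision boundary).
    ranked = sorted(scores, key=lambda item: abs(scores[item]))
    # Active learning: ask a human to label the least certain sentences.
    to_query = ranked[:query_k]
    # Semi-supervised stand-in: pseudo-label sentences far from the boundary.
    pseudo_labels = {
        item: (1 if scores[item] > 0 else -1)
        for item in ranked[query_k:]
        if abs(scores[item]) >= confident_margin
    }
    return to_query, pseudo_labels

# Hypothetical scores for five unlabeled sentences.
scores = {"s1": 2.3, "s2": -0.1, "s3": 0.4, "s4": -1.7, "s5": 0.9}
to_query, pseudo_labels = split_by_margin(scores)
```

In a full loop, the queried labels and the pseudo-labels would both be added to the training set and the classifier retrained before the next round.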

Crossref

Springer - Publisher Connector

PubMed Central

포항공과대학교

High performance subgraph mining in molecular compounds

Author: M.J. Zaki
O. Weislow
R. Finkel
T. Washio
Y. Chung
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

Structured data represented in the form of graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load-balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to-linear speedup in a network of workstations.
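Receiver-initiated load balancing, as named in the abstract above, means that an idle worker asks a peer for work rather than busy workers pushing work out. The single-process simulation below sketches that idea on an artificially irregular search tree. It is not the paper's algorithm: the tree generator `expand`, the rule of asking the most loaded peer, and the choice to hand over half of the donor's queue are all assumptions made for illustration.

```python
from collections import deque

def expand(node):
    # Hypothetical irregular search tree: a node is (depth, label) and has
    # between 0 and 3 children depending on its label.
    depth, label = node
    if depth >= 4:
        return []
    fanout = (label * 7 + depth) % 4
    return [(depth + 1, label * 4 + i + 1) for i in range(fanout)]

def simulate(num_workers=4, root=(0, 1)):
    queues = [deque() for _ in range(num_workers)]
    queues[0].append(root)
    processed = [0] * num_workers
    transfers = 0
    while any(queues):
        for w in range(num_workers):
            if not queues[w]:
                # Receiver-initiated: the idle worker requests work from
                # the currently most loaded peer (assumed global view).
                donor = max(range(num_workers), key=lambda p: len(queues[p]))
                if len(queues[donor]) > 1:
                    # The donor gives away the older half of its queue.
                    for _ in range(len(queues[donor]) // 2):
                        queues[w].append(queues[donor].popleft())
                    transfers += 1
            if queues[w]:
                # Depth-first processing of the worker's own newest node.
                node = queues[w].pop()
                processed[w] += 1
                queues[w].extend(expand(node))
    return processed, transfers

processed, transfers = simulate()
```

Because the donor always keeps at least one node, work is never lost, and the total number of processed nodes equals the size of the search tree regardless of how it was split among workers.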

KOPS - The Institutional Repository of the University of Konstanz

Central Archive at the University of Reading

Crossref