DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences
Identification of drug-target interactions (DTIs) plays a key role in drug
discovery. The high cost and labor-intensive nature of in vitro and in vivo
experiments have highlighted the importance of in silico-based DTI prediction
approaches. In several computational models, conventional protein descriptors have been shown to be not informative enough for accurate DTI prediction. Thus, in this
study, we employ a convolutional neural network (CNN) on raw protein sequences
to capture local residue patterns participating in DTIs. With CNN on protein
sequences, our model performs better than previous protein descriptor-based
models. In addition, our model performs better than the previous deep learning
model for large-scale prediction of DTIs. By examining the pooled convolution
results, we found that our model can detect binding sites of proteins for DTIs.
In conclusion, our prediction model for detecting local residue patterns of
target proteins successfully enriches the protein features of a raw protein
sequence, yielding better prediction results than previous approaches.
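As an illustration of the general approach described above (a minimal sketch, not the authors' exact DeepConv-DTI architecture), the following Python snippet runs a 1-D CNN over one-hot-encoded protein sequences and combines the pooled features with a drug fingerprint vector; the alphabet size, filter counts, and window lengths are illustrative assumptions.

import torch
import torch.nn as nn

class SimpleDTIModel(nn.Module):
    def __init__(self, n_amino_acids=25, drug_dim=2048, n_filters=64, kernel_sizes=(4, 8, 12)):
        super().__init__()
        # One convolution branch per window length; each captures local residue
        # patterns of a different size, followed by global max pooling.
        self.convs = nn.ModuleList(
            [nn.Conv1d(n_amino_acids, n_filters, k) for k in kernel_sizes]
        )
        self.drug_fc = nn.Linear(drug_dim, 128)
        self.classifier = nn.Sequential(
            nn.Linear(n_filters * len(kernel_sizes) + 128, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, protein_onehot, drug_fp):
        # protein_onehot: (batch, n_amino_acids, seq_len); drug_fp: (batch, drug_dim)
        pooled = [torch.relu(conv(protein_onehot)).max(dim=2).values for conv in self.convs]
        protein_feat = torch.cat(pooled, dim=1)
        drug_feat = torch.relu(self.drug_fc(drug_fp))
        logit = self.classifier(torch.cat([protein_feat, drug_feat], dim=1))
        return torch.sigmoid(logit).squeeze(1)  # probability of interaction

Global max pooling over each filter's activations is what makes it possible to ask which sequence positions activated a filter, loosely mirroring the binding-site inspection described in the abstract.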
A Survey on Graph Kernels
Graph kernels have become an established and widely-used technique for
solving classification tasks on graphs. This survey gives a comprehensive
overview of techniques for kernel-based graph classification developed in the
past 15 years. We describe and categorize graph kernels based on properties
inherent to their design, such as the nature of their extracted graph features,
their method of computation and their applicability to problems in practice. In
an extensive experimental evaluation, we study the classification accuracy of a
large suite of graph kernels on established benchmarks as well as new datasets.
We compare the performance of popular kernels with several baseline methods and
study the effect of applying a Gaussian RBF kernel to the metric induced by a
graph kernel. In doing so, we find that simple baselines become competitive
after this transformation on some datasets. Moreover, we study the extent to
which existing graph kernels agree in their predictions (and prediction errors)
and obtain a data-driven categorization of kernels as a result. Finally, based on
our experimental results, we derive a practitioner's guide to kernel-based
graph classification.
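The transformation studied in the survey, applying a Gaussian RBF kernel to the metric induced by a graph kernel, can be sketched in a few lines of Python; the bandwidth gamma and the SVM used downstream are illustrative assumptions rather than choices prescribed by the survey.

import numpy as np
from sklearn.svm import SVC

def rbf_from_graph_kernel(K, gamma=0.1):
    # K: (n, n) positive semi-definite graph-kernel matrix.
    diag = np.diag(K)
    # Squared distance induced by the kernel: d(i, j)^2 = k(i, i) + k(j, j) - 2 k(i, j)
    d2 = diag[:, None] + diag[None, :] - 2.0 * K
    d2 = np.maximum(d2, 0.0)  # guard against small negative values from rounding
    return np.exp(-gamma * d2)

# Usage with a precomputed-kernel SVM (K_train and labels y are assumed to be given):
# clf = SVC(kernel="precomputed").fit(rbf_from_graph_kernel(K_train), y)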
An Evolutionary Variable Neighborhood Search for Selecting Combinational Gene Signatures in Predicting Chemo-Response of Osteosarcoma
In genomic studies of cancer, identifying genetic biomarkers by analyzing microarray chips that interrogate thousands of genes is important for diagnosis and therapeutics. However, the commonly used statistical significance analysis can only provide information about each single gene, neglecting the intrinsic interactions among genes. Therefore, methods aiming at combinational gene signatures are highly valuable. Supervised classification is an effective way to assess how well a gene combination differentiates various groups of samples. In this paper, an evolutionary variable neighborhood search (EVNS) that integrates the approaches of evolutionary algorithms and variable neighborhood search (VNS) is introduced. It maintains a population of solutions whose evolution is driven by a variable neighborhood search operator instead of the usual reproduction operators, crossover and mutation, used in evolutionary algorithms. It is an efficient search algorithm especially suitable for very large solution spaces. The proposed EVNS can simultaneously optimize the feature subset and the classifier through a common solution coding mechanism. The method was applied to search for combinational gene signatures predicting the histologic response to chemotherapy in patients with osteosarcoma, the most common malignant bone tumor in children. Cross-validation results show that EVNS outperforms other existing approaches in classifying initial biopsy samples.
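A rough Python sketch of the general idea follows (not the authors' EVNS implementation): a population of binary gene-selection masks is improved by a variable-neighborhood move that flips k genes at a time, widening k whenever no improvement is found; the fitness function, cross-validated accuracy of a k-nearest-neighbor classifier on the selected genes, is an illustrative assumption.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y):
    # Cross-validated accuracy of a simple classifier on the selected genes.
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(KNeighborsClassifier(n_neighbors=3), X[:, mask], y, cv=5).mean()

def evns_select(X, y, pop_size=20, k_max=5, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    pop = rng.random((pop_size, n_genes)) < 0.05            # sparse initial gene masks
    scores = np.array([fitness(m, X, y) for m in pop])
    for _ in range(iters):
        i = rng.integers(pop_size)                          # pick one solution to improve
        k = 1
        while k <= k_max:                                   # variable-neighborhood step
            cand = pop[i].copy()
            flip = rng.choice(n_genes, size=k, replace=False)
            cand[flip] = ~cand[flip]
            s = fitness(cand, X, y)
            if s > scores[i]:                               # improvement: accept, restart at k = 1
                pop[i], scores[i] = cand, s
                k = 1
            else:                                           # no improvement: widen the neighborhood
                k += 1
    best = int(scores.argmax())
    return np.flatnonzero(pop[best]), scores[best]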
Mean-Field Theory of Meta-Learning
We discuss here the mean-field theory for a cellular automata model of
meta-learning. Meta-learning is the process of combining the outcomes of
individual learning procedures in order to determine the final decision with
higher accuracy than any single learning method. Our method is constructed from
an ensemble of interacting learning agents that acquire and process incoming
information using various types, or different versions, of machine learning
algorithms. The abstract learning space, where all agents are located, is
constructed here using a fully connected model that couples all agents with
random strength values. The cellular automata network simulates the higher
level integration of information acquired from the independent learning trials.
The final classification of incoming input data is therefore defined as the
stationary state of the meta-learning system using a simple majority rule, yet
minority clusters that share the opposite classification outcome can be
observed in the system. Therefore, the probability of selecting the proper class
for a given input can be estimated even without prior knowledge of its
affiliation. Fuzzy logic can easily be introduced into the system, even if the
learning agents are built from simple binary classification machine learning
algorithms, by calculating the percentage of agreeing agents.
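The majority-rule read-out described above (the final decision step only, not the cellular-automata dynamics) can be sketched as follows; encoding the agents' votes as +1/-1 is an illustrative assumption.

import numpy as np

def majority_readout(votes):
    # votes: array of +1 / -1 outcomes, one per learning agent.
    votes = np.asarray(votes)
    decision = 1 if votes.sum() >= 0 else -1
    confidence = np.mean(votes == decision)  # fraction of agents in the majority cluster
    return decision, confidence

# Example: 7 of 10 agents vote +1, so the decision is +1 with estimated probability 0.7.
decision, confidence = majority_readout([1, 1, 1, 1, 1, 1, 1, -1, -1, -1])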
Correcting for selection bias via cross-validation in the classification of microarray data
There is increasing interest in the use of diagnostic rules based on
microarray data. These rules are formed by considering the expression levels of
thousands of genes in tissue samples taken on patients of known classification
with respect to a number of classes, representing, say, disease status or
treatment strategy. As the final versions of these rules are usually based on a
small subset of the available genes, there is a selection bias that has to be
corrected for in the estimation of the associated error rates. We consider the
problem using cross-validation. In particular, we present explicit formulae
that are useful in explaining the layers of validation that have to be
performed in order to avoid improperly cross-validated estimates. Published in the IMS Collections (http://www.imstat.org/publications/imscollections.htm) by the Institute of Mathematical Statistics; DOI: 10.1214/193940307000000284.
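The practical consequence of the argument can be sketched as follows: the gene-selection step must be re-fitted inside every cross-validation fold rather than applied once to the full data set. The univariate F-test selector and linear SVM below are illustrative assumptions, not the paper's specific rule.

from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def unbiased_error_estimate(X, y, n_genes=50, n_folds=10):
    # Placing selection inside the pipeline ensures it is refit on each training fold,
    # so the held-out samples never influence which genes are chosen.
    model = make_pipeline(SelectKBest(f_classif, k=n_genes), LinearSVC())
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    return 1.0 - cross_val_score(model, X, y, cv=cv).mean()  # cross-validated error rate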
Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data
DNA microarray technology provides a promising approach to the diagnosis and prognosis of tumors on a genome-wide scale by monitoring the expression levels of thousands of genes simultaneously. One problem arising from the use of microarray data is the difficulty of analyzing high-dimensional gene expression data, typically with thousands of variables (genes) and far fewer observations (samples), in which severe collinearity is often observed. This makes it difficult to apply classical statistical methods directly to microarray data. In this paper, total principal component regression (TPCR) was proposed to classify human tumors by extracting the latent variable structure underlying microarray data from the augmented subspace of both the independent and dependent variables. One of the salient features of our method is that it takes into account not only the latent variable structure but also the errors in the microarray gene expression profiles (independent variables). The prediction performance of TPCR was evaluated by both leave-one-out and leave-half-out cross-validation using four well-known microarray datasets. The stability and reliability of the classification models were further assessed by re-randomization and permutation studies. A fast kernel algorithm was applied to decrease the computation time dramatically. (MATLAB source code is available upon request.)
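A rough sketch of the core idea as stated in the abstract (not the paper's exact TPCR algorithm): latent components are extracted from the augmented matrix [X | Y], so that measurement error in the expression block X is modelled alongside the class-indicator block Y; the number of components and the regression used to project new samples are illustrative assumptions.

import numpy as np

def tpcr_like_fit(X, Y, n_components=3):
    # X: (n, p) expression matrix; Y: (n, c) one-hot class indicators.
    Z = np.hstack([X, Y])
    U, S, Vt = np.linalg.svd(Z - Z.mean(axis=0), full_matrices=False)
    T = U[:, :n_components] * S[:n_components]         # latent scores of the augmented data
    # Regress the latent scores on X so new samples (with unknown Y) can be projected.
    B, *_ = np.linalg.lstsq(X - X.mean(axis=0), T, rcond=None)
    W_y = Vt[:n_components, X.shape[1]:]                # maps latent scores back to the Y block
    return X.mean(axis=0), Y.mean(axis=0), B, W_y

def tpcr_like_predict(X_new, x_mean, y_mean, B, W_y):
    scores = (X_new - x_mean) @ B
    return (scores @ W_y + y_mean).argmax(axis=1)       # predicted class index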