Search CORE

5,899 research outputs found

Space-efficient Feature Maps for String Alignment Kernels

Author: CC Chang
G Cormode
H Lodhi
H Saigo
M Kanehisa
MC Ferris
RE Fan
S Kim
T Gärtner
T Hofmann
TF Smith
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

String kernels are attractive tools for analyzing string data. Among them, alignment kernels are known for their high prediction accuracy in string classification when combined with SVMs in various applications. However, alignment kernels have a crucial drawback: their computation scales quadratically with the number of input strings, which limits large-scale applications in practice. We address this limitation by presenting the first approximation for string alignment kernels, which we call space-efficient feature maps for edit distance with moves (SFMEDM), by leveraging a metric embedding named edit sensitive parsing (ESP) and feature maps (FMs) of random Fourier features (RFFs) for large-scale string analyses. The original FMs for RFFs consume memory proportional to the product of the dimension d of input vectors and the dimension D of output vectors, which prohibits their use at large scale. We present novel space-efficient feature maps (SFMs) of RFFs that reduce the space from O(dD) for the original FMs to O(d), with a theoretical guarantee in the form of concentration bounds. We experimentally test SFMEDM on its ability to learn SVMs for large-scale string classification on various massive string datasets, and we demonstrate the superior performance of SFMEDM with respect to prediction accuracy, scalability and computational efficiency. Comment: Full version for ICDM'19 paper.
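The abstract's starting point is the standard random Fourier feature map, whose D x d random matrix is what costs O(dD) memory. A minimal Python sketch of that original map for a Gaussian kernel follows, assuming plain real-valued input vectors in place of the ESP embeddings of strings. The function names, sigma, and the choice of the Gaussian kernel are illustrative assumptions, and the paper's space-efficient SFM construction is not reproduced here.

```python
import math
import random


def make_rff_map(d, D, sigma=1.0, seed=0):
    """Build a random Fourier feature map z: R^d -> R^D whose inner products
    approximate the Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2)).

    The D x d matrix W is stored explicitly, so memory grows as O(dD);
    this is the cost of the original feature maps.
    """
    rng = random.Random(seed)
    # Each row w_i ~ N(0, sigma^{-2} I), the spectral distribution of the Gaussian kernel.
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(D)]
    # Random phases b_i ~ Uniform[0, 2*pi).
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(D)]
    scale = math.sqrt(2.0 / D)

    def z(x):
        # z_i(x) = sqrt(2/D) * cos(<w_i, x> + b_i)
        return [scale * math.cos(sum(wj * xj for wj, xj in zip(w, x)) + bi)
                for w, bi in zip(W, b)]

    return z


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel, for comparison with the approximation."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


if __name__ == "__main__":
    z = make_rff_map(d=3, D=5000, sigma=1.0, seed=42)
    x, y = [0.1, 0.2, 0.3], [0.4, 0.0, -0.2]
    approx = sum(a * c for a, c in zip(z(x), z(y)))
    print(f"exact={gaussian_kernel(x, y):.4f} approx={approx:.4f}")
```

Storing W explicitly is exactly the O(dD) term; for a fixed pair of inputs, the inner product z(x)·z(y) approaches the exact kernel value as D grows, with error shrinking roughly as 1/sqrt(D).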

arXiv.org e-Print Archive

Crossref

The IT University of Copenhagen's Repository

Neural networks and support vector machines based bio-activity classification

Author: Salim Naomie
Zeb Shah Jehan
Publication date: 01/07/2006

Classification of compounds into their respective biological activity classes is important in drug discovery, particularly for virtual filtering and screening of compounds at an early phase. In this work, two types of neural networks, a multilayer perceptron (MLP) and a radial basis function (RBF) network, and a support vector machine (SVM) were employed to classify three types of biologically active enzyme inhibitors. Both networks were trained with the backpropagation learning method on chemical compounds whose inhibitory activity was previously known. A group of topological indices, selected with the help of principal component analysis (PCA), was used as descriptors. The results of all three classification methods show that both neural networks outperform the SVM.

Universiti Teknologi Malaysia Institutional Repository

Finding kernel function for stock market prediction with support vector regression

Author: Chai Chon Lung
Publication date: 01/04/2006

Stock market prediction is one of the most fascinating problems in stock market research. Accurate stock prediction is the biggest challenge in the investment industry because the distribution of stock data changes over time. Time series forecasting, neural networks (NN) and support vector machines (SVM) are commonly used to predict stock prices. In this study, the data mining task of time series forecasting is implemented. A large amount of stock data collected from the Kuala Lumpur Stock Exchange (KLSE) is used in the experiments to test the validity of SVM regression. SVM is a relatively new machine learning technique based on the structural risk minimization principle, which gives it greater generalization ability, and it has proven successful in time series prediction. Two kernel functions, the radial basis function and the polynomial kernel, are compared to find which yields more accurate predictions. In addition, a backpropagation neural network is used for comparison of prediction performance. Several experiments are conducted and their results analyzed. The results show that SVM with polynomial kernels provides a promising alternative tool for KLSE stock market prediction.
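To make the two kernel functions compared in this study concrete, here is a minimal Python sketch of the polynomial and RBF kernels, together with a sliding-window layout for one-step-ahead price regression. The window construction, default parameters, and function names are assumptions for illustration, not the study's actual setup.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def polynomial_kernel(x: Vector, y: Vector, degree: int = 3,
                      gamma: float = 1.0, coef0: float = 1.0) -> float:
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def rbf_kernel(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def sliding_windows(series: Sequence[float], width: int) -> List[Tuple[List[float], float]]:
    """Turn a price series into (previous `width` values, next value) pairs,
    a common input/target layout for one-step-ahead regression."""
    return [(list(series[i:i + width]), series[i + width])
            for i in range(len(series) - width)]


def gram_matrix(kernel: Callable[[Vector, Vector], float],
                xs: Sequence[Vector]) -> List[List[float]]:
    """Kernel (Gram) matrix over the training inputs, as used by an SVR solver."""
    return [[kernel(a, b) for b in xs] for a in xs]
```

The parameterization matches scikit-learn's `SVR`, where `kernel="poly"` computes `(gamma * <x, x'> + coef0) ** degree` and `kernel="rbf"` computes `exp(-gamma * ||x - x'||^2)`, so a comparison like the one described could be run by switching that single argument.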

Universiti Teknologi Malaysia Institutional Repository

Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach

Author: A Lavecchia
ACA Nascimento
C-C Chang
D Rogers
H Ding
L Jacob
M Bouchard
M Gonen
M Hattori
MN Drwal
S Daminelli
T Laarhoven van
TF Smith
Y Liu
Y Yamanishi
Publication date: 29/06/2017

Virtual screening (VS) is widely used during computational drug discovery to reduce costs. Chemogenomics-based virtual screening (CGBVS) can be used to predict new compound-protein interactions (CPIs) from known CPI network data using several methods, including machine learning and data mining. Although CGBVS enables highly efficient and accurate CPI prediction, it performs poorly when predicting interactions for new compounds whose CPIs are unknown. The pairwise kernel method (PKM) is a state-of-the-art CGBVS method and shows high accuracy for prediction of new compounds. In this study, on the basis of link mining, we improved the PKM by combining a link indicator kernel (LIK) with chemical similarity and evaluated the accuracy of these methods. The proposed method obtained an average area under the precision-recall curve (AUPR) of 0.562, higher than the 0.425 achieved by the conventional Gaussian interaction profile (GIP) method, while increasing calculation time by only a few percent.
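Two of the kernels named here can be sketched briefly. Assuming their usual definitions, a Gaussian interaction profile (GIP) kernel compares rows (or columns) of a binary compound-protein interaction matrix, and the pairwise kernel method scores a compound-protein pair with the product of a compound kernel and a protein kernel. The Python below is a hedged illustration with hypothetical names; the paper's link indicator kernel (LIK) and its chemical-similarity term are not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def gip_kernel(Y: Sequence[Sequence[float]], bandwidth: float = 1.0) -> Matrix:
    """Gaussian interaction profile kernel over the rows of a binary
    interaction matrix Y (rows = compounds, columns = proteins).

    gamma = bandwidth / (mean squared norm of the profiles), following the
    usual GIP normalisation.
    """
    n = len(Y)
    mean_sq = sum(sum(v * v for v in row) for row in Y) / n
    gamma = bandwidth / mean_sq if mean_sq > 0 else 0.0
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])))
             for j in range(n)] for i in range(n)]


def transpose(Y: Sequence[Sequence[float]]) -> Matrix:
    """Columns of Y become rows, giving protein interaction profiles."""
    return [list(col) for col in zip(*Y)]


def pairwise_kernel(Kc: Matrix, Kp: Matrix,
                    pair1: Tuple[int, int], pair2: Tuple[int, int]) -> float:
    """Pairwise (Kronecker) kernel: K((c, p), (c', p')) = Kc[c][c'] * Kp[p][p']."""
    (c1, p1), (c2, p2) = pair1, pair2
    return Kc[c1][c2] * Kp[p1][p2]
```

This also suggests why GIP struggles with new compounds: a compound with no known CPIs has an all-zero interaction profile, so its GIP similarity to another compound depends only on how many interactions that compound has, not on chemistry.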

arXiv.org e-Print Archive

Crossref

A Comparison of Multi-instance Learning Algorithms

Author: Dong Lin
Publication venue: The University of Waikato
Publication date: 01/01/2006

Motivated by various challenging real-world applications, such as drug activity prediction and image retrieval, multi-instance (MI) learning has attracted considerable interest in recent years. Compared with standard supervised learning, the MI learning task is more difficult because the label information of each training example is incomplete. Many MI algorithms have been proposed. Some of them are specifically designed for MI problems, whereas others have been upgraded or adapted from standard single-instance learning algorithms. Most algorithms have been evaluated on only one or two benchmark datasets, and there is a lack of systematic comparisons of MI learning algorithms. This thesis presents a comprehensive study of MI learning algorithms that aims to compare their performance and find a suitable way to address different MI problems. First, it briefly reviews the history of research on MI learning. Then it discusses five general classes of MI approaches covering a total of 16 MI algorithms. After that, it presents empirical results for these algorithms obtained on 15 datasets from five different real-world application domains. Finally, some conclusions are drawn from these results: (1) applying suitable standard single-instance learners to MI problems can often generate the best result on the datasets tested, (2) algorithms exploiting the standard asymmetric MI assumption do not show significant advantages over approaches using the so-called collective assumption, and (3) different MI approaches are suitable for different application domains, and no MI algorithm works best on all MI problems.
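The difference between the two MI assumptions compared in the thesis can be shown in a few lines. In this Python sketch, an instance-level classifier is assumed to be available already (the `instance_is_positive` and `instance_prob` callables are hypothetical). The standard assumption labels a bag positive if any instance is positive, while the collective assumption lets every instance contribute equally, here by averaging instance probabilities. Averaging is one common way to realize the collective assumption, not the only one.

```python
from typing import Callable, Sequence

Instance = Sequence[float]
Bag = Sequence[Instance]


def standard_mi_label(bag: Bag,
                      instance_is_positive: Callable[[Instance], bool]) -> bool:
    """Standard (asymmetric) MI assumption: a bag is positive
    if and only if at least one of its instances is positive."""
    return any(instance_is_positive(x) for x in bag)


def collective_mi_prob(bag: Bag,
                       instance_prob: Callable[[Instance], float]) -> float:
    """Collective assumption: all instances contribute equally, so the bag's
    positive-class probability is the mean of its instance probabilities."""
    return sum(instance_prob(x) for x in bag) / len(bag)
```

For a bag with one clearly positive instance among three negatives, the standard rule returns positive while the averaged probability is 0.25, so the two assumptions disagree at a 0.5 decision threshold.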

Research Commons@Waikato

The pharmacophore kernel for virtual screening with support vector machines

Author: Mahé Pierre
Ralaivola Liva
Stoven Véronique
Vert Jean-Philippe
Publication date: 03/03/2006

We introduce a family of positive definite kernels specifically optimized for the manipulation of 3D structures of molecules with kernel methods. The kernels are based on the comparison of the three-point pharmacophores present in the 3D structures of molecules, a set of molecular features known to be particularly relevant for virtual screening applications. We present a computationally demanding exact implementation of these kernels, as well as fast approximations related to the classical fingerprint-based approaches. Experimental results suggest that this new approach outperforms state-of-the-art algorithms based on the 2D structure of molecules for the detection of inhibitors of several drug targets.
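A simplified Python sketch of an exact pharmacophore-style kernel is given below, assuming each molecule is a list of labelled 3D points. Every triplet of points forms a three-point pharmacophore; two triplets are compared by requiring matching feature labels and applying a Gaussian to the differences of their pairwise distances, summed over the orderings of the second triplet. The labels, the Gaussian distance term, and sigma are assumptions; this is in the spirit of the kernels described, not the authors' exact definition or their fingerprint-based approximation.

```python
import math
from itertools import combinations, permutations

# Hypothetical representation: a molecule is a list of (feature_label, (x, y, z)).


def pharmacophores(molecule):
    """All three-point pharmacophores: every triplet of labelled points."""
    return list(combinations(molecule, 3))


def triplet_kernel(p, q, sigma=1.0):
    """Compare two triplets. Labels must match point by point, and the three
    corresponding pairwise distances are compared with a Gaussian. Summing
    over the orderings of q makes the score independent of point order."""
    total = 0.0
    for qp in permutations(q):
        if any(a[0] != b[0] for a, b in zip(p, qp)):
            continue  # feature labels disagree under this ordering
        sq = 0.0
        for i, j in ((0, 1), (0, 2), (1, 2)):
            dp = math.dist(p[i][1], p[j][1])
            dq = math.dist(qp[i][1], qp[j][1])
            sq += (dp - dq) ** 2
        total += math.exp(-sq / (2.0 * sigma ** 2))
    return total


def pharmacophore_kernel(m1, m2, sigma=1.0):
    """Exact-style kernel: sum of triplet similarities over all pairs of
    pharmacophores drawn from the two molecules."""
    return sum(triplet_kernel(p, q, sigma)
               for p in pharmacophores(m1) for q in pharmacophores(m2))
```

The double loop over all triplets of both molecules is cubic in each molecule's number of points, which illustrates why the exact computation is demanding and why faster fingerprint-style approximations are attractive.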

arXiv.org e-Print Archive

HAL AMU

HAL-MINES ParisTech