222 research outputs found

GCondNet: A Novel Method for Improving Neural Networks on Small High-Dimensional Tabular Data

Author: Jamnik Mateja
Lio Pietro
Margeloiu Andrei
Simidjievski Nikola
Publication date: 29/05/2023

Neural network models often struggle with high-dimensional but small sample-size tabular datasets. One reason is that current weight initialisation methods assume independence between weights, which can be problematic when there are insufficient samples to estimate the model's parameters accurately. In such small data scenarios, leveraging additional structures can improve the model's training stability and performance. To address this, we propose GCondNet, a general approach to enhance neural networks by leveraging implicit structures present in tabular data. We create a graph between samples for each data dimension, and utilise Graph Neural Networks (GNNs) for extracting this implicit structure, and for conditioning the parameters of the first layer of an underlying predictor MLP network. By creating many small graphs, GCondNet exploits the data's high-dimensionality, and thus improves the performance of an underlying predictor network. We demonstrate the effectiveness of our method on nine real-world datasets, where GCondNet outperforms 14 standard and state-of-the-art methods. The results show that GCondNet is robust and can be applied to any small sample-size and high-dimensional tabular learning task.

Comment: Early version presented at the 17th Machine Learning in Computational Biology (MLCB) meeting, 202

arXiv.org e-Print Archive

Exploiting physico-chemical properties in string kernels

Author: B Peters
B Shen
C Leslie
Christian Widmer
CS Ong
CW Tung
G Rätsch
G Schweikert
Gunnar Rätsch
H Rangwala
H Saigo
J Weston
L Jacob
M Röttig
M Venkatarajan
N Pfeifer
Nora C Toussaint
Oliver Kohlbacher
R Kuang
RM Clark
S Henikoff
S Kawashima
S Sonnenburg
SJ Schultheiss
V Roth
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: String kernels are commonly used for the classification of biological sequences, both nucleotide and amino acid sequences. Although string kernels are already very powerful, they have a major shortcoming when it comes to amino acids: they ignore an important piece of information when comparing amino acids, namely physico-chemical properties such as size, hydrophobicity, or charge. This information is very valuable, especially when training data is less abundant. So far, only very few approaches have aimed at combining these two ideas.

Results: We propose new string kernels that combine the benefits of physico-chemical descriptors for amino acids with those of string kernels. The benefits of the proposed kernels are assessed on two problems: MHC-peptide binding classification using position-specific kernels, and protein classification based on the substring spectrum of the sequences. Our experiments demonstrate that incorporating amino acid properties in string kernels yields improved performance compared to standard string kernels and to previously proposed non-substring kernels.

Conclusions: In summary, the proposed modifications, in particular the combination with the RBF substring kernel, consistently yield improvements without affecting the computational complexity. The proposed kernels therefore appear to be the kernels of choice for any protein sequence-based inference.

Availability: Data sets, code and additional information are available from http://www.fml.tuebingen.mpg.de/raetsch/suppl/aask. Implementations of the developed kernels are available as part of the Shogun toolbox.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MPG.PuRe