234 research outputs found
A Survey on Graph Kernels
Graph kernels have become an established and widely-used technique for
solving classification tasks on graphs. This survey gives a comprehensive
overview of techniques for kernel-based graph classification developed in the
past 15 years. We describe and categorize graph kernels based on properties
inherent to their design, such as the nature of their extracted graph features,
their method of computation and their applicability to problems in practice. In
an extensive experimental evaluation, we study the classification accuracy of a
large suite of graph kernels on established benchmarks as well as new datasets.
We compare the performance of popular kernels with several baseline methods and
study the effect of applying a Gaussian RBF kernel to the metric induced by a
graph kernel. In doing so, we find that simple baselines become competitive
after this transformation on some datasets. Moreover, we study the extent to
which existing graph kernels agree in their predictions (and prediction errors)
and obtain a data-driven categorization of kernels as result. Finally, based on
our experimental results, we derive a practitioner's guide to kernel-based
graph classification
A novel kernel based approach to arbitrary length symbolic data with application to type 2 diabetes risk
Predictive modeling of clinical data is fraught with challenges arising from the manner in which events are recorded. Patients typically fall ill at irregular intervals and experience dissimilar intervention trajectories. This results in irregularly sampled and uneven length data which poses a problem for standard multivariate tools. The alternative of feature extraction into equal-length vectors via methods like Bag-of-Words (BoW) potentially discards useful information. We propose an approach based on a kernel framework in which data is maintained in its native form: discrete sequences of symbols. Kernel functions derived from the edit distance between pairs of sequences may then be utilized in conjunction with support vector machines to classify the data. Our method is evaluated in the context of the prediction task of determining patients likely to develop type 2 diabetes following an earlier episode of elevated blood pressure of 130/80 mmHg. Kernels combined via multi kernel learning achieved an F1-score of 0.96, outperforming classification with SVM 0.63, logistic regression 0.63, Long Short Term Memory 0.61 and Multi-Layer Perceptron 0.54 applied to a BoW representation of the data. We achieved an F1-score of 0.97 on MKL on external dataset. The proposed approach is consequently able to overcome limitations associated with feature-based classification in the context of clinical data
Local kernel canonical correlation analysis with application to virtual drug screening
Drug discovery is the process of identifying compounds which have potentially meaningful biological activity. A major challenge that arises is that the number of compounds to search over can be quite large, sometimes numbering in the millions, making experimental testing intractable. For this reason computational methods are employed to filter out those compounds which do not exhibit strong biological activity. This filtering step, also called virtual screening reduces the search space, allowing for the remaining compounds to be experimentally tested
Large-scale classification based on support vector machine
Esta tese propón o fast support vector classifier, unha versión eficiente da
máquina de vectores de soporte (SVM) con cerne gausiano para problemas de clasificación grandes. Este
clasificador acada un acerto cercano aos mellores métodos dispoñíbeis, sendo moito máis rápido que aqueles en
conxuntos de ata 31 millóns de datos, 30.000 entradas e 131 clases. Tamén axusta os requerimentos de memoria,
permitindo a súa execución en datos de tamano case arbitrariamente grande. Esta tese tamén propón o algoritmo
ideal kernel tuning, un método de sintonización eficiente da anchura do cerne gausiano para a SVM, método que é
o máis rápido comparado con outras 5 alternativas da literatura, cun acerto moi perto do mellor dispoñíbel
actualmente e cun reducido consumo de memoria
Learning to Predict Combinatorial Structures
The major challenge in designing a discriminative learning algorithm for
predicting structured data is to address the computational issues arising from
the exponential size of the output space. Existing algorithms make different
assumptions to ensure efficient, polynomial time estimation of model
parameters. For several combinatorial structures, including cycles, partially
ordered sets, permutations and other graph classes, these assumptions do not
hold. In this thesis, we address the problem of designing learning algorithms
for predicting combinatorial structures by introducing two new assumptions: (i)
The first assumption is that a particular counting problem can be solved
efficiently. The consequence is a generalisation of the classical ridge
regression for structured prediction. (ii) The second assumption is that a
particular sampling problem can be solved efficiently. The consequence is a new
technique for designing and analysing probabilistic structured prediction
models. These results can be applied to solve several complex learning problems
including but not limited to multi-label classification, multi-category
hierarchical classification, and label ranking.Comment: PhD thesis, Department of Computer Science, University of Bonn
(submitted, December 2009
The analysis and advanced extensions of canonical correlation analysis
Drug discovery is the process of identifying compounds which have potentially meaningful biological activity. A problem that arises is that the number of compounds to search over can be quite large, sometimes numbering in the millions, making experimental testing intractable. For this reason computational methods are employed to filter out those compounds which do not exhibit strong biological activity. This filtering step, also called virtual screening reduces the search space, allowing for the remaining compounds to be experimentally tested. In this dissertation I will provide an approach to the problem of virtual screening based on Canonical Correlation Analysis (CCA) and several extensions which use kernel and spectral learning ideas. Specifically these methods will be applied to the protein ligand matching problem. Additionally, theoretical results analyzing the behavior of CCA in the High Dimension Low Sample Size (HDLSS) setting will be provided
Unveiling the frontiers of deep learning: innovations shaping diverse domains
Deep learning (DL) enables the development of computer models that are
capable of learning, visualizing, optimizing, refining, and predicting data. In
recent years, DL has been applied in a range of fields, including audio-visual
data processing, agriculture, transportation prediction, natural language,
biomedicine, disaster management, bioinformatics, drug design, genomics, face
recognition, and ecology. To explore the current state of deep learning, it is
necessary to investigate the latest developments and applications of deep
learning in these disciplines. However, the literature is lacking in exploring
the applications of deep learning in all potential sectors. This paper thus
extensively investigates the potential applications of deep learning across all
major fields of study as well as the associated benefits and challenges. As
evidenced in the literature, DL exhibits accuracy in prediction and analysis,
makes it a powerful computational tool, and has the ability to articulate
itself and optimize, making it effective in processing data with no prior
training. Given its independence from training data, deep learning necessitates
massive amounts of data for effective analysis and processing, much like data
volume. To handle the challenge of compiling huge amounts of medical,
scientific, healthcare, and environmental data for use in deep learning, gated
architectures like LSTMs and GRUs can be utilized. For multimodal learning,
shared neurons in the neural network for all activities and specialized neurons
for particular tasks are necessary.Comment: 64 pages, 3 figures, 3 table
- …