    Learning with Kernels

    Classifying LEP Data with Support Vector Algorithms

    We have studied the application of different classification algorithms in the analysis of simulated high energy physics data. Whereas Neural Network algorithms have become a standard tool for data analysis, the performance of other classifiers, such as Support Vector Machines, has not yet been tested in this environment. We chose two different problems to compare the performance of a Support Vector Machine and a Neural Net trained with back-propagation: tagging events of the type e+e- -> ccbar and the identification of muons produced in multihadronic e+e- annihilation events. Comment: 7 pages, 4 figures, submitted to proceedings of AIHENP99, Crete, April 1999
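    As a rough illustration of this kind of comparison (not the paper's actual setup: the simulated LEP events, feature variables, and network architecture are not reproduced here), a minimal scikit-learn sketch pitting an RBF-kernel SVM against a small back-propagation network on synthetic data:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.neural_network import MLPClassifier

    # Synthetic stand-in for the simulated event data (illustrative only).
    X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    svm = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)        # support vector machine
    net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500,
                        random_state=0).fit(X_tr, y_tr)   # back-propagation net

    print("SVM accuracy:", svm.score(X_te, y_te))
    print("NN  accuracy:", net.score(X_te, y_te))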

    A Kernel Method for the Two-sample Problem

    We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear time approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (e.g. a Banach space). We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
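    A minimal sketch of the quadratic-time statistic described above: the unbiased estimate of the squared maximum mean discrepancy with a Gaussian RBF kernel. The bandwidth gamma is an arbitrary assumption, and the test thresholds (large-deviation or asymptotic) are not reproduced:

    import numpy as np

    def mmd2_unbiased(X, Y, gamma=1.0):
        """Unbiased quadratic-time estimate of squared MMD between samples X, Y."""
        def k(A, B):  # Gaussian RBF kernel matrix
            sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
            return np.exp(-gamma * sq)
        m, n = len(X), len(Y)
        Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
        # Diagonal terms are dropped so the within-sample averages are unbiased.
        return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
                + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
                - 2 * Kxy.mean())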

    A framework for space-efficient string kernels

    String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient or incur large slowdowns. We show that a number of exact string kernels, like the k-mer kernel, the substring kernels, a number of length-weighted kernels, the minimal absent words kernel, and kernels with Markovian corrections, can all be computed in O(nd) time and in o(n) bits of space in addition to the input, using just a rangeDistinct data structure on the Burrows-Wheeler transform of the input strings, which takes O(d) time per element in its output. The same bounds hold for a number of measures of compositional complexity based on multiple values of k, like the k-mer profile and the k-th order empirical entropy, and for calibrating the value of k using the data.
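    For concreteness, a naive sketch of one kernel from this family, the k-mer (spectrum) kernel, i.e. the inner product of the two sequences' k-mer count vectors. This is only the textbook definition using linear space, not the paper's o(n)-bit computation via the Burrows-Wheeler transform and a rangeDistinct structure:

    from collections import Counter

    def kmer_kernel(s, t, k=3):
        """k-mer (spectrum) kernel: inner product of the k-mer count vectors."""
        cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
        ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
        return sum(cs[w] * ct[w] for w in cs.keys() & ct.keys())

    print(kmer_kernel("ACGTACGT", "CGTACGTA", k=3))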

    Hilbert Space Representations of Probability Distributions

    Many problems in unsupervised learning require the analysis of features of probability distributions. At the most fundamental level, we might wish to determine whether two distributions are the same, based on samples from each; this is known as the two-sample or homogeneity problem. We use kernel methods to address this problem, by mapping probability distributions to elements in a reproducing kernel Hilbert space (RKHS). Given a sufficiently rich RKHS, these representations are unique: thus comparing feature space representations allows us to compare distributions without ambiguity. Applications include testing whether cancer subtypes are distinguishable on the basis of DNA microarray data, and whether low frequency oscillations measured at an electrode in the cortex have a different distribution during a neural spike. A more difficult problem is to discover whether two random variables drawn from a joint distribution are independent. It turns out that any dependence between pairs of random variables can be encoded in a cross-covariance operator between appropriate RKHS representations of the variables, and we may test independence by looking at a norm of the operator. We demonstrate this independence test by establishing dependence between an English text and its French translation, as opposed to French text on the same topic but otherwise unrelated. Finally, we show that this operator norm is itself a difference in feature means.
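    As a sketch of the independence statistic mentioned above: the squared Hilbert-Schmidt norm of the empirical cross-covariance operator can be computed from centered Gram matrices. This is the biased empirical HSIC; the RBF kernels and bandwidth gamma are assumptions, and the null-distribution threshold needed for an actual test is not shown:

    import numpy as np

    def hsic_biased(X, Y, gamma=1.0):
        """Biased empirical HSIC for paired samples X, Y of equal size n:
        trace(K H L H) / n^2, with RBF Gram matrices K, L and centering H."""
        def gram(A):
            sq = (A**2).sum(1)[:, None] + (A**2).sum(1)[None, :] - 2 * A @ A.T
            return np.exp(-gamma * sq)
        n = len(X)
        H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
        return np.trace(gram(X) @ H @ gram(Y) @ H) / n**2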

    The devices, experimental scaffolds, and biomaterials ontology (DEB): a tool for mapping, annotation, and analysis of biomaterials' data

    The size and complexity of the biomaterials literature make systematic data analysis an excruciating manual task. A practical solution is creating databases and information resources. Implant design and biomaterials research can greatly benefit from an open database for systematic data retrieval. Ontologies are pivotal to knowledge base creation, serving to represent and organize domain knowledge. To name but two examples, GO, the Gene Ontology, and ChEBI, the Chemical Entities of Biological Interest ontology, together with their associated databases, are central resources for their respective research communities. The creation of the devices, experimental scaffolds, and biomaterials ontology (DEB), an open resource for organizing information about biomaterials, their design, manufacture, and biological testing, is described. It is developed using text analysis for identifying ontology terms from a biomaterials gold standard corpus, systematically curated to represent the domain's lexicon. Topics covered are validated by members of the biomaterials research community. The ontology may be used for searching terms, performing annotations for machine learning applications, standardized meta-data indexing, and other cross-disciplinary data exploitation. The input of the biomaterials community to this effort to create data-driven open-access research tools is encouraged and welcomed.

    Reproducing Kernels of Generalized Sobolev Spaces via a Green Function Approach with Distributional Operators

    In this paper we introduce a generalized Sobolev space by defining a semi-inner product formulated in terms of a vector distributional operator $\mathbf{P}$ consisting of finitely or countably many distributional operators $P_n$, which are defined on the dual space of the Schwartz space. The types of operators we consider include not only differential operators, but also more general distributional operators such as pseudo-differential operators. We deduce that a certain appropriate full-space Green function $G$ with respect to $L := \mathbf{P}^{\ast T}\mathbf{P}$ becomes a conditionally positive definite function. In order to support this claim we ensure that the distributional adjoint operator $\mathbf{P}^{\ast}$ of $\mathbf{P}$ is well-defined in the distributional sense. Under sufficient conditions, the native space (reproducing-kernel Hilbert space) associated with the Green function $G$ can be isometrically embedded into, or even be isometrically equivalent to, a generalized Sobolev space. As an application, we take linear combinations of translates of the Green function, with possibly added polynomial terms, and construct a multivariate minimum-norm interpolant $s_{f,X}$ to data values sampled from an unknown generalized Sobolev function $f$ at data sites located in some set $X \subset \mathbb{R}^d$. We provide several examples, such as Matérn kernels or Gaussian kernels, that illustrate how many reproducing-kernel Hilbert spaces of well-known reproducing kernels are isometrically equivalent to a generalized Sobolev space. These examples further illustrate how we can rescale the Sobolev spaces by the vector distributional operator $\mathbf{P}$. Introducing the notion of scale as part of the definition of a generalized Sobolev space may help us to choose the "best" kernel function for kernel-based approximation methods. Comment: Updated version of the paper published in Numer. Math., close to Qi Ye's Ph.D. thesis (http://mypages.iit.edu/~qye3/PhdThesis-2012-AMS-QiYe-IIT.pdf)
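    A minimal numerical sketch of the minimum-norm interpolant $s_{f,X}$, restricted to the special case of a strictly positive definite Gaussian kernel so that the polynomial terms needed for merely conditionally positive definite kernels can be omitted; the shape parameter gamma and the small jitter reg are assumptions:

    import numpy as np

    def kernel_interpolant(X, f_vals, gamma=1.0, reg=1e-10):
        """Return s(x) = sum_j a_j K(x, x_j), with coefficients solving K a = f(X)."""
        def K(A, B):  # Gaussian kernel matrix
            sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
            return np.exp(-gamma * sq)
        a = np.linalg.solve(K(X, X) + reg * np.eye(len(X)), f_vals)
        return lambda Z: K(Z, X) @ a

    X = np.random.rand(50, 2)                    # data sites in [0,1]^2
    s = kernel_interpolant(X, np.sin(X.sum(1)))  # interpolate f(x) = sin(x1 + x2)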

    Uncertainty in context-aware systems: A case study for intelligent environments

    Data used by context-aware systems are naturally incomplete and do not always reflect real situations. The dynamic nature of intelligent environments leads to the need to analyse and handle uncertain information. Users can change their acting patterns within a short space of time. This paper presents a case study for a better understanding of concepts related to context awareness and the problem of dealing with inaccurate data. Through the analysis and identification of the elements that result in the construction of unreliable contexts, we aim to identify patterns that minimize incompleteness. Thus, it will be possible to deal with flaws caused by undesired execution of applications. Programa Operacional Temático Factores de Competitividade (POCI-01-0145-

    An incremental dual nu-support vector regression algorithm

    Support vector regression (SVR) has been a hot research topic for several years, as it is an effective regression learning algorithm. Early studies on SVR mostly focused on solving large-scale problems. Nowadays, an increasing number of researchers are focusing on incremental SVR algorithms. However, these incremental SVR algorithms cannot handle uncertain data, which are very common in real life, because they require the training data to be precise. Therefore, to handle the incremental regression problem with uncertain data, an incremental dual nu-support vector regression algorithm (dual-v-SVR) is proposed. In the algorithm, a dual-v-SVR formulation is first designed to handle the uncertain data; we then design two special adjustments that enable the dual-v-SVR model to learn incrementally: an incremental adjustment and a decremental adjustment. Finally, the experimental results demonstrate that the incremental dual-v-SVR algorithm is an efficient incremental algorithm which is not only capable of solving the incremental regression problem with uncertain data, but is also faster than batch or other incremental SVR algorithms.
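    The incremental dual-v-SVR updates themselves are not available in standard libraries, but a batch nu-SVR baseline of the kind the paper compares against can be sketched with scikit-learn; the data and hyperparameters here are illustrative assumptions:

    import numpy as np
    from sklearn.svm import NuSVR

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)  # noisy targets

    # Batch nu-SVR: nu upper-bounds the fraction of margin errors.
    model = NuSVR(nu=0.5, C=1.0, kernel="rbf").fit(X, y)
    print("Training R^2:", model.score(X, y))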