1,182 research outputs found
Positive Definite Kernels in Machine Learning
This survey is an introduction to positive definite kernels and the set of
methods they have inspired in the machine learning literature, namely kernel
methods. We first discuss some properties of positive definite kernels as well
as reproducing kernel Hilbert spaces, the natural extension of the set of
functions $\{k(x,\cdot), x \in \mathcal{X}\}$ associated with a kernel $k$ defined
on a space $\mathcal{X}$. We discuss at length the construction of kernel
functions that take advantage of well-known statistical models. We provide an
overview of numerous data-analysis methods which take advantage of reproducing
kernel Hilbert spaces and discuss the idea of combining several kernels to
improve the performance on certain tasks. We also provide a short cookbook of
different kernels which are particularly useful for certain data-types such as
images, graphs or speech segments. Comment: draft. corrected a typo in figure
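As a rough illustration of the two ideas named in this abstract (positive definiteness and combining several kernels), the following sketch builds a Gram matrix and checks that its quadratic form is non-negative on random coefficient vectors. The Gaussian and linear kernels, the bandwidth, the 0.5 weight, and the random sample points are all assumptions chosen for illustration; they are not taken from the survey. A random check like this can suggest, but not prove, positive definiteness.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def linear_kernel(x, y):
    """Linear kernel: the ordinary dot product."""
    return sum(a * b for a, b in zip(x, y))


def combined_kernel(x, y):
    """A non-negative weighted sum of kernels is again a kernel.
    The weight 0.5 is an arbitrary illustrative choice."""
    return gaussian_kernel(x, y) + 0.5 * linear_kernel(x, y)


def gram_matrix(kernel, points):
    """Matrix of pairwise kernel evaluations K[i][j] = k(x_i, x_j)."""
    return [[kernel(p, q) for q in points] for p in points]


def quadratic_form(gram, coeffs):
    """Compute sum_i sum_j c_i c_j K[i][j]; non-negative for a PSD matrix."""
    n = len(coeffs)
    return sum(coeffs[i] * coeffs[j] * gram[i][j]
               for i in range(n) for j in range(n))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(6)]
gram = gram_matrix(combined_kernel, points)

# Smallest quadratic form seen over 200 random coefficient vectors.
min_form = min(
    quadratic_form(gram, [rng.gauss(0, 1) for _ in points])
    for _ in range(200)
)
print("smallest quadratic form observed:", min_form)
```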
Flexible combination of multiple diagnostic biomarkers to improve diagnostic accuracy
In medical research, it is common to collect information on multiple
continuous biomarkers to improve the accuracy of diagnostic tests. Combining
the measurements of these biomarkers into one single score is a popular
practice to integrate the collected information, where the accuracy of the
resultant diagnostic test is usually improved. To measure the accuracy of a
diagnostic test, the Youden index has been widely used in literature. Various
parametric and nonparametric methods have been proposed to linearly combine
biomarkers so that the corresponding Youden index can be optimized. Yet there
seems to be little justification for enforcing such a linear combination. This
paper proposes a flexible approach that allows both linear and nonlinear
combinations of biomarkers. The proposed approach formulates the problem in a
large margin classification framework, where the combination function is
embedded in a flexible reproducing kernel Hilbert space. Advantages of the
proposed approach are demonstrated in a variety of simulated experiments as
well as a real application to a liver disorder study.
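For readers unfamiliar with the Youden index mentioned in this abstract, it is commonly defined at a cutoff c as J(c) = sensitivity(c) + specificity(c) - 1, and a diagnostic score is often summarized by the largest J over all cutoffs. The sketch below computes this empirically for a single combined score. The scores, labels, and function names are hypothetical, and this is not the paper's large-margin method, only the accuracy measure it optimizes.

```python
def youden_index(scores, labels, threshold):
    """Empirical Youden index J = sensitivity + specificity - 1.

    A subject is called positive when its score exceeds the threshold.
    Assumes both classes (label 1 = diseased, 0 = healthy) are present.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def best_threshold(scores, labels):
    """Return (J, cutoff) maximizing J over the observed score values."""
    candidates = sorted(set(scores))
    return max((youden_index(scores, labels, c), c) for c in candidates)


# Hypothetical combined biomarker scores and true disease status.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]

j, c = best_threshold(scores, labels)
print("best cutoff:", c, "Youden index:", j)
```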
Interpretable statistics for complex modelling: quantile and topological learning
As the complexity of our data has increased exponentially in the last decades, so has our
need for interpretable features. This thesis revolves around two paradigms to approach
this quest for insights.
In the first part we focus on parametric models, where the problem of interpretability
can be seen as a “parametrization selection”. We introduce a quantile-centric
parametrization and we show the advantages of our proposal in the context of regression,
where it allows us to bridge the gap between classical generalized linear (mixed)
models and increasingly popular quantile methods.
The second part of the thesis, concerned with topological learning, tackles the
problem from a non-parametric perspective. As topology can be thought of as a way
of characterizing data in terms of their connectivity structure, it allows us to represent
complex and possibly high-dimensional data through a few features, such as the number of
connected components, loops and voids. We illustrate how the emerging branch of
statistics devoted to recovering topological structures in the data, Topological Data
Analysis, can be exploited both for exploratory and inferential purposes with a special
emphasis on kernels that preserve the topological information in the data.
Finally, we show with an application how these two approaches can borrow strength
from one another in the identification and description of brain activity through fMRI
data from the ABIDE project.
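The simplest of the topological features named in this abstract, the number of connected components, can be sketched as follows: link any two points closer than a chosen radius and count the resulting groups with a union-find structure. The point cloud and the radii are hypothetical, and this fixed-scale count is only a toy version of what Topological Data Analysis tracks across all scales; it does not cover loops or voids and is not the thesis's method.

```python
import math


def connected_components(points, radius):
    """Count components of the graph linking points at distance <= radius."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)  # merge the two groups

    return len({find(i) for i in range(len(points))})


# Hypothetical point cloud: two tight clusters and one isolated point.
points = [(0, 0), (0.5, 0), (1, 0), (5, 5), (5.4, 5), (10, 0)]

# The count drops as the radius grows and nearby groups merge.
counts = {r: connected_components(points, r) for r in (0.1, 0.6, 8.0)}
print(counts)
```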
How temporary is temporary employment in Spain?
We use the Spanish Labor Force Survey (EPA) for the period 1987-1996 to study trends, characteristics, and labor force transitions of temporary workers. These are workers who hold fixed-term contracts, which the Spanish labor law distinguishes from indefinite contracts. Since the EPA questionnaire allows us to distinguish permanent from temporary workers, we are able to compare their characteristics. More importantly, we can use matched files from the same data source to analyze transitions from temporary to permanent employment. The aim is to test the extent to which temporary workers tend to be trapped in temporary employment relationships. Indeed, we find some evidence of this.
Keywords: Permanent and temporary employment; Fixed-term contract; Transition rate
Classification of red blood cell shapes in flow using outlier tolerant machine learning
The manual evaluation, classification and counting of biological objects
demands an enormous expenditure of time, and subjective human input may be a
source of error. Investigating the shape of red blood cells (RBCs) in
microcapillary Poiseuille flow, we overcome this drawback by introducing a
convolutional neural regression network for an automatic, outlier tolerant
shape classification. From our experiments we expect two stable geometries: the
so-called `slipper' and `croissant' shapes depending on the prevailing flow
conditions and the cell-intrinsic parameters. Whereas croissants mostly occur
at low shear rates, slippers evolve at higher flow velocities. With our method,
we are able to find the transition point between both `phases' of stable shapes
which is of high interest to ensuing theoretical studies and numerical
simulations. Using statistically based thresholds, from our data, we obtain
so-called phase diagrams which are compared to manual evaluations.
Prospectively, our concept allows us to perform objective analyses of
measurements for a variety of flow conditions and to receive comparable
results. Moreover, the proposed procedure enables unbiased studies on the
influence of drugs on flow properties of single RBCs and the resulting
macroscopic change of the flow behavior of whole blood. Comment: 15 pages, published in PLoS Comput Biol, open access
Performance Boundary Identification for the Evaluation of Automated Vehicles using Gaussian Process Classification
Safety is an essential aspect in the facilitation of automated vehicle
deployment. Current testing practices are not enough, and going beyond them
leads to infeasible testing requirements, such as needing to drive billions of
kilometres on public roads. Automated vehicles are exposed to an indefinite
number of scenarios. Handling of the most challenging scenarios should be
tested, which leads to the question of how such corner cases can be determined.
We propose an approach to identify the performance boundary, where these corner
cases are located, using Gaussian Process Classification. We also demonstrate
the classification on an exemplary traffic jam approach scenario, showing that
it is feasible and would lead to more efficient testing practices. Comment: 6 pages, 5 figures, accepted at 2019 IEEE Intelligent Transportation
Systems Conference - ITSC 2019, Auckland, New Zealand, October 2019