
    Positive Definite Kernels in Machine Learning

    This survey is an introduction to positive definite kernels and the set of methods they have inspired in the machine learning literature, namely kernel methods. We first discuss some properties of positive definite kernels as well as reproducing kernel Hilbert spaces, the natural extension of the set of functions $\{k(x,\cdot), x\in\mathcal{X}\}$ associated with a kernel $k$ defined on a space $\mathcal{X}$. We discuss at length the construction of kernel functions that take advantage of well-known statistical models. We provide an overview of numerous data-analysis methods which take advantage of reproducing kernel Hilbert spaces and discuss the idea of combining several kernels to improve performance on certain tasks. We also provide a short cookbook of different kernels which are particularly useful for certain data types such as images, graphs or speech segments. Comment: draft; corrected a typo in a figure.
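
    To make the defining property concrete, here is a minimal sketch (not taken from the survey) that builds the Gram matrix $K_{ij} = k(x_i, x_j)$ of the Gaussian kernel on a few points and checks that it is positive semidefinite, which is what "positive definite kernel" requires for every finite set of points. The function names and the bandwidth `sigma` are illustrative assumptions; the check is a Cholesky attempt with a small jitter rather than a full eigen-decomposition.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(points, kernel):
    """Return the n x n matrix K with K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


def is_positive_semidefinite(K, jitter=1e-10):
    """Check PSD by attempting a Cholesky factorisation of K + jitter * I.

    A symmetric matrix has a Cholesky factorisation iff it is positive
    definite; the small jitter lets PSD matrices with (numerically) zero
    eigenvalues pass, while clearly negative eigenvalues make it fail.
    """
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] + (jitter if i == j else 0.0)
            s -= sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
K = gram_matrix(points, gaussian_kernel)
print(is_positive_semidefinite(K))                  # Gaussian Gram matrix
print(is_positive_semidefinite([[1, 2], [2, 1]]))   # eigenvalues 3 and -1
```

    Swapping in any other candidate kernel for `gaussian_kernel` gives a quick, non-rigorous sanity check: passing on one point set does not prove positive definiteness, but failing on one point set disproves it.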

    Flexible combination of multiple diagnostic biomarkers to improve diagnostic accuracy

    In medical research, it is common to collect information on multiple continuous biomarkers to improve the accuracy of diagnostic tests. Combining the measurements of these biomarkers into a single score is a popular way to integrate the collected information, and it usually improves the accuracy of the resulting diagnostic test. To measure the accuracy of a diagnostic test, the Youden index has been widely used in the literature. Various parametric and nonparametric methods have been proposed to linearly combine biomarkers so that the corresponding Youden index is optimized. Yet there seems to be little justification for enforcing such a linear combination. This paper proposes a flexible approach that allows both linear and nonlinear combinations of biomarkers. The proposed approach formulates the problem in a large margin classification framework, where the combination function is embedded in a flexible reproducing kernel Hilbert space. Advantages of the proposed approach are demonstrated in a variety of simulated experiments as well as a real application to a liver disorder study.
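
    For readers unfamiliar with the Youden index, the sketch below computes its empirical version, $J = \max_c \{\text{sensitivity}(c) + \text{specificity}(c) - 1\}$, for a combined biomarker score. It is not the paper's method: the combination here is a plain weighted sum with hand-picked weights, and the toy measurements, function names and the "score at or above the cutoff means diseased" convention are all illustrative assumptions.

```python
def combine(markers, weights):
    """Hypothetical linear combination: one score per subject."""
    return [sum(w * m for w, m in zip(weights, row)) for row in markers]


def youden_index(scores_diseased, scores_healthy):
    """Empirical Youden index and the cutoff that attains it.

    A subject is classified as diseased when its score is >= the cutoff c;
    candidate cutoffs are the observed score values.
    """
    candidates = sorted(set(scores_diseased) | set(scores_healthy))
    best_j, best_c = -1.0, None
    for c in candidates:
        sens = sum(s >= c for s in scores_diseased) / len(scores_diseased)
        spec = sum(s < c for s in scores_healthy) / len(scores_healthy)
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best_c = j, c
    return best_j, best_c


# Toy data: two biomarkers per subject (made up for illustration).
diseased = [(2.0, 1.0), (3.0, 2.5), (1.5, 3.0)]
healthy = [(1.0, 0.5), (0.5, 1.5), (2.0, 0.0)]

j1, c1 = youden_index(combine(diseased, (1, 0)), combine(healthy, (1, 0)))
j12, c12 = youden_index(combine(diseased, (1, 1)), combine(healthy, (1, 1)))
print(j1, c1)    # first biomarker alone
print(j12, c12)  # equal-weight combination of both biomarkers
```

    On this toy data the first biomarker alone reaches $J = 2/3$, while the equal-weight sum separates the groups perfectly ($J = 1$ at cutoff 3.0), which illustrates why combining markers can help. The paper's contribution is to replace the fixed linear form with a function learned in a reproducing kernel Hilbert space.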

    Interpretable statistics for complex modelling: quantile and topological learning

    As the complexity of our data has increased exponentially over the last decades, so has our need for interpretable features. This thesis revolves around two paradigms for approaching this quest for insights. In the first part we focus on parametric models, where the problem of interpretability can be seen as one of “parametrization selection”. We introduce a quantile-centric parametrization and show its advantages in the context of regression, where it bridges the gap between classical generalized linear (mixed) models and increasingly popular quantile methods. The second part of the thesis, concerned with topological learning, tackles the problem from a non-parametric perspective. Since topology can be thought of as a way of characterizing data in terms of their connectivity structure, it makes it possible to represent complex and possibly high-dimensional data through a few features, such as the number of connected components, loops and voids. We illustrate how Topological Data Analysis, the emerging branch of statistics devoted to recovering topological structures in data, can be exploited for both exploratory and inferential purposes, with a special emphasis on kernels that preserve the topological information in the data. Finally, we show with an application how these two approaches can borrow strength from one another in identifying and describing brain activity through fMRI data from the ABIDE project.
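
    The simplest of the topological features mentioned above, the number of connected components, can be computed directly. The sketch below (an illustration, not the thesis's code) links any two points whose Euclidean distance is at most a scale `eps` and counts components with union-find; this count equals the 0-th Betti number of the Vietoris-Rips complex at that scale. Loops and voids require higher-dimensional homology and are not covered here. The point cloud and the `eps` values are made up.

```python
import math


def count_components(points, eps):
    """Connected components of the graph joining points at distance <= eps."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj  # merge the two components
    return len({find(i) for i in range(len(points))})


# Two well-separated clusters (toy data).
cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
for eps in (0.0, 0.5, 10.0):
    print(eps, count_components(cloud, eps))
```

    Sweeping `eps` and recording when components merge is the intuition behind persistent homology: the two clusters persist over a long range of scales, so they are a robust feature of the data rather than noise.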

    How temporary is temporary employment in Spain?

    We use the Spanish Labor Force Survey (EPA) for the period 1987-1996 to study trends, characteristics, and labor force transitions of temporary workers. These are workers who hold fixed-term contracts, which Spanish labor law distinguishes from indefinite contracts. Since the EPA questionnaire allows us to distinguish permanent from temporary workers, we are able to compare their characteristics. More importantly, we can use matched files from the same data source to analyze transitions from temporary to permanent employment. The aim is to test the extent to which temporary workers tend to be trapped in temporary employment relationships. Indeed, we find some evidence of this. Keywords: permanent and temporary employment; fixed-term contract; transition rate.
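
    As a rough illustration of what a transition rate is, the sketch below estimates the share of workers in each status at one survey wave who are in each status at the next, from matched records. The status labels and the records are hypothetical and do not reflect the EPA's variable coding or the paper's estimates; the paper's actual methodology may differ.

```python
from collections import Counter


def transition_rates(matched_pairs):
    """Estimate P(status at t+1 | status at t) from matched records.

    `matched_pairs` holds one (status_t, status_t_plus_1) tuple per worker
    observed in two consecutive waves. Returns {origin: {destination: rate}}.
    """
    origin_counts = Counter(a for a, _ in matched_pairs)
    pair_counts = Counter(matched_pairs)
    rates = {}
    for (a, b), n in pair_counts.items():
        rates.setdefault(a, {})[b] = n / origin_counts[a]
    return rates


# Hypothetical matched records for five workers.
records = [
    ("temporary", "permanent"),
    ("temporary", "temporary"),
    ("temporary", "temporary"),
    ("temporary", "unemployed"),
    ("permanent", "permanent"),
]
rates = transition_rates(records)
print(rates["temporary"])
```

    Here one of the four temporary workers moves to a permanent contract (rate 0.25) and two remain temporary (rate 0.5). Comparing these two rates over time is one simple way to frame the question of whether temporary workers stay in temporary employment.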

    Classification of red blood cell shapes in flow using outlier tolerant machine learning

    The manual evaluation, classification and counting of biological objects demands an enormous expenditure of time, and subjective human input may be a source of error. Investigating the shape of red blood cells (RBCs) in microcapillary Poiseuille flow, we overcome this drawback by introducing a convolutional neural regression network for automatic, outlier-tolerant shape classification. From our experiments we expect two stable geometries, the so-called 'slipper' and 'croissant' shapes, depending on the prevailing flow conditions and the cell-intrinsic parameters. Whereas croissants mostly occur at low shear rates, slippers evolve at higher flow velocities. With our method, we are able to find the transition point between both 'phases' of stable shapes, which is of high interest to ensuing theoretical studies and numerical simulations. Using statistically based thresholds, we obtain so-called phase diagrams from our data, which are compared to manual evaluations. Prospectively, our concept allows us to perform objective analyses of measurements for a variety of flow conditions and to obtain comparable results. Moreover, the proposed procedure enables unbiased studies on the influence of drugs on the flow properties of single RBCs and the resulting macroscopic change in the flow behavior of whole blood. Comment: 15 pages, published in PLoS Comput Biol, open access

    Performance Boundary Identification for the Evaluation of Automated Vehicles using Gaussian Process Classification

    Safety is an essential aspect of facilitating automated vehicle deployment. Current testing practices are insufficient, and going beyond them leads to infeasible testing requirements, such as needing to drive billions of kilometres on public roads. Automated vehicles are exposed to a practically unlimited number of scenarios. Their handling of the most challenging scenarios should be tested, which raises the question of how such corner cases can be determined. We propose an approach that uses Gaussian Process Classification to identify the performance boundary, where these corner cases are located. We also demonstrate the classification on an exemplary traffic jam approach scenario, showing that it is feasible and would lead to more efficient testing practices. Comment: 6 pages, 5 figures, accepted at 2019 IEEE Intelligent Transportation Systems Conference - ITSC 2019, Auckland, New Zealand, October 2019