
    k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

    Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier: classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because poor run-time performance is no longer such a problem, given the computational power available today. This paper presents an overview of techniques for Nearest Neighbour classification, focusing on mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods. Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN
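The nearest-neighbour rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the paper's appendix: it assumes Euclidean distance, brute-force search over all training examples and unweighted majority voting, and the names `euclidean` and `knn_predict` are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3, dist=euclidean):
    """Predict the class of `query` by majority vote among its k nearest neighbours."""
    # Brute force: rank every training example by its distance to the query.
    ranked = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    # Count the class labels of the k closest examples.
    votes = Counter(train_y[i] for i in ranked[:k])
    # Most frequent label; on a tie, the label seen first (nearer neighbour) wins.
    return votes.most_common(1)[0][0]
```

With `X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]` and `y = ["a", "a", "a", "b", "b", "b"]`, `knn_predict(X, y, (0.5, 0.5), k=3)` returns `"a"`. The brute-force ranking costs one distance computation per training example for every query, which is the run-time cost that retrieval speed-up techniques aim to reduce.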

    Randomized pick-freeze for sparse Sobol indices estimation in high dimension

    This article investigates a new procedure to estimate the influence of each variable of a given function defined on a high-dimensional space. More precisely, we are concerned with describing a function of a large number $p$ of parameters that depends only on a small number $s$ of them. Our proposed method is an unconstrained $\ell_{1}$-minimization based on Sobol's method. We prove that, with only $\mathcal{O}(s\log p)$ evaluations of $f$, one can find which are the relevant parameters.
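For context, the classical (non-sparse) pick-freeze estimator of a first-order Sobol index compares $f(X)$ with $f(X^{i})$, where $X^{i}$ keeps ("freezes") coordinate $i$ of $X$ and redraws ("picks") all the others; the index is then $\mathrm{Cov}(f(X), f(X^{i}))/\mathrm{Var}(f(X))$. The sketch below assumes independent inputs uniform on $[0,1]^p$ and uses a hypothetical function name; it is the baseline estimator, not the randomized $\ell_1$ procedure proposed in the article.

```python
import random


def first_order_sobol(f, dim, n=20000, seed=0):
    """Pick-freeze estimates of the first-order Sobol indices of f on [0, 1]^dim.

    Assumes independent uniform inputs; uses n * (dim + 1) evaluations of f.
    """
    rng = random.Random(seed)
    # One shared base sample X and its outputs f(X).
    base = [[rng.random() for _ in range(dim)] for _ in range(n)]
    ys = [f(x) for x in base]
    mean_y = sum(ys) / n
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    indices = []
    for i in range(dim):
        ys_i = []
        for x in base:
            # Freeze coordinate i, pick fresh independent values for all the others.
            x_i = [x[j] if j == i else rng.random() for j in range(dim)]
            ys_i.append(f(x_i))
        mean_y_i = sum(ys_i) / n
        # Empirical covariance between f(X) and f(X^i), normalised by Var f(X).
        cov = sum(a * b for a, b in zip(ys, ys_i)) / n - mean_y * mean_y_i
        indices.append(cov / var_y)
    return indices
```

For `f(x) = x[0] + 2 * x[1]` in three dimensions the exact indices are 0.2, 0.8 and 0, and the estimates land close to those values. This baseline needs n(p + 1) evaluations of f, so its cost grows linearly in p; the abstract's $\mathcal{O}(s\log p)$ result addresses exactly this scaling when only $s$ of the $p$ parameters matter.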

    Flood event impact on pesticide transfer in a small agricultural catchment (Moutousse at Aurade, south west France)

    In this paper, pesticide transfer dynamics are studied during two flood events in a small experimental catchment close to Toulouse (south west France). Thirteen pesticide molecules (herbicides, fungicides) have been analysed by a multi-residue technique in filtered and unfiltered waters. The results show very high pesticide concentrations in the different fractions compared with low-flow periods and with the data collected by the French institutional networks in charge of monitoring pesticide pollution in river water. Several molecules present concentrations higher than 0.1 µg L-1, and even higher than 1 µg L-1, in the unfiltered waters. In the suspended matter, concentrations vary between 0.1 and 30 µg g-1 depending on the molecule and can represent 40 to 90% of the total concentration for molecules of low solubility. Concentrations and fluxes of all molecules increase during the flood flows and are positively related to stream discharge, but hysteresis between rising and falling periods can be observed for some molecules. Pesticide concentrations in unfiltered waters and partitioning between dissolved and particulate fractions (Kd = [diss]/[part]) are controlled by dissolved organic carbon and total suspended matter. A good negative relationship can be established between log Kd and log Kow for six molecules.

    Application Of Digital Signal Analysis, Mass Data Acquisition and Processing Techniques, and Automated Experiment Protocols to the Study of Cardiac Cell Membrane Electrophysiology, with Mathematical Modeling

    Traditional methods of collecting, analyzing and storing data from cardiac cell membrane electrophysiology experiments have become increasingly cumbersome and unwieldy as experimental protocols have become more sophisticated and complex. A global approach to collecting, analyzing, refining and storing electrophysiologic data was developed, together with a new approach to mathematical modeling of cell membrane single ion channel kinetics. It uses a comprehensive microcomputer-based software system with specialized analog and digital electronics for data acquisition, analysis and archiving. Unique discrete signal processing techniques for characterizing the electronic recording system, including specialized hardware and software adapted to minimize distortions in biosignal recordings, are discussed in detail.

    Conditional Transformation Models

    The ultimate goal of regression analysis is to obtain information about the conditional distribution of a response given a set of explanatory variables. This goal is, however, seldom achieved because most established regression models only estimate the conditional mean as a function of the explanatory variables and assume that higher moments are not affected by the regressors. The underlying reason for such a restriction is the assumption of additivity of signal and noise. We propose to relax this common assumption in the framework of transformation models. The novel class of semiparametric regression models proposed herein allows transformation functions to depend on explanatory variables. These transformation functions are estimated by regularised optimisation of scoring rules for probabilistic forecasts, e.g. the continuous ranked probability score. The corresponding estimated conditional distribution functions are consistent. Conditional transformation models are potentially useful for describing possible heteroscedasticity, comparing spatially varying distributions, identifying extreme events, deriving prediction intervals and selecting variables beyond mean regression effects. An empirical investigation based on a heteroscedastic varying coefficient simulation model demonstrates that semiparametric estimation of conditional distribution functions can be more beneficial than kernel-based non-parametric approaches or parametric generalised additive models for location, scale and shape.
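The continuous ranked probability score mentioned above compares a predictive distribution function $F$ with an observation $y$ via $\mathrm{CRPS}(F, y) = \int (F(t) - \mathbf{1}\{y \le t\})^2\,dt$, where lower values are better. As an illustration only, the sketch below assumes a Gaussian forecast, for which a closed form exists, and checks it against direct numerical integration of the definition. Conditional transformation models are semiparametric and not restricted to Gaussian forecasts, and the function names here are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return sigma * (z * (2.0 * normal_cdf(z) - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def crps_numeric(cdf, y, lo=-10.0, hi=10.0, steps=20000):
    """CRPS from its definition, integral of (F(t) - 1{y <= t})^2 dt, by the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * h
        indicator = 1.0 if y <= t else 0.0
        total += (cdf(t) - indicator) ** 2
    return total * h
```

Because the score rewards both calibration and sharpness, a narrower forecast centred on the observation scores better: `crps_gaussian(0.0, 0.5, 0.0)` is half of `crps_gaussian(0.0, 1.0, 0.0)`. Minimising an average of such scores over observations is the kind of criterion the abstract describes for estimating transformation functions.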

    Two-View Geometry Scoring Without Correspondences

    Camera pose estimation for two-view geometry traditionally relies on RANSAC. Normally, a multitude of image correspondences leads to a pool of proposed hypotheses, which are then scored to find a winning model. The inlier count is generally regarded as a reliable indicator of "consensus". We examine this scoring heuristic, and find that it favors disappointing models under certain circumstances. As a remedy, we propose the Fundamental Scoring Network (FSNet), which infers a score for a pair of overlapping images and any proposed fundamental matrix. It does not rely on sparse correspondences, but rather embodies a two-view geometry model through an epipolar attention mechanism that predicts the pose error of the two images. FSNet can be incorporated into traditional RANSAC loops. We evaluate FSNet on fundamental and essential matrix estimation on indoor and outdoor datasets, and establish that FSNet can successfully identify good poses for pairs of images with few or unreliable correspondences. Finally, we show that naively combining FSNet with the MAGSAC++ scoring approach achieves state-of-the-art results.
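The inlier-count heuristic that the abstract examines can be sketched as follows: each correspondence is tested against a hypothesised fundamental matrix $F$, and the score is the number of correspondences whose error falls below a threshold. This is the baseline scoring step, not FSNet. Using the Sampson distance as the error, the threshold value and the function names are assumptions made for illustration.

```python
import math


def sampson_distance(F, x1, x2):
    """Squared Sampson error of the match x1 <-> x2 (pixel coords (u, v)) under F."""
    p1 = (x1[0], x1[1], 1.0)
    p2 = (x2[0], x2[1], 1.0)
    # Epipolar line of x1 in image 2 (F p1) and of x2 in image 1 (F^T p2).
    Fp1 = [sum(F[r][c] * p1[c] for c in range(3)) for r in range(3)]
    Ftp2 = [sum(F[c][r] * p2[c] for c in range(3)) for r in range(3)]
    algebraic = sum(p2[r] * Fp1[r] for r in range(3))  # x2^T F x1
    denom = Fp1[0] ** 2 + Fp1[1] ** 2 + Ftp2[0] ** 2 + Ftp2[1] ** 2
    if denom == 0.0:
        return math.inf  # degenerate hypothesis: treat the match as an outlier
    return algebraic ** 2 / denom


def inlier_count(F, matches, threshold=1.0):
    """RANSAC 'consensus' score: matches whose squared Sampson error is below threshold."""
    return sum(1 for x1, x2 in matches if sampson_distance(F, x1, x2) < threshold)
```

As a hypothetical check, for a camera translating purely along the x-axis with identity intrinsics, $F = [t]_\times$ with $t = (1, 0, 0)$, and the epipolar constraint reduces to $v_1 = v_2$ (matches lie on the same image row). The match ((10, 20), (30, 21)) then has a squared Sampson error of 0.5, so with a threshold of 1 it counts as an inlier, while ((10, 20), (30, 25)) does not.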

    Purposeful Co-Design of OFDM Signals for Ranging and Communications

    This paper analyzes the fundamental trade-offs that occur in the co-design of orthogonal frequency-division multiplexing signals for both ranging (via time-of-arrival estimation) and communications. These trade-offs are quantified through the Shannon capacity bound, probability of outage, and the Ziv-Zakai bound on range estimation variance. Bounds are derived for signals experiencing frequency-selective Rayleigh block fading, accounting for the impact of limited channel knowledge and multi-antenna reception. Uncompensated carrier frequency offset and phase errors are also factored into the capacity bounds. Analysis based on the derived bounds demonstrates how Pareto-optimal design choices can be made to optimize the communication throughput, probability of outage, and ranging variance. Different signal design strategies are then analyzed, showing how Pareto-optimal design choices change depending on the channel.
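The capacity and outage quantities named in the abstract can be illustrated with a Monte Carlo sketch. It assumes an L-tap channel with i.i.d. equal-power CN(0, 1/L) taps held fixed over one OFDM block (Rayleigh block fading), perfect channel knowledge at a single-antenna receiver, uniform power across subcarriers, and no carrier frequency offset or phase error. The paper's bounds additionally account for limited channel knowledge, multi-antenna reception and those impairments, so this is a simplified stand-in, and the function names are hypothetical. The capacity of one block is the average of $\log_2(1 + \mathrm{SNR}\,|H_k|^2)$ over subcarriers $k$, and the outage probability is $P(C < R)$ for a target rate $R$.

```python
import cmath
import math
import random


def ofdm_capacity(taps, snr, n_sub):
    """Capacity in bits/s/Hz of one OFDM block for a given channel impulse response."""
    total = 0.0
    for k in range(n_sub):
        # Frequency response on subcarrier k: DFT of the channel taps.
        H = sum(h * cmath.exp(-2j * math.pi * k * l / n_sub) for l, h in enumerate(taps))
        total += math.log2(1.0 + snr * abs(H) ** 2)
    return total / n_sub


def outage_probability(snr, rate, n_sub=64, n_taps=4, trials=2000, seed=0):
    """Monte Carlo estimate of P(C < rate) under Rayleigh block fading."""
    rng = random.Random(seed)
    # Per-component std so each tap is CN(0, 1/n_taps) and E|H_k|^2 = 1.
    std = math.sqrt(1.0 / (2.0 * n_taps))
    outages = 0
    for _ in range(trials):
        # One independent channel realisation per block.
        taps = [complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for _ in range(n_taps)]
        if ofdm_capacity(taps, snr, n_sub) < rate:
            outages += 1
    return outages / trials
```

With a single tap, every subcarrier sees the same exponentially distributed gain $|H_k|^2$, so the estimate can be checked against $1 - \exp(-(2^R - 1)/\mathrm{SNR})$; for SNR = 10 (linear) and R = 2 bits/s/Hz this is about 0.26. Spreading the same average power over more taps gives frequency diversity across subcarriers and lowers the outage probability at that rate.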