464 research outputs found
k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal of machine
learning techniques is the Nearest Neighbour Classifier -- classification is
achieved by identifying the nearest neighbours to a query example and using
those neighbours to determine the class of the query. This approach to
classification is of particular importance because issues of poor run-time
performance are not such a problem these days with the computational power that
is available. This paper presents an overview of techniques for Nearest
Neighbour classification focusing on mechanisms for assessing similarity
(distance), computational issues in identifying nearest neighbours and
mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time-series, retrieval
speed-up and intrinsic dimensionality have been added. An Appendix is included
providing access to Python code for the key methods.
Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN
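The nearest-neighbour vote described in this abstract can be sketched in a few lines. This is a minimal illustration, not the code from the paper's appendix: the function name `knn_classify`, the toy data, and the choice of Euclidean distance with brute-force search are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to `query`.

    `train` is a list of (feature_vector, label) pairs. Euclidean distance is
    an assumption here; other similarity measures could be substituted.
    """
    # Brute-force search: rank every training example by distance to the query.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k nearest neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
train = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b"),
]
print(knn_classify(train, (0.15, 0.15)))
print(knn_classify(train, (0.95, 1.0)))
```

Sorting the whole training set costs O(n log n) per query, which is the run-time issue the abstract alludes to; speed-up techniques replace this brute-force step.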
Randomized pick-freeze for sparse Sobol indices estimation in high dimension
This article investigates a new procedure to estimate the influence of each
variable of a given function defined on a high-dimensional space. More
precisely, we are concerned with describing a function of a large number $p$ of
parameters that depends only on a small number $s$ of them. Our proposed method
is an unconstrained $\ell_1$-minimization based on Sobol's method. We
prove that, with only $\mathcal{O}(s \log p)$ evaluations of $f$, one can find
which are the relevant parameters
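For orientation, the classical pick-freeze estimator of a first-order Sobol index can be sketched as follows. This is the plain, one-variable-at-a-time scheme and not the randomized sparse procedure the article proposes. The function name `pick_freeze_index`, the uniform inputs on [0, 1], and the test function are assumptions chosen for illustration.

```python
import random


def pick_freeze_index(f, p, i, n=20000, seed=0):
    """Estimate the first-order Sobol index of input i by pick-freeze.

    Assumes the p inputs are independent and uniform on [0, 1].
    The index is Cov(Y, Y_i) / Var(Y), where Y_i re-evaluates f with
    coordinate i kept ("frozen") and all other coordinates re-drawn ("picked").
    """
    rng = random.Random(seed)
    ys, ys_frozen = [], []
    for _ in range(n):
        x = [rng.random() for _ in range(p)]
        x_new = [rng.random() for _ in range(p)]
        x_new[i] = x[i]  # freeze coordinate i, re-pick the rest
        ys.append(f(x))
        ys_frozen.append(f(x_new))
    mean_y = sum(ys) / n
    mean_yf = sum(ys_frozen) / n
    cov = sum(a * b for a, b in zip(ys, ys_frozen)) / n - mean_y * mean_yf
    var = sum(a * a for a in ys) / n - mean_y ** 2
    return cov / var


# Hypothetical sparse function: 5 inputs, only the first two matter.
def f(x):
    return 4.0 * x[0] + x[1]


for i in range(3):
    print(i, round(pick_freeze_index(f, p=5, i=i), 3))
```

For this linear test function the exact indices are 16/17 for the first input, 1/17 for the second, and 0 for the rest, so the estimates should land close to those values. Note that this naive scheme needs a fresh batch of evaluations for every input, which is the cost a sparse method aims to avoid.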
Flood event impact on pesticide transfer in a small agricultural catchment (Moutousse at Aurade, south west France)
In this paper, pesticide transfer dynamics are studied during two flood events in a small experimental catchment close to Toulouse (south west France). Thirteen pesticide molecules (herbicides, fungicides) were analysed by a multi-residue technique on filtered and unfiltered waters. The results show very high pesticide concentrations in the different fractions compared to low-flow periods and to the data collected by the French institutional networks in charge of surveying pesticide pollution in river water. Several molecules show concentrations higher than 0.1 µg L-1, and even higher than 1 µg L-1, in the unfiltered waters. In the suspended matter, concentrations vary between 0.1 and 30 µg g-1 depending on the molecule and can represent 40 to 90% of the total concentration for poorly soluble molecules. All molecule concentrations and fluxes increase during flood flows and are positively related to stream discharge, but hysteresis between rising and falling periods can be observed for some molecules. Pesticide concentrations in unfiltered waters and partitioning between dissolved and particulate fractions (Kd = [diss]/[part]) are controlled by dissolved organic carbon and total suspended matter. A good negative relationship can be established between log Kd and log Kow for 6 molecules
Application Of Digital Signal Analysis, Mass Data Acquisition and Processing Techniques, and Automated Experiment Protocols to the Study of Cardiac Cell Membrane Electrophysiology, with Mathematical Modeling
Traditional methods of collecting, analyzing and storing data from cardiac cell membrane electrophysiology experiments have become increasingly cumbersome and unwieldy as experimental protocols have become more sophisticated and complex. A global approach to collecting, analyzing, refining and storing electrophysiologic data, as well as a new approach to mathematical modeling of cell membrane single ion channel kinetics, was developed. This approach uses a comprehensive microcomputer-based system of software with specialized analog and digital electronics for data acquisition, analysis and archiving. Unique discrete signal processing techniques for characterizing the electronic recording system, including specialized hardware and software adapted for minimizing distortions in biosignal recordings, are discussed in detail
Conditional Transformation Models
The ultimate goal of regression analysis is to obtain information about the
conditional distribution of a response given a set of explanatory variables.
This goal is, however, seldom achieved because most established regression
models only estimate the conditional mean as a function of the explanatory
variables and assume that higher moments are not affected by the regressors.
The underlying reason for such a restriction is the assumption of additivity of
signal and noise. We propose to relax this common assumption in the framework
of transformation models. The novel class of semiparametric regression models
proposed herein allows transformation functions to depend on explanatory
variables. These transformation functions are estimated by regularised
optimisation of scoring rules for probabilistic forecasts, e.g. the continuous
ranked probability score. The corresponding estimated conditional distribution
functions are consistent. Conditional transformation models are potentially
useful for describing possible heteroscedasticity, comparing spatially varying
distributions, identifying extreme events, deriving prediction intervals and
selecting variables beyond mean regression effects. An empirical investigation
based on a heteroscedastic varying coefficient simulation model demonstrates
that semiparametric estimation of conditional distribution functions can be
more beneficial than kernel-based non-parametric approaches or parametric
generalised additive models for location, scale and shape
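The continuous ranked probability score mentioned in this abstract can be computed for a sample-based forecast as follows. This is a minimal sketch of the scoring rule only, not of the authors' regularised estimation procedure. It uses the standard identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'| with X and X' drawn independently from F; the function name `crps_empirical` is an assumption.

```python
def crps_empirical(samples, y):
    """CRPS of the empirical distribution of `samples` against observation y.

    Uses CRPS = E|X - y| - 0.5 * E|X - X'|, with both expectations taken
    over the empirical distribution. Lower is better; for a single-sample
    forecast the score reduces to the absolute error.
    """
    n = len(samples)
    # Mean absolute difference between forecast samples and the observation.
    accuracy = sum(abs(x - y) for x in samples) / n
    # Mean absolute difference between all pairs of forecast samples.
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy - 0.5 * spread


print(crps_empirical([2.0], 3.0))
print(crps_empirical([0.0, 2.0], 1.0))
```

The pairwise term makes this O(n^2) in the number of samples, which is acceptable for a sketch but would be replaced by a sorted-sample formula for large ensembles.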
Two-View Geometry Scoring Without Correspondences
Camera pose estimation for two-view geometry traditionally relies on RANSAC.
Normally, a multitude of image correspondences leads to a pool of proposed
hypotheses, which are then scored to find a winning model. The inlier count is
generally regarded as a reliable indicator of "consensus". We examine this
scoring heuristic, and find that it favors disappointing models under certain
circumstances. As a remedy, we propose the Fundamental Scoring Network (FSNet),
which infers a score for a pair of overlapping images and any proposed
fundamental matrix. It does not rely on sparse correspondences, but rather
embodies a two-view geometry model through an epipolar attention mechanism that
predicts the pose error of the two images. FSNet can be incorporated into
traditional RANSAC loops. We evaluate FSNet on fundamental and essential matrix
estimation on indoor and outdoor datasets, and establish that FSNet can
successfully identify good poses for pairs of images with few or unreliable
correspondences. Besides, we show that naively combining FSNet with the MAGSAC++
scoring approach achieves state-of-the-art results
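The inlier-count heuristic that this abstract examines can be sketched as below. It is the conventional correspondence-based baseline and not FSNet. The Sampson error as the residual, the squared-pixel threshold, the function names, and the toy matrix and correspondences are all assumptions made for illustration.

```python
def sampson_error(F, p1, p2):
    """First-order (Sampson) approximation of the squared epipolar error.

    F is a 3x3 fundamental matrix as nested lists; p1 and p2 are (u, v)
    pixel coordinates of a correspondence in image 1 and image 2.
    """
    x1 = (p1[0], p1[1], 1.0)
    x2 = (p2[0], p2[1], 1.0)
    # Epipolar line of x1 in image 2 (F x1) and of x2 in image 1 (F^T x2).
    Fx1 = [sum(F[r][c] * x1[c] for c in range(3)) for r in range(3)]
    Ftx2 = [sum(F[r][c] * x2[r] for r in range(3)) for c in range(3)]
    num = sum(x2[r] * Fx1[r] for r in range(3)) ** 2
    den = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
    return num / den if den > 0 else float("inf")


def inlier_count(F, matches, threshold=1.0):
    """Score a hypothesis F by the number of matches under the threshold."""
    return sum(1 for p1, p2 in matches if sampson_error(F, p1, p2) < threshold)


# Hypothetical example: pure sideways translation, so matched points
# should share the same image row (v coordinate).
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
matches = [((10.0, 20.0), (15.0, 20.0)),   # same row: consistent with F
           ((30.0, 40.0), (33.0, 40.5)),   # half a pixel off: still close
           ((50.0, 60.0), (52.0, 70.0))]   # ten pixels off: inconsistent
print(inlier_count(F, matches))
```

In a RANSAC loop this count would be evaluated for every hypothesis and the highest-scoring one kept, which is why the score becomes unreliable when correspondences are few or noisy.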
Purposeful Co-Design of OFDM Signals for Ranging and Communications
This paper analyzes the fundamental trade-offs that occur in the co-design of
orthogonal frequency-division multiplexing signals for both ranging (via
time-of-arrival estimation) and communications. These trade-offs are quantified
through the Shannon capacity bound, probability of outage, and the Ziv-Zakai
bound on range estimation variance. Bounds are derived for signals experiencing
frequency-selective Rayleigh block fading, accounting for the impact of limited
channel knowledge and multi-antenna reception. Uncompensated carrier frequency
offset and phase errors are also factored into the capacity bounds. Analysis
based on the derived bounds demonstrates how Pareto-optimal design choices can
be made to optimize the communication throughput, probability of outage, and
ranging variance. Different signal design strategies are then analyzed, showing
how Pareto-optimal design choices change depending on the channel
- …