Piecewise polynomial approximation of probability density functions with application to uncertainty quantification for stochastic PDEs
The probability density function (PDF) associated with a given set of samples
is approximated by a piecewise-linear polynomial constructed with respect to a
binning of the sample space. The kernel functions are a compactly supported
basis for the space of such polynomials, i.e. finite element hat functions,
that are centered at the bin nodes rather than at the samples, as is the case
for the standard kernel density estimation approach. This feature naturally
provides an approximation that is scalable with respect to the sample size. On
the other hand, unlike other strategies that use a finite element approach, the
proposed approximation does not require the solution of a linear system. In
addition, a simple rule that relates the bin size to the sample size eliminates
the need for bandwidth selection procedures. The proposed density estimator has
unitary integral, does not require a constraint to enforce positivity, and is
consistent. The proposed approach is validated through numerical examples in
which samples are drawn from known PDFs. The approach is also used to determine
approximations of (unknown) PDFs associated with outputs of interest that
depend on the solution of a stochastic partial differential equation.
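The core of the estimator described above can be sketched in a few lines: each sample distributes unit mass between its two bracketing bin nodes via the hat functions (a partition of unity), and the nodal coefficients are normalized by the hat integrals, so no linear system is solved. This is a minimal illustration of the idea under assumed grid and sample sizes, not the authors' implementation.

```python
import numpy as np

def hat_density(samples, nodes):
    """Piecewise-linear density estimate on a fixed grid of bin nodes.

    Each sample contributes its unit mass linearly to the two nodes
    that bracket it (finite element hat functions), so the nodal
    coefficients follow directly from the data, with no linear solve.
    Returns nodal values of the estimated density.
    """
    samples = np.asarray(samples, dtype=float)
    nodes = np.asarray(nodes, dtype=float)
    n = len(nodes)
    h = np.diff(nodes)
    # locate each sample's bin and its barycentric coordinate inside it
    idx = np.clip(np.searchsorted(nodes, samples) - 1, 0, n - 2)
    t = (samples - nodes[idx]) / h[idx]
    c = np.zeros(n)
    np.add.at(c, idx, 1.0 - t)        # weight on the left node
    np.add.at(c, idx + 1, t)          # weight on the right node
    c /= len(samples)                 # coefficients now sum to one
    # integral of hat j is (h[j-1] + h[j]) / 2 (half-bins at the ends)
    w = np.zeros(n)
    w[:-1] += h / 2
    w[1:] += h / 2
    return c / w    # nodal density values

# example: samples from a known PDF (uniform on [0, 1])
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=10_000)
nodes = np.linspace(0.0, 1.0, 21)
f = hat_density(x, nodes)
# trapezoidal integral of the piecewise-linear estimate is exactly one
integral = (0.5 * (f[:-1] + f[1:]) * np.diff(nodes)).sum()
print(integral)
```

By construction the estimate is nonnegative and has unit integral, matching the properties claimed in the abstract; the bin count here is a placeholder rather than the paper's bin-size rule.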
Learning and adaptation to detect changes and anomalies in high dimensional data
The problem of monitoring a datastream and detecting whether the data generating process changes from normal to novel and possibly anomalous conditions has relevant applications in many real scenarios, such as health monitoring and quality inspection of industrial processes. A general approach often adopted in the literature is to learn a model to describe normal data and detect as anomalous those data that do not conform to the learned model. However, several challenges have to be addressed to make this approach effective in real world scenarios, where acquired data are often high dimensional and feature complex structures (such as signals and images). We address this problem from two perspectives corresponding to different modeling assumptions on the data-generating process. First, we model data as realizations of random vectors, as is customary in the statistical literature. In this setting we focus on the change detection problem, where the goal is to detect whether the datastream permanently departs from normal conditions. We theoretically prove the intrinsic difficulty of this problem when the data dimension increases and propose a novel non-parametric and multivariate change-detection algorithm. In the second part, we focus on data having complex structure and we adopt dictionaries yielding sparse representations to model normal data. We propose novel algorithms to detect anomalies in such datastreams and to adapt the learned model when the process generating normal data changes.
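The "learn a model of normal data, flag what does not conform" approach mentioned above can be illustrated with a deliberately simple baseline: fit a Gaussian model to the normal regime and score new data by Mahalanobis distance. This is a textbook baseline, not the thesis's change-detection or dictionary-based algorithms, and all data and thresholds below are synthetic.

```python
import numpy as np

def fit_normal_model(train):
    """Fit a Gaussian model of 'normal' data: mean and inverse covariance."""
    mu = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)
    return mu, np.linalg.inv(cov)

def anomaly_scores(data, mu, cov_inv):
    """Squared Mahalanobis distance of each point from the normal model."""
    d = data - mu
    return np.einsum('ij,jk,ik->i', d, cov_inv, d)

rng = np.random.default_rng(1)
normal = rng.normal(0.0, 1.0, size=(2000, 5))     # training: normal regime
mu, cov_inv = fit_normal_model(normal)

test_normal = rng.normal(0.0, 1.0, size=(500, 5))
test_anom = rng.normal(4.0, 1.0, size=(500, 5))   # shifted: anomalous regime
# threshold at the 99th percentile of scores on held-out normal data
thr = np.quantile(anomaly_scores(test_normal, mu, cov_inv), 0.99)
flagged = anomaly_scores(test_anom, mu, cov_inv) > thr
print(flagged.mean())   # fraction of anomalous points detected
```

The thesis's point about high dimension applies directly to such detectors: as the dimension grows, a fixed mean shift becomes harder to separate from normal scatter, which is one motivation for the specialized algorithms it develops.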
Statistics for Fission-Track Thermochronology
This chapter introduces statistical tools to extract geologically meaningful information from fission-track (FT) data using both the external detector and LA-ICP-MS methods. The spontaneous fission of 238U is a Poisson process resulting in large single-grain age uncertainties. To overcome this imprecision, it is nearly always necessary to analyse multiple grains per sample. The degree to which the analytical uncertainties can explain the observed scatter of the single-grain data can be visually assessed on a radial plot and objectively quantified by a chi-square test. For sufficiently low values of the chi-square statistic (or sufficiently high p values), the pooled age of all the grains gives a suitable description of the underlying ‘true’ age population. Samples may fail the chi-square test for several reasons. A first possibility is that the true age population does not consist of a single discrete age component, but is characterised by a continuous range of ages. In this case, a ‘random effects’ model can constrain the true age distribution using two parameters: the ‘central age’ and the ‘(over)dispersion’. A second reason why FT data sets might fail the chi-square test is if they are underlain by multimodal age distributions. Such distributions may consist of discrete age components, continuous age distributions, or a combination of the two. Formalised statistical tests such as chi-square can be useful in preventing overfitting of relatively small data sets. However, they should be used with caution when applied to large data sets (including length measurements) which generate sufficient statistical ‘power’ to reject any simple yet geologically plausible hypothesis.
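The chi-square homogeneity idea can be illustrated directly on single-grain ages: compare the observed scatter of the grain ages against their analytical uncertainties. The sketch below uses an error-weighted mean and scatter statistic, a simplification of the count-based chi-square test used in FT practice, and the ages, uncertainties, and sample size are invented for illustration.

```python
import numpy as np
from scipy.stats import chi2

def age_homogeneity(ages, errors):
    """Chi-square test that single-grain ages share one true value.

    ages, errors : per-grain age estimates and their 1-sigma
    analytical uncertainties. Returns the error-weighted pooled age,
    the chi-square statistic, and its p value (df = n - 1).
    """
    ages = np.asarray(ages, float)
    w = 1.0 / np.asarray(errors, float) ** 2
    pooled = np.sum(w * ages) / np.sum(w)      # weighted-mean age
    stat = np.sum(w * (ages - pooled) ** 2)    # scatter vs. errors
    p = chi2.sf(stat, df=len(ages) - 1)
    return pooled, stat, p

# hypothetical sample: 30 grains whose scatter is explained by
# analytical errors alone (so the test should not reject)
rng = np.random.default_rng(2)
err = np.full(30, 1.0)                 # 1-sigma errors, Ma
ages = rng.normal(12.0, err)           # true age 12 Ma, hypothetical
pooled, stat, p = age_homogeneity(ages, err)
print(f"pooled = {pooled:.1f} Ma, chi2 = {stat:.1f}, p = {p:.2f}")
```

A low p value would signal overdispersion, in which case the abstract's random-effects (central age and dispersion) or mixture models become the appropriate descriptions.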
Novel Methods for Analysing Bacterial Tracks Reveal Persistence in Rhodobacter sphaeroides
Tracking bacteria using video microscopy is a powerful experimental approach to probe their motile behaviour. The
trajectories obtained contain much information relating to the complex patterns of bacterial motility. However, methods for
the quantitative analysis of such data are limited. Most swimming bacteria move in approximately straight lines,
interspersed with random reorientation phases. It is therefore necessary to segment observed tracks into swimming and
reorientation phases to extract useful statistics. We present novel robust analysis tools to discern these two phases in tracks.
Our methods comprise a simple and effective protocol for removing spurious tracks from tracking datasets, followed by
analysis based on a two-state hidden Markov model, taking advantage of the availability of mutant strains that exhibit
swimming-only or reorientating-only motion to generate an empirical prior distribution. Using simulated tracks with varying
levels of added noise, we validate our methods and compare them with an existing heuristic method. To our knowledge this
is the first example of a systematic assessment of analysis methods in this field. The new methods are substantially more
robust to noise and introduce less systematic bias than the heuristic method. We apply our methods to tracks obtained
from the bacterial species Rhodobacter sphaeroides and Escherichia coli. Our results demonstrate that R. sphaeroides exhibits
persistence over the course of a tumbling event, which is a novel result with important implications in the study of this and
similar species.
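The two-state segmentation idea can be sketched with a standard Viterbi decoder over a Gaussian-emission hidden Markov model applied to a 1-D speed signal. This is a generic illustration rather than the authors' method (which builds an empirical prior from swimming-only and reorientating-only mutant strains); all emission and transition parameters below are assumed values.

```python
import numpy as np

def viterbi_two_state(obs, means, sds, trans, p0):
    """Most likely swim/reorient state path for a 1-D speed signal
    under a two-state HMM with Gaussian emissions (log-space Viterbi)."""
    obs = np.asarray(obs, float)
    # log emission probability of each observation under each state
    log_e = (-0.5 * ((obs[:, None] - means) / sds) ** 2
             - np.log(sds * np.sqrt(2 * np.pi)))
    log_t = np.log(trans)
    delta = np.log(p0) + log_e[0]
    back = np.zeros((len(obs), 2), dtype=int)
    for t in range(1, len(obs)):
        cand = delta[:, None] + log_t      # cand[i, j]: move i -> j
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + log_e[t]
    path = np.zeros(len(obs), dtype=int)
    path[-1] = delta.argmax()
    for t in range(len(obs) - 2, -1, -1):  # backtrack the best path
        path[t] = back[t + 1, path[t + 1]]
    return path   # 0 = reorientation, 1 = swimming (our labelling)

# synthetic track: a slow reorientation phase between two swimming runs
speed = np.concatenate([np.full(50, 20.0), np.full(10, 2.0),
                        np.full(50, 20.0)])
speed += np.random.default_rng(3).normal(0, 2.0, speed.size)
trans = np.array([[0.8, 0.2], [0.1, 0.9]])   # 'sticky' states
path = viterbi_two_state(speed, means=np.array([2.0, 20.0]),
                         sds=np.array([2.0, 2.0]), trans=trans,
                         p0=np.array([0.5, 0.5]))
print((path == 0).sum(), path.size)
```

In practice the emission and transition parameters would be estimated from data (in the paper, informed by the mutant-strain prior), not fixed by hand as here.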
Automatic Mapping of Discontinuity Persistence on Rock Masses Using 3D Point Clouds
Finding new ways to quantify discontinuity persistence values in rock masses in an automatic or semi-automatic manner is a considerable challenge, as an alternative to the use of traditional methods based on measuring patches or traces with tapes. Remote sensing techniques potentially provide new ways of analysing visible data from the rock mass. This work presents a methodology for the automatic mapping of discontinuity persistence on rock masses, using 3D point clouds. The method proposed herein starts by clustering points that belong to patches of a given discontinuity. Coplanar clusters are then merged into a single group of points. Persistence is measured in the directions of the dip and strike for each coplanar set of points, resulting in the extraction of the length of the maximum chord and the area of the convex hull. The proposed approach is implemented in a graphic interface with open source software. Three case studies are utilized to illustrate the methodology: (1) small-scale laboratory setup consisting of a regular distribution of cubes with similar dimensions, (2) more complex geometry consisting of a real rock mass surface in an excavated cavern and (3) slope with persistent sub-vertical discontinuities. Results showed good agreement with field measurements, validating the methodology. Complexities and difficulties related to the method (e.g. natural discontinuity waviness) are reported and discussed. An assessment of the applicability of the method to the 3D point cloud is also presented. Utilization of remote sensing data for a more objective characterization of the persistence of planar discontinuities affecting rock masses is highlighted herein.
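The per-cluster persistence measurements can be sketched as follows: fit a plane to a coplanar cluster of points, project into in-plane strike/dip coordinates, and extract the extents, the maximum chord, and the convex hull area. The sketch assumes a non-horizontal plane and uses a synthetic rectangular patch; it illustrates the measurements, not the paper's software.

```python
import numpy as np
from itertools import combinations
from scipy.spatial import ConvexHull

def persistence_metrics(points):
    """Persistence measures for one coplanar cluster of 3-D points:
    extents along strike and dip, maximum chord, and convex hull area.
    Assumes the fitted plane is not horizontal."""
    pts = np.asarray(points, float)
    centre = pts.mean(axis=0)
    # best-fit plane via SVD; last right-singular vector is the normal
    _, _, vt = np.linalg.svd(pts - centre)
    normal = vt[2]
    # strike: horizontal in-plane direction; dip: steepest in-plane line
    strike = np.cross(normal, [0.0, 0.0, 1.0])
    strike /= np.linalg.norm(strike)
    dip = np.cross(normal, strike)
    local = (pts - centre) @ np.column_stack([strike, dip])  # 2-D coords
    hull = ConvexHull(local)
    verts = local[hull.vertices]
    max_chord = max(np.linalg.norm(a - b)
                    for a, b in combinations(verts, 2))
    extent = local.max(axis=0) - local.min(axis=0)
    # for a 2-D hull, scipy's 'volume' attribute is the enclosed area
    return {"strike_extent": extent[0], "dip_extent": extent[1],
            "max_chord": max_chord, "area": hull.volume}

# example: a 4 m x 2 m rectangular patch dipping at 45 degrees
u = np.linspace(0, 4, 9)
v = np.linspace(0, 2, 5)
uu, vv = np.meshgrid(u, v)
patch = np.column_stack([uu.ravel(),
                         vv.ravel() / np.sqrt(2),
                         vv.ravel() / np.sqrt(2)])
m = persistence_metrics(patch)
print(m["strike_extent"], m["dip_extent"], m["area"])
```

For this 4 m by 2 m patch the strike and dip extents recover the patch dimensions, the hull area is 8 m², and the maximum chord is the patch diagonal.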
Spectral-temporal EEG dynamics of speech discrimination processing in infants during sleep
BACKGROUND: Oddball paradigms are frequently used to study auditory discrimination by comparing event-related potential (ERP) responses to a standard, high-probability sound and to a deviant, low-probability sound. Previous research has established that responses elicited by such paradigms, such as the mismatch response or mismatch negativity, are useful for examining auditory processes in young children and infants across various sleep and attention states. The extent to which oddball ERP responses may reflect subtle discrimination effects, such as speech discrimination, is largely unknown, especially in infants who have not yet acquired speech and language.
RESULTS: Mismatch responses for three contrasts (non-speech, vowel, and consonant) were computed as a spectral-temporal probability function in 24 infants, and analyzed at the group level by a modified multidimensional scaling. Immediately following an onset gamma response (30-50 Hz), the emergence of a beta oscillation (12-30 Hz) was temporally coupled with a lower frequency theta oscillation (2-8 Hz). The spectral-temporal probability of this coupling effect relative to a subsequent theta modulation corresponds with discrimination difficulty for non-speech, vowel, and consonant contrast features.
DISCUSSION: The theta modulation effect suggests that unexpected sounds are encoded as a probabilistic measure of surprise. These results support the notion that auditory discrimination is driven by the development of brain networks for predictive processing, and can be measured in infants during sleep. The results presented here have implications for the interpretation of discrimination as a probabilistic process, and may provide a basis for the development of single-subject and single-trial classification in a clinically useful context.
CONCLUSION: An infant's brain is processing information about the environment and performing computations, even during sleep. These computations reflect subtle differences in acoustic feature processing that are necessary for language learning. Results from this study suggest that brain responses to deviant sounds in an oddball paradigm follow a cascade of oscillatory modulations. This cascade begins with a gamma response that later emerges as a beta synchronization, which is temporally coupled with a theta modulation, and followed by a second, subsequent theta modulation. The difference in frequency and timing of the theta modulations appears to reflect a measure of surprise. These insights into the neurophysiological mechanisms of auditory discrimination provide a basis for exploring the clinical utility of the MM
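Band-limited dynamics of the kind analyzed above (gamma, beta, theta) are commonly extracted by band-pass filtering followed by Hilbert amplitude envelopes. The sketch below is generic signal processing on a synthetic trace, not the study's spectral-temporal probability function; the sampling rate and band edges are conventional choices.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_envelope(x, fs, lo, hi, order=4):
    """Amplitude envelope of x in the [lo, hi] Hz band:
    zero-phase Butterworth band-pass, then Hilbert magnitude."""
    b, a = butter(order, [lo, hi], btype='band', fs=fs)
    return np.abs(hilbert(filtfilt(b, a, x)))

# synthetic 'response': a beta burst (20 Hz) followed by theta (5 Hz)
fs = 250
t = np.arange(0, 2.0, 1 / fs)
x = (np.sin(2 * np.pi * 20 * t) * (t < 1.0)      # beta in the 1st second
     + np.sin(2 * np.pi * 5 * t) * (t >= 1.0))   # theta in the 2nd second
bands = {"theta": (2, 8), "beta": (12, 30), "gamma": (30, 50)}
env = {name: band_envelope(x, fs, lo, hi)
       for name, (lo, hi) in bands.items()}
# beta energy dominates the first second, theta the second second
print(env["beta"][:fs].mean() > env["theta"][:fs].mean())
```

Comparing the timing of such envelopes across conditions is one simple way to quantify the temporal coupling between bands that the abstract describes.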
Sampling conditionally on a rare event via generalized splitting
We propose and analyze a generalized splitting method to sample approximately from a distribution conditional on the occurrence of a rare event. This has important applications in a variety of contexts in operations research, engineering, and computational statistics. The method uses independent trials starting from a single particle. We exploit this independence to obtain asymptotic and nonasymptotic bounds on the total variation error of the sampler. Our main finding is that the approximation error depends crucially on the relative variability of the number of points produced by the splitting algorithm in one run and that this relative variability can be readily estimated via simulation. We illustrate the relevance of the proposed method on an application in which one needs to sample (approximately) from an intractable posterior density in Bayesian inference
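The mechanics can be conveyed with a minimal fixed-level splitting sampler for a one-dimensional example, X ~ N(0,1) conditioned on the rare event X > 3: each independent trial starts from a single particle, and survivors at each level are copied and moved with a Metropolis kernel that preserves the conditional law. The level schedule, split factor, and move counts below are arbitrary illustrative choices, not the paper's tuned algorithm; the run-to-run variability of the output size is exactly the quantity the abstract highlights.

```python
import numpy as np

def splitting_run(levels, n_splits, n_moves, step, rng):
    """One independent splitting trial for X ~ N(0,1) conditioned on
    X > levels[-1]. Survivors at each level are copied n_splits times
    and moved by a Metropolis kernel that leaves N(0,1) restricted to
    the current level invariant. Returns the particles reaching the
    top level (their number varies from run to run)."""
    particles = [rng.normal()]          # a single starting particle
    for lvl in levels:
        survivors = [x for x in particles if x > lvl]
        particles = []
        for x in survivors:
            for _ in range(n_splits):
                y = x
                for _ in range(n_moves):   # Metropolis moves above lvl
                    prop = y + step * rng.normal()
                    if prop > lvl and rng.random() < np.exp(
                            0.5 * (y * y - prop * prop)):
                        y = prop
                particles.append(y)
    return particles

rng = np.random.default_rng(4)
levels = [1.0, 2.0, 3.0]
out, sizes = [], []
for _ in range(400):                    # independent trials
    pts = splitting_run(levels, n_splits=4, n_moves=10, step=0.5, rng=rng)
    out.extend(pts)
    sizes.append(len(pts))
out = np.array(out)
# every returned sample lies in the rare event by construction
print(out.min() > 3.0, np.std(sizes) / np.mean(sizes))
```

The printed ratio is the relative variability of the number of points per run, the quantity the paper's error bounds depend on and recommend estimating by simulation.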
Simulation from the Tail of the Univariate and Multivariate Normal Distribution
We study and compare various methods to generate a random variate or vector from the univariate or multivariate normal distribution truncated to some finite or semi-infinite region, with special attention to the situation where the regions are far in the tail. This is required in particular for certain applications in Bayesian statistics, such as to perform exact posterior simulations for parameter inference, but could have many other applications as well. We distinguish the case in which inversion is warranted, and that in which rejection methods are preferred.
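For the univariate tail, a standard rejection method uses a shifted-exponential proposal with an optimally chosen rate (the classical construction often attributed to Robert); deep in the tail it accepts almost every proposal, whereas naive resampling of N(0,1) draws almost never lands in the region. The sketch below illustrates that case, with the truncation point chosen arbitrarily.

```python
import numpy as np

def truncated_normal_tail(a, size, rng):
    """Draw from N(0,1) conditioned on X > a by rejection sampling
    with a shifted-exponential proposal (efficient for large a)."""
    # optimal exponential rate for a proposal supported on (a, inf)
    lam = 0.5 * (a + np.sqrt(a * a + 4.0))
    out = np.empty(size)
    filled = 0
    while filled < size:
        n = size - filled
        x = a + rng.exponential(1.0 / lam, n)   # proposal draws
        # acceptance probability exp(-(x - lam)^2 / 2) bounds the
        # ratio of the normal target to the exponential proposal
        accept = rng.random(n) < np.exp(-0.5 * (x - lam) ** 2)
        k = accept.sum()
        out[filled:filled + k] = x[accept]
        filled += k
    return out

rng = np.random.default_rng(5)
a = 6.0                       # far in the tail: P(X > 6) ~ 1e-9
s = truncated_normal_tail(a, 100_000, rng)
print(s.min() > a, s.mean())  # mean should be near phi(a)/(1 - Phi(a))
```

The multivariate and inversion-based cases studied in the paper require more machinery (e.g. numerically stable evaluation of the tail CDF); this sketch covers only the univariate rejection route.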