16,551 research outputs found
Fast change point analysis on the Hurst index of piecewise fractional Brownian motion
In this presentation, we introduce a new method for change point analysis on
the Hurst index of a piecewise fractional Brownian motion. We first set out the
model and the statistical problem. The proposed method is a transposition of
the FDpV (Filtered Derivative with p-value) method, introduced for the detection
of change points on the mean in Bertrand et al. (2011), to the case of changes
on the Hurst index. The statistic underlying the FDpV technology is a new
estimator of the Hurst index, the so-called Increment Bernoulli Statistic
(IBS). Both FDpV and IBS have linear time and memory complexity with respect to
the size of the series, so the resulting method for change point analysis on
the Hurst index also runs in linear time.
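The FDpV method itself is not reproduced here, but its core idea, a filtered derivative that compares sliding-window estimates on either side of each point in O(n) time, can be sketched as follows. This is an illustrative implementation for changes in the mean (the setting of Bertrand et al. 2011), not for the Hurst index; the function names and the simple local-maximum thresholding rule are my own, and the p-value calibration step of FDpV is omitted.

```python
import numpy as np

def filtered_derivative(x, window):
    """Filtered Derivative statistic: difference of sliding-window means.

    D[k] = mean(x[k:k+window]) - mean(x[k-window:k]); a large |D[k]|
    suggests a change point near k.  Runs in O(n) via cumulative sums.
    """
    n = len(x)
    cs = np.concatenate(([0.0], np.cumsum(x)))
    D = np.zeros(n)
    for k in range(window, n - window):
        right = (cs[k + window] - cs[k]) / window   # mean of x[k:k+window]
        left = (cs[k] - cs[k - window]) / window    # mean of x[k-window:k]
        D[k] = right - left
    return D

def detect_changes(x, window, threshold):
    """Keep local maxima of |D| above a threshold as change-point candidates.

    In FDpV this candidate list would then be pruned with per-candidate
    p-values; that second step is omitted in this sketch.
    """
    D = np.abs(filtered_derivative(x, window))
    candidates = []
    for k in range(window, len(x) - window):
        if D[k] > threshold and D[k] == D[max(0, k - window):k + window].max():
            candidates.append(k)
    return candidates
```

On a signal whose mean jumps from 0 to 1 at index 500, `detect_changes(x, 50, 0.5)` flags the jump location; both passes touch each sample a constant number of times, which is the linear complexity the abstract refers to.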
Multiscale change-point segmentation: beyond step functions.
Modern multiscale-type segmentation methods are known to detect multiple change-points with high statistical accuracy while allowing for fast computation. The underpinning (minimax) estimation theory has been developed mainly for models that assume the signal to be a piecewise constant function. In this paper, for a large collection of multiscale segmentation methods (including various existing procedures), such theory is extended to certain function classes beyond step functions in a nonparametric regression setting. This broadens the interpretation of such methods on the one hand, and on the other hand reveals these methods as robust to deviations from piecewise constant functions. Our main finding is adaptation over nonlinear approximation classes for a universal thresholding, which includes bounded variation functions and (piecewise) Hölder functions of smoothness order 0 < alpha <= 1 as special cases. From this we derive statistical guarantees on feature detection in terms of jumps and modes. Another key finding is that these multiscale segmentation methods perform nearly (up to a log-factor) as well as the oracle piecewise constant segmentation estimator (with known jump locations) and as the best piecewise constant approximant of the (unknown) true signal. Theoretical findings are examined by various numerical simulations.
Bayesian Regression of Piecewise Constant Functions
We derive an exact and efficient Bayesian regression algorithm for piecewise
constant functions of unknown segment number, boundary locations, and levels.
It works for any noise and segment level prior, e.g. Cauchy, which can handle
outliers. We derive simple but good estimates for the in-segment variance. We
also propose a Bayesian regression curve as a better way of smoothing data
without blurring boundaries. The Bayesian approach also allows straightforward
determination of the evidence, break probabilities, and error estimates, useful
for model selection and for significance and robustness studies. We discuss the
performance on synthetic and real-world examples. Many possible extensions are
discussed.
Comment: 27 pages, 18 figures, 1 table, 3 algorithms
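The paper's exact Bayesian recursion is not reproduced here; as a hedged illustration of the underlying problem, regression with piecewise constant functions of unknown segment number and boundaries, the following sketch uses the classic penalized least-squares dynamic program ("optimal partitioning") instead of the Bayesian treatment. All names are mine, and the per-segment cost is plain squared error rather than a marginal likelihood under a noise prior.

```python
import numpy as np

def piecewise_constant_fit(y, penalty):
    """Fit a piecewise constant function to y by minimizing
    (sum of in-segment squared errors) + penalty * (number of segments),
    via the O(n^2) optimal-partitioning dynamic program.
    Returns the sorted segment start indices (0 is always included).
    """
    n = len(y)
    cs = np.concatenate(([0.0], np.cumsum(y)))
    cs2 = np.concatenate(([0.0], np.cumsum(np.square(y))))

    def seg_cost(i, j):
        # squared error of segment y[i:j] fit by its own mean,
        # computed in O(1) from cumulative sums
        s, s2, m = cs[j] - cs[i], cs2[j] - cs2[i], j - i
        return s2 - s * s / m

    best = np.full(n + 1, np.inf)
    best[0] = -penalty            # so the first segment pays one penalty
    prev = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + seg_cost(i, j) + penalty
            if c < best[j]:
                best[j], prev[j] = c, i
    # backtrack the optimal segment boundaries
    bounds, j = [], n
    while j > 0:
        bounds.append(int(prev[j]))
        j = prev[j]
    return sorted(bounds)
```

A Bayesian version replaces `seg_cost` with a per-segment marginal likelihood and sums (rather than minimizes) over segmentations, which is what lets the paper's algorithm also return evidence and break probabilities.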
Heterogeneous Change Point Inference
We propose HSMUCE (heterogeneous simultaneous multiscale change-point
estimator) for the detection of multiple change-points of the signal in a
heterogeneous Gaussian regression model. A piecewise constant function is
estimated by minimizing the number of change-points over the acceptance region
of a multiscale test which locally adapts to changes in the variance. The
multiscale test is a combination of local likelihood ratio tests which are
properly calibrated by scale-dependent critical values in order to keep a
global nominal level alpha, even for finite samples. We show that HSMUCE
controls the error of over- and underestimation of the number of change-points.
To this end, new deviation bounds for F-type statistics are derived. Moreover,
we obtain confidence sets for the whole signal. All results are non-asymptotic
and uniform over a large class of heterogeneous change-point models. HSMUCE is
fast to compute, achieves the optimal detection rate, and estimates the number
of change-points at almost optimal accuracy for vanishing signals, while still
being robust. We compare HSMUCE with several state-of-the-art methods in
simulations and analyse current recordings of a transmembrane protein in the
bacterial outer membrane with pronounced heterogeneity across its states. An
R package is available online.
Binscatter Regressions
We introduce the \texttt{Stata} (and \texttt{R}) package \textsf{Binsreg},
which implements the binscatter methods developed in
\citet*{Cattaneo-Crump-Farrell-Feng_2019_Binscatter}. The package includes the
commands \texttt{binsreg}, \texttt{binsregtest}, and \texttt{binsregselect}.
The first command (\texttt{binsreg}) implements binscatter for the regression
function and its derivatives, offering several point estimation, confidence
interval, and confidence band procedures, with particular focus on
constructing binned scatter plots. The second command (\texttt{binsregtest})
implements hypothesis testing procedures for parametric specifications and for
nonparametric shape restrictions of the unknown regression function. Finally,
the third command (\texttt{binsregselect}) implements data-driven number-of-bins
selectors for binscatter implementation using either quantile-spaced or
evenly-spaced binning/partitioning. All the commands allow for covariate
adjustment, smoothness restrictions, weighting, and clustering, among other
features. A companion \texttt{R} package with the same capabilities is also
available.
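The basic object behind all three commands is the binned scatter plot itself: partition the support of x into quantile-spaced bins and plot each bin's mean of y. A minimal sketch of that core step (not the \textsf{Binsreg} implementation, which adds derivatives, covariate adjustment, and inference; the function name is mine):

```python
import numpy as np

def binscatter(x, y, n_bins=20):
    """Quantile-spaced binned scatter: split x into n_bins bins with equal
    numbers of observations, and return each bin's mean of x and mean of y."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    # assign each x to a bin; clip so the max of x falls in the last bin
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    xs = np.array([x[idx == b].mean() for b in range(n_bins)])
    ys = np.array([y[idx == b].mean() for b in range(n_bins)])
    return xs, ys
```

Plotting `ys` against `xs` gives the binned scatter plot; quantile spacing (as opposed to evenly-spaced bins, the other option the package supports) guarantees every bin mean averages the same number of points.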
Characterizing Ranked Chinese Syllable-to-Character Mapping Spectrum: A Bridge Between the Spoken and Written Chinese Language
One important aspect of the relationship between spoken and written Chinese
is the ranked syllable-to-character mapping spectrum, i.e. the list of
syllables ranked by the number of characters that map to each syllable.
Previously, this spectrum was analyzed for more than 400 syllables without
distinguishing the four tones. In the current study, the spectrum with 1280
toned syllables is fitted by a logarithmic function, a Beta rank function, and
a piecewise logarithmic function. Of the three fitting functions, the
two-piece logarithmic function fits the data best, both by the smallest sum
of squared errors (SSE) and by the lowest Akaike information criterion (AIC)
value. The Beta rank function is a close second. By sampling from a Poisson
distribution whose parameter value is chosen from the observed data, we
empirically estimate the p-value for testing the hypothesis that the
two-piece logarithmic function fits better than the Beta rank function
to be 0.16. For practical purposes, the piecewise logarithmic function and
the Beta rank function can be considered a tie.
Comment: 15 pages, 4 figures
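The model comparison described above, a one-piece versus a two-piece logarithmic fit of counts against log-rank, judged by SSE and AIC, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the paper's code: the breakpoint is found by brute-force search, the parameter counts (2 for one piece; 5 for two pieces, counting the breakpoint) are my bookkeeping, and the Beta rank function and Poisson resampling step are omitted.

```python
import numpy as np

def sse_log(ranks, counts):
    """Least-squares fit of counts ~ a + b*log(rank); returns the SSE."""
    X = np.column_stack([np.ones(len(ranks)), np.log(ranks)])
    beta, *_ = np.linalg.lstsq(X, counts, rcond=None)
    resid = counts - X @ beta
    return float(np.sum(resid ** 2))

def sse_two_piece_log(ranks, counts):
    """Two-piece logarithmic fit: search every breakpoint, fit each piece
    by sse_log separately, return the smallest total SSE."""
    best = np.inf
    for k in range(3, len(ranks) - 3):   # keep at least 3 points per piece
        s = sse_log(ranks[:k], counts[:k]) + sse_log(ranks[k:], counts[k:])
        best = min(best, s)
    return best

def aic(sse, n, n_params):
    """Gaussian-error AIC up to a constant: n*log(SSE/n) + 2*n_params."""
    return n * np.log(sse / n) + 2 * n_params
```

On data with a genuine kink in log-rank, the two-piece fit wins on SSE by construction; the AIC comparison then asks whether the gain survives the extra parameters, which is exactly the trade-off the 0.16 p-value in the abstract addresses.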