Approximating data with weighted smoothing splines
Given a data set (t_i, y_i), i=1,...,n with the t_i in [0,1], non-parametric
regression is concerned with the problem of specifying a suitable function
f_n:[0,1] -> R such that the data can be reasonably approximated by the points
(t_i, f_n(t_i)), i=1,...,n. If a data set exhibits large variations in local
behaviour, for example large peaks as in spectroscopy data, then the method
must be able to adapt to the local changes in smoothness. Whilst many methods
are able to accomplish this, they are less successful at adapting derivatives.
In this paper we show how the goal of local adaptivity of the function and its
first and second derivatives can be attained in a simple manner using weighted
smoothing splines. A residual-based concept of approximation is used which
forces local adaptivity of the regression function, together with a global
regularization which makes the function as smooth as possible subject to the
approximation constraints.
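The general idea of weighting a smoothing spline by local residuals can be sketched with SciPy's `UnivariateSpline`, whose smoothing condition is sum((w_i * (y_i - f(t_i)))^2) <= s. The pilot-fit-then-reweight scheme below is an illustrative heuristic, not the paper's procedure; the peak shape, noise level, and weight formula are assumptions.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
true = np.exp(-((t - 0.5) ** 2) / 0.001)       # sharp spectroscopy-like peak
y = true + rng.normal(0.0, 0.05, t.size)       # noisy observations

# Pilot fit: uniform weights, one global smoothing parameter
# (s is chosen to match the assumed noise variance 0.05**2).
pilot = UnivariateSpline(t, y, s=t.size * 0.05 ** 2)

# Up-weight points where the pilot leaves large residuals, so the
# smoothing criterion forces a closer local fit near the peak.
resid = np.abs(y - pilot(t))
w = 1.0 + 5.0 * resid / resid.max()
fit = UnivariateSpline(t, y, w=w, s=np.sum((w * 0.05) ** 2))
```

With the weights in place the spline can stay smooth away from the peak while tracking it closely where the residuals demand it.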
Nonparametric Regression, Confidence Regions and Regularization
In this paper we offer a unified approach to the problem of nonparametric
regression on the unit interval. It is based on a universal, honest and
non-asymptotic confidence region which is defined by a set of linear
inequalities involving the values of the functions at the design points.
Interest will typically centre on certain simplest functions in that region
where simplicity can be defined in terms of shape (number of local extremes,
intervals of convexity/concavity) or smoothness (bounds on derivatives) or a
combination of both. Once some form of regularization has been decided upon,
the confidence region can be used to provide honest non-asymptotic confidence
bounds which are less informative but conceptually much simpler.
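A confidence region defined by linear inequalities in the values f(t_i) can be sketched as a membership test: a candidate function is in the region if every inequality holds. The dyadic blocks and the sigma*tau*sqrt(|I|) thresholds below are illustrative choices, not the paper's exact region.

```python
import numpy as np

def in_confidence_region(y, f_vals, sigma, tau):
    """Check linear inequalities in the values f(t_i): over every dyadic
    block I of design points, |sum_{i in I} (y_i - f(t_i))| must not
    exceed sigma * tau * sqrt(|I|).  Blocks and thresholds are
    illustrative assumptions, not the paper's definition."""
    r = np.asarray(y, dtype=float) - np.asarray(f_vals, dtype=float)
    n = r.size
    length = 1
    while length <= n:
        for start in range(0, n - length + 1, length):
            if abs(r[start:start + length].sum()) > sigma * tau * np.sqrt(length):
                return False
        length *= 2
    return True
```

A candidate that leaves a systematic residual over a long block fails the test, while pure noise-like residuals pass; regularization then amounts to searching the region for the simplest member.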
Shifts in hexapod diversification and what Haldane could have said
Data on species richness and taxon age are assembled for the extant hexapod orders (insects and their six-legged relatives). Coupled with estimates of phylogenetic relatedness, and simple statistical null models, these data are used to locate where, on the hexapod tree, significant changes in the rate of cladogenesis (speciation-minus-extinction rate) have occurred. Significant differences are found between many successive pairs of sister taxa near the base of the hexapod tree, all of which are attributable to a shift in diversification rate after the origin of the Neoptera (insects with wing flexion) and before the origin of the Holometabola (insects with complete metamorphosis). No other shifts are identifiable amongst supraordinal taxa. Whilst the Coleoptera have probably diversified faster than either of their putative sister lineages, they do not stand out relative to other closely related clades. These results suggest that any Creator had a fondness for a much more inclusive clade than the Coleoptera, definitely as large as the Eumetabola (Holometabola plus bugs and their relatives), and possibly as large as the entire Neoptera. Simultaneous, and hence probably causative, events are discussed, of which the origin of wing flexion has been the focus of much attention.
Long range financial data and model choice
Long range financial data as typified by the daily returns of the Standard and Poor's index exhibit common features such as heavy tails, long
range memory of the absolute values and clustering of periods of high and low volatility. These and other features are often referred to as stylized facts, and parametric models for such data are required to reproduce them in some sense. Typically this is done by simulating some data sets under the model and demonstrating that the simulations also exhibit the stylized facts. Nevertheless, when the parameters of such models are to be estimated, recourse is very often taken to likelihood, either in the form of maximum likelihood or Bayes. In this paper we expound a method of determining parameter values which depends solely on the ability of the model to reproduce the relevant features of the data set. We introduce a new measure of the volatility of the volatility and show how it can be combined with the distribution of the returns and the autocorrelation of the absolute returns to determine parameter values. We also give a parametric model for such data and show that it can reproduce the required features.
Approximating data and statistical procedures
Stochastic models approximate data and are not true representations of it. Statistical procedures make use of approximate stochastic models to facilitate the analysis of data.
The granular silo as a continuum plastic flow: the hour-glass vs the clepsydra
The granular silo is one of the many interesting illustrations of the
thixotropic property of granular matter: a rapid flow develops at the outlet,
propagating upwards through a dense shear flow while material at the bottom
corners of the container remains static. For large enough outlets, the
discharge flow is continuous; however, by contrast with the clepsydra for which
the flow velocity depends on the height of fluid left in the container, the
discharge rate of granular silos is constant. Implementing a plastic rheology
in a 2D Navier-Stokes solver (following the mu(I)-rheology or a constant
friction), we simulate the continuum counterpart of the granular silo. Doing
so, we obtain a constant flow rate during the discharge and recover the
Beverloo scaling independently of the initial filling height of the silo. We
show that lowering the value of the coefficient of friction leads to a
transition toward a different behavior, similar to that of a viscous fluid, and
where the filling height becomes active in the discharge process. The pressure
field shows that large enough values of the coefficient of friction (above
roughly 0.3) allow for a low-pressure cavity to form above the outlet, and can
thus explain the Beverloo scaling. In conclusion, the difference between the
discharge of an hourglass and a clepsydra seems to reside in the existence or
not of a plastic yield stress.
Comment: 6 pages, 6 figures
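The Beverloo scaling recovered in the simulations can be written down directly: for a two-dimensional silo the discharge rate goes as Q = C sqrt(g) (W - k d)^{3/2}, with no dependence on the filling height. The sketch below uses placeholder values for the empirical constants C and k, which in practice are fitted to data.

```python
import math

def beverloo_rate_2d(outlet_width, grain_diameter, c=1.0, k=1.5, g=9.81):
    """Beverloo scaling for a 2D silo: Q = C * sqrt(g) * (W - k*d)**1.5.
    C and k are empirical constants (placeholder values here).  Note the
    filling height does not appear: the discharge rate is constant,
    unlike the clepsydra, whose outflow depends on the fluid head."""
    effective = outlet_width - k * grain_diameter
    if effective <= 0:
        raise ValueError("outlet too narrow relative to grain size")
    return c * math.sqrt(g) * effective ** 1.5
```

Widening the outlet increases the rate through the 3/2 power of the effective aperture, while draining the silo leaves it unchanged.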
Approximating data with weighted smoothing splines
Given a data set (t_i, y_i), i = 1,...,n with the t_i ∈ [0, 1], non-parametric regression is concerned with the problem of specifying a suitable function f_n : [0, 1] → R such that the data can be reasonably approximated by the points (t_i, f_n(t_i)), i = 1,...,n. A common desideratum is that the function f_n be smooth, but the path towards this goal is often the indirect one of assuming a "true" data generating function f and then measuring performance by the expected mean square error. The approach taken in this paper is a different one. We specify precisely what we mean by a function f_n being an adequate approximation to the data and then, using weighted splines, we try to maximize the smoothness given the approximation constraints.
Breakdown and groups
The concept of breakdown point was introduced by Hodges (1967) and Hampel (1968, 1971) and still plays an important, though at times controversial, role in robust statistics. It has proved most successful in the context of location, scale and regression problems. In this paper we argue that this success is intimately connected to the fact that the translation and affine groups act on the sample space and give rise to a definition of equivariance for statistical functionals. For such functionals a nontrivial upper bound for the breakdown point can be shown. In the absence of such a group structure a breakdown point of one is attainable, and this is perhaps the decisive reason why the concept of breakdown point in other situations has not proved as successful. Even if a natural group is present it is often not sufficiently large to allow a nontrivial upper bound for the breakdown point. One exception to this is the problem of the autocorrelation structure of time series, where we derive a nontrivial upper breakdown point using the group of realizable linear filters. The paper is formulated in an abstract manner to emphasize the role of the group and the resulting equivariance structure.
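The contrast behind the location result can be illustrated empirically: the mean breaks down when a single observation is sent to infinity, whereas the median tolerates almost half the sample. The helper below is a finite-sample illustration (with a heuristic "has broken down" threshold), not the formal definition used in the paper.

```python
import numpy as np

def empirical_breakdown(estimator, x, big=1e12):
    """Smallest fraction of sample points whose replacement by the huge
    value `big` carries the estimate beyond big/(2n).  The threshold is
    a heuristic for 'arbitrarily far'; this illustrates, rather than
    defines, the finite-sample breakdown point."""
    x = np.asarray(x, dtype=float)
    n = x.size
    for m in range(1, n + 1):
        corrupted = x.copy()
        corrupted[:m] = big          # replace m observations by outliers
        if abs(estimator(corrupted)) > big / (2 * n):
            return m / n
    return 1.0
```

For a sample of ten well-behaved points, `empirical_breakdown(np.mean, x)` returns 0.1 (one outlier suffices) while `empirical_breakdown(np.median, x)` returns 0.5, matching the nontrivial upper bound that the translation-equivariance argument yields for location functionals.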