Better estimates from binned income data: Interpolated CDFs and mean-matching
Researchers often estimate income statistics from summaries that report the
number of incomes in bins such as \$0-10,000, \$10,001-20,000,...,\$200,000+.
Some analysts assign incomes to bin midpoints, but this treats income as
discrete. Other analysts fit a continuous parametric distribution, but the
distribution may not fit well.
We fit nonparametric continuous distributions that reproduce the bin counts
perfectly by interpolating the cumulative distribution function (CDF). We also
show how both midpoints and interpolated CDFs can be constrained to reproduce
the mean of income when it is known.
We compare the methods' accuracy in estimating the Gini coefficients of all
3,221 US counties. Fitting parametric distributions is very slow. Fitting
interpolated CDFs is much faster and slightly more accurate. Both interpolated
CDFs and midpoints give dramatically better estimates if constrained to match a
known mean.
We have implemented interpolated CDFs in the binsmooth package for R. We have
implemented the midpoint method in the rpme command for Stata. Both
implementations can be constrained to match a known mean.
Comment: 20 pages (including Appendix), 3 tables, 2 figures (+2 in Appendix)
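The midpoint idea can be sketched in a few lines. The following is an illustrative Python sketch, not the binsmooth (R) or rpme (Stata) implementation; the top-bin value and the grouped-data Gini formula are assumptions chosen for the example.

```python
import numpy as np

def midpoint_values(edges, top_value):
    """Representative incomes for closed bins [e0, e1), [e1, e2), ...,
    plus `top_value` standing in for the open-ended top bin.
    `top_value` is an analyst's assumption (e.g. 1.5x the last edge)."""
    edges = np.asarray(edges, dtype=float)
    return np.append((edges[:-1] + edges[1:]) / 2.0, top_value)

def gini_grouped(values, counts):
    """Gini coefficient for grouped data via the trapezoidal
    Lorenz-curve approximation (a standard grouped-data formula,
    not the binsmooth/rpme code itself)."""
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts, dtype=float)
    order = np.argsort(values)
    v, w = values[order], counts[order]
    p = np.insert(np.cumsum(w) / w.sum(), 0, 0.0)            # population shares
    L = np.insert(np.cumsum(w * v) / (w * v).sum(), 0, 0.0)  # income shares
    return 1.0 - np.sum(np.diff(p) * (L[1:] + L[:-1]))
```

A known mean can be matched in this spirit by adjusting `top_value` until the count-weighted mean of `midpoint_values(...)` equals the published mean; interpolating the CDF within each bin, as in the abstract, refines the estimate further.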
Short and long-term wind turbine power output prediction
In the wind energy industry, it is of great importance to develop models that
accurately forecast the power output of a wind turbine, as such predictions are
used for wind farm location assessment or power pricing and bidding,
monitoring, and preventive maintenance. As a first step, and following the
guidelines of the existing literature, we use the supervisory control and data
acquisition (SCADA) data to model the wind turbine power curve (WTPC). We
explore various parametric and non-parametric approaches for the modeling of
the WTPC, such as parametric logistic functions, and non-parametric piecewise
linear, polynomial, or cubic spline interpolation functions. We demonstrate
that all aforementioned classes of models are rich enough (with respect to
their relative complexity) to accurately model the WTPC, as their mean squared
error (MSE) is close to the MSE lower bound calculated from the historical
data. We further enhance the accuracy of our proposed model, by incorporating
additional environmental factors that affect the power output, such as the
ambient temperature and the wind direction. However, when it comes to
forecasting, all of the aforementioned models share an intrinsic limitation:
they cannot capture the inherent auto-correlation of the data. To overcome this
limitation, we show that adding a properly scaled ARMA modeling layer improves
short-term prediction performance while preserving the long-term prediction
capability of the model.
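As a sketch of the parametric approach, the snippet below fits a logistic power curve to synthetic SCADA-like data with SciPy. The three-parameter logistic form is one of the families mentioned above, but the parameter names and values are illustrative assumptions, and a real pipeline would add the ARMA residual layer on top.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic_power(v, p_max, k, v0):
    """Three-parameter logistic wind turbine power curve (WTPC):
    rated power p_max, steepness k, inflection wind speed v0.
    These names are illustrative, not from the paper."""
    return p_max / (1.0 + np.exp(-k * (v - v0)))

rng = np.random.default_rng(0)
v = np.linspace(0.0, 25.0, 200)                     # wind speed (m/s)
true_power = logistic_power(v, 2000.0, 0.9, 9.0)    # synthetic "SCADA" power (kW)
power = true_power + rng.normal(0.0, 20.0, v.size)  # measurement noise

params, _ = curve_fit(logistic_power, v, power, p0=[1500.0, 1.0, 8.0])
mse = np.mean((logistic_power(v, *params) - power) ** 2)
```

The fitted MSE approaches the noise floor of the synthetic data, mirroring the abstract's observation that these model classes get close to the MSE lower bound; the residuals `power - logistic_power(v, *params)` are what an ARMA layer would then model.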
B-spline techniques for volatility modeling
This paper is devoted to the application of B-splines to volatility modeling,
specifically the calibration of the leverage function in stochastic local
volatility models and the parameterization of an arbitrage-free implied
volatility surface calibrated to sparse option data. We use an extension of
classical B-splines obtained by including basis functions with infinite
support. We first come back to the application of shape-constrained B-splines
to the estimation of conditional expectations, not merely from a scatter plot
but also from the given marginal distributions. An application is the Monte
Carlo calibration of stochastic local volatility models by Markov projection.
Then we present a new technique for the calibration of an implied volatility
surface to sparse option data. We use a B-spline parameterization of the
Radon-Nikodym derivative of the underlying's risk-neutral probability density
with respect to a roughly calibrated base model. We show that this method
provides smooth arbitrage-free implied volatility surfaces. Finally, we sketch
a Galerkin method with B-spline finite elements to the solution of the partial
differential equation satisfied by the Radon-Nikodym derivative.
Comment: 25 pages
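To illustrate the basic ingredient, without the paper's infinite-support extension or shape constraints, the sketch below fits a least-squares cubic B-spline to a noisy scatter as an estimate of a conditional expectation E[Y | X]. The data and knot placement are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 400))
y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.1, x.size)  # noisy scatter

# Full cubic (k=3) knot vector: repeated boundary knots plus interior knots;
# the interior knot positions are an illustrative choice.
k = 3
t = np.r_[[x[0]] * (k + 1), np.linspace(0.2, 0.8, 5), [x[-1]] * (k + 1)]
spline = make_lsq_spline(x, y, t, k=k)  # least-squares B-spline regression
fitted = spline(x)                      # estimate of E[Y | X = x]
```

Shape constraints (e.g. monotonicity) would be imposed as linear inequalities on the B-spline coefficients rather than by unconstrained least squares as here.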
Sparse implicitization by interpolation: Characterizing non-exactness and an application to computing discriminants
We revisit implicitization by interpolation in order to examine its properties in the context of sparse elimination theory. Based on the computation of a superset of the implicit support, implicitization is reduced to computing the nullspace of a numeric matrix. The approach is applicable to polynomial and rational parameterizations of curves and (hyper)surfaces of any dimension, including the case of parameterizations with base points.
Our support prediction is based on sparse (or toric) resultant theory, in order to exploit the sparsity of the input and the output. Our method may yield a multiple of the implicit equation: we characterize and quantify this situation by relating the nullspace dimension to the predicted support and its geometry. In this case, we obtain more than one multiple of the implicit equation; the implicit equation itself can then be recovered via multivariate polynomial gcd (or factoring).
All of the above techniques extend to the case of approximate computation, thus yielding a method of sparse approximate implicitization, which is important in tackling larger problems. We discuss our publicly available Maple implementation through several examples, including the bicubic surface benchmark.
For a novel application, we focus on computing the discriminant of a multivariate polynomial, which characterizes the existence of multiple roots and generalizes the resultant of a polynomial system.
This yields an efficient, output-sensitive algorithm for computing the discriminant polynomial.
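The interpolation-matrix construction can be shown on a toy example: implicitizing the unit circle (cos t, sin t) using a degree-2 support superset. The nullspace of the collocation matrix recovers x^2 + y^2 - 1 up to scale. This is a minimal NumPy sketch, not the authors' Maple implementation, and the sample points are an arbitrary choice.

```python
import numpy as np

# Toy implicitization: the unit circle (x, y) = (cos t, sin t).
# Predicted support superset: all monomials of total degree <= 2,
# listed as exponent pairs (a, b) for x**a * y**b.
monomials = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]

t = np.linspace(0.1, 6.0, 50)   # generic parameter samples
x, y = np.cos(t), np.sin(t)
M = np.column_stack([x**a * y**b for a, b in monomials])

# Coefficient vectors of implicit equations lie in the nullspace of M.
# Here the nullspace is one-dimensional, so the right singular vector
# for the smallest singular value recovers x^2 + y^2 - 1 up to scale.
_, s, vh = np.linalg.svd(M)
coeffs = vh[-1] / vh[-1][0]     # normalize the constant term to 1
```

The abstract's non-exactness phenomenon corresponds to this nullspace having dimension greater than one, in which case each basis vector is a multiple of the implicit equation.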
Probing spatial homogeneity with LTB models: a detailed discussion
Do current observational data confirm the assumptions of the cosmological
principle, or is there statistical evidence for deviations from spatial
homogeneity on large scales? To address these questions, we developed a
flexible framework based on spherically symmetric, but radially inhomogeneous
Lemaitre-Tolman-Bondi (LTB) models with synchronous Big Bang. We expanded the
(local) matter density profile in terms of flexible interpolation schemes and
orthonormal polynomials. A Monte Carlo technique in combination with recent
observational data was used to systematically vary the shape of these profiles.
In the first part of this article, we reconsider giant LTB voids without dark
energy to investigate whether extremely fine-tuned mass profiles can reconcile
these models with current data. While the local Hubble rate and supernovae can
easily be fitted without dark energy, model-independent constraints
from the Planck 2013 data require an unrealistically low local Hubble rate,
which is strongly inconsistent with the observed value; this result agrees well
with previous studies. In the second part, we explain why it seems natural to
extend our framework by a non-zero cosmological constant, which then allows us
to perform general tests of the cosmological principle. Moreover, these
extended models facilitate exploring whether fluctuations in the local matter
density profile might potentially alleviate the tension between local and
global measurements of the Hubble rate, as derived from Cepheid-calibrated type
Ia supernovae and CMB experiments, respectively. We show that current data
provide no evidence for deviations from spatial homogeneity on large scales.
More accurate constraints are required, however, to ultimately confirm the
validity of the cosmological principle.
Comment: 18 pages, 12 figures, 2 tables; accepted for publication in A&
GRID2D/3D: A computer program for generating grid systems in complex-shaped two- and three-dimensional spatial domains. Part 1: Theory and method
An efficient computer program, called GRID2D/3D, was developed to generate single and composite grid systems within geometrically complex two- and three-dimensional (2- and 3-D) spatial domains that can deform with time. GRID2D/3D generates single grid systems by using algebraic grid generation methods based on transfinite interpolation, in which the distribution of grid points within the spatial domain is controlled by stretching functions. All single grid systems generated by GRID2D/3D can have grid lines that are continuous and differentiable everywhere up to second order. Also, grid lines can intersect the boundaries of the spatial domain orthogonally. GRID2D/3D generates composite grid systems by patching together two or more single grid systems. The patching can be discontinuous or continuous. For continuous composite grid systems, the grid lines are continuous and differentiable everywhere up to second order, except at interfaces where different single grid systems meet; there, the grid lines are only differentiable up to first order. For 2-D spatial domains, the boundary curves are described by using either cubic or tension spline interpolation. For 3-D spatial domains, the boundary surfaces are described by using linear Coons interpolation, bi-hyperbolic spline interpolation, or a new technique referred to as 3-D bi-directional Hermite interpolation. Since grid systems generated by algebraic methods can have grid lines that overlap one another, GRID2D/3D contains a graphics package for evaluating the generated grid systems. With the graphics package, the user can generate grid systems interactively with the grid generation part of GRID2D/3D. GRID2D/3D is written in FORTRAN 77 and can be run on any IBM PC, XT, or AT compatible computer.
In order to use GRID2D/3D on workstations or mainframe computers, some minor modifications must be made in the graphics part of the program; no modifications are needed in the grid generation part of the program. This technical memorandum describes the theory and method used in GRID2D/3D.
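The transfinite interpolation underlying GRID2D/3D's single-grid generation can be sketched with the bilinearly blended (Coons-type) formula, which builds interior grid points from the four boundary curves alone. This is a minimal Python illustration of the general technique, not the FORTRAN 77 program; the unit-square boundaries are an assumed test case.

```python
import numpy as np

def transfinite(u, v, bottom, top, left, right):
    """Bilinearly blended transfinite (Coons-type) interpolation: maps
    (u, v) in the unit square to a point in the physical domain using
    only the four boundary curves. Corner consistency (e.g.
    bottom(0) == left(0)) is assumed."""
    return ((1 - v) * bottom(u) + v * top(u)
            + (1 - u) * left(v) + u * right(v)
            - (1 - u) * (1 - v) * bottom(0.0) - u * (1 - v) * bottom(1.0)
            - (1 - u) * v * top(0.0) - u * v * top(1.0))

# Sanity check on a unit square: boundaries and interior are reproduced.
bottom = lambda u: np.array([u, 0.0])
top    = lambda u: np.array([u, 1.0])
left   = lambda v: np.array([0.0, v])
right  = lambda v: np.array([1.0, v])
pt = transfinite(0.25, 0.5, bottom, top, left, right)  # -> [0.25, 0.5]
```

In a grid generator, u and v would first be passed through stretching functions, as the abstract describes, to cluster grid lines where resolution is needed.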