Approximating the Distribution of the Median and other Robust Estimators on Uncertain Data
Robust estimators, like the median of a point set, are important for data
analysis in the presence of outliers. We study robust estimators for
locationally uncertain points with discrete distributions. That is, each point
in a data set has a discrete probability distribution describing its location.
The probabilistic nature of uncertain data makes it challenging to compute such
estimators, since the true value of the estimator is now described by a
distribution rather than a single point. We show how to construct and estimate
the distribution of the median of a point set. Building the approximate support
of the distribution takes near-linear time, and assigning probability to that
support takes quadratic time. We also develop a general approximation technique
for distributions of robust estimators with respect to ranges with bounded VC
dimension. This includes the geometric median for high dimensions and the
Siegel estimator for linear regression. Comment: Full version of a paper to appear at SoCG 201
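To make concrete what "the distribution of the median" means for locationally uncertain points, here is a minimal one-dimensional sketch. It is not the near-linear approximation the abstract describes: it enumerates every realization by brute force, which is exponential in the number of points and only workable for tiny inputs. The function name `median_distribution`, the input layout (one list of `(location, probability)` pairs per point), and the use of the lower median are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
from statistics import median_low


def median_distribution(uncertain_points):
    """Exact distribution of the (lower) median by enumerating realizations.

    uncertain_points: list of points, each a list of (location, probability)
    pairs whose probabilities sum to 1. Hypothetical layout for illustration.
    """
    dist = defaultdict(float)
    # Each realization picks one possible location for every uncertain point.
    for realization in product(*uncertain_points):
        prob = 1.0
        for _, p in realization:
            prob *= p  # points are assumed independent
        locations = [loc for loc, _ in realization]
        dist[median_low(locations)] += prob
    return dict(dist)


points = [
    [(0, 0.5), (10, 0.5)],  # this point sits at 0 or at 10
    [(1, 1.0)],             # certain point
    [(2, 1.0)],             # certain point
]
print(median_distribution(points))
```

In this example the median is 1 when the first point lands at 0 and 2 when it lands at 10, so the estimator is described by a two-atom distribution rather than a single value.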
Calculation of Weibull strength parameters and Batdorf flaw-density constants for volume- and surface-flaw-induced fracture in ceramics
The calculation of shape and scale parameters of the two-parameter Weibull distribution is described using least-squares analysis and maximum likelihood methods for volume- and surface-flaw-induced fracture in ceramics with complete and censored samples. Detailed procedures are given for evaluating 90 percent confidence intervals for maximum likelihood estimates of shape and scale parameters, the unbiased estimates of the shape parameters, and the Weibull mean values and corresponding standard deviations. Furthermore, the necessary steps are described for detecting outliers and for calculating the Kolmogorov-Smirnov and the Anderson-Darling goodness-of-fit statistics and 90 percent confidence bands about the Weibull distribution. The report also shows how to calculate the Batdorf flaw-density constants by using the Weibull distribution statistical parameters. The techniques described were verified with several example problems from the open literature, and were coded in the Structural Ceramics Analysis and Reliability Evaluation (SCARE) design program.
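The least-squares route to the two Weibull parameters can be sketched as follows. This is a minimal illustration for a complete (uncensored) sample only, not the SCARE implementation: it linearizes the two-parameter Weibull CDF as ln(-ln(1 - F)) = m ln(x) - m ln(s) and fits a straight line. The plotting position F_i = (i - 0.5)/n and the function name `weibull_least_squares` are assumptions chosen for the example.

```python
import math


def weibull_least_squares(strengths):
    """Estimate Weibull shape m and scale s from a complete sample.

    Fits ln(-ln(1 - F)) = m*ln(x) - m*ln(s) by ordinary least squares,
    using the assumed plotting position F_i = (i - 0.5) / n.
    """
    xs = sorted(strengths)
    n = len(xs)
    u = [math.log(x) for x in xs]
    v = [math.log(-math.log(1.0 - (i - 0.5) / n)) for i in range(1, n + 1)]
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    sxx = sum((ui - mean_u) ** 2 for ui in u)
    sxy = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    shape = sxy / sxx                      # slope of the fitted line
    intercept = mean_v - shape * mean_u    # equals -m * ln(s)
    scale = math.exp(-intercept / shape)
    return shape, scale


# Synthetic strengths placed exactly on a Weibull line (m = 10, s = 400).
n = 20
data = [400.0 * (-math.log(1.0 - (i - 0.5) / n)) ** (1.0 / 10.0)
        for i in range(1, n + 1)]
shape, scale = weibull_least_squares(data)
print(shape, scale)
```

Because the synthetic data lie exactly on the linearized curve, the fit recovers the generating parameters up to rounding; real fracture data would scatter about the line, and censored samples need the separate treatment the abstract mentions.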
The Stochastic Fluctuation of the Quantile Regression Curve
Let (X1, Y1), ..., (Xn, Yn) be i.i.d. random variables and let l(x) be the unknown p-quantile regression curve of Y on X. A quantile-smoother l_n(x) is a localised, nonlinear estimator of l(x). The strong uniform consistency rate is established under general conditions. In many applications it is necessary to know the stochastic fluctuation of the process {l_n(x) - l(x)}. Using strong approximations of the empirical process and extreme value theory allows us to consider the asymptotic maximal deviation sup_{0 ≤ x ≤ 1} |l_n(x) - l(x)|. The derived result helps in the construction of a uniform confidence band for the quantile curve l(x). This confidence band can be applied as a model check, e.g. in econometrics. An application considers a labour market discrimination effect. Keywords: Quantile Regression, Consistency Rate, Confidence Band, Check Function, Kernel Smoothing, Nonparametric Fitting
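The "check function" named in the keywords is the loss whose minimizer is a p-quantile. The sketch below shows only that unconditional building block: it is not the localised kernel smoother l_n(x) studied in the paper, which would weight each term by a kernel in x. The names `check_loss` and `quantile_by_check_loss` are hypothetical, and restricting the search to the sample values is a simplifying assumption.

```python
def check_loss(u, p):
    """Check function rho_p(u) = u * (p - 1{u < 0})."""
    return u * (p - (1.0 if u < 0 else 0.0))


def quantile_by_check_loss(ys, p):
    """Return a sample value minimizing the summed check loss.

    The minimizer of sum_i rho_p(y_i - q) over q is a p-quantile of the
    sample; searching over the observed values is enough to find one.
    """
    return min(sorted(ys), key=lambda q: sum(check_loss(y - q, p) for y in ys))


ys = [1, 2, 3, 4, 100]
print(quantile_by_check_loss(ys, 0.5))  # the median, unaffected by the outlier 100
```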
On the 3D structure of the mass, metallicity, and SFR space for SF galaxies
We demonstrate that the space formed by the star-formation rate (SFR),
gas-phase metallicity (Z), and stellar mass (M), can be reduced to a plane, as
first proposed by Lara-Lopez et al. We study three different approaches to find
the best representation of this 3D space, using a principal component analysis,
a regression fit, and binning of the data. The PCA shows that this 3D space can
be adequately represented in only 2 dimensions, i.e., a plane. We find that the
plane that minimises the chi^2 for all variables, and hence provides the best
representation of the data, corresponds to a regression fit to the stellar mass
as a function of SFR and Z, M=f(Z,SFR). We find that the distribution
resulting from the median values in bins for our data gives the highest chi^2.
We also show that the empirical calibrations to the oxygen abundance used to
derive the Fundamental Metallicity Relation (Nagao et al.) have important
limitations, which contribute to the apparent inconsistencies. The main problem
is that these empirical calibrations do not consider the ionization degree of
the gas. Furthermore, the use of the N2 index to estimate oxygen abundances
cannot be applied for oxygen abundances above ~8.8 because of the saturation of the [NII]6584 line in
the high-metallicity regime. Finally we provide an update of the Fundamental
Plane derived by Lara-Lopez et al. Comment: ApJ, accepted. 15 pages, 13 figures
Coherent frequentism
By representing the range of fair betting odds according to a pair of
confidence set estimators, dual probability measures on parameter space called
frequentist posteriors secure the coherence of subjective inference without any
prior distribution. The closure of the set of expected losses corresponding to
the dual frequentist posteriors constrains decisions without arbitrarily
forcing optimization under all circumstances. This decision theory reduces to
those that maximize expected utility when the pair of frequentist posteriors is
induced by an exact or approximate confidence set estimator or when an
automatic reduction rule is applied to the pair. In such cases, the resulting
frequentist posterior is coherent in the sense that, as a probability
distribution of the parameter of interest, it satisfies the axioms of the
decision-theoretic and logic-theoretic systems typically cited in support of
the Bayesian posterior. Unlike the p-value, the confidence level of an interval
hypothesis derived from such a measure is suitable as an estimator of the
indicator of hypothesis truth since it converges in sample-space probability to
1 if the hypothesis is true or to 0 otherwise under general conditions. Comment: The confidence-measure theory of inference and decision is explicitly
extended to vector parameters of interest. The derivation of upper and lower
confidence levels from valid and nonconservative set estimators is formalized