On Sharp Identification Regions for Regression Under Interval Data
The reliable analysis of interval data (coarsened data) is one of the
most promising applications of imprecise probabilities in statistics. If one
refrains from making untestable, and often materially unjustified, strong
assumptions on the coarsening process, then the empirical distribution
of the data is imprecise, and statistical models are, in Manski's terms,
partially identified. We first elaborate some subtle differences between
two natural ways of handling interval data in the dependent variable of
regression models, distinguishing between two different types of identification
regions, called Sharp Marrow Region (SMR) and Sharp Collection
Region (SCR) here. Focusing on the case of linear regression analysis, we
then derive some fundamental geometrical properties of SMR and SCR,
allowing a comparison of the regions and providing some guidelines for
their canonical construction.
Relying on the algebraic framework of adjunctions of two mappings between
partially ordered sets, we characterize SMR as a right adjoint and
as the monotone kernel of a criterion function based mapping, while SCR
is indeed interpretable as the corresponding monotone hull. Finally we
sketch some ideas on a compromise between SMR and SCR based on a
set-domained loss function.
This paper is an extended version of a shorter paper with the same title,
which has been conditionally accepted for publication in the Proceedings of
the Eighth International Symposium on Imprecise Probability: Theories
and Applications. In the present paper we have added proofs and a seventh
chapter with a small Monte Carlo illustration, which would have made the
original paper too long.
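To make the object of study concrete: one natural reading of a collection-type identification region for linear regression with an interval-valued dependent variable is the set of all OLS estimates obtainable from precise outcomes compatible with the observed intervals. Since the OLS estimator is linear in the outcome vector, the extreme points of this set arise at vertices of the interval box. The following sketch (an illustration of this general idea, not the paper's construction of SMR or SCR) enumerates those vertices for a tiny example:

```python
# Illustrative sketch only: the set of OLS estimates compatible with
# interval-valued outcomes [y_lo, y_hi] is the image of the interval box
# under the linear OLS map, so its extreme points come from box vertices.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 6                                   # keep n small: 2**n vertex selections
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y_lo = rng.normal(size=n)
y_hi = y_lo + rng.uniform(0.5, 1.5, size=n)

H = np.linalg.pinv(X)                   # OLS map: beta = H @ y
betas = np.array([H @ np.where(mask, y_hi, y_lo)
                  for mask in itertools.product([False, True], repeat=n)])

# componentwise bounds on the slope over all compatible precise datasets
slope_lo, slope_hi = betas[:, 1].min(), betas[:, 1].max()
print(slope_lo, slope_hi)
```

The enumeration is exponential in n; for realistic sample sizes one would exploit the geometry of the region instead, which is exactly what the paper's canonical-construction results are about.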
Simple Inference on Functionals of Set-Identified Parameters Defined by Linear Moments
This paper considers uniformly valid (over a class of data generating
processes) inference for linear functionals of partially identified parameters
in cases where the identified set is defined by linear (in the parameter)
moment inequalities. We propose a bootstrap procedure for constructing
uniformly valid confidence sets for a linear functional of a partially
identified parameter. The proposed method amounts to bootstrapping the value
functions of a linear optimization problem, and subsumes subvector inference as
a special case. In other words, this paper shows the conditions under which
"naively" bootstrapping a linear program can be used to construct a
confidence set with uniform correct coverage for a partially identified linear
functional. Unlike other proposed subvector inference procedures, our procedure
does not require the researcher to repeatedly invert a hypothesis test, and is
extremely computationally efficient. In addition to the new procedure, the
paper also discusses connections between the literature on optimization and the
literature on subvector inference in partially identified models.
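The core computational idea can be caricatured in a few lines: the bounds on a linear functional over an identified set defined by linear inequalities are values of linear programs, and resampling the data, re-estimating the inequalities, and re-solving the programs yields a naive bootstrap distribution of those values. The sketch below (a toy illustration under hypothetical moment inequalities, not the paper's procedure or its validity conditions) bootstraps the value functions with `scipy.optimize.linprog`:

```python
# Toy sketch: identified set {theta : theta_k in [mean_k - 1, mean_k + 1]},
# written as A theta <= b(m); bounds on c'theta are LP values; bootstrap
# the sample means, re-solve, take percentiles.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
c = np.array([1.0, 1.0])                # functional c'theta = theta1 + theta2
data = rng.normal(loc=[1.0, 2.0], scale=1.0, size=(200, 2))

def value_bounds(m):
    """Min and max of c'theta over the box [m - 1, m + 1]."""
    b = np.array([m[0] + 1.0, 1.0 - m[0], m[1] + 1.0, 1.0 - m[1]])
    lo = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2).fun
    hi = -linprog(-c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2).fun
    return lo, hi

boot = []
for _ in range(200):
    sample = data[rng.integers(0, len(data), len(data))]
    boot.append(value_bounds(sample.mean(axis=0)))
boot = np.array(boot)
ci = (np.quantile(boot[:, 0], 0.025), np.quantile(boot[:, 1], 0.975))
print(ci)
```

Note how cheap this is relative to test inversion: one pair of LP solves per bootstrap draw, with no grid search over the parameter space.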
Statistical modelling under epistemic data imprecision: some results on estimating multinomial distributions and logistic regression for coarse categorical data
Paper presented at the 9th International Symposium on Imprecise Probability: Theories and Applications, Pescara, Italy, 2015.
Abstract: The paper deals with parameter estimation for categorical data under epistemic data imprecision, where for a part of the data only coarse(ned) versions of the true values are observable. For different observation models formalizing the information available on the coarsening process, we derive the (typically set-valued) maximum likelihood estimators of the underlying distributions. We discuss the homogeneous case of independent and identically distributed variables as well as logistic regression under a categorical covariate. We start with the imprecise point estimator under an observation model describing the coarsening process without any further assumptions. Then we determine several sensitivity parameters that allow the refinement of the estimators in the presence of auxiliary information.
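A standard assumption-light device in this spirit (an illustration of partial identification under coarsening, not the paper's derived estimator) bounds each category probability between the share of observations precisely recorded as that category and the share that could possibly be that category:

```python
# Cautious bounds for a multinomial under unrestricted coarsening:
# p_k is only known to lie in [n_k / n, (n_k + c_k) / n], where n_k counts
# precise observations of category k and c_k counts coarse observations
# whose set of possible values contains k.  Data below are hypothetical.
obs = [frozenset({"a"})] * 40 + [frozenset({"b"})] * 30 + \
      [frozenset({"a", "b"})] * 20 + [frozenset({"c"})] * 10
n = len(obs)
categories = sorted(set().union(*obs))

bounds = {}
for k in categories:
    n_k = sum(1 for o in obs if o == {k})                 # precisely k
    c_k = sum(1 for o in obs if k in o and len(o) > 1)    # possibly k
    bounds[k] = (n_k / n, (n_k + c_k) / n)
print(bounds)   # "a" -> (0.4, 0.6), "b" -> (0.3, 0.5), "c" -> (0.1, 0.1)
```

The sensitivity parameters discussed in the abstract can be thought of as auxiliary information that shrinks such intervals toward a point.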
Towards an exact reconstruction of a time-invariant model from time series data
Dynamic processes in biological systems may be profiled by measuring system properties over time. One way of representing such time series data is through weighted interaction networks, where the nodes in the network represent the measurables and the weighted edges represent interactions between any pair of nodes. Construction of these network models from time series data may involve seeking a robust data-consistent and time-invariant model to approximate and describe system dynamics. Many problems in mathematics, systems biology and physics can be recast into this form and may require finding the most consistent solution to a set of first order differential equations. This is especially challenging in cases where the number of data points is less than or equal to the number of measurables. We present a novel computational method for network reconstruction with limited time series data. To test our method, we use artificial time series data generated from known network models. We then attempt to reconstruct the original network from the time series data alone. We find good agreement between the original and predicted networks.
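The general setting can be sketched in a few lines (this is the textbook baseline the abstract's setting generalizes, not the authors' method): assume time-invariant linear dynamics x'(t) = W x(t), approximate derivatives by finite differences, and recover the weight matrix W row-wise by least squares, falling back on the least-norm solution when data points are scarcer than measurables:

```python
# Baseline sketch of network reconstruction from time series: simulate a
# known 3-node weighted network, then recover it from the trajectory alone.
import numpy as np

rng = np.random.default_rng(2)
W_true = np.array([[-0.5, 0.8, 0.0],
                   [0.0, -0.3, 0.4],
                   [0.2, 0.0, -0.6]])   # hypothetical interaction weights
dt, steps = 0.01, 50
x = np.empty((steps, 3))
x[0] = rng.normal(size=3)
for t in range(steps - 1):              # Euler-simulated trajectory
    x[t + 1] = x[t] + dt * (W_true @ x[t])

dxdt = np.diff(x, axis=0) / dt          # finite-difference derivatives
# Solve dxdt ~= x[:-1] @ W.T; pinv returns the least-norm solution, the
# natural fallback when the system is under-determined.
W_hat = (np.linalg.pinv(x[:-1]) @ dxdt).T
print(np.round(W_hat, 2))
```

With noisy measurements or genuinely fewer data points than measurables, this naive inversion degrades quickly, which is the regime the paper's method targets.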
On multivariate quantiles under partial orders
This paper focuses on generalizing quantiles from the ordering point of view.
We propose the concept of partial quantiles, which are based on a given partial
order. We establish that partial quantiles are equivariant under
order-preserving transformations of the data, robust to outliers, characterize
the probability distribution if the partial order is sufficiently rich,
generalize the concept of efficient frontier, and can measure dispersion from
the partial order perspective. We also study several statistical aspects of
partial quantiles. We provide estimators, associated rates of convergence, and
asymptotic distributions that hold uniformly over a continuum of quantile
indices. Furthermore, we provide procedures that can restore monotonicity
properties that might have been disturbed by estimation error, establish
computational complexity bounds, and point out a concentration of measure
phenomenon (the latter under independence and the componentwise natural order).
Finally, we illustrate the concepts by discussing several theoretical examples
and simulations. Empirical applications to compare intake nutrients within
diets, to evaluate the performance of investment funds, and to study the impact
of policies on tobacco awareness are also presented to illustrate the concepts
and their use.
Comment: Published at http://dx.doi.org/10.1214/10-AOS863 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
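To fix ideas about ranking under a partial order (this is purely illustrative and not the paper's definition of partial quantiles): under the componentwise natural order on R^2, each point x can be graded by the empirical mass of its lower set, and the minimal elements of the region where that mass reaches a level tau play the role a tau-quantile plays in one dimension:

```python
# Illustration of quantile-like objects under the componentwise order.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2))

def lower_prob(x, data):
    """Empirical mass of the lower set {z : z <= x in every component}."""
    return np.mean(np.all(data <= x, axis=1))

tau = 0.25
candidates = X[np.array([lower_prob(x, X) >= tau for x in X])]
# minimal elements of the candidate region w.r.t. the componentwise order:
# keep x unless some other candidate is <= x componentwise and strictly
# smaller in at least one coordinate
minimal = [x for x in candidates
           if not np.any(np.all(candidates <= x, axis=1) &
                         np.any(candidates < x, axis=1))]
```

Note that the answer is set-valued rather than a single point, which is exactly why notions such as the efficient frontier appear in the abstract.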
Distribution Regression with Sample Selection, with an Application to Wage Decompositions in the UK
We develop a distribution regression model under endogenous sample selection.
This model is a semiparametric generalization of the Heckman selection model
that accommodates much richer patterns of heterogeneity in the selection
process and effect of the covariates. The model applies to continuous, discrete
and mixed outcomes. We study the identification of the model, and develop a
computationally attractive two-step method to estimate the model parameters,
where the first step is a probit regression for the selection equation and the
second step consists of multiple distribution regressions with selection
corrections for the outcome equation. We construct estimators of functionals of
interest such as actual and counterfactual distributions of latent and observed
outcomes via plug-in rule. We derive functional central limit theorems for all
the estimators and show the validity of multiplier bootstrap to carry out
functional inference. We apply the methods to wage decompositions in the UK
using new data. Here we decompose the difference between the male and female
wage distributions into four effects: composition, wage structure, selection
structure and selection sorting. After controlling for endogenous employment
selection, we still find a substantial gender wage gap, ranging from 21% to 40%
throughout the (latent) offered wage distribution, that is not explained by
observable labor market characteristics. We also uncover positive sorting for
single men and negative sorting for married women that accounts for a
substantive fraction of the gender wage gap at the top of the distribution.
These findings can be interpreted as evidence of assortative matching in the
marriage market and a glass ceiling in the labor market.
Comment: 72 pages, 4 tables, 39 figures; includes a supplement with additional
empirical results.
Partially Identified Prevalence Estimation under Misclassification using the Kappa Coefficient
We discuss a new strategy for prevalence estimation in the presence of misclassification. Our method is applicable when misclassification probabilities are unknown but independent replicate measurements are available. This yields the kappa coefficient, which indicates the agreement between the two measurements. From this information, a direct correction for misclassification is not feasible due to non-identifiability. However, it is possible to derive estimation intervals relying on the concept of partial identification. These intervals give interesting insights into possible bias due to misclassification. Furthermore, confidence intervals can be constructed. Our method is illustrated in several theoretical scenarios and in an example from oral health, where the estimation of caries prevalence in children is at issue.
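The raw ingredients are easy to compute (this sketch shows Cohen's kappa and a crude agreement-based interval on hypothetical counts; the paper's kappa-based bounds are not reproduced here):

```python
# Cohen's kappa from a 2x2 agreement table of two replicate measurements,
# plus the crude interval [both positive, at least one positive] as a first,
# assumption-light bracket for the prevalence.
import numpy as np

table = np.array([[40, 10],
                  [5, 45]], dtype=float)   # hypothetical counts; row/col 0
n = table.sum()                            # codes "condition present"
p_obs = np.trace(table) / n                # observed agreement
marg1 = table.sum(axis=1) / n
marg2 = table.sum(axis=0) / n
p_exp = (marg1 * marg2).sum()              # chance agreement
kappa = (p_obs - p_exp) / (1 - p_exp)

both_pos = table[0, 0] / n
any_pos = (table[0, :].sum() + table[:, 0].sum() - table[0, 0]) / n
print(round(kappa, 3), (both_pos, any_pos))   # 0.7 (0.4, 0.55)
```

The paper's contribution is to exploit the kappa coefficient to tighten such brackets into proper partial-identification intervals with accompanying confidence intervals.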