The sorted effects method: discovering heterogeneous effects beyond their averages
Supplemental Data & Programs are available here: https://hdl.handle.net/2144/34409
The partial (ceteris paribus) effects of interest in nonlinear and interactive linear models are heterogeneous as they can vary dramatically with the underlying observed or unobserved covariates. Despite the apparent importance of heterogeneity, a common practice in modern empirical work is to largely ignore it by reporting average partial effects (or, at best, average effects for some groups). While average effects provide very convenient scalar summaries of typical effects, by definition they fail to reflect the entire variety of the heterogeneous effects. In order to discover these effects much more fully, we propose to estimate and report sorted effects -- a collection of estimated partial effects sorted in increasing order and indexed by percentiles. By construction the sorted effect curves completely represent and help visualize the range of the heterogeneous effects in one plot. They are as convenient and easy to report in practice as the conventional average partial effects. They also serve as a basis for classification analysis, where we divide the observational units into most or least affected groups and summarize their characteristics. We provide a quantification of uncertainty (standard errors and confidence bands) for the estimated sorted effects and related classification analysis, and provide confidence sets for the most and least affected groups. The derived statistical results rely on establishing key, new mathematical results on Hadamard differentiability of a multivariate sorting operator and a related classification operator, which are of independent interest. We apply the sorted effects method and classification analysis to demonstrate several striking patterns in the gender wage gap.
https://arxiv.org/abs/1512.05635
Accepted manuscript
The Sorted Effects Method: Discovering Heterogeneous Effects Beyond Their Averages
The partial (ceteris paribus) effects of interest in nonlinear and
interactive linear models are heterogeneous as they can vary dramatically with
the underlying observed or unobserved covariates. Despite the apparent
importance of heterogeneity, a common practice in modern empirical work is to
largely ignore it by reporting average partial effects (or, at best, average
effects for some groups). While average effects provide very convenient scalar
summaries of typical effects, by definition they fail to reflect the entire
variety of the heterogeneous effects. In order to discover these effects much
more fully, we propose to estimate and report sorted effects -- a collection of
estimated partial effects sorted in increasing order and indexed by
percentiles. By construction the sorted effect curves completely represent and
help visualize the range of the heterogeneous effects in one plot. They are as
convenient and easy to report in practice as the conventional average partial
effects. They also serve as a basis for classification analysis, where we
divide the observational units into most or least affected groups and summarize
their characteristics. We provide a quantification of uncertainty (standard
errors and confidence bands) for the estimated sorted effects and related
classification analysis, and provide confidence sets for the most and least
affected groups. The derived statistical results rely on establishing key, new
mathematical results on Hadamard differentiability of a multivariate sorting
operator and a related classification operator, which are of independent
interest. We apply the sorted effects method and classification analysis to
demonstrate several striking patterns in the gender wage gap.
Comment: 62 pages, 9 figures, 8 tables, includes appendix with supplementary
material
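To make the idea of sorted effects concrete, here is a minimal sketch, assuming a logit model with a binary regressor of interest. The coefficient vector `beta`, the simulated design matrix `X`, and the function names are hypothetical choices made for illustration; the sketch covers only the point estimates (partial effects sorted and indexed by percentiles) and omits the bias corrections, confidence bands, and classification analysis described in the abstract.

```python
import numpy as np

def logistic(z):
    """Logistic link function."""
    return 1.0 / (1.0 + np.exp(-z))

def sorted_effects(X, beta, j, percentiles):
    """Return unit-level partial effects of binary regressor j and their percentiles.

    The partial effect for each unit is the change in predicted probability
    when column j is switched from 0 to 1, holding other covariates fixed.
    """
    X1 = X.copy()
    X1[:, j] = 1.0
    X0 = X.copy()
    X0[:, j] = 0.0
    pe = logistic(X1 @ beta) - logistic(X0 @ beta)
    return pe, np.percentile(pe, percentiles)

# Simulated data and hypothetical coefficients (assumptions, not from the paper).
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    np.ones(n),                              # intercept
    rng.integers(0, 2, n).astype(float),     # binary regressor of interest
    rng.normal(size=n),                      # continuous covariate
])
beta = np.array([-0.5, 1.0, 0.8])

levels = np.arange(5, 100, 5)                # percentile indices 5, 10, ..., 95
pe, spe = sorted_effects(X, beta, 1, levels)
ape = pe.mean()                              # the conventional average partial effect
```

Plotting `spe` against `levels` gives a sorted effect curve, while `ape` collapses the same information to a single number.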
Fast Preprocessing for Optimal Orthogonal Range Reporting and Range Successor with Applications to Text Indexing
Under the word RAM model, we design three data structures that can be
constructed in $O(n\sqrt{\lg n})$ time over $n$ points in an $n \times n$ grid.
The first data structure is an $O(n\lg^{\epsilon} n)$-word structure supporting
orthogonal range reporting in $O(\lg\lg n+k)$ time, where $k$ denotes output
size and $\epsilon$ is an arbitrarily small constant. The second is an
$O(n\lg\lg n)$-word structure supporting orthogonal range successor in
$O(\lg\lg n)$ time, while the third is an $O(n\lg^{\epsilon} n)$-word structure
supporting sorted range reporting in $O(\lg\lg n+k)$ time. The query times of
these data structures are optimal when the space costs must be within
$O(n\ \mathrm{polylog}\ n)$ … $O(n\sqrt{\lg n})$ … $O(\lg^{\epsilon} n)$ … $O(n\sqrt{\lg n})$ time. Hence our work is the
first to achieve the same preprocessing time for optimal orthogonal range
reporting and range successor. We also apply our results to improve the
construction time of text indexes.
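The three query types named in this abstract can be pinned down with a brute-force reference, sketched below. This is not the data structure from the paper: each query scans all points in linear time, and the sketch assumes the common convention that the range successor is the point in the query rectangle with the smallest y-coordinate. The function names and the rectangle encoding `(x1, x2, y1, y2)` are hypothetical.

```python
def in_range(p, q):
    """True if point p = (x, y) lies in the closed rectangle q = (x1, x2, y1, y2)."""
    x, y = p
    x1, x2, y1, y2 = q
    return x1 <= x <= x2 and y1 <= y <= y2

def range_report(points, q):
    """Orthogonal range reporting: all points inside q, in input order."""
    return [p for p in points if in_range(p, q)]

def range_successor(points, q):
    """Orthogonal range successor: the point inside q with the smallest y, or None."""
    return min(range_report(points, q), key=lambda p: p[1], default=None)

def sorted_range_report(points, q):
    """Sorted range reporting: points inside q in increasing order of y."""
    return sorted(range_report(points, q), key=lambda p: p[1])

points = [(1, 5), (2, 3), (4, 4), (6, 1)]
q = (1, 4, 2, 5)
reported = range_report(points, q)
successor = range_successor(points, q)
in_order = sorted_range_report(points, q)
```

A baseline like this is mainly useful as a correctness oracle when testing a faster structure.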
Optimal Color Range Reporting in One Dimension
Color (or categorical) range reporting is a variant of the orthogonal range
reporting problem in which every point in the input is assigned a \emph{color}.
While the answer to an orthogonal point reporting query contains all points in
the query range $Q$, the answer to a color reporting query contains only
distinct colors of points in $Q$. In this paper we describe an O(N)-space data
structure that answers one-dimensional color reporting queries in optimal
$O(k+1)$ time, where $k$ is the number of colors in the answer and $N$ is the
number of points in the data structure. Our result can also be dynamized and
extended to the external memory model.
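For orientation, the following sketch implements a classic textbook approach to one-dimensional color reporting, not the structure from this paper. Each point stores the index of the previous point with the same color; a point is the first occurrence of its color in the query range exactly when that index falls before the range, and range-minimum queries find such points one by one. The class name `ColorReporter` is hypothetical, the sparse table used here takes O(N log N) space rather than O(N), and the initial binary search adds a logarithmic term to the query time.

```python
from bisect import bisect_left, bisect_right

class ColorReporter:
    def __init__(self, points):
        """points: iterable of (coordinate, color) pairs."""
        pts = sorted(points)
        self.xs = [x for x, _ in pts]
        self.colors = [c for _, c in pts]
        n = len(pts)
        # prev[i] = index of the previous point with the same color, or -1.
        last = {}
        self.prev = []
        for i, c in enumerate(self.colors):
            self.prev.append(last.get(c, -1))
            last[c] = i
        # Sparse table: table[j][i] = index of the minimum prev in [i, i + 2^j).
        self.table = [list(range(n))]
        j = 1
        while (1 << j) <= n:
            half = 1 << (j - 1)
            below = self.table[-1]
            row = []
            for i in range(n - (1 << j) + 1):
                a, b = below[i], below[i + half]
                row.append(a if self.prev[a] <= self.prev[b] else b)
            self.table.append(row)
            j += 1

    def _argmin(self, lo, hi):
        """Index of the minimum prev value in the inclusive index range [lo, hi]."""
        j = (hi - lo + 1).bit_length() - 1
        a = self.table[j][lo]
        b = self.table[j][hi - (1 << j) + 1]
        return a if self.prev[a] <= self.prev[b] else b

    def query(self, a, b):
        """Distinct colors of points with coordinate in [a, b]."""
        lo = bisect_left(self.xs, a)
        hi = bisect_right(self.xs, b) - 1
        out = []
        stack = [(lo, hi)]
        while stack:
            l, h = stack.pop()
            if l > h:
                continue
            m = self._argmin(l, h)
            if self.prev[m] >= lo:
                continue  # no first occurrence of any color in [l, h]
            out.append(self.colors[m])
            stack.append((l, m - 1))
            stack.append((m + 1, h))
        return out

reporter = ColorReporter([(1, "r"), (2, "g"), (3, "r"), (5, "b"), (8, "g"), (9, "r")])
colors_2_5 = reporter.query(2, 5)
```

Each reported color costs a constant number of range-minimum lookups, so the reporting phase is proportional to the number of distinct colors rather than the number of points in the range.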
Range Queries on Uncertain Data
Given a set $P$ of $n$ uncertain points on the real line, each represented by
its one-dimensional probability density function, we consider the problem of
building data structures on $P$ to answer range queries of the following three
types for any query interval $I$: (1) top-$1$ query: find the point in $P$ that
lies in $I$ with the highest probability, (2) top-$k$ query: given any integer
$k \leq n$ as part of the query, return the $k$ points in $P$ that lie in $I$
with the highest probabilities, and (3) threshold query: given any threshold
$\tau$ as part of the query, return all points of $P$ that lie in $I$ with
probabilities at least $\tau$. We present data structures for these range
queries with linear or nearly linear space and efficient query time.
Comment: 26 pages. A preliminary version of this paper appeared in ISAAC 2014.
In this full version, we also present solutions to the most general case of
the problem (i.e., the histogram bounded case), which were left as open
problems in the preliminary version
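The three query types can be illustrated with a brute-force sketch, assuming for simplicity that every uncertain point is uniformly distributed on an interval `(lo, hi)`. That distribution, the function names, and the tuple encoding are assumptions made for this example; the data structures in the paper avoid the linear scan used here.

```python
import heapq

def prob_in(point, a, b):
    """Probability that a point uniform on (lo, hi) falls in the interval [a, b]."""
    lo, hi = point
    overlap = min(b, hi) - max(a, lo)
    return max(0.0, overlap) / (hi - lo)

def top_k(points, a, b, k):
    """Indices of the k points most likely to lie in [a, b], most likely first."""
    return heapq.nlargest(k, range(len(points)),
                          key=lambda i: prob_in(points[i], a, b))

def threshold(points, a, b, tau):
    """Indices of all points lying in [a, b] with probability at least tau."""
    return [i for i in range(len(points)) if prob_in(points[i], a, b) >= tau]

# Three uncertain points, each uniform on its own interval.
points = [(0, 4), (1, 2), (3, 10)]
best = top_k(points, 1, 3, 1)        # top-1 query
best_two = top_k(points, 1, 3, 2)    # top-k query with k = 2
likely = threshold(points, 1, 3, 0.5)
```

For the interval [1, 3] the three probabilities are 0.5, 1.0, and 0.0, which determines the answers to all three queries.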