
129,498 research outputs found

The sorted effects method: discovering heterogeneous effects beyond their averages

Author: Chernozhukov Victor
Fernández-Val Iván
Luo Ye
Publication venue: 'The Econometric Society'
Publication date: 01/11/2018

Supplemental Data & Programs are available here: https://hdl.handle.net/2144/34409

The partial (ceteris paribus) effects of interest in nonlinear and interactive linear models are heterogeneous as they can vary dramatically with the underlying observed or unobserved covariates. Despite the apparent importance of heterogeneity, a common practice in modern empirical work is to largely ignore it by reporting average partial effects (or, at best, average effects for some groups). While average effects provide very convenient scalar summaries of typical effects, by definition they fail to reflect the entire variety of the heterogeneous effects. In order to discover these effects much more fully, we propose to estimate and report sorted effects -- a collection of estimated partial effects sorted in increasing order and indexed by percentiles. By construction the sorted effect curves completely represent and help visualize the range of the heterogeneous effects in one plot. They are as convenient and easy to report in practice as the conventional average partial effects. They also serve as a basis for classification analysis, where we divide the observational units into most or least affected groups and summarize their characteristics. We provide a quantification of uncertainty (standard errors and confidence bands) for the estimated sorted effects and related classification analysis, and provide confidence sets for the most and least affected groups. The derived statistical results rely on establishing key, new mathematical results on Hadamard differentiability of a multivariate sorting operator and a related classification operator, which are of independent interest. We apply the sorted effects method and classification analysis to demonstrate several striking patterns in the gender wage gap.

https://arxiv.org/abs/1512.05635

Accepted manuscript
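The idea of sorting partial effects and indexing them by percentiles can be illustrated with a minimal Python sketch. It is not the authors' implementation (their programs are at the handle link in this entry): the `effects` values are made-up numbers standing in for estimated partial effects, the function names are hypothetical, and no standard errors or confidence bands are computed.

```python
import math

def sorted_effects(effects, percentiles):
    """Empirical u-quantile of the partial effects for each u in percentiles."""
    ordered = sorted(effects)
    n = len(ordered)
    curve = []
    for u in percentiles:
        # Left-inverse of the empirical CDF: the smallest effect such that
        # at least a fraction u of all effects lie at or below it.
        idx = max(0, math.ceil(u * n) - 1)
        curve.append(ordered[idx])
    return curve

def least_and_most_affected(effects, u):
    """Indices of units at or below the u-quantile and at or above the (1-u)-quantile."""
    lo_cut, hi_cut = sorted_effects(effects, [u, 1 - u])
    least = [i for i, e in enumerate(effects) if e <= lo_cut]
    most = [i for i, e in enumerate(effects) if e >= hi_cut]
    return least, most

# Illustrative (invented) partial effects, one per observational unit.
effects = [0.30, -0.10, 0.05, 0.22, 0.12, -0.02, 0.18, 0.40, 0.09, 0.15]

curve = sorted_effects(effects, [0.1, 0.5, 0.9])
average = sum(effects) / len(effects)
least, most = least_and_most_affected(effects, 0.1)
```

Here `curve` spans negative to clearly positive values, which the single number `average` would hide; `least` and `most` are the kind of groups whose characteristics the classification analysis would then summarize.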

DSpace@MIT

Boston University Institutional Repository (OpenBU)

The Sorted Effects Method: Discovering Heterogeneous Effects Beyond Their Averages

Author: Chernozhukov Victor
Fernandez-Val Ivan
Luo Ye
Publication date: 25/05/2018

The partial (ceteris paribus) effects of interest in nonlinear and interactive linear models are heterogeneous as they can vary dramatically with the underlying observed or unobserved covariates. Despite the apparent importance of heterogeneity, a common practice in modern empirical work is to largely ignore it by reporting average partial effects (or, at best, average effects for some groups). While average effects provide very convenient scalar summaries of typical effects, by definition they fail to reflect the entire variety of the heterogeneous effects. In order to discover these effects much more fully, we propose to estimate and report sorted effects -- a collection of estimated partial effects sorted in increasing order and indexed by percentiles. By construction the sorted effect curves completely represent and help visualize the range of the heterogeneous effects in one plot. They are as convenient and easy to report in practice as the conventional average partial effects. They also serve as a basis for classification analysis, where we divide the observational units into most or least affected groups and summarize their characteristics. We provide a quantification of uncertainty (standard errors and confidence bands) for the estimated sorted effects and related classification analysis, and provide confidence sets for the most and least affected groups. The derived statistical results rely on establishing key, new mathematical results on Hadamard differentiability of a multivariate sorting operator and a related classification operator, which are of independent interest. We apply the sorted effects method and classification analysis to demonstrate several striking patterns in the gender wage gap.

Comment: 62 pages, 9 figures, 8 tables, includes appendix with supplementary material

arXiv.org e-Print Archive

DSpace@MIT

Boston University Institutional Repository (OpenBU)

Fast Preprocessing for Optimal Orthogonal Range Reporting and Range Successor with Applications to Text Indexing

Author: Gao Younan
He Meng
Nekrich Yakov
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th Annual European Symposium on Algorithms (ESA 2020)
Publication date: 01/01/2020

Under the word RAM model, we design three data structures that can be constructed in $O(n\sqrt{\lg n})$ time over $n$ points in an $n \times n$ grid. The first data structure is an $O(n\lg^{\epsilon} n)$-word structure supporting orthogonal range reporting in $O(\lg\lg n+k)$ time, where $k$ denotes output size and $\epsilon$ is an arbitrarily small constant. The second is an $O(n\lg\lg n)$-word structure supporting orthogonal range successor in $O(\lg\lg n)$ time, while the third is an $O(n\lg^{\epsilon} n)$-word structure supporting sorted range reporting in $O(\lg\lg n+k)$ time. The query times of these data structures are optimal when the space costs must be within $O(n\ \mathrm{polylog}\ n)$ words. Their exact space bounds match those of the best known results achieving the same query times, and the $O(n\sqrt{\lg n})$ construction time beats the previous bounds on preprocessing. Previously, among 2d range search structures, only the orthogonal range counting structure of Chan and Pătraşcu (SODA 2010) and the linear space, $O(\lg^{\epsilon} n)$ query time structure for orthogonal range successor by Belazzougui and Puglisi (SODA 2016) can be built in the same $O(n\sqrt{\lg n})$ time. Hence our work is the first that achieves the same preprocessing time for optimal orthogonal range reporting and range successor. We also apply our results to improve the construction time of text indexes.
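To make the three query types named in this abstract concrete, here is a brute-force Python sketch of what each query returns. It scans all points in linear time and is in no way the paper's data structure. It also assumes one common convention, namely that the range successor is the point in the query rectangle with the smallest y-coordinate and that sorted reporting orders results by y; the function names and sample points are invented for illustration.

```python
def range_report(points, x1, x2, y1, y2):
    """All points inside the axis-aligned rectangle [x1, x2] x [y1, y2]."""
    return [(x, y) for (x, y) in points if x1 <= x <= x2 and y1 <= y <= y2]

def range_successor(points, x1, x2, y1, y2):
    """Point in the rectangle with the smallest y-coordinate, or None if empty."""
    hits = range_report(points, x1, x2, y1, y2)
    return min(hits, key=lambda p: p[1], default=None)

def sorted_range_report(points, x1, x2, y1, y2):
    """Points in the rectangle, listed in increasing order of y-coordinate."""
    return sorted(range_report(points, x1, x2, y1, y2), key=lambda p: p[1])

# Invented sample input on a small grid.
points = [(1, 5), (2, 3), (4, 8), (5, 1), (7, 6), (8, 2)]

reported = range_report(points, 2, 7, 2, 8)
successor = range_successor(points, 2, 7, 2, 8)
in_order = sorted_range_report(points, 2, 7, 2, 8)
```

The point `(5, 1)` falls inside the x-range but below `y1 = 2`, so it is excluded from all three answers.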

arXiv.org e-Print Archive

Michigan Technological University

Dagstuhl Research Online Publication Server

Optimal Color Range Reporting in One Dimension

Author: B. Chazelle
D.E. Willard
E.M. McCreight
L. Arge
M. Thorup
M.L. Fredman
P. Beame
P. Emde Boas van
P. Gupta
P.B. Miltersen
Q. Shi
R. Janardan
T.M. Chan
Publication date: 01/01/2013

Color (or categorical) range reporting is a variant of the orthogonal range reporting problem in which every point in the input is assigned a color. While the answer to an orthogonal point reporting query contains all points in the query range $Q$, the answer to a color reporting query contains only distinct colors of points in $Q$. In this paper we describe an $O(N)$-space data structure that answers one-dimensional color reporting queries in optimal $O(k+1)$ time, where $k$ is the number of colors in the answer and $N$ is the number of points in the data structure. Our result can also be dynamized and extended to the external memory model.
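The difference between reporting points and reporting distinct colors can be shown with a short Python sketch. It uses a classical predecessor-pointer idea (a point contributes its color only if the previous point of the same color lies outside the query range), not the structure from this paper: the query below still scans every point in the range, so it does not achieve the $O(k+1)$ bound. The function names and sample data are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

def build_index(points):
    """points: list of (coordinate, color). Returns sorted arrays plus prev pointers."""
    pts = sorted(points)
    xs = [x for x, _ in pts]
    colors = [c for _, c in pts]
    last_seen = {}   # color -> index of its most recent occurrence so far
    prev = []        # prev[i] = index of previous point with the same color, or -1
    for i, c in enumerate(colors):
        prev.append(last_seen.get(c, -1))
        last_seen[c] = i
    return xs, colors, prev

def report_colors(index, a, b):
    """Distinct colors of points with coordinate in [a, b], each reported once."""
    xs, colors, prev = index
    lo = bisect_left(xs, a)
    hi = bisect_right(xs, b)
    # A point is the first of its color inside the range exactly when its
    # same-colored predecessor lies before position lo.
    return [colors[i] for i in range(lo, hi) if prev[i] < lo]

# Invented sample input.
index = build_index([(1, "red"), (3, "blue"), (4, "red"),
                     (6, "green"), (7, "blue"), (9, "red")])
answer = report_colors(index, 3, 7)
empty = report_colors(index, 10, 12)
```

The range `[3, 7]` contains four points but only three colors, so `answer` lists `blue` once even though it occurs at coordinates 3 and 7.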

arXiv.org e-Print Archive

Crossref

Range Queries on Uncertain Data

Author: B Chazelle
G Frederickson
J Driscoll
J Mitchell
M Yiu
P Agarwal
Publication date: 09/01/2015

Given a set $P$ of $n$ uncertain points on the real line, each represented by its one-dimensional probability density function, we consider the problem of building data structures on $P$ to answer range queries of the following three types for any query interval $I$: (1) top-$1$ query: find the point in $P$ that lies in $I$ with the highest probability, (2) top-$k$ query: given any integer $k\leq n$ as part of the query, return the $k$ points in $P$ that lie in $I$ with the highest probabilities, and (3) threshold query: given any threshold $\tau$ as part of the query, return all points of $P$ that lie in $I$ with probabilities at least $\tau$. We present data structures for these range queries with linear or nearly linear space and efficient query time.

Comment: 26 pages. A preliminary version of this paper appeared in ISAAC 2014. In this full version, we also present solutions to the most general case of the problem (i.e., the histogram bounded case), which were left as open problems in the preliminary version
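The three query types in this abstract can be spelled out with a brute-force Python sketch. It assumes, purely for illustration, that each uncertain point has a uniform density on an interval `(l, r)`, so the probability of lying in $I = [a, b]$ is the overlap length divided by `r - l`. Every query evaluates all points, so this shows only what the queries return, not the data structures of the paper; the names and sample data are invented.

```python
import heapq

def prob_in(point, a, b):
    """Probability that a point uniform on (l, r) falls inside [a, b]."""
    l, r = point
    overlap = min(r, b) - max(l, a)
    return max(0.0, overlap) / (r - l)

def top_k(P, a, b, k):
    """Names of the k points with the highest probability of lying in [a, b]."""
    return heapq.nlargest(k, P, key=lambda name: prob_in(P[name], a, b))

def top_1(P, a, b):
    """Name of the single most likely point (the k = 1 case)."""
    return top_k(P, a, b, 1)[0]

def threshold(P, a, b, tau):
    """Names of all points whose probability of lying in [a, b] is at least tau."""
    return [name for name in P if prob_in(P[name], a, b) >= tau]

# Invented sample input: name -> support interval of a uniform density.
P = {"p1": (0, 4), "p2": (2, 3), "p3": (3, 9), "p4": (5, 6)}

best = top_1(P, 2, 5)
best_two = top_k(P, 2, 5, 2)
likely = threshold(P, 2, 5, 0.3)
```

For the interval `[2, 5]` the probabilities are 0.5 for `p1`, 1.0 for `p2`, 1/3 for `p3`, and 0 for `p4`, which determines all three answers.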

arXiv.org e-Print Archive

Crossref