    The sorted effects method: discovering heterogeneous effects beyond their averages

    Supplemental Data & Programs are available here: https://hdl.handle.net/2144/34409
    The partial (ceteris paribus) effects of interest in nonlinear and interactive linear models are heterogeneous as they can vary dramatically with the underlying observed or unobserved covariates. Despite the apparent importance of heterogeneity, a common practice in modern empirical work is to largely ignore it by reporting average partial effects (or, at best, average effects for some groups). While average effects provide very convenient scalar summaries of typical effects, by definition they fail to reflect the entire variety of the heterogeneous effects. In order to discover these effects much more fully, we propose to estimate and report sorted effects -- a collection of estimated partial effects sorted in increasing order and indexed by percentiles. By construction the sorted effect curves completely represent and help visualize the range of the heterogeneous effects in one plot. They are as convenient and easy to report in practice as the conventional average partial effects. They also serve as a basis for classification analysis, where we divide the observational units into most or least affected groups and summarize their characteristics. We provide a quantification of uncertainty (standard errors and confidence bands) for the estimated sorted effects and related classification analysis, and provide confidence sets for the most and least affected groups. The derived statistical results rely on establishing key, new mathematical results on Hadamard differentiability of a multivariate sorting operator and a related classification operator, which are of independent interest. We apply the sorted effects method and classification analysis to demonstrate several striking patterns in the gender wage gap.
    https://arxiv.org/abs/1512.05635
    Accepted manuscript

    The Sorted Effects Method: Discovering Heterogeneous Effects Beyond Their Averages

    The partial (ceteris paribus) effects of interest in nonlinear and interactive linear models are heterogeneous as they can vary dramatically with the underlying observed or unobserved covariates. Despite the apparent importance of heterogeneity, a common practice in modern empirical work is to largely ignore it by reporting average partial effects (or, at best, average effects for some groups). While average effects provide very convenient scalar summaries of typical effects, by definition they fail to reflect the entire variety of the heterogeneous effects. In order to discover these effects much more fully, we propose to estimate and report sorted effects -- a collection of estimated partial effects sorted in increasing order and indexed by percentiles. By construction the sorted effect curves completely represent and help visualize the range of the heterogeneous effects in one plot. They are as convenient and easy to report in practice as the conventional average partial effects. They also serve as a basis for classification analysis, where we divide the observational units into most or least affected groups and summarize their characteristics. We provide a quantification of uncertainty (standard errors and confidence bands) for the estimated sorted effects and related classification analysis, and provide confidence sets for the most and least affected groups. The derived statistical results rely on establishing key, new mathematical results on Hadamard differentiability of a multivariate sorting operator and a related classification operator, which are of independent interest. We apply the sorted effects method and classification analysis to demonstrate several striking patterns in the gender wage gap.
    Comment: 62 pages, 9 figures, 8 tables, includes appendix with supplementary material
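    The idea of sorting estimated partial effects and indexing them by percentiles can be illustrated with a minimal sketch. It assumes a logit model with one covariate and a binary treatment, and the coefficients `beta0`, `beta1`, `delta` and all function names are hypothetical, not taken from the paper. The sketch covers only the point estimates and the most/least affected classification; the standard errors, confidence bands and confidence sets described in the abstract are not attempted.

```python
import math


def logistic(z):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))


def partial_effects(xs, beta0, beta1, delta):
    """Per-unit effect of switching a binary treatment from 0 to 1
    in a logit model with index beta0 + beta1 * x + delta * d."""
    return [
        logistic(beta0 + beta1 * x + delta) - logistic(beta0 + beta1 * x)
        for x in xs
    ]


def sorted_effects(effects, percents):
    """Empirical quantiles of the partial effects, indexed by integer
    percentiles (1..100). Integer arithmetic avoids float rounding."""
    ordered = sorted(effects)
    n = len(ordered)
    curve = {}
    for p in percents:
        # Smallest order statistic whose empirical CDF is at least p/100.
        idx = max(0, (p * n + 99) // 100 - 1)
        curve[p] = ordered[idx]
    return curve


def classify(xs, effects, p):
    """Split units into least affected (effect <= p-th percentile) and
    most affected (effect >= (100 - p)-th percentile) groups."""
    cuts = sorted_effects(effects, [p, 100 - p])
    least = [x for x, e in zip(xs, effects) if e <= cuts[p]]
    most = [x for x, e in zip(xs, effects) if e >= cuts[100 - p]]
    return least, most


if __name__ == "__main__":
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    effects = partial_effects(xs, beta0=0.3, beta1=1.0, delta=1.0)
    print(sorted_effects(effects, [20, 40, 60, 80, 100]))
    print(classify(xs, effects, 20))
```

    Reporting the whole `sorted_effects` curve rather than the mean of `effects` is what exposes the heterogeneity: the average is a single number, while the curve shows how far the smallest and largest effects lie from it.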

    Fast Preprocessing for Optimal Orthogonal Range Reporting and Range Successor with Applications to Text Indexing

    Under the word RAM model, we design three data structures that can be constructed in $O(n\sqrt{\lg n})$ time over $n$ points in an $n \times n$ grid. The first data structure is an $O(n\lg^{\epsilon} n)$-word structure supporting orthogonal range reporting in $O(\lg\lg n+k)$ time, where $k$ denotes output size and $\epsilon$ is an arbitrarily small constant. The second is an $O(n\lg\lg n)$-word structure supporting orthogonal range successor in $O(\lg\lg n)$ time, while the third is an $O(n\lg^{\epsilon} n)$-word structure supporting sorted range reporting in $O(\lg\lg n+k)$ time. The query times of these data structures are optimal when the space costs must be within $O(n\ \mathrm{polylog}\ n)$ words. Their exact space bounds match those of the best known results achieving the same query times, and the $O(n\sqrt{\lg n})$ construction time beats the previous bounds on preprocessing. Previously, among 2d range search structures, only the orthogonal range counting structure of Chan and Pătrașcu (SODA 2010) and the linear space, $O(\lg^{\epsilon} n)$ query time structure for orthogonal range successor by Belazzougui and Puglisi (SODA 2016) could be built in the same $O(n\sqrt{\lg n})$ time. Hence our work is the first that achieves the same preprocessing time for optimal orthogonal range reporting and range successor. We also apply our results to improve the construction time of text indexes.

    Optimal Color Range Reporting in One Dimension

    Color (or categorical) range reporting is a variant of the orthogonal range reporting problem in which every point in the input is assigned a color. While the answer to an orthogonal point reporting query contains all points in the query range $Q$, the answer to a color reporting query contains only distinct colors of points in $Q$. In this paper we describe an $O(N)$-space data structure that answers one-dimensional color reporting queries in optimal $O(k+1)$ time, where $k$ is the number of colors in the answer and $N$ is the number of points in the data structure. Our result can also be dynamized and extended to the external memory model.
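    To make the query semantics concrete, here is a naive baseline sketch, not the paper's structure: it stores the points sorted by coordinate, finds the query range by binary search, and scans it while deduplicating colors. Its query time is $O(\lg N + m)$, where $m$ is the number of points in the range, rather than the optimal $O(k+1)$; the class and method names are hypothetical.

```python
from bisect import bisect_left, bisect_right


class NaiveColorReporter1D:
    """Baseline for 1D color range reporting (scan and deduplicate)."""

    def __init__(self, points):
        # points: iterable of (coordinate, color) pairs.
        self._pts = sorted(points, key=lambda p: p[0])
        self._xs = [x for x, _ in self._pts]

    def report(self, lo, hi):
        """Return the distinct colors of points with lo <= x <= hi,
        in order of first appearance along the line."""
        i = bisect_left(self._xs, lo)
        j = bisect_right(self._xs, hi)
        seen = set()
        colors = []
        for _, color in self._pts[i:j]:
            if color not in seen:
                seen.add(color)
                colors.append(color)
        return colors


if __name__ == "__main__":
    index = NaiveColorReporter1D([(1, "red"), (2, "green"), (3, "red"), (7, "blue")])
    print(index.report(1, 3))
```

    The gap between this baseline and the result in the abstract is the dependence on $m$: a range holding many points of few colors costs the scan far more than the $k$ colors it finally reports.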

    Range Queries on Uncertain Data

    Given a set $P$ of $n$ uncertain points on the real line, each represented by its one-dimensional probability density function, we consider the problem of building data structures on $P$ to answer range queries of the following three types for any query interval $I$: (1) top-$1$ query: find the point in $P$ that lies in $I$ with the highest probability, (2) top-$k$ query: given any integer $k \leq n$ as part of the query, return the $k$ points in $P$ that lie in $I$ with the highest probabilities, and (3) threshold query: given any threshold $\tau$ as part of the query, return all points of $P$ that lie in $I$ with probabilities at least $\tau$. We present data structures for these range queries with linear or nearly linear space and efficient query time.
    Comment: 26 pages. A preliminary version of this paper appeared in ISAAC 2014. In this full version, we also present solutions to the most general case of the problem (i.e., the histogram bounded case), which were left as open problems in the preliminary version
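    The three query types can be illustrated with a brute-force sketch that scans every point for each query. It assumes, purely for simplicity, that each uncertain point has a uniform density on an interval `[a, b]` with `a < b`; the function names and this representation are hypothetical, and the linear scan is a baseline, not the data structures the abstract refers to.

```python
import heapq


def prob_in_interval(support, interval):
    """Probability that a point uniform on support=(a, b) lies in
    interval=(lo, hi): length of the overlap divided by b - a."""
    a, b = support
    lo, hi = interval
    overlap = max(0.0, min(b, hi) - max(a, lo))
    return overlap / (b - a)


def threshold_query(points, interval, tau):
    """All points lying in the interval with probability at least tau."""
    return [
        name
        for name, support in points.items()
        if prob_in_interval(support, interval) >= tau
    ]


def top_k_query(points, interval, k):
    """The k points with the highest probability of lying in the
    interval, most likely first (k = 1 gives the top-1 query)."""
    scored = (
        (prob_in_interval(support, interval), name)
        for name, support in points.items()
    )
    return [name for _, name in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    pts = {"a": (0.0, 4.0), "b": (2.0, 3.0), "c": (5.0, 9.0)}
    print(threshold_query(pts, (2.0, 6.0), 0.5))
    print(top_k_query(pts, (2.0, 6.0), 2))
```

    Each query here costs time linear in $n$ (plus a heap of size $k$ for top-$k$), which is the cost the preprocessing in the paper is meant to avoid.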