
    Robust nearest-neighbor methods for classifying high-dimensional data

    We suggest a robust nearest-neighbor approach to classifying high-dimensional data. The method enhances sensitivity by employing a threshold, and truncates the data to a sequence of zeros and ones in order to reduce the deleterious impact of heavy-tailed data. Empirical rules are suggested for choosing the threshold. They require the bare minimum of data: only one data vector is needed from each population. Theoretical and numerical aspects of performance are explored, paying particular attention to the impacts of correlation and heterogeneity among data components. On the theoretical side, it is shown that our truncated, thresholded, nearest-neighbor classifier enjoys the same classification boundary as more conventional, nonrobust approaches, which require finite moments in order to achieve good performance. In particular, the greater robustness of our approach does not come at the price of reduced effectiveness. Moreover, when both training sample sizes equal 1, our new method can have performance equal to that of optimal classifiers that require independent and identically distributed data with known marginal distributions; yet, our classifier does not itself need conditions of this type.
    Comment: Published at http://dx.doi.org/10.1214/08-AOS591 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
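    The threshold-and-truncate step described above can be sketched as follows. This is a minimal illustration, not the authors' empirical threshold rule: the threshold value, the example data, and the use of Hamming distance between the binarized vectors are illustrative assumptions.

    ```python
    import numpy as np

    def binarize(x, t):
        """Threshold each component at t and truncate to a 0/1 sequence,
        blunting the influence of heavy-tailed components."""
        return (np.abs(x) > t).astype(int)

    def classify(z, x, y, t):
        """Assign z to population X or Y by nearest Hamming distance to the
        single binarized training vector available from each population."""
        bz, bx, by = binarize(z, t), binarize(x, t), binarize(y, t)
        return "X" if np.sum(bz != bx) <= np.sum(bz != by) else "Y"

    # Illustrative data: one training vector per population, and a new
    # observation z resembling population X in its large components.
    x = np.array([5.0, 5.0, 5.0, 0.0, 0.0, 0.0])
    y = np.array([0.0, 0.0, 0.0, 5.0, 5.0, 5.0])
    z = np.array([4.0, 6.0, 5.0, 0.1, 0.0, 0.2])
    label = classify(z, x, y, t=1.0)
    ```

    Because only the 0/1 pattern of exceedances matters, an arbitrarily large outlier in one component of z changes the Hamming distances by at most 1.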

    Inference in components of variance models with low replication

    In components of variance models the data are viewed as arising through a sum of two random variables, representing between- and within-group variation, respectively. The former is generally interpreted as a group effect, and the latter as error. It is assumed that these variables are stochastically independent and that the distributions of the group effect and the error do not vary from one instance to another. If each group effect can be replicated a large number of times, then standard methods can be used to estimate the distributions of both the group effect and the error. This cannot be achieved without replication, however. How feasible is distribution estimation if it is not possible to replicate prolifically? Can the distributions of random effects and errors be estimated consistently, in a nonparametric setting, from a small number of replications of each of a large number of noisy group effects? Often extensive replication is practically infeasible, in particular if inherently small numbers of individuals exhibit any given group effect. Yet it is quite unclear how to conduct inference in this case. We show that inference is possible, even if the number of replications is as small as 2. Two methods are proposed, both based on Fourier inversion. One, which is substantially more computer-intensive than the other, exhibits better performance in numerical experiments.
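    The identification that underlies Fourier inversion with only 2 replications can be sketched in a minimal simulation. This is not the paper's estimator; the normal distributions, variances, and sample size are illustrative assumptions. The point is that within-group differences cancel the group effect exactly, so their characteristic function identifies the squared modulus of the error's characteristic function, from which the error distribution can in principle be recovered by inversion.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Components of variance model with 2 replications per group:
    # Y_ij = a_i + e_ij, j = 1, 2, with a_i and e_ij independent.
    n = 5000
    a = rng.normal(0.0, 2.0, size=n)        # group effects (illustrative)
    e = rng.normal(0.0, 1.0, size=(n, 2))   # errors (illustrative)
    y = a[:, None] + e

    # Within-group differences Y_i1 - Y_i2 = e_i1 - e_i2 remove a_i, so
    # their empirical characteristic function estimates |phi_e(t)|^2,
    # the raw material for a Fourier-inversion estimate of the error law.
    d = y[:, 0] - y[:, 1]
    t = 0.5
    ecf = np.mean(np.exp(1j * t * d))
    # For N(0,1) errors, |phi_e(t)|^2 = exp(-t**2).
    ```

    Recovering the distribution of the group effect then requires disentangling it from the error, which is where the paper's two inversion methods differ.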

    Regional Income Divergence in China

    Numerous policy studies have argued that conditions have prevailed in China since the open-door economic reforms of the late 1970s that have encouraged rapid growth at the expense of regional income inequality across the provinces of China. In this paper we use recently developed nonstationary panel techniques to provide empirical evidence that the long-run tendency since the reforms has been for provincial-level incomes to continue to diverge. More importantly, we show that this divergence cannot be attributed to the presence of separate regional convergence clubs divided among common geographic subgroupings, such as the coastal versus interior provinces. Furthermore, we also show that the divergence cannot be attributed to differences in the degree of preferential open-door policies. Rather, we find that the divergence is pervasive both nationally and within these various regional and political subgroupings. We argue that these results point to other causes for regional income divergence, and they also carry potentially important implications for other regions of the world.
    Keywords: China, convergence, nonstationary panels

    Three Centuries of Inequality in Britain and America

    Income and wealth inequality rose over the first 150 years of U.S. history. They may have risen at times in Britain before 1875. The first half of the twentieth century equalized pre-fisc incomes more in Britain than in America. From the 1970s to the 1990s inequality rose in both countries, reversing some of the previous equalization. Government redistribution explains part, but not all, of the reversals in inequality trends. Factor-market forces and economic growth would have produced a similar chronology of rises and falls in income inequality even without shifts in the progressivity of redistribution through government. For economies starting from highly unequal property ownership, the development process lowers inequality. History suggests, however, that this may happen only once. Redistribution toward the poor tends to happen least in those times and polities where it would seem most justified by the usual goals of welfare policy.

    Bootstrap tests for simple structures in nonparametric time series regression

    This paper concerns statistical tests for simple structures, such as parametric models, lower-order models and additivity, in a general nonparametric autoregression setting. We propose to use a modified L2-distance between the nonparametric estimator of the regression function and its counterpart under the null hypothesis as our test statistic, which delimits the contribution from areas where data are sparse. The asymptotic properties of the test statistic are established; they indicate that the statistic is asymptotically equivalent to a quadratic form of the innovations. A regression-type resampling scheme (i.e. the wild bootstrap) is adapted to estimate the distribution of this quadratic form. Further, we show that asymptotically this bootstrap distribution is indeed the distribution of the test statistic under the null hypothesis. The proposed methodology is illustrated by both simulation and an application to German stock index data.
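    The wild-bootstrap idea can be sketched as follows, in a deliberately simplified setting: an i.i.d. regression rather than the paper's autoregression, a linear model as the null, a Nadaraya-Watson smoother as the nonparametric fit, and a quantile-trimmed grid as a crude stand-in for the paper's down-weighting of sparse-data areas. Bandwidth, sample sizes, and the Rademacher multipliers are illustrative assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def kernel_smooth(x, y, grid, h):
        """Nadaraya-Watson estimator with a Gaussian kernel."""
        w = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2)
        return (w @ y) / w.sum(axis=1)

    def l2_stat(x, y, h):
        """L2 distance between the kernel fit and the linear (null) fit,
        on a quantile-trimmed grid that ignores sparse-data edges."""
        grid = np.linspace(np.quantile(x, 0.05), np.quantile(x, 0.95), 50)
        b = np.polyfit(x, y, 1)
        return np.mean((kernel_smooth(x, y, grid, h) - np.polyval(b, grid)) ** 2)

    def wild_bootstrap_pvalue(x, y, h=0.1, B=200):
        """Approximate the null distribution of the statistic by the wild
        bootstrap: regenerate data from the fitted linear null with the
        residuals multiplied by independent Rademacher signs."""
        b = np.polyfit(x, y, 1)
        fit = np.polyval(b, x)
        res = y - fit
        t_obs = l2_stat(x, y, h)
        t_boot = np.array([
            l2_stat(x, fit + res * rng.choice([-1.0, 1.0], size=len(x)), h)
            for _ in range(B)
        ])
        return np.mean(t_boot >= t_obs)

    # Linear data: the parametric null holds, so a small p-value is not
    # expected. Clearly nonlinear data: the test should reject.
    x = rng.uniform(0.0, 1.0, 200)
    p_null = wild_bootstrap_pvalue(x, 1.0 + 2.0 * x + 0.2 * rng.normal(size=200))
    p_alt = wild_bootstrap_pvalue(x, np.sin(6.0 * x) + 0.2 * rng.normal(size=200))
    ```

    Multiplying each residual by an independent random sign preserves the residual's magnitude at each design point, which is what lets the wild bootstrap cope with heteroskedastic errors.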