53 research outputs found

    Optimal detection of the feature matching map in presence of noise and outliers

    We consider the problem of finding the matching map between two sets of $d$-dimensional vectors from noisy observations, where the second set contains outliers. The matching map is then an injection, which can be consistently estimated only if the vectors of the second set are well separated. The main result shows that, in the high-dimensional setting, a detection region of unknown injection can be characterized by the sets of vectors for which the inlier-inlier distance is of order at least $d^{1/4}$ and the inlier-outlier distance is of order at least $d^{1/2}$. These rates are achieved using the estimated matching minimizing the sum of logarithms of distances between matched pairs of points. We also prove lower bounds establishing optimality of these rates. Finally, we report results of numerical experiments on both synthetic and real world data that illustrate our theoretical results and provide further insight into the properties of the estimators studied in this work.
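The estimator named in this abstract, a matching that minimizes the sum of logarithms of distances between matched pairs, can be illustrated with a minimal sketch. The function name `log_distance_matching`, the use of squared Euclidean distance, and the brute-force search over all injections are assumptions made for illustration only; they are not taken from the paper, and the exhaustive search is practical only for very small sets.

```python
import itertools
import math


def log_distance_matching(xs, ys):
    """Brute-force injection from indices of xs into indices of ys.

    Minimizes the sum over i of log ||xs[i] - ys[inj[i]]||^2.
    Hypothetical helper for illustration; feasible only for tiny inputs.
    """
    n, m = len(xs), len(ys)
    if n > m:
        raise ValueError("the second set must be at least as large as the first")

    def log_sq_dist(a, b):
        d2 = sum((u - v) ** 2 for u, v in zip(a, b))
        # A zero distance gives log(0); treat it as minus infinity.
        return math.log(d2) if d2 > 0 else -math.inf

    # Precompute the n x m cost matrix of log squared distances.
    cost = [[log_sq_dist(x, y) for y in ys] for x in xs]

    best, best_cost = None, math.inf
    # Each length-n permutation of range(m) is one injection [n] -> [m].
    for inj in itertools.permutations(range(m), n):
        total = sum(cost[i][j] for i, j in enumerate(inj))
        if total < best_cost:
            best, best_cost = inj, total
    return best


# Two inliers and a second set that also contains one far-away outlier.
xs = [(0.0, 0.0), (5.0, 5.0)]
ys = [(5.1, 4.9), (20.0, -20.0), (0.1, -0.1)]
matching = log_distance_matching(xs, ys)
```

Here `matching[i]` is the index in `ys` assigned to `xs[i]`; the outlier at index 1 is left unmatched. For larger sets the same cost matrix could be handed to a linear assignment solver instead of enumerating injections.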

    Optimal Permutation Estimation in Crowd-Sourcing problems

    Motivated by crowd-sourcing applications, we consider a model where we have partial observations from a bivariate isotonic $n \times d$ matrix with an unknown permutation $\pi^*$ acting on its rows. Focusing on the twin problems of recovering the permutation $\pi^*$ and estimating the unknown matrix, we introduce a polynomial-time procedure achieving the minimax risk for these two problems, for all possible values of $n$ and $d$ and all possible sampling efforts. Along the way, we establish that, in some regimes, recovering the unknown permutation $\pi^*$ is considerably simpler than estimating the matrix.

    Powerful modifications of Williams' test on trend

    [no abstract]

    Smoothing and Ordering in Discriminant Analysis

    This thesis addresses the question of how to achieve reliable estimation of the posterior probability function in discriminant analysis, both for continuous and ordered discrete feature variables. In the latter instance we are also concerned with the estimation of a posterior, which, regarded as a function of the feature variables, is ordered with respect to one or more independent variables. Chapter 1 introduces the discrimination problem, establishes notation and describes the possible approaches. Methods of density estimation, for use in discriminant analysis, are described, including the kernel method, as are some more direct approaches to discrimination and classification. Some comparative studies and their conclusions are reviewed. Means of assessing the performance of a discriminant rule are described with emphasis on measures of reliability rather than separation. The final section mentions briefly the important problem of variable selection, although this is not addressed elsewhere in the thesis. Chapter 2 addresses the problem of choosing smoothing parameters in kernel density estimation with continuous variables when this is to be used in the discrimination context. It is natural to suspect that the optimal degree of smoothing for marginal density estimates may not be that which will produce an optimal density ratio or posterior probability function when two such estimates are combined. A simulation study confirms that some popular methods for choosing the smoothing parameter can produce an estimated density ratio which is poor in terms of mean square error. Some alternatives are proposed based on direct assessment measures of reliability, not of the marginal estimates but of the predicted probabilities. These are compared to the marginal approaches. To a more limited extent, the optimal (minimum mean square error) kernel method is compared to an optimal spline estimate of the density ratio. Both the marginal and direct methods are then applied to a real data set and the resulting estimates compared with a spline estimate. Chapter 3 discusses ordered variables, from qualitative orderings to grouped continuous variables, ways in which ordering can affect a data set and suitable models in each case. Particular emphasis is given to discrete kernel estimators and isotonic regression techniques. Some problems in applying existing algorithms for the latter are described and suggestions made for overcoming these. Chapter 4 applies ordered kernels and isotonic regression to 1- and 2-dimensional problems using the data of Titterington et al. (1981), concluding that the kernel methods are unable to recover the type of ordering manifested by the data and that a diagnostic approach is required. The results are compared in the univariate case to those in Chapter 2, Section 2.6, which used continuous kernels. The use of isotonic regression is then compared with 2 logistic models and an independence model using the same data set but with 3 variables. Suggestions are made for further smoothing of the isotonic estimator, 2 of which are implemented. Finally, Chapter 5 draws some conclusions and makes suggestions for further work. In particular, isotonic splines may be worthy of investigation.
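Isotonic regression, which this abstract applies to ordered feature variables, is commonly computed in one dimension with the pool-adjacent-violators algorithm. The sketch below is a generic textbook version, not the thesis's own algorithm; the function name `isotonic_regression` and the optional weights argument are assumptions for illustration.

```python
def isotonic_regression(y, w=None):
    """Weighted least-squares non-decreasing fit via pool-adjacent-violators.

    Generic one-dimensional sketch; y is a sequence of values in the
    order of the ordered variable, w an optional sequence of positive weights.
    """
    if w is None:
        w = [1.0] * len(y)

    blocks = []  # each block is [weighted mean, total weight, point count]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])

    # Expand the pooled blocks back to one fitted value per input point.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


# Observed class proportions across four ordered categories.
fitted = isotonic_regression([0.1, 0.4, 0.3, 0.8])
```

In a discrimination setting, `y` could hold the observed proportion of one class in each ordered category and `w` the category counts, so that the fitted values give a posterior estimate that is monotone in the ordered variable.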

    Statistical Estimation And Inference For Permutation Based Model

    Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data. People spend a great deal of time dealing with different kinds of data sets, and the structure of the data plays an important role in statistics. Among the different structures of data, one interesting structure is the permutation, which arises in many kinds of problems, such as recommender systems, online gaming, decision making and sports tournaments. This thesis is motivated by my interest in understanding the permutation in statistics. Compared with the wide application of permutation-related models, little is known about the properties of permutations in statistics. A variety of challenges arise, and many problems remain to be explored, in permutation-based models. This thesis aims to solve several interesting problems of the permutation-based model in statistics, which may help us to understand more about the properties and characteristics of permutations. As a result of the various topics explored, this thesis is split into three parts. In Chapter 2, we discuss the estimation problem of the unimodal SST model in the pairwise comparison problem. We prove that the CLS estimator is rate optimal up to a poly(log log n) factor and propose the interval sorting estimator as a computationally efficient algorithm for the estimation problem. In Chapter 3, we shift our attention to the inference problem of the permutation-based model. We study different kinds of inference problems, including the hypothesis testing problem in the noisy sorting model and confidence set construction problems in the generalized permutation-based model. Network analysis is another important topic related to the permutation. In Chapter 4, we study the optimality of the local belief propagation (BP) algorithm in the partial recovery problem of the stochastic block model. We prove that the local BP algorithm can reach optimality in a certain regime. Moreover, in the regime where the local BP algorithm may not achieve the optimal misclassified fraction, we prove that the local BP algorithm can be used to correct other algorithms and obtain an optimal algorithm for the partial recovery problem.