848 research outputs found

    An adaptive nearest neighbor rule for classification

    We introduce a variant of the k-nearest neighbor classifier in which k is chosen adaptively for each query, rather than supplied as a parameter. The choice of k depends on properties of each neighborhood, and may therefore vary significantly between points. (For example, the algorithm uses a larger k when predicting the labels of points in noisy regions.) We provide theory and experiments demonstrating that the algorithm performs comparably to, and sometimes better than, k-NN with an optimal choice of k. In particular, we derive bounds on the convergence rates of our classifier that depend on a local quantity we call the "advantage", which is significantly weaker than the Lipschitz conditions used in previous convergence rate proofs. These generalization bounds hinge on a variant of the seminal Uniform Convergence Theorem due to Vapnik and Chervonenkis; this variant concerns conditional probabilities and may be of independent interest.
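The idea of choosing k per query can be illustrated with a minimal sketch. It is not taken from the paper: the stopping rule (grow k until the average label of the k nearest neighbors exceeds a confidence threshold of order sqrt((log n + log(1/delta)) / k)), the constant `c`, the default `delta`, and the function name `adaptive_knn_predict` are all assumptions made here for illustration, with binary labels in {-1, +1}.

```python
import math


def adaptive_knn_predict(train, query, delta=0.05, c=1.0):
    """Hypothetical adaptive k-NN rule (illustrative, not the paper's code).

    train: list of (point, label) pairs, point a tuple of floats,
           label in {-1, +1}.
    Returns (prediction, k_used); prediction is None if the rule abstains.
    """
    n = len(train)
    # Order training points by Euclidean distance to the query.
    ordered = sorted(train, key=lambda pair: math.dist(pair[0], query))

    label_sum = 0
    for k, (_, label) in enumerate(ordered, start=1):
        label_sum += label
        bias = label_sum / k  # average label of the k nearest, in [-1, 1]
        # Assumed confidence width; it shrinks as k grows.
        threshold = c * math.sqrt((math.log(n) + math.log(1 / delta)) / k)
        if abs(bias) > threshold:
            # The neighborhood is decisively one-sided: stop and predict.
            return (1 if bias > 0 else -1), k
    # No neighborhood size was decisive: abstain.
    return None, n
```

Under this rule a query in a clean region stops at a small k, while a query in a region with mixed labels keeps enlarging its neighborhood and may abstain, which matches the behavior the abstract describes for noisy regions.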

    A free central-limit theorem for dynamical systems

    The free central-limit theorem, a fundamental theorem in free probability, states that empirical averages of freely independent random variables are asymptotically semi-circular. We extend this theorem to general dynamical systems of operators that we define using a free random variable X coupled with a group of *-automorphisms describing the evolution of X. We introduce free mixing coefficients that measure how far a dynamical system is from being freely independent. Under conditions on those coefficients, we prove that the free central-limit theorem also holds for these processes and provide Berry-Esseen bounds. We generalize this to triangular arrays and U-statistics. Finally, we draw connections with classical probability and random matrix theory through a series of examples.
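For orientation, the classical statement that this abstract extends can be written out as below. This is the standard textbook form for freely independent, identically distributed variables, not a result quoted from the paper; the symbols $(\mathcal{A},\varphi)$, $a_i$, $S_n$, and $\sigma$ are notation assumed here.

```latex
Let $(\mathcal{A},\varphi)$ be a noncommutative probability space and let
$a_1, a_2, \dots \in \mathcal{A}$ be freely independent, identically
distributed self-adjoint elements with $\varphi(a_i) = 0$ and
$\varphi(a_i^2) = \sigma^2$. Then
\[
  S_n = \frac{a_1 + \cdots + a_n}{\sqrt{n}}
  \;\xrightarrow{\ \mathrm{dist}\ }\; s ,
  \qquad
  d\mu_s(x) = \frac{1}{2\pi\sigma^2}\,\sqrt{4\sigma^2 - x^2}\;
  \mathbf{1}_{[-2\sigma,\,2\sigma]}(x)\,dx ,
\]
where $\mu_s$ is the semicircle law with mean $0$ and variance $\sigma^2$.
```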

    Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

    This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: it is noisy, has missing entries, and is imbalanced in the classes of interest, all of which lead to serious bias in predictive modeling. Since standard data mining methods often perform poorly on such data, we argue for the development of specialized techniques for data preprocessing and classification. In this paper, we propose a new method that simultaneously classifies large datasets and reduces the effects of missing values. It is based on a multilevel framework combining the cost-sensitive SVM with the expectation maximization imputation method for missing values, which relies on iterated regression analyses. We compare the classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values, as well as on real data from health applications, and show that our multilevel SVM-based method produces fast, more accurate, and more robust classification results. Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625

    Some invariant biorthogonal sets with an application to coherent states

    We show how to construct, out of a certain basis invariant under the action of one or more unitary operators, a second biorthogonal set with similar properties. In particular, we discuss conditions for this new set to also be a basis of the Hilbert space, and we apply the procedure to coherent states. We conclude the paper by considering a simple application of our construction to pseudo-hermitian quantum mechanics. Comment: in press in Journal of Mathematical Analysis and Applications. arXiv admin note: text overlap with arXiv:0904.088