    Profile Likelihood Biclustering

    Biclustering, the process of simultaneously clustering the rows and columns of a data matrix, is a popular and effective tool for finding structure in a high-dimensional dataset. Many biclustering procedures appear to work well in practice, but most do not have associated consistency guarantees. To address this shortcoming, we propose a new biclustering procedure based on profile likelihood. The procedure applies to a broad range of data modalities, including binary, count, and continuous observations. We prove that the procedure recovers the true row and column classes when the dimensions of the data matrix tend to infinity, even if the functional form of the data distribution is misspecified. The procedure requires a combinatorial search, which can be expensive in practice. Rather than performing this search directly, we propose a new heuristic optimization procedure based on the Kernighan-Lin heuristic, which is computationally efficient and performs well in simulations. We demonstrate our procedure with applications to congressional voting records and microarray analysis.
    Comment: 40 pages, 11 figures; R package in development at https://github.com/patperry/biclustp
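    To make the idea of a profile-likelihood criterion over row and column labels concrete, here is a minimal sketch for binary data. It assumes a Bernoulli block model in which each block's probability is replaced by its block mean, and it uses a plain greedy single-move search, which is simpler than the Kernighan-Lin-based heuristic the abstract mentions. The function names profile_loglik and greedy_sweep and the max_sweeps parameter are hypothetical and are not taken from the authors' R package.

```python
import numpy as np


def profile_loglik(X, rows, cols, K, L):
    """Bernoulli log-likelihood with each block probability set to its block mean.

    X is a 0/1 matrix; rows and cols are integer label arrays with values in
    range(K) and range(L). This objective is an illustrative assumption.
    """
    total = 0.0
    for k in range(K):
        for l in range(L):
            block = X[np.ix_(rows == k, cols == l)]
            if block.size == 0:
                continue  # empty blocks contribute nothing
            mu = block.mean()
            # Blocks with mu in {0, 1} contribute 0, since x*log(x) -> 0.
            if 0.0 < mu < 1.0:
                total += block.size * (mu * np.log(mu) + (1 - mu) * np.log(1 - mu))
    return total


def greedy_sweep(X, rows, cols, K, L, max_sweeps=20):
    """Greedy local search: move one row or column at a time to its best label.

    A simplified stand-in for a Kernighan-Lin style search; it only accepts
    moves that strictly improve the criterion, so the score never decreases.
    """
    rows, cols = rows.copy(), cols.copy()
    best = profile_loglik(X, rows, cols, K, L)
    for _ in range(max_sweeps):
        improved = False
        # labels aliases rows or cols, so in-place changes are seen by the scorer.
        for labels, n_labels in ((rows, K), (cols, L)):
            for i in range(labels.size):
                current = labels[i]
                for cand in range(n_labels):
                    if cand == current:
                        continue
                    labels[i] = cand
                    score = profile_loglik(X, rows, cols, K, L)
                    if score > best + 1e-12:
                        best, current, improved = score, cand, True
                labels[i] = current  # keep the best label found for item i
        if not improved:
            break
    return rows, cols, best
```

    Calling greedy_sweep(X, start_rows, start_cols, K, L) from a random labeling returns the refined labels and their score. Like any local search it can stop at a local optimum, so several random restarts are a reasonable precaution.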

    Minimax rank estimation for subspace tracking

    Rank estimation is a classical model order selection problem that arises in a variety of important statistical signal and array processing systems, yet is addressed relatively infrequently in the extant literature. Here we present sample covariance asymptotics stemming from random matrix theory, and bring them to bear on the problem of optimal rank estimation in the context of the standard array observation model with additive white Gaussian noise. The most significant of these results demonstrates the existence of a phase transition threshold, below which eigenvalues and associated eigenvectors of the sample covariance fail to provide any information on population eigenvalues. We then develop a decision-theoretic rank estimation framework that leads to a simple ordered selection rule based on thresholding; in contrast to competing approaches, however, it admits asymptotic minimax optimality and is free of tuning parameters. We analyze the asymptotic performance of our rank selection procedure and conclude with a brief simulation study demonstrating its practical efficacy in the context of subspace tracking.
    Comment: 10 pages, 4 figures; final versio
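    As a rough illustration of rank selection by thresholding sample covariance eigenvalues, the sketch below counts eigenvalues above the Marchenko-Pastur bulk edge, noise_var * (1 + sqrt(p/n))^2, for white noise of known variance. This is not the minimax rule developed in the paper: the margin parameter is a hypothetical safety factor added here to absorb finite-sample fluctuation of the largest noise eigenvalue, and the names estimate_rank and simulate are assumptions for the example.

```python
import numpy as np


def estimate_rank(X, noise_var=1.0, margin=0.1):
    """Count sample-covariance eigenvalues above an inflated bulk edge.

    X has shape (n, p): n zero-mean snapshots of dimension p.
    margin is a hypothetical tuning factor, not part of the paper's rule.
    """
    n, p = X.shape
    gamma = p / n
    S = X.T @ X / n                      # sample covariance, zero mean assumed
    eigvals = np.linalg.eigvalsh(S)      # real eigenvalues, ascending order
    edge = noise_var * (1.0 + np.sqrt(gamma)) ** 2
    return int(np.sum(eigvals > (1.0 + margin) * edge))


def simulate(n=2000, p=50, strengths=(5.0, 3.0), seed=0):
    """Draw low-rank signal plus unit-variance white Gaussian noise."""
    rng = np.random.default_rng(seed)
    r = len(strengths)
    # Orthonormal basis for the signal subspace.
    Q, _ = np.linalg.qr(rng.standard_normal((p, r)))
    # Each signal direction has variance given by the matching strength.
    signal = rng.standard_normal((n, r)) * np.sqrt(strengths)
    noise = rng.standard_normal((n, p))
    return signal @ Q.T + noise
```

    With the default settings the signal strengths sit far above the detection threshold sqrt(p/n), so estimate_rank(simulate()) should pick out both signal directions. Strengths below that threshold would not be expected to separate from the noise bulk.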