Spherical Regression under Mismatch Corruption with Application to Automated Knowledge Translation
Motivated by a series of applications in data integration, language
translation, bioinformatics, and computer vision, we consider spherical
regression with two sets of unit-length vectors when the data are corrupted by
a small fraction of mismatch in the response-predictor pairs. We propose a
three-step algorithm in which we initialize the parameters by solving an
orthogonal Procrustes problem to estimate a translation matrix
ignoring the mismatch. We then estimate a mapping matrix aiming to correct the
mismatch using hard-thresholding to induce sparsity, while incorporating
potential group information. We finally obtain a refined estimate of the
translation matrix by removing the estimated mismatched pairs. We derive the error
bound for the initial estimate of the translation matrix in both fixed and
high-dimensional settings. We demonstrate that the refined estimate of the
translation matrix achieves an error rate that is as good as if no mismatch were
present. We show that our mapping recovery method not only correctly
distinguishes one-to-one and one-to-many correspondences, but also consistently
identifies the matched pairs and estimates the weight vector for combined
correspondence. We examine the finite sample performance of the proposed method
via extensive simulation studies, and with application to the unsupervised
translation of medical codes using electronic health records data.
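
The initialization and refinement steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the `keep_fraction` parameter, and the simulated data are hypothetical. In particular, the second step of the paper (estimating a sparse mapping matrix by hard-thresholding, with group information) is replaced here by a much simpler stand-in: pairs are ranked by the cosine similarity between the rotated predictor and the response, and the lowest-ranked pairs are dropped before refitting.

```python
import numpy as np


def unit_rows(A):
    """Scale each row of A to unit Euclidean length."""
    return A / np.linalg.norm(A, axis=1, keepdims=True)


def procrustes_rotation(X, Y):
    """Orthogonal W minimizing ||Y - X W||_F (orthogonal Procrustes problem).

    The minimizer is U V^T, where U S V^T is the SVD of X^T Y.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def trimmed_refit(X, Y, keep_fraction=0.8):
    """Initial fit on all pairs, then a refit on the best-aligned pairs.

    Assumption: residual trimming stands in for the paper's
    hard-thresholded mapping-matrix step, which is not reproduced here.
    """
    W0 = procrustes_rotation(X, Y)           # step 1: ignore the mismatch
    cos = np.sum((X @ W0) * Y, axis=1)       # alignment of each pair under W0
    n_keep = int(np.ceil(keep_fraction * len(X)))
    kept = np.argsort(-cos)[:n_keep]         # drop the worst-aligned pairs
    W1 = procrustes_rotation(X[kept], Y[kept])  # step 3: refit without them
    return W0, W1, kept


def simulate(n=300, p=10, n_mismatch=30, seed=0):
    """Hypothetical noiseless data with a small fraction of shuffled responses."""
    rng = np.random.default_rng(seed)
    X = unit_rows(rng.standard_normal((n, p)))
    W_true, _ = np.linalg.qr(rng.standard_normal((p, p)))  # random orthogonal matrix
    Y = X @ W_true
    bad = rng.choice(n, size=n_mismatch, replace=False)
    Y[bad] = Y[np.roll(bad, 1)]              # cyclically reassign responses among bad pairs
    return X, Y, W_true, bad


if __name__ == "__main__":
    X, Y, W_true, bad = simulate()
    W0, W1, kept = trimmed_refit(X, Y)
    print("initial error:", np.linalg.norm(W0 - W_true))
    print("refined error:", np.linalg.norm(W1 - W_true))
```

On simulated data of this kind, the refit would be expected to sit closer to the true rotation than the initial fit, since most shuffled pairs align poorly under the initial estimate and are trimmed. A fixed `keep_fraction` presumes a rough upper bound on the mismatch rate, which the method in the abstract does not appear to require.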