161 research outputs found

    Max-Sliced Mutual Information

    Full text link
    Quantifying the dependence between high-dimensional random variables is central to statistical learning and inference. Two classical methods are canonical correlation analysis (CCA), which identifies maximally correlated projected versions of the original variables, and Shannon's mutual information, which is a universal dependence measure that also captures high-order dependencies. However, CCA only accounts for linear dependence, which may be insufficient for certain applications, while mutual information is often infeasible to compute/estimate in high dimensions. This work proposes a middle ground in the form of a scalable information-theoretic generalization of CCA, termed max-sliced mutual information (mSMI). mSMI equals the maximal mutual information between low-dimensional projections of the high-dimensional variables, which reduces back to CCA in the Gaussian case. It enjoys the best of both worlds: capturing intricate dependencies in the data while being amenable to fast computation and scalable estimation from samples. We show that mSMI retains favorable structural properties of Shannon's mutual information, like variational forms and identification of independence. We then study statistical estimation of mSMI, propose an efficiently computable neural estimator, and couple it with formal non-asymptotic error bounds. We present experiments that demonstrate the utility of mSMI for several tasks, encompassing independence testing, multi-view representation learning, algorithmic fairness, and generative modeling. We observe that mSMI consistently outperforms competing methods with little-to-no computational overhead.Comment: Accepted at NeurIPS 202

    Privacy Preserving Domain Adaptation for Semantic Segmentation of Medical Images

    Full text link
    Convolutional neural networks (CNNs) have led to significant improvements in tasks involving semantic segmentation of images. CNNs are vulnerable in the area of biomedical image segmentation because of distributional gap between two source and target domains with different data modalities which leads to domain shift. Domain shift makes data annotations in new modalities necessary because models must be retrained from scratch. Unsupervised domain adaptation (UDA) is proposed to adapt a model to new modalities using solely unlabeled target domain data. Common UDA algorithms require access to data points in the source domain which may not be feasible in medical imaging due to privacy concerns. In this work, we develop an algorithm for UDA in a privacy-constrained setting, where the source domain data is inaccessible. Our idea is based on encoding the information from the source samples into a prototypical distribution that is used as an intermediate distribution for aligning the target domain distribution with the source domain distribution. We demonstrate the effectiveness of our algorithm by comparing it to state-of-the-art medical image semantic segmentation approaches on two medical image semantic segmentation datasets
    • …
    corecore