
    Blessing of High-Order Dimensionality: from Non-Convex to Convex Optimization for Sensor Network Localization

    This paper investigates the Sensor Network Localization (SNL) problem, which seeks to determine sensor locations from known anchor locations and partially given anchor-sensor and sensor-sensor distances. Two primary methods for solving the SNL problem are analyzed: the low-dimensional method, which directly minimizes a loss function, and the high-dimensional semi-definite relaxation (SDR) method, which reformulates the SNL problem as a semi-definite programming (SDP) problem. The paper focuses primarily on the intrinsic non-convexity of the low-dimensional method's loss function, which is established in our main theorem. The SDR method, via second-order dimension augmentation, is discussed in the context of its ability to transform non-convex problems into convex ones, whereas first-order direct dimension augmentation fails to do so. Additionally, we show that more edges do not necessarily improve the convexity of the loss function. Moreover, we provide an explanation for the success of the SDR+GD (gradient descent) method, which uses the SDR solution as a warm start for minimizing the loss function by gradient descent. The paper also explores the parallels among SNL, max-cut, and neural networks in terms of the blessing of high-order dimension augmentation.
    Comment: 25 pages, 9 figures. References: arXiv:1801.06146, arXiv:1810.04805, arXiv:1906.05474, arXiv:1801.0614
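    As a concrete illustration of the two ingredients above, the sketch below implements the low-dimensional loss (squared residuals between measured and current squared distances) and plain gradient descent on it; the SDR+GD variant would differ only in taking its starting point from the SDP solution. This is a minimal sketch under standard SNL conventions, not the paper's code; all function and variable names are illustrative assumptions.

```python
import numpy as np

def snl_loss_and_grad(X, anchors, sensor_edges, anchor_edges):
    """Low-dimensional SNL loss: sum of squared residuals between
    measured squared distances and those of the current estimate.

    X            : (n, d) current sensor position estimates
    anchors      : (m, d) known anchor positions
    sensor_edges : iterable of (i, j, d_ij) sensor-sensor measurements
    anchor_edges : iterable of (k, i, d_ki) anchor-sensor measurements
    """
    loss, grad = 0.0, np.zeros_like(X)
    for i, j, d in sensor_edges:
        diff = X[i] - X[j]
        r = diff @ diff - d ** 2        # squared-distance residual
        loss += r ** 2
        grad[i] += 4 * r * diff
        grad[j] -= 4 * r * diff
    for k, i, d in anchor_edges:
        diff = X[i] - anchors[k]
        r = diff @ diff - d ** 2
        loss += r ** 2
        grad[i] += 4 * r * diff
    return loss, grad

def gradient_descent(X0, anchors, s_edges, a_edges, lr=1e-3, steps=5000):
    # Plain GD on the non-convex loss; SDR+GD would build X0 from the
    # (rank-projected) SDP solution instead of a random initialization.
    X = X0.copy()
    for _ in range(steps):
        _, g = snl_loss_and_grad(X, anchors, s_edges, a_edges)
        X -= lr * g
    return X
```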

    Inefficiency of K-FAC for Large Batch Size Training

    In stochastic optimization, using large batch sizes during training can leverage parallel resources to produce faster wall-clock training times per epoch. However, for both training loss and testing error, recent results analyzing large-batch Stochastic Gradient Descent (SGD) have found sharply diminishing returns beyond a certain critical batch size. In the hopes of addressing this, it has been suggested that the Kronecker-Factored Approximate Curvature (K-FAC) method allows greater scalability to large batch sizes for non-convex machine learning problems such as neural network optimization, as well as greater robustness to variation in model hyperparameters. Here, we perform a detailed empirical analysis of large batch size training for both K-FAC and SGD, evaluating performance in terms of both wall-clock time and aggregate computational cost. Our main results are twofold: first, we find that neither K-FAC nor SGD scales ideally beyond a certain batch size, and that K-FAC does not exhibit improved large-batch scalability compared to SGD; and second, we find that K-FAC, in addition to requiring more hyperparameters to tune, suffers from hyperparameter sensitivity similar to that of SGD. We discuss extensive results using ResNet and AlexNet on CIFAR-10 and SVHN, respectively, as well as more general implications of our findings.
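    For background, K-FAC preconditions each layer's gradient with a Kronecker-factored curvature approximation, which is where its hypothesized large-batch advantage would come from: larger batches yield less noisy curvature factors. The sketch below shows the core update for one fully connected layer; it is a minimal illustration under the standard K-FAC factorization, not the implementation evaluated in the paper, and the names and damping scheme are assumptions.

```python
import numpy as np

def kfac_layer_update(W, a, g, lr=0.01, damping=1e-3):
    """One K-FAC-style step for a fully connected layer.

    W : (out, in) layer weights
    a : (B, in)   layer inputs over a batch of size B
    g : (B, out)  loss gradients w.r.t. the layer's pre-activations
    """
    B = a.shape[0]
    A = a.T @ a / B        # (in, in)   input second-moment factor
    G = g.T @ g / B        # (out, out) pre-activation-gradient factor
    grad = g.T @ a / B     # (out, in)  ordinary minibatch gradient
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    # Kronecker identity: (A ⊗ G)^{-1} vec(grad) = vec(G^{-1} grad A^{-1})
    return W - lr * (G_inv @ grad @ A_inv)
```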

    UT5: Pretraining Non-autoregressive T5 with unrolled denoising

    Recent advances in Transformer-based large language models have led to great strides in natural language generation. However, to decode K tokens, an autoregressive model needs K sequential forward passes, which can be a performance bottleneck for large language models. Much non-autoregressive (NAR) research aims to address this sequentiality bottleneck, although it has largely focused on dedicated architectures evaluated on supervised benchmarks. In this work, we study unsupervised pretraining for non-autoregressive T5 models via unrolled denoising and show state-of-the-art results in downstream generation tasks such as SQuAD question generation and XSum.
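    The sequentiality bottleneck can be made concrete with a toy contrast between the two decoding regimes; the `model` call signature and `MASK_ID` below are hypothetical placeholders, not the UT5 interface.

```python
import torch

MASK_ID = 0  # hypothetical id for a masked/placeholder position

def decode_autoregressive(model, prompt_ids, k):
    # K dependent forward passes: pass t+1 cannot start until
    # token t has been produced.
    ids = prompt_ids
    for _ in range(k):
        logits = model(ids)                              # (1, seq_len, vocab)
        next_id = logits[:, -1].argmax(-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
    return ids[:, -k:]

def decode_non_autoregressive(model, prompt_ids, k):
    # A single forward pass emits all K positions in parallel.
    masks = torch.full((1, k), MASK_ID, dtype=prompt_ids.dtype)
    logits = model(torch.cat([prompt_ids, masks], dim=1))
    return logits[:, -k:].argmax(-1)
```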

    Feature selective temporal prediction of Alzheimer’s disease progression using hippocampus surface morphometry

    Introduction: Prediction of Alzheimer's disease (AD) progression based on baseline measures allows us to understand disease progression and has implications for decisions concerning treatment strategy. To this end, we combine a predictive multi-task machine learning method (cFSGL) with a novel MR-based multivariate morphometric surface map of the hippocampus (mTBM) to predict future cognitive scores of patients.
    Methods: Previous work has shown that a multi-task learning framework that predicts all future time points simultaneously (cFSGL) can encode both sparsity and temporal smoothness. The authors showed that this method can predict cognitive outcomes of ADNI subjects using FreeSurfer-based baseline MRI features, MMSE score, demographic information, and ApoE status. While volumetric information may hold generalized information on brain status, we hypothesized that hippocampus-specific information may be more useful for predictive modeling of AD. To this end, we applied a multivariate tensor-based parametric surface analysis method (mTBM) to extract features from the hippocampal surfaces.
    Results: We combined mTBM features with traditional surface features such as the medial axis distance, the Jacobian determinant, and two of the Jacobian principal eigenvalues to yield 7 normalized hippocampal surface maps of 300 points each. By combining these 7 × 300 = 2100 features with the previous ~350 features, we illustrate how this type of sparsifying method can be applied to an entire surface map of the hippocampus, yielding a feature space two orders of magnitude larger than what was previously attempted.
    Conclusions: By combining the power of the cFSGL multi-task machine learning framework with AD-sensitive mTBM feature maps of the hippocampus surface, we are able to improve the predictive performance for ADAS cognitive scores 6, 12, 24, 36, and 48 months from baseline.
    In this work, we present our results of using machine learning to predict temporal behavioral changes in Alzheimer's disease using entire topological feature maps of the hippocampus surface (2100 feature points). Our paper demonstrates that it is possible to use an entire topological map, instead of only imaging-derived volumetric measurements, to predict behavioral changes. We compare these results with previous results using only volumetric MR imaging features (309 feature points) and show, through repeated cross-validation rounds, that we obtain better predictive power.
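    To make "sparsity plus temporal smoothness" concrete, the sketch below evaluates a cFSGL-style objective: a least-squares fit plus an elementwise lasso, a fused-lasso penalty tying adjacent time points together, and a group penalty that selects features jointly across all time points. The weighting names are assumptions based on the standard fused sparse group lasso formulation, not this study's exact code.

```python
import numpy as np

def cfsgl_objective(W, X, Y, lam1, lam2, lam3):
    """cFSGL-style multi-task objective.

    W : (p, T) one weight column per future time point
    X : (n, p) baseline features (e.g., the 2100 surface values)
    Y : (n, T) cognitive scores at the T future time points
    """
    data_fit = 0.5 * np.linalg.norm(X @ W - Y) ** 2       # least-squares fit
    sparsity = lam1 * np.abs(W).sum()                     # elementwise lasso
    fused    = lam2 * np.abs(W[:, 1:] - W[:, :-1]).sum()  # temporal smoothness
    group    = lam3 * np.linalg.norm(W, axis=1).sum()     # joint feature selection
    return data_fit + sparsity + fused + group
```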