Blessing of High-Order Dimensionality: from Non-Convex to Convex Optimization for Sensor Network Localization
This paper investigates the Sensor Network Localization (SNL) problem, which
seeks to determine sensor locations from known anchor locations and
partially observed anchor-sensor and sensor-sensor distances. Two primary
methods for solving the SNL problem are analyzed: the low-dimensional method
that directly minimizes a loss function, and the high-dimensional semi-definite
relaxation (SDR) method that reformulates the SNL problem as an SDP
(semi-definite programming) problem. The paper primarily focuses on the
intrinsic non-convexity of the loss function of the low-dimensional method,
which is established in our main theorem. The SDR method is discussed in the
context of its ability to transform the non-convex problem into a convex one
via second-order dimension augmentation, whereas first-order direct dimension
augmentation fails to do so. Additionally, we show that adding more edges does
not necessarily improve the convexity of the loss function. Moreover,
we provide an explanation for the success of the SDR+GD method, which uses the
SDR solution as a warm start for gradient descent (GD) on the loss function.
The paper also explores the parallels among
SNL, max-cut, and neural networks in terms of the blessing of high-order
dimension augmentation.

Comment: 25 pages, 9 figures.
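The non-convexity of the low-dimensional formulation can be made concrete with a toy instance: minimize the sum of squared residuals between modeled and observed squared distances by gradient descent from a random start. The instance, learning rate, and iteration count below are illustrative and not taken from the paper:

```python
import numpy as np

# Toy sketch of the low-dimensional SNL formulation: recover sensor
# positions by gradient descent on the squared distance-residual loss.
# The instance, learning rate, and iteration count are illustrative only.

rng = np.random.default_rng(0)
anchors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # known positions
true_sensors = np.array([[0.3, 0.4], [0.7, 0.6]])          # ground truth

# "Observed" squared distances for all anchor-sensor and sensor-sensor pairs.
d2_as = ((anchors[:, None, :] - true_sensors[None, :, :]) ** 2).sum(-1)
d2_ss = ((true_sensors[:, None, :] - true_sensors[None, :, :]) ** 2).sum(-1)

def loss(x):
    """Sum of squared residuals between modeled and observed squared distances."""
    r_as = ((anchors[:, None, :] - x[None, :, :]) ** 2).sum(-1) - d2_as
    r_ss = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1) - d2_ss
    return (r_as ** 2).sum() + 0.5 * (r_ss ** 2).sum()  # 0.5: each pair appears twice

def grad(x, eps=1e-6):
    """Central finite-difference gradient; adequate for a 4-parameter toy."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        xp, xm = x.copy(), x.copy()
        xp[idx] += eps
        xm[idx] -= eps
        g[idx] = (loss(xp) - loss(xm)) / (2 * eps)
    return g

x = rng.normal(scale=0.3, size=true_sensors.shape)  # random start: may land in a bad basin
loss0 = loss(x)
for _ in range(4000):
    x -= 0.005 * grad(x)

print(f"loss: {loss0:.4f} -> {loss(x):.6f}")
```

From a random start, plain gradient descent on this quartic loss can stall in a spurious local minimum; the SDR+GD strategy described above replaces the random start with the SDR solution.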
Inefficiency of K-FAC for Large Batch Size Training
In stochastic optimization, using large batch sizes during training can
leverage parallel resources to produce faster wall-clock training times per
training epoch. However, for both training loss and testing error, recent
results analyzing large batch Stochastic Gradient Descent (SGD) have found
sharp diminishing returns, beyond a certain critical batch size. In the hopes
of addressing this, it has been suggested that the Kronecker-Factored
Approximate Curvature (\mbox{K-FAC}) method allows for greater scalability to
large batch sizes, for non-convex machine learning problems such as neural
network optimization, as well as greater robustness to variation in model
hyperparameters. Here, we perform a detailed empirical analysis of large-batch
training for both \mbox{K-FAC} and SGD,
evaluating performance in terms of both wall-clock time and aggregate
computational cost. Our main results are twofold: first, we find that neither
\mbox{K-FAC} nor SGD exhibits ideal scalability beyond a certain
batch size, and that \mbox{K-FAC} does not exhibit improved large-batch
scalability behavior, as compared to SGD; and second, we find that
\mbox{K-FAC}, in addition to requiring more hyperparameters to tune, exhibits
hyperparameter sensitivity similar to that of SGD. We present
extensive results using ResNet and AlexNet on \mbox{CIFAR-10} and SVHN,
respectively, and discuss the broader implications of our findings.
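The K-FAC update itself is compact to state: approximate a layer's Fisher matrix as a Kronecker product of two small factor matrices and invert them separately. A minimal sketch for one fully connected layer, with illustrative shapes and damping (not the paper's experimental setup):

```python
import numpy as np

# Minimal sketch of the K-FAC preconditioner for one fully connected layer.
# K-FAC approximates the layer's Fisher matrix as a Kronecker product A (x) S,
# where A is the second moment of layer inputs and S that of backpropagated
# output gradients; the preconditioned step is then S^{-1} Grad A^{-1}.
# Shapes, batch size, and damping are illustrative.

rng = np.random.default_rng(0)
batch, d_in, d_out = 64, 8, 5

a = rng.normal(size=(batch, d_in))    # layer inputs for a mini-batch
g = rng.normal(size=(batch, d_out))   # gradients w.r.t. layer outputs
grad_W = g.T @ a / batch              # gradient w.r.t. the weight matrix

damping = 1e-2  # Tikhonov damping keeps the factor inverses well conditioned
A = a.T @ a / batch + damping * np.eye(d_in)
S = g.T @ g / batch + damping * np.eye(d_out)

# Preconditioned update: equivalent to (A (x) S)^{-1} vec(grad) un-vectorized,
# but costs two small inverses instead of one (d_in*d_out)-sized solve.
update = np.linalg.solve(S, grad_W) @ np.linalg.inv(A)

# Sanity check against the explicit Kronecker-product inverse on this tiny layer.
F = np.kron(A, S)
update_full = np.linalg.solve(F, grad_W.T.reshape(-1)).reshape(d_in, d_out).T
print(np.allclose(update, update_full))  # True
```

The extra objects to estimate and damp (A, S, their update frequency) are the additional hyperparameters the abstract refers to.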
UT5: Pretraining Non-Autoregressive T5 with Unrolled Denoising
Recent advances in Transformer-based Large Language Models have made great
strides in natural language generation. However, to decode K tokens, an
autoregressive model needs K sequential forward passes, which may be a
performance bottleneck for large language models. Much non-autoregressive (NAR)
research aims to address this sequentiality bottleneck, although most of it has
focused on dedicated architectures evaluated on supervised benchmarks. In this
work, we study unsupervised pretraining for non-autoregressive T5 models via
unrolled denoising and show state-of-the-art results on downstream generation
tasks such as SQuAD question generation and XSum summarization.
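The contrast with autoregressive decoding can be illustrated with a toy refinement loop: every position is predicted in one parallel pass, and the model's own output is fed back for a fixed number of denoising steps, so the number of forward passes is constant rather than proportional to sequence length. The stand-in "model" below is purely illustrative and unrelated to the actual T5 architecture or weights:

```python
import numpy as np

# Toy illustration of unrolled denoising for non-autoregressive decoding:
# all positions are predicted in parallel, then the (noisy) output is fed
# back as the next input for a fixed number of refinement steps.
# The "model" is a stand-in that just moves toward a known target sequence.

rng = np.random.default_rng(0)
vocab, length, steps = 20, 8, 6
target = rng.integers(vocab, size=length)  # sequence the toy model "knows"
MASK = vocab                               # extra id for the fully-masked input

def toy_model(tokens):
    """Parallel prediction: each position independently moves toward the
    target with some probability (no left-to-right dependency)."""
    out = tokens.copy()
    flip = rng.random(length) < 0.5
    out[flip] = target[flip]
    return out

tokens = np.full(length, MASK)
for step in range(steps):
    tokens = toy_model(tokens)  # one parallel forward pass per denoising step
    print(f"step {step}: {np.mean(tokens == target):.2f} correct")
```

An autoregressive decoder would need `length` sequential passes here; the unrolled loop needs only `steps` passes regardless of sequence length.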
Feature selective temporal prediction of Alzheimer’s disease progression using hippocampus surface morphometry
Introduction: Prediction of Alzheimer's disease (AD) progression based on baseline measures allows us to understand disease progression and has implications for decisions concerning treatment strategy. To this end, we combine a predictive multi-task machine learning method (cFSGL) with a novel MR-based multivariate morphometric surface map of the hippocampus (mTBM) to predict future cognitive scores of patients.

Methods: Previous work has shown that a multi-task learning framework that predicts all future time points simultaneously (cFSGL) can encode both sparsity and temporal smoothness. The authors showed that this method is able to predict cognitive outcomes of ADNI subjects using FreeSurfer-based baseline MRI features, MMSE score, demographic information, and ApoE status. While volumetric information may hold generalized information on brain status, we hypothesized that hippocampus-specific information may be more useful for predictive modeling of AD. To this end, we applied a multivariate tensor-based parametric surface analysis method (mTBM) to extract features from the hippocampal surfaces.

Results: We combined mTBM features with traditional surface features such as middle axis distance, the Jacobian determinant, and two of the Jacobian principal eigenvalues to yield 7 normalized hippocampal surface maps of 300 points each. By combining these 7 × 300 = 2100 features with the previous ~350 features, we illustrate how this type of sparsifying method can be applied to an entire surface map of the hippocampus, yielding a feature space two orders of magnitude larger than what was previously attempted.

Conclusions: By combining the power of the cFSGL multi-task machine learning framework with AD-sensitive mTBM feature maps of the hippocampus surface, we are able to improve the prediction of ADAS cognitive scores 6, 12, 24, 36, and 48 months from baseline.

In this work, we present our results of using machine learning to predict temporal behavioral changes in Alzheimer's disease using entire topological feature maps of the hippocampus surface (2100 feature points). Our paper demonstrates that it is possible to use an entire topological map, rather than only imaging-derived volumetric measurements, to predict behavioral changes. We compare these results with previous results using only volumetric MR imaging features (309 features) and show, through repeated cross-validation rounds, that we obtain better predictive power.
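The sparsity-plus-temporal-smoothness idea behind cFSGL can be sketched as multi-task regression with one weight column per future time point, an l1 penalty for feature selection, and a squared-difference penalty coupling adjacent time points. This is a simplified smooth variant solved with ISTA, not the authors' exact composite penalty or solver, and all sizes below are synthetic:

```python
import numpy as np

# Sketch of a cFSGL-style multi-task objective: one weight column per
# future time point, an l1 term for feature sparsity, and a squared-
# difference term coupling adjacent time points for temporal smoothness.
# Simplified smooth-fused variant solved with ISTA; synthetic data only.

rng = np.random.default_rng(0)
n, p, T = 50, 30, 5                        # subjects, features, time points
X = rng.normal(size=(n, p))
W_true = np.zeros((p, T))
W_true[:4] = 1.0                           # few features predict all time points
Y = X @ W_true + 0.1 * rng.normal(size=(n, T))

lam1, lam2, lr = 0.5, 1.0, 1e-3
R = np.diff(np.eye(T), axis=0)             # (T-1, T) temporal difference operator

def smooth_grad(W):
    """Gradient of data fit plus temporal-smoothness penalty ||W R^T||_F^2."""
    return 2 * X.T @ (X @ W - Y) + 2 * lam2 * (W @ R.T) @ R

W = np.zeros((p, T))
for _ in range(500):                       # ISTA: gradient step + l1 prox
    W = W - lr * smooth_grad(W)
    W = np.sign(W) * np.maximum(np.abs(W) - lr * lam1, 0.0)

print("nonzero feature rows:", int((np.abs(W).max(axis=1) > 1e-3).sum()))
```

The l1 prox zeroes out uninformative features jointly across all time points, which is what makes a 2100-dimensional surface-map feature space tractable for this kind of model.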