An Efficient Global Optimality Certificate for Landmark-Based SLAM
Modern state estimation is often formulated as an optimization problem and
solved using efficient local search methods. These methods at best guarantee
convergence to local minima, but, in some cases, global optimality can also be
certified. Although such global optimality certificates have been well
established for 3D pose-graph optimization, the details have yet to be
worked out for the 3D landmark-based SLAM problem, in which estimated states
include both robot poses and map landmarks. In this paper, we address this gap
by using a graph-theoretic approach to cast the subproblems of landmark-based
SLAM into a form that yields a sufficient condition for global optimality.
Efficient methods of computing the optimality certificates for these
subproblems exist, but first require the construction of a large data matrix.
We show that this matrix can be constructed with complexity that remains linear
in the number of landmarks and does not exceed the state-of-the-art
computational complexity of one local solver iteration. We demonstrate the
efficacy of the certificate on simulated and real-world landmark-based SLAM
problems. Finally, we study the robustness of the global optimality certificate
to measurement noise, taking into consideration the effect of the underlying
measurement graph.
Comment: 8 pages, 7 figures
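The sufficient condition mentioned above can be illustrated generically: for quadratically constrained estimation problems of this kind, a candidate solution is certified as globally optimal when an associated certificate matrix is positive semidefinite and has the candidate in its nullspace. The sketch below is illustrative only; the matrix, tolerance, and toy numbers are assumptions, not taken from the paper.

```python
import numpy as np

def is_certified_global_optimum(H, x_star, tol=1e-8):
    """Check the standard sufficient condition for global optimality:
    the certificate matrix H must be positive semidefinite and the
    candidate solution must lie in its nullspace (H @ x_star == 0)."""
    eigvals = np.linalg.eigvalsh(H)   # eigenvalues of symmetric H, ascending
    psd_ok = eigvals[0] >= -tol       # smallest eigenvalue is ~nonnegative
    stationarity_ok = (np.linalg.norm(H @ x_star)
                       <= tol * max(1.0, np.linalg.norm(x_star)))
    return bool(psd_ok and stationarity_ok)

# Toy certificate matrix (hypothetical numbers): H = I - u u^T is PSD
# with nullspace spanned by u, so the candidate u is certified.
u = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
H = np.eye(3) - np.outer(u, u)
print(is_certified_global_optimum(H, u))  # True
```

In practice the expensive part, as the abstract notes, is assembling the large data matrix from which H is built; the check itself reduces to a sparse eigenvalue computation.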
Safe and Smooth: Certified Continuous-Time Range-Only Localization
A common approach to localize a mobile robot is by measuring distances to
points of known positions, called anchors. Locating a device from distance
measurements is typically posed as a non-convex optimization problem, stemming
from the nonlinearity of the measurement model. Non-convex optimization
problems may yield suboptimal solutions when local iterative solvers such as
Gauss-Newton are employed. In this paper, we design an optimality certificate
for continuous-time range-only localization. Our formulation allows for the
integration of a motion prior, which ensures smoothness of the solution and is
crucial for localizing from only a few distance measurements. The proposed
certificate comes at little additional cost since it has the same complexity as
the sparse local solver itself: linear in the number of positions. We show,
both in simulation and on real-world datasets, that the efficient local solver
often finds the globally optimal solution (confirmed by our certificate), but
it may converge to local solutions with high errors, which our certificate
correctly detects.
Comment: 10 pages, 7 figures, accepted to IEEE Robotics and Automation Letters
(this arXiv version contains a supplementary appendix)
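As a toy illustration of the kind of local solver the certificate guards, here is a minimal Gauss-Newton iteration for single-position range-only localization. The paper's actual formulation is continuous-time with a motion prior; the anchors, ranges, and initial guess below are hypothetical.

```python
import numpy as np

def gauss_newton_localize(anchors, dists, x0, iters=20):
    """Gauss-Newton for single-position range-only localization.
    anchors: (m, 2) known anchor positions; dists: (m,) measured ranges.
    Residual r_i = ||x - a_i|| - d_i is nonlinear in x, which is why
    the underlying problem is non-convex."""
    x = x0.astype(float).copy()
    for _ in range(iters):
        diffs = x - anchors                      # (m, 2)
        norms = np.linalg.norm(diffs, axis=1)    # predicted ranges
        r = norms - dists                        # residuals
        J = diffs / norms[:, None]               # Jacobian of residuals
        dx = np.linalg.solve(J.T @ J, -J.T @ r)  # normal equations
        x += dx
        if np.linalg.norm(dx) < 1e-10:
            break
    return x

# Hypothetical setup: three anchors, noiseless ranges to a true position.
anchors = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
x_true = np.array([1.0, 1.0])
dists = np.linalg.norm(anchors - x_true, axis=1)
x_hat = gauss_newton_localize(anchors, dists, x0=np.array([3.0, 2.0]))
print(np.round(x_hat, 6))  # converges to approximately [1. 1.]
```

From a poor initialization such a solver may instead land in a spurious local minimum, which is exactly the failure mode the paper's certificate is designed to detect.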
On Semidefinite Relaxations for Matrix-Weighted State-Estimation Problems in Robotics
In recent years, there has been remarkable progress in the development of
so-called certifiable perception methods, which leverage semidefinite, convex
relaxations to find global optima of perception problems in robotics. However,
many of these relaxations rely on simplifying assumptions that facilitate the
problem formulation, such as an isotropic measurement noise distribution. In
this paper, we explore the tightness of the semidefinite relaxations of
matrix-weighted (anisotropic) state-estimation problems and reveal the
limitations lurking therein: matrix-weighted factors can cause convex
relaxations to lose tightness. In particular, we show that the semidefinite
relaxations of localization problems with matrix weights may be tight only for
low noise levels. We empirically explore the factors that contribute to this
loss of tightness and demonstrate that redundant constraints can be used to
regain tightness, albeit at the expense of real-time performance. As a second
technical contribution of this paper, we show that the state-of-the-art
relaxation of scalar-weighted SLAM cannot be used when matrix weights are
considered. We provide an alternate formulation and show that its SDP
relaxation is not tight (even for very low noise levels) unless specific
redundant constraints are used. We demonstrate the tightness of our
formulations on both simulated and real-world data.
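A common empirical proxy for the tightness discussed above is the rank of the SDP solution matrix, judged by the gap in its eigenvalue spectrum. A minimal sketch, assuming the solution matrix is already available; the matrices and thresholds below are illustrative, not from the paper.

```python
import numpy as np

def relaxation_is_tight(X, rank=1, ratio_tol=1e6):
    """Empirical tightness check for an SDP relaxation: the relaxation is
    (numerically) tight when the solution X has the expected low rank,
    measured here by the ratio between consecutive eigenvalues."""
    eigs = np.sort(np.linalg.eigvalsh(X))[::-1]   # descending
    if eigs[rank] <= 0:
        return True                                # exactly low-rank
    return bool(eigs[rank - 1] / eigs[rank] > ratio_tol)

# Rank-1 solution (tight): X = x x^T for some candidate state x, from
# which the estimate can be extracted by eigendecomposition.
x = np.array([1.0, 2.0, -1.0])
print(relaxation_is_tight(np.outer(x, x)))            # True
# Higher-rank spectrum with no large gap: relaxation not tight.
print(relaxation_is_tight(np.diag([3.0, 2.0, 1.0])))  # False
```

Loss of tightness under matrix weights, as described above, shows up in exactly this diagnostic: the eigenvalue gap collapses as the noise level grows.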
Toward Globally Optimal State Estimation Using Automatically Tightened Semidefinite Relaxations
In recent years, semidefinite relaxations of common optimization problems in
robotics have attracted growing attention due to their ability to provide
globally optimal solutions. In many cases, it was shown that specific
handcrafted redundant constraints are required to obtain tight relaxations and
thus global optimality. These constraints are formulation-dependent and
typically require a lengthy manual process to find. Instead, the present paper
suggests an automatic method to find a set of sufficient redundant constraints
to obtain tightness, if they exist. We first propose an efficient feasibility
check to determine if a given set of variables can lead to a tight formulation.
Secondly, we show how to scale the method to larger problems. At no point in
the process do we have to manually find redundant constraints. We
showcase the effectiveness of the approach, in simulation and on real datasets,
for range-based localization and stereo-based pose estimation. Finally, we
reproduce semidefinite relaxations presented in recent literature and show that
our automatic method finds a smaller set of constraints sufficient for
tightness than previously considered.
Comment: 18 pages, 20 figures
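The nullspace idea behind such an automatic search can be sketched in a few lines: stack the lifted monomial vectors of sampled feasible points, then read candidate constraints out of the nullspace of the resulting data matrix. This is a toy illustration; the unit-circle feasible set and all names are our assumptions, not the paper's implementation.

```python
import numpy as np

def lift(x):
    """Unique second-order monomials x_i * x_j (i <= j) of a sample."""
    n = len(x)
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

def find_redundant_constraints(samples, tol=1e-9):
    """Stack lifted monomial vectors of feasible samples; every vector in
    the nullspace of the data matrix is a quadratic form that vanishes on
    the feasible set, i.e. a candidate (possibly redundant) constraint."""
    D = np.stack([lift(x) for x in samples])
    _, s, Vt = np.linalg.svd(D)
    rank = int(np.sum(s > tol * s[0]))
    return Vt[rank:]          # basis of quadratic forms vanishing on samples

# Toy feasible set: the unit circle in R^2, homogenized with a constant 1.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=50)
samples = np.stack([np.cos(theta), np.sin(theta), np.ones_like(theta)], axis=1)
constraints = find_redundant_constraints(samples)
print(len(constraints))   # 1: recovers cos^2 + sin^2 - 1 = 0 (up to scale)
```

The feasibility check in the paper additionally decides whether the discovered constraints suffice for tightness; the sketch above only covers the discovery step.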
Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers
Large-scale transformer models have become the de-facto architectures for
various machine learning applications, e.g., CV and NLP. However, those large
models also introduce prohibitive training costs. To mitigate this issue, we
propose a novel random and layerwise token dropping method (random-LTD), which
skips the computation of a subset of the input tokens at all middle layers.
Particularly, random-LTD achieves considerable speedups and comparable accuracy
as the standard training baseline. Compared to other token dropping methods,
random-LTD does not require (1) any importance score-based metrics, (2) any
special token treatment (e.g., [CLS]), and (3) many layers in full sequence
length training except the first and the last layers. In addition, a new
LayerToken learning-rate schedule is proposed for pretraining, which resolves
the heavy tuning requirement of our proposed training mechanism. Finally, we
demonstrate that random-LTD can be applied to broader applications, including
GPT and BERT pretraining as well as ViT and GPT finetuning tasks. Our results
show that random-LTD can save about 33.3% theoretical compute cost and 25.6%
wall-clock training time while achieving similar zero-shot evaluations on
GPT-3 1.3B as compared to the baseline.
Comment: 22 pages
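The core mechanism of random-LTD, routing only a random subset of tokens through a middle layer while the rest bypass it unchanged, can be sketched as follows. This is a toy numpy stand-in, not the released implementation; per the abstract, the first and last layers would still see the full sequence.

```python
import numpy as np

def random_ltd_layer(hidden, layer_fn, keep_ratio, rng):
    """Random layerwise token dropping (sketch): process only a random
    subset of tokens through the layer; dropped tokens bypass it
    unchanged and are scattered back to their original positions."""
    seq_len = hidden.shape[0]
    n_keep = max(1, int(seq_len * keep_ratio))
    kept = np.sort(rng.choice(seq_len, size=n_keep, replace=False))
    out = hidden.copy()                  # dropped tokens skip the layer
    out[kept] = layer_fn(hidden[kept])   # only kept tokens are computed
    return out

# Toy "layer": doubling stands in for an attention + MLP block.
rng = np.random.default_rng(0)
h = np.ones((8, 4))                      # (seq_len, hidden_dim)
h_out = random_ltd_layer(h, lambda x: 2.0 * x, keep_ratio=0.5, rng=rng)
print(int((h_out == 2.0).all(axis=1).sum()))  # 4 of 8 tokens were processed
```

Because no importance scores are computed and no tokens get special treatment, the per-layer overhead is just the random index selection and gather/scatter, which is what makes the speedup nearly proportional to the drop ratio.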
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
Recent advances on deep learning models come at the price of formidable
training cost. The increasing model size is one of the root causes, but another
less-emphasized fact is that data scale is actually increasing at a similar
speed as model scale, and the training cost is proportional to both of them.
Compared to the rapidly evolving model architecture, how to efficiently use the
training data (especially for the expensive foundation model pretraining) is
both less explored and difficult to realize due to the lack of a convenient
framework that focuses on data efficiency capabilities. To this end, we present
DeepSpeed Data Efficiency, a framework that makes better use of data, increases
training efficiency, and improves model quality. Specifically, we propose and
combine two data efficiency techniques: efficient data sampling via a general
curriculum learning library, and efficient data routing via a novel random
layerwise token dropping technique. For GPT-3 1.3B language model pretraining,
our work achieves 12.5x less data/time/cost (\$3.7K if rented on Azure), while
still maintaining 95% of model quality compared to the baseline with full data
and cost (\$46.3K). For GPT-3 1.3B and BERT-large pretraining, our work can also
achieve the same model quality with up to 2x less data/time/cost, or achieve
better model quality under the same data/time/cost. DeepSpeed Data Efficiency is
easy to use and tune, enabling us to easily apply it and verify its benefit on
additional tasks including GPT-3 MoE model pretraining and small-scale
GPT-2/ViT finetuning.
Comment: Published in AAAI 2024 Main Technical Track. Equal contribution by the
first 3 authors. Code has been released as a part of
https://github.com/microsoft/DeepSpeed. Part of this paper is from our previous
arXiv report (arXiv:2211.11586)
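The curriculum-learning half of the framework can be illustrated with a generic difficulty-based pacing sketch. The pacing schedule, difficulty metric, and corpus below are hypothetical illustrations, not the library's API.

```python
import random

def curriculum_batches(samples, difficulty, n_steps, batch_size, seed=0):
    """Curriculum-learning sketch (pacing by difficulty): sort samples by
    a difficulty metric and, as training progresses, draw batches from a
    gradually growing prefix of the sorted data (easy -> hard)."""
    rng = random.Random(seed)
    ordered = sorted(samples, key=difficulty)
    for step in range(1, n_steps + 1):
        # Linear pacing: the usable pool grows until it covers everything.
        pool = max(batch_size, int(len(ordered) * step / n_steps))
        yield [rng.choice(ordered[:pool]) for _ in range(batch_size)]

# Toy corpus: "difficulty" is text length, a common proxy in pretraining.
corpus = ["a b", "a b c d e f", "a b c", "a b c d", "a"]
batches = list(curriculum_batches(corpus, difficulty=len,
                                  n_steps=3, batch_size=2))
print(len(batches))   # 3 batches; early ones draw only the shortest texts
```

The data-routing half (random layerwise token dropping) complements this by reducing per-sample compute rather than changing which samples are seen.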
ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large
language models on massive GPU clusters due to its ease of use, efficiency,
and good scalability. However, when training on low-bandwidth clusters, or at a
scale that forces the per-GPU batch size to be small, ZeRO's effective
throughput is limited by the high communication volume from gathering weights
in the forward and backward passes and from averaging gradients. This paper introduces
three communication volume reduction techniques, which we collectively refer to
as ZeRO++, targeting each of the communication collectives in ZeRO. First is
block-quantization based all-gather. Second is data remapping that trades off
communication for more memory. Third is a novel all-to-all based quantized
gradient averaging paradigm as a replacement for the reduce-scatter collective, which
preserves accuracy despite communicating low precision data. Collectively,
ZeRO++ reduces communication volume of ZeRO by 4x, enabling up to 2.16x better
throughput at 384-GPU scale.
Comment: 12 pages
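The block-quantization idea behind the quantized collectives can be sketched independently of any communication library: quantize each block with its own int8 scale, cutting the transmitted volume roughly 4x for fp32 data. Function names and the block size below are illustrative assumptions.

```python
import numpy as np

def block_quantize(x, block_size=4):
    """Blockwise int8 quantization (sketch): each block gets its own
    scale, which bounds the quantization error by half a quantization
    step of that block's dynamic range."""
    x = x.reshape(-1, block_size)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0                        # avoid division by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def block_dequantize(q, scale):
    """Reconstruct fp32 values from int8 codes and per-block scales."""
    return (q.astype(np.float32) * scale).ravel()

rng = np.random.default_rng(0)
w = rng.standard_normal(16).astype(np.float32)     # toy weight shard
q, s = block_quantize(w)
w_hat = block_dequantize(q, s)
print(float(np.abs(w - w_hat).max()) < 0.02)       # small round-trip error
```

In ZeRO++ this kind of quantization is applied to the all-gather and gradient-averaging collectives; only the int8 codes and small per-block scales cross the wire.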
STAR-loc: Dataset for STereo And Range-based localization
This document contains a detailed description of the STAR-loc dataset. For a
quick starting guide please refer to the associated Github repository
(https://github.com/utiasASRL/starloc). The dataset consists of stereo camera
data (rectified/raw images and inertial measurement unit measurements) and
ultra-wideband (UWB) data (range measurements) collected on a sensor rig in a
Vicon motion capture arena. The UWB anchors and visual landmarks (AprilTags)
have known positions, so the dataset can be used for both localization and
Simultaneous Localization and Mapping (SLAM).
Comment: 15 pages, 15 figures
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model
Text-to-image generation (TTI) refers to the use of models that process text
input and generate high-fidelity images based on text descriptions.
Text-to-image generation using neural networks can be traced back to the
emergence of Generative Adversarial Networks (GANs), followed by the
autoregressive Transformer. Diffusion models are one prominent type of
generative model used to generate images through the systematic introduction of
noise over repeated steps. Owing to their impressive results on image
synthesis, diffusion models have been cemented as the major image decoder used
by text-to-image models and have brought text-to-image generation to the
forefront of machine-learning (ML) research. In the era of large models,
scaling up model size and integration with large language models have further
improved the performance of TTI models, making generated results nearly
indistinguishable from real-world images and revolutionizing the way we
retrieve images. Our explorative study has
incentivised us to think that there are further ways of scaling text-to-image
models with the combination of innovative model architectures and prediction
enhancement techniques. We have divided the work of this survey into five main
sections wherein we detail the frameworks of major literature in order to delve
into the different types of text-to-image generation methods. Following this,
we provide a detailed comparison and critique of these methods and offer
possible pathways of improvement for future work. Looking ahead, we argue that
TTI development could yield impressive productivity improvements for content
creation, particularly in the context of the AIGC era, and could be extended to
more complex tasks such as video generation and 3D generation.