An Efficient Global Optimality Certificate for Landmark-Based SLAM
Modern state estimation is often formulated as an optimization problem and
solved using efficient local search methods. These methods at best guarantee
convergence to local minima, but, in some cases, global optimality can also be
certified. Although such global optimality certificates have been well
established for 3D pose-graph optimization, the details have yet to be
worked out for the 3D landmark-based SLAM problem, in which estimated states
include both robot poses and map landmarks. In this paper, we address this gap
by using a graph-theoretic approach to cast the subproblems of landmark-based
SLAM into a form that yields a sufficient condition for global optimality.
Efficient methods of computing the optimality certificates for these
subproblems exist, but first require the construction of a large data matrix.
We show that this matrix can be constructed with complexity that remains linear
in the number of landmarks and does not exceed the state-of-the-art
computational complexity of one local solver iteration. We demonstrate the
efficacy of the certificate on simulated and real-world landmark-based SLAM
problems. Finally, we study the robustness of the global optimality certificate
to measurement noise, taking into consideration the effect of the underlying
measurement graph.
Comment: 8 pages, 7 figures
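The sufficient condition mentioned above can be illustrated generically: for quadratically constrained estimation problems of this kind, a candidate solution is certified as globally optimal when an associated certificate matrix is positive semidefinite and has the candidate in its nullspace. The sketch below is illustrative only; the matrix, tolerance, and toy numbers are assumptions, not taken from the paper.

```python
import numpy as np

def is_certified_global_optimum(H, x_star, tol=1e-8):
    """Check the standard sufficient condition for global optimality:
    the certificate matrix H must be positive semidefinite and the
    candidate solution must lie in its nullspace (H @ x_star == 0)."""
    eigvals = np.linalg.eigvalsh(H)   # eigenvalues of symmetric H, ascending
    psd_ok = eigvals[0] >= -tol       # smallest eigenvalue is ~nonnegative
    stationarity_ok = (np.linalg.norm(H @ x_star)
                       <= tol * max(1.0, np.linalg.norm(x_star)))
    return bool(psd_ok and stationarity_ok)

# Toy certificate matrix (hypothetical numbers): H = I - u u^T is PSD
# with nullspace spanned by u, so the candidate u is certified.
u = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
H = np.eye(3) - np.outer(u, u)
print(is_certified_global_optimum(H, u))  # True
```

In practice the expensive part, as the abstract notes, is assembling the large data matrix from which H is built; the check itself reduces to a sparse eigenvalue computation.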
Safe and Smooth: Certified Continuous-Time Range-Only Localization
A common approach to localize a mobile robot is by measuring distances to
points of known positions, called anchors. Locating a device from distance
measurements is typically posed as a non-convex optimization problem, stemming
from the nonlinearity of the measurement model. Non-convex optimization
problems may yield suboptimal solutions when local iterative solvers such as
Gauss-Newton are employed. In this paper, we design an optimality certificate
for continuous-time range-only localization. Our formulation allows for the
integration of a motion prior, which ensures smoothness of the solution and is
crucial for localizing from only a few distance measurements. The proposed
certificate comes at little additional cost since it has the same complexity as
the sparse local solver itself: linear in the number of positions. We show,
both in simulation and on real-world datasets, that the efficient local solver
often finds the globally optimal solution (confirmed by our certificate), but
it may converge to local solutions with high errors, which our certificate
correctly detects.
Comment: 10 pages, 7 figures, accepted to IEEE Robotics and Automation Letters
(this arXiv version contains a supplementary appendix)
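As a toy illustration of the kind of local solver the certificate guards, here is a minimal Gauss-Newton iteration for single-position range-only localization. The paper's actual formulation is continuous-time with a motion prior; the anchors, ranges, and initial guess below are hypothetical.

```python
import numpy as np

def gauss_newton_localize(anchors, dists, x0, iters=20):
    """Gauss-Newton for single-position range-only localization.
    anchors: (m, 2) known anchor positions; dists: (m,) measured ranges.
    Residual r_i = ||x - a_i|| - d_i is nonlinear in x, which is why
    the underlying problem is non-convex."""
    x = x0.astype(float).copy()
    for _ in range(iters):
        diffs = x - anchors                      # (m, 2)
        norms = np.linalg.norm(diffs, axis=1)    # predicted ranges
        r = norms - dists                        # residuals
        J = diffs / norms[:, None]               # Jacobian of residuals
        dx = np.linalg.solve(J.T @ J, -J.T @ r)  # normal equations
        x += dx
        if np.linalg.norm(dx) < 1e-10:
            break
    return x

# Hypothetical setup: three anchors, noiseless ranges to a true position.
anchors = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
x_true = np.array([1.0, 1.0])
dists = np.linalg.norm(anchors - x_true, axis=1)
x_hat = gauss_newton_localize(anchors, dists, x0=np.array([3.0, 2.0]))
print(np.round(x_hat, 6))  # converges to approximately [1. 1.]
```

From a poor initialization such a solver may instead land in a spurious local minimum, which is exactly the failure mode the paper's certificate is designed to detect.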
On Semidefinite Relaxations for Matrix-Weighted State-Estimation Problems in Robotics
In recent years, there has been remarkable progress in the development of
so-called certifiable perception methods, which leverage semidefinite, convex
relaxations to find global optima of perception problems in robotics. However,
many of these relaxations rely on simplifying assumptions that facilitate the
problem formulation, such as an isotropic measurement noise distribution. In
this paper, we explore the tightness of the semidefinite relaxations of
matrix-weighted (anisotropic) state-estimation problems and reveal the
limitations lurking therein: matrix-weighted factors can cause convex
relaxations to lose tightness. In particular, we show that the semidefinite
relaxations of localization problems with matrix weights may be tight only for
low noise levels. We empirically explore the factors that contribute to this
loss of tightness and demonstrate that redundant constraints can be used to
regain tightness, albeit at the expense of real-time performance. As a second
technical contribution of this paper, we show that the state-of-the-art
relaxation of scalar-weighted SLAM cannot be used when matrix weights are
considered. We provide an alternate formulation and show that its SDP
relaxation is not tight (even for very low noise levels) unless specific
redundant constraints are used. We demonstrate the tightness of our
formulations on both simulated and real-world data.
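A common empirical proxy for the tightness discussed above is the rank of the SDP solution matrix, judged by the gap in its eigenvalue spectrum. A minimal sketch, assuming the solution matrix is already available; the matrices and thresholds below are illustrative, not from the paper.

```python
import numpy as np

def relaxation_is_tight(X, rank=1, ratio_tol=1e6):
    """Empirical tightness check for an SDP relaxation: the relaxation is
    (numerically) tight when the solution X has the expected low rank,
    measured here by the ratio between consecutive eigenvalues."""
    eigs = np.sort(np.linalg.eigvalsh(X))[::-1]   # descending
    if eigs[rank] <= 0:
        return True                                # exactly low-rank
    return bool(eigs[rank - 1] / eigs[rank] > ratio_tol)

# Rank-1 solution (tight): X = x x^T for some candidate state x, from
# which the estimate can be extracted by eigendecomposition.
x = np.array([1.0, 2.0, -1.0])
print(relaxation_is_tight(np.outer(x, x)))            # True
# Higher-rank spectrum with no large gap: relaxation not tight.
print(relaxation_is_tight(np.diag([3.0, 2.0, 1.0])))  # False
```

Loss of tightness under matrix weights, as described above, shows up in exactly this diagnostic: the eigenvalue gap collapses as the noise level grows.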
Toward Globally Optimal State Estimation Using Automatically Tightened Semidefinite Relaxations
In recent years, semidefinite relaxations of common optimization problems in
robotics have attracted growing attention due to their ability to provide
globally optimal solutions. In many cases, it was shown that specific
handcrafted redundant constraints are required to obtain tight relaxations and
thus global optimality. These constraints are formulation-dependent and
typically require a lengthy manual process to find. Instead, the present paper
suggests an automatic method to find a set of sufficient redundant constraints
to obtain tightness, if they exist. We first propose an efficient feasibility
check to determine if a given set of variables can lead to a tight formulation.
Secondly, we show how to scale the method to larger problems. At no point in
the process do we have to manually find redundant constraints. We
showcase the effectiveness of the approach, in simulation and on real datasets,
for range-based localization and stereo-based pose estimation. Finally, we
reproduce semidefinite relaxations presented in recent literature and show that
our automatic method finds a smaller set of constraints sufficient for
tightness than previously considered.
Comment: 18 pages, 20 figures
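The nullspace idea behind such an automatic search can be sketched in a few lines: stack the lifted monomial vectors of sampled feasible points, then read candidate constraints out of the nullspace of the resulting data matrix. This is a toy illustration; the unit-circle feasible set and all names are our assumptions, not the paper's implementation.

```python
import numpy as np

def lift(x):
    """Unique second-order monomials x_i * x_j (i <= j) of a sample."""
    n = len(x)
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

def find_redundant_constraints(samples, tol=1e-9):
    """Stack lifted monomial vectors of feasible samples; every vector in
    the nullspace of the data matrix is a quadratic form that vanishes on
    the feasible set, i.e. a candidate (possibly redundant) constraint."""
    D = np.stack([lift(x) for x in samples])
    _, s, Vt = np.linalg.svd(D)
    rank = int(np.sum(s > tol * s[0]))
    return Vt[rank:]          # basis of quadratic forms vanishing on samples

# Toy feasible set: the unit circle in R^2, homogenized with a constant 1.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=50)
samples = np.stack([np.cos(theta), np.sin(theta), np.ones_like(theta)], axis=1)
constraints = find_redundant_constraints(samples)
print(len(constraints))   # 1: recovers cos^2 + sin^2 - 1 = 0 (up to scale)
```

The feasibility check in the paper additionally decides whether the discovered constraints suffice for tightness; the sketch above only covers the discovery step.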
Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers
Large-scale transformer models have become the de-facto architectures for
various machine learning applications, e.g., CV and NLP. However, those large
models also introduce prohibitive training costs. To mitigate this issue, we
propose a novel random and layerwise token dropping method (random-LTD), which
skips the computation of a subset of the input tokens at all middle layers.
Particularly, random-LTD achieves considerable speedups and comparable accuracy
as the standard training baseline. Compared to other token dropping methods,
random-LTD does not require (1) any importance score-based metrics, (2) any
special token treatment (e.g., [CLS]), and (3) many layers in full sequence
length training except the first and the last layers. In addition, a new
LayerToken learning-rate schedule is proposed for pretraining, which resolves
the heavy tuning requirement of our proposed training mechanism. Finally, we
demonstrate that random-LTD can be applied to broader applications, including
GPT and BERT pretraining as well as ViT and GPT finetuning tasks. Our results
show that random-LTD can save about 33.3% theoretical compute cost and 25.6%
wall-clock training time while achieving similar zero-shot evaluations on
GPT-3 1.3B as compared to the baseline.
Comment: 22 pages
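The core mechanism of random-LTD, routing only a random subset of tokens through a middle layer while the rest bypass it unchanged, can be sketched as follows. This is a toy numpy stand-in, not the released implementation; per the abstract, the first and last layers would still see the full sequence.

```python
import numpy as np

def random_ltd_layer(hidden, layer_fn, keep_ratio, rng):
    """Random layerwise token dropping (sketch): process only a random
    subset of tokens through the layer; dropped tokens bypass it
    unchanged and are scattered back to their original positions."""
    seq_len = hidden.shape[0]
    n_keep = max(1, int(seq_len * keep_ratio))
    kept = np.sort(rng.choice(seq_len, size=n_keep, replace=False))
    out = hidden.copy()                  # dropped tokens skip the layer
    out[kept] = layer_fn(hidden[kept])   # only kept tokens are computed
    return out

# Toy "layer": doubling stands in for an attention + MLP block.
rng = np.random.default_rng(0)
h = np.ones((8, 4))                      # (seq_len, hidden_dim)
h_out = random_ltd_layer(h, lambda x: 2.0 * x, keep_ratio=0.5, rng=rng)
print(int((h_out == 2.0).all(axis=1).sum()))  # 4 of 8 tokens were processed
```

Because no importance scores are computed and no tokens get special treatment, the per-layer overhead is just the random index selection and gather/scatter, which is what makes the speedup nearly proportional to the drop ratio.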
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
Recent advances on deep learning models come at the price of formidable
training cost. The increasing model size is one of the root causes, but another
less-emphasized fact is that data scale is actually increasing at a similar
speed as model scale, and the training cost is proportional to both of them.
Compared to the rapidly evolving model architecture, how to efficiently use the
training data (especially for the expensive foundation model pretraining) is
both less explored and difficult to realize due to the lack of a convenient
framework that focuses on data efficiency capabilities. To this end, we present
DeepSpeed Data Efficiency, a framework that makes better use of data, increases
training efficiency, and improves model quality. Specifically, we propose and
combine two data efficiency techniques: efficient data sampling via a general
curriculum learning library, and efficient data routing via a novel random
layerwise token dropping technique. For GPT-3 1.3B language model pretraining,
our work achieves 12.5x less data/time/cost (\$3.7K if rented on Azure), while
still maintaining 95% of model quality compared to the baseline with full data
and cost (\$46.3K). For GPT-3 1.3B and BERT-large pretraining, our work can also
achieve the same model quality with up to 2x less data/time/cost, or achieve
better model quality under the same data/time/cost. DeepSpeed Data Efficiency is
easy to use and tune, enabling us to easily apply it and verify its benefit on
additional tasks including GPT-3 MoE model pretraining and small-scale
GPT-2/ViT finetuning.
Comment: Published in AAAI 2024 Main Technical Track. Equal contribution by the
first 3 authors. Code has been released as a part of
https://github.com/microsoft/DeepSpeed. Part of this paper is from our previous
arXiv report (arXiv:2211.11586)
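The curriculum-learning half of the framework can be illustrated with a generic difficulty-based pacing sketch. The pacing schedule, difficulty metric, and corpus below are hypothetical illustrations, not the library's API.

```python
import random

def curriculum_batches(samples, difficulty, n_steps, batch_size, seed=0):
    """Curriculum-learning sketch (pacing by difficulty): sort samples by
    a difficulty metric and, as training progresses, draw batches from a
    gradually growing prefix of the sorted data (easy -> hard)."""
    rng = random.Random(seed)
    ordered = sorted(samples, key=difficulty)
    for step in range(1, n_steps + 1):
        # Linear pacing: the usable pool grows until it covers everything.
        pool = max(batch_size, int(len(ordered) * step / n_steps))
        yield [rng.choice(ordered[:pool]) for _ in range(batch_size)]

# Toy corpus: "difficulty" is text length, a common proxy in pretraining.
corpus = ["a b", "a b c d e f", "a b c", "a b c d", "a"]
batches = list(curriculum_batches(corpus, difficulty=len,
                                  n_steps=3, batch_size=2))
print(len(batches))   # 3 batches; early ones draw only the shortest texts
```

The data-routing half (random layerwise token dropping) complements this by reducing per-sample compute rather than changing which samples are seen.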
ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large
language models on massive GPU clusters due to its ease of use, efficiency,
and good scalability. However, when training on low-bandwidth clusters, or at a
scale that forces the per-GPU batch size to be small, ZeRO's effective
throughput is limited by the high communication volume from gathering weights
in the forward and backward passes and from averaging gradients. This paper introduces
three communication volume reduction techniques, which we collectively refer to
as ZeRO++, targeting each of the communication collectives in ZeRO. First is
block-quantization based all-gather. Second is data remapping that trades off
communication for more memory. Third is a novel all-to-all based quantized
gradient averaging paradigm as a replacement for the reduce-scatter collective, which
preserves accuracy despite communicating low precision data. Collectively,
ZeRO++ reduces communication volume of ZeRO by 4x, enabling up to 2.16x better
throughput at 384-GPU scale.
Comment: 12 pages
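The block-quantization idea behind the quantized collectives can be sketched independently of any communication library: quantize each block with its own int8 scale, cutting the transmitted volume roughly 4x for fp32 data. Function names and the block size below are illustrative assumptions.

```python
import numpy as np

def block_quantize(x, block_size=4):
    """Blockwise int8 quantization (sketch): each block gets its own
    scale, which bounds the quantization error by half a quantization
    step of that block's dynamic range."""
    x = x.reshape(-1, block_size)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0                        # avoid division by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def block_dequantize(q, scale):
    """Reconstruct fp32 values from int8 codes and per-block scales."""
    return (q.astype(np.float32) * scale).ravel()

rng = np.random.default_rng(0)
w = rng.standard_normal(16).astype(np.float32)     # toy weight shard
q, s = block_quantize(w)
w_hat = block_dequantize(q, s)
print(float(np.abs(w - w_hat).max()) < 0.02)       # small round-trip error
```

In ZeRO++ this kind of quantization is applied to the all-gather and gradient-averaging collectives; only the int8 codes and small per-block scales cross the wire.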
STAR-loc: Dataset for STereo And Range-based localization
This document contains a detailed description of the STAR-loc dataset. For a
quick starting guide please refer to the associated Github repository
(https://github.com/utiasASRL/starloc). The dataset consists of stereo camera
data (rectified/raw images and inertial measurement unit measurements) and
ultra-wideband (UWB) data (range measurements) collected on a sensor rig in a
Vicon motion capture arena. The UWB anchors and visual landmarks (AprilTags)
have known positions, so the dataset can be used for both localization and
Simultaneous Localization and Mapping (SLAM).
Comment: 15 pages, 15 figures
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model
Text-to-image generation (TTI) refers to the use of models that process text
input and generate high-fidelity images based on text descriptions.
Text-to-image generation using neural networks can be traced back to the
emergence of Generative Adversarial Networks (GANs), followed by the
autoregressive Transformer. Diffusion models are one prominent type of
generative model used to generate images through the systematic introduction of
noise over repeated steps. Owing to their impressive results on image
synthesis, diffusion models have been cemented as the major image decoder used
by text-to-image models and have brought text-to-image generation to the
forefront of machine-learning (ML) research. In the era of large models,
scaling up model size and integration with large language models have further
improved the performance of TTI models, making generated results nearly
indistinguishable from real-world images and revolutionizing the way we
retrieve images. Our explorative study has
incentivised us to think that there are further ways of scaling text-to-image
models with the combination of innovative model architectures and prediction
enhancement techniques. We have divided the work of this survey into five main
sections wherein we detail the frameworks of major literature in order to delve
into the different types of text-to-image generation methods. Following this,
we provide a detailed comparison and critique of these methods and offer
possible pathways of improvement for future work. Looking ahead, we argue that
TTI development could yield impressive productivity improvements for content
creation, particularly in the context of the AIGC era, and could be extended to
more complex tasks such as video generation and 3D generation.