Quantitative Robustness Analysis of Quantum Programs (Extended Version)
Quantum computation is a topic of significant recent interest, with practical
advances coming from both research and industry. A major challenge in quantum
programming is dealing with errors (quantum noise) during execution. Because
quantum resources (e.g., qubits) are scarce, classical error correction
techniques applied at the level of the architecture are currently
cost-prohibitive. But while this reality means that quantum programs are almost
certain to have errors, there as yet exists no principled means to reason about
erroneous behavior. This paper attempts to fill this gap by developing a
semantics for erroneous quantum while-programs, as well as a logic for
reasoning about them. This logic permits proving a property we have identified,
called ε-robustness, which characterizes possible "distance" between
an ideal program and an erroneous one. We have proved the logic sound, and
showed its utility on several case studies, notably: (1) analyzing the
robustness of noisy versions of the quantum Bernoulli factory (QBF) and quantum
walk (QW); (2) demonstrating the (in)effectiveness of different error
correction schemes on single-qubit errors; and (3) analyzing the robustness of
a fault-tolerant version of QBF.
Comment: 34 pages, LaTeX; v2: fixed typo
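The property at the heart of the paper, ε-robustness, bounds the distance between the output of an ideal program and that of its noisy counterpart. Below is a minimal numerical sketch of that idea in Python/NumPy, assuming (purely for illustration) a depolarizing error model on a single Hadamard gate; it is not the paper's semantics or proof logic:

```python
# Minimal numerical sketch of the "distance between an ideal and a noisy
# program" idea behind epsilon-robustness. Illustration only; the
# depolarizing error model is an assumption made for this example.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # ideal Hadamard gate

def apply_ideal(rho):
    return H @ rho @ H.conj().T

def apply_noisy(rho, p=0.05):
    # Depolarizing noise with probability p after the ideal gate.
    out = apply_ideal(rho)
    return (1 - p) * out + p * np.eye(2) / 2

def trace_distance(rho, sigma):
    # D(rho, sigma) = (1/2) * ||rho - sigma||_1 for Hermitian rho - sigma.
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

rho0 = np.array([[1, 0], [0, 0]], dtype=complex)  # input state |0><0|
eps = trace_distance(apply_ideal(rho0), apply_noisy(rho0))
print(f"output distance for this input (one epsilon witness): {eps:.4f}")
```

For p = 0.05 this prints 0.0250; a robustness judgment in the paper's logic bounds such distances over all inputs rather than exhibiting a single witness.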
Zero-Shot Wireless Indoor Navigation through Physics-Informed Reinforcement Learning
The growing focus on indoor robot navigation utilizing wireless signals has
stemmed from the capability of these signals to capture high-resolution angular
and temporal measurements. Prior heuristic-based methods, based on radio
frequency propagation, are intuitive and generalizable across simple scenarios,
yet fail to navigate in complex environments. On the other hand, end-to-end
(e2e) deep reinforcement learning (RL), powered by advanced computing
machinery, can explore the entire state space, delivering surprising
performance when facing complex wireless environments. However, the price is
an astronomical number of training samples, and the resulting policy, without
fine-tuning (i.e., zero-shot), cannot navigate efficiently in scenarios unseen
during training. To equip the navigation agent with sample-efficient learning
and zero-shot generalization, this work proposes a
novel physics-informed RL (PIRL) where a distance-to-target-based cost
(standard in e2e) is augmented with physics-informed reward shaping. The key
intuition is that wireless environments vary, but physics laws persist. After
learning to utilize the physics information, the agent can transfer this
knowledge across different tasks and navigate in an unknown environment without
fine-tuning. The proposed PIRL is evaluated using a wireless digital twin (WDT)
built upon simulations of a large class of indoor environments from the AI
Habitat dataset augmented with electromagnetic (EM) radiation simulation for
wireless signals. It is shown that PIRL significantly outperforms both e2e RL
and heuristic-based solutions in terms of generalization and performance.
Source code is available at https://github.com/Panshark/PIRL-WIN.
Comment: 16 pages, 13 figures, 4 tables
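The central mechanism is easy to state concretely: keep the standard distance-to-target cost and add a physics-based shaping bonus. The sketch below is a hedged illustration in Python; the alignment-with-angle-of-arrival term, names, and weights are assumptions made for exposition, not the authors' exact reward (which is defined in the paper and repository):

```python
import numpy as np

# Illustrative physics-informed reward shaping for wireless navigation.
# Structure only: distance-to-target cost + physics-based shaping bonus.

def shaped_reward(pos, target, heading, aoa, w_dist=1.0, w_phys=0.5):
    """pos/target: 2D positions; heading: motion direction (rad);
    aoa: dominant angle of arrival of the received signal (rad).
    All names and weights here are illustrative assumptions."""
    # Standard e2e term: negative Euclidean distance to the target.
    r_dist = -np.linalg.norm(np.asarray(target) - np.asarray(pos))
    # Physics-informed term: under (near) line-of-sight propagation the
    # strongest AoA points toward the transmitter, so reward alignment.
    r_phys = np.cos(heading - aoa)
    return w_dist * r_dist + w_phys * r_phys

# Agent at the origin, heading 30 degrees, dominant AoA at 20 degrees.
print(shaped_reward((0.0, 0.0), (5.0, 3.0), np.deg2rad(30), np.deg2rad(20)))
```

Because the shaping term depends on propagation physics rather than on any particular floor plan, it transfers across environments, which is the intuition behind the zero-shot claim.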
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Recent advances in Large Multimodal Models (LMMs) have enabled various
applications in human-machine interaction. However, developing LMMs that can
comprehend, reason, and plan in complex and diverse 3D environments remains
challenging, especially given the demand for understanding permutation-invariant
point cloud representations of 3D scenes. Existing works rely on multi-view
images, projecting 2D features
to 3D space as 3D scene representations. This, however, leads to huge
computational overhead and performance degradation. In this paper, we present
LL3DA, a Large Language 3D Assistant that takes point clouds as direct input
and responds to both textual instructions and visual prompts. This helps LMMs
better comprehend human interactions and further helps remove ambiguities in
cluttered 3D scenes. Experiments show that LL3DA achieves remarkable results,
and surpasses various 3D vision-language models on both 3D Dense Captioning and
3D Question Answering.
Comment: Project Page: https://ll3da.github.io
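Structurally, the described design conditions a language model on two extra streams: encoded point-cloud features and a projected visual prompt. The PyTorch sketch below shows that interface only; every module is a stand-in assumption, not the released LL3DA architecture (see the project page for the real model):

```python
import torch
import torch.nn as nn

# Structural sketch of a point-cloud-conditioned assistant: scene tokens
# and a visual prompt form the decoder memory; text attends to both.
# All modules are placeholder assumptions, not the released model.

class PointCloudAssistant(nn.Module):
    def __init__(self, d=256, vocab=32000):
        super().__init__()
        self.pc_encoder = nn.Sequential(      # stand-in 3D scene encoder
            nn.Linear(6, d), nn.ReLU(), nn.Linear(d, d))
        self.prompt_proj = nn.Linear(4, d)    # visual prompt, e.g. a click/box
        self.text_embed = nn.Embedding(vocab, d)
        layer = nn.TransformerDecoderLayer(d, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d, vocab)

    def forward(self, points, visual_prompt, instruction_ids):
        scene = self.pc_encoder(points)                    # (B, N, d)
        prompt = self.prompt_proj(visual_prompt)[:, None]  # (B, 1, d)
        memory = torch.cat([scene, prompt], dim=1)         # joint condition
        tgt = self.text_embed(instruction_ids)             # (B, T, d)
        return self.lm_head(self.decoder(tgt, memory))     # token logits

model = PointCloudAssistant()
logits = model(torch.randn(2, 1024, 6),              # xyz + rgb points
               torch.randn(2, 4),                    # one visual prompt
               torch.randint(0, 32000, (2, 16)))     # instruction tokens
print(logits.shape)  # torch.Size([2, 16, 32000])
```

Feeding the point cloud directly, rather than projected multi-view 2D features, is what avoids the projection overhead criticized above.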
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
3D dense captioning requires a model to translate its understanding of an
input 3D scene into several captions associated with different object regions.
Existing methods adopt a sophisticated "detect-then-describe" pipeline, which
builds explicit relation modules upon a 3D detector with numerous hand-crafted
components. While these methods have achieved initial success, the cascade
pipeline tends to accumulate errors because of duplicated and inaccurate box
estimations and messy 3D scenes. In this paper, we first propose Vote2Cap-DETR,
a simple-yet-effective transformer framework that decouples the decoding
process of caption generation and object localization through parallel
decoding. Moreover, we argue that object localization and description
generation require different levels of scene understanding, which could be
challenging for a shared set of queries to capture. To this end, we propose an
advanced version, Vote2Cap-DETR++, which decouples the queries into
localization and caption queries to capture task-specific features.
Additionally, we introduce an iterative spatial refinement strategy for vote
queries to achieve faster convergence and better localization performance. We
also feed additional spatial information into the caption head for more
accurate descriptions. Without bells and whistles, extensive experiments on two
commonly used datasets, ScanRefer and Nr3D, demonstrate that Vote2Cap-DETR and
Vote2Cap-DETR++ surpass conventional "detect-then-describe" methods by a large
margin. Code will be made available at
https://github.com/ch3cook-fdu/Vote2Cap-DETR.
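The decoupling described above amounts to decoding two task-specific query sets in parallel over shared scene features, one set feeding a box head and the other a caption head. A minimal PyTorch sketch under assumed shapes and heads (not the authors' exact architecture, which also includes vote queries and iterative refinement):

```python
import torch
import torch.nn as nn

# Sketch of decoupled parallel decoding: localization queries -> box head,
# caption queries -> caption head, one decoder pass over scene features.
# Shapes and heads are illustrative assumptions.

class DecoupledQueryDecoder(nn.Module):
    def __init__(self, d=256, n_queries=256, vocab=4000):
        super().__init__()
        self.loc_queries = nn.Parameter(torch.randn(n_queries, d))
        self.cap_queries = nn.Parameter(torch.randn(n_queries, d))
        layer = nn.TransformerDecoderLayer(d, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.box_head = nn.Linear(d, 6)      # box center + size per proposal
        self.cap_head = nn.Linear(d, vocab)  # stand-in single-step captioner

    def forward(self, scene_tokens):         # (B, N, d) encoded scene
        B = scene_tokens.size(0)
        q = torch.cat([self.loc_queries, self.cap_queries], dim=0)
        q = q.unsqueeze(0).repeat(B, 1, 1)
        h = self.decoder(q, scene_tokens)    # parallel, no cascade
        h_loc, h_cap = h.chunk(2, dim=1)     # task-specific halves
        return self.box_head(h_loc), self.cap_head(h_cap)

dec = DecoupledQueryDecoder()
boxes, caps = dec(torch.randn(2, 1024, 256))
print(boxes.shape, caps.shape)  # (2, 256, 6) and (2, 256, 4000)
```

Because both query sets attend to the same memory in a single pass, localization errors are not propagated into the captioning stage the way they are in a detect-then-describe cascade.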