milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing
Approaching the era of ubiquitous computing, human motion sensing plays a
crucial role in smart systems for decision making, user interaction, and
personalized services. Extensive research has been conducted on human tracking,
pose estimation, gesture recognition, and activity recognition, which are
predominantly based on cameras in traditional methods. However, the intrusive
nature of cameras limits their use in smart home applications. To address this,
mmWave radars have gained popularity due to their privacy-friendly features. In
this work, we propose \textit{milliFlow}, a novel deep learning method for
scene flow estimation that provides complementary motion information for mmWave
point clouds, serving as an intermediate-level feature and directly benefiting
downstream human motion sensing tasks. Experimental results demonstrate the
superior performance of our method with an average 3D endpoint error of 4.6cm,
significantly surpassing the competing approaches. Furthermore, by
incorporating scene flow information, we achieve remarkable improvements in
human activity recognition, human parsing, and human body part tracking. To
foster further research in this area, we provide our codebase and dataset for
open access.
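For reference, the 3D end-point error quoted above is the average Euclidean distance between predicted and ground-truth per-point flow vectors. A minimal sketch of the metric follows; the array shapes and names are illustrative and not taken from the milliFlow codebase.

```python
import numpy as np

def endpoint_error_3d(pred_flow: np.ndarray, gt_flow: np.ndarray) -> float:
    """Mean 3D end-point error (EPE) in the same units as the flow vectors.

    pred_flow, gt_flow: arrays of shape (N, 3), one flow vector per radar point.
    """
    assert pred_flow.shape == gt_flow.shape and pred_flow.shape[-1] == 3
    per_point_error = np.linalg.norm(pred_flow - gt_flow, axis=-1)  # (N,)
    return float(per_point_error.mean())

# A 4.6 cm average EPE means per-point flow vectors are off by about 0.046 m on average.
```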
GammaE: Gamma Embeddings for Logical Queries on Knowledge Graphs
Embedding knowledge graphs (KGs) for multi-hop logical reasoning is a
challenging problem due to massive and complicated structures in many KGs.
Recently, many promising works projected entities and queries into a geometric
space to efficiently find answers. However, it remains challenging to model the
negation and union operators. The negation operator has no strict boundaries,
which generates overlapping embeddings and leads to ambiguous answers. An
additional limitation is that the union operator is not closed, which prevents
the model from handling a series of union operators. To address these
problems, we propose a novel probabilistic embedding model, namely Gamma
Embeddings (GammaE), for encoding entities and queries to answer different
types of FOL queries on KGs. We utilize the linear property and strong boundary
support of the Gamma distribution to capture more features of entities and
queries, which dramatically reduces model uncertainty. Furthermore, GammaE
implements the Gamma mixture method to design the closed union operator. The
performance of GammaE is validated on three large logical query datasets.
Experimental results show that GammaE significantly outperforms
state-of-the-art models on public benchmarks.
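One way to see why distribution-valued embeddings make the union operator closed is to keep a union as an explicit mixture of Gamma components. The sketch below illustrates this idea only; the dataclasses, the uniform mixture weights, and the log-density score are assumptions for illustration, not GammaE's actual definitions.

```python
import numpy as np
from dataclasses import dataclass
from typing import List
from scipy.stats import gamma

@dataclass
class GammaEmbedding:
    """Per-dimension Gamma parameters (shape alpha > 0, rate beta > 0)."""
    alpha: np.ndarray  # (d,)
    beta: np.ndarray   # (d,)

@dataclass
class GammaMixture:
    """A union kept as an explicit mixture of Gamma embeddings, so unions stay closed."""
    components: List[GammaEmbedding]
    weights: np.ndarray  # (k,), sums to 1

def union(queries: List[GammaEmbedding]) -> GammaMixture:
    """Union operator: a uniform mixture over the input Gamma embeddings."""
    k = len(queries)
    return GammaMixture(components=list(queries), weights=np.full(k, 1.0 / k))

def log_density(embedding: GammaEmbedding, x: np.ndarray) -> float:
    """Sum of per-dimension Gamma log-densities at a positive point x (illustrative score)."""
    return float(np.sum(gamma.logpdf(x, a=embedding.alpha, scale=1.0 / embedding.beta)))
```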
3-D Motion Capture of an Unmodified Drone with Single-chip Millimeter Wave Radar
Accurate motion capture of aerial robots in 3-D is a key enabler for
autonomous operation in indoor environments such as warehouses or factories, as
well as driving forward research in these areas. The most commonly used
solutions at present are optical motion capture (e.g. VICON) and Ultrawideband
(UWB), but these are costly and cumbersome to deploy, due to their requirement
of multiple cameras/sensors spaced around the tracking area. They also require
the drone to be modified to carry an active or passive marker. In this work, we
present an inexpensive system that can be rapidly installed, based on
single-chip millimeter wave (mmWave) radar. Importantly, the drone does not
need to be modified or equipped with any markers, as we exploit the Doppler
signals from the rotating propellers. Furthermore, 3-D tracking is possible
from a single point, greatly simplifying deployment. We develop a novel deep
neural network and demonstrate decimeter level 3-D tracking at 10Hz, achieving
better performance than classical baselines. Our hope is that this low-cost
system will act to catalyse inexpensive drone research and increased autonomy.
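A common first processing step for single-chip mmWave radar, and the natural place where a propeller's micro-Doppler signature appears, is the range-Doppler map computed with two FFTs over a frame of chirps. The following sketch shows only that standard step; it is not the paper's tracking network or its exact pipeline.

```python
import numpy as np

def range_doppler_map(frame: np.ndarray) -> np.ndarray:
    """Range-Doppler map from one radar frame of shape (num_chirps, num_samples).

    A range FFT over the fast-time samples followed by a Doppler FFT over the
    chirps; rotating propellers appear as a spread of Doppler bins at the
    drone's range bin.
    """
    range_fft = np.fft.fft(frame, axis=1)                                  # fast time -> range bins
    doppler_fft = np.fft.fftshift(np.fft.fft(range_fft, axis=0), axes=0)   # slow time -> Doppler bins
    return 20.0 * np.log10(np.abs(doppler_fft) + 1e-12)                    # magnitude in dB

# Example with synthetic data: 128 chirps x 256 ADC samples per chirp.
frame = np.random.randn(128, 256) + 1j * np.random.randn(128, 256)
rd_map = range_doppler_map(frame)   # shape (128, 256): Doppler x range
```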
Autonomous Learning of Speaker Identity and WiFi Geofence From Noisy Sensor Data
A fundamental building block towards intelligent environments is the ability to understand who is present in a certain area. A ubiquitous way of detecting this is to exploit unique vocal characteristics as people interact with one another in common spaces. However, manually enrolling users into a biometric database is time-consuming and not robust to vocal deviations over time. Instead, consider audio features sampled during a meeting, yielding a noisy set of possible voiceprints. With a number of meetings and knowledge of participation, e.g., sniffed wireless Media Access Control (MAC) addresses, can we learn to associate a specific identity with a particular voiceprint? To address this problem, this paper advocates an Internet of Things (IoT) solution and proposes to use co-located WiFi as supervisory weak labels to automatically bootstrap the labelling process. In particular, a novel cross-modality labelling algorithm is proposed that jointly optimises the clustering and association process, which solves the inherent mismatching issues arising from heterogeneous sensor data. At the same time, we further propose to reuse the labelled data to iteratively update wireless geofence models and curate device-specific thresholds. Extensive experimental results from two different scenarios demonstrate that our proposed method achieves a 2-fold improvement in labelling compared with conventional methods and can achieve reliable speaker recognition in the wild.
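As a rough illustration of the cross-modality idea, the sketch below clusters voiceprints and then matches clusters to sniffed MAC addresses by maximising co-occurrence across meetings with the Hungarian algorithm. The clustering choice (k-means) and the co-occurrence score are stand-ins for the paper's joint optimisation, not its actual algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment

def associate_voices_with_macs(voiceprints, meeting_ids, mac_presence, num_people):
    """Cluster voiceprints, then match clusters to MAC addresses by co-occurrence.

    voiceprints: (N, d) audio embeddings, one per detected utterance.
    meeting_ids: (N,) integer meeting index for each utterance.
    mac_presence: (num_meetings, num_people) binary matrix, 1 if the MAC was seen in a meeting.
    Returns a dict mapping voice-cluster index -> MAC index.
    """
    clusters = KMeans(n_clusters=num_people, n_init=10).fit_predict(voiceprints)

    # Co-occurrence: how often each voice cluster is active while each MAC is present.
    num_meetings = mac_presence.shape[0]
    cluster_presence = np.zeros((num_meetings, num_people))
    for m in range(num_meetings):
        for c in range(num_people):
            cluster_presence[m, c] = np.any((meeting_ids == m) & (clusters == c))

    cooccurrence = cluster_presence.T @ mac_presence   # (num_people, num_people)
    row, col = linear_sum_assignment(-cooccurrence)    # negate to maximise total co-occurrence
    return dict(zip(row.tolist(), col.tolist()))
```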
AtLoc: Attention Guided Camera Localization
Deep learning has achieved impressive results in camera localization, but
current single-image techniques typically suffer from a lack of robustness,
leading to large outliers. To some extent, this has been tackled by sequential
(multi-images) or geometry constraint approaches, which can learn to reject
dynamic objects and illumination conditions to achieve better performance. In
this work, we show that attention can be used to force the network to focus on
more geometrically robust objects and features, achieving state-of-the-art
performance on common benchmarks, even when using only a single image as input.
Extensive experimental evidence is provided through public indoor and outdoor
datasets. Through visualization of the saliency maps, we demonstrate how the
network learns to reject dynamic objects, yielding superior global camera pose
regression performance. The source code is available at
https://github.com/BingCS/AtLoc
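The core idea of attention-guided pose regression can be sketched as re-weighting the backbone's global feature before the translation and rotation heads. The module below is a simplified illustration; the layer sizes, the feature-wise attention, and the log-quaternion output are assumptions rather than AtLoc's exact architecture.

```python
import torch
import torch.nn as nn

class AttentionPoseHead(nn.Module):
    """Self-attention re-weighting of a global image feature, then 6-DoF pose regression."""

    def __init__(self, feat_dim: int = 2048):
        super().__init__()
        self.query = nn.Linear(feat_dim, feat_dim)
        self.key = nn.Linear(feat_dim, feat_dim)
        self.value = nn.Linear(feat_dim, feat_dim)
        self.fc_xyz = nn.Linear(feat_dim, 3)   # translation
        self.fc_wpqr = nn.Linear(feat_dim, 3)  # log-quaternion rotation

    def forward(self, feat: torch.Tensor):
        # feat: (B, feat_dim) global image feature from a CNN backbone.
        q, k, v = self.query(feat), self.key(feat), self.value(feat)
        attn = torch.softmax(q * k / feat.shape[-1] ** 0.5, dim=-1)  # feature-wise weights
        attended = feat + attn * v                                   # residual re-weighting
        return self.fc_xyz(attended), self.fc_wpqr(attended)
```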
Prompt Space Optimizing Few-shot Reasoning Success with Large Language Models
Prompt engineering is an essential technique for enhancing the abilities of
large language models (LLMs) by providing explicit and specific instructions.
It enables LLMs to excel in various tasks, such as arithmetic reasoning,
question answering, summarization, relation extraction, machine translation,
and sentiment analysis. Researchers have been actively exploring different
prompt engineering strategies, such as Chain of Thought (CoT), Zero-CoT, and
In-context learning. However, an unresolved problem arises from the fact that
current approaches lack a solid theoretical foundation for determining optimal
prompts. To address this issue in prompt engineering, we propose a new and
effective approach called Prompt Space. Our methodology utilizes text
embeddings to obtain basis vectors by matrix decomposition, and then constructs
a space for representing all prompts. Prompt Space significantly outperforms
state-of-the-art prompt paradigms on ten public reasoning benchmarks. Notably,
without the help of the CoT method and the prompt "Let's think step by step",
Prompt Space shows superior performance over the few-shot method. Overall, our
approach provides a robust and fundamental theoretical framework for selecting
simple and effective prompts. This advancement marks a significant step towards
improving prompt engineering for a wide variety of applications in LLMs.
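The "basis vectors by matrix decomposition" step can be pictured as an SVD over the embeddings of candidate prompts, followed by picking the prompts most aligned with the leading singular directions. The sketch below is one plausible reading of that step, with the selection rule assumed for illustration; it is not the paper's published algorithm.

```python
import numpy as np

def select_basis_prompts(prompt_embeddings: np.ndarray, num_basis: int):
    """Pick prompts whose embeddings best align with the top singular directions.

    prompt_embeddings: (num_prompts, d) text embeddings of candidate prompts
    (from any sentence embedder). Returns indices of the selected prompts.
    """
    # Matrix decomposition: top singular vectors span a low-dimensional prompt space.
    _, _, vt = np.linalg.svd(prompt_embeddings, full_matrices=False)
    basis = vt[:num_basis]                                    # (num_basis, d)

    # For each basis direction, keep the candidate prompt most aligned with it.
    norms = np.linalg.norm(prompt_embeddings, axis=1, keepdims=True) + 1e-12
    cosine = (prompt_embeddings / norms) @ basis.T            # (num_prompts, num_basis)
    return np.unique(np.argmax(np.abs(cosine), axis=0)).tolist()
```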