Securing Cyber-Physical Social Interactions on Wrist-worn Devices
Since ancient Greece, handshaking has been commonly practiced between two people as a friendly gesture to express trust and respect, or to form a mutual agreement. In this article, we show that such physical contact can be used to bootstrap secure cyber contact between the smart devices worn by users. The key observation is that during handshaking, although they belong to two different users, the two hands involved in the handshake are often rigidly connected and therefore exhibit very similar motion patterns. We propose a novel key generation system, which harvests motion data during user handshaking from wrist-worn smart devices such as smartwatches or fitness bands, and exploits the matching motion patterns to generate symmetric keys on both parties. The generated keys can then be used to establish a secure communication channel for exchanging data between devices. This provides a much more natural and user-friendly alternative for many applications, e.g., exchanging or sharing contact details, friending on social networks, or even making payments, since it requires neither extra bespoke hardware nor pre-defined gestures from the users. We implement the proposed key generation system on off-the-shelf smartwatches, and extensive evaluation shows that it can reliably generate 128-bit symmetric keys after only about 1 s of handshaking (with a success rate >99%), and is resilient to different types of attacks, including impersonation mimicking attacks, impersonation passive attacks, and eavesdropping attacks. Specifically, for real-time impersonation mimicking attacks, the Equal Error Rate (EER) in our experiments is only 1.6% on average. We also show that the proposed key generation system can be extremely lightweight and is able to run in situ on resource-constrained smartwatches without incurring excessive resource consumption.
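The idea of turning matching motion patterns into matching key bits can be illustrated with a minimal sketch. This is not the authors' pipeline: the median-threshold quantizer, the synthetic sine-based motion signal, the noise level, and all function names are assumptions made for illustration. A practical system would typically also need a reconciliation step to correct the bits that still disagree.

```python
import math
import random


def quantize_to_bits(samples):
    """Map each sample to 1 if it lies above the window median, else 0."""
    ordered = sorted(samples)
    n = len(ordered)
    median = (ordered[n // 2] + ordered[(n - 1) // 2]) / 2
    return [1 if s > median else 0 for s in samples]


def bit_agreement(bits_a, bits_b):
    """Fraction of positions at which the two bit strings agree."""
    if len(bits_a) != len(bits_b):
        raise ValueError("bit strings must have equal length")
    matches = sum(a == b for a, b in zip(bits_a, bits_b))
    return matches / len(bits_a)


def simulate_handshake(n_samples=128, noise=0.05, seed=0):
    """Hypothetical data: one shared motion signal plus independent sensor noise."""
    rng = random.Random(seed)
    shared = [
        math.sin(2 * math.pi * 3 * t / n_samples)
        + 0.5 * math.sin(2 * math.pi * 7 * t / n_samples)
        for t in range(n_samples)
    ]
    watch_a = [s + rng.gauss(0, noise) for s in shared]
    watch_b = [s + rng.gauss(0, noise) for s in shared]
    return watch_a, watch_b


if __name__ == "__main__":
    a, b = simulate_handshake()
    agreement = bit_agreement(quantize_to_bits(a), quantize_to_bits(b))
    print(f"bit agreement between the two devices: {agreement:.3f}")
```

Because both devices observe the same underlying motion, most quantized bits agree; mismatches cluster where the signal crosses the threshold and sensor noise decides the bit.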
SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation
Recently, self-supervised monocular depth estimation has gained popularity
with numerous applications in autonomous driving and robotics. However,
existing solutions primarily seek to estimate depth from immediate visual
features, and struggle to recover fine-grained scene details with limited
generalization. In this paper, we introduce SQLdepth, a novel approach that can
effectively learn fine-grained scene structures from motion. In SQLdepth, we
propose a novel Self Query Layer (SQL) to build a self-cost volume and infer
depth from it, rather than inferring depth from feature maps. The self-cost
volume implicitly captures the intrinsic geometry of the scene within a single
frame. Each individual slice of the volume signifies the relative distances
between points and objects within a latent space. Ultimately, this volume is
compressed to the depth map via a novel decoding approach. Experimental results
on KITTI and Cityscapes show that our method attains state-of-the-art
performance in AbsRel on KITTI, on KITTI with improved ground truth, and on
Cityscapes, with error reductions from the previous best. In addition, our approach
showcases reduced training complexity, computational efficiency, improved
generalization, and the ability to recover fine-grained scene details.
Moreover, the self-supervised pre-trained and metric fine-tuned SQLdepth can
surpass existing supervised methods by significant margins in AbsRel. The
self-matching-oriented relative distance querying in SQL improves the
robustness and zero-shot generalization capability of SQLdepth. Code is
available at https://github.com/hisfog/SQLdepth-Impl, and the pre-trained
weights will be made publicly available.
Comment: 14 pages, 9 figures
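As a rough illustration of what a self-cost volume could look like, the sketch below builds one slice per query by comparing every pixel feature of a single frame against that query. The abstract does not state how the volume is computed, so the dot-product similarity, the nested-list tensor layout, and the function name are all assumptions, not the SQLdepth implementation.

```python
def self_cost_volume(features, queries):
    """Build a Q x H x W volume from one frame.

    features: H x W x C nested lists of per-pixel feature vectors.
    queries:  Q x C nested lists of query vectors (hypothetical inputs).
    Slice q holds the dot product of every pixel feature with query q.
    """
    return [
        [
            [sum(f * q for f, q in zip(pixel, query)) for pixel in row]
            for row in features
        ]
        for query in queries
    ]


if __name__ == "__main__":
    # Toy 2 x 2 frame with 2-channel features and two queries.
    features = [[[1, 0], [0, 1]], [[1, 1], [2, 0]]]
    queries = [[1, 0], [0, 2]]
    for slice_ in self_cost_volume(features, queries):
        print(slice_)
```

Each slice is a full-resolution map, so a decoder that compresses the Q slices into one depth map works from similarity to learned queries rather than from the raw feature map.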
Bridging the Domain Gap for Multi-Agent Perception
Existing multi-agent perception algorithms usually share deep
neural features extracted from raw sensing data between agents, achieving a
trade-off between accuracy and communication bandwidth limit. However, these
methods assume all agents have identical neural networks, which might not be
practical in the real world. The transmitted features can have a large domain
gap when the models differ, leading to a dramatic performance drop in
multi-agent perception. In this paper, we propose the first lightweight
framework to bridge such domain gaps for multi-agent perception, which can be a
plug-in module for most existing systems while maintaining confidentiality. Our
framework consists of a learnable feature resizer to align features in multiple
dimensions and a sparse cross-domain transformer for domain adaptation. Extensive
experiments on the public multi-agent perception dataset V2XSet have
demonstrated that our method can effectively bridge the gap for features from
different domains and outperform other baseline methods by at
least 8% for point-cloud-based 3D object detection.
Comment: Accepted by ICRA 2023. Code: https://github.com/DerrickXuNu/MPD
Fusion of 3D LIDAR and Camera Data for Object Detection in Autonomous Vehicle Applications
It is critical for an autonomous vehicle to acquire accurate, real-time information about the objects in its vicinity, which is essential for guaranteeing the safety of the passengers and the vehicle in various environments. 3D LIDAR can directly obtain the position and geometrical structure of an object within its detection range, while a vision camera is well suited to object recognition. Accordingly, this paper presents a novel object detection and identification method that fuses the complementary information of the two kinds of sensors. We first use the 3D LIDAR data to generate accurate object-region proposals efficiently. Then, these candidates are mapped into the image space, where the regions of interest (ROI) of the proposals are selected and input to a convolutional neural network (CNN) for further object recognition. In order to identify objects of all sizes precisely, we combine the features of the last three layers of the CNN to extract multi-scale features of the ROIs. The evaluation results on the KITTI dataset demonstrate that: (1) unlike sliding windows, which produce thousands of candidate object-region proposals, 3D LIDAR provides an average of 86 real candidates per frame with a minimal recall rate higher than 95%, which greatly lowers the proposal extraction time; (2) the average processing time per frame of the proposed method is only 66.79 ms, which meets the real-time demand of autonomous vehicles; (3) the average identification accuracies of our method for cars and pedestrians at the moderate level are 89.04% and 78.18% respectively, which outperform most previous methods.
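The step of mapping a LIDAR proposal into image space can be sketched as a pinhole projection followed by a bounding box. This is a minimal sketch under stated assumptions, not the paper's procedure: the points are assumed to be already expressed in the camera frame (the LIDAR-to-camera transform is omitted), and the intrinsics `fx`, `fy`, `cx`, `cy`, the image size, and the function names are hypothetical.

```python
def project_point(point, fx, fy, cx, cy):
    """Project a camera-frame 3D point (x, y, z) to pixel coordinates (u, v)."""
    x, y, z = point
    if z <= 0:
        return None  # point is behind the camera
    return (fx * x / z + cx, fy * y / z + cy)


def roi_from_points(points, fx, fy, cx, cy, width, height):
    """Axis-aligned ROI (left, top, right, bottom) covering the projected points."""
    pixels = [
        p
        for p in (project_point(pt, fx, fy, cx, cy) for pt in points)
        if p is not None
    ]
    if not pixels:
        return None
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    # Clip the box to the image bounds.
    left, right = max(0, min(us)), min(width - 1, max(us))
    top, bottom = max(0, min(vs)), min(height - 1, max(vs))
    if left > right or top > bottom:
        return None  # proposal lies entirely outside the image
    return (left, top, right, bottom)


if __name__ == "__main__":
    cluster = [(-1.0, -1.0, 10.0), (1.0, 1.0, 10.0)]
    print(roi_from_points(cluster, 100, 100, 50, 40, 200, 100))
```

The resulting ROI is what would be cropped from the image and passed to the CNN for recognition.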
Domain Adaptation For Vehicle Detection In Traffic Surveillance Images From Daytime To Nighttime
Vehicle detection in traffic surveillance images is an important approach to obtaining vehicle data and rich traffic flow parameters. Recently, deep-learning-based methods have been widely used in vehicle detection with high accuracy and efficiency. However, these methods require a large number of manually labeled ground truths (the bounding box of each vehicle in each image) to train the Convolutional Neural Networks (CNN). For modern urban surveillance cameras, many manually labeled ground truths already exist for daytime images, while few or far fewer exist for nighttime images. In this paper, we focus on making maximum use of labeled daytime images (Source Domain) to help vehicle detection in unlabeled nighttime images (Target Domain). For this purpose, we propose a new method based on Faster R-CNN with Domain Adaptation (DA) to improve vehicle detection at nighttime. With the assistance of DA, the distribution discrepancy between the Source and Target Domains is reduced. We collected a new dataset of 2,200 traffic images (1,200 daytime and 1,000 nighttime) containing 57,059 vehicles for training and testing the CNN. In the experiment, using only the manually labeled ground truths of the daytime data, Faster R-CNN obtained an F-measure of 82.84% on nighttime vehicle detection, while the proposed method (Faster R-CNN+DA) achieved an F-measure of 86.39%.
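For readers unfamiliar with the metric, the F-measure reported above is the harmonic mean of precision and recall. The sketch below computes it from detection counts; the counts in the usage example are made up for illustration and are not taken from the paper's dataset.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, computed from detection counts."""
    if true_pos == 0:
        return 0.0  # no correct detections, so precision and recall are both 0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 8 correct detections, 2 false alarms, 2 missed vehicles.
    print(f"F-measure: {f_measure(8, 2, 2):.2f}")
```

Because it is a harmonic mean, the score drops sharply when either precision or recall is low, which makes it a stricter summary than the arithmetic average of the two.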