146 research outputs found
Real-Time Traffic Light Recognition Based on C-HOG Features
This paper proposes a real-time traffic light detection and recognition algorithm for recognizing traffic signals in intelligent vehicles. The algorithm is based on C-HOG features (color and HOG features) and a Support Vector Machine (SVM). It first extracts red and green regions in the video accurately and screens the eligible candidate areas; C-HOG features are then extracted for each type of light. Finally, an SVM classifier is built for the corresponding light categories, and accurate real-time results are obtained from the decision function. Experimental results show that the algorithm achieves good accuracy and real-time performance.
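A rough sketch of the described pipeline (color-region extraction, C-HOG features, SVM classification) might look as follows; the HSV thresholds, HOG layout, and helper names are illustrative assumptions rather than the paper's exact settings.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def candidate_regions(frame_bgr):
    """Extract red/green candidate boxes via HSV thresholding (assumed ranges)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    red = cv2.inRange(hsv, (0, 120, 80), (10, 255, 255))
    green = cv2.inRange(hsv, (45, 120, 80), (90, 255, 255))
    mask = cv2.bitwise_or(red, green)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Screen eligible areas by a minimum size; real screening would also check shape.
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 30]

def c_hog_features(frame_bgr, box, size=(32, 32)):
    """Concatenate a HOG descriptor with a coarse hue histogram ("C-HOG")."""
    x, y, w, h = box
    patch = cv2.resize(frame_bgr[y:y + h, x:x + w], size)
    hog = cv2.HOGDescriptor(size, (16, 16), (8, 8), (8, 8), 9)
    hog_feat = hog.compute(cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)).ravel()
    hue = cv2.calcHist([cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)], [0], None, [16], [0, 180]).ravel()
    return np.concatenate([hog_feat, hue / (hue.sum() + 1e-6)])

def train_classifier(features, labels):
    """Fit an SVM over C-HOG vectors of labelled light patches."""
    return SVC(kernel="rbf").fit(features, labels)
```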
Data-Augmented Contact Model for Rigid Body Simulation
Accurately modeling contact behaviors for real-world, near-rigid materials
remains a grand challenge for existing rigid-body physics simulators. This
paper introduces a data-augmented contact model that combines analytical
solutions with observed data to predict the 3D contact impulse, which can
result in rigid bodies bouncing, sliding or spinning in all directions. Our
method enhances the expressiveness of the standard Coulomb contact model by
learning the contact behaviors from the observed data, while preserving the
fundamental contact constraints whenever possible. For example, a classifier is
trained to approximate the transitions between static and dynamic friction,
while the non-penetration constraint during collision is enforced analytically. Our
method computes the aggregated effect of contact for the entire rigid body,
instead of predicting the contact force for each contact point individually,
removing the exponential decline in accuracy as the number of contact points
increases.
Comment: 7 pages, 7 figures. Submitted to ICRA 2019. Added video attachment
with full 3D experiments: https://youtu.be/AKSD8TabDV
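The hybrid structure the abstract describes (learned friction-mode transitions plus analytically enforced non-penetration, predicting one aggregated impulse per body) could be sketched roughly as below; the feature set, model classes, and single-contact simplification are assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

class DataAugmentedContact:
    """Aggregated contact impulse: analytical normal part, learned tangential part."""

    def __init__(self):
        self.mode_clf = GradientBoostingClassifier()    # static (0) vs. dynamic (1) friction
        self.tangent_reg = GradientBoostingRegressor()  # aggregated tangential impulse

    def fit(self, features, modes, tangent_impulses):
        self.mode_clf.fit(features, modes)
        self.tangent_reg.fit(features, tangent_impulses)

    def contact_impulse(self, features, v_normal, v_tangent, mass, restitution=0.0):
        # Analytical constraint: the normal impulse must at least cancel the
        # approaching normal velocity (v_normal < 0) so bodies do not interpenetrate.
        j_n = max(0.0, -(1.0 + restitution) * mass * v_normal)
        # Learned transition between static and dynamic friction.
        if self.mode_clf.predict([features])[0] == 0:   # static: stop tangential motion
            j_t = -mass * v_tangent
        else:                                           # dynamic: learned aggregate impulse
            j_t = float(self.tangent_reg.predict([features])[0])
        return np.array([j_n, j_t])
```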
Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
Recent studies have presented compelling evidence that large language models
(LLMs) can equip embodied agents with the self-driven capability to interact
with the world, which marks an initial step toward versatile robotics. However,
these efforts tend to overlook the visual richness of open worlds, rendering
the entire interactive process akin to "a blindfolded text-based game."
Consequently, LLM-based agents frequently encounter challenges in intuitively
comprehending their surroundings and producing responses that are easy to
understand. In this paper, we propose Steve-Eye, an end-to-end trained large
multimodal model designed to address this limitation. Steve-Eye integrates the
LLM with a visual encoder, which enables it to process visual-text inputs and
generate multimodal feedback. In addition, we use a semi-automatic strategy to
collect an extensive dataset comprising 850K open-world instruction pairs,
empowering our model to encompass three essential functions for an agent:
multimodal perception, foundational knowledge base, and skill prediction and
planning. Lastly, we develop three open-world evaluation benchmarks, then carry
out extensive experiments from a wide range of perspectives to validate our
model's capability to strategically act and plan. Code and datasets will be
released.
Comment: 19 pages, 19 figures
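The architecture sketched in the abstract (an LLM coupled to a visual encoder so that visual-text inputs yield multimodal feedback) typically takes the shape below; the projection design, dimensions, and interface names are assumptions and not Steve-Eye's released configuration.

```python
import torch
import torch.nn as nn

class VisualLLMAgent(nn.Module):
    """Prepend projected visual tokens to the text embeddings of a decoder-only LLM."""

    def __init__(self, llm, visual_encoder, vis_dim=1024, llm_dim=4096, n_vis_tokens=32):
        super().__init__()
        self.llm = llm                           # any LM accepting `inputs_embeds`
        self.visual_encoder = visual_encoder     # e.g. a frozen ViT returning patch features
        self.proj = nn.Linear(vis_dim, llm_dim)  # maps visual features into the LLM token space
        self.n_vis_tokens = n_vis_tokens

    def forward(self, image, text_embeds):
        patches = self.visual_encoder(image)                     # (B, N, vis_dim)
        vis_tokens = self.proj(patches[:, :self.n_vis_tokens])   # (B, n_vis_tokens, llm_dim)
        inputs = torch.cat([vis_tokens, text_embeds], dim=1)     # visual prefix + text tokens
        return self.llm(inputs_embeds=inputs)                    # multimodal feedback logits
```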
Revisiting the Spatial and Temporal Modeling for Few-shot Action Recognition
Spatial and temporal modeling is one of the core aspects of few-shot
action recognition. Most previous works mainly focus on long-term temporal
relation modeling based on high-level spatial representations, without
considering the crucial low-level spatial features and short-term temporal
relations. In fact, the former provides rich local semantic information, while
the latter captures the motion characteristics of adjacent frames. In this
paper, we propose SloshNet, a new
framework that revisits the spatial and temporal modeling for few-shot action
recognition in a finer manner. First, to exploit the low-level spatial
features, we design a feature fusion architecture search module to
automatically search for the best combination of the low-level and high-level
spatial features. Next, inspired by recent transformer architectures, we introduce a
long-term temporal modeling module to model the global temporal relations based
on the extracted spatial appearance features. Meanwhile, we design another
short-term temporal modeling module to encode the motion characteristics
between adjacent frame representations. After that, the final predictions can
be obtained by feeding the embedded rich spatial-temporal features to a common
frame-level class prototype matcher. We extensively validate the proposed
SloshNet on four few-shot action recognition datasets, including
Something-Something V2, Kinetics, UCF101, and HMDB51. It achieves favorable
results against state-of-the-art methods on all datasets.
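The module composition described above (searched fusion of low- and high-level spatial features, a transformer-style long-term temporal module, a short-term module over adjacent frames, then frame-level prototype matching) can be pictured with the toy sketch below; all dimensions and the scalar fusion stand-in are assumptions, not SloshNet's actual design.

```python
import torch
import torch.nn as nn

class FewShotVideoBackbone(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.fusion_weight = nn.Parameter(torch.tensor(0.5))  # stand-in for the searched fusion
        self.long_term = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.short_term = nn.Linear(2 * dim, dim)              # encodes adjacent-frame motion

    def forward(self, low_feats, high_feats):
        # low_feats, high_feats: (B, T, dim) per-frame spatial features
        x = self.fusion_weight * low_feats + (1 - self.fusion_weight) * high_feats
        x = self.long_term(x)                                  # global temporal relations
        pairs = torch.cat([x[:, :-1], x[:, 1:]], dim=-1)       # adjacent-frame pairs
        motion = self.short_term(pairs)                        # short-term dynamics
        return torch.cat([x[:, 1:], motion], dim=-1)           # (B, T-1, 2*dim) rich features

def prototype_match(query, support):
    """Frame-level prototype matching: negative squared distance to the class prototype."""
    prototype = support.mean(dim=0)                            # average the support videos
    return -(query - prototype).pow(2).sum(dim=-1).mean()
```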
Blind2Sound: Self-Supervised Image Denoising without Residual Noise
Self-supervised blind denoising for Poisson-Gaussian noise remains a
challenging task. Pseudo-supervised pairs constructed from single noisy images
re-corrupt the signal and degrade performance. Making the blind spots visible
mitigates the information loss in masked inputs. However, without explicit
noise sensing, a mean-square-error objective cannot adjust the denoising
intensity to dynamic noise levels, leading to noticeable residual noise. In
this paper, we propose Blind2Sound, a simple yet effective approach to overcome
residual noise in denoised images. The proposed adaptive re-visible loss senses
noise levels and performs personalized denoising without noise residues while
preserving the signal losslessly. The theoretical analysis of intermediate
medium gradients guarantees stable training, while the Cramer Gaussian loss
acts as a regularizer to facilitate accurate perception of noise levels and
improve the performance of the denoiser. Experiments on synthetic and
real-world datasets show the superior performance of our method, especially for
single-channel images.
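The abstract does not spell out the adaptive re-visible or Cramer Gaussian losses, so the toy objective below only illustrates the general idea of noise-level-aware denoising, where a predicted noise level modulates the reconstruction penalty; it is an assumption-laden sketch, not Blind2Sound's actual loss.

```python
import torch

def noise_aware_loss(denoised, noisy, sigma_pred):
    """Gaussian-NLL-shaped objective: a larger predicted noise level tolerates
    larger residuals but is penalised by the log term, so the network must sense
    the actual noise level rather than over- or under-smooth."""
    residual = ((denoised - noisy) ** 2).mean(dim=(1, 2, 3))   # per-image reconstruction error
    return (residual / (2 * sigma_pred ** 2) + torch.log(sigma_pred)).mean()
```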
Semantic-aware Transmission Scheduling: a Monotonicity-driven Deep Reinforcement Learning Approach
For cyber-physical systems in the 6G era, semantic communications connecting
distributed devices for dynamic control and remote state estimation are
required to guarantee application-level performance rather than merely
communication-centric performance. Semantics here is a measure of the
usefulness of information transmissions. Semantic-aware transmission scheduling
of a large system often involves a large decision-making space, and the optimal
policy cannot be obtained effectively by existing algorithms. In this paper, we
first investigate the fundamental properties of the optimal semantic-aware
scheduling policy and then develop advanced deep reinforcement learning (DRL)
algorithms by leveraging the theoretical guidelines. Our numerical results show
that the proposed algorithms can substantially reduce training time and enhance
training performance compared to benchmark algorithms.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
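One generic way to exploit the kind of structural (monotonicity) result the paper derives is to penalise scheduling actions that violate the known structure during DRL training; the sketch below shows only that generic idea, with an assumed ordering of states by semantic importance, not the paper's exact algorithm.

```python
import torch

def monotonicity_penalty(chosen_actions, state_order):
    """Assumed structure for illustration: when states are sorted by semantic
    importance (state_order), the optimal scheduling action index should be
    non-decreasing; positive gaps mark violations."""
    a = chosen_actions[state_order].float()
    return torch.relu(a[:-1] - a[1:]).mean()

# Training could then use, e.g.:
# loss = td_loss + lambda_struct * monotonicity_penalty(actions, order)
```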
LLaMA Rider: Spurring Large Language Models to Explore the Open World
Recently, various studies have leveraged Large Language Models (LLMs) to help
decision-making and planning in environments, trying to align the LLMs'
knowledge with world conditions. Nonetheless, the capacity of LLMs to
continuously acquire environmental knowledge and adapt in an open world remains
uncertain. In this paper, we propose an approach to spur LLMs to explore the
open world, gather experiences, and learn to improve their task-solving
capabilities. In this approach, a multi-round feedback-revision mechanism is
utilized to encourage LLMs to actively select appropriate revision actions
guided by feedback information from the environment. This facilitates
exploration and enhances the model's performance. In addition, we integrate
sub-task relabeling to assist LLMs in maintaining consistency in sub-task
planning and help the model learn the combinatorial nature of tasks,
enabling it to complete a wider range of tasks through training based on the
acquired exploration experiences. By evaluation in Minecraft, an open-ended
sandbox world, we demonstrate that our approach LLaMA-Rider enhances the
efficiency of the LLM in exploring the environment, and effectively improves
the LLM's ability to accomplish more tasks through fine-tuning with merely 1.3k
instances of collected data, showing minimal training costs compared to the
baseline using reinforcement learning.
Comment: 18 pages
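The multi-round feedback-revision loop and sub-task relabeling described above might be organised roughly as follows; the method names (propose, revise, step, relabel) are placeholders for illustration, not the released LLaMA-Rider interface.

```python
def explore_episode(llm, env, relabel, task, max_rounds=3):
    """llm exposes .propose(task, obs) and .revise(task, feedback); env exposes
    .reset(task) and .step(action) -> (obs, feedback, done)."""
    experiences = []
    obs = env.reset(task)
    for _ in range(max_rounds):
        action = llm.propose(task, obs)          # initial or revised proposal
        obs, feedback, done = env.step(action)
        experiences.append((task, obs, action, feedback))
        if done:
            break
        task = llm.revise(task, feedback)        # revision guided by environment feedback
    # Sub-task relabeling: file each experience under the sub-task it actually
    # advanced, so fine-tuning sees consistent (sub-task, action) pairs.
    return [(relabel(exp), exp) for exp in experiences]
```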
Structure-Enhanced Deep Reinforcement Learning for Optimal Transmission Scheduling
Remote state estimation of large-scale distributed dynamic processes plays an
important role in Industry 4.0 applications. In this paper, by leveraging the
theoretical results of structural properties of optimal scheduling policies, we
develop a structure-enhanced deep reinforcement learning (DRL) framework for
optimal scheduling of a multi-sensor remote estimation system to achieve the
minimum overall estimation mean-square error (MSE). In particular, we propose a
structure-enhanced action selection method, which tends to select actions that
obey the policy structure. This explores the action space more effectively and
enhances the learning efficiency of DRL agents. Furthermore, we introduce a
structure-enhanced loss function to add penalty to actions that do not follow
the policy structure. The new loss function guides the DRL to converge to the
optimal policy structure quickly. Our numerical results show that the proposed
structure-enhanced DRL algorithms can reduce training time by 50% and lower
the remote estimation MSE by 10% to 25%, when compared to benchmark DRL
algorithms.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
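The two mechanisms named above, structure-obeying action selection and a loss penalty on structure-violating actions, can be sketched in a DQN-style setting as below; the boolean structure mask is a placeholder for the paper's theoretical policy structure, which the abstract does not detail.

```python
import torch

def structured_action(q_values, follows_structure, eps=0.1):
    """Bias exploration toward actions consistent with the known policy structure.
    q_values: (A,) float tensor; follows_structure: (A,) boolean mask."""
    if torch.rand(()) < eps:                     # occasional unconstrained exploration
        return int(torch.randint(q_values.numel(), ()))
    masked = q_values.masked_fill(~follows_structure, float("-inf"))
    return int(masked.argmax())

def structure_enhanced_loss(td_loss, actions, follows_structure, beta=0.1):
    """Add a penalty whenever a chosen action violates the policy structure.
    actions: (B,) long tensor; follows_structure: (B, A) boolean mask."""
    follows = follows_structure.float().gather(1, actions.unsqueeze(1)).squeeze(1)
    return td_loss + beta * (1.0 - follows).mean()
```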