146 research outputs found
Real-Time Traffic Light Recognition Based on C-HOG Features
This paper proposes a real-time traffic light detection and recognition algorithm for recognizing traffic signals in intelligent vehicles. The algorithm is based on C-HOG features (color and HOG features) and a Support Vector Machine (SVM). It first extracts red and green regions in the video accurately and screens the eligible candidate areas; C-HOG features are then extracted for each type of light. Finally, an SVM classifier is built for the corresponding light categories, and accurate real-time results are obtained from the decision function. Experimental results show that the algorithm achieves good accuracy and real-time performance.
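A rough sketch of the described pipeline (color-region extraction, C-HOG features, SVM classification) might look as follows; the HSV thresholds, HOG layout, and helper names are illustrative assumptions rather than the paper's exact settings.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def candidate_regions(frame_bgr):
    """Extract red/green candidate boxes via HSV thresholding (assumed ranges)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    red = cv2.inRange(hsv, (0, 120, 80), (10, 255, 255))
    green = cv2.inRange(hsv, (45, 120, 80), (90, 255, 255))
    mask = cv2.bitwise_or(red, green)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Screen eligible areas by a minimum size; real screening would also check shape.
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 30]

def c_hog_features(frame_bgr, box, size=(32, 32)):
    """Concatenate a HOG descriptor with a coarse hue histogram ("C-HOG")."""
    x, y, w, h = box
    patch = cv2.resize(frame_bgr[y:y + h, x:x + w], size)
    hog = cv2.HOGDescriptor(size, (16, 16), (8, 8), (8, 8), 9)
    hog_feat = hog.compute(cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)).ravel()
    hue = cv2.calcHist([cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)], [0], None, [16], [0, 180]).ravel()
    return np.concatenate([hog_feat, hue / (hue.sum() + 1e-6)])

def train_classifier(features, labels):
    """Fit an SVM over C-HOG vectors of labelled light patches."""
    return SVC(kernel="rbf").fit(features, labels)
```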
Data-Augmented Contact Model for Rigid Body Simulation
Accurately modeling contact behaviors for real-world, near-rigid materials
remains a grand challenge for existing rigid-body physics simulators. This
paper introduces a data-augmented contact model that combines analytical
solutions with observed data to predict the 3D contact impulse, which can
result in rigid bodies bouncing, sliding or spinning in all directions. Our
method enhances the expressiveness of the standard Coulomb contact model by
learning the contact behaviors from the observed data, while preserving the
fundamental contact constraints whenever possible. For example, a classifier is
trained to approximate the transitions between static and dynamic friction,
while the non-penetration constraint during collision is enforced analytically. Our
method computes the aggregated effect of contact for the entire rigid body,
instead of predicting the contact force for each contact point individually,
removing the exponential decline in accuracy as the number of contact points
increases.
Comment: 7 pages, 7 figures. Submitted to ICRA 2019. Added video attachment
with full 3D experiments: https://youtu.be/AKSD8TabDV
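The hybrid structure the abstract describes (learned friction-mode transitions plus analytically enforced non-penetration, predicting one aggregated impulse per body) could be sketched roughly as below; the feature set, model classes, and single-contact simplification are assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

class DataAugmentedContact:
    """Aggregated contact impulse: analytical normal part, learned tangential part."""

    def __init__(self):
        self.mode_clf = GradientBoostingClassifier()    # static (0) vs. dynamic (1) friction
        self.tangent_reg = GradientBoostingRegressor()  # aggregated tangential impulse

    def fit(self, features, modes, tangent_impulses):
        self.mode_clf.fit(features, modes)
        self.tangent_reg.fit(features, tangent_impulses)

    def contact_impulse(self, features, v_normal, v_tangent, mass, restitution=0.0):
        # Analytical constraint: the normal impulse must at least cancel the
        # approaching normal velocity (v_normal < 0) so bodies do not interpenetrate.
        j_n = max(0.0, -(1.0 + restitution) * mass * v_normal)
        # Learned transition between static and dynamic friction.
        if self.mode_clf.predict([features])[0] == 0:   # static: stop tangential motion
            j_t = -mass * v_tangent
        else:                                           # dynamic: learned aggregate impulse
            j_t = float(self.tangent_reg.predict([features])[0])
        return np.array([j_n, j_t])
```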
Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
Recent studies have presented compelling evidence that large language models
(LLMs) can equip embodied agents with the self-driven capability to interact
with the world, which marks an initial step toward versatile robotics. However,
these efforts tend to overlook the visual richness of open worlds, rendering
the entire interactive process akin to "a blindfolded text-based game."
Consequently, LLM-based agents frequently encounter challenges in intuitively
comprehending their surroundings and producing responses that are easy to
understand. In this paper, we propose Steve-Eye, an end-to-end trained large
multimodal model designed to address this limitation. Steve-Eye integrates the
LLM with a visual encoder, which enables it to process visual-text inputs and
generate multimodal feedback. In addition, we use a semi-automatic strategy to
collect an extensive dataset comprising 850K open-world instruction pairs,
empowering our model to encompass three essential functions for an agent:
multimodal perception, foundational knowledge base, and skill prediction and
planning. Lastly, we develop three open-world evaluation benchmarks, then carry
out extensive experiments from a wide range of perspectives to validate our
model's capability to strategically act and plan. Code and datasets will be
released.
Comment: 19 pages, 19 figures
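The architecture sketched in the abstract (an LLM coupled to a visual encoder so that visual-text inputs yield multimodal feedback) typically takes the shape below; the projection design, dimensions, and interface names are assumptions and not Steve-Eye's released configuration.

```python
import torch
import torch.nn as nn

class VisualLLMAgent(nn.Module):
    """Prepend projected visual tokens to the text embeddings of a decoder-only LLM."""

    def __init__(self, llm, visual_encoder, vis_dim=1024, llm_dim=4096, n_vis_tokens=32):
        super().__init__()
        self.llm = llm                           # any LM accepting `inputs_embeds`
        self.visual_encoder = visual_encoder     # e.g. a frozen ViT returning patch features
        self.proj = nn.Linear(vis_dim, llm_dim)  # maps visual features into the LLM token space
        self.n_vis_tokens = n_vis_tokens

    def forward(self, image, text_embeds):
        patches = self.visual_encoder(image)                     # (B, N, vis_dim)
        vis_tokens = self.proj(patches[:, :self.n_vis_tokens])   # (B, n_vis_tokens, llm_dim)
        inputs = torch.cat([vis_tokens, text_embeds], dim=1)     # visual prefix + text tokens
        return self.llm(inputs_embeds=inputs)                    # multimodal feedback logits
```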
Revisiting the Spatial and Temporal Modeling for Few-shot Action Recognition
Spatial and temporal modeling is one of the core aspects of few-shot
action recognition. Most previous works mainly focus on long-term temporal
relation modeling based on high-level spatial representations, without
considering the crucial low-level spatial features and short-term temporal
relations. In fact, the former provides rich local semantic information, while
the latter captures the motion characteristics of adjacent frames. In this
paper, we propose SloshNet, a new
framework that revisits the spatial and temporal modeling for few-shot action
recognition in a finer manner. First, to exploit the low-level spatial
features, we design a feature fusion architecture search module to
automatically search for the best combination of the low-level and high-level
spatial features. Next, inspired by recent transformer architectures, we introduce a
long-term temporal modeling module to model the global temporal relations based
on the extracted spatial appearance features. Meanwhile, we design another
short-term temporal modeling module to encode the motion characteristics
between adjacent frame representations. After that, the final predictions can
be obtained by feeding the embedded rich spatial-temporal features to a common
frame-level class prototype matcher. We extensively validate the proposed
SloshNet on four few-shot action recognition datasets, including
Something-Something V2, Kinetics, UCF101, and HMDB51. It achieves favorable
results against state-of-the-art methods on all datasets.
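The module composition described above (searched fusion of low- and high-level spatial features, a transformer-style long-term temporal module, a short-term module over adjacent frames, then frame-level prototype matching) can be pictured with the toy sketch below; all dimensions and the scalar fusion stand-in are assumptions, not SloshNet's actual design.

```python
import torch
import torch.nn as nn

class FewShotVideoBackbone(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.fusion_weight = nn.Parameter(torch.tensor(0.5))  # stand-in for the searched fusion
        self.long_term = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.short_term = nn.Linear(2 * dim, dim)              # encodes adjacent-frame motion

    def forward(self, low_feats, high_feats):
        # low_feats, high_feats: (B, T, dim) per-frame spatial features
        x = self.fusion_weight * low_feats + (1 - self.fusion_weight) * high_feats
        x = self.long_term(x)                                  # global temporal relations
        pairs = torch.cat([x[:, :-1], x[:, 1:]], dim=-1)       # adjacent-frame pairs
        motion = self.short_term(pairs)                        # short-term dynamics
        return torch.cat([x[:, 1:], motion], dim=-1)           # (B, T-1, 2*dim) rich features

def prototype_match(query, support):
    """Frame-level prototype matching: negative squared distance to the class prototype."""
    prototype = support.mean(dim=0)                            # average the support videos
    return -(query - prototype).pow(2).sum(dim=-1).mean()
```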
Blind2Sound: Self-Supervised Image Denoising without Residual Noise
Self-supervised blind denoising for Poisson-Gaussian noise remains a
challenging task. Pseudo-supervised pairs constructed from single noisy images
re-corrupt the signal and degrade performance. Making the blind spots visible
mitigates the information loss in masked inputs. However, without explicit
noise sensing, a mean-square-error objective cannot adjust the denoising
intensity to dynamic noise levels, leading to noticeable residual noise. In
this paper, we propose Blind2Sound, a simple yet effective approach to overcome
residual noise in denoised images. The proposed adaptive re-visible loss senses
noise levels and performs personalized denoising without noise residues while
preserving the signal losslessly. The theoretical analysis of intermediate
medium gradients guarantees stable training, while the Cramer Gaussian loss
acts as a regularizer to facilitate accurate perception of noise levels and
improve the performance of the denoiser. Experiments on synthetic and
real-world datasets show the superior performance of our method, especially for
single-channel images.
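The abstract does not spell out the adaptive re-visible or Cramer Gaussian losses, so the toy objective below only illustrates the general idea of noise-level-aware denoising, where a predicted noise level modulates the reconstruction penalty; it is an assumption-laden sketch, not Blind2Sound's actual loss.

```python
import torch

def noise_aware_loss(denoised, noisy, sigma_pred):
    """Gaussian-NLL-shaped objective: a larger predicted noise level tolerates
    larger residuals but is penalised by the log term, so the network must sense
    the actual noise level rather than over- or under-smooth."""
    residual = ((denoised - noisy) ** 2).mean(dim=(1, 2, 3))   # per-image reconstruction error
    return (residual / (2 * sigma_pred ** 2) + torch.log(sigma_pred)).mean()
```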
Semantic-aware Transmission Scheduling: a Monotonicity-driven Deep Reinforcement Learning Approach
For cyber-physical systems in the 6G era, semantic communications connecting
distributed devices for dynamic control and remote state estimation are
required to guarantee application-level performance rather than merely
communication-centric performance. Semantics here is a measure of the
usefulness of information transmissions. Semantic-aware transmission scheduling
of a large system often involves a large decision-making space, and the optimal
policy cannot be obtained effectively by existing algorithms. In this paper, we
first investigate the fundamental properties of the optimal semantic-aware
scheduling policy and then develop advanced deep reinforcement learning (DRL)
algorithms by leveraging the theoretical guidelines. Our numerical results show
that the proposed algorithms can substantially reduce training time and enhance
training performance compared to benchmark algorithms.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
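One generic way to exploit the kind of structural (monotonicity) result the paper derives is to penalise scheduling actions that violate the known structure during DRL training; the sketch below shows only that generic idea, with an assumed ordering of states by semantic importance, not the paper's exact algorithm.

```python
import torch

def monotonicity_penalty(chosen_actions, state_order):
    """Assumed structure for illustration: when states are sorted by semantic
    importance (state_order), the optimal scheduling action index should be
    non-decreasing; positive gaps mark violations."""
    a = chosen_actions[state_order].float()
    return torch.relu(a[:-1] - a[1:]).mean()

# Training could then use, e.g.:
# loss = td_loss + lambda_struct * monotonicity_penalty(actions, order)
```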
LLaMA Rider: Spurring Large Language Models to Explore the Open World
Recently, various studies have leveraged Large Language Models (LLMs) to help
decision-making and planning in environments, trying to align the LLMs'
knowledge with world conditions. Nonetheless, the capacity of LLMs to
continuously acquire environmental knowledge and adapt in an open world remains
uncertain. In this paper, we propose an approach to spur LLMs to explore the
open world, gather experiences, and learn to improve their task-solving
capabilities. In this approach, a multi-round feedback-revision mechanism is
utilized to encourage LLMs to actively select appropriate revision actions
guided by feedback information from the environment. This facilitates
exploration and enhances the model's performance. In addition, we integrate
sub-task relabeling to assist LLMs in maintaining consistency in sub-task
planning and help the model learn the combinatorial nature of tasks,
enabling it to complete a wider range of tasks through training based on the
acquired exploration experiences. By evaluation in Minecraft, an open-ended
sandbox world, we demonstrate that our approach LLaMA-Rider enhances the
efficiency of the LLM in exploring the environment, and effectively improves
the LLM's ability to accomplish more tasks through fine-tuning with merely 1.3k
instances of collected data, showing minimal training costs compared to the
baseline using reinforcement learning.
Comment: 18 pages
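The multi-round feedback-revision loop and sub-task relabeling described above might be organised roughly as follows; the method names (propose, revise, step, relabel) are placeholders for illustration, not the released LLaMA-Rider interface.

```python
def explore_episode(llm, env, relabel, task, max_rounds=3):
    """llm exposes .propose(task, obs) and .revise(task, feedback); env exposes
    .reset(task) and .step(action) -> (obs, feedback, done)."""
    experiences = []
    obs = env.reset(task)
    for _ in range(max_rounds):
        action = llm.propose(task, obs)          # initial or revised proposal
        obs, feedback, done = env.step(action)
        experiences.append((task, obs, action, feedback))
        if done:
            break
        task = llm.revise(task, feedback)        # revision guided by environment feedback
    # Sub-task relabeling: file each experience under the sub-task it actually
    # advanced, so fine-tuning sees consistent (sub-task, action) pairs.
    return [(relabel(exp), exp) for exp in experiences]
```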
Structure-Enhanced Deep Reinforcement Learning for Optimal Transmission Scheduling
Remote state estimation of large-scale distributed dynamic processes plays an
important role in Industry 4.0 applications. In this paper, by leveraging the
theoretical results of structural properties of optimal scheduling policies, we
develop a structure-enhanced deep reinforcement learning (DRL) framework for
optimal scheduling of a multi-sensor remote estimation system to achieve the
minimum overall estimation mean-square error (MSE). In particular, we propose a
structure-enhanced action selection method, which tends to select actions that
obey the policy structure. This explores the action space more effectively and
enhances the learning efficiency of DRL agents. Furthermore, we introduce a
structure-enhanced loss function to add penalty to actions that do not follow
the policy structure. The new loss function guides the DRL to converge to the
optimal policy structure quickly. Our numerical results show that the proposed
structure-enhanced DRL algorithms can reduce training time by 50% and lower
the remote estimation MSE by 10% to 25%, when compared to benchmark DRL
algorithms.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
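The two mechanisms named above, structure-obeying action selection and a loss penalty on structure-violating actions, can be sketched in a DQN-style setting as below; the boolean structure mask is a placeholder for the paper's theoretical policy structure, which the abstract does not detail.

```python
import torch

def structured_action(q_values, follows_structure, eps=0.1):
    """Bias exploration toward actions consistent with the known policy structure.
    q_values: (A,) float tensor; follows_structure: (A,) boolean mask."""
    if torch.rand(()) < eps:                     # occasional unconstrained exploration
        return int(torch.randint(q_values.numel(), ()))
    masked = q_values.masked_fill(~follows_structure, float("-inf"))
    return int(masked.argmax())

def structure_enhanced_loss(td_loss, actions, follows_structure, beta=0.1):
    """Add a penalty whenever a chosen action violates the policy structure.
    actions: (B,) long tensor; follows_structure: (B, A) boolean mask."""
    follows = follows_structure.float().gather(1, actions.unsqueeze(1)).squeeze(1)
    return td_loss + beta * (1.0 - follows).mean()
```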