31 research outputs found
Open-Vocabulary Affordance Detection using Knowledge Distillation and Text-Point Correlation
Affordance detection presents intricate challenges and has a wide range of
robotic applications. Previous works have faced limitations such as the
complexities of 3D object shapes, the wide range of potential affordances on
real-world objects, and the lack of open-vocabulary support for affordance
understanding. In this paper, we introduce a new open-vocabulary affordance
detection method in 3D point clouds, leveraging knowledge distillation and
text-point correlation. Our approach employs pre-trained 3D models through
knowledge distillation to enhance feature extraction and semantic understanding
in 3D point clouds. We further introduce a new text-point correlation method to
learn the semantic links between point cloud features and open-vocabulary
labels. Extensive experiments show that our approach outperforms previous
works and adapts to new affordance labels and unseen objects. Notably, our
method achieves a 7.96% mIoU improvement over the baselines. Furthermore, it
offers real-time inference, making it well suited for robotic manipulation
applications.
Comment: 8 pages
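As a rough illustration of the text-point correlation idea described above, the sketch below scores per-point 3D features against open-vocabulary label embeddings with cosine similarity; the function names, dimensions, and temperature are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: correlate point-cloud features with text label embeddings.
# Assumes (hypothetically) both live in a shared d-dimensional space.
import numpy as np

def affordance_logits(point_feats, text_embeds, temperature=0.07):
    """Score each 3D point against every open-vocabulary affordance label.

    point_feats: (N, d) features from a distilled 3D backbone.
    text_embeds: (K, d) embeddings of affordance label prompts.
    Returns (N, K) logits; argmax over K gives a per-point label.
    """
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    return (p @ t.T) / temperature

rng = np.random.default_rng(0)
logits = affordance_logits(rng.normal(size=(2048, 512)),
                           rng.normal(size=(4, 512)))
labels = logits.argmax(axis=1)  # per-point affordance assignment
```

Because labels enter only through their text embeddings, adding a new affordance at inference time amounts to appending one row to `text_embeds`.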
Language-Conditioned Affordance-Pose Detection in 3D Point Clouds
Affordance detection and pose estimation are of great importance in many
robotic applications. Their combination helps the robot gain an enhanced
manipulation capability, in which the generated pose can facilitate the
corresponding affordance task. Previous methods for affordance-pose joint
learning are limited to a predefined set of affordances, thus limiting the
adaptability of robots in real-world environments. In this paper, we propose a
new method for language-conditioned affordance-pose joint learning in 3D point
clouds. Given a 3D point cloud object, our method detects the affordance region
and generates appropriate 6-DoF poses for any unconstrained affordance label.
Our method consists of an open-vocabulary affordance detection branch and a
language-guided diffusion model that generates 6-DoF poses based on the
affordance text. We also introduce a new high-quality dataset for the task of
language-driven affordance-pose joint learning. Extensive experimental results
demonstrate that our proposed method works effectively on a wide range of
open-vocabulary affordances and outperforms other baselines by a large margin.
In addition, we illustrate the usefulness of our method in real-world robotic
applications. Our code and dataset are publicly available at
https://3DAPNet.github.io
Comment: Project page: https://3DAPNet.github.io
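The language-guided diffusion branch described above can be pictured as a reverse-diffusion loop that denoises 6-DoF pose parameters conditioned on an affordance text embedding. The following toy sketch uses a stand-in noise schedule and denoiser; it is not the paper's trained model, and all names are assumptions.

```python
# Hedged sketch: toy reverse diffusion over a 6-DoF pose vector
# (3 translation + 3 rotation parameters), conditioned on a text embedding.
import numpy as np

def sample_pose(denoise_fn, text_embed, steps=50, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)       # illustrative schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.normal(size=6)                       # start from pure noise
    for t in reversed(range(steps)):
        eps = denoise_fn(x, t, text_embed)       # predicted noise at step t
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                                # add noise except at last step
            x += np.sqrt(betas[t]) * rng.normal(size=6)
    return x                                     # denoised 6-DoF pose parameters

# Stand-in denoiser; a real model would be a network taking (x, t, condition).
pose = sample_pose(lambda x, t, c: 0.1 * x, np.zeros(32))
```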
Open-Vocabulary Affordance Detection in 3D Point Clouds
Affordance detection is a challenging problem with a wide variety of robotic
applications. Traditional affordance detection methods are limited to a
predefined set of affordance labels, hence potentially restricting the
adaptability of intelligent robots in complex and dynamic environments. In this
paper, we present the Open-Vocabulary Affordance Detection (OpenAD) method,
which is capable of detecting an unbounded number of affordances in 3D point
clouds. By simultaneously learning the affordance text and the point feature,
OpenAD successfully exploits the semantic relationships between affordances.
Therefore, our proposed method enables zero-shot detection and can detect
previously unseen affordances without a single annotation example.
Extensive experimental results show that OpenAD works effectively on a wide
range of affordance detection setups and outperforms other baselines by a large
margin. Additionally, we demonstrate the practicality of the proposed OpenAD in
real-world robotic applications with a fast inference speed (~100 ms). Our
project is available at https://openad2023.github.io.
Comment: Accepted to the 2023 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS 2023)
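The mIoU scores these detection works report are typically computed per affordance label and averaged. A minimal sketch, assuming integer per-point label arrays (the function name and layout are illustrative, not the papers' evaluation code):

```python
# Hedged sketch: mean intersection-over-union for point-level predictions.
import numpy as np

def miou(pred, gt, num_labels):
    """pred, gt: (N,) integer affordance label per point.

    Returns the IoU averaged over labels that appear in pred or gt.
    """
    ious = []
    for k in range(num_labels):
        inter = np.logical_and(pred == k, gt == k).sum()
        union = np.logical_or(pred == k, gt == k).sum()
        if union > 0:                 # skip labels absent from both arrays
            ious.append(inter / union)
    return float(np.mean(ious))

score = miou(np.array([0, 1, 1]), np.array([0, 1, 0]), 2)  # → 0.5
```

Here label 0 has IoU 1/2 and label 1 has IoU 1/2, so the mean is 0.5.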
Research Notes: Exotic soybean observational yield performance trial in Kien Giang province --Mekong Delta --Vietnam --dry season 1981-1982
In the spirit of mutual technical assistance, six soybean varieties, namely 'Bon minori' (2 lines), 'Enrei' (2 lines), 'Akiyoshi' and 'Hyuuga', were forwarded by registered mail from Japan to Vietnam in May 1981. Owing to the long postal transit, the soybean seed was only received in November 1981. A month later, the seeds were planted on December 10, 1981, at the provincial seed farm of Kien Giang province and then harvested on February 26, 1982
Language-driven Scene Synthesis using Multi-conditional Diffusion Model
Scene synthesis is a challenging problem with several industrial
applications. Recently, substantial efforts have been directed to synthesize
the scene using human motions, room layouts, or spatial graphs as the input.
However, few studies have addressed this problem from multiple modalities,
especially combining text prompts. In this paper, we propose a language-driven
scene synthesis task, which is a new task that integrates text prompts, human
motion, and existing objects for scene synthesis. Unlike other single-condition
synthesis tasks, our problem involves multiple conditions and requires a
strategy for processing and encoding them into a unified space. To address the
challenge, we present a multi-conditional diffusion model, which differs from
the implicit unification approach of other diffusion literature by explicitly
predicting the guiding points for the original data distribution. We
demonstrate that our approach is theoretically grounded. Extensive
experimental results show that our method outperforms state-of-the-art
benchmarks and enables natural scene editing applications. The source code and
dataset can be accessed at https://lang-scene-synth.github.io/.
Comment: Accepted to NeurIPS 202
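The multi-condition encoding described above, in which text prompts, human motion, and existing objects are mapped into a unified space that explicitly predicts guiding points, can be sketched roughly as follows; the dimensions, linear heads, and fusion rule are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch: fuse three condition embeddings into one code and map it
# to an explicit "guiding point" in scene coordinates.
import numpy as np

rng = np.random.default_rng(0)
d = 64
# Stand-in linear encoders for each modality (a real model would learn these).
W_text, W_motion, W_obj = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
W_guide = rng.normal(scale=0.1, size=(3, d))   # fused code -> xyz guiding point

def guiding_point(text_e, motion_e, obj_e):
    """Encode all conditions into a unified space, then predict a guiding
    point that steers the diffusion process toward the conditioned scene."""
    fused = np.tanh(W_text @ text_e + W_motion @ motion_e + W_obj @ obj_e)
    return W_guide @ fused

pt = guiding_point(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d))
```

Predicting an explicit guiding point, rather than folding all conditions implicitly into the denoiser, is what the abstract contrasts with other diffusion literature.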