6 research outputs found

    Audio-Visual Deception Detection: DOLOS Dataset and Parameter-Efficient Crossmodal Learning

    Full text link
    Deception detection in conversations is a challenging yet important task, with pivotal applications in many fields such as credibility assessment in business, multimedia anti-fraud, and customs security. Despite this, deception detection research is hindered by the lack of high-quality deception datasets, as well as the difficulty of learning multimodal features effectively. To address this, we introduce DOLOS (the name comes from Greek mythology), the largest gameshow deception detection dataset, with rich deceptive conversations. DOLOS includes 1,675 video clips featuring 213 subjects, and it has been labeled with audio-visual feature annotations. We provide train-test, duration, and gender protocols to investigate the impact of different factors. We benchmark our dataset on previously proposed deception detection approaches. To further improve performance while fine-tuning fewer parameters, we propose Parameter-Efficient Crossmodal Learning (PECL), in which a Uniform Temporal Adapter (UT-Adapter) explores temporal attention in transformer-based architectures, and a crossmodal fusion module, Plug-in Audio-Visual Fusion (PAVF), combines crossmodal information from audio-visual features. Based on the rich fine-grained audio-visual annotations in DOLOS, we also exploit multi-task learning to enhance performance by concurrently predicting deception and audio-visual features. Experimental results demonstrate the quality of the DOLOS dataset and the effectiveness of PECL. The DOLOS dataset and source code are available at https://github.com/NMS05/Audio-Visual-Deception-Detection-DOLOS-Dataset-and-Parameter-Efficient-Crossmodal-Learning/tree/main.

    11 pages, 6 figures.
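
    As a minimal illustration of the parameter-efficient idea behind the UT-Adapter, the sketch below inserts a small temporal-attention bottleneck as a residual update on a frozen backbone's token sequence. The module name, dimensions, and bottleneck design are assumptions for illustration, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class TemporalAdapter(nn.Module):
        """Bottleneck adapter applying self-attention across time; only these
        few parameters are fine-tuned while the backbone stays frozen."""
        def __init__(self, dim: int, bottleneck: int = 64, heads: int = 4):
            super().__init__()
            self.down = nn.Linear(dim, bottleneck)    # project to low rank
            self.attn = nn.MultiheadAttention(bottleneck, heads, batch_first=True)
            self.up = nn.Linear(bottleneck, dim)      # project back
            nn.init.zeros_(self.up.weight)            # start as an identity residual
            nn.init.zeros_(self.up.bias)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, time, dim) tokens from a frozen transformer block
            h = self.down(x)
            h, _ = self.attn(h, h, h)                 # temporal attention
            return x + self.up(h)                     # residual update

    # Usage: train only the adapter on top of frozen features.
    frames = torch.randn(2, 16, 768)                  # (batch, frames, feature dim)
    print(TemporalAdapter(dim=768)(frames).shape)     # torch.Size([2, 16, 768])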

    Dynamic object removal in point clouds for efficient SLAM

    No full text
    Autonomous cars are one of the greatest technological advancements of this decade and a giant leap for the transportation industry and mobile robotics. Autonomous cars face various challenges on the way to Level 5 autonomy, one of which is finding fast and reliable algorithms for simultaneous localisation and mapping (SLAM) in real-world environments. SLAM algorithms enable an autonomous car to perceive its environment and identify its position within it. A major limitation of SLAM, especially while building a map, is the assumption of static environmental features, i.e. no dynamic or moving objects. Research on SLAM over the past years has produced state-of-the-art algorithms, but virtually all of them assume the environment is static. In real traffic, however, autonomous cars encounter many moving objects such as pedestrians, cyclists, and pets; this problem is not limited to autonomous cars but is common to all mobile robots. To enable research progress, considerable human effort is invested in manually identifying and removing the dynamic objects before SLAM research can proceed, but this approach is time-consuming, labour-intensive, less reliable, and does not provide a permanent solution. In this dissertation, a novel algorithm is proposed that identifies and removes dynamic objects in point clouds obtained from a Light Detection and Ranging (LiDAR) sensor and reconstructs a static scene. The algorithm acts as a pre-processing stage that outputs a static scene to traditional SLAM algorithms, and it is tailored for autonomous vehicles, with low computational complexity. Experiments were performed on the KITTI Vision Benchmark Suite, which contains real-world LiDAR data recorded from cars driving on the streets of Karlsruhe, Germany. The algorithm effectively removes the dynamic objects and reconstructs a static scene. This dissertation is a small step in the journey to make autonomous cars a reality, and its applications extend beyond autonomous cars to all mobile robots: it makes traditional SLAM algorithms more robust and reliable.

    Master of Science (Computer Control and Automation)
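
    The dissertation's algorithm itself is not reproduced here, but one common pre-processing heuristic in this spirit is sketched below: voxelize registered LiDAR scans and drop points whose voxels are not persistently occupied across frames. The voxel size and hit threshold are assumed values, and the scans are assumed to be already registered in a common frame.

    from collections import Counter
    import numpy as np

    def voxel_keys(points: np.ndarray, voxel_size: float) -> set:
        """Map an (N, 3) point array to a set of integer voxel coordinates."""
        return set(map(tuple, np.floor(points / voxel_size).astype(np.int64)))

    def remove_dynamic(scans: list, voxel_size: float = 0.2,
                       min_hits: int = 3) -> np.ndarray:
        """Keep points of the last scan whose voxels are occupied in at least
        `min_hits` scans; transient (dynamic) voxels are dropped."""
        hits = Counter()
        for scan in scans:
            hits.update(voxel_keys(scan, voxel_size))
        current = scans[-1]
        keys = np.floor(current / voxel_size).astype(np.int64)
        mask = np.array([hits[tuple(k)] >= min_hits for k in keys])
        return current[mask]

    # Usage with synthetic data: static points persist, transient ones do not.
    rng = np.random.default_rng(0)
    static = rng.uniform(0, 10, size=(1000, 3))
    scans = [np.vstack([static, rng.uniform(0, 10, size=(5, 3))]) for _ in range(4)]
    print(remove_dynamic(scans).shape)  # roughly (1000, 3)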

    Towards dependable object detection

    No full text
    High confidence in object detection is crucial for object detector modules to be used in real-world applications. While confidence scores can be improved from the computer vision side, by using better and larger training datasets or more robust architectures, this paper aims to improve detection confidence by finding effective viewpoints, making dependable use of the available object detectors. In particular, we investigate the effect of viewing distance on the detection confidence of an object detector, which can in turn be used to control a camera mounted on a mobile robot to approach a better viewing position. We consider cases with both fixed and variable focal length cameras. Experimental results are presented to demonstrate the efficacy and validity of the techniques presented in this work.

    Agency for Science, Technology and Research (A*STAR). This work was supported by the Agency for Science, Technology and Research of Singapore (A*STAR), under the National Robotics Programme (NRP) - Robotics Domain Specific (RDS) (Ref # 1922200001).
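
    The viewing-distance reasoning follows directly from the pinhole camera model; the sketch below shows the arithmetic, with the 100-pixel requirement being an assumed detector threshold rather than a number from the paper.

    def viewing_distance(focal_px: float, object_size_m: float,
                         target_extent_px: float) -> float:
        """Distance (m) at which an object of given physical size projects to
        the desired pixel extent, under the pinhole model:
            extent_px = focal_px * object_size_m / distance_m
        """
        return focal_px * object_size_m / target_extent_px

    # Example: a 0.5 m object, 900 px focal length, detector assumed to need
    # objects at least 100 px wide -> approach to within 4.5 m.
    print(viewing_distance(focal_px=900, object_size_m=0.5, target_extent_px=100))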

    Platform-independent visual installation progress monitoring for construction automation

    No full text
    Efficient interior progress monitoring is crucial for the timely completion of construction projects. Although robots have been used for data acquisition to automate interior progress monitoring, existing methods do not adequately consider variations in robot platforms and construction environments, which makes them difficult to apply across different robots and settings. This paper proposes an integrated system for automated interior installation progress monitoring that can be applied to various construction environments and robot platforms. Algorithms are proposed to systematically generate navigation goal points for robot navigation based on the BIM information of objects, enabling detection of target objects at the proper viewing distance and angle. A transformer-based object detector recognizes the installation status of building elements, and a progress updating module correlates the detection results with the robot's sensory and BIM information to generate a construction progress report for interior installation. The framework hierarchically estimates the percentage of project completion and allows tracking of installation work progress. The proposed system has been verified through laboratory and onsite experiments using various platforms, including a mobile robot, a four-legged robot, a drone, and a smartphone camera.

    Agency for Science, Technology and Research (A*STAR); Building and Construction Authority (BCA). This work is supported by the Agency for Science, Technology and Research (A*STAR), Singapore, under the National Robotics Program (NRP) - Robotics Domain Specific (RDS: Ref. 1922200001). Special thanks to Teambuild Construction Group, Singapore, and the Building and Construction Authority (BCA), Singapore, for providing support and research resources.
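
    As a rough sketch of how a navigation goal point might be generated from an object's BIM information, the function below stands the robot off along the object's facing direction at the desired viewing distance and orients it back toward the object. The planar (x, y, yaw) pose representation and field layout are assumptions for illustration, not the paper's algorithm.

    import math

    def navigation_goal(obj_xy: tuple, facing_rad: float,
                        view_dist_m: float) -> tuple:
        """Return (x, y, yaw): `view_dist_m` in front of the object along its
        facing direction, with yaw looking back at the object."""
        gx = obj_xy[0] + view_dist_m * math.cos(facing_rad)
        gy = obj_xy[1] + view_dist_m * math.sin(facing_rad)
        yaw = math.atan2(obj_xy[1] - gy, obj_xy[0] - gx)   # face the object
        return gx, gy, yaw

    # Example: a door at (5, 2) facing +x, viewed from 2 m away.
    print(navigation_goal((5.0, 2.0), facing_rad=0.0, view_dist_m=2.0))
    # -> (7.0, 2.0, pi): stand at (7, 2) looking back toward the door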

    Audio-visual deception detection: DOLOS dataset and parameter-efficient crossmodal learning

    No full text
    Deception detection in conversations is a challenging yet important task, with pivotal applications in many fields such as credibility assessment in business, multimedia anti-fraud, and customs security. Despite this, deception detection research is hindered by the lack of high-quality deception datasets, as well as the difficulty of learning multimodal features effectively. To address this, we introduce DOLOS (the name comes from Greek mythology), the largest gameshow deception detection dataset, with rich deceptive conversations. DOLOS includes 1,675 video clips featuring 213 subjects, and it has been labeled with audio-visual feature annotations. We provide train-test, duration, and gender protocols to investigate the impact of different factors. We benchmark our dataset on previously proposed deception detection approaches. To further improve performance while fine-tuning fewer parameters, we propose Parameter-Efficient Crossmodal Learning (PECL), in which a Uniform Temporal Adapter (UT-Adapter) explores temporal attention in transformer-based architectures, and a crossmodal fusion module, Plug-in Audio-Visual Fusion (PAVF), combines crossmodal information from audio-visual features. Based on the rich fine-grained audio-visual annotations in DOLOS, we also exploit multi-task learning to enhance performance by concurrently predicting deception and audio-visual features. Experimental results demonstrate the quality of the DOLOS dataset and the effectiveness of PECL. The DOLOS dataset and source code are available at https://github.com/NMS05/Audio-Visual-Deception-Detection-DOLOS-Dataset-and-Parameter-Efficient-Crossmodal-Learning/tree/main.

    Submitted/Accepted version. This research is supported in part by the NTU-PKU Joint Research Institute (a collaboration between the Nanyang Technological University and Peking University that is sponsored by a donation from the Ng Teng Fong Charitable Foundation), and the DSO National Laboratories, Singapore, under project agreement No. DSOCL21238.
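
    The multi-task learning mentioned in this abstract can be sketched as one shared representation feeding two heads, with the losses summed: one head predicts deception and the other predicts the fine-grained audio-visual annotations. The head shapes, the number of annotation targets, and the loss weighting below are assumptions for illustration.

    import torch
    import torch.nn as nn

    class MultiTaskHead(nn.Module):
        def __init__(self, feat_dim: int, num_av_features: int):
            super().__init__()
            self.deception = nn.Linear(feat_dim, 1)          # truthful vs deceptive
            self.av = nn.Linear(feat_dim, num_av_features)   # behavioural annotations

        def forward(self, z):
            return self.deception(z).squeeze(-1), self.av(z)

    def multi_task_loss(head, z, y_dec, y_av, alpha: float = 0.5):
        logit_dec, logit_av = head(z)
        loss_dec = nn.functional.binary_cross_entropy_with_logits(logit_dec, y_dec)
        loss_av = nn.functional.binary_cross_entropy_with_logits(logit_av, y_av)
        return loss_dec + alpha * loss_av                    # assumed weighting

    # Usage with random tensors standing in for fused audio-visual features.
    head = MultiTaskHead(feat_dim=768, num_av_features=10)
    z = torch.randn(4, 768)
    y_dec = torch.randint(0, 2, (4,)).float()
    y_av = torch.randint(0, 2, (4, 10)).float()
    print(multi_task_loss(head, z, y_dec, y_av).item())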

    Robot-assisted object detection for construction automation : data and information-driven approach

    No full text
    In construction automation, robotic solutions are an emerging technology, driven by advances in artificial intelligence and mechatronic systems. In building construction, regular inspections are carried out to ensure project completion according to approved plans and quality standards. Currently, expert human inspectors are deployed onsite to perform inspection tasks with the naked eye and conventional tools; this process is time-consuming, labor-intensive, dangerous, repetitive, and may yield subjective results. In this paper, we propose a robotic system equipped with perception sensors and intelligent algorithms to help construction supervisors remotely identify construction materials, detect component installations and defects, and generate reports of their status and location information. The Building Information Model (BIM) is used for mobile robot navigation and to retrieve the location information of building components. Unlike current deep learning-based object detection, which depends heavily on training data, this work proposes a data- and information-driven approach that incorporates offline training data, sensor data, and BIM information to achieve BIM-based object coverage navigation, BIM-based false detection filtering, and a fine manoeuvre technique that improves object detection during real-time automated task execution by robots. The user selects the building components to be inspected, and the mobile robot navigates autonomously to the target components using a BIM-generated navigation map. An object detector then detects the building components and materials and generates an inspection report. The proposed system is verified through laboratory and onsite experiments.

    Agency for Science, Technology and Research (A*STAR). Accepted version. This work is supported by the Agency for Science, Technology and Research of Singapore (A*STAR), under the National Robotics Program (NRP) - Robotics Domain Specific (RDS: Ref. 1922200001).
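
    The BIM-based false detection filtering idea can be sketched as a simple consistency check: a detection survives only if the BIM expects a component of the same class near the detected location. The data layout and distance threshold below are assumptions, not the paper's exact method.

    import math

    def filter_detections(detections, bim_components, max_dist_m: float = 1.0):
        """detections: list of (label, (x, y, z), score);
        bim_components: list of (label, (x, y, z)) expected from the BIM.
        Keeps detections with a same-label BIM component within max_dist_m."""
        kept = []
        for label, pos, score in detections:
            for b_label, b_pos in bim_components:
                if label == b_label and math.dist(pos, b_pos) <= max_dist_m:
                    kept.append((label, pos, score))
                    break
        return kept

    # Example: one detection matches the BIM, the other is filtered out.
    bim = [("door", (5.0, 2.0, 1.0))]
    dets = [("door", (5.2, 2.1, 1.0), 0.9), ("door", (9.0, 9.0, 1.0), 0.8)]
    print(filter_detections(dets, bim))  # keeps only the first detection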