59 research outputs found

    Towards dense people detection with deep learning and depth images

    Get PDF
    This paper describes a novel DNN-based system, named PD3net, that detects multiple people from a single depth image, in real time. The proposed neural network processes a depth image and outputs a likelihood map in image coordinates, where each detection corresponds to a Gaussian-shaped local distribution, centered at each person?s head. This likelihood map encodes both the number of detected people as well as their position in the image, from which the 3D position can be computed. The proposed DNN includes spatially separated convolutions to increase performance, and runs in real-time with low budget GPUs. We use synthetic data for initially training the network, followed by fine tuning with a small amount of real data. This allows adapting the network to different scenarios without needing large and manually labeled image datasets. Due to that, the people detection system presented in this paper has numerous potential applications in different fields, such as capacity control, automatic video-surveillance, people or groups behavior analysis, healthcare or monitoring and assistance of elderly people in ambient assisted living environments. In addition, the use of depth information does not allow recognizing the identity of people in the scene, thus enabling their detection while preserving their privacy. The proposed DNN has been experimentally evaluated and compared with other state-of-the-art approaches, including both classical and DNN-based solutions, under a wide range of experimental conditions. The achieved results allows concluding that the proposed architecture and the training strategy are effective, and the network generalize to work with scenes different from those used during training. We also demonstrate that our proposal outperforms existing methods and can accurately detect people in scenes with significant occlusions.Ministerio de Economía y CompetitividadUniversidad de AlcaláAgencia Estatal de Investigació

    Evaluation of Microsoft Kinect 360 and Microsoft Kinect One for robotics and computer vision applications

    Get PDF
    To understand which one beetween Kinect 360 and Kinect One is better in computer vision or robotic applications, an in-depth study of the two Kinects is necessary. In the first section of this thesis, the Kinects’ features will be compared by some tests; in the second section, instead, these sensors will be evaluated and compared for the purpose of robotics and computer vision application

    Real-Time 3-D Motion Gesture Recognition using Kinect2 as Basis for Traditional Dance Scripting

    Get PDF
    This preliminary study presents a system capable of recognizing human gesture in real-time. The gesture is acquired from a Kinect2 sensor which provides skeleton joints represented by three-dimensional coordinate points. The model set consists of eight motion gestures is provided for basis of gesture recognition using Dynamic Time Warping (DTW) algorithm. DTW algorithm is utilized to identify in real time manner by measuring the shortest combined distances in x, y, and z coordinates in order to determined the matched gesture. It can be shown that the system is able to recognize these 8 motions in real time with some limitations. The findings of the this study will provide solid foundation of further research in which the ultimate goal of the research is to create system to automatically recognize sequence of motions in Indonesian traditional dances and convert them into standardized Resource Description Framework (RDF) scripts for the purpose of preserving these dances

    DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments

    Full text link
    Simultaneous Localization and Mapping (SLAM) is considered to be a fundamental capability for intelligent mobile robots. Over the past decades, many impressed SLAM systems have been developed and achieved good performance under certain circumstances. However, some problems are still not well solved, for example, how to tackle the moving objects in the dynamic environments, how to make the robots truly understand the surroundings and accomplish advanced tasks. In this paper, a robust semantic visual SLAM towards dynamic environments named DS-SLAM is proposed. Five threads run in parallel in DS-SLAM: tracking, semantic segmentation, local mapping, loop closing, and dense semantic map creation. DS-SLAM combines semantic segmentation network with moving consistency check method to reduce the impact of dynamic objects, and thus the localization accuracy is highly improved in dynamic environments. Meanwhile, a dense semantic octo-tree map is produced, which could be employed for high-level tasks. We conduct experiments both on TUM RGB-D dataset and in the real-world environment. The results demonstrate the absolute trajectory accuracy in DS-SLAM can be improved by one order of magnitude compared with ORB-SLAM2. It is one of the state-of-the-art SLAM systems in high-dynamic environments. Now the code is available at our github: https://github.com/ivipsourcecode/DS-SLAMComment: 7 pages, accepted at the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018). Now the code is available at our github: https://github.com/ivipsourcecode/DS-SLA

    An audio-visual system for object-based audio : from recording to listening

    Get PDF
    Object-based audio is an emerging representation for audio content, where content is represented in a reproduction format-agnostic way and, thus, produced once for consumption on many different kinds of devices. This affords new opportunities for immersive, personalized, and interactive listening experiences. This paper introduces an end-to-end object-based spatial audio pipeline, from sound recording to listening. A high-level system architecture is proposed, which includes novel audiovisual interfaces to support object-based capture and listenertracked rendering, and incorporates a proposed component for objectification, that is, recording content directly into an object-based form. Text-based and extensible metadata enable communication between the system components. An open architecture for object rendering is also proposed. The system’s capabilities are evaluated in two parts. First, listener-tracked reproduction of metadata automatically estimated from two moving talkers is evaluated using an objective binaural localization model. Second, object-based scene capture with audio extracted using blind source separation (to remix between two talkers) and beamforming (to remix a recording of a jazz group) is evaluate

    More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch

    Full text link
    For humans, the process of grasping an object relies heavily on rich tactile feedback. Most recent robotic grasping work, however, has been based only on visual input, and thus cannot easily benefit from feedback after initiating contact. In this paper, we investigate how a robot can learn to use tactile information to iteratively and efficiently adjust its grasp. To this end, we propose an end-to-end action-conditional model that learns regrasping policies from raw visuo-tactile data. This model -- a deep, multimodal convolutional network -- predicts the outcome of a candidate grasp adjustment, and then executes a grasp by iteratively selecting the most promising actions. Our approach requires neither calibration of the tactile sensors, nor any analytical modeling of contact forces, thus reducing the engineering effort required to obtain efficient grasping policies. We train our model with data from about 6,450 grasping trials on a two-finger gripper equipped with GelSight high-resolution tactile sensors on each finger. Across extensive experiments, our approach outperforms a variety of baselines at (i) estimating grasp adjustment outcomes, (ii) selecting efficient grasp adjustments for quick grasping, and (iii) reducing the amount of force applied at the fingers, while maintaining competitive performance. Finally, we study the choices made by our model and show that it has successfully acquired useful and interpretable grasping behaviors.Comment: 8 pages. Published on IEEE Robotics and Automation Letters (RAL). Website: https://sites.google.com/view/more-than-a-feelin

    Towards a multimodal repository of expressive movement qualities in dance

    Get PDF
    In this paper, we present a new multimodal repository for the analysis of expressive movement qualities in dance. First, we discuss guidelines and methodology that we applied to create this repository. Next, the technical setup of recordings and the platform for capturing the synchronized audio-visual, physiological, and motion capture data are presented. The initial content of the repository consists of about 90 minutes of short dance performances movement sequences, and improvisations performed by four dancers, displaying three expressive qualities: Fluidity, Impulsivity, and Rigidity
    • …
    corecore