
    WATCHING PEOPLE: ALGORITHMS TO STUDY HUMAN MOTION AND ACTIVITIES

    Nowadays, human motion analysis is one of the most active research topics in Computer Vision, and it is receiving increasing attention from both the industrial and scientific communities. The growing interest in human motion analysis is motivated by the increasing number of promising applications, ranging from surveillance, human–computer interaction, and virtual reality to healthcare, sports, computer games, and video conferencing, to name a few. The aim of this thesis is to give an overview of the various tasks involved in visual motion analysis of the human body and to present the issues and possible solutions related to them. In this thesis, visual motion analysis is categorized into three major areas related to the interpretation of human motion: tracking of human motion using a virtual pan-tilt-zoom (vPTZ) camera, recognition of human motions, and segmentation of human behaviors. In the field of human motion tracking, a virtual environment for PTZ cameras (vPTZ) is presented to overcome the mechanical limitations of PTZ cameras. The vPTZ is built on equirectangular images acquired by 360° cameras, and it allows not only the development of pedestrian tracking algorithms but also the comparison of their performance. On the basis of this virtual environment, three novel pedestrian tracking algorithms for 360° cameras were developed, two of which adopt a tracking-by-detection approach while the third adopts a Bayesian approach. The action recognition problem is addressed by an algorithm that represents actions in terms of multinomial distributions of frequent sequential patterns of different lengths. Frequent sequential patterns are series of data descriptors that occur many times in the data. The proposed method learns a codebook of frequent sequential patterns by means of an apriori-like algorithm; an action is then represented with a Bag-of-Frequent-Sequential-Patterns approach. In the last part of this thesis, a methodology to semi-automatically annotate behavioral data given a small set of manually annotated data is presented. The resulting methodology is not only effective in the semi-automated annotation task but can also be used in the presence of abnormal behaviors, as demonstrated empirically by testing the system on data collected from children affected by neuro-developmental disorders.
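
    The vPTZ environment described above re-projects a portion of an equirectangular 360° frame into a virtual perspective view controlled by pan, tilt, and zoom parameters. The thesis's own implementation is not reproduced here; the following is a minimal sketch of one standard way to perform this re-projection with NumPy and OpenCV, in which the function name and parameter choices are illustrative assumptions rather than the author's code.

        # Minimal sketch: extract a virtual PTZ (perspective) view from an
        # equirectangular 360° frame. Assumes numpy and OpenCV are available.
        import numpy as np
        import cv2

        def vptz_view(equi, pan_deg, tilt_deg, fov_deg=60.0, out_w=640, out_h=480):
            """Render a virtual pan-tilt-zoom view from an equirectangular image.

            pan_deg  : rotation about the vertical axis (yaw), in degrees
            tilt_deg : rotation above/below the horizon (pitch), in degrees
            fov_deg  : horizontal field of view of the virtual camera (zoom)
            """
            H, W = equi.shape[:2]

            # Pixel grid of the virtual camera, centred on the optical axis.
            f = (out_w / 2.0) / np.tan(np.radians(fov_deg) / 2.0)  # focal length in px
            xs = np.arange(out_w) - out_w / 2.0
            ys = np.arange(out_h) - out_h / 2.0
            x, y = np.meshgrid(xs, ys)
            z = np.full_like(x, f)

            # Normalised viewing rays in the virtual camera frame.
            rays = np.stack([x, y, z], axis=-1)
            rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

            # Rotate rays by tilt (about the x-axis) then pan (about the y-axis).
            t, p = np.radians(tilt_deg), np.radians(pan_deg)
            Rx = np.array([[1, 0, 0],
                           [0, np.cos(t), -np.sin(t)],
                           [0, np.sin(t),  np.cos(t)]])
            Ry = np.array([[ np.cos(p), 0, np.sin(p)],
                           [0, 1, 0],
                           [-np.sin(p), 0, np.cos(p)]])
            rays = rays @ (Ry @ Rx).T

            # Spherical angles -> equirectangular pixel coordinates.
            lon = np.arctan2(rays[..., 0], rays[..., 2])           # [-pi, pi]
            lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))      # [-pi/2, pi/2]
            map_x = ((lon / np.pi + 1.0) / 2.0 * W).astype(np.float32)
            map_y = ((lat / (np.pi / 2) + 1.0) / 2.0 * H).astype(np.float32)

            return cv2.remap(equi, map_x, map_y, cv2.INTER_LINEAR,
                             borderMode=cv2.BORDER_WRAP)

        # Example: a 90°-wide view panned 30° to the right and tilted 10° down.
        # frame = cv2.imread("equirectangular_frame.jpg")
        # view = vptz_view(frame, pan_deg=30, tilt_deg=10, fov_deg=90)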

    Dataset of Panoramic Images for People Tracking in Service Robotics

    In this thesis, we provide a framework for constructing a guide robot for use in hospitals. The omnidirectional camera on the robot allows it to recognize and track the person who is following it. Furthermore, when directing the individual to their desired location in the hospital, the robot must be aware of its surroundings and avoid collisions with other people or objects. To train and evaluate our robot's performance, we developed an auto-labeling framework for creating a dataset of panoramic videos captured by the robot's omnidirectional camera. We labeled each person in the video together with their real position in the robot's frame, enabling us to evaluate the accuracy of our tracking system and guide the development of the robot's navigation algorithms. Our research expands on earlier work that established a framework for tracking individuals using omnidirectional cameras. By developing a benchmark dataset, we aim to contribute to the ongoing effort to enhance the precision and dependability of these tracking systems, which is essential for the creation of effective guide robots in healthcare facilities. Our research has the potential to improve the patient experience and increase the efficiency of healthcare institutions by reducing the staff time spent guiding patients through the facility.
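
    The abstract states that each person in the panoramic video is labeled together with their real position in the robot's frame, but it does not specify a label format. Purely as an illustration of what such an annotation record could look like, the sketch below defines a hypothetical per-frame, per-person label; every field name is an assumption, not the dataset's actual schema.

        # Illustrative sketch of a per-person, per-frame label record for a
        # panoramic people-tracking dataset. Field names are assumptions,
        # not the actual schema of the dataset described above.
        import json
        from dataclasses import dataclass, asdict

        @dataclass
        class PersonLabel:
            frame_index: int       # index of the panoramic video frame
            track_id: int          # identity of the tracked person
            bbox: tuple            # (u, v, width, height) in panoramic pixels
            position_robot: tuple  # (x, y) in metres, in the robot's own frame

        labels = [
            PersonLabel(frame_index=0, track_id=1, bbox=(812, 240, 60, 150),
                        position_robot=(1.8, -0.4)),
            PersonLabel(frame_index=1, track_id=1, bbox=(820, 242, 61, 151),
                        position_robot=(1.7, -0.3)),
        ]

        # Serialise one JSON object per frame/person, e.g. for evaluation scripts.
        print(json.dumps([asdict(label) for label in labels], indent=2))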

    Depth- and Semantics-aware Multi-modal Domain Translation: Generating 3D Panoramic Color Images from LiDAR Point Clouds

    This work presents a new depth- and semantics-aware conditional generative model, named TITAN-Next, for cross-domain image-to-image translation in a multi-modal setup between LiDAR and camera sensors. The proposed model leverages scene semantics as a mid-level representation and is able to translate raw LiDAR point clouds to RGB-D camera images by relying solely on semantic scene segments. We claim that this is the first framework of its kind, and it has practical applications in autonomous vehicles, such as providing a fail-safe mechanism and augmenting available data in the target image domain. The proposed model is evaluated on the large-scale and challenging Semantic-KITTI dataset, and experimental findings show that it considerably outperforms the original TITAN-Net and other strong baselines by a 23.7% margin in terms of IoU.
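
    TITAN-Next operates on raw LiDAR point clouds from the Semantic-KITTI benchmark. Its architecture is not reproduced here, but a common preprocessing step for LiDAR-to-image translation models on that dataset is to project each scan into a panoramic range image via a spherical projection, sketched below; the image resolution and vertical field of view are assumptions roughly matching a Velodyne HDL-64-class sensor, not values taken from the paper.

        # Sketch of the standard spherical projection of a LiDAR scan into a
        # panoramic range image, a common input representation for models that
        # translate LiDAR data into camera-like images. Sensor FoV and image
        # size are assumptions, not values from the TITAN-Next paper.
        import numpy as np

        def lidar_to_range_image(points, H=64, W=1024,
                                 fov_up_deg=3.0, fov_down_deg=-25.0):
            """points: (N, 3) array of x, y, z coordinates in the sensor frame."""
            x, y, z = points[:, 0], points[:, 1], points[:, 2]
            r = np.linalg.norm(points, axis=1) + 1e-9

            yaw = np.arctan2(y, x)        # horizontal angle, [-pi, pi]
            pitch = np.arcsin(z / r)      # vertical angle

            fov_up = np.radians(fov_up_deg)
            fov_down = np.radians(fov_down_deg)
            fov = fov_up - fov_down

            # Normalise angles to pixel coordinates (column u, row v).
            u = ((1.0 - (yaw / np.pi + 1.0) / 2.0) * W).astype(np.int32)
            v = ((1.0 - (pitch - fov_down) / fov) * H).astype(np.int32)
            u = np.clip(u, 0, W - 1)
            v = np.clip(v, 0, H - 1)

            range_img = np.full((H, W), -1.0, dtype=np.float32)
            # Fill farthest points first so closer returns overwrite them.
            order = np.argsort(r)[::-1]
            range_img[v[order], u[order]] = r[order]
            return range_img

        # Example with a SemanticKITTI-style binary scan (x, y, z, intensity):
        # scan = np.fromfile("000000.bin", dtype=np.float32).reshape(-1, 4)[:, :3]
        # img = lidar_to_range_image(scan)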

    PASS: Panoramic Annular Semantic Segmentation

    Pixel-wise semantic segmentation is capable of unifying most driving-scene perception tasks and has enabled striking progress in the context of navigation assistance, where sensing of the entire surroundings is vital. However, current mainstream semantic segmenters are predominantly benchmarked against datasets featuring a narrow Field of View (FoV), and a large part of vision-based intelligent vehicles use only a forward-facing camera. In this paper, we propose a Panoramic Annular Semantic Segmentation (PASS) framework to perceive the whole surroundings based on a compact panoramic annular lens system and an online panorama unfolding process. To facilitate the training of PASS models, we leverage conventional FoV imaging datasets, bypassing the effort entailed in creating fully dense panoramic annotations. To consistently exploit the rich contextual cues in the unfolded panorama, we adapt our real-time ERF-PSPNet to predict semantically meaningful feature maps in different segments and fuse them to fulfill panoramic scene parsing. The innovation lies in the network adaptation that enables smooth and seamless segmentation, combined with an extended set of heterogeneous data augmentations to attain robustness in panoramic imagery. A comprehensive variety of experiments demonstrates the effectiveness for real-world surrounding perception in a single PASS, while the adaptation proposal proves exceptionally beneficial for state-of-the-art efficient networks.
    Funding: Ministerio de Economía y Competitividad; Comunidad de Madrid.
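
    PASS depends on an online panorama unfolding step that maps the ring-shaped image produced by the panoramic annular lens onto a rectangular panorama before segmentation. The paper's exact unfolding procedure is not reproduced here; the sketch below shows a minimal polar unwrap under the assumption that the ring centre and inner/outer radii are known from calibration.

        # Minimal sketch of unfolding a panoramic annular image (a ring around
        # the optical centre) into a rectangular panorama via a polar unwrap.
        # Centre and radii are assumptions; in practice they come from calibration.
        import numpy as np
        import cv2

        def unfold_annular(annular, cx, cy, r_in, r_out, out_w=2048):
            """Map the ring r_in..r_out around (cx, cy) to a (r_out - r_in) x out_w strip."""
            out_h = int(r_out - r_in)
            thetas = np.linspace(0.0, 2.0 * np.pi, out_w, endpoint=False)
            radii = np.linspace(r_in, r_out, out_h)

            theta_grid, r_grid = np.meshgrid(thetas, radii)
            map_x = (cx + r_grid * np.cos(theta_grid)).astype(np.float32)
            map_y = (cy + r_grid * np.sin(theta_grid)).astype(np.float32)

            return cv2.remap(annular, map_x, map_y, cv2.INTER_LINEAR)

        # Example with an assumed calibration (ring centred in the image,
        # inner/outer radii in pixels):
        # ring = cv2.imread("annular_frame.png")
        # panorama = unfold_annular(ring, cx=ring.shape[1] / 2,
        #                           cy=ring.shape[0] / 2, r_in=150, r_out=600)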

    Multi-Clip Video Editing from a Single Viewpoint

    We propose a framework for automatically generating multiple clips suitable for video editing by simulating pan-tilt-zoom camera movements within the frame of a single static camera. Assuming important actors and objects can be localized using computer vision techniques, our method requires only minimal user input to define the subject matter of each sub-clip. The composition of each sub-clip is automatically computed in a novel L1-norm optimization framework. Our approach encodes several common cinematographic practices into a single convex cost-function minimization problem, resulting in aesthetically pleasing sub-clips which can easily be edited together using off-the-shelf multi-clip video editing software. We demonstrate our approach on five video sequences of a live theatre performance by generating multiple synchronized sub-clips for each sequence.
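
    The composition of each sub-clip is obtained by minimizing a convex, L1-regularized cost over the virtual camera's crop-window trajectory. The paper's full cost function is not reproduced here; the sketch below only illustrates the general shape of such a problem with cvxpy, using two assumed terms (stay close to the detected subject, and penalize frame-to-frame motion in the L1 norm so the virtual camera favours piecewise-static shots) and illustrative weights.

        # Sketch of an L1-regularised convex problem for a smooth virtual pan
        # trajectory: keep the crop window near the detected subject while
        # penalising frame-to-frame motion in the L1 norm. The cost terms and
        # weights are illustrative, not the paper's exact model.
        import numpy as np
        import cvxpy as cp

        T = 120                                            # frames in the sub-clip
        rng = np.random.default_rng(0)
        subject_x = np.cumsum(rng.normal(0, 2, T)) + 960   # noisy subject x-position

        crop_x = cp.Variable(T)       # horizontal centre of the crop window
        lam = 5.0                     # smoothness weight

        data_term = cp.sum_squares(crop_x - subject_x)     # follow the subject
        smooth_term = cp.norm1(cp.diff(crop_x))            # sparse camera motion

        problem = cp.Problem(cp.Minimize(data_term + lam * smooth_term),
                             [crop_x >= 0, crop_x <= 1920])  # stay inside the frame
        problem.solve()

        print("optimal cost:", problem.value)
        print("first five crop centres:", np.round(crop_x.value[:5], 1))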

    A Survey on Human-aware Robot Navigation

    Intelligent systems are increasingly part of our everyday lives and have been integrated seamlessly to the point where it is difficult to imagine a world without them. Physical manifestations of those systems, on the other hand, in the form of embodied agents or robots, have so far been used only for specific applications and are often limited to functional roles (e.g., in industrial, entertainment, and military settings). Given the current growth and innovation in the research communities concerned with the topics of robot navigation, human-robot interaction, and human activity recognition, it seems that this might soon change. Robots are increasingly easy to obtain and use, and their acceptance in general is growing. However, the design of a socially compliant robot that can function as a companion needs to take various areas of research into account. This paper is concerned with the navigation aspect of a socially compliant robot and provides a survey of existing solutions for the relevant areas of research as well as an outlook on possible future directions.
    Comment: Robotics and Autonomous Systems, 202