
    Seguimento ativo de agentes dinâmicos multivariados usando informação vectorial (Active tracking of multivariate dynamic agents using vectorial information)

    (PhD thesis in Mechanical Engineering.) The main topic of this thesis is the study of advanced safety systems, in the field of automotive safety, based on the prediction of the movements and actions of external agents. This thesis proposes to treat the agents as dynamic entities with their own motivations and constraints. Accordingly, new target-tracking techniques are proposed that take the targets' specificities into account, and two different types of agents are studied in detail: automobile vehicles and pedestrians.
For automobile vehicles, a technique is proposed to improve motion prediction through the use of advanced motion models that correctly represent the constraints present in this kind of vehicle. To this end, advanced target-tracking algorithms coupled with nonholonomic motion models were developed. These algorithms make use of vectorial range data supplied by laser range sensors. Concerning pedestrians, due to the complexity of the problem (mainly the lack of any specific motion constraint), it is proposed that analysis of pedestrians' body language allows early detection of their intentions and movements. Accordingly, pedestrian pose estimation algorithms specially adapted to the field of automotive safety were developed; these algorithms use 3D point-cloud data obtained with a stereo camera. The various algorithms were tested in experiments conducted in real conditions.
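The nonholonomic motion models mentioned above can be illustrated with a kinematic bicycle model, a common choice for vehicle tracking. This is a minimal sketch of the general idea, not the thesis's exact model; the state layout, wheelbase parameter `L`, and function name are illustrative assumptions.

```python
import math

def bicycle_predict(x, y, theta, v, phi, L, dt):
    """One prediction step of a kinematic bicycle model.

    The nonholonomic constraint: the vehicle can only move along its
    heading theta; lateral slip is not modelled.
    x, y -- position; theta -- heading; v -- speed;
    phi  -- steering angle; L -- wheelbase; dt -- time step.
    """
    x_new = x + v * math.cos(theta) * dt
    y_new = y + v * math.sin(theta) * dt
    theta_new = theta + (v / L) * math.tan(phi) * dt
    return x_new, y_new, theta_new

# Straight-line motion: a zero steering angle leaves the heading unchanged.
x1, y1, th1 = bicycle_predict(0.0, 0.0, 0.0, 10.0, 0.0, 2.5, 0.1)
```

In a tracker, a step like this would serve as the process model of a (typically extended or unscented) Kalman filter, with the laser range data driving the update step.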

    Learning, Moving, And Predicting With Global Motion Representations

    In order to effectively respond to and influence the world they inhabit, animals and other intelligent agents must understand and predict the state of the world and its dynamics. An agent that can characterize how the world moves is better equipped to engage it. Current methods of motion computation rely on local representations of motion (such as optical flow) or simple, rigid global representations (such as camera motion). These methods are useful, but they are difficult to estimate reliably and limited in their applicability to real-world settings, where agents frequently must reason about complex, highly nonrigid motion over long time horizons. In this dissertation, I present methods developed with the goal of building more flexible and powerful notions of motion needed by agents facing the challenges of a dynamic, nonrigid world. This work is organized around a view of motion as a global phenomenon that is not adequately addressed by local or low-level descriptions, but that is best understood when analyzed at the level of whole images and scenes. I develop methods to: (i) robustly estimate camera motion from noisy optical flow estimates by exploiting the global, statistical relationship between the optical flow field and camera motion under projective geometry; (ii) learn representations of visual motion directly from unlabeled image sequences using learning rules derived from a formulation of image transformation in terms of its group properties; (iii) predict future frames of a video by learning a joint representation of the instantaneous state of the visual world and its motion, using a view of motion as transformations of world state. I situate this work in the broader context of ongoing computational and biological investigations into the problem of estimating motion for intelligent perception and action.
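As a simplified illustration of contribution (i), not the dissertation's actual method: under the classical instantaneous-motion (Longuet-Higgins & Prazdny) equations, the rotational component of camera motion relates linearly to the flow field, so it can be recovered by least squares across the whole image. The pure-rotation assumption and function interface below are mine.

```python
import numpy as np

def estimate_rotation(points, flows):
    """Least-squares estimate of camera angular velocity (wx, wy, wz)
    from an optical-flow field, assuming pure rotation and unit focal
    length. Each flow vector gives two linear equations in omega.

    points -- (N, 2) normalized image coordinates (x, y)
    flows  -- (N, 2) measured flow vectors (u, v)
    """
    x, y = points[:, 0], points[:, 1]
    A_u = np.stack([x * y, -(1 + x**2), y], axis=1)   # u = A_u @ omega
    A_v = np.stack([1 + y**2, -x * y, -x], axis=1)    # v = A_v @ omega
    A = np.concatenate([A_u, A_v], axis=0)
    b = np.concatenate([flows[:, 0], flows[:, 1]])
    omega, *_ = np.linalg.lstsq(A, b, rcond=None)
    return omega

# Synthetic check: flow generated from a known rotation is recovered.
rng = np.random.default_rng(0)
pts = rng.uniform(-0.5, 0.5, size=(50, 2))
w_true = np.array([0.01, -0.02, 0.005])
x, y = pts[:, 0], pts[:, 1]
u = w_true[0] * x * y - w_true[1] * (1 + x**2) + w_true[2] * y
v = w_true[0] * (1 + y**2) - w_true[1] * x * y - w_true[2] * x
w_est = estimate_rotation(pts, np.stack([u, v], axis=1))
```

The dissertation's point is that pooling many noisy flow vectors into one global estimate like this is far more robust than trusting any single local measurement.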

    Mo2Cap2: Real-time Mobile 3D Motion Capture with a Cap-mounted Fisheye Camera

    We propose the first real-time approach for the egocentric estimation of 3D human body pose in a wide range of unconstrained everyday activities. This setting has a unique set of challenges, such as mobility of the hardware setup, and robustness to long capture sessions with fast recovery from tracking failures. We tackle these challenges based on a novel lightweight setup that converts a standard baseball cap to a device for high-quality pose estimation based on a single cap-mounted fisheye camera. From the captured egocentric live stream, our CNN-based 3D pose estimation approach runs at 60 Hz on a consumer-level GPU. In addition to the novel hardware setup, our other main contributions are: 1) a large ground-truth training corpus of top-down fisheye images and 2) a novel disentangled 3D pose estimation approach that takes the unique properties of the egocentric viewpoint into account. As shown by our evaluation, we achieve lower 3D joint error as well as better 2D overlay than the existing baselines.

    Visuelle Detektion unabhängig bewegter Objekte durch einen bewegten monokularen Beobachter (Visual detection of independently moving objects by a moving monocular observer)

    The development of a driver assistance system supporting drivers in complex intersection situations would be a major achievement for traffic safety, since many traffic accidents happen in such situations. While this is a highly complex task that has not yet been accomplished, this thesis focuses on one important and obligatory aspect of such systems: the visual detection of independently moving objects. Information about moving objects can, for example, be used in an attention guidance system, which is a central component of any complete intersection assistant system. The decision to base such a system on visual input had two reasons: (i) humans gather their information to a large extent visually, and (ii) cameras are inexpensive and already widely used in luxury and professional vehicles for specific applications. To mimic the articulated human head and eyes, an agile camera system is desirable; to avoid heavy and sensitive stereo rigs, a small and lightweight monocular camera mounted on a pan-tilt unit was chosen as the input device. In this thesis, information about moving objects has been used to develop a prototype of an attention guidance system. It is based on the analysis of sequences from a single freely moving camera and on measurements from inertial sensors rigidly coupled with the camera system.
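The core idea behind such a detector can be sketched as follows (a deliberately minimal illustration, not the thesis's pipeline): once the observer's ego-motion is known from inertial sensing and image analysis, points whose measured flow disagrees with the flow that ego-motion predicts are candidates for independent motion. The threshold and interface are assumptions.

```python
import numpy as np

def detect_independent_motion(measured_flow, ego_flow, thresh=1.0):
    """Flag image points whose measured optical flow deviates from the
    flow predicted by the observer's own (ego-)motion.

    measured_flow, ego_flow -- (N, 2) flow vectors in pixels
    thresh -- residual magnitude (pixels) above which a point is
              considered independently moving.
    Returns a boolean mask of length N.
    """
    residual = np.linalg.norm(measured_flow - ego_flow, axis=1)
    return residual > thresh

ego = np.zeros((4, 2))                     # static observer for simplicity
meas = np.array([[0.1, 0.0], [2.5, 0.0],   # two nearly static points and
                 [0.0, 0.2], [0.0, 3.0]])  # two independently moving ones
mask = detect_independent_motion(meas, ego, thresh=1.0)
```

In practice the predicted ego-flow depends on the camera rotation (from the inertial sensors and pan-tilt unit) and, for translating observers, on scene depth, which is what makes the monocular case hard.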

    An Outlook into the Future of Egocentric Vision

    What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward-facing cameras and digital overlays, is expected to be integrated into our everyday lives. To understand this gap, the article starts by envisaging the future through character-based stories, showcasing through examples the limitations of current technology. We then provide a mapping between this future and previously defined research tasks. For each task, we survey its seminal works, current state-of-the-art methodologies and available datasets, then reflect on shortcomings that limit its applicability to future research. Note that this survey focuses on software models for egocentric vision, independent of any specific hardware. The paper concludes with recommendations for areas of immediate exploration so as to unlock our path to the future of always-on, personalised and life-enhancing egocentric vision. Comment: We invite comments, suggestions and corrections here: https://openreview.net/forum?id=V3974SUk1

    Multiple Human Pose Estimation with Temporally Consistent 3D Pictorial Structures

    Multiple human 3D pose estimation from multiple camera views is a challenging task in unconstrained environments. Each individual has to be matched across each view and then the body pose has to be estimated. Additionally, the body pose of every individual changes in a consistent manner over time. To address these challenges, we propose a temporally consistent 3D Pictorial Structures model (3DPS) for multiple human pose estimation from multiple camera views. Our model builds on the 3D Pictorial Structures to introduce the notion of temporal consistency between the inferred body poses. We derive this property by relying on multi-view human tracking. Identifying each individual before inference significantly reduces the size of the state space and positively influences the performance as well. To evaluate our method, we use two challenging multiple human datasets in unconstrained environments. We compare our method with the state-of-the-art approaches and achieve better results.
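The notion of temporal consistency described above amounts to penalizing large frame-to-frame changes of the inferred joints. A minimal sketch of such a penalty term (an assumption on my part, not the paper's exact energy function):

```python
import numpy as np

def temporal_smoothness_cost(poses, lam=1.0):
    """Temporal-consistency penalty over a sequence of 3D body poses.

    poses -- (T, J, 3) array: T frames, J joints, 3D joint positions.
    Returns lam times the summed squared frame-to-frame joint
    displacement, the kind of term a temporally consistent model adds
    to the per-frame pose likelihood.
    """
    diffs = np.diff(poses, axis=0)   # (T-1, J, 3) displacements
    return lam * np.sum(diffs ** 2)

static = np.zeros((5, 14, 3))        # a perfectly still 14-joint skeleton
moving = static.copy()
moving[1:, 0, 0] = 1.0               # one joint jumps after frame 0
```

A static sequence incurs zero cost, while the single jump contributes one unit; during inference, such a term biases the model toward poses that evolve smoothly.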

    RGB-D datasets using Microsoft Kinect or similar sensors: a survey

    RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, with those of the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance for benchmarking the state-of-the-art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and the difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.

    Vision-Guided State Estimation and Control of Robotic Manipulators Which Lack Proprioceptive Sensors

    This paper presents a vision-based approach for estimating the configuration of, and providing control signals for, an under-sensored robot manipulator using a single monocular camera. Some remote manipulators, used for decommissioning tasks in the nuclear industry, lack proprioceptive sensors because electronics are vulnerable to radiation. Additionally, even if proprioceptive joint sensors could be retrofitted, such heavy-duty manipulators are often deployed on mobile vehicle platforms, which are significantly and erratically perturbed when powerful hydraulic drilling or cutting tools are deployed at the end-effector. In these scenarios, it would be beneficial to use external sensory information, e.g. vision, for estimating the robot configuration with respect to the scene or task. Conventional visual servoing methods typically rely on joint encoder values for controlling the robot. In contrast, our framework assumes that no joint encoders are available, and estimates the robot configuration by visually tracking several parts of the robot, and then enforcing equality between a set of transformation matrices which relate the frames of the camera, world and tracked robot parts. To accomplish this, we propose two alternative methods based on optimisation. We evaluate the performance of our developed framework by visually tracking the pose of a conventional robot arm, where the joint encoders are used to provide ground truth for evaluating the precision of the vision system. Additionally, we evaluate the precision with which visual feedback can be used to control the robot's end-effector to follow a desired trajectory.
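The transformation-matrix equality described above can be illustrated in a stripped-down planar case (a sketch under strong assumptions: one revolute joint, noiseless 2D poses, closed-form recovery instead of the paper's optimisation): the chain `T_cam_link = T_cam_base @ T_base_link(theta)` lets the unmeasured joint angle be read off from two visually tracked frames.

```python
import numpy as np

def rot2d(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def make_pose(theta, t):
    """3x3 homogeneous planar pose from angle theta and translation t."""
    T = np.eye(3)
    T[:2, :2] = rot2d(theta)
    T[:2, 2] = t
    return T

def estimate_joint_angle(T_cam_base, T_cam_link):
    """Recover a single revolute joint angle without encoders by
    enforcing consistency between two visually tracked frames:
    T_base_link = inv(T_cam_base) @ T_cam_link, then read the angle
    off the rotation part.
    """
    T_base_link = np.linalg.inv(T_cam_base) @ T_cam_link
    return np.arctan2(T_base_link[1, 0], T_base_link[0, 0])

# Synthetic check: a joint angle of 0.7 rad is recovered from the poses.
T_cb = make_pose(0.3, [1.0, 2.0])           # camera -> base
T_cl = T_cb @ make_pose(0.7, [0.5, 0.0])    # camera -> link via the joint
theta_est = estimate_joint_angle(T_cb, T_cl)
```

With noisy 3D tracking of several links, this closed-form step becomes the optimisation problem the paper solves: find the joint values that best reconcile all the tracked transformation chains simultaneously.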