
    Relating First-person and Third-person Vision

    Thanks to the availability and increasing popularity of wearable devices such as GoPro cameras, smartphones, and glasses, we have access to a plethora of videos captured from the first-person (egocentric) perspective. Capturing the world from the perspective of one's self, egocentric videos bear characteristics distinct from the more traditional third-person (exocentric) videos. In many computer vision tasks (e.g., identification, action recognition, face recognition, pose estimation), the human actors are the main focus, so detecting, localizing, and recognizing the human actor is often incorporated as a vital component. In an egocentric video, however, the person behind the camera is often the person of interest, which changes the nature of the task: the camera holder is usually not visible in the content of his/her own egocentric video. In other words, our knowledge about the visual appearance, pose, etc. of the egocentric camera holder is very limited, suggesting reliance on other cues in first-person videos. First- and third-person videos have been studied separately in the computer vision community, but the relationship between first- and third-person vision has yet to be fully explored. Relating these two views systematically could benefit many computer vision tasks and applications. This thesis studies this relationship in several aspects. We explore supervised and unsupervised approaches for relating these two views, seeking objectives such as identification, temporal alignment, and action classification. We believe this exploration can lead to a better understanding of the relationship between these two drastically different sources of information.
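One simple way to make the temporal-alignment objective above concrete (a minimal sketch, not the approach taken in the thesis) is to align frame features of the two views with dynamic time warping; the frame embeddings are assumed to be given:

```python
# Minimal sketch: align a first-person and a third-person feature sequence
# with dynamic time warping (DTW). Illustrative only; frame features
# (e.g. CNN embeddings of shape (T, D)) are assumed to exist already.
import numpy as np

def dtw_align(ego_feats: np.ndarray, exo_feats: np.ndarray):
    """Return the DTW cost matrix and an alignment path between
    (T1, D) and (T2, D) feature sequences."""
    t1, t2 = len(ego_feats), len(exo_feats)
    # Pairwise Euclidean distances between frames of the two views.
    dist = np.linalg.norm(ego_feats[:, None, :] - exo_feats[None, :, :], axis=-1)
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack the optimal warping path from the end to the start.
    path, i, j = [], t1, t2
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[1:, 1:], path[::-1]
```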

    Vehicle Tracking and Motion Estimation Based on Stereo Vision Sequences

    In this dissertation, a novel approach for estimating the trajectories of road vehicles such as cars, vans, or motorbikes from stereo image sequences is presented. Moving objects are detected and reliably tracked in real time from within a moving car. The resulting information on the pose and motion state of other moving objects with respect to the ego-vehicle is an essential basis for future driver assistance and safety systems, e.g., for collision prediction. The focus of this contribution is on oncoming traffic, while most existing work in the literature addresses tracking the lead vehicle. The overall approach is generic and scalable to a variety of traffic scenes, including inner-city, country road, and highway scenarios; a considerable part of this thesis addresses oncoming traffic at urban intersections. The parameters to be estimated include the 3D position and orientation of an object relative to the ego-vehicle, as well as the object's shape, dimensions, velocity, acceleration, and rotational velocity (yaw rate). The key idea is to derive these parameters from a set of tracked 3D points on the object's surface, which are registered to a time-consistent object coordinate system, by means of an extended Kalman filter. Combining the rigid 3D point cloud model with the dynamic model of a vehicle is one main contribution of this thesis. Vehicle tracking at intersections requires covering a wide range of object dynamics, since vehicles can turn quickly. Three different approaches for tracking objects during highly dynamic turn maneuvers, up to extreme maneuvers such as skidding, are presented and compared. These approaches allow an online adaptation of the filter parameter values, overcoming manual parameter tuning that would otherwise depend on the dynamics of the tracked object in the scene; this is the second main contribution. Further contributions include two initialization methods, robust outlier handling, a probabilistic approach for assigning new points to a tracked object, and a mid-level fusion of the vision-based approach with a radar sensor. The overall system is systematically evaluated on both simulated and real-world data. The experimental results show that the proposed system is able to accurately estimate object pose and motion parameters in a variety of challenging situations, including night scenes, quick turn maneuvers, and partial occlusions. The limits of the system are also carefully investigated.
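The core estimation step described above can be illustrated with a minimal sketch of an EKF prediction under a constant-turn-rate-and-velocity (CTRV) vehicle model; the state layout and the numerically linearized Jacobian are illustrative assumptions, not the dissertation's implementation:

```python
# Sketch of one EKF prediction step with a CTRV vehicle motion model,
# in the spirit of the filter described above. Illustrative only.
import numpy as np

def ctrv_f(x: np.ndarray, dt: float) -> np.ndarray:
    """CTRV motion model; state is [x, y, yaw, v, yaw_rate]."""
    px, py, yaw, v, w = x
    if abs(w) > 1e-6:
        px_n = px + v / w * (np.sin(yaw + w * dt) - np.sin(yaw))
        py_n = py + v / w * (np.cos(yaw) - np.cos(yaw + w * dt))
    else:  # straight-driving limit of the CTRV model
        px_n = px + v * np.cos(yaw) * dt
        py_n = py + v * np.sin(yaw) * dt
    return np.array([px_n, py_n, yaw + w * dt, v, w])

def ekf_predict(x: np.ndarray, P: np.ndarray, dt: float, Q: np.ndarray):
    """Predicted state and covariance; the Jacobian F is obtained by
    central differences here for brevity (a real-time system would use
    the analytic Jacobian)."""
    x_pred = ctrv_f(x, dt)
    F = np.zeros((5, 5))
    eps = 1e-5
    for k in range(5):
        d = np.zeros(5)
        d[k] = eps
        F[:, k] = (ctrv_f(x + d, dt) - ctrv_f(x - d, dt)) / (2 * eps)
    return x_pred, F @ P @ F.T + Q
```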

    EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding

    With the surge in attention to Egocentric Hand-Object Interaction (Ego-HOI), large-scale datasets such as Ego4D and EPIC-KITCHENS have been proposed. However, most current research is built on resources derived from third-person video action recognition. This inherent domain gap between first- and third-person action videos, which has not been adequately addressed before, makes current Ego-HOI research suboptimal. This paper rethinks the problem and proposes a new framework as an infrastructure to advance Ego-HOI recognition through Probing, Curation and Adaption (EgoPCA). We contribute comprehensive pre-train sets, balanced test sets, and a new baseline, complete with a training-finetuning strategy. With our new framework, we not only achieve state-of-the-art performance on Ego-HOI benchmarks but also build several new and effective mechanisms and settings to advance further research. We believe our data and findings will pave a new way for Ego-HOI understanding. Code and data are available at https://mvig-rhos.com/ego_pca (ICCV 2023).

    Dependability for declarative mechanisms: neural networks in autonomous vehicles decision making.

    Despite being introduced in 1958, neural networks appeared in numerous applications across different fields only in the last decade. This change was made possible by the reduced cost of the computing power required for deep neural networks and by the growing amount of data available for training sets. The 2012 ImageNet image classification competition is often used as an example of how neural networks became good candidates for applications at that time: during this competition, a neural-network-based solution won for the first time, and in the following editions all winning solutions were based on neural networks. Since then, neural networks have shown great results in several non-critical applications (image recognition, sound recognition, text analysis, etc.). There is growing interest in using them in critical applications, as their ability to generalize makes them good candidates for applications such as autonomous vehicles, but standards do not allow that yet. Autonomous driving functions are currently being researched by the industry with the final objective of producing, in the near future, fully autonomous vehicles as defined by the fifth level of the SAE International (Society of Automotive Engineers) classification. The autonomous driving process is usually decomposed into four parts: perception, where sensors get information from the environment; fusion, where the data from the different sensors is merged into one representation of the environment; decision, which uses this representation to decide what the vehicle's behavior should be and which commands to send to the actuators; and finally action, which implements these commands. In this thesis, following the interest of the company Stellantis, we focus on the decision part of this process, considering neural-network-based solutions. Automotive being a safety-critical domain, systems are required to implement and ensure dependability, and this is why the use of neural networks is not allowed at the moment: their lack of demonstrated safety forbids their use in such applications. Dependability methods for classical software systems are well known, but neural networks do not yet have similar mechanisms to guarantee trust in them. This is due to several reasons, among them the difficulty of testing applications with a quasi-infinite operational domain and whose functions are hard to define exhaustively in the specifications. Here lies the motivation of this thesis: how can we ensure the dependability of neural networks in the context of decision making for autonomous vehicles? Research is now being conducted on the dependability and safety of neural networks, with several approaches under consideration, and our research is motivated by the great potential in the safety-critical applications mentioned above. In this thesis, we focus on one category of methods that is a good candidate to ensure the dependability of neural networks by solving some of the problems of testing: formal verification for neural networks. These methods aim to prove that a neural network respects a safety property over an entire range of its input and output domains. Formal verification is already used in other domains and is seen as a trusted method to build confidence in a system, but for neural networks it remains a research topic with currently no industrial applications.
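To make the idea concrete, one common sound technique in this family is interval bound propagation (IBP), which pushes a whole input box through the layers and yields guaranteed output bounds; the toy two-layer ReLU network and property below are placeholders, not a real ACC controller:

```python
# Minimal sketch of interval bound propagation (IBP), one sound technique
# behind neural network formal verification. The network and property are
# toy stand-ins for illustration only.
import numpy as np

def ibp_linear(lo, hi, W, b):
    """Sound output interval of x -> W @ x + b for all x in [lo, hi]."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def verify_upper_bound(lo, hi, layers, threshold):
    """True if every output over the input box provably stays <= threshold."""
    for i, (W, b) in enumerate(layers):
        lo, hi = ibp_linear(lo, hi, W, b)
        if i < len(layers) - 1:  # ReLU (monotone) on hidden layers only
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    return bool(np.all(hi <= threshold)), (lo, hi)

# Toy property: for all inputs in the box [0, 1]^2 (e.g. normalized gap and
# relative speed), is the single output (e.g. commanded accel.) <= 2.0?
layers = [(np.array([[0.5, -0.3], [0.2, 0.8]]), np.zeros(2)),
          (np.array([[1.0, 1.0]]), np.zeros(1))]
holds, bounds = verify_upper_bound(np.zeros(2), np.ones(2), layers, 2.0)
print(holds, bounds)  # True, with guaranteed output interval [0.0, 1.5]
```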
The main contributions of this thesis are the following: a characterization of neural networks from a software development perspective, together with a corresponding classification of their faults, errors, and failures; the identification of a potential threat to the use of formal verification, the erroneous neural network model problem, which may lead to trusting a formally validated safety property that does not hold in real life; and an experiment implementing formal verification for neural networks in an autonomous driving application that is, to the best of our knowledge, the closest to industrial use. For this application, we chose to work with an ACC (Adaptive Cruise Control) function, an autonomous driving function that performs the longitudinal control of a vehicle. The experiment is conducted using a simulator and a neural network formal verification tool. The other contributions of the thesis are: a theoretical example of the erroneous neural network model problem and a practical example in our autonomous driving experiment; a proposal of detection and recovery mechanisms as a solution to the erroneous model problem mentioned above; an implementation of these detection and recovery mechanisms in our autonomous driving experiment; and a discussion of the difficulties and possible processes for implementing formal verification for neural networks, developed during our experiments.
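The detection-and-recovery idea can be sketched as a runtime monitor: trust the network's command only while the current input lies inside the formally verified region, and otherwise fall back to a conservative controller. The bounds, observation layout, and fallback below are illustrative placeholders, not the thesis implementation:

```python
# Illustrative runtime guard for the detection/recovery idea above. The
# verified input box, observation layout and fallback are made-up values.
import numpy as np

VERIFIED_LO = np.array([0.0, -10.0])   # e.g. [gap_m, rel_speed_mps] lower bounds
VERIFIED_HI = np.array([150.0, 10.0])  # upper bounds of the verified input box

def safe_acc_command(obs: np.ndarray, network, fallback) -> float:
    """Use the network only where its safety property was proven."""
    inside = np.all(obs >= VERIFIED_LO) and np.all(obs <= VERIFIED_HI)
    if inside:
        return float(network(obs))   # detection: property proven to hold here
    return float(fallback(obs))      # recovery: conservative rule-based control

# Example with trivial stand-ins: the observation lies outside the verified
# box, so the fallback (gentle braking) takes over.
cmd = safe_acc_command(np.array([200.0, 0.0]),
                       network=lambda o: 0.5,
                       fallback=lambda o: -1.0)
```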

    EgoCOL: Egocentric Camera pose estimation for Open-world 3D object Localization @Ego4D challenge 2023

    We present EgoCOL, an egocentric camera pose estimation method for open-world 3D object localization. Our method leverages sparse camera pose reconstructions in a twofold manner, from the video and from the scan independently, to estimate the camera poses of egocentric frames in 3D renders with high recall and precision. We extensively evaluate our method on the Visual Query (VQ) 3D object localization Ego4D benchmark. EgoCOL can estimate 62% and 59% more camera poses than the Ego4D baseline on the val and test sets, respectively, of the Ego4D Visual Queries 3D Localization challenge at CVPR 2023. Our code is publicly available at https://github.com/BCV-Uniandes/EgoCO
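Registering a per-video reconstruction into the scan's coordinate frame is one ingredient such a pipeline needs; a standard tool for this, assumed here purely for illustration (see the repository for the actual method), is a least-squares similarity (Umeyama) alignment of corresponding camera centers:

```python
# Sketch: align camera centers from a per-video SfM reconstruction to the
# scan's coordinate frame with a least-squares similarity transform
# (Umeyama, 1991). Illustrative only; not taken from the EgoCOL code.
import numpy as np

def umeyama(src: np.ndarray, dst: np.ndarray):
    """Find scale s, rotation R, translation t with dst ~= s * R @ src + t.
    src, dst: (N, 3) arrays of corresponding camera centers, N >= 3."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    sc, dc = src - mu_s, dst - mu_d
    cov = dc.T @ sc / len(src)             # cross-covariance of the two sets
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))     # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / sc.var(axis=0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t
```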

    Survey of computer vision algorithms and applications for unmanned aerial vehicles

    This paper presents a complete review of computer vision algorithms and vision-based intelligent applications developed in the field of Unmanned Aerial Vehicles (UAVs) over the last decade. During this time, the evolution of relevant technologies for UAVs, such as component miniaturization, the increase in computational capabilities, and the evolution of computer vision techniques, has allowed important advances in UAV technologies and applications. In particular, computer vision technologies integrated in UAVs enable cutting-edge solutions to aerial perception difficulties, such as visual navigation algorithms, obstacle detection and avoidance, and aerial decision-making. These technologies have opened a wide spectrum of applications for UAVs beyond the classic military and defense purposes. Unmanned Aerial Vehicles and computer vision are common topics in expert systems, and thanks to recent advances in perception technologies, modern intelligent applications are being developed to enhance autonomous UAV positioning or to avoid aerial collisions automatically, among others. The presented survey therefore focuses on artificial perception applications that represent important advances of recent years in the expert systems field related to Unmanned Aerial Vehicles. The most significant advances in this field are presented, able to overcome fundamental technical challenges such as visual odometry, obstacle detection, mapping and localization, et cetera, and they are analyzed in terms of their capabilities and potential utility. Moreover, the applications and UAVs are divided and categorized according to different criteria. This research is supported by the Spanish Government through the CICYT projects (TRA2015-63708-R and TRA2013-48314-C3-1-R).
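As a minimal illustration of the visual-odometry building block the survey covers, the sketch below estimates two-frame egomotion with OpenCV; the calibration matrix K and grayscale uint8 input frames are assumed given, and monocular VO recovers translation only up to an unknown scale:

```python
# Minimal two-frame monocular visual odometry step with OpenCV.
# img0, img1: grayscale uint8 frames; K: 3x3 intrinsic matrix from calibration.
import cv2
import numpy as np

def two_frame_egomotion(img0, img1, K):
    # Detect sparse corners in frame 0 and track them into frame 1.
    p0 = cv2.goodFeaturesToTrack(img0, maxCorners=500, qualityLevel=0.01,
                                 minDistance=7)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(img0, img1, p0, None)
    good0 = p0[status.ravel() == 1]
    good1 = p1[status.ravel() == 1]
    # Robustly estimate the essential matrix, then decompose into R, t
    # (t has unit norm: monocular scale is unobservable).
    E, inliers = cv2.findEssentialMat(good1, good0, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, good1, good0, K, mask=inliers)
    return R, t  # rotation and direction of travel between the two frames
```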

    On Patching Learning Discrepancies in Neural Network Training

    Neural networks' ability to model data patterns has proved immensely useful in a plethora of practical applications. However, using the physical world's data can be problematic, since it is often cluttered, crowded with scattered insignificant patterns, contains unusual compositions, and is widely infiltrated with biases and imbalances. Consequently, training a neural network to find meaningful patterns in seas of chaotic data points becomes virtually as hard as finding a needle in a haystack. Specifically, attempting to simulate real-world multi-modal noisy distributions with high precision leads the network to learn an ill-informed inference distribution. In this work, we discuss four techniques to mitigate common discrepancies between real-world representations and the training distribution learned by the network: diverse sampling, objective generalization, and domain and task adaptation, introduced as priors in learning the primary objective. For each of these techniques, we contrast basic training, where no prior is applied, with our proposed method, and we show the advantage of guiding the training distribution toward the critical patterns in real-world data using the suggested approaches. We examine these discrepancy-mitigation techniques on a variety of vision tasks ranging from image generation and retrieval to video summarization and actionness ranking.
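One common instantiation of a diverse-sampling prior (not necessarily the exact scheme used in this work) is inverse-class-frequency sampling, so that rare patterns are seen as often as dominant ones during training:

```python
# Sketch of a diverse-sampling prior via inverse-class-frequency weighting:
# examples from rare classes are drawn more often, balancing each batch in
# expectation. Toy data stands in for a real imbalanced dataset.
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy imbalanced dataset: 900 examples of class 0, 100 of class 1.
labels = torch.cat([torch.zeros(900, dtype=torch.long),
                    torch.ones(100, dtype=torch.long)])
data = torch.randn(1000, 16)

class_counts = torch.bincount(labels).float()
weights = (1.0 / class_counts)[labels]        # per-example inverse frequency
sampler = WeightedRandomSampler(weights, num_samples=len(labels),
                                replacement=True)
loader = DataLoader(TensorDataset(data, labels), batch_size=32,
                    sampler=sampler)
# Batches drawn from `loader` are now roughly class-balanced in expectation.
```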