9,403 research outputs found

    Temporal Model Adaptation for Person Re-Identification

    Full text link
    Person re-identification is an open and challenging problem in computer vision. The majority of efforts have been spent either on designing the best feature representation or on learning the optimal matching metric, and most approaches have neglected the problem of adapting the selected features or the learned model over time. To address this problem, we propose a temporal model adaptation scheme with a human in the loop. We first introduce a similarity-dissimilarity learning method which can be trained in an incremental fashion by means of a stochastic Alternating Direction Method of Multipliers (ADMM) optimization procedure. Then, to achieve temporal adaptation with limited human effort, we exploit a graph-based approach to present to the user only the most informative probe-gallery matches that should be used to update the model. Results on three datasets have shown that our approach performs on par with or even better than state-of-the-art approaches while reducing the manual pairwise labeling effort by about 80%.
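    The abstract names stochastic ADMM as the optimizer behind the incremental similarity-dissimilarity learning. As a rough illustration of that family of updates (not the paper's actual formulation; the pair hinge loss, the PSD splitting, and all hyperparameters below are assumptions), a Mahalanobis-style metric M can be updated one labeled pair at a time by alternating a stochastic gradient step, a projection onto the PSD cone, and a dual update:

        import numpy as np

        def stochastic_admm_metric(pairs, labels, d, rho=1.0, eta=0.05, epochs=5, seed=0):
            """Toy stochastic ADMM for metric learning (illustrative sketch only).

            pairs:  list of (xa, xb) feature vectors of dimension d
            labels: +1 for similar pairs, -1 for dissimilar pairs
            Splitting: min_M loss(M) + indicator_PSD(Z)  subject to  M = Z.
            """
            rng = np.random.default_rng(seed)
            M, Z, U = np.eye(d), np.eye(d), np.zeros((d, d))
            order = np.arange(len(pairs))
            for _ in range(epochs):
                rng.shuffle(order)
                for i in order:
                    xa, xb = pairs[i]
                    diff = (xa - xb)[:, None]
                    dist = float(diff.T @ M @ diff)
                    # hinge loss: similar pairs (y=+1) should fall below the
                    # unit threshold, dissimilar pairs (y=-1) above it
                    y = labels[i]
                    grad = y * (diff @ diff.T) if 1.0 - y * (1.0 - dist) > 0 else 0.0
                    # M-update: stochastic gradient on loss + augmented term
                    M = M - eta * (grad + rho * (M - Z + U))
                    # Z-update: project onto the PSD cone to keep a valid metric
                    w, V = np.linalg.eigh(M + U)
                    Z = (V * np.clip(w, 0.0, None)) @ V.T
                    # dual ascent on the consensus constraint M = Z
                    U = U + (M - Z)
            return Z

    In this reading, the human-in-the-loop component decides which labeled pairs enter the stream consumed by the inner loop.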

    Building with Drones: Accurate 3D Facade Reconstruction using MAVs

    Full text link
    Automatic reconstruction of 3D models from images using multi-view Structure-from-Motion (SfM) methods has been one of the most fruitful outcomes of computer vision. These advances, combined with the growing popularity of Micro Aerial Vehicles as an autonomous imaging platform, have made 3D vision tools ubiquitous for a large number of Architecture, Engineering and Construction applications among audiences mostly unskilled in computer vision. However, to obtain high-resolution and accurate reconstructions of a large-scale object using SfM, there are many critical constraints on the quality of image data, which often become sources of inaccuracy because current 3D reconstruction pipelines do not help users determine the fidelity of the input data during image acquisition. In this paper, we present and advocate a closed-loop interactive approach that performs incremental reconstruction in real time and gives users online feedback about quality parameters such as Ground Sampling Distance (GSD) and image redundancy on a surface mesh. We also propose a novel multi-scale camera network design to prevent scene drift caused by incremental map building, and release the first multi-scale image sequence dataset as a benchmark. Further, we evaluate our system on real outdoor scenes, and show that our interactive pipeline combined with a multi-scale camera network approach provides compelling accuracy in multi-view reconstruction tasks when compared against state-of-the-art methods.
    Comment: 8 pages, 2015 IEEE International Conference on Robotics and Automation (ICRA '15), Seattle, WA, US
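    The GSD feedback mentioned above follows from the standard pinhole relation between flying height, focal length, and sensor geometry. This helper implements that generic textbook formula (not code from the paper), giving metres of ground covered per image pixel for a nadir-looking camera:

        def ground_sampling_distance(altitude_m, focal_length_mm,
                                     sensor_width_mm, image_width_px):
            """Metres of ground per pixel for a nadir view (standard formula)."""
            return (altitude_m * sensor_width_mm) / (focal_length_mm * image_width_px)

        # e.g. 30 m altitude, 4.5 mm focal length, 6.17 mm sensor, 4000 px wide image
        # gives roughly 0.010 m/px, i.e. about 1 cm of ground per pixel
        print(ground_sampling_distance(30.0, 4.5, 6.17, 4000))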

    Fast Rotated Bounding Box Annotations for Object Detection

    Get PDF
    Traditionally, object detection models use large amounts of annotated data, and axis-aligned bounding boxes (AABBs) are often chosen as the image annotation technique for both training and prediction. The purpose of annotating the objects in the images is to indicate the regions of interest with the corresponding labels. Accurate object annotations help computer vision models understand the distinct patterns of image features to recognize and localize different classes of objects. However, AABBs are often a poor fit for elongated object instances. It is also challenging to localize objects with AABBs in densely packed aerial images because of overlapping adjacent bounding boxes. Alternatively, rectangular annotations that can be oriented diagonally, also known as rotated bounding boxes (RBBs), provide a much tighter fit for elongated objects and reduce the potential bounding box overlap between adjacent objects. However, RBBs are much more time-consuming and tedious to annotate than AABBs for large datasets. In this work, we propose a novel annotation tool named FastRoLabelImg (Fast Rotated LabelImg) for producing high-quality RBB annotations with little time and effort. The tool generates accurate RBB proposals for objects of interest as the annotator makes progress through the dataset. It can also adapt available AABBs to generate RBB proposals. Furthermore, a multipoint box drawing system is provided to reduce manual RBB annotation time compared to existing methods. Across three diverse datasets, we show that the proposal generation methods can achieve up to an 88.9% reduction in manual workload. A participant study shows that our proposed manual annotation method is twice as fast as the existing system at the same accuracy. Lastly, we publish RBB annotations for two public datasets to motivate future research toward more capable object detection algorithms that can predict RBBs.
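    To make the AABB-versus-RBB distinction concrete: a rotated box is commonly parameterized as (cx, cy, w, h, theta). The sketch below uses hypothetical helper names (it is not FastRoLabelImg code) to show how an existing AABB can seed an RBB proposal with zero rotation, and how the rotated corners are recovered:

        import numpy as np

        def aabb_to_rbb(xmin, ymin, xmax, ymax):
            # an AABB is just an RBB with zero rotation; a tool can use this
            # as the initial proposal and let the annotator adjust the angle
            return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0,
                    xmax - xmin, ymax - ymin, 0.0)

        def rbb_corners(cx, cy, w, h, theta):
            # four corners of the box in image coordinates, rotated by theta
            local = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
            c, s = np.cos(theta), np.sin(theta)
            R = np.array([[c, -s], [s, c]])
            return local @ R.T + np.array([cx, cy])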

    Editing faces in videos

    Get PDF
    Editing faces in movies is of interest to the special effects industry. We aim at producing effects such as the addition of accessories that interact correctly with the face, or replacing the face of a stuntman with the face of the main actor. The system introduced in this thesis is based on a 3D generative face model. Using a 3D model makes it possible to edit the face in the semantic space of pose, expression, and identity instead of pixel space, and its 3D nature allows modelling of the light interaction. In our system we first reconstruct, in all frames of a monocular input video, the 3D face (which deforms due to expressions and speech), the lighting, and the camera. The face is then edited by substituting expressions or identities with those of another video sequence, or by adding virtual objects into the scene. The manipulated 3D scene is rendered back into the original video, correctly simulating the interaction of the light with the deformed face and virtual objects. We describe all steps necessary to build and apply the system: registration of training faces to learn a generative face model, semi-automatic annotation of the input video, fitting of the face model to the input video, editing of the fit, and rendering of the resulting scene. While describing the application we introduce a host of new methods, each of which is of interest in its own right. We start with a new method to register 3D face scans for use as training data for the face model. For video preprocessing, a new interest point tracking and 2D Active Appearance Model fitting technique is proposed. For robust fitting we introduce background modelling, model-based stereo techniques, and a more accurate light model.
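    The semantic editing described here relies on a linear generative face model, where a face shape is a mean plus identity and expression offsets; in that setting, replacing a stuntman's identity with the actor's while keeping the tracked expression reduces to swapping one coefficient vector. A minimal sketch under that standard morphable-model assumption (toy dimensions and random bases, not the thesis' learned model):

        import numpy as np

        def synthesize_shape(mean, id_basis, expr_basis, alpha, delta):
            """3N-vector of vertex positions: mean + identity + expression."""
            return mean + id_basis @ alpha + expr_basis @ delta

        # toy model: N = 4 vertices, 3 identity and 2 expression components
        rng = np.random.default_rng(0)
        mean = rng.normal(size=12)
        B_id, B_ex = rng.normal(size=(12, 3)), rng.normal(size=(12, 2))
        alpha_stunt, delta_stunt = rng.normal(size=3), rng.normal(size=2)
        alpha_actor = rng.normal(size=3)
        # identity replacement: keep the tracked expression, swap the identity
        edited = synthesize_shape(mean, B_id, B_ex, alpha_actor, delta_stunt)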

    Analysis domain model for shared virtual environments

    Get PDF
    The field of shared virtual environments, which also encompasses online games and social 3D environments, has a system landscape consisting of multiple solutions that share substantial functional overlap. However, there is little system interoperability between the different solutions. A shared virtual environment has an associated problem domain that is highly complex, posing difficult challenges for the development process, starting with the architectural design of the underlying system. This paper makes two main contributions. The first is a broad domain analysis of shared virtual environments, which enables developers to understand the whole rather than only its parts. The second is a reference domain model for discussing and describing solutions: the Analysis Domain Model.

    Learning Multi-Modal Self-Awareness Models Empowered by Active Inference for Autonomous Vehicles

    Get PDF
    For autonomous agents to coexist with the real world, it is essential to anticipate the dynamics and interactions in their surroundings. Autonomous agents can use models of the human brain to learn how to respond to the actions of other participants in the environment and proactively coordinate with its dynamics. Modeling brain learning procedures is challenging for multiple reasons, such as stochasticity, multi-modality, and unobservable intents. A long-neglected problem is understanding and processing environmental perception data from multisensory information at the cognitive-psychology level of human brain processing. The key to solving this problem is to construct a computing model for autonomous driving with selective attention and self-learning ability, one that possesses mechanisms for memorizing, inferring, and experiential updating, enabling it to cope with changes in the external world. A practical self-driving approach should therefore be open to more than the traditional computing structure of perception, planning, decision-making, and control. It is necessary to explore a probabilistic framework that parallels human attention, reasoning, learning, and decision-making mechanisms for interactive behavior, and to build an intelligent system inspired by biological intelligence. This thesis presents a multi-modal self-awareness module for autonomous driving systems. The techniques proposed in this research are evaluated on their ability to model proper driving behavior in dynamic environments, which is vital in autonomous driving for both action planning and safe navigation. First, this thesis adapts generative incremental learning to the problem of imitation learning. It extends the imitation learning framework to the multi-agent setting, where observations gathered from multiple agents inform the training process of a learning agent that tracks a dynamic target. Since driving has associated rules, the second part of this thesis introduces a method to provide optimal knowledge to the imitation learning agent through an active inference approach. Active inference is the selective gathering of information during prediction to increase a predictive machine learning model's performance. Finally, to address the inference complexity and solve the exploration-exploitation dilemma in unobserved environments, an exploring action-oriented model is introduced that pulls together imitation learning and active inference methods inspired by the brain's learning procedure.
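    Active inference, as invoked in the second part, typically scores candidate actions by their expected free energy: a risk term (divergence of predicted outcomes from preferred ones) plus an ambiguity term (expected observation noise). The sketch below is the textbook discrete-state decomposition, not the thesis' driving model; all distributions are assumed inputs:

        import numpy as np

        def expected_free_energy(q_outcome, log_preferences, ambiguity):
            """G = risk + ambiguity for one candidate action (policy)."""
            # risk: KL divergence from predicted outcomes to preferred outcomes
            q = q_outcome + 1e-12
            risk = float(np.sum(q * (np.log(q) - log_preferences)))
            return risk + ambiguity

        def select_action(candidates):
            # candidates: list of (action, q_outcome, log_preferences, ambiguity);
            # the agent picks the action with the lowest expected free energy
            G = [expected_free_energy(q, lp, a) for _, q, lp, a in candidates]
            return candidates[int(np.argmin(G))][0]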
