55 research outputs found

    Coarse-to-Fine Adaptive People Detection for Video Sequences by Maximizing Mutual Information

    Full text link
    Applying people detectors to unseen data is challenging since pattern distributions, such as viewpoints, motion, poses, backgrounds, occlusions, and people sizes, may differ significantly from those of the training dataset. In this paper, we propose a coarse-to-fine framework to adapt people detectors frame by frame during runtime classification, without requiring any additional manually labeled ground truth beyond the offline training of the detection model. The adaptation makes use of the mutual information of multiple detectors, i.e., similarities and dissimilarities estimated and agreed by pair-wise correlating their outputs. Globally, the proposed adaptation discriminates between relevant instants in a video sequence, i.e., it identifies the representative frames for an adaptation of the system. Locally, it identifies the best configuration (i.e., detection threshold) of each detector under analysis by maximizing the mutual information. The proposed coarse-to-fine approach does not require training the detectors for each new scenario and uses standard people detector outputs, i.e., bounding boxes. The experimental results demonstrate that the proposed approach outperforms state-of-the-art detectors whose optimal threshold configurations are determined beforehand and fixed from offline training data. This work has been partially supported by the Spanish government under the project TEC2014-53176-R (HAVideo).
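The local step above selects, per detector, the score threshold that best agrees with the other detectors' outputs. The following is a simplified illustrative sketch of that idea, not the paper's exact algorithm: it scores agreement between two detectors by IoU matching of their surviving boxes (a stand-in for the paper's mutual-information criterion), and all function names are hypothetical.

```python
# Sketch: pick detector A's score threshold by maximizing pairwise agreement
# with detector B's output, measured by IoU matching. This substitutes a
# simple matched-minus-unmatched count for the paper's mutual information.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def agreement(boxes_a, boxes_b, iou_thr=0.5):
    """Matched boxes minus unmatched ones: rewards consensus between the two
    detectors and penalizes detections the other detector does not corroborate."""
    matched = sum(1 for a in boxes_a if any(iou(a, b) >= iou_thr for b in boxes_b))
    return 2 * matched - len(boxes_a)

def select_threshold(dets_a, dets_b, candidates):
    """dets_* are lists of (box, score). Return the threshold for detector A
    that maximizes agreement with detector B's full output."""
    boxes_b = [box for box, _ in dets_b]
    best_thr, best_score = candidates[0], float("-inf")
    for thr in candidates:
        boxes_a = [box for box, s in dets_a if s >= thr]
        score = agreement(boxes_a, boxes_b)
        if score > best_score:
            best_thr, best_score = thr, score
    return best_thr
```

With a high-scoring true detection corroborated by the second detector and a low-scoring false positive that is not, the higher candidate threshold wins.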

    Systematic Adaptation of Communication-focused Machine Learning Models from Real to Virtual Environments for Human-Robot Collaboration

    Full text link
    Virtual reality has proved to be useful in applications in several fields, ranging from gaming, medicine, and training to the development of interfaces that enable human-robot collaboration. It empowers designers to explore applications outside the constraints posed by the real-world environment and to develop innovative solutions and experiences. Hand gesture recognition, which has been a topic of much research and subsequent commercialization in the real world, has been possible because of the creation of large, labelled datasets. In order to utilize the power of natural and intuitive hand gestures in the virtual domain for enabling embodied teleoperation of collaborative robots, similarly large datasets must be created so as to keep the working interface easy to learn and flexible enough to add more gestures. Depending on the application, this may be computationally or economically prohibitive. Thus, the adaptation of trained deep learning models that perform well in the real environment to the virtual one may be a solution to this challenge. This paper presents a systematic framework for real-to-virtual adaptation using a limited-size virtual dataset, along with guidelines for creating a curated dataset. Finally, while hand gestures have been considered as the communication mode, the guidelines and recommendations presented are generic. They are applicable to other modes, such as body poses and facial expressions, which have large datasets available in the real domain that must be adapted to the virtual one.

    ParGANDA: Making Synthetic Pedestrians A Reality For Object Detection

    Full text link
    Object detection is the key technique in a number of computer vision applications, but it often requires large amounts of annotated data to achieve decent results. Moreover, for pedestrian detection specifically, the collected data might contain some personally identifiable information (PII), which is highly restricted in many countries. This label-intensive and privacy-sensitive task has recently led to an increasing interest in training detection models using synthetically generated pedestrian datasets collected with a photo-realistic video game engine. The engine is able to generate unlimited amounts of data with precise and consistent annotations, which offers potential for significant gains in real-world applications. However, the use of synthetic data for training introduces a synthetic-to-real domain shift that degrades the final performance. To close the gap between the real and synthetic data, we propose to use a Generative Adversarial Network (GAN), which performs parameterized unpaired image-to-image translation to generate more realistic images. The key benefit of using the GAN is its intrinsic preference for low-level changes over geometric ones, which means the annotations of a given synthetic image remain accurate even after domain translation is performed, thus eliminating the need for labeling real data. We extensively experimented with the proposed method, using the MOTSynth dataset for training and the MOT17 and MOT20 detection datasets for testing, with experimental results demonstrating the effectiveness of this method. Our approach not only produces visually plausible samples but also requires no labels from the real domain, making it applicable to a variety of downstream tasks.

    Unsupervised Domain Adaptation by Backpropagation

    Full text link
    Top-performing deep architectures are trained on massive amounts of labeled data. In the absence of labeled data for a certain task, domain adaptation often provides an attractive option, given that labeled data of similar nature but from a different domain (e.g., synthetic images) are available. Here, we propose a new approach to domain adaptation in deep architectures that can be trained on large amounts of labeled data from the source domain and large amounts of unlabeled data from the target domain (no labeled target-domain data is necessary). As the training progresses, the approach promotes the emergence of "deep" features that are (i) discriminative for the main learning task on the source domain and (ii) invariant with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with a few standard layers and a simple new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation. Overall, the approach can be implemented with little effort using any of the deep-learning packages. The method performs very well in a series of image classification experiments, achieving the adaptation effect in the presence of big domain shifts and outperforming previous state of the art on the Office datasets.
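The gradient reversal layer is the core of this approach: it is the identity in the forward pass, but multiplies the gradient by a negative factor in the backward pass, so the feature extractor is pushed to confuse the domain classifier. A minimal numpy sketch of that behavior (real frameworks implement it as a custom autograd function; the class and attribute names here are illustrative):

```python
import numpy as np

class GradientReversal:
    """Identity forward, gradient scaled by -lam backward."""

    def __init__(self, lam=1.0):
        self.lam = lam  # reversal strength, often annealed during training

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_output):
        # The domain classifier's gradient is flipped before it reaches the
        # feature extractor, encouraging domain-invariant features.
        return -self.lam * grad_output
```

Inserting this layer between the feature extractor and the domain classifier is what lets the whole augmented network be trained with standard backpropagation, as the abstract notes.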

    A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts

    Full text link
    Machine learning methods strive to acquire a robust model during training that can generalize well to test samples, even under distribution shifts. However, these methods often suffer from a performance drop due to unknown test distributions. Test-time adaptation (TTA), an emerging paradigm, has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions. Recent progress in this paradigm highlights the significant benefits of utilizing unlabeled data for training self-adapted models prior to inference. In this survey, we divide TTA into several distinct categories, namely, test-time (source-free) domain adaptation, test-time batch adaptation, online test-time adaptation, and test-time prior adaptation. For each category, we provide a comprehensive taxonomy of advanced algorithms, followed by a discussion of different learning scenarios. Furthermore, we analyze relevant applications of TTA and discuss open challenges and promising areas for future research. A comprehensive list of TTA methods can be found at \url{https://github.com/tim-learn/awesome-test-time-adaptation}. Comment: Discussions, comments, and questions are all welcomed at \url{https://github.com/tim-learn/awesome-test-time-adaptation}.
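One of the simplest instances of the test-time batch adaptation category mentioned above is re-estimating a batch-normalization layer's statistics from the current test batch instead of using the statistics stored at training time. A hedged numpy sketch of that idea (function names and the single-layer setting are illustrative, not any specific method from the survey):

```python
import numpy as np

def bn_inference(x, running_mean, running_var, gamma, beta, eps=1e-5):
    """Standard inference-time batch norm using statistics stored at training."""
    return gamma * (x - running_mean) / np.sqrt(running_var + eps) + beta

def bn_test_time_adapted(x, gamma, beta, eps=1e-5):
    """Test-time batch adaptation: re-estimate the normalization statistics
    from the current (unlabeled) test batch, absorbing a distribution shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta
```

Under a covariate shift (e.g., a constant offset of the test features), the stale training statistics produce mis-normalized activations, while the test-batch statistics restore zero-mean, unit-variance inputs to the next layer without any labels.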

    Co-Training for Unsupervised Domain Adaptation of Semantic Segmentation Models

    Get PDF
    Semantic image segmentation is a central and challenging task in autonomous driving, addressed by training deep models. Since this training suffers from the curse of human-based image labeling, using synthetic images with automatically generated labels together with unlabeled real-world images is a promising alternative. This implies addressing an unsupervised domain adaptation (UDA) problem. In this paper, we propose a new co-training procedure for synth-to-real UDA of semantic segmentation models. It consists of a self-training stage, which provides two domain-adapted models, and a model collaboration loop for the mutual improvement of these two models. These models are then used to provide the final semantic segmentation labels (pseudo-labels) for the real-world images. The overall procedure treats the deep models as black boxes and drives their collaboration at the level of pseudo-labeled target images, i.e., neither modifying loss functions nor explicit feature alignment is required. We test our proposal on standard synthetic and real-world datasets for on-board semantic segmentation. Our procedure shows improvements ranging from ~13 to ~26 mIoU points over baselines, establishing new state-of-the-art results.
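The collaboration loop described above operates purely on pseudo-labels: each black-box model adopts the peer's most confident predictions on the unlabeled target data and is refit. A toy numpy sketch of that pattern, using nearest-centroid classifiers as stand-in black boxes (the paper uses deep segmentation models, and in practice the two models would differ by initialization or data view; all names here are illustrative):

```python
import numpy as np

class NearestCentroid:
    """Minimal stand-in classifier treated as a black box by the loop."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_conf(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        labels = self.classes_[d.argmin(axis=1)]
        sd = np.sort(d, axis=1)
        conf = sd[:, 1] - sd[:, 0]  # margin between the two closest centroids
        return labels, conf

def co_train(model_a, model_b, Xs, ys, Xt, rounds=3, top_k=5):
    """Each round, each model adopts the peer's most confident target labels.
    Only pseudo-labeled target samples are exchanged; no losses are modified."""
    model_a.fit(Xs, ys)
    model_b.fit(Xs, ys)
    for _ in range(rounds):
        la, ca = model_a.predict_conf(Xt)
        lb, cb = model_b.predict_conf(Xt)
        keep_a = np.argsort(-cb)[:top_k]  # A learns from B's confident labels
        keep_b = np.argsort(-ca)[:top_k]  # B learns from A's confident labels
        model_a.fit(np.vstack([Xs, Xt[keep_a]]), np.concatenate([ys, lb[keep_a]]))
        model_b.fit(np.vstack([Xs, Xt[keep_b]]), np.concatenate([ys, la[keep_b]]))
    return model_a, model_b
```

The key design point matching the abstract is that the loop only touches the models through `fit` and `predict_conf`, so any pair of segmentation models could be dropped in without changing their training internals.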

    Multimodal perception for autonomous driving

    Get PDF
    International Mention in the doctoral degree. Autonomous driving is set to play an important role among intelligent transportation systems in the coming decades. The advantages of its large-scale implementation (reduced accidents, shorter commuting times, or higher fuel efficiency) have made its development a priority for academia and industry. However, there is still a long way to go to achieve full self-driving vehicles, capable of dealing with any scenario without human intervention. To this end, advances in control, navigation and, especially, environment perception technologies are still required. In particular, the detection of other road users who may interfere with the vehicle's trajectory is a key element, since it allows modeling the current traffic situation and, thus, making decisions accordingly. The objective of this thesis is to provide solutions to some of the main challenges of on-board perception systems, such as extrinsic calibration of sensors, object detection, and deployment on real platforms. First, a calibration method for obtaining the relative transformation between pairs of sensors is introduced, eliminating the complex manual adjustment of these parameters. The algorithm makes use of an original calibration pattern and supports LiDARs as well as monocular and stereo cameras. Second, different deep learning models for 3D object detection using LiDAR data in its bird's-eye-view projection are presented. Through a novel encoding, the use of architectures tailored to image detection is proposed to process the 3D information of point clouds in real time. Furthermore, the effectiveness of using this projection together with image features is analyzed. Finally, a method to mitigate the accuracy drop of LiDAR-based detection networks when deployed in ad-hoc configurations is introduced. For this purpose, the simulation of virtual signals mimicking the specifications of the desired real device is used to generate new annotated datasets that can be used to train the models. The performance of the proposed methods is evaluated against other existing alternatives using reference benchmarks in the field of computer vision (KITTI and nuScenes) and through experiments in open traffic with an automated vehicle. The results obtained demonstrate the relevance of the presented work and its suitability for commercial use.
    Doctoral Program in Electrical, Electronic and Automatic Engineering, Universidad Carlos III de Madrid. President: Jesús García Herrero. Secretary: Ignacio Parra Alonso. Committee member: Gustavo Adolfo Peláez Coronad
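The bird's-eye-view encoding mentioned in the thesis abstract turns a 3D point cloud into an image-like tensor so that 2D detection architectures can process it. A minimal illustrative sketch of one such encoding (grid bounds, resolution, and per-cell statistics here are hypothetical choices, not the thesis's actual configuration):

```python
import numpy as np

def bev_encode(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=0.5):
    """Bin (N, 3) LiDAR points (x, y, z) into a 2D ground-plane grid.
    Each cell stores two channels: occupancy count and maximum height."""
    h = int((x_range[1] - x_range[0]) / cell)
    w = int((y_range[1] - y_range[0]) / cell)
    bev = np.zeros((h, w, 2))
    # Keep only points inside the grid bounds.
    m = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
         (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[m]
    xi = ((pts[:, 0] - x_range[0]) / cell).astype(int)
    yi = ((pts[:, 1] - y_range[0]) / cell).astype(int)
    for i, j, z in zip(xi, yi, pts[:, 2]):
        bev[i, j, 0] += 1                    # occupancy count
        bev[i, j, 1] = max(bev[i, j, 1], z)  # max height in the cell
    return bev
```

The resulting fixed-size (H, W, C) tensor is what allows reusing convolutional detectors designed for images, which is the core of the real-time processing argument in the abstract.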

    Sim-real joint reinforcement transfer for 3D indoor navigation

    Full text link
    © 2019 IEEE. There has been an increasing interest in 3D indoor navigation, where a robot in an environment moves to a target according to an instruction. To deploy a robot for navigation in the physical world, lots of training data are required to learn an effective policy. It is quite labour-intensive to obtain sufficient real-environment data for training robots, while synthetic data are much easier to construct by rendering. Though it is promising to utilize synthetic environments to facilitate navigation training in the real world, real environments are heterogeneous from synthetic environments in two aspects. First, the visual representations of the two environments have significant variances. Second, the houseplans of these two environments are quite different. Therefore, two types of information, i.e., visual representation and policy behavior, need to be adapted in the reinforcement model. The learning procedure of visual representation and that of policy behavior are presumably reciprocal. We propose to jointly adapt visual representation and policy behavior to leverage the mutual impacts of environment and policy. Specifically, our method employs an adversarial feature adaptation model for visual representation transfer and a policy mimic strategy for policy behavior imitation. Experiments show that our method outperforms the baseline by 19.47% without any additional human annotations.
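The policy mimic strategy above amounts to a distillation-style objective: the adapting policy is pushed toward the action distribution of a reference policy. A hedged numpy sketch of one common form of such a loss, a KL divergence between the two policies' softmax outputs (the function names and the exact loss form are illustrative assumptions, not the paper's stated formulation):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def policy_mimic_loss(student_logits, teacher_logits):
    """KL(teacher || student) over action distributions, batch-averaged.
    Minimizing this pulls the student policy toward the teacher's behavior."""
    p = softmax(teacher_logits)
    log_p = np.log(p)
    log_q = np.log(softmax(student_logits))
    return float((p * (log_p - log_q)).sum(axis=-1).mean())
```

The loss is zero when the two policies agree and positive otherwise, so its gradient only acts where the student's action distribution departs from the teacher's.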