Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild
While 6D object pose estimation has wide applications across computer vision
and robotics, it remains far from being solved due to the lack of annotations.
The problem becomes even more challenging when moving to category-level 6D
pose, which requires generalization to unseen instances. Current approaches are
restricted by their reliance on annotations from simulation or human labelers.
In this paper, we overcome this barrier by introducing a self-supervised
learning approach trained directly on large-scale real-world object videos for
category-level 6D pose estimation in the wild. Our framework reconstructs the
canonical 3D shape of an object category and learns dense correspondences
between input images and the canonical shape via surface embedding. For
training, we propose novel geometric cycle-consistency losses that construct
cycles across 2D-3D spaces, across different instances, and across different
time steps. The learned correspondence can be applied to 6D pose estimation and
other downstream tasks such as keypoint transfer. Surprisingly, our method,
without any human annotations or simulators, can achieve on-par or even better
performance than previous supervised or semi-supervised methods on in-the-wild
images. Our project page is: https://kywind.github.io/self-pose
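To make these cycles concrete, below is a minimal sketch of the 2D-3D-2D leg of such a geometric cycle-consistency loss, assuming per-pixel correspondences to canonical points, an estimated object pose, and a pinhole camera. All names and shapes are illustrative, not the paper's implementation.

import torch

def cycle_consistency_loss(pix_coords, canon_points, rotation, translation, intrinsics):
    # pix_coords:   (N, 2) sampled pixel locations in the input image
    # canon_points: (N, 3) predicted corresponding points on the canonical shape
    # rotation (3, 3) and translation (3,): estimated object pose
    # intrinsics:   (3, 3) pinhole camera matrix
    cam_points = canon_points @ rotation.T + translation   # canonical -> camera frame
    proj = cam_points @ intrinsics.T                       # pinhole projection
    reproj = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)    # perspective divide
    # Closing the cycle: reprojected points should land on the starting pixels.
    return torch.mean(torch.sum((reproj - pix_coords) ** 2, dim=-1))

Cross-instance and cross-time cycles follow the same pattern, composing correspondences before closing the loop in 2D.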
Trustworthy Representation Learning Across Domains
As AI systems achieve the performance needed for wide deployment in daily
life and human society, people both enjoy the benefits these technologies
bring and suffer from the social issues they induce. To make AI systems
trustworthy, a great deal of research has gone into building guidelines for
trustworthy AI systems. Machine learning is one of the most important parts
of AI systems, and representation learning is a fundamental technology in
machine learning. Making representation learning trustworthy in real-world
applications, e.g., cross-domain scenarios, is therefore valuable and
necessary for both machine learning and AI systems research. Inspired by the
concepts of trustworthy AI, we propose the first framework for trustworthy
representation learning across domains, organized around four concepts,
i.e., robustness, privacy, fairness, and explainability, and use it to give
a comprehensive literature review of this research direction. Specifically,
we first introduce the details of the proposed trustworthy framework for
representation learning across domains. Second, we provide basic notions and
comprehensively summarize existing methods under the framework's four
concepts. Finally, we conclude this survey with insights and discussions on
future research directions.
A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts
Machine learning methods strive to acquire a robust model during training
that can generalize well to test samples, even under distribution shifts.
However, these methods often suffer from a performance drop due to unknown test
distributions. Test-time adaptation (TTA), an emerging paradigm, has the
potential to adapt a pre-trained model to unlabeled data during testing, before
making predictions. Recent progress in this paradigm highlights the significant
benefits of utilizing unlabeled data for training self-adapted models prior to
inference. In this survey, we divide TTA into several distinct categories,
namely, test-time (source-free) domain adaptation, test-time batch adaptation,
online test-time adaptation, and test-time prior adaptation. For each category,
we provide a comprehensive taxonomy of advanced algorithms, followed by a
discussion of different learning scenarios. Furthermore, we analyze relevant
applications of TTA and discuss open challenges and promising areas for future
research. A comprehensive list of TTA methods can be found at
\url{https://github.com/tim-learn/awesome-test-time-adaptation}.
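To ground the taxonomy, here is a minimal sketch of one popular online TTA recipe, entropy minimization over normalization parameters in the spirit of Tent; it illustrates the paradigm under simplifying assumptions (a classifier containing normalization layers, batch-wise test streams) rather than any specific method from the survey.

import torch
import torch.nn as nn

def collect_norm_params(model):
    # Adapt only the affine parameters of normalization layers.
    params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm)):
            params += [p for p in (m.weight, m.bias) if p is not None]
    return params  # assumes the model actually contains such layers

@torch.enable_grad()
def adapt_and_predict(model, x, lr=1e-3):
    model.train()  # BatchNorm uses the statistics of the test batch
    optimizer = torch.optim.SGD(collect_norm_params(model), lr=lr)
    probs = model(x).softmax(dim=1)
    entropy = -(probs * probs.clamp(min=1e-8).log()).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return model(x).argmax(dim=1)  # predict with the adapted model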
Localization and Mapping for Self-Driving Vehicles: A Survey
The upsurge of autonomous vehicles in the automobile industry will lead to better driving experiences while also enabling users to address challenging navigation problems. Reaching such capabilities will require significant technological attention and the flawless execution of various complex tasks, one of which is ensuring robust localization and mapping. Recent surveys have not provided a meaningful and comprehensive description of the current approaches in this field. Accordingly, this review is intended to provide adequate coverage of the problems affecting autonomous vehicles in this area, by examining the most recent methods for mapping and localization as well as related feature extraction and data security problems. First, we discuss contemporary methods for extracting relevant features from equipped sensors and categorize them as semantic, non-semantic, and deep learning methods. We conclude that representativeness, low cost, and accessibility are crucial constraints on the choice of methods to be adopted for localization and mapping tasks. Second, the survey focuses on methods for building a map of the vehicle's environment, considering both the commercial and the academic solutions available. The analysis distinguishes between two types of environment, known and unknown, and presents solutions for each case. Third, the survey explores different approaches to vehicle localization and classifies them according to their mathematical characteristics and priorities. Each section concludes by presenting the related challenges and some future directions. The article also highlights the security problems likely to be encountered in self-driving vehicles, with an assessment of possible defense mechanisms that could prevent security attacks on vehicles. Finally, the article ends with a discussion of the potential impacts of autonomous driving, spanning energy consumption and emission reduction, sound and light pollution, integration into smart cities, infrastructure optimization, and software refinement. This thorough investigation aims to foster a comprehensive understanding of the diverse implications of autonomous driving across various domains.
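As a concrete instance of the filter-based localization family such classifications cover, the sketch below implements one predict/update step of an extended Kalman filter for a 2D vehicle pose [x, y, heading]; the motion model, measurement model, and noise covariances are illustrative assumptions, not a method endorsed by the survey.

import numpy as np

def ekf_step(state, P, v, w, z, dt, Q, R):
    # state: [x, y, theta]; P: 3x3 covariance; v, w: speed and yaw rate
    # z: GNSS-like position measurement [x, y]; Q, R: process/measurement noise
    x, y, th = state
    # Predict with a constant velocity / yaw-rate motion model.
    state_pred = np.array([x + v * np.cos(th) * dt,
                           y + v * np.sin(th) * dt,
                           th + w * dt])
    F = np.array([[1.0, 0.0, -v * np.sin(th) * dt],
                  [0.0, 1.0,  v * np.cos(th) * dt],
                  [0.0, 0.0,  1.0]])
    P_pred = F @ P @ F.T + Q
    # Update with the position measurement.
    H = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    state_new = state_pred + K @ (z - H @ state_pred)
    P_new = (np.eye(3) - K @ H) @ P_pred
    return state_new, P_new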
Touch and Go: Learning from Human-Collected Vision and Touch
The ability to associate touch with sight is essential for tasks that require
physically interacting with objects in the world. We propose a dataset with
paired visual and tactile data called Touch and Go, in which human data
collectors probe objects in natural environments using tactile sensors, while
simultaneously recording egocentric video. In contrast to previous efforts,
which have largely been confined to lab settings or simulated environments, our
dataset spans a large number of "in the wild" objects and scenes. To
demonstrate our dataset's effectiveness, we successfully apply it to a variety
of tasks: 1) self-supervised visuo-tactile feature learning, 2) tactile-driven
image stylization, i.e., making the visual appearance of an object more
consistent with a given tactile signal, and 3) predicting future frames of a
tactile signal from visuo-tactile inputs.
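Task 1 above, self-supervised visuo-tactile feature learning, is commonly instantiated with a contrastive objective over co-recorded pairs; the sketch below is a generic InfoNCE-style loss under that assumption, not the paper's exact training code.

import torch
import torch.nn.functional as F

def visuo_tactile_infonce(img_emb, touch_emb, temperature=0.07):
    # img_emb, touch_emb: (B, D) embeddings of co-recorded frames and touches;
    # matched pairs are positives, all other pairs in the batch are negatives.
    img = F.normalize(img_emb, dim=1)
    touch = F.normalize(touch_emb, dim=1)
    logits = img @ touch.T / temperature  # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # Symmetric loss: match image -> touch and touch -> image.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))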
Bridging the gap between reconstruction and synthesis
3D reconstruction and image synthesis are two of the main pillars in computer vision. Early works focused on simple tasks such as multi-view reconstruction and texture synthesis. With the advent of Deep Learning, the field has progressed rapidly, making it possible to achieve far more complex and high-level tasks. For example, the 3D reconstructions that traditionally required multi-view approaches can now be obtained with single-view methods. Similarly, early pattern-based texture synthesis works have given rise to techniques that can generate novel high-resolution images.
In this thesis we have developed a hierarchy of tools that covers this whole range of problems, lying at the intersection of computer vision, graphics, and machine learning. We tackle the problem of 3D reconstruction and synthesis in the wild. Importantly, we advocate for a paradigm in which not everything should be learned. Instead of applying Deep Learning naively, we propose novel representations, layers, and architectures that directly embed prior 3D geometric knowledge for the tasks of 3D reconstruction and synthesis. We apply these techniques to problems including scene/person reconstruction and photo-realistic rendering. We first address methods to reconstruct a scene and the clothed people in it while estimating the camera position. Then, we tackle image and video synthesis for clothed people in the wild. Finally, we bridge the gap between reconstruction and synthesis under the umbrella of a unique novel formulation. Extensive experiments conducted throughout this thesis show that the proposed techniques improve the performance of Deep Learning models in terms of the quality of the reconstructed 3D shapes / synthesised images, while reducing the amount of supervision and training data required to train them.
In summary, we provide a variety of low-, mid-, and high-level algorithms that can be used to incorporate prior knowledge into different stages of the Deep Learning pipeline and improve performance in tasks of 3D reconstruction and image synthesis.
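As a toy illustration of the "not everything should be learned" stance, the layer below hard-codes the pinhole camera model as a fixed geometric prior inside a network, so gradients flow to the upstream 3D predictions while the projection itself is never trained; it is a hypothetical example, not a layer from the thesis.

import torch
import torch.nn as nn

class PinholeProjection(nn.Module):
    # Differentiable projection of 3D camera-frame points into the image
    # plane using fixed, known intrinsics K (3x3): a prior, not a learned map.
    def __init__(self, K):
        super().__init__()
        self.register_buffer("K", K)  # stored with the model, never optimized

    def forward(self, points):        # points: (B, N, 3) in the camera frame
        proj = points @ self.K.T      # apply intrinsics
        return proj[..., :2] / proj[..., 2:3].clamp(min=1e-6)  # perspective divide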