Unsupervised landmark discovery via self-training correspondence
Object parts, also known as landmarks, convey information about an object’s shape and spatial configuration in 3D space, especially for deformable objects. The goal of landmark detection is to have a model that, for a particular object instance, can estimate the locations of its parts. Research in this field is mainly driven by supervised approaches, where a sufficient amount of human-annotated data is available. As annotating landmarks for all objects is impractical, this thesis focuses on learning landmark detectors without supervision. Despite good performance in limited scenarios (objects showcasing minor rigid deformation), unsupervised landmark discovery mostly remains an open problem. Existing work fails to capture semantic landmarks, i.e. points similar to the ones assigned by human annotators, and may not generalise well to highly articulated objects like the human body, complicated backgrounds or large viewpoint variations.
In this thesis, we propose a novel self-training framework for the unsupervised discovery of landmarks. Contrary to existing methods that build on auxiliary tasks such as image generation or equivariance, we depart from generic keypoints and train a landmark detector and descriptor to improve itself, tuning the keypoints into distinctive landmarks. We propose an iterative algorithm that alternates between producing new pseudo-labels through feature clustering and learning distinctive features for each pseudo-class through contrastive learning. Our detector can discover highly semantic landmarks that are more flexible in terms of capturing large viewpoint changes and out-of-plane rotations (3D rotations). New state-of-the-art performance is achieved on multiple challenging datasets.
TransCAD: A Hierarchical Transformer for CAD Sequence Inference from Point Clouds
Peer reviewed. 3D reverse engineering, in which a CAD model is inferred given a 3D scan of a
physical object, is a research direction that offers many promising practical
applications. This paper proposes TransCAD, an end-to-end transformer-based
architecture that predicts the CAD sequence from a point cloud. TransCAD
leverages the structure of CAD sequences by using a hierarchical learning
strategy. A loop refiner is also introduced to regress sketch primitive
parameters. Rigorous experimentation on the DeepCAD and Fusion360 datasets shows
that TransCAD achieves state-of-the-art results. The result analysis is
supported with a proposed metric for CAD sequences, the mean Average Precision
of CAD Sequence, that addresses the limitations of existing metrics.
SHARP Challenge 2023: Solving CAD History and pArameters Recovery from Point clouds and 3D scans. Overview, Datasets, Metrics, and Baselines.
Peer reviewed. Recent breakthroughs in geometric Deep Learning (DL) and the availability of large Computer-Aided Design (CAD) datasets have advanced the research on learning CAD modeling processes and relating them to real objects. In this context, 3D reverse engineering of CAD models from 3D scans is considered to be one of the most sought-after goals for the CAD industry. However, recent efforts assume multiple simplifications limiting the applications in real-world settings. The SHARP Challenge 2023 aims at pushing the research a step closer to the real-world scenario of CAD reverse engineering from 3D scans through dedicated datasets and tracks. In this paper, we define the proposed SHARP 2023 tracks, describe the provided datasets, and propose a set of baseline methods along with suitable evaluation metrics to assess the performance of the track solutions. All proposed datasets along with useful routines and the evaluation metrics are publicly available.
Seamless fusion: multi-modal localization for first responders in challenging environments
In dynamic and unpredictable environments, the precise localization of first responders and rescuers is crucial for effective incident response. This paper introduces a novel approach leveraging three complementary localization modalities: visual-based, Galileo-based, and inertial-based. Each modality contributes uniquely to the final Fusion tool, which facilitates seamless indoor and outdoor localization and offers a robust and accurate localization solution without reliance on pre-existing infrastructure, essential for maintaining responder safety and optimizing operational effectiveness. The visual-based localization method utilizes an RGB camera coupled with a modified implementation of the ORB-SLAM2 method, enabling operation with or without prior area scanning. The Galileo-based localization method employs a lightweight prototype equipped with a high-accuracy GNSS receiver board, tailored to meet the specific needs of first responders. The inertial-based localization method utilizes sensor fusion, primarily leveraging smartphone inertial measurement units, to predict and adjust first responders’ positions incrementally, compensating for the GPS signal attenuation indoors. A comprehensive validation test involving various environmental conditions was carried out to demonstrate the efficacy of the proposed fused localization tool. Our results show that our proposed solution always provides a location regardless of the conditions (indoors, outdoors, etc.), with an overall mean error of 1.73 m.
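The abstract does not say how the three modalities are combined. As a purely illustrative sketch, one common way to fuse independent position estimates is an inverse-variance weighted average that skips any modality that is currently unavailable. The function name `fuse_positions` and the `(x, y, variance)` input layout are assumptions made for this example and are not taken from the paper.

```python
def fuse_positions(estimates):
    """Fuse 2D position estimates by inverse-variance weighting.

    estimates: list of (x, y, variance) tuples, or None for a modality
    that currently has no fix (hypothetical input layout).
    Returns (x, y, fused_variance), or None if nothing is available.
    """
    available = [e for e in estimates if e is not None]
    if not available:
        return None
    # A lower variance means a more trusted estimate, hence a larger weight.
    weights = [1.0 / var for _, _, var in available]
    total = sum(weights)
    x = sum(w * e[0] for w, e in zip(weights, available)) / total
    y = sum(w * e[1] for w, e in zip(weights, available)) / total
    return x, y, 1.0 / total


# Visual and inertial estimates available, satellite fix missing (e.g. indoors).
print(fuse_positions([(0.0, 0.0, 1.0), None, (2.0, 2.0, 1.0)]))
```

With this weighting, a modality that drops out simply stops contributing, so a position is returned as long as at least one estimate exists.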
Large expert-curated database for benchmarking document similarity detection in biomedical literature search
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed.
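For readers unfamiliar with the Okapi Best Matching 25 baseline named above, the following is a minimal sketch of the standard BM25 scoring formula over pre-tokenised documents. The parameter defaults `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the function name `bm25_scores` are assumptions for illustration; the abstract does not state which settings or implementation the consortium used.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenised document against a tokenised query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # Non-negative IDF variant (assumed here for simplicity).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["gene", "expression", "cancer"],
    ["protein", "folding"],
    ["cancer", "therapy", "gene", "gene"],
]
print(bm25_scores(["gene", "cancer"], docs))
```

Documents that share no term with the query score zero, which is one reason purely lexical baselines can return different candidate sets from one another.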
Convolutive Audio Source Separation Using Robust ICA and Reduced Likelihood Ratio Jump
Part 5: AIRuleBased Modeling (AIRUMO). International audience. Audio source separation is the task of isolating sound sources that are active simultaneously in a room captured by a set of microphones. Convolutive audio source separation with an equal number of sources and microphones has a number of shortcomings, including the complexity of frequency-domain ICA, the permutation ambiguity and the problem’s scalability with an increasing number of sensors. In this paper, the authors propose a multiple-microphone audio source separation algorithm based on a previous work of Mitianoudis and Davies [1]. Complex FastICA is replaced by Robust ICA, increasing robustness and performance. Permutation ambiguity is solved using the Likelihood Ratio Jump solution, which is now modified to decrease computational complexity in the case of multiple microphones.
From Keypoints to Object Landmarks via Self-Training Correspondence: A novel approach to Unsupervised Landmark Discovery
This paper proposes a novel paradigm for the unsupervised learning of object
landmark detectors. Contrary to existing methods that build on auxiliary tasks
such as image generation or equivariance, we propose a self-training approach
where, departing from generic keypoints, a landmark detector and descriptor is
trained to improve itself, tuning the keypoints into distinctive landmarks. To
this end, we propose an iterative algorithm that alternates between producing
new pseudo-labels through feature clustering and learning distinctive features
for each pseudo-class through contrastive learning. With a shared backbone for
the landmark detector and descriptor, the keypoint locations progressively
converge to stable landmarks, filtering those less stable. Compared to previous
works, our approach can learn points that are more flexible in terms of
capturing large viewpoint changes. We validate our method on a variety of
difficult datasets, including LS3D, BBCPose, Human3.6M and PennAction,
achieving new state-of-the-art results.