Deep Shape Matching
We cast shape matching as metric learning with convolutional networks. We
break the end-to-end process of image representation into two parts. Firstly,
well established efficient methods are chosen to turn the images into edge
maps. Secondly, the network is trained with edge maps of landmark images, which
are automatically obtained by a structure-from-motion pipeline. The learned
representation is evaluated on a range of different tasks, providing
improvements on challenging cases of domain generalization, generic
sketch-based image retrieval, and its fine-grained counterpart. In contrast to
other methods that learn a different model per task, object category, or
domain, we use the same network throughout all our experiments, achieving
state-of-the-art results on multiple benchmarks.
Comment: ECCV 201
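The two-part pipeline above (edge extraction first, then metric learning on edge maps) can be sketched in NumPy. The finite-difference edge detector and the triplet margin loss below are illustrative stand-ins, not the paper's exact edge detector or training objective:

```python
import numpy as np

def edge_map(img):
    """Crude gradient-magnitude edge map via finite differences -- a
    stand-in for the well-established edge detectors the pipeline uses."""
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    return np.hypot(gx, gy)

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Margin-based triplet loss on embedding vectors: pull embeddings of
    matching edge maps together, push non-matching ones apart."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

In training, the anchor/positive pairs would come from edge maps of the same landmark (as mined by the structure-from-motion pipeline) and negatives from different landmarks.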
Multimodal Deep Learning for Robust RGB-D Object Recognition
Robust object recognition is a crucial ingredient of many, if not all,
real-world robotics applications. This paper leverages recent progress on
Convolutional Neural Networks (CNNs) and proposes a novel RGB-D architecture
for object recognition. Our architecture is composed of two separate CNN
processing streams - one for each modality - which are consecutively combined
with a late fusion network. We focus on learning with imperfect sensor data, a
typical problem in real-world robotics tasks. For accurate learning, we
introduce a multi-stage training methodology and two crucial ingredients for
handling depth data with CNNs. The first is an effective encoding of depth
information that enables learning without the need for large depth datasets.
The second is a data augmentation scheme for robust learning with depth
images, which corrupts them with realistic noise patterns. We present
state-of-the-art results on the RGB-D object dataset and show recognition in
challenging, noisy real-world RGB-D settings.
Comment: Final version submitted to IROS'2015, results unchanged, reformulation of some text passages in abstract and introduction
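The two depth-handling ingredients can be sketched as follows. The channel-replication encoding and the specific noise parameters are simplified assumptions for illustration, not the paper's exact formulation (which uses a colorization-style encoding):

```python
import numpy as np

def encode_depth(depth, d_min=None, d_max=None):
    """Normalize a raw depth map into [0, 255] and replicate it to three
    channels so a standard RGB-pretrained CNN can consume it without a
    large depth dataset. Simplified stand-in for the paper's encoding."""
    d_min = depth.min() if d_min is None else d_min
    d_max = depth.max() if d_max is None else d_max
    norm = (depth - d_min) / max(d_max - d_min, 1e-8)
    gray = (norm * 255).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)

def corrupt_depth(depth, p_missing=0.1, sigma=0.01, rng=None):
    """Augmentation mimicking realistic depth-sensor noise: additive
    Gaussian jitter plus randomly zeroed ('missing') pixels."""
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = depth + rng.normal(0.0, sigma, depth.shape)
    noisy[rng.random(depth.shape) < p_missing] = 0.0
    return noisy
```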
Gesture Recognition in Robotic Surgery: a Review
OBJECTIVE: Surgical activity recognition is a fundamental step in computer-assisted interventions. This paper reviews the state of the art in methods for automatic recognition of fine-grained gestures in robotic surgery, focusing on recent data-driven approaches, and outlines the open questions and future research directions.
METHODS: An article search was performed on 5 bibliographic databases with combinations of the following search terms: robotic, robot-assisted, JIGSAWS, surgery, surgical, gesture, fine-grained, surgeme, action, trajectory, segmentation, recognition, parsing. Selected articles were classified based on the level of supervision required for training and divided into different groups representing major frameworks for time series analysis and data modelling.
RESULTS: A total of 52 articles were reviewed. The research field is showing rapid expansion, with the majority of articles published in the last 4 years. Deep-learning-based temporal models with discriminative feature extraction and multi-modal data integration have demonstrated promising results on small surgical datasets. Currently, unsupervised methods perform significantly less well than supervised approaches.
CONCLUSION: The development of large and diverse open-source datasets of annotated demonstrations is essential for the development and validation of robust solutions for surgical gesture recognition. While new strategies for discriminative feature extraction and knowledge transfer, or unsupervised and semi-supervised approaches, can mitigate the need for data and labels, they have not yet been demonstrated to achieve comparable performance. Important future research directions include detection and forecasting of gesture-specific errors and anomalies.
SIGNIFICANCE: This paper is a comprehensive and structured analysis of surgical gesture recognition methods, aiming to summarize the status of this rapidly evolving field.
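As a minimal illustration of the temporal modelling that the reviewed methods apply to per-frame gesture predictions, a majority-vote (mode) filter over a sliding window suppresses isolated mislabels in a gesture sequence. The filter and the window size are illustrative choices, not taken from any reviewed paper:

```python
import numpy as np

def smooth_gesture_labels(frame_labels, window=5):
    """Mode filter over a sliding window of per-frame gesture labels.
    Ties are broken toward the smallest label (np.unique ordering)."""
    labels = np.asarray(frame_labels)
    half = window // 2
    out = labels.copy()
    for t in range(len(labels)):
        lo, hi = max(0, t - half), min(len(labels), t + half + 1)
        vals, counts = np.unique(labels[lo:hi], return_counts=True)
        out[t] = vals[np.argmax(counts)]
    return out
```

For example, a single spurious frame labelled as a different surgeme inside a stable segment is voted away by its neighbours.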
Hybrid Representation Learning for Cognitive Diagnosis in Late-Life Depression Over 5 Years with Structural MRI
Late-life depression (LLD) is a highly prevalent mood disorder occurring in
older adults and is frequently accompanied by cognitive impairment (CI).
Studies have shown that LLD may increase the risk of Alzheimer's disease (AD).
However, the heterogeneity of presentation of geriatric depression suggests
that multiple biological mechanisms may underlie it. Current biological
research on LLD progression incorporates machine learning that combines
neuroimaging data with clinical observations. There are few studies on incident
cognitive diagnostic outcomes in LLD based on structural MRI (sMRI). In this
paper, we describe the development of a hybrid representation learning (HRL)
framework for predicting cognitive diagnosis over 5 years based on T1-weighted
sMRI data. Specifically, we first extract prediction-oriented MRI features via
a deep neural network, and then integrate them with handcrafted MRI features
via a Transformer encoder for cognitive diagnosis prediction. Two tasks are
investigated in this work: (1) distinguishing cognitively normal subjects
with LLD from never-depressed older healthy subjects, and (2)
distinguishing LLD subjects who developed CI (or even AD) from those who
stayed cognitively normal over five years. To the best of our knowledge, this is among
the first attempts to study the complex heterogeneous progression of LLD based
on task-oriented and handcrafted MRI features. We validate the proposed HRL on
294 subjects with T1-weighted MRIs from two clinically harmonized studies.
Experimental results suggest that the HRL outperforms several classical machine
learning and state-of-the-art deep learning methods in LLD identification and
prediction tasks.
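The fusion step, in which deep and handcrafted MRI features are integrated by a Transformer encoder, rests on scaled dot-product attention. A parameter-free NumPy sketch of that core operation, omitting the learned projections, multiple heads, and feed-forward layers of a real encoder:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_features(deep_feats, handcrafted_feats):
    """Treat each feature vector (deep or handcrafted) as a token, apply
    one round of scaled dot-product self-attention across both modalities,
    then mean-pool into a single representation for the classifier."""
    tokens = np.vstack([deep_feats, handcrafted_feats])   # (n_tokens, d)
    d = tokens.shape[1]
    attn = softmax(tokens @ tokens.T / np.sqrt(d), axis=-1)
    mixed = attn @ tokens    # each token attends over both feature types
    return mixed.mean(axis=0)
```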
Chasing a consistent picture for dark matter direct searches
In this paper we assess the present status of dark matter direct searches by
means of Bayesian statistics. We consider three particle physics models for
spin-independent dark matter interaction with nuclei: elastic, inelastic and
isospin violating scattering. We briefly present the state of the art for the
three models, marginalising over experimental systematics and astrophysical
uncertainties. Whatever the scenario is, XENON100 appears to challenge the
detection region of DAMA, CoGeNT and CRESST. The first aim of this study is to
rigorously quantify the significance of the inconsistency between XENON100 data
and the combined set of detection (DAMA, CoGeNT and CRESST together),
performing two statistical tests based on the Bayesian evidence. We show that
XENON100 and the combined set are inconsistent at least at 2 sigma level in all
scenarios but inelastic scattering, for which the disagreement drops to 1 sigma
level. Secondly we consider only the combined set and hunt the best particle
physics model that accounts for the events, using Bayesian model comparison.
The outcome between elastic and isospin violating scattering is inconclusive,
with the odds 2:1, while inelastic scattering is disfavoured with the odds of
1:32 because of CoGeNT data. Our results are robust under reasonable prior
assumptions. We conclude that the simple elastic scattering remains the best
model to explain the detection regions, since the data do not support extra
free parameters. Present direct searches are therefore unable to constrain
the particle physics interaction of the dark matter. The outcome of the
consistency tests implies that either a better understanding of astrophysical
and experimental uncertainties is needed, or the dark matter theoretical model
is at odds with the data.
Comment: 18 pages, 8 figures and 7 tables; minor revisions following referee report. Accepted for publication in Phys.Rev.
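The model-comparison odds quoted above (2:1, 1:32) are Bayes factors, i.e. ratios of Bayesian evidences. A small sketch; the qualitative thresholds below follow the conventional Jeffreys-style scale and may differ from the paper's exact convention:

```python
import numpy as np

def bayes_factor(log_z1, log_z2):
    """Bayes factor B12 = Z1/Z2 from the log-evidences of two models."""
    return np.exp(log_z1 - log_z2)

def verdict(b12):
    """Coarse Jeffreys-style reading of a Bayes factor (illustrative
    thresholds, not the paper's)."""
    if b12 < 1.0:
        return verdict(1.0 / b12) + " (favouring model 2)"
    if b12 < 3.0:
        return "inconclusive"
    if b12 < 20.0:
        return "moderate evidence"
    return "strong evidence"
```

On this scale, odds of 2:1 between elastic and isospin-violating scattering are inconclusive, while 1:32 against inelastic scattering counts as strong evidence for the alternative.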
Characterization of wastewater methane emission sources with computer vision and remote sensing
Methane emissions are responsible for at least one-third of the total anthropogenic climate forcing, and current estimates project a significant increase in these emissions in the next decade. Consequently, methane offers a unique opportunity to mitigate climate change while addressing energy supply problems. Of the five primary methane sources, residual water treatment accounted for 7% of emissions in 2010. This ratio will undoubtedly increase with global population growth. Therefore, locating sources of methane emissions is a crucial step toward better characterizing the current distribution of greenhouse gases (GHG). Nevertheless, there is a lack of comprehensive global and uniform databases linking those emissions to concrete sources, and there is no automatic method to accurately locate sparse human infrastructures such as wastewater treatment plants (WWTPs). WWTP detection is an open problem posing many obstacles due to the lack of freely accessible high-resolution imagery and the variety of real-world morphologies and sizes. In this work, we tackle this complex problem and go one step further by also inferring plant capacity, using a single end-to-end Deep Learning architecture and multi-modal remote sensing data. This goal has a groundbreaking potential impact, as it could help estimate mapped methane emissions for improving emission inventories and the prediction of future scenarios. We address the problem as a combination of two parallel inference exercises, proposing a novel network to combine multimodal data based on the hypothesis that the location and the capacity can be inferred from characteristics such as the plant's situation, size, morphology, and proximity to water bodies or population centers. We explore technical documentation and literature to develop these hypotheses and validate their soundness with data analysis. To validate the architecture and the hypotheses, we develop a model and a dataset in parallel with a series of ablation tests.
The process is facilitated by an automatic pipeline, also developed in this work, to create datasets and validate models leveraging those datasets. We test the best-performing model at scale on a mosaic of satellite imagery covering the region of Catalonia. The goal is to find plants not previously labeled but present in WWTP databases, and to compare the distribution and magnitude of the inferred capacity with the ground truth. Results show that we can achieve state-of-the-art results by locating more than half of the labeled plants at the same precision, using only orthophotos from multispectral imagery. Moreover, we demonstrate that additional data sources related to water basins and population are valuable resources that the model can exploit to infer WWTP capacity. During the process, we also demonstrate the benefit of using negative instances to train our model and the impact of using an appropriate loss function such as the Dice loss.
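The Dice loss mentioned at the end directly optimizes the overlap between predicted and target masks, which helps with the extreme foreground/background imbalance of sparse targets such as plant footprints in satellite tiles. A minimal NumPy version of the standard soft Dice loss:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P * T| / (|P| + |T|).
    pred holds per-pixel probabilities, target holds binary labels;
    eps guards against division by zero on empty masks."""
    pred = pred.ravel().astype(float)
    target = target.ravel().astype(float)
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

Unlike per-pixel cross-entropy, this score is insensitive to the large number of background pixels, so a model cannot score well by predicting "no plant" everywhere.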