    Deep Shape Matching

    We cast shape matching as metric learning with convolutional networks. We break the end-to-end process of image representation into two parts. First, well-established efficient methods are chosen to turn the images into edge maps. Second, the network is trained with edge maps of landmark images, which are automatically obtained by a structure-from-motion pipeline. The learned representation is evaluated on a range of different tasks, providing improvements on challenging cases of domain generalization, generic sketch-based image retrieval, and its fine-grained counterpart. In contrast to other methods that learn a different model per task, object category, or domain, we use the same network throughout all our experiments, achieving state-of-the-art results on multiple benchmarks.
    Comment: ECCV 2018
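
    A rough sketch of the two-stage recipe above, assuming PyTorch; the edge extractor is taken as given, the layer sizes are hypothetical, and the contrastive loss stands in for whichever metric-learning objective the authors actually use:

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class EdgeEmbeddingNet(nn.Module):
            """Embeds a single-channel edge map into an L2-normalized descriptor."""
            def __init__(self, dim=128):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                )
                self.fc = nn.Linear(64, dim)

            def forward(self, x):
                # Unit-norm descriptors so retrieval reduces to dot products.
                return F.normalize(self.fc(self.features(x)), dim=1)

        def contrastive_loss(za, zb, label, margin=0.7):
            """label is 1 for matching pairs, 0 for non-matching ones."""
            d = F.pairwise_distance(za, zb)
            return (label * d.pow(2) + (1 - label) * F.relu(margin - d).pow(2)).mean()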

    Multimodal Deep Learning for Robust RGB-D Object Recognition

    Robust object recognition is a crucial ingredient of many, if not all, real-world robotics applications. This paper leverages recent progress on Convolutional Neural Networks (CNNs) and proposes a novel RGB-D architecture for object recognition. Our architecture is composed of two separate CNN processing streams, one for each modality, which are consecutively combined with a late fusion network. We focus on learning with imperfect sensor data, a typical problem in real-world robotics tasks. For accurate learning, we introduce a multi-stage training methodology and two crucial ingredients for handling depth data with CNNs. The first is an effective encoding of depth information for CNNs that enables learning without the need for large depth datasets. The second is a data augmentation scheme for robust learning with depth images by corrupting them with realistic noise patterns. We present state-of-the-art results on the RGB-D object dataset and show recognition in challenging RGB-D real-world noisy settings.
    Comment: Final version submitted to IROS'2015; results unchanged, with reformulation of some text passages in the abstract and introduction
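
    A minimal sketch of the two-stream late-fusion idea, assuming PyTorch; the tiny streams are placeholders for the paper's CNNs, and the three-channel depth input stands in for whatever depth encoding is used:

        import torch
        import torch.nn as nn

        def make_stream():
            # Placeholder per-modality CNN ending in a 64-d feature vector.
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )

        class LateFusionRGBD(nn.Module):
            def __init__(self, num_classes=51):  # 51 categories in the RGB-D object dataset
                super().__init__()
                self.rgb_stream = make_stream()
                self.depth_stream = make_stream()  # depth rendered as 3 channels
                self.fusion = nn.Sequential(
                    nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, num_classes),
                )

            def forward(self, rgb, depth):
                fused = torch.cat([self.rgb_stream(rgb),
                                   self.depth_stream(depth)], dim=1)
                return self.fusion(fused)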

    Gesture Recognition in Robotic Surgery: A Review

    OBJECTIVE: Surgical activity recognition is a fundamental step in computer-assisted interventions. This paper reviews the state-of-the-art in methods for automatic recognition of fine-grained gestures in robotic surgery, focusing on recent data-driven approaches, and outlines the open questions and future research directions. METHODS: An article search was performed on 5 bibliographic databases with combinations of the following search terms: robotic, robot-assisted, JIGSAWS, surgery, surgical, gesture, fine-grained, surgeme, action, trajectory, segmentation, recognition, parsing. Selected articles were classified based on the level of supervision required for training and divided into different groups representing major frameworks for time series analysis and data modelling. RESULTS: A total of 52 articles were reviewed. The research field is showing rapid expansion, with the majority of articles published in the last 4 years. Deep-learning-based temporal models with discriminative feature extraction and multi-modal data integration have demonstrated promising results on small surgical datasets. Currently, unsupervised methods perform significantly less well than the supervised approaches. CONCLUSION: The development of large and diverse open-source datasets of annotated demonstrations is essential for development and validation of robust solutions for surgical gesture recognition. While new strategies for discriminative feature extraction and knowledge transfer, or unsupervised and semi-supervised approaches, can mitigate the need for data and labels, they have not yet been demonstrated to achieve comparable performance. Important future research directions include detection and forecast of gesture-specific errors and anomalies. SIGNIFICANCE: This paper is a comprehensive and structured analysis of surgical gesture recognition methods aiming to summarize the status of this rapidly evolving field.
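
    To make the dominant deep temporal approach concrete, here is a minimal frame-wise gesture labeller in PyTorch, a generic temporal-convolution sketch rather than any specific reviewed method; the 76-dimensional kinematic input and 15 gesture classes follow the JIGSAWS conventions:

        import torch
        import torch.nn as nn

        class TemporalConvSegmenter(nn.Module):
            """Per-frame gesture logits for a kinematic sequence of shape (B, C, T)."""
            def __init__(self, in_ch=76, num_gestures=15):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv1d(in_ch, 64, kernel_size=9, padding=4), nn.ReLU(),
                    nn.Conv1d(64, 64, kernel_size=9, padding=4), nn.ReLU(),
                    nn.Conv1d(64, num_gestures, kernel_size=1),
                )

            def forward(self, x):
                return self.net(x)  # (B, num_gestures, T) per-frame logits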

    Hybrid Representation Learning for Cognitive Diagnosis in Late-Life Depression Over 5 Years with Structural MRI

    Late-life depression (LLD) is a highly prevalent mood disorder occurring in older adults and is frequently accompanied by cognitive impairment (CI). Studies have shown that LLD may increase the risk of Alzheimer's disease (AD). However, the heterogeneous presentation of geriatric depression suggests that multiple biological mechanisms may underlie it. Current biological research on LLD progression incorporates machine learning that combines neuroimaging data with clinical observations. There are few studies on incident cognitive diagnostic outcomes in LLD based on structural MRI (sMRI). In this paper, we describe the development of a hybrid representation learning (HRL) framework for predicting cognitive diagnosis over 5 years based on T1-weighted sMRI data. Specifically, we first extract prediction-oriented MRI features via a deep neural network, and then integrate them with handcrafted MRI features via a Transformer encoder for cognitive diagnosis prediction. Two tasks are investigated in this work: (1) distinguishing cognitively normal subjects with LLD from never-depressed older healthy subjects, and (2) distinguishing LLD subjects who developed CI (or even AD) from those who stayed cognitively normal over five years. To the best of our knowledge, this is among the first attempts to study the complex heterogeneous progression of LLD based on task-oriented and handcrafted MRI features. We validate the proposed HRL on 294 subjects with T1-weighted MRIs from two clinically harmonized studies. Experimental results suggest that the HRL outperforms several classical machine learning and state-of-the-art deep learning methods in LLD identification and prediction tasks.
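
    A schematic of the fusion step, assuming PyTorch and hypothetical feature dimensions: the deep and handcrafted MRI features are projected to a common width, treated as two tokens, and mixed by a small Transformer encoder before classification. This is a sketch of the general idea, not the authors' exact HRL design:

        import torch
        import torch.nn as nn

        class HybridFusion(nn.Module):
            def __init__(self, deep_dim=128, hand_dim=64, d_model=64, num_classes=2):
                super().__init__()
                self.proj_deep = nn.Linear(deep_dim, d_model)
                self.proj_hand = nn.Linear(hand_dim, d_model)
                layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                                   batch_first=True)
                self.encoder = nn.TransformerEncoder(layer, num_layers=2)
                self.head = nn.Linear(d_model, num_classes)

            def forward(self, deep_feat, hand_feat):
                # One token per feature source; self-attention mixes the two views.
                tokens = torch.stack([self.proj_deep(deep_feat),
                                      self.proj_hand(hand_feat)], dim=1)
                return self.head(self.encoder(tokens).mean(dim=1))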

    Chasing a consistent picture for dark matter direct searches

    In this paper we assess the present status of dark matter direct searches by means of Bayesian statistics. We consider three particle physics models for spin-independent dark matter interaction with nuclei: elastic, inelastic and isospin-violating scattering. We briefly present the state of the art for the three models, marginalising over experimental systematics and astrophysical uncertainties. Whatever the scenario, XENON100 appears to challenge the detection region of DAMA, CoGeNT and CRESST. The first aim of this study is to rigorously quantify the significance of the inconsistency between the XENON100 data and the combined set of detections (DAMA, CoGeNT and CRESST together), performing two statistical tests based on the Bayesian evidence. We show that XENON100 and the combined set are inconsistent at least at the 2 sigma level in all scenarios but inelastic scattering, for which the disagreement drops to the 1 sigma level. Secondly, we consider only the combined set and search for the particle physics model that best accounts for the events, using Bayesian model comparison. The outcome between elastic and isospin-violating scattering is inconclusive, with odds of 2:1, while inelastic scattering is disfavoured with odds of 1:32 because of the CoGeNT data. Our results are robust under reasonable prior assumptions. We conclude that simple elastic scattering remains the best model to explain the detection regions, since the data do not support extra free parameters. Present direct searches are therefore not able to constrain the particle physics interaction of the dark matter. The outcome of the consistency tests implies that either a better understanding of astrophysical and experimental uncertainties is needed, or the dark matter theoretical model is at odds with the data.
    Comment: 18 pages, 8 figures and 7 tables; minor revisions following referee report. Accepted for publication in Phys. Rev.
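
    To illustrate the machinery, the toy sketch below (NumPy, with an illustrative likelihood and priors, not the paper's) estimates the Bayesian evidence by Monte Carlo and forms a Bayes factor for a nested model pair; the Occam penalty on the extra free parameter is the same effect that leaves simple elastic scattering favoured above:

        import numpy as np

        rng = np.random.default_rng(0)

        def log_evidence(loglike, prior_sampler, n=100_000):
            """Monte Carlo estimate of log Z = log E_prior[L(theta)]."""
            ll = loglike(prior_sampler(n))
            m = ll.max()
            return m + np.log(np.exp(ll - m).mean())

        # Toy data; two nested models: theta fixed at 0 vs. free with a wide prior.
        data = rng.normal(0.3, 1.0, size=50)

        def loglike(theta):  # theta: array of prior samples
            return -0.5 * ((data[None, :] - theta[:, None]) ** 2).sum(axis=1)

        logZ_free = log_evidence(loglike, lambda n: rng.uniform(-5.0, 5.0, n))
        logZ_fixed = loglike(np.zeros(1))[0]  # no free parameter: Z is just L(0)
        print("odds, free vs. fixed:", np.exp(logZ_free - logZ_fixed))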

    Characterization of wastewater methane emission sources with computer vision and remote sensing

    Methane emissions are responsible for at least one-third of total anthropogenic climate forcing, and current estimates project a significant increase in these emissions over the next decade. Consequently, methane offers a unique opportunity to mitigate climate change while addressing energy supply problems. Of the five primary methane sources, wastewater treatment provided 7% of emissions in 2010, a share that will undoubtedly increase with global population growth. Locating sources of methane emissions is therefore a crucial step in better characterizing the current distribution of greenhouse gas (GHG) emissions. Nevertheless, there is a lack of comprehensive, global and uniform databases tying those emissions to concrete sources, and there is no automatic method to accurately locate sparse human infrastructures such as wastewater treatment plants (WWTPs). WWTP detection is an open problem posing many obstacles due to the lack of freely accessible high-resolution imagery and the variety of real-world morphologies and sizes.

    In this work, we tackle this complex, state-of-the-art problem and go one step further by trying to infer capacity using a single end-to-end deep learning architecture and multi-modal remote sensing data. This goal has a potentially groundbreaking impact, as it could help estimate mapped methane emissions for improving emission inventories and predicting future scenarios. We address the problem as a combination of two parallel inference exercises, proposing a novel network to combine multimodal data based on the hypothesis that location and capacity can be inferred from characteristics such as the plant's situation, size, morphology, and proximity to water bodies or population centers. We explore technical documentation and literature to develop these hypotheses and validate their soundness with data analysis. To validate the architecture and the hypotheses, we develop a model and a dataset in parallel with a series of ablation tests. The process is facilitated by an automatic pipeline, also developed in this work, to create datasets and validate models leveraging those datasets.

    We test the best-obtained model at scale on a mosaic of satellite imagery covering the region of Catalonia. The goal is to find plants not previously labeled but present in WWTP databases, and to compare the distribution and magnitude of the inferred capacity with the ground truth. Results show that we achieve state-of-the-art performance, locating more than half of the labeled plants at the same precision ratio while using only orthophotos from multispectral imagery. Moreover, we demonstrate that additional data sources related to water basins and population are valuable resources that the model can exploit to infer WWTP capacity. Along the way, we also demonstrate the benefit of using negative instances to train the model and the impact of an appropriate loss function such as the Dice loss.
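
    As one concrete ingredient, the Dice loss mentioned above can be written in a few lines; this is a generic PyTorch sketch with an assumed smoothing constant, not necessarily the thesis' exact formulation:

        import torch

        def dice_loss(pred, target, eps=1.0):
            """Soft Dice loss for binary segmentation; pred holds probabilities.

            The eps smoothing keeps the loss well-behaved on all-background
            tiles, which matters when training with negative instances."""
            pred, target = pred.flatten(1), target.flatten(1)
            inter = (pred * target).sum(dim=1)
            denom = pred.sum(dim=1) + target.sum(dim=1)
            return (1 - (2 * inter + eps) / (denom + eps)).mean()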