14,193 research outputs found

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    Full text link
    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, inevitably RS draws from many of the same theories as CV; e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, of advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as it relates to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL.Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensin

    Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources

    Get PDF
    Many tasks in robot-assisted surgeries (RAS) can be represented by finite-state machines (FSMs), where each state represents either an action (such as picking up a needle) or an observation (such as bleeding). A crucial step towards the automation of such surgical tasks is the temporal perception of the current surgical scene, which requires a real-time estimation of the states in the FSMs. The objective of this work is to estimate the current state of the surgical task based on the actions performed or events occurred as the task progresses. We propose Fusion-KVE, a unified surgical state estimation model that incorporates multiple data sources including the Kinematics, Vision, and system Events. Additionally, we examine the strengths and weaknesses of different state estimation models in segmenting states with different representative features or levels of granularity. We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), as well as a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging, created using the da Vinci® Xi surgical system. Our model achieves a superior frame-wise state estimation accuracy up to 89.4%, which improves the state-of-the-art surgical state estimation models in both JIGSAWS suturing dataset and our RIOUS dataset

    Multiscale Discriminant Saliency for Visual Attention

    Full text link
    The bottom-up saliency, an early stage of humans' visual attention, can be considered as a binary classification problem between center and surround classes. Discriminant power of features for the classification is measured as mutual information between features and two classes distribution. The estimated discrepancy of two feature classes very much depends on considered scale levels; then, multi-scale structure and discriminant power are integrated by employing discrete wavelet features and Hidden markov tree (HMT). With wavelet coefficients and Hidden Markov Tree parameters, quad-tree like label structures are constructed and utilized in maximum a posterior probability (MAP) of hidden class variables at corresponding dyadic sub-squares. Then, saliency value for each dyadic square at each scale level is computed with discriminant power principle and the MAP. Finally, across multiple scales is integrated the final saliency map by an information maximization rule. Both standard quantitative tools such as NSS, LCC, AUC and qualitative assessments are used for evaluating the proposed multiscale discriminant saliency method (MDIS) against the well-know information-based saliency method AIM on its Bruce Database wity eye-tracking data. Simulation results are presented and analyzed to verify the validity of MDIS as well as point out its disadvantages for further research direction.Comment: 16 pages, ICCSA 2013 - BIOCA sessio

    Monte Carlo optimization of decentralized estimation networks over directed acyclic graphs under communication constraints

    Get PDF
    Motivated by the vision of sensor networks, we consider decentralized estimation networks over bandwidth–limited communication links, and are particularly interested in the tradeoff between the estimation accuracy and the cost of communications due to, e.g., energy consumption. We employ a class of in–network processing strategies that admits directed acyclic graph representations and yields a tractable Bayesian risk that comprises the cost of communications and estimation error penalty. This perspective captures a broad range of possibilities for processing under network constraints and enables a rigorous design problem in the form of constrained optimization. A similar scheme and the structures exhibited by the solutions have been previously studied in the context of decentralized detection. Under reasonable assumptions, the optimization can be carried out in a message passing fashion. We adopt this framework for estimation, however, the corresponding optimization scheme involves integral operators that cannot be evaluated exactly in general. We develop an approximation framework using Monte Carlo methods and obtain particle representations and approximate computational schemes for both the in–network processing strategies and their optimization. The proposed Monte Carlo optimization procedure operates in a scalable and efficient fashion and, owing to the non-parametric nature, can produce results for any distributions provided that samples can be produced from the marginals. In addition, this approach exhibits graceful degradation of the estimation accuracy asymptotically as the communication becomes more costly, through a parameterized Bayesian risk

    A markovian approach to unsupervised change detection with multiresolution and multimodality SAR data

    Get PDF
    In the framework of synthetic aperture radar (SAR) systems, current satellite missions make it possible to acquire images at very high and multiple spatial resolutions with short revisit times. This scenario conveys a remarkable potential in applications to, for instance, environmental monitoring and natural disaster recovery. In this context, data fusion and change detection methodologies play major roles. This paper proposes an unsupervised change detection algorithmfor the challenging case of multimodal SAR data collected by sensors operating atmultiple spatial resolutions. The method is based on Markovian probabilistic graphical models, graph cuts, linear mixtures, generalized Gaussian distributions, Gram-Charlier approximations, maximum likelihood and minimum mean squared error estimation. It benefits from the SAR images acquired at multiple spatial resolutions and with possibly different modalities on the considered acquisition times to generate an output change map at the finest observed resolution. This is accomplished by modeling the statistics of the data at the various spatial scales through appropriate generalized Gaussian distributions and by iteratively estimating a set of virtual images that are defined on the pixel grid at the finest resolution and would be collected if all the sensors could work at that resolution. A Markov random field framework is adopted to address the detection problem by defining an appropriate multimodal energy function that is minimized using graph cuts

    Discriminatively Trained Latent Ordinal Model for Video Classification

    Full text link
    We study the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models the video as a sequence of automatically mined, discriminative sub-events (eg. onset and offset phase for "smile", running and jumping for "highjump"). The proposed model is inspired by the recent works on Multiple Instance Learning and latent SVM/HCRF -- it extends such frameworks to model the ordinal aspect in the videos, approximately. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video based facial analysis datasets for prediction of expression, clinical pain and intent in dyadic conversations and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method.Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1604.0150
    • …
    corecore