6,171 research outputs found

    Cumulative object categorization in clutter

    Get PDF
    In this paper we present an approach based on scene- or part-graphs for geometrically categorizing touching and occluded objects. We use additive RGBD feature descriptors and hashing of graph configuration parameters for describing the spatial arrangement of constituent parts. The presented experiments quantify that this method outperforms our earlier part-voting and sliding window classification. We evaluated our approach on cluttered scenes, and by using a 3D dataset containing over 15000 Kinect scans of over 100 objects which were grouped into general geometric categories. Additionally, color, geometric, and combined features were compared for categorization tasks

    Unsupervised Learning of Semantic Audio Representations

    Full text link
    Even in the absence of any explicit semantic annotation, vast collections of audio recordings provide valuable information for learning the categorical structure of sounds. We consider several class-agnostic semantic constraints that apply to unlabeled nonspeech audio: (i) noise and translations in time do not change the underlying sound category, (ii) a mixture of two sound events inherits the categories of the constituents, and (iii) the categories of events in close temporal proximity are likely to be the same or related. Without labels to ground them, these constraints are incompatible with classification loss functions. However, they may still be leveraged to identify geometric inequalities needed for triplet loss-based training of convolutional neural networks. The result is low-dimensional embeddings of the input spectrograms that recover 41% and 84% of the performance of their fully-supervised counterparts when applied to downstream query-by-example sound retrieval and sound event classification tasks, respectively. Moreover, in limited-supervision settings, our unsupervised embeddings double the state-of-the-art classification performance.Comment: Submitted to ICASSP 201

    Fusion of Learned Multi-Modal Representations and Dense Trajectories for Emotional Analysis in Videos

    Get PDF
    When designing a video affective content analysis algorithm, one of the most important steps is the selection of discriminative features for the effective representation of video segments. The majority of existing affective content analysis methods either use low-level audio-visual features or generate handcrafted higher level representations based on these low-level features. We propose in this work to use deep learning methods, in particular convolutional neural networks (CNNs), in order to automatically learn and extract mid-level representations from raw data. To this end, we exploit the audio and visual modality of videos by employing Mel-Frequency Cepstral Coefficients (MFCC) and color values in the HSV color space. We also incorporate dense trajectory based motion features in order to further enhance the performance of the analysis. By means of multi-class support vector machines (SVMs) and fusion mechanisms, music video clips are classified into one of four affective categories representing the four quadrants of the Valence-Arousal (VA) space. Results obtained on a subset of the DEAP dataset show (1) that higher level representations perform better than low-level features, and (2) that incorporating motion information leads to a notable performance gain, independently from the chosen representation

    Anatomical Priors in Convolutional Networks for Unsupervised Biomedical Segmentation

    Full text link
    We consider the problem of segmenting a biomedical image into anatomical regions of interest. We specifically address the frequent scenario where we have no paired training data that contains images and their manual segmentations. Instead, we employ unpaired segmentation images to build an anatomical prior. Critically these segmentations can be derived from imaging data from a different dataset and imaging modality than the current task. We introduce a generative probabilistic model that employs the learned prior through a convolutional neural network to compute segmentations in an unsupervised setting. We conducted an empirical analysis of the proposed approach in the context of structural brain MRI segmentation, using a multi-study dataset of more than 14,000 scans. Our results show that an anatomical prior can enable fast unsupervised segmentation which is typically not possible using standard convolutional networks. The integration of anatomical priors can facilitate CNN-based anatomical segmentation in a range of novel clinical problems, where few or no annotations are available and thus standard networks are not trainable. The code is freely available at http://github.com/adalca/neuron.Comment: Presented at CVPR 2018. IEEE CVPR proceedings pp. 9290-929

    A Survey on Deep Learning in Medical Image Analysis

    Full text link
    Deep learning algorithms, in particular convolutional networks, have rapidly become a methodology of choice for analyzing medical images. This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year. We survey the use of deep learning for image classification, object detection, segmentation, registration, and other tasks and provide concise overviews of studies per application area. Open challenges and directions for future research are discussed.Comment: Revised survey includes expanded discussion section and reworked introductory section on common deep architectures. Added missed papers from before Feb 1st 201
    corecore