    A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data

    Topic modeling based on latent Dirichlet allocation (LDA) has been a framework of choice for dealing with multimodal data, such as in image annotation tasks. Another popular approach to modeling multimodal data is through deep neural networks, such as the deep Boltzmann machine (DBM). Recently, a new type of topic model called the Document Neural Autoregressive Distribution Estimator (DocNADE) was proposed and demonstrated state-of-the-art performance for text document modeling. In this work, we show how to successfully apply and extend this model to multimodal data, such as simultaneous image classification and annotation. First, we propose SupDocNADE, a supervised extension of DocNADE that increases the discriminative power of the learned hidden topic features, and show how to employ it to learn a joint representation from image visual words, annotation words, and class label information. We test our model on the LabelMe and UIUC-Sports data sets and show that it compares favorably to other topic models. Second, we propose a deep extension of our model and provide an efficient way of training the deep model. Experimental results show that our deep model outperforms its shallow version and reaches state-of-the-art performance on the Multimedia Information Retrieval (MIR) Flickr data set.
    Comment: 24 pages, 10 figures. A version was accepted by TPAMI on Aug 4th, 2015. Adds a footnote about how to train the model in practice in Section 5.1. arXiv admin note: substantial text overlap with arXiv:1305.530
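
    The autoregressive factorization at the heart of DocNADE can be sketched in a few lines. Below is a minimal, illustrative NumPy version: log p(doc) is the sum of log p(v_i | v_<i), with the hidden state built incrementally from the words seen so far. The toy vocabulary and randomly initialized parameters are stand-ins for trained ones; this is not the authors' implementation.

        import numpy as np

        rng = np.random.default_rng(0)
        vocab_size, n_hidden = 100, 50
        W = rng.normal(scale=0.01, size=(n_hidden, vocab_size))  # per-word hidden contributions
        V = rng.normal(scale=0.01, size=(vocab_size, n_hidden))  # output weights
        b = np.zeros(vocab_size)                                 # output biases
        c = np.zeros(n_hidden)                                   # hidden biases

        def log_prob(doc):
            # log p(doc) = sum_i log p(v_i | v_<i); the hidden pre-activation
            # accumulates contributions of the words observed so far.
            total, pre = 0.0, c.copy()
            for v in doc:
                h = 1.0 / (1.0 + np.exp(-pre))       # sigmoid hidden units
                logits = b + V @ h
                logits -= logits.max()               # numerical stability
                total += logits[v] - np.log(np.exp(logits).sum())  # softmax log-prob
                pre += W[:, v]                       # fold in the observed word
            return total

        print(log_prob([3, 17, 42, 7]))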

    Higher-order Occurrence Pooling on Mid- and Low-level Features: Visual Concept Detection

    In object recognition, the Bag-of-Words model assumes: i) the extraction of local descriptors from images, ii) the embedding of these descriptors by a coder into a given visual vocabulary space, which yields so-called mid-level features, and iii) the extraction of statistics from mid-level features with a pooling operator that aggregates occurrences of visual words in images into so-called signatures. As the last step aggregates only occurrences of visual words, it is called First-order Occurrence Pooling. This paper investigates higher-order approaches. We propose to aggregate over co-occurrences of visual words, derive Bag-of-Words with Second- and Higher-order Occurrence Pooling based on the linearisation of the so-called Minor Polynomial Kernel, and extend this model to work with adequate pooling operators. For bi- and multi-modal coding, a novel higher-order fusion is derived. We show that the well-known Spatial Pyramid Matching and related methods constitute its special cases. Moreover, we propose Third-order Occurrence Pooling directly on local image descriptors and a novel pooling operator that removes undesired correlation from the image signatures. Finally, Uni- and Bi-modal First-, Second-, and Third-order Occurrence Pooling are evaluated given various coders and pooling operators. The proposed methods are compared to other approaches (e.g. Fisher Vector Encoding) in the same testbed and attain state-of-the-art results.
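
    To make the co-occurrence idea concrete, here is a minimal sketch of Second-order Occurrence Pooling with average pooling: outer products of mid-level codes are aggregated over all local descriptors and the upper triangle of the resulting symmetric matrix becomes the signature. The random codes stand in for the output of some coder, and the pooling and normalisation choices in the paper are richer than this.

        import numpy as np

        rng = np.random.default_rng(0)
        n_descriptors, K = 200, 64
        codes = np.abs(rng.normal(size=(n_descriptors, K)))  # mid-level features

        # Aggregate co-occurrences of visual words: average of outer products.
        second_order = np.einsum('nk,nl->kl', codes, codes) / n_descriptors

        # The matrix is symmetric, so the signature keeps the upper triangle only.
        iu = np.triu_indices(K)
        signature = second_order[iu]

        # l2-normalise, as is common before feeding signatures to a linear SVM.
        signature /= np.linalg.norm(signature) + 1e-12
        print(signature.shape)  # K*(K+1)/2 dimensions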

    A deep representation for depth images from synthetic data

    Convolutional Neural Networks (CNNs) trained on large scale RGB databases have become the secret sauce in the majority of recent approaches for object categorization from RGB-D data. Thanks to colorization techniques, these methods exploit the filters learned from 2D images to extract meaningful representations in 2.5D. Still, the perceptual signature of these two kinds of images is very different, with the first usually strongly characterized by textures and the second mostly by silhouettes of objects. Ideally, one would like to have two CNNs, one for RGB and one for depth, each trained on a suitable data collection and able to capture the perceptual properties of its channel for the task at hand. This has not been possible so far, due to the lack of a suitable depth database. This paper addresses this issue, proposing to opt for synthetically generated images rather than collecting a 2.5D large-scale database by hand. While clearly a proxy for real data, synthetic images allow one to trade quality for quantity, making it possible to generate a virtually infinite amount of data. We show that training the very same architecture typically used on visual data on such a collection yields very different filters, resulting in depth features that are (a) able to better characterize the different facets of depth images and (b) complementary to those derived from CNNs pre-trained on 2D datasets. Experiments on two publicly available databases show the power of our approach.
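
    A minimal PyTorch sketch of the resulting two-stream setup: one backbone for RGB (pre-trained on 2D data), one for depth (trained on synthetic depth images), with the complementary features concatenated before a classifier. The tiny backbones below are placeholders to keep the sketch runnable, not the architecture used in the paper.

        import torch
        import torch.nn as nn

        def tiny_backbone(in_ch, feat_dim):
            # Placeholder feature extractor; a standard CNN would go here.
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim))

        class TwoStream(nn.Module):
            def __init__(self, rgb_net, depth_net, feat_dim, n_classes):
                super().__init__()
                self.rgb, self.depth = rgb_net, depth_net
                self.head = nn.Linear(2 * feat_dim, n_classes)

            def forward(self, rgb_img, depth_img):
                # Concatenate RGB and depth features, then classify jointly.
                f = torch.cat([self.rgb(rgb_img), self.depth(depth_img)], dim=1)
                return self.head(f)

        model = TwoStream(tiny_backbone(3, 32), tiny_backbone(1, 32), 32, 10)
        out = model(torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64))
        print(out.shape)  # torch.Size([2, 10])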

    Local defects in colloidal quantum dot thin films measured via spatially resolved multi-modal optoelectronic spectroscopy

    The morphology, chemical composition, and electronic uniformity of thin-film solution-processed optoelectronics are believed to greatly affect device performance. Although scanning probe microscopies can address variations on the micrometer scale, the field of view is still well below the typical device area and the size of extrinsic defects introduced during fabrication. Herein, a micrometer-resolution 2D characterization method with millimeter-scale field of view is demonstrated, which simultaneously collects photoluminescence spectra, photocurrent transients, and photovoltage transients. This high-resolution morphology mapping is used to quantify the distribution and strength of the local optoelectronic property variations in colloidal quantum dot solar cells due to film defects, physical damage, and contaminants across nearly the entire test device area, and the extent to which these variations account for overall performance losses. It is found that macroscopic defects have effects that are confined to their localized areas, rarely prove fatal for device performance, and are largely not responsible for device shunting. Moreover, quantitative analysis of such data based on statistical partitioning methods is used to show how defect identification can be automated while identifying variations in underlying properties such as mobilities and recombination strengths and the mechanisms by which they govern device behavior.
    Funding: DMR-1807342, National Science Foundation; Hopkins Extreme Materials Institute.
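
    As an illustration of the statistical-partitioning step, the sketch below clusters per-pixel measurements from co-registered property maps into regions of similar behaviour. The three-channel random data stand in for photoluminescence, photocurrent, and photovoltage features, and k-means is one plausible partitioning choice, not necessarily the authors' exact method.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        H, W = 128, 128
        maps = rng.normal(size=(H, W, 3))       # 3 co-registered property maps

        # Standardise each property, then partition pixels into regions.
        X = StandardScaler().fit_transform(maps.reshape(-1, 3))
        labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
        segmentation = labels.reshape(H, W)     # regions of similar behaviour
        print(np.bincount(labels))              # pixels per partition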

    A multimodal neuroimaging classifier for alcohol dependence

    With progress in magnetic resonance imaging technology and a broader dissemination of state-of-the-art imaging facilities, the acquisition of multiple neuroimaging modalities is becoming increasingly feasible. One particular hope associated with multimodal neuroimaging is the development of reliable data-driven diagnostic classifiers for psychiatric disorders, yet previous studies have often failed to find a benefit of combining multiple modalities. As a psychiatric disorder with established neurobiological effects at several levels of description, alcohol dependence is particularly well suited for multimodal classification. To this end, we developed a multimodal classification scheme and applied it to a rich neuroimaging battery (structural, functional task-based, and functional resting-state data) collected in a matched sample of alcohol-dependent patients (N = 119) and controls (N = 97). We found that our classification scheme yielded 79.3% diagnostic accuracy, outperforming the strongest individual modality (grey-matter density) by 2.7%. This moderate benefit of multimodal classification depended on a number of critical design choices: a procedure to select optimal modality-specific classifiers, a fine-grained ensemble prediction based on cross-modal weight matrices, and continuous classifier decision values. We conclude that combining multiple neuroimaging modalities can moderately improve the accuracy of machine-learning-based diagnostic classification in alcohol dependence.
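
    A minimal sketch of the ensemble idea: one classifier per modality, with continuous decision values combined via per-modality weights before thresholding. The synthetic data, uniform weights, and the choice of linear SVMs are stand-ins for illustration, not the study's exact pipeline.

        import numpy as np
        from sklearn.svm import LinearSVC

        rng = np.random.default_rng(0)
        n, dims = 216, {'structural': 50, 'task_fmri': 40, 'rest_fmri': 30}
        y = rng.integers(0, 2, n)                              # patient vs control
        X = {m: rng.normal(size=(n, d)) + y[:, None] * 0.3 for m, d in dims.items()}

        # Fit one classifier per modality on its own feature set.
        clfs = {m: LinearSVC(dual=False).fit(X[m], y) for m in dims}
        weights = {m: 1.0 / len(dims) for m in dims}           # e.g. from nested CV

        # Weighted sum of continuous decision values, thresholded at zero.
        scores = sum(weights[m] * clfs[m].decision_function(X[m]) for m in dims)
        pred = (scores > 0).astype(int)
        print((pred == y).mean())                              # in-sample accuracy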
