1,210 research outputs found

    Learning Contextualized Semantics from Co-occurring Terms via a Siamese Architecture

    One of the biggest challenges in multimedia information retrieval and understanding is to bridge the semantic gap by properly modeling concept semantics in context. The presence of out-of-vocabulary (OOV) concepts exacerbates this difficulty. To address the semantic gap, we formulate the problem of learning contextualized semantics from descriptive terms and propose a novel Siamese architecture to model them. By means of pattern aggregation and probabilistic topic models, our Siamese architecture captures contextualized semantics from the co-occurring descriptive terms via unsupervised learning, which leads to a concept embedding space of the terms in context. Furthermore, co-occurring OOV concepts can be easily represented in the learnt concept embedding space. The main properties of the concept embedding space are demonstrated via visualization. Using various settings in semantic priming, we have carried out a thorough evaluation by comparing our approach to a number of state-of-the-art methods on six annotation corpora in different domains, i.e., MagTag5K, CAL500 and Million Song Dataset in the music domain as well as Corel5K, LabelMe and SUNDatabase in the image domain. Experimental results on semantic priming suggest that our approach outperforms those state-of-the-art methods considerably in various aspects.

    Music classification by low-rank semantic mappings

    A challenging open question in music classification is which music representation (i.e., audio features) and which machine learning algorithm are appropriate for a specific music classification task. To address this challenge, given a number of audio feature vectors for each training music recording that capture different aspects of music (e.g., timbre and harmony), the goal is to find a set of linear mappings from several feature spaces to the semantic space spanned by the class indicator vectors. These mappings should reveal the common latent variables that characterize a given set of classes and simultaneously define a multi-class linear classifier that classifies the extracted latent common features. Such a set of mappings is obtained, building on the notion of maximum margin matrix factorization, by minimizing a weighted sum of nuclear norms. Since the nuclear norm imposes rank constraints on the learnt mappings, the proposed method is referred to as low-rank semantic mappings (LRSMs). The performance of the LRSMs in music genre, mood, and multi-label classification is assessed by conducting extensive experiments on seven manually annotated benchmark datasets. The reported experimental results demonstrate the superiority of the LRSMs over the classifiers they are compared against. Furthermore, the best reported classification results are comparable with or slightly superior to those obtained by the state-of-the-art task-specific music classification methods.
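
For readers unfamiliar with the regularizer named in this abstract, the block below recalls the standard definition of the nuclear norm (the sum of a matrix's singular values, a convex surrogate for rank) and shows what a weighted sum of nuclear norms looks like. The symbols W_k (the mapping from the k-th feature space), K (the number of feature spaces) and lambda_k (the weights) are assumptions introduced here for illustration. The abstract does not state the loss term or constraints of the LRSM objective, so they are left out rather than guessed.

```latex
% Nuclear norm of an m-by-n matrix W: the sum of its singular values.
\|W\|_* = \sum_{i=1}^{\min(m,n)} \sigma_i(W)

% Schematic regularizer only (assumed notation): a weighted sum of
% nuclear norms over K mappings, one per feature space.
\Omega(W_1,\dots,W_K) = \sum_{k=1}^{K} \lambda_k \,\|W_k\|_*, \qquad \lambda_k > 0
```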

    Modelling Digital Media Objects

    Retrieval and Annotation of Music Using Latent Semantic Models

    PhD thesis. This thesis investigates the use of latent semantic models for annotation and retrieval from collections of musical audio tracks. In particular, latent semantic analysis (LSA) and aspect models (or probabilistic latent semantic analysis, pLSA) are used to index words in descriptions of music drawn from hundreds of thousands of social tags. A new discrete audio feature representation is introduced to encode musical characteristics of automatically identified regions of interest within each track, using a vocabulary of audio muswords. Finally, a joint aspect model is developed that can learn from both tagged and untagged tracks by indexing both conventional words and muswords. This model is used as the basis of a music search system that supports query by example and by keyword, and of a simple probabilistic machine annotation system. The models are evaluated by their performance in a variety of realistic retrieval and annotation tasks, motivated by applications including playlist generation, internet radio streaming, music recommendation and catalogue search. Engineering and Physical Sciences Research Council.
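
The aspect model (pLSA) mentioned in this abstract can be illustrated with a minimal sketch of the textbook EM procedure, which factorizes a document-by-word count matrix into P(w|z) and P(z|d). This is not the thesis's joint model over words and muswords. The function name `plsa`, its parameters, and the plain nested-list count matrix are assumptions made for this example.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit P(w|z) and P(z|d) by EM on a document-by-word count matrix.

    `counts[d][w]` is how often word (or tag) w occurs in document d.
    Hypothetical helper for illustration; not the thesis's implementation.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random positive initialisation; every row is a probability distribution.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        # Tiny pseudo-counts keep every row strictly positive.
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z | d, w), proportional to P(w|z) P(z|d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        # M-step: renormalise expected counts into distributions.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d
```

Calling `plsa(counts, n_topics=2)` on a small tag-count matrix returns two lists of rows, each summing to one. Tracks can then be compared through their topic mixtures `p_z_d` rather than their raw tags.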

    TagBook: A Semantic Video Representation without Supervision for Event Detection

    We consider the problem of event detection in video for scenarios where only a few, or even zero, examples are available for training. For this challenging setting, the prevailing solutions in the literature rely on a semantic video representation obtained from thousands of pre-trained concept detectors. Different from existing work, we propose a new semantic video representation that is based on freely available socially tagged videos only, without the need for training any intermediate concept detectors. We introduce a simple algorithm that propagates tags from a video's nearest neighbors, similar in spirit to the ones used for image retrieval, but redesign it for video event detection by including video source set refinement and varying the video tag assignment. We call our approach TagBook and study its construction, descriptiveness and detection performance on the TRECVID 2013 and 2014 multimedia event detection datasets and the Columbia Consumer Video dataset. Despite its simple nature, the proposed TagBook video representation is remarkably effective for few-example and zero-example event detection, even outperforming very recent state-of-the-art alternatives building on supervised representations. Comment: accepted for publication as a regular paper in the IEEE Transactions on Multimedia.
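
The core idea of propagating tags from a video's nearest neighbors can be sketched as similarity-weighted voting. This is a generic neighbor-voting illustration only: it does not include the source set refinement or the tag assignment variants that the abstract says distinguish TagBook. The names `cosine` and `propagate_tags`, the feature vectors, and the tags are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_tags(query, source_set, k=2):
    """Score tags for `query` by voting over its k most similar source videos.

    `source_set` is a list of (feature_vector, tags) pairs standing in for
    socially tagged videos. Returns {tag: score} with scores summing to 1.
    """
    ranked = sorted(source_set,
                    key=lambda item: cosine(query, item[0]),
                    reverse=True)
    scores = defaultdict(float)
    for features, tags in ranked[:k]:
        similarity = cosine(query, features)
        for tag in tags:
            # Each neighbor votes for its tags, weighted by similarity.
            scores[tag] += similarity
    total = sum(scores.values())
    return {tag: s / total for tag, s in scores.items()} if total else {}


# Toy source set: three tagged videos with 2-D features (made-up values).
sources = [
    ([1.0, 0.0], {"dog", "park"}),
    ([0.9, 0.1], {"dog"}),
    ([0.0, 1.0], {"cooking"}),
]
tag_scores = propagate_tags([1.0, 0.0], sources, k=2)
```

The resulting tag-score dictionary acts as a semantic representation of the query video built from tagged neighbors alone, with no trained concept detectors.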

    Smartphone picture organization: a hierarchical approach

    We live in a society where the large majority of the population has a camera-equipped smartphone. In addition, hard drives and cloud storage are getting cheaper and cheaper, leading to a tremendous growth in stored personal photos. Unlike photo collections captured by a digital camera, which are typically pre-processed by the user, who organizes them into event-related folders, smartphone pictures are automatically stored in the cloud. As a consequence, photo collections captured by a smartphone are highly unstructured and, because smartphones are ubiquitous, they present a larger variability than pictures captured by a digital camera. To address the need to organize large smartphone photo collections automatically, we propose a new methodology for hierarchical photo organization into topics and topic-related categories. Our approach estimates latent topics in the pictures by applying probabilistic Latent Semantic Analysis, and automatically assigns a name to each topic by relying on a lexical database. Topic-related categories are then estimated by using a set of topic-specific Convolutional Neural Networks. To validate our approach, we assemble and make public a large dataset of more than 8,000 smartphone pictures from 40 persons. Experimental results demonstrate a marked gain in user satisfaction with the resulting organization relative to state-of-the-art solutions. Peer Reviewed. Preprint.