241 research outputs found

    Classification of Overlapped Audio Events Based on AT, PLSA, and the Combination of Them

    Audio event classification, an important part of Computational Auditory Scene Analysis, has attracted much attention. Current classification technology is mature enough to classify isolated audio events accurately, but it performs much worse on overlapped audio events. In real life, however, most audio documents contain a certain percentage of overlaps, so overlap classification is an important part of audio classification. Work on overlapped audio event classification is still scarce, and most existing overlap classification systems can recognize only one audio event per overlap. In this paper, to deal with overlaps, we introduce the author-topic (AT) model, first proposed for text analysis, into audio classification and combine it with PLSA (Probabilistic Latent Semantic Analysis). We propose four systems, namely AT, PLSA, AT-PLSA and PLSA-AT, to classify overlaps. All four systems can recognize two or more audio events in an overlap. The experimental results show that the four systems perform well in classifying overlapped audio events, whether or not the overlap appears in the training set. They also perform well in classifying isolated audio events.

    Smartphone picture organization: a hierarchical approach

    We live in a society where the large majority of the population has a camera-equipped smartphone. In addition, hard drives and cloud storage are getting cheaper and cheaper, leading to tremendous growth in stored personal photos. Unlike photo collections captured with a digital camera, which are typically pre-processed by the user and organized into event-related folders, smartphone pictures are automatically stored in the cloud. As a consequence, smartphone photo collections are highly unstructured and, because smartphones are ubiquitous, they show greater variability than pictures captured with a digital camera. To address the need to organize large smartphone photo collections automatically, we propose a new methodology for hierarchical photo organization into topics and topic-related categories. Our approach estimates latent topics in the pictures by applying probabilistic Latent Semantic Analysis, and automatically assigns a name to each topic by relying on a lexical database. Topic-related categories are then estimated using a set of topic-specific Convolutional Neural Networks. To validate our approach, we assemble and make public a large dataset of more than 8,000 smartphone pictures from 40 persons. Experimental results demonstrate high user satisfaction relative to state-of-the-art solutions in terms of organization. Peer Reviewed. Preprint.
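
    Several abstracts in this listing rely on probabilistic Latent Semantic Analysis (pLSA) to estimate latent topics. As a rough illustration only, the sketch below fits a plain pLSA model with EM on a small count matrix. It is not the method of any paper listed here: the function name `plsa`, its parameters, and the toy data are assumptions made for this example, and "documents" and "words" stand in for whatever items and features (pictures and visual words, for instance) a given system uses.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit a basic pLSA model with EM (illustrative sketch, hypothetical API).

    counts: list of rows, counts[d][w] = occurrences of word w in document d.
    Returns (p_z_d, p_w_z), where p_z_d[d][z] = P(z|d) and p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        if total == 0:  # empty document or unused topic: fall back to uniform
            return [1.0 / len(row)] * len(row)
        return [v / total for v in row]

    # Random positive initialisation of P(z|d) and P(w|z).
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total == 0:
                    continue
                for z in range(n_topics):
                    resp = n * joint[z] / total
                    new_z_d[d][z] += resp  # expected count for P(z|d)
                    new_w_z[z][w] += resp  # expected count for P(w|z)
        # M-step: renormalise the expected counts into distributions.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


if __name__ == "__main__":
    # Toy data: two documents share words 0-1, the other two share words 2-3.
    toy = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
    doc_topics, topic_words = plsa(toy, n_topics=2)
    for row in doc_topics:
        print([round(v, 3) for v in row])
```

    Each row printed by the demo is one document's topic mixture P(z|d). The sketch uses dense lists for readability; a practical system would more likely use sparse matrices and a convergence check instead of a fixed iteration count.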

    Enhanced Random Forest with Image/Patch-Level Learning for Image Understanding

    Image understanding is an important research domain in computer vision due to its wide range of real-world applications. For an image understanding framework that uses the Bag-of-Words representation, the visual codebook is an essential part. Random forest (RF), as a tree-structured discriminative codebook, has been a popular choice. However, the performance of the RF can degrade if the local patch labels are poorly assigned. In this paper, we tackle this problem with a novel way of updating RF codebook learning to obtain a more discriminative codebook, introducing soft class labels estimated from the pLSA model through a feedback scheme. The feedback scheme is performed at both the image and patch levels, in contrast to state-of-the-art RF codebook learning, which focuses on either the image or the patch level only. Experiments on the 15-Scene and C-Pascal datasets show the effectiveness of the proposed method in image understanding tasks. Comment: Accepted in ICPR 2014 (Oral)

    Incremental probabilistic Latent Semantic Analysis for video retrieval

    Recent research trends in Content-based Video Retrieval have shown topic models to be an effective tool for dealing with the semantic gap challenge. In this scenario, this paper has a dual target: (1) it studies how the use of different topic models (pLSA, LDA and FSTM) affects video retrieval performance; (2) it presents a novel incremental topic model (IpLSA) to cope with incremental scenarios effectively and efficiently. A comprehensive comparison among these four topic models, using two different retrieval systems and two reference benchmarking video databases, is provided. Experiments revealed that pLSA is the best model in sparse conditions, LDA tends to outperform the rest of the models in a dense space, and IpLSA is able to work properly in both cases.

    Characterisation and adaptive learning in interactive video retrieval

    The main objective of this thesis is to use latent topic models effectively to address the problem of automatic video retrieval. Specifically, it aims to improve on the current state of the art in automatic video retrieval systems in terms of both efficiency and accuracy. In general, latent topic models are a set of statistical tools that make it possible to extract the generating patterns of a data collection. Traditionally, techniques of this kind have not been considered very useful for automatic video retrieval systems because of their high computational cost and the inherent complexity of the topic space in the domain of visual information. In this work, we are interested in the use of latent topics to overcome the current limitations in content-based video retrieval (CBVR). Despite the potential of topic models to uncover the hidden structure of a collection, they have traditionally been unable to provide a competitive advantage in CBVR because of the high computational cost of their algorithms and the complexity of the latent space in the visual domain. Throughout this thesis we focus on designing new models and tools based on topic models to take advantage of the latent space in CBVR. Specifically, we have worked in four different areas within the retrieval process: vocabulary reduction, encoding, modelling and ranking. Our most important contributions relate to modelling and ranking.

    An empirical study of inter-concept similarities in multimedia ontologies

    Generic concept detection has been widely studied in recent research on multimedia analysis and retrieval, but the issue of how to exploit the structure of a multimedia ontology, as well as different inter-concept relations, has not received similar attention. In this paper, we present results from our empirical analysis of different types of similarity among semantic concepts in two multimedia ontologies, LSCOM-Lite and CDVP-206. The results suggest that the proposed methods may help provide insight into the existing inter-concept relations within an ontology and select the most useful set of concepts and hierarchical relations. Such an analysis can be used in various tasks, such as building more reliable concept detectors and designing large-scale ontologies.

    Semantic multimedia analysis using knowledge and context

    PhD thesis. The difficulty of semantic multimedia analysis can be attributed to the great diversity in form and appearance exhibited by the majority of semantic concepts and the difficulty of expressing them using a finite number of patterns. In meeting this challenge, there has been a scientific debate on whether the problem should be addressed by using overwhelming amounts of training data to capture all possible instantiations of a concept, or by using explicit knowledge about the concepts' relations to infer their presence. In this thesis we address three problems of pattern recognition and propose solutions that combine the knowledge extracted implicitly from training data with the knowledge provided explicitly in structured form. First, we propose a BN modeling approach that defines a conceptual space where both domain-related evidence and evidence derived from content analysis can be jointly considered to support or disprove a hypothesis. The use of this space leads to significant gains in performance compared to analysis methods that cannot handle combined knowledge. Then, we present an unsupervised method that exploits the collective nature of social media to automatically obtain large amounts of annotated image regions. By showing that the quality of the obtained samples can be almost as good as that of manually annotated images when working with large datasets, we contribute significantly towards scalable object detection. Finally, we introduce a method that treats images, visual features and tags as the three observable variables of an aspect model and extracts a set of latent topics that incorporates the semantics of both the visual and the tag information space. By showing that the cross-modal dependencies of tagged images can be exploited to increase the semantic capacity of the resulting space, we advocate the use of all existing information facets in the semantic analysis of social media.
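
    The last method in the abstract above treats images, visual features and tags as three observable variables of an aspect model. The abstract does not give the formulation, so the following is only one common way such a three-way aspect model can be written, assuming the three observations are conditionally independent given a latent topic. The symbols $d_i$ (image), $f_j$ (visual feature), $t_k$ (tag), $z_l$ (latent topic) and $L$ (number of topics) are notation chosen for this sketch, not taken from the thesis.

```latex
P(d_i, f_j, t_k) \;=\; \sum_{l=1}^{L} P(z_l)\, P(d_i \mid z_l)\, P(f_j \mid z_l)\, P(t_k \mid z_l)
```

    Under this assumption, each latent topic $z_l$ has to account for visual and tag observations at the same time, which is one way cross-modal dependencies can shape the resulting topic space.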