
    Effective Graph-Based Content-Based Image Retrieval Systems for Large-Scale and Small-Scale Image Databases

    This dissertation proposes two novel manifold graph-based ranking systems for Content-Based Image Retrieval (CBIR). The two proposed systems exploit the synergy between relevance feedback-based transductive short-term learning and semantic feature-based long-term learning to improve retrieval performance. The proposed systems first apply an active learning mechanism to construct a log of users' relevance feedback and extract high-level semantic features for each image. They then create manifold graphs that incorporate both low-level visual similarity and high-level semantic similarity, yielding more meaningful structures for the image space. Finally, asymmetric relevance vectors are created to propagate the relevance scores of labeled images to unlabeled images via the manifold graphs. Extensive experimental results demonstrate that the two proposed systems outperform other state-of-the-art CBIR systems under both correct and erroneous user feedback.
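
The propagation step described above can be illustrated with the classic manifold-ranking iteration f <- alpha * S f + (1 - alpha) * y, where S is the symmetrically normalised affinity graph and y is the relevance vector. This is a minimal pure-Python sketch under stated assumptions, not the dissertation's implementation: the visual and semantic graphs are blended with a hypothetical weight `beta`, and the asymmetric relevance vector uses hypothetical scores of +1.0 for relevant and -0.5 for irrelevant images.

```python
import math


def blend_graphs(w_visual, w_semantic, beta=0.5):
    """Blend two n x n affinity matrices: beta * visual + (1 - beta) * semantic.

    `beta` is a hypothetical mixing weight, not a value from the dissertation.
    """
    n = len(w_visual)
    return [[beta * w_visual[i][j] + (1 - beta) * w_semantic[i][j] for j in range(n)]
            for i in range(n)]


def normalize(w):
    """Symmetric normalisation S = D^{-1/2} W D^{-1/2}, D = row-sum degree matrix."""
    n = len(w)
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in (sum(row) for row in w)]
    return [[inv_sqrt[i] * w[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def relevance_vector(n, relevant, irrelevant, pos_score=1.0, neg_score=-0.5):
    """Asymmetric relevance vector: relevant and irrelevant images get different
    magnitudes (hypothetical scores); unlabeled images start at 0."""
    y = [0.0] * n
    for i in relevant:
        y[i] = pos_score
    for i in irrelevant:
        y[i] = neg_score
    return y


def manifold_rank(s, y, alpha=0.9, iters=200):
    """Iterate f <- alpha * S f + (1 - alpha) * y; converges for 0 < alpha < 1."""
    n = len(y)
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


if __name__ == "__main__":
    # Toy 4-image collection: images 0/1 and 2/3 form two tight clusters.
    visual = [[0, 1, 0.1, 0], [1, 0, 0, 0.1], [0.1, 0, 0, 1], [0, 0.1, 1, 0]]
    semantic = [[0, 0.8, 0, 0], [0.8, 0, 0, 0], [0, 0, 0, 0.8], [0, 0, 0.8, 0]]
    s = normalize(blend_graphs(visual, semantic))
    scores = manifold_rank(s, relevance_vector(4, relevant=[0], irrelevant=[3]))
    # Rank the unlabeled images (1 and 2) by propagated relevance.
    print(sorted([1, 2], key=lambda i: scores[i], reverse=True))  # [1, 2]
```

Image 1, which is tightly linked to the relevant image 0, ends up with a positive score, while image 2, linked to the irrelevant image 3, ends up negative.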

    Content-Based Image Retrieval using Deep Learning

    A content-based image retrieval (CBIR) system works on the low-level visual features of a user's query image, which makes it difficult for users to formulate the query and often fails to give satisfactory retrieval results. In the past, image annotation was proposed as the best possible approach to CBIR; it works by automatically assigning keywords to images so that users can query images by those keywords. Image annotation is often regarded as an image classification problem, in which images are represented by low-level features and a supervised learning algorithm maps those low-level features to high-level concepts (class labels). In a CBIR system, learning effective feature representations and similarity measures is crucial to retrieval performance. The key challenge has been the semantic gap between the low-level image pixels captured by machines and the high-level semantics perceived by humans, and machine learning has been exploited as a long-term way to bridge this gap. The recent successes of deep learning techniques, especially Convolutional Neural Networks (CNNs), in computer vision applications have inspired me to address CBIR in this thesis using a dataset of annotated images.

    Deep learning for content-based image retrieval: A comprehensive study

    Learning effective feature representations and similarity measures is crucial to the retrieval performance of a content-based image retrieval (CBIR) system. Despite decades of extensive research, it remains one of the most challenging open problems and considerably hinders the success of real-world CBIR systems. The key challenge has been attributed to the well-known “semantic gap” between the low-level image pixels captured by machines and the high-level semantic concepts perceived by humans. Among various techniques, machine learning has been actively investigated as a possible direction to bridge the semantic gap in the long term. Inspired by the recent successes of deep learning techniques in computer vision and other applications, in this paper we attempt to address an open problem: whether deep learning is a hope for bridging the semantic gap in CBIR, and how much improvement …

    Adaptive image retrieval using a graph model for semantic feature integration

    The variety of features available to represent multimedia data constitutes a rich pool of information. However, this plethora of data poses a challenge in terms of feature selection and integration for effective retrieval. Moreover, to further improve effectiveness, the retrieval model should ideally incorporate context-dependent feature representations to allow retrieval at a higher semantic level. In this paper we present a retrieval model and learning framework for interactive information retrieval. We describe how semantic relations between multimedia objects can be learnt from user interaction and then integrated with visual and textual features into a unified framework. The framework models both feature similarities and semantic relations in a single graph, and querying in this model is implemented using the theory of random walks. In addition, we present ideas for implementing short-term learning from relevance feedback. Systematic experimental results validate the effectiveness of the proposed approach for image retrieval. However, the model is not restricted to the image domain and could easily be employed for retrieving other multimedia data (and even a combination of different domains, e.g., images, audio and text documents).
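
Querying by random walks on a single graph can be sketched with random walk with restart (RWR), one common way to realise the idea; the abstract does not say which walk the authors use. In this minimal sketch, feature-similarity edges and learnt semantic-relation edges are merged into one weighted, undirected graph, and a hypothetical `semantic_weight` controls how strongly semantic relations count relative to feature similarities.

```python
def build_graph(n, feature_edges, semantic_edges, semantic_weight=1.0):
    """Merge (i, j, weight) edge lists into one symmetric n x n adjacency matrix."""
    adj = [[0.0] * n for _ in range(n)]
    for edges, scale in ((feature_edges, 1.0), (semantic_edges, semantic_weight)):
        for i, j, w in edges:
            adj[i][j] += scale * w
            adj[j][i] += scale * w
    return adj


def random_walk_with_restart(adj, query, restart=0.15, iters=100):
    """Return the RWR score of every node with respect to the query node.

    At each step the walker follows an edge with probability proportional to
    its weight, or jumps back to the query with probability `restart`.
    Mass reaching a node with no edges is sent back to the query.
    """
    n = len(adj)
    out_deg = [sum(row) for row in adj]
    r = [0.0] * n
    r[query] = 1.0
    for _ in range(iters):
        nxt = [0.0] * n
        for i in range(n):
            if out_deg[i] == 0:
                nxt[query] += (1 - restart) * r[i]
                continue
            for j in range(n):
                if adj[i][j]:
                    nxt[j] += (1 - restart) * r[i] * adj[i][j] / out_deg[i]
        nxt[query] += restart
        r = nxt
    return r


if __name__ == "__main__":
    # Object 0 is the query; objects 1 and 2 are equally similar to it by
    # features, but a learnt semantic relation also links 0 and 2.
    g = build_graph(3, feature_edges=[(0, 1, 0.3), (0, 2, 0.3)],
                    semantic_edges=[(0, 2, 1.0)])
    scores = random_walk_with_restart(g, query=0)
    print(sorted([1, 2], key=lambda i: scores[i], reverse=True))  # [2, 1]
```

Because every step either follows an edge or restarts, the scores remain a probability distribution over objects, and the semantic relation is what lifts object 2 above object 1.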

    Interactive retrieval of video using pre-computed shot-shot similarities

    A probabilistic framework for content-based interactive video retrieval is described. The indexing of video fragments is based on the probability that the user will judge the key-frames of a video shot positively. Initial estimates of these probabilities are obtained from low-level feature representations. Only statistically significant estimates are retained; the rest are replaced by an appropriate constant, which allows efficient access at search time without loss of search quality and led to improvements in most experiments. Over time, these probability estimates are updated from the relevance judgments of users performing searches, resulting in further substantial increases in mean average precision.
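
The sparsification and update steps can be sketched as follows. This is a minimal illustration under assumptions, not the authors' method: "statistically significant" is approximated by a hypothetical z-score cut-off (`z`), the replacement constant is taken to be the mean of the discarded estimates, and user judgments are folded in with a simple pseudo-count update whose strength is a hypothetical `prior_weight`. The dictionary holds estimates for the shots related to one source shot.

```python
import statistics


def sparsify(estimates, z=2.0):
    """Keep only estimates more than `z` standard deviations above the mean.

    Returns (kept, default): the retained per-shot estimates and the single
    constant that stands in for every discarded estimate.
    """
    values = list(estimates.values())
    threshold = statistics.mean(values) + z * statistics.pstdev(values)
    kept = {shot: p for shot, p in estimates.items() if p > threshold}
    rest = [p for shot, p in estimates.items() if shot not in kept]
    default = statistics.mean(rest) if rest else 0.0
    return kept, default


class ShotRelevanceIndex:
    """Sparse probability estimates, refined by users' relevance judgments."""

    def __init__(self, kept, default, prior_weight=5.0):
        self.kept = dict(kept)
        self.default = default
        self.prior_weight = prior_weight  # how many judgments the prior is "worth"
        self.positive = {}
        self.judged = {}

    def estimate(self, shot):
        # Pseudo-count blend of the initial estimate with observed judgments.
        prior = self.kept.get(shot, self.default)
        k = self.positive.get(shot, 0)
        n = self.judged.get(shot, 0)
        return (prior * self.prior_weight + k) / (self.prior_weight + n)

    def record_judgment(self, shot, relevant):
        self.judged[shot] = self.judged.get(shot, 0) + 1
        if relevant:
            self.positive[shot] = self.positive.get(shot, 0) + 1


if __name__ == "__main__":
    initial = {"s1": 0.9, "s2": 0.1, "s3": 0.1, "s4": 0.1,
               "s5": 0.1, "s6": 0.1, "s7": 0.1, "s8": 0.1}
    kept, default = sparsify(initial)
    print(sorted(kept), round(default, 2))  # ['s1'] 0.1
    index = ShotRelevanceIndex(kept, default)
    index.record_judgment("s2", relevant=True)
    index.record_judgment("s2", relevant=True)
    print(round(index.estimate("s2"), 3))  # 0.357
```

Only the shots in `kept` carry their own estimate; every other shot shares the single `default` until user judgments accumulate for it, which keeps the index small.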

    EGO: a personalised multimedia management tool

    The problems of Content-Based Image Retrieval (CBIR) systems can be attributed, on the one hand, to the semantic gap between the low-level data representation and the high-level concepts the user associates with images and, on the other, to the time-varying and often vague nature of the underlying information need. These problems can be addressed by improving the interaction between the user and the system. In this paper, we sketch the development of CBIR interfaces and introduce our view on how to solve some of the problems of the interfaces studied. To address the semantic gap and long-term, multifaceted information needs, we propose a "retrieval in context" system. EGO is a tool for the management of image collections that supports the user through personalisation and adaptation. We describe how it learns from the user's personal organisation, allowing it to recommend relevant images to the user, and we detail the recommendation algorithm, which is based on relevance feedback techniques.
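
As an illustration of recommending from a user's personal organisation with relevance feedback, the sketch below uses the Rocchio formula, a standard relevance-feedback technique substituted here because the abstract does not give EGO's exact algorithm. Images the user placed in a group act as positive feedback and images outside it as negative feedback; the feature vectors, the weights `beta` and `gamma`, and the function names are hypothetical.

```python
import math


def mean_vector(vectors, dim):
    """Component-wise mean; the zero vector if there are no vectors."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_profile(group, others, beta=0.75, gamma=0.15):
    """Rocchio-style profile: beta * mean(group) - gamma * mean(others).

    The usual alpha * query term is omitted because the group itself plays
    the role of the query here.
    """
    dim = len(group[0])
    pos, neg = mean_vector(group, dim), mean_vector(others, dim)
    return [beta * p - gamma * q for p, q in zip(pos, neg)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(group, others, candidates, k=2):
    """Return the names of the k candidates most similar to the group profile."""
    profile = rocchio_profile(group, others)
    ranked = sorted(candidates, key=lambda name: cosine(profile, candidates[name]),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    group = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]  # images the user grouped together
    others = [[0.0, 0.0, 1.0]]                 # images kept outside the group
    candidates = {"x": [1.0, 0.05, 0.0], "y": [0.0, 0.1, 1.0], "z": [0.5, 0.5, 0.0]}
    print(recommend(group, others, candidates))  # ['x', 'z']
```

The negative term pushes the profile away from images the user deliberately left out, so candidate `y`, which resembles those images, drops to the bottom of the ranking.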

    Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval

    Deep cross-modal learning has demonstrated excellent performance in cross-modal multimedia retrieval, where the aim is to learn joint representations between different data modalities. Unfortunately, little research focuses on cross-modal correlation learning that takes into account the temporal structures of different data modalities, such as audio and lyrics. Motivated by the inherently temporal structure of music, we learn the deep sequential correlation between audio and lyrics. In this work, we propose a deep cross-modal correlation learning architecture involving two-branch deep neural networks for the audio modality and the text modality (lyrics). Data in different modalities are converted to the same canonical space, where inter-modal canonical correlation analysis is used as an objective function to calculate the similarity of temporal structures. This is the first study that uses deep architectures for learning the temporal correlation between audio and lyrics. A pre-trained Doc2Vec model followed by fully-connected layers is used to represent lyrics. Two significant contributions are made in the audio branch: i) we propose an end-to-end network to learn the cross-modal correlation between audio and lyrics, in which feature extraction and correlation learning are performed simultaneously and the joint representation is learned by taking temporal structures into account; ii) for feature extraction, we represent an audio signal by a short sequence of local summaries (VGG16 features) and apply a recurrent neural network to compute a compact feature that better captures the temporal structure of music audio. Experimental results, using audio to retrieve lyrics and lyrics to retrieve audio, verify the effectiveness of the proposed deep correlation learning architectures in cross-modal music retrieval.
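
The canonical-correlation objective can be made concrete for a single pair of projection directions: CCA seeks directions for the audio and lyrics features that maximise the Pearson correlation between the projected values of paired samples, and deep variants train the branch networks so that the correlation of their outputs is maximised, typically by minimising its negative. The minimal sketch below computes that correlation for fixed, hypothetical embeddings and directions standing in for the two network branches; it is not the authors' implementation.

```python
import math


def project(x, w):
    """1-D projection <x, w> of a feature vector onto a direction."""
    return sum(xi * wi for xi, wi in zip(x, w))


def projected_correlation(audio_feats, lyric_feats, w_audio, w_lyrics):
    """Pearson correlation between the projected audio and lyrics features of
    paired samples (row i of each list belongs to the same song).
    Assumes neither projection is constant across samples."""
    a = [project(x, w_audio) for x in audio_feats]
    l = [project(y, w_lyrics) for y in lyric_feats]
    n = len(a)
    mean_a, mean_l = sum(a) / n, sum(l) / n
    cov = sum((ai - mean_a) * (li - mean_l) for ai, li in zip(a, l))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_l = sum((li - mean_l) ** 2 for li in l)
    return cov / math.sqrt(var_a * var_l)


def correlation_loss(audio_feats, lyric_feats, w_audio, w_lyrics):
    """Loss to minimise during training: the negative correlation."""
    return -projected_correlation(audio_feats, lyric_feats, w_audio, w_lyrics)


if __name__ == "__main__":
    audio = [[1.0, 0.3], [2.0, 0.1], [3.0, 0.2]]   # hypothetical audio embeddings
    lyrics = [[0.5, 2.0], [0.2, 4.0], [0.9, 6.0]]  # hypothetical lyrics embeddings
    print(projected_correlation(audio, lyrics, [1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Here the chosen directions pick out the first audio dimension and the second lyrics dimension, which vary together perfectly across the three songs; a training procedure would instead search for such directions (or network weights) by minimising `correlation_loss`.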