
    Content-based image retrieval and its benefits for the stock photography market

    The development of powerful low-cost desktop computer systems has changed the pre-press business, where tight deadlines must be met persistently. An increasing number of newspapers and magazines are acquiring, handling, and storing images digitally while the use of hardcopies and slides decreases. Today's computers and high-capacity storage media enable stock photography agencies to build digital image databases, giving users fast access to large numbers of images. However, the transition from analog to digital image archives imposes new problems: with thousands of images at hand, the search for a particular image may turn into a search for a needle in a haystack.

    The first image Database Management Systems (DBMSs) were extended text DBMSs, which stored the image data along with a set of manually entered descriptive keywords. The major problem with this approach is that there is no generally agreed-upon language to describe images. Even sophisticated DBMSs are unable to detect synonyms; hence, an image described with a property such as curvy may not be found if a user enters wavy as a search criterion. Furthermore, some image properties are hard to describe with keywords, and a search is likely to fail if a property was not described at the database population stage, when images are added to the database. Finally, assigning a sufficient set of keywords to every image adds a tremendous amount of labor to the population stage.

    Research at many scientific institutions and companies is geared towards overcoming the shortcomings of image DBMSs with keyword-based search engines. Pattern recognition, which allows images to be compared based on their visual content, is being introduced to image DBMSs, improving the accuracy of search engines. Sketches, sample images, and other means of describing the visual content of images may be used as search criteria in addition to keywords. This thesis project summarizes the basics of pattern recognition and its applications in image database management for content-based image retrieval. The purpose of this thesis project is to determine the impact of content-based image retrieval on the stock photography market in the near future.

    In order to obtain the necessary information, two different questionnaires were sent out to a number of selected stock photography agencies, newspapers, and magazines. The replies were evaluated for the three groups separately. The replies from stock photography agencies showed a high interest in digital image archives, but also concerns about increased overhead with digital archives. The estimated amount of work required for categorizing images and assigning keywords ranged from fifty to ninety percent, compared to ten to fifty percent for scanning. All survey participants agreed that pattern recognition can improve the accuracy of keyword-based search engines. However, they all denied that this approach would reduce the need for assigning keywords. Different needs could be determined for newspapers and magazines. Newspapers rely heavily on keywords, since images are often chosen based upon the circumstances under which they were taken while their visual content may be secondary; newspapers therefore profit only minimally from content-based image retrieval. For magazines, the visual content of images seemed to have a higher priority, and the appreciation for corresponding search capabilities was accordingly higher.

    To summarize, users of digital image archives can profit from content-based image retrieval if the visual content is an important issue. For image providers, there are a number of reasons that delay the transition to content-based image retrieval. Currently, there is only one shrink-wrapped commercial product available that meets the needs of stock photography agencies, and it requires additional work to fully exploit its capabilities. Finally, many companies have already built their image databases, and the transition to another system is time-consuming, expensive, and risky.
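
    To make concrete what searching by visual content rather than by keywords can look like, the following is a minimal, generic sketch of histogram-based retrieval (not the specific system or product discussed in the thesis; the file names, image size, and bin count are illustrative assumptions).

```python
# Generic content-based retrieval sketch: rank archive images by
# colour-histogram similarity to a query image instead of by keywords.
# File paths and histogram settings are illustrative assumptions.
import numpy as np
from PIL import Image

def colour_histogram(path, bins=8):
    """Concatenated, normalised per-channel RGB histogram of an image."""
    pixels = np.asarray(Image.open(path).convert("RGB").resize((128, 128)))
    hist = np.concatenate([
        np.histogram(pixels[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]).astype(float)
    return hist / hist.sum()

def retrieve(query_path, archive_paths, top_k=5):
    """Return the archive images most similar to the query (histogram intersection)."""
    query = colour_histogram(query_path)
    scored = [(p, float(np.minimum(query, colour_histogram(p)).sum()))
              for p in archive_paths]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top_k]

if __name__ == "__main__":
    ranked = retrieve("query.jpg", ["img001.jpg", "img002.jpg", "img003.jpg"])
    for path, score in ranked:
        print(f"{path}\t{score:.3f}")
```

    A sample image thus serves directly as the search criterion, which is exactly the kind of query the keyword-only DBMSs described above cannot answer.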

    TRECVid 2006 experiments at Dublin City University

    In this paper we describe our retrieval system and the experiments performed for the automatic search task in TRECVid 2006. We submitted the following six automatic runs:
    • F A 1 DCU-Base 6: Baseline run using only ASR/MT text features.
    • F A 2 DCU-TextVisual 2: Run using text and visual features.
    • F A 2 DCU-TextVisMotion 5: Run using text, visual, and motion features.
    • F B 2 DCU-Visual-LSCOM 3: Text and visual features combined with concept detectors.
    • F B 2 DCU-LSCOM-Filters 4: Text, visual, and motion features with concept detectors.
    • F B 2 DCU-LSCOM-2 1: Text, visual, and motion features with concept detectors and negative concepts.
    The experiments were designed both to study the effect of adding motion features and separately constructed semantic concept models to runs using only textual and visual features, and to establish a baseline for the manually-assisted search runs performed within the collaborative K-Space project and described in the corresponding TRECVid 2006 notebook paper. The results indicate that the performance of automatic search can be improved with suitable concept models. This is, however, very topic-dependent, and the questions of when to include such models and which concept models should be included remain unanswered. Secondly, using motion features did not lead to a performance improvement in our experiments. Finally, we observed that our text features, despite displaying rather poor performance overall, may still be useful even for generic search topics.
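
    The runs above differ mainly in which evidence sources are combined into a single ranked shot list. As a rough illustration of the kind of late fusion involved (this is not the DCU implementation; the shot identifiers, scores, and weights below are invented), a minimal sketch in Python:

```python
# Illustrative sketch of late score fusion for video search: combine per-shot
# scores from independent text and visual search engines into one ranking.
# Shot IDs, score values, and fusion weights are made-up examples.

def normalise(scores):
    """Min-max normalise a {shot_id: score} dict to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {shot: (s - lo) / span for shot, s in scores.items()}

def fuse(text_scores, visual_scores, w_text=0.7, w_visual=0.3):
    """Weighted combination of two normalised score lists, highest first."""
    text_n, vis_n = normalise(text_scores), normalise(visual_scores)
    shots = set(text_n) | set(vis_n)
    fused = {s: w_text * text_n.get(s, 0.0) + w_visual * vis_n.get(s, 0.0)
             for s in shots}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    text_scores = {"shot101_3": 12.4, "shot087_1": 9.1, "shot055_7": 3.2}
    visual_scores = {"shot087_1": 0.82, "shot210_2": 0.74, "shot101_3": 0.40}
    for shot, score in fuse(text_scores, visual_scores):
        print(f"{shot}\t{score:.3f}")
```

    Adding motion features or concept-detector outputs amounts to further score sources in the same weighted sum, which is why the choice and weighting of sources is so topic-dependent.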

    Beyond English text: Multilingual and multimedia information retrieval.


    Content-based access to digital video: the Físchlár system and the TREC video track

    This short paper presents an overview of the Físchlár system, an operational digital library of several hundred hours of video content at Dublin City University which is used by over 1,000 users daily for a variety of applications. The paper describes how Físchlár operates and the services that it provides for users. The second part of the paper then outlines the TREC Video Retrieval track, a benchmarking exercise for information retrieval from video content that is currently in operation, and summarises how the exercise operates.

    Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval

    Deep cross-modal learning has successfully demonstrated excellent performance in cross-modal multimedia retrieval, with the aim of learning joint representations between different data modalities. Unfortunately, little research focuses on cross-modal correlation learning where the temporal structures of different data modalities, such as audio and lyrics, are taken into account. Stemming from the inherently temporal structure of music, we are motivated to learn the deep sequential correlation between audio and lyrics. In this work, we propose a deep cross-modal correlation learning architecture involving two-branch deep neural networks for the audio modality and the text modality (lyrics). Data in different modalities are converted to the same canonical space, where inter-modal canonical correlation analysis is utilized as an objective function to calculate the similarity of temporal structures. This is the first study that uses deep architectures for learning the temporal correlation between audio and lyrics. A pre-trained Doc2Vec model followed by fully-connected layers is used to represent lyrics. Two significant contributions are made in the audio branch, as follows: i) we propose an end-to-end network to learn cross-modal correlation between audio and lyrics, where feature extraction and correlation learning are performed simultaneously and a joint representation is learned by considering temporal structures; ii) for feature extraction, we further represent an audio signal by a short sequence of local summaries (VGG16 features) and apply a recurrent neural network to compute a compact feature that better captures the temporal structure of music audio. Experimental results, using audio to retrieve lyrics or using lyrics to retrieve audio, verify the effectiveness of the proposed deep correlation learning architectures in cross-modal music retrieval.
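
    To make the two-branch idea concrete, the following sketch shows the general shape of such an architecture. It is a simplified illustration under assumed dimensions, not the authors' network: the lyrics branch takes a Doc2Vec-style document vector, the audio branch runs a recurrent layer over a sequence of precomputed frame features, and a simple cosine-similarity loss stands in for the full inter-modal CCA objective used in the paper.

```python
# Simplified two-branch cross-modal sketch (illustrative only, not the paper's
# exact architecture). Lyrics: a Doc2Vec-style vector through fully-connected
# layers. Audio: a sequence of precomputed frame features summarised by a GRU.
# Both are projected into a shared space; a cosine-similarity loss is used here
# as a stand-in for the inter-modal CCA objective. Dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LyricsBranch(nn.Module):
    def __init__(self, doc2vec_dim=300, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(doc2vec_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, x):              # x: (batch, doc2vec_dim)
        return F.normalize(self.net(x), dim=-1)

class AudioBranch(nn.Module):
    def __init__(self, frame_dim=512, embed_dim=128):
        super().__init__()
        self.rnn = nn.GRU(frame_dim, 128, batch_first=True)
        self.proj = nn.Linear(128, embed_dim)

    def forward(self, x):              # x: (batch, n_frames, frame_dim)
        _, h_n = self.rnn(x)           # final hidden state summarises the sequence
        return F.normalize(self.proj(h_n[-1]), dim=-1)

if __name__ == "__main__":
    lyrics_branch, audio_branch = LyricsBranch(), AudioBranch()
    lyrics = torch.randn(4, 300)       # stand-in Doc2Vec vectors
    audio = torch.randn(4, 20, 512)    # stand-in 20-frame feature sequences
    z_text, z_audio = lyrics_branch(lyrics), audio_branch(audio)
    # Encourage paired (audio, lyrics) embeddings to be similar.
    loss = 1.0 - F.cosine_similarity(z_text, z_audio).mean()
    print(loss.item())
```

    At retrieval time, both branches embed their inputs into the shared space, and lyrics are retrieved for a given audio clip (or vice versa) by nearest-neighbour search over those embeddings.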

    The TREC2001 video track: information retrieval on digital video information

    The development of techniques to support content-based access to archives of digital video information has recently started to receive much attention from the research community. During 2001, the annual TREC activity, which has been benchmarking the performance of information retrieval techniques on a range of media for 10 years, included a "track" or activity which allowed investigation into approaches to support searching through a video library. This paper is not intended to provide a comprehensive picture of the different approaches taken by the TREC2001 video track participants; instead, we give an overview of the TREC video search task and a thumbnail sketch of the approaches taken by different groups. The reason for writing this paper is to highlight the message from the TREC video track that there are now a variety of approaches available for searching and browsing through digital video archives, that these approaches do work, are scalable to larger archives, and can yield useful retrieval performance for users. This has important implications in making digital libraries of video information attainable.