
    ViTS: Video tagging system from massive web multimedia collections

    The popularization of multimedia content on the Web has given rise to the need to automatically understand, index and retrieve it. In this paper we present ViTS, an automatic Video Tagging System which learns from videos, their web context and comments shared on social networks. ViTS analyses massive multimedia collections by crawling the Internet, and maintains a knowledge base that updates in real time without human supervision. As a result, each video is indexed with a rich set of labels and linked with other related content. ViTS is an industrial product in commercial operation, with a vocabulary of over 2.5M concepts and the capacity to index more than 150k videos per month. We compare the quality and completeness of our tags with those in the YouTube-8M dataset, and show that ViTS enhances the semantic annotation of the videos with a larger number of labels (10.04 tags/video) at an accuracy of 80.87%.

    Joint Inference in Weakly-Annotated Image Datasets via Dense Correspondence

    We present a principled framework for inferring pixel labels in weakly-annotated image datasets. Most previous example-based approaches to computer vision rely on a large corpus of densely labeled images; for large, modern image datasets, however, such labels are expensive to obtain and often unavailable. We establish a large-scale graphical model spanning all labeled and unlabeled images, then solve it to infer pixel labels jointly for all images in the dataset while enforcing consistent annotations over similar visual patterns. This model requires significantly less labeled data and helps resolve ambiguities by propagating inferred annotations from images with stronger local visual evidence to images with weaker evidence. We apply the proposed framework to two computer vision problems: image annotation with semantic segmentation, and object discovery and co-segmentation (segmenting multiple images containing a common object). Extensive numerical evaluations and comparisons show that our method consistently outperforms the state of the art in automatic annotation and semantic labeling while requiring significantly less labeled data. In contrast to previous co-segmentation techniques, our method discovers and segments objects well even in the presence of substantial numbers of noise images (images not containing the common object), as is typical of datasets collected via Internet search.
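
    The propagation step described above, spreading annotations from images with strong local evidence to weakly evidenced ones, can be illustrated with a generic graph-based label-propagation sketch. This is not the paper's dense-correspondence model: the similarity weights, label counts, and diffusion scheme below are illustrative assumptions.

```python
# A generic label-propagation sketch over an image-similarity graph; the
# graph weights and labels are random placeholders, not the paper's model.
import numpy as np

def propagate_labels(W, labels, n_iter=50, alpha=0.9):
    """Iteratively diffuse soft labels over a weighted graph.

    W:      (n, n) symmetric similarity matrix between images
    labels: (n, c) one-hot rows for labeled images, zero rows for unlabeled
    alpha:  weight of neighbor influence vs. the initial labels
    """
    S = W / (W.sum(axis=1, keepdims=True) + 1e-9)  # row-normalized transitions
    F = labels.astype(float)
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * labels  # diffuse, then re-anchor
    return F.argmax(axis=1)                         # hard label per image

rng = np.random.default_rng(3)
n, c = 100, 3
W = rng.random((n, n)); W = (W + W.T) / 2                      # toy similarities
Y = np.zeros((n, c)); Y[:5, 0] = Y[5:10, 1] = Y[10:15, 2] = 1  # few labeled rows
print(propagate_labels(W, Y)[:20])
```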

    Overview of the ImageCLEF 2014 Scalable Concept Image Annotation Task

    The ImageCLEF 2014 Scalable Concept Image Annotation task was the third edition of a challenge aimed at developing more scalable image annotation systems. Unlike traditional image annotation challenges, which rely on a set of manually annotated images as training data, participants were only allowed to use data and/or resources that do not require significant human effort (such as hand labeling) as new concepts to detect are introduced. Participants were provided with web data consisting of 500,000 images, including textual features obtained from the web pages on which the images appeared as well as various visual features extracted from the images themselves. To optimize their systems, participants were given a development set of 1,940 samples with hand-labeled ground truth for 107 concepts. The performance of the submissions was measured on a test set of 7,291 samples, hand labeled for 207 concepts, of which 100 were new concepts unseen during development. In total, 11 teams participated in the task, submitting 58 system runs overall. Thanks to the larger number of unseen concepts, the generalization ability of the systems could be observed more clearly in the results, demonstrating their potential for scalability. The authors are very grateful to the CLEF initiative for supporting ImageCLEF. The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under the tranScriptorium project (#600707) and from the Spanish MEC under the STraDA project (TIN2012-37475-C02-01). Villegas Santamaría, M.; Paredes Palacios, R. (2014). Overview of the ImageCLEF 2014 Scalable Concept Image Annotation Task. CEUR Workshop Proceedings. 1180:308-328. http://hdl.handle.net/10251/61152

    Information fusion in content based image retrieval: A comprehensive overview

    An ever-increasing share of communication between people involves pictures, thanks to the wide availability of powerful cameras on smartphones and of cheap storage space. The rising popularity of social networking applications such as Facebook, Twitter, and Instagram, and of instant messaging applications such as WhatsApp and WeChat, is clear evidence of this phenomenon, driven by the opportunity to share in real time a pictorial representation of the context each individual is living in. The media rapidly exploited this phenomenon, using the same channels either to publish their reports or to gather additional information on an event through the community of users. While the real-time use of images is managed through metadata associated with the image (e.g., the timestamp, geolocation, and tags), retrieving images from an archive can be far from trivial, as an image bears a rich semantic content that goes beyond the description provided by its metadata. After more than 20 years of research on Content-Based Image Retrieval (CBIR), the enormous increase in the number and variety of images available in digital format continues to challenge the research community. Any approach aiming to face these challenges must rely on different image representations that are conveniently fused in order to adapt to the subjectivity of image semantics. This paper offers a journey through the main information fusion ingredients that a recipe for the design of a CBIR system should include to meet the demanding needs of users.
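
    To make the fusion idea concrete, here is a minimal score-level (late) fusion sketch: distances computed under several feature representations are normalized and combined with weights. The feature names, dimensions, and fusion weights are illustrative assumptions, not a recipe taken from the survey.

```python
# A minimal sketch of score-level (late) fusion for CBIR. Feature names,
# dimensions, and fusion weights below are illustrative assumptions.
import numpy as np

def l2_distances(query_vec, db_matrix):
    """Euclidean distance from one query vector to every database vector."""
    return np.linalg.norm(db_matrix - query_vec, axis=1)

def fused_ranking(query_feats, db_feats, weights):
    """Combine min-max-normalized distances from several representations."""
    n_images = next(iter(db_feats.values())).shape[0]
    fused = np.zeros(n_images)
    for name, w in weights.items():
        d = l2_distances(query_feats[name], db_feats[name])
        d = (d - d.min()) / (d.max() - d.min() + 1e-9)  # normalize per modality
        fused += w * d
    return np.argsort(fused)  # database indices, most similar first

# Toy usage with random "color" and "texture" features
rng = np.random.default_rng(0)
db = {"color": rng.random((1000, 64)), "texture": rng.random((1000, 128))}
q = {"color": rng.random(64), "texture": rng.random(128)}
print(fused_ranking(q, db, {"color": 0.6, "texture": 0.4})[:5])
```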

    Large-scale interactive exploratory visual search

    Large-scale visual search has been one of the challenging issues in the era of big data. It demands techniques that are not only highly effective and efficient but also allow users to conveniently express their information needs and refine their intent. In this thesis, we focus on developing an exploratory framework for large-scale visual search. We also develop a number of enabling techniques, including compact visual content representation for scalable search, near-duplicate video shot detection, and action-based event detection. We propose a novel scheme for extremely low bit rate visual search, which transmits compressed visual words, consisting of a vocabulary-tree histogram and descriptor orientations, rather than raw descriptors. Compact representation of video data is achieved by identifying keyframes of a video, which can also help users comprehend visual content efficiently; we propose a novel Bag-of-Importance model for static video summarization. Near-duplicate detection is a key issue for large-scale visual search, since a large number of nearly identical images and videos exist; we propose an improved near-duplicate video shot detection approach for more effective shot representation. Event detection is one way of bridging the semantic gap in visual search; we focus in particular on human-action-centred event detection and propose an enhanced sparse coding scheme to model human actions, which significantly reduces computational cost while achieving recognition accuracy highly comparable to state-of-the-art methods. Finally, we propose an integrated solution for addressing the prime challenges arising from large-scale interactive visual search. The proposed system is also one of the first attempts at exploratory visual search, providing users with more robust results and a more satisfying exploration experience.
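
    The compressed visual-word idea above, sending a quantized histogram instead of raw descriptors, can be sketched as follows. A flat vocabulary stands in for the thesis's vocabulary tree, and the vocabulary size and descriptor dimension are illustrative assumptions.

```python
# A minimal bag-of-visual-words sketch: local descriptors are quantized
# against a (flat, not tree-structured) vocabulary, and only the resulting
# histogram is kept, far smaller than the raw descriptor set.
import numpy as np

def quantize(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word."""
    # (n_desc, n_words) pairwise distances, then argmin per descriptor
    d = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    return d.argmin(axis=1)

def bow_histogram(descriptors, vocabulary):
    """Compact, transmittable bag-of-visual-words representation."""
    words = quantize(descriptors, vocabulary)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / (hist.sum() + 1e-9)  # normalize away the descriptor count

rng = np.random.default_rng(1)
vocab = rng.random((256, 32))        # 256 visual words, 32-D descriptors
image_descs = rng.random((500, 32))  # e.g. 500 local features from one image
h = bow_histogram(image_descs, vocab)
print(h.shape, h.sum())              # (256,) 1.0 -- vs. 500x32 raw values
```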

    Automatic Generation of Image Descriptions Based on Keyphrase Estimation Using Large-Scale Captioned Images

    Degree type: Doctorate (course-based). University of Tokyo (東京大学).

    Techniques For Boosting The Performance In Content-based Image Retrieval Systems

    Content-Based Image Retrieval (CBIR) has been an active research area for decades. In a CBIR system, one or more images are used as a query to search for similar images, where similarity is measured on low-level features such as color, shape, edge, and texture. Each image is processed and visual features are extracted, so that each image becomes a point in the feature space; two images that are close to each other in this space are considered similar, and the k nearest neighbors are taken as the images most similar to the query. This K-Nearest Neighbor (k-NN) model assumes that semantically similar images cluster together in a single neighborhood of the high-dimensional feature space. Unfortunately, semantically similar images with different appearances are often clustered into distinct neighborhoods scattered across the feature space, so confining the search results to a single neighborhood is the latent reason for the low recall rate of typical nearest-neighbor techniques. In this dissertation, a new image retrieval technique, the Query Decomposition (QD) model, is introduced. QD facilitates retrieval of semantically similar images from multiple neighborhoods in the feature space and hence bridges the semantic gap between images' low-level features and their high-level semantic meaning. In the QD model, a query may be decomposed into multiple subqueries based on the user's relevance feedback to cover multiple image clusters that contain semantically similar images; the retrieval results are the k most similar images drawn from multiple discontinuous relevant clusters.
    To apply the benefits of the QD study, a mobile client-side relevance feedback study was conducted. With the proliferation of handheld devices, the demand for multimedia information retrieval on mobile devices has attracted increasing attention. A relevance feedback retrieval process usually includes several rounds of query refinement, each incurring the exchange of tens of images between the mobile device and the server; with limited wireless bandwidth, this can introduce substantial delay, making the system unfriendly to use. The Relevance Feedback Support (RFS) structure designed for the QD technique was adopted for Client-side Relevance Feedback (CRF). Since relevance feedback is done on the client side, system response is instantaneous, significantly enhancing usability; and since the server is not involved in relevance feedback processing, it can support thousands more users simultaneously.
    While the QD technique improves the accuracy of CBIR systems, a second study in this dissertation, In-Memory Relevance Feedback, improves their efficiency. Current methods rely on searching the database, stored on disk, in each round of relevance feedback; this strategy incurs long delays, making relevance feedback less friendly to the user, especially for very large databases, so scalability is a limitation of existing solutions. The proposed in-memory relevance feedback technique substantially reduces the delay associated with feedback processing and thereby improves system usability. A data-independent dimensionality-reduction technique is used to compress the metadata into a small in-memory database that supports relevance feedback operations with minimal disk accesses. The performance of this approach is compared with conventional relevance feedback techniques in terms of computational efficiency and retrieval accuracy; the results indicate that the new technique substantially reduces response time for user feedback while maintaining retrieval quality.
    The QD technique relies on a pre-defined Relevance Feedback Support structure, and results and user experience indicated that this structure might confine the search range and affect the results. This dissertation therefore also studies a novel Multiple Direction Search framework for semi-automatic annotation propagation. In this system, the user interacts with the system to provide example images and the corresponding annotations during the annotation propagation process. In each iteration, the example images are dynamically clustered and the corresponding annotations are propagated separately to each cluster: images in the local neighborhood are annotated, and some of them are returned to the user for further annotation. As the user marks more images, the annotation process proceeds in multiple directions in the feature space; the query movements can be treated as navigation along multiple paths, each of which may be further split based on the user's input. In this manner, the system provides accurate annotation assistance to the user: images with the same semantic meaning but different visual characteristics can be handled effectively. Comprehensive experiments on the Corel and University of Washington image databases show that the proposed technique annotates image databases accurately and efficiently.
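
    The k-NN model and the decomposition of a query into subqueries covering several relevant clusters can be sketched as follows. The split-based clustering of the feedback images, the feature dimensions, and the merging rule are illustrative simplifications, not the dissertation's QD algorithm.

```python
# A toy sketch of k-NN retrieval plus query decomposition into subqueries;
# the clustering and merging below are simplifications, not the QD model.
import numpy as np

def knn(query, db, k):
    """Indices of the k database points closest to the query."""
    return np.argsort(np.linalg.norm(db - query, axis=1))[:k]

def decomposed_query(relevant_feats, db, k, n_subqueries=2):
    """One subquery per cluster of user-marked relevant images, then merge."""
    splits = np.array_split(relevant_feats, n_subqueries)    # crude clustering
    centroids = np.stack([s.mean(axis=0) for s in splits if len(s)])
    cand = np.unique(np.concatenate([knn(c, db, k) for c in centroids]))
    # rank merged candidates by distance to their nearest subquery centroid
    dists = np.linalg.norm(db[cand][:, None, :] - centroids[None, :, :], axis=2)
    return cand[np.argsort(dists.min(axis=1))][:k]

rng = np.random.default_rng(2)
db = rng.random((2000, 16))           # 2000 images as 16-D feature points
feedback = db[[3, 7, 1500, 1501]]     # user-marked relevant images
print(decomposed_query(feedback, db, k=10))
```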

    Fall 2021 Supplement to Brauneis & Schechter, Copyright: A Contemporary Approach

    This Fall 2021 Supplement is the product of our effort to capture important developments in copyright law since the publication of the second edition of Copyright: A Contemporary Approach. It includes two new principal cases, both Supreme Court decisions: the 2021 fair use decision in Google LLC v. Oracle America, Inc., and the 2020 decision about copyright protection for state statutes in Georgia v. Public.Resource.Org. The supplement also includes notes on many other cases, and a few new features that we thought would enhance the study of U.S. copyright law. In light of the passage of the Music Modernization Act in October 2018, we have completely revised Chapter 12.E., on digital audio transmission rights, and Chapter 12.F., on rights in pre-1972 sound recordings. The new Chapter 12.E. in this supplement, "Digital Streaming of Music After the Musical Works Modernization Act," now consists of a general introduction to copyright and the streaming of music, covering both rights in sound recordings and rights in musical works, and all of the relevant exclusive rights.

    Fall 2023 Supplement to Brauneis & Schechter, Copyright: A Contemporary Approach

    This Fall 2023 Supplement is the product of our effort to capture important developments in copyright law since the publication of the second edition of Copyright: A Contemporary Approach. It includes three Supreme Court decisions as principal cases: the fair use cases of Google LLC v. Oracle America, Inc. (p. 23) and Andy Warhol Foundation v. Goldsmith (p. 41), and the 2020 decision about copyright protection for state statutes, Georgia v. Public.Resource.Org (p. 74). (Because there are now so many Supreme Court fair use cases to cover, this supplement also includes a note on Harper & Row, Publishers v. Nation Enterprises (pp. 13-14), as an option to replace its treatment as a principal case in the second edition of the casebook.) The supplement also includes notes on many other cases, and a few new features that we thought would enhance the study of U.S. copyright law. It includes new material on copyright and artificial intelligence, both on the issue of AI authorship (see the new notes on pp. 7-9) and on the issue of infringement and fair use in training generative AI models (see the new feature on p. 21). Because the Copyright Claims Board ("CCB") opened its doors for business in June 2022, we have included a new section at the end of Chapter 6 on the CASE Act and CCB proceedings (p. 67). We have also completely revised Chapter 12.E., on digital audio transmission rights, and Chapter 12.F., on rights in pre-1972 sound recordings. The new Chapter 12.E. in this supplement, "Digital Streaming of Music After the Musical Works Modernization Act" (p. 101), now consists of a general introduction to copyright and the streaming of music, covering both rights in sound recordings and rights in musical works, and all of the relevant exclusive rights.

    Copyright: A Contemporary Approach

    This Fall 2022 Supplement is the product of our effort to capture important developments in copyright law since the publication of the second edition of Copyright: A Contemporary Approach. It includes three new principal cases. The first two are Supreme Court decisions: the 2021 fair use decision in Google LLC v. Oracle America, Inc. (p. 18), and the 2020 decision about copyright protection for state statutes in Georgia v. Public.Resource.Org (p. 58). The third is an excerpt from the Second Circuit's fair use decision in Andy Warhol Foundation v. Goldsmith (p. 37), a decision that the Supreme Court has decided to review, with oral argument scheduled for October 12, 2022. The portion of this opinion on "transformativeness" is likely making a one-time appearance in the supplement, to be replaced by the Supreme Court decision when it is issued, but we thought some folks would like to teach the Goldsmith case in the fall as the Supreme Court is considering it. The supplement also includes notes on many other cases, and a few new features that we thought would enhance the study of U.S. copyright law. Because the Copyright Claims Board ("CCB") opened its doors for business this June, we have included a new section at the end of Chapter 6 on the CASE Act and CCB proceedings (p. 50). We have also completely revised Chapter 12.E., on digital audio transmission rights, and Chapter 12.F., on rights in pre-1972 sound recordings. The new Chapter 12.E. in this supplement, "Digital Streaming of Music After the Musical Works Modernization Act" (p. 84), now consists of a general introduction to copyright and the streaming of music, covering both rights in sound recordings and rights in musical works, and all of the relevant exclusive rights.