34 research outputs found
VISIR : visual and semantic image label refinement
The social media explosion has populated the Internet with a wealth of images. There are two existing paradigms for image retrieval: 1) content-based image retrieval (CBIR), which has traditionally used visual features for similarity search (e.g., SIFT features), and 2) tag-based image retrieval (TBIR), which has relied on user tagging (e.g., Flickr tags). CBIR now gains semantic expressiveness through advances in deep-learning-based detection of visual labels. TBIR benefits from query-and-click logs to automatically infer more informative labels. However, learning-based tagging still yields noisy labels and is restricted to concrete objects, missing out on generalizations and abstractions. Click-based tagging is limited to terms that appear in the textual context of an image or in queries that lead to a click. This paper addresses these limitations by semantically refining and expanding the labels suggested by learning-based object detection. We consider the semantic coherence between the labels for different objects, leverage lexical and commonsense knowledge, and cast the label assignment into a constrained optimization problem solved by an integer linear program. Experiments show that our method, called VISIR, improves the quality of state-of-the-art visual labeling tools such as LSDA and YOLO.
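The coherence-aware label assignment described above can be illustrated with a toy sketch: in place of an integer linear program, a brute-force search over small label subsets maximizes detector confidence plus pairwise semantic coherence. All labels and scores below are invented for illustration and are not VISIR's actual model or weights.

```python
import itertools

# Hypothetical detector confidences and pairwise semantic coherence scores
# (in practice the latter might come from lexical knowledge such as WordNet).
confidence = {"dog": 0.9, "cat": 0.2, "leash": 0.7, "keyboard": 0.3}
coherence = {
    frozenset({"dog", "leash"}): 0.8,
    frozenset({"dog", "cat"}): 0.4,
    frozenset({"keyboard", "leash"}): -0.5,
}

def score(labels):
    """Unary confidence plus pairwise coherence of a candidate label set."""
    s = sum(confidence[l] for l in labels)
    for a, b in itertools.combinations(labels, 2):
        s += coherence.get(frozenset({a, b}), 0.0)
    return s

def refine(max_labels=2):
    """Exhaustively pick the best label subset (a stand-in for the ILP solver)."""
    best = max(
        (c for k in range(1, max_labels + 1)
         for c in itertools.combinations(sorted(confidence), k)),
        key=score,
    )
    return set(best)

print(refine())  # {'dog', 'leash'}: the mutually coherent pair wins
```

Note how "leash" (0.7) and "dog" (0.9) together beat any other subset because their coherence bonus rewards semantically consistent labels, which is the intuition behind the constrained-optimization formulation.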
Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval
Where previous reviews on content-based image retrieval emphasize what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image. A comprehensive treatise of three closely linked problems, i.e., image tag assignment, refinement, and tag-based image retrieval, is presented. While existing works vary in terms of their targeted tasks and methodology, they rely on the key functionality of tag relevance, i.e., estimating the relevance of a specific tag with respect to the visual content of a given image and its social context. By analyzing what information a specific method exploits to construct its tag relevance function and how such information is exploited, this paper introduces a taxonomy to structure the growing literature, understand the ingredients of the main works, clarify their connections and differences, and recognize their merits and limitations. For a head-to-head comparison between state-of-the-art methods, a new experimental protocol is presented, with training sets containing 10k, 100k and 1m images and an evaluation on three test sets contributed by various research groups. Eleven representative works are implemented and evaluated. Putting all this together, the survey aims to provide an overview of the past and foster progress for the near future. Comment: to appear in ACM Computing Surveys
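The tag relevance functionality described above can be illustrated with a minimal neighbor-voting sketch, one common way such functions are built: a tag gains relevance when it appears among an image's visual neighbors more often than its prior frequency predicts. The neighbor sets and priors below are invented for illustration.

```python
# Tags of the k visually nearest neighbors of some query image (toy data).
neighbor_tags = [
    {"beach", "sea", "sunset"},
    {"beach", "sea"},
    {"beach", "party"},
    {"car", "road"},
]
# Prior probability of each tag across the whole collection (toy data).
tag_prior = {"beach": 0.30, "sea": 0.25, "sunset": 0.10,
             "party": 0.20, "car": 0.40, "road": 0.35}

def tag_relevance(tag, neighbors, prior):
    """Vote count among neighbors minus the count expected under the prior."""
    votes = sum(tag in tags for tags in neighbors)
    return votes - len(neighbors) * prior.get(tag, 0.0)

scores = {t: tag_relevance(t, neighbor_tags, tag_prior) for t in tag_prior}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0])  # "beach": 3 votes vs. 1.2 expected
```

Subtracting the prior keeps generically frequent tags from dominating, which is the core reason this family of functions ranks "beach" above "car" here despite "car" being more common overall.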
Less is MORE: a MultimOdal system for tag REfinement
With the proliferation of image-based social media, an extremely large amount of multimodal data is being produced. Very often image contents are published together with a set of user-defined metadata such as tags and textual descriptions. Despite being very useful to enhance traditional image retrieval, user-defined tags on social media have proven ineffective for indexing images because they are influenced by the personal experiences of the owners as well as their will to promote the published contents. To be analyzed and indexed, multimodal data require algorithms able to jointly deal with textual and visual data. This research presents a multimodal approach to the problem of tag refinement, which consists of separating the relevant descriptors (tags) of images from noisy ones. The proposed method exploits both Natural Language Processing (NLP) and Computer Vision (CV) techniques based on deep learning to find a match between the textual information and visual content of social media posts. Textual semantic features are represented with (multilingual) word embeddings, while visual ones are obtained with image classification. The proposed system is evaluated on a manually annotated Italian dataset extracted from Instagram, achieving a weighted F1-score of 68%.
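The embedding-matching idea can be sketched as follows, assuming hypothetical low-dimensional vectors in place of real word embeddings and image-classifier outputs; the cosine-similarity threshold rule illustrates the general matching principle, not the paper's exact pipeline.

```python
import math

# Toy 3-d embeddings standing in for multilingual word vectors (tags)
# and for the embeddings of predicted visual classes -- all invented.
tag_vecs = {"spiaggia": (0.9, 0.1, 0.0),      # Italian for "beach"
            "follow4follow": (0.0, 0.1, 0.9)}  # promotional, off-content tag
image_class_vecs = {"seashore": (0.8, 0.2, 0.1)}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def refine_tags(tags, classes, threshold=0.5):
    """Keep tags whose embedding matches at least one predicted visual class."""
    return [t for t, tv in tags.items()
            if any(cosine(tv, cv) >= threshold for cv in classes.values())]

print(refine_tags(tag_vecs, image_class_vecs))  # ['spiaggia']
```

The content-bearing tag survives because its vector is close to a predicted visual class, while the promotional tag is filtered out: this is the relevant-versus-noisy separation the abstract describes.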
Learning to recommend descriptive tags for questions in social forums
DOI: 10.1145/2559157. ACM Transactions on Information Systems, 32(1).
Image Understanding by Socializing the Semantic Gap
Several technological developments like the Internet, mobile devices and social networks have spurred the sharing of images in unprecedented volumes, making tagging and commenting a common habit. Despite the recent progress in image analysis, the problem of the Semantic Gap still hinders machines from fully understanding the rich semantics of a shared photo. In this book, we tackle this problem by exploiting social network contributions. A comprehensive treatise of three linked problems on image annotation is presented, with a novel experimental protocol used to test eleven state-of-the-art methods. Three novel approaches to annotate an image, understand its sentiment, and predict its popularity are presented. We conclude with the many challenges and opportunities ahead for the multimedia community.