Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval
Where previous reviews on content-based image retrieval emphasize what can
be seen in an image to bridge the semantic gap, this survey considers what
people tag about an image. A comprehensive treatise of three closely linked
problems, i.e., image tag assignment, refinement, and tag-based image retrieval
is presented. While existing works vary in terms of their targeted tasks and
methodology, they rely on the key functionality of tag relevance, i.e.,
estimating the relevance of a specific tag with respect to the visual content
of a given image and its social context. By analyzing what information a
specific method exploits to construct its tag relevance function and how such
information is exploited, this paper introduces a taxonomy to structure the
growing literature, understand the ingredients of the main works, clarify their
connections and differences, and recognize their merits and limitations. For a
head-to-head comparison of state-of-the-art methods, a new experimental
protocol is presented, with training sets containing 10k, 100k and 1m images
and an evaluation on three test sets, contributed by various research groups.
Eleven representative works are implemented and evaluated. Putting all this
together, the survey aims to provide an overview of the past and foster
progress for the near future.
Comment: to appear in ACM Computing Surveys
Exploratory Browsing
In recent years, digital media have influenced many areas of our lives. The transition from analogue to digital has substantially changed the ways we deal with media collections. Today’s interfaces for managing digital media mainly offer fixed linear models corresponding to the underlying technical concepts (folders, events, albums, etc.), or metaphors borrowed from their analogue counterparts (e.g., stacks, film rolls). However, people’s mental interpretations of their media collections often go beyond the scope of a linear scan. Beyond explicit search with specific goals, current interfaces cannot sufficiently support explorative and often non-linear behavior. This dissertation presents an exploration of interface design to enhance the browsing experience with media collections. The main outcome of this thesis is a new model of Exploratory Browsing to guide the design of interfaces that support the full range of browsing activities, especially Exploratory Browsing.
We define Exploratory Browsing as the behavior when the user is uncertain about her or his targets and needs to discover areas of interest (exploratory), in which she or he can explore in detail and possibly find some acceptable items (browsing). According to the browsing objectives, we group browsing activities into three categories: Search Browsing, General Purpose Browsing and Serendipitous Browsing. In the context of this thesis, Exploratory Browsing refers to the latter two browsing activities, which go beyond explicit search with specific objectives.
We systematically explore the design space of interfaces to support the Exploratory Browsing experience. Applying the methodology of User-Centered Design, we develop eight prototypes, covering two main usage contexts of browsing with personal collections and in online communities.
The main studied media types are photographs and music.
The main contribution of this thesis lies in deepening the understanding of how people’s exploratory behavior has an impact on interface design. This thesis contributes to the field of interface design for media collections in several respects. With the goal of informing interface design to support the Exploratory Browsing experience with media collections, we present a model of Exploratory Browsing, covering the full range of exploratory activities around media collections. We investigate this model in different usage contexts and develop eight prototypes. The substantial implications gathered during the development and evaluation of these prototypes inform the further refinement of our model: we uncover the underlying transitional relations between browsing activities and discover several stimulators that encourage a fluid and effective activity transition. Based on this model, we propose a catalogue of general interface characteristics, and employ this catalogue as criteria to analyze the effectiveness of our prototypes. We also present several general suggestions for designing interfaces for media collections.
Image Understanding by Socializing the Semantic Gap
Several technological developments like the Internet, mobile devices and Social Networks have spurred the sharing of images in unprecedented volumes, making tagging and commenting a common habit. Despite the recent progress in image analysis, the problem of the Semantic Gap still hinders machines from fully understanding the rich semantics of a shared photo. In this book, we tackle this problem by exploiting social network contributions. A comprehensive treatise of three linked problems on image annotation is presented, with a novel experimental protocol used to test eleven state-of-the-art methods. Three novel approaches to annotate, understand the sentiment and predict the popularity of an image are presented. We conclude with the many challenges and opportunities ahead for the multimedia community.
Textual Query Based Image Retrieval
As digital cameras become popular and the number of mobile phones grows rapidly, the volume of consumer photos is increasing. Retrieving the appropriate image with content-based or text-based image retrieval techniques has therefore become an important problem. Content-based image retrieval, a technique which uses visual content to search images from large-scale image databases according to users' interests, has been an active and fast-advancing research area, yet a semantic gap remains between the low-level visual features and the high-level semantic concepts. We describe a real-time textual query-based personal photo retrieval system that leverages millions of Web images and their associated rich textual descriptions. After the user provides a textual query, our system generates an inverted file to automatically find the positive Web images that are related to the textual query as well as the negative Web images that are irrelevant to it. We then use k-Nearest Neighbor (kNN), decision stumps, and linear SVM classifiers to rank personal photos. To improve photo retrieval performance, we use two relevance feedback methods based on cross-domain learning, which effectively utilize both the Web images and personal images.
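The kNN ranking step described above might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: images are assumed to be already reduced to small numeric feature vectors, and the names `knn_score` and `rank_photos`, the Euclidean distance, and the toy data are all hypothetical.

```python
import math

def knn_score(photo, positives, negatives, k=3):
    """Fraction of the k nearest Web images that are positive (assumed scoring rule)."""
    # Label positive Web images 1 and negative Web images 0, with their distance to the photo.
    labelled = [(math.dist(photo, p), 1) for p in positives]
    labelled += [(math.dist(photo, n), 0) for n in negatives]
    labelled.sort(key=lambda pair: pair[0])
    nearest = labelled[:k]  # assumes at least one Web image is available
    return sum(label for _, label in nearest) / len(nearest)

def rank_photos(photos, positives, negatives, k=3):
    """Return personal photo names ordered from most to least relevant."""
    return sorted(
        photos,
        key=lambda name: knn_score(photos[name], positives, negatives, k),
        reverse=True,
    )

# Toy feature vectors (hypothetical): Web images found relevant or irrelevant to a query.
positives = [(0.9, 0.8), (1.0, 1.0), (0.8, 0.9)]
negatives = [(0.1, 0.2), (0.0, 0.0), (0.2, 0.1)]
photos = {"receipt.jpg": (0.1, 0.1), "beach.jpg": (0.85, 0.9)}
ranking = rank_photos(photos, positives, negatives)
```

Here the photo whose features lie close to the positive Web images is ranked first. Decision stumps or a linear SVM could take the place of `knn_score` without changing the ranking loop.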
DOI: 10.17762/ijritcc2321-8169.15032
MC2: MPEG-7 content modelling communities
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
The use of multimedia content on the web has grown significantly in recent years. Websites such as Facebook, YouTube and Flickr cater for enormous amounts of multimedia content uploaded by users. This vast amount of multimedia content requires comprehensive content modelling; otherwise, retrieving relevant content will be challenging. Modelling multimedia content can be an extremely time-consuming task that may seem impossible, particularly when undertaken by individual users. However, the advent of Web 2.0 and associated communities, such as YouTube and Flickr, has shown that users appear to be more willing to collaborate in order to take on enormous tasks such as multimedia content modelling. Harnessing the power of communities to achieve comprehensive content modelling is the primary focus of this research.
The aim of this thesis is to explore collaborative multimedia content modelling and in particular the effectiveness of existing multimedia content modelling tools, taking into account the key development challenges of existing collaborative content modelling research and the associated modelling tools. Four research objectives are pursued in order to achieve this: first, design a user experiment to study users’ tagging behaviour with existing multimedia tagging tools and identify any relationships between such user behaviour; second, design and develop a framework for MPEG-7 content modelling communities based on the results of the experiment; third, implement an online service as a proof of concept of the framework; fourth, validate the framework through the online service during a repeat of the initial user experiment.
This research contributes, first, a conceptual model of user behaviour visualised as a fuzzy cognitive map and, second, an MPEG-7 framework for multimedia content modelling communities (MC2) and its proof of concept as an online service. The fuzzy cognitive model embodies relationships between user tagging behaviour and context and provides an understanding of user priorities in the description of content features and the relationships that exist between them. The MC2 framework, developed based on the fuzzy cognitive model, is deep-rooted in user content modelling behaviour and content preferences. A proof of concept of the MC2 framework is implemented as an online service in which all metadata is modelled using MPEG-7. The online service is validated, first, empirically with the same group of users and through the same experiment that led to the development of the fuzzy cognitive model and, second, functionally against the folksonomy and MPEG-7 content modelling tools used in the initial experiment. The validation demonstrates that MC2 has the advantages without the shortcomings of existing multimedia tagging tools by harnessing the ease of use of folksonomy tools while producing comprehensive structured metadata.
Supported by the UK Engineering and Physical Sciences Research Council (EPSRC).
Georeferencing Flickr resources based on textual meta-data
The task of automatically estimating the location of web resources is of central importance in location-based services on the Web. Much attention has been focused on Flickr photos and videos, for which it was found that language modeling approaches are particularly suitable. In particular, state-of-the-art systems for georeferencing Flickr photos tend to cluster the locations on Earth into a relatively small set of disjoint regions, apply feature selection to identify location-relevant tags, then use a form of text classification to identify which area is most likely to contain the true location of the resource, and finally attempt to find an appropriate location within the identified area. In this paper, we present a systematic discussion of each of the aforementioned components, based on the lessons we have learned from participating in the 2010 and 2011 editions of MediaEval’s Placing Task. Extensive experimental results allow us to analyze why certain methods work well on this task and show that a median error of just over 1 km can be achieved on a standard benchmark test set.
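The text classification step of such a pipeline can be illustrated with a small unigram language model per region. This is a rough sketch, not the system evaluated in the paper: it assumes the clustering into regions and the feature selection have already been done, uses Laplace smoothing and a uniform prior over regions, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_region_models(photos):
    """photos: iterable of (region_id, tags). Returns per-region tag counts and the vocabulary."""
    counts = defaultdict(Counter)
    vocab = set()
    for region, tags in photos:
        counts[region].update(tags)
        vocab.update(tags)
    return counts, vocab

def most_likely_region(tags, counts, vocab, alpha=1.0):
    """Pick the region whose smoothed unigram model gives the tags the highest log-likelihood."""
    best_region, best_score = None, -math.inf
    for region, tag_counts in counts.items():
        total = sum(tag_counts.values())
        # Laplace smoothing (alpha) keeps tags unseen in this region from zeroing the score;
        # tags outside the training vocabulary are ignored.
        score = sum(
            math.log((tag_counts[t] + alpha) / (total + alpha * len(vocab)))
            for t in tags
            if t in vocab
        )
        if score > best_score:
            best_region, best_score = region, score
    return best_region

# Toy training data (hypothetical): photos already assigned to clustered regions.
training = [
    ("paris", ["eiffel", "paris", "tower"]),
    ("paris", ["louvre", "paris"]),
    ("london", ["bigben", "london", "thames"]),
    ("london", ["london", "tower", "bridge"]),
]
counts, vocab = train_region_models(training)
region = most_likely_region(["paris", "louvre"], counts, vocab)
```

A final step, not shown here, would choose a concrete location inside the selected region, for example the coordinates of the most similar training photo.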
Semantically-enhanced image tagging system
In multimedia databases, data are images, audio, video, texts, etc. Research interest in these types of databases has increased in the last decade or so, especially with the advent of the Internet and the Semantic Web. Fundamental research issues range from unified data modelling and retrieval of data items to the dynamic nature of updates.
The thesis builds on findings in Semantic Web and retrieval techniques and explores novel tagging methods for identifying data items. Tagging systems, which enable users to add tags to Internet resources such as images, video and audio to make them more manageable, have become popular. Collaborative tagging is concerned with the relationship between people and resources.
Most of these resources have metadata in machine-processable format and enable users to use free-text keywords (so-called tags) as search techniques. This research references some tagging systems, e.g. Flickr, Delicious and myweb2.0. The limitations of such techniques include polysemy (one word with different meanings), synonymy (different words with one meaning), different lexical forms (singular, plural, and conjugated words) and misspellings or alternate spellings. The work presented in this thesis introduces a semantic characterization of web resources that describes the structure and organization of tagging, aiming to extend the existing Multimedia Query using similarity measures to cater for collaborative tagging. In addition, we discuss the semantic difficulties of tagging systems and suggest improvements to their accuracy.
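To make the synonymy and lexical-form problems concrete, the sketch below maps tags to a canonical form before comparing two tag sets with Jaccard similarity. It is an illustration only: the small `CANONICAL` table is a hypothetical stand-in for a lexical resource such as WordNet, Jaccard is just one possible similarity measure, and polysemy is not addressed.

```python
# Hypothetical mapping from variant tags to a canonical form; a real system
# would consult a lexical resource rather than a hand-written table.
CANONICAL = {
    "trees": "tree",        # plural form
    "cars": "car",          # plural form
    "automobile": "car",    # synonym
    "colour": "color",      # alternate spelling
}

def normalize(tags):
    """Lowercase and trim each tag, then map known variants to their canonical form."""
    cleaned = (t.strip().lower() for t in tags)
    return {CANONICAL.get(t, t) for t in cleaned}

def jaccard(tags_a, tags_b):
    """Jaccard similarity of two tag lists after normalization."""
    a, b = normalize(tags_a), normalize(tags_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

same = jaccard(["Trees", "automobile"], ["tree", "car"])
different = jaccard(["tree"], ["car"])
```

Without normalization the first pair of tag sets would share no tags at all, although both describe the same content.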
The scope of our work is classified as follows:
(i) Increase the accuracy and confidence of multimedia tagging systems.
(ii) Increase the similarity measures of images by integrating varieties of measures.
To address the first shortcoming, we use WordNet as a semantic lingual ontology resource in a tagging system for social sharing and retrieval of images. For the second shortcoming we use the similarity measures in different ways to recognise the multimedia tagging system.
Fundamental to our work is the novel information model that we have constructed for our computation. This is based on the fact that an image is a rich object that can be characterised and formulated in n dimensions, each of which contains valuable information that will help in increasing the accuracy of the search. For example, an image of a tree in a forest contains more information than an image of the same tree in a different environment.
In this thesis we characterise a data item (an image) by a primary description, followed by n-secondary descriptions. As n increases, the accuracy of the search improves. We give various techniques to analyse data and its associated query.
To increase the accuracy of the tagging system we have performed different experiments on many images using similarity measures and various techniques from VoI (Value of Information).
The findings have shown the linkage/integration between similarity measures and that VoI improves searches and helps/guides a tagger in choosing the most adequate tags.