27 research outputs found

    Instagram hashtags as a source of semantic information for Automatic Image Annotation

    No full text
    Billions of digital images are uploaded every single day on the Internet and especially on social media. It is vital to develop effective and efficient methods that allow the retrieval of those images according to users' demands. Among the approaches that have been proposed for digital image retrieval is Automatic Image Annotation (AIA). AIA techniques automatically learn the visual representation of semantic concepts from a number of image samples and use these concept models for tagging new images. Learning good concept models requires representative image-tag pairs. Manual annotation is a hard and time-consuming task, since a large number of images is necessary to create effective concept models; moreover, human judgment may contain errors and subjectivity. Therefore, it is highly desirable to find ways of automatically creating training examples, i.e., pairs of images and tags. Contemporary social media, such as Instagram, contain images and associated hashtags, providing a source of indirect annotation. Instagram is a photo-oriented social media platform where users upload images and describe them with hashtags; thus, it might be a rich source for automatically creating image-tag pairs for AIA. The thesis focuses on investigating Instagram images and hashtags as a source for AIA purposes. This primary research question is analyzed through several studies: we determine the portion of Instagram hashtags that are related to the visual content of the images they accompany, and we develop a methodology for locating stophashtags, i.e., common non-descriptive hashtags. We also employ the HITS algorithm in a crowdsourcing environment to filter Instagram hashtags and locate the ones that correspond to the visual content of the Instagram images they accompany. Topic modelling of Instagram hashtags is introduced as a means of retrieving Instagram images in the traditional text-based information retrieval approach, while transfer learning, utilizing filtered Instagram data (pairs of images and hashtags), is applied in a content-based image retrieval scenario.
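    As a sketch of the transfer-learning step mentioned above, the following shows one plausible way (not the thesis' actual code) to fine-tune a pretrained CNN on filtered Instagram image-hashtag pairs and reuse its features for content-based retrieval; the directory layout, model choice and hyperparameters are illustrative assumptions.

```python
# Minimal sketch: fine-tuning a pretrained CNN on filtered Instagram
# image-hashtag pairs, then reusing its embeddings for content-based image
# retrieval. Paths, model choice and hyperparameters are illustrative.
import torch
import torch.nn as nn
from torchvision import models, transforms, datasets

# Assumed layout: one sub-folder per filtered hashtag concept,
# e.g. data/train/sunset/*.jpg, data/train/pizza/*.jpg, ...
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("data/train", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# Transfer learning: keep the pretrained backbone, replace the classifier
# head with one output per hashtag concept.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)  # train the head only
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):                      # illustrative number of epochs
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

# After fine-tuning, the penultimate-layer activations can serve as image
# descriptors for retrieval (nearest neighbours in embedding space).
```

    Only the new classification head is optimized in this sketch; unfreezing deeper layers of the backbone is an equally common variant of the same idea.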

    Defining and Identifying Stophashtags in Instagram

    No full text
    Instagram could be considered a tagged image dataset since it is rich in tags (known as hashtags) accompanying photos; in addition, the tags are provided by the photo owners/creators and therefore express the meaning/message of the photos with higher accuracy. However, as we showed in a previous study, only 30% of Instagram hashtags are related to the visual content of the photos they accompany, while the remaining 70% are either related to other meta-communicative functions of the photo owner/creator or are simply noise, used mainly to increase a photo’s localization and searchability. In this study we call the latter category of Instagram hashtags ‘stophashtags’, inspired by the term ‘stopwords’, which is used in the field of computational linguistics to refer to common and non-descriptive words found in almost every text document, and we provide a theoretical and empirical framework through which stophashtags can be identified. We show that, in contrast to descriptive hashtags, stophashtags are characterized by a high normalized subject (hashtag) frequency in irrelevant subject categories, while their normalized image frequency is also high.
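    The criterion above can be made concrete with a small sketch; the paper's exact normalization and thresholds are not reproduced here, so the records, frequencies and cut-off values below are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact formulation): flagging stophashtag
# candidates by how widely a hashtag spreads across unrelated subject
# categories and across images. Data and thresholds are illustrative.
from collections import defaultdict

# Each record: (subject_category, image_id, set_of_hashtags)
records = [
    ("sunset", "img1", {"sunset", "sky", "like4like", "follow4follow"}),
    ("pizza",  "img2", {"pizza", "cheese", "like4like"}),
    ("dog",    "img3", {"dog", "puppy", "like4like", "follow4follow"}),
]

subjects_per_tag = defaultdict(set)   # subject categories in which a tag occurs
images_per_tag = defaultdict(set)     # images in which a tag occurs
all_subjects, all_images = set(), set()

for subject, image_id, hashtags in records:
    all_subjects.add(subject)
    all_images.add(image_id)
    for tag in hashtags:
        subjects_per_tag[tag].add(subject)
        images_per_tag[tag].add(image_id)

def stophashtag_candidates(subject_threshold=0.6, image_threshold=0.6):
    """Return tags whose normalized subject and image frequencies are both high."""
    candidates = []
    for tag in subjects_per_tag:
        nsf = len(subjects_per_tag[tag]) / len(all_subjects)  # normalized subject frequency
        nif = len(images_per_tag[tag]) / len(all_images)      # normalized image frequency
        if nsf >= subject_threshold and nif >= image_threshold:
            candidates.append((tag, nsf, nif))
    return candidates

print(stophashtag_candidates())   # [('like4like', 1.0, 1.0), ('follow4follow', 0.67, 0.67)] approx.
```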

    Filtering Instagram Hashtags through crowdtagging and the HITS algorithm

    No full text
    Instagram is a rich source for mining descriptive tags for images and multimedia in general. The tag-image pairs can be used to train automatic image annotation (AIA) systems in accordance with the learning-by-example paradigm. In previous studies, we concluded that, on average, 20% of Instagram hashtags are related to the actual visual content of the image they accompany, i.e., they are descriptive hashtags, while there are many irrelevant hashtags, i.e., stophashtags, that are used across totally different images simply to gather clicks and enhance searchability. In this paper, we present a novel methodology, based on the principles of collective intelligence, that helps locate the descriptive hashtags. In particular, we show that the application of a modified version of the well-known hyperlink-induced topic search (HITS) algorithm, in a crowdtagging context, provides an effective and consistent way of finding pairs of Instagram images and hashtags that lead to representative and noise-free training sets for content-based image retrieval. As a proof of concept, we used the crowdsourcing platform Figure-eight to gather collective intelligence in the form of tag selection (crowdtagging) for Instagram hashtags. The crowdtagging data from Figure-eight are used to form bipartite graphs in which the first type of node corresponds to the annotators and the second type to the hashtags they selected. The HITS algorithm is first used to rank the annotators in terms of their effectiveness in the crowdtagging task and then to identify the right hashtags per image.
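    A minimal sketch of the core idea follows, assuming the standard (unmodified) HITS iteration on a small hypothetical annotator-hashtag bipartite graph; the paper's actual modification of HITS and its crowdtagging data are not reproduced here.

```python
# Minimal sketch of standard HITS on an annotator-hashtag bipartite graph:
# annotators act as hubs (reliability), hashtags as authorities (relevance).
# The adjacency matrix below is hypothetical.
import numpy as np

annotators = ["ann1", "ann2", "ann3"]
hashtags = ["sunset", "sky", "like4like"]

# A[i, j] = 1 if annotator i selected hashtag j for the image under review.
A = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
], dtype=float)

hub = np.ones(len(annotators))        # annotator scores
authority = np.ones(len(hashtags))    # hashtag scores

for _ in range(50):                   # power iteration until approximate convergence
    authority = A.T @ hub
    authority /= np.linalg.norm(authority)
    hub = A @ authority
    hub /= np.linalg.norm(hub)

# Annotators ranked by hub score, hashtags by authority score.
print(sorted(zip(annotators, hub), key=lambda x: -x[1]))
print(sorted(zip(hashtags, authority), key=lambda x: -x[1]))
```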

    Topic Identification of Instagram Hashtag Sets for Image Tagging: An Empirical Assessment

    No full text
    Images are an important part of the collection items in any digital library. Mining information from social media networks, and especially from Instagram, for image description has recently gained increased research interest. In the current study we extend previous work on the use of topic modelling for mining tags from Instagram hashtags for image content description. We examine whether the hashtags accompanying Instagram photos, collected via a common query hashtag (called ‘subject’ hereafter), vary in a statistically significant manner depending on the similarity of their visual content. In the experiment we use the topics mined from the Instagram hashtags of a set of Instagram images corresponding to 26 different query hashtags and classified into two categories per subject, named ‘relevant’ and ‘irrelevant’ depending on the similarity of their visual content. Two different sets of users, namely trained students and the generic crowd, assess the topics presented to them as word clouds. We investigate whether there is a significant difference between the word clouds of the images considered visually relevant to the query subject and those considered visually irrelevant. At the same time we investigate whether the word cloud interpretations of trained students and the generic crowd differ. The data collected through this empirical study are analysed with the use of the independent-samples t-test and Pearson’s rho. We conclude that the word clouds of the relevant Instagram images are much more easily interpretable by both the trained students and the crowd. The results also show some interesting variations across subjects, which are analysed and discussed in detail throughout the paper. At the same time, the interpretations of the trained students and the generic crowd are highly correlated, denoting that no specific training is required to mine relevant tags from Instagram hashtags to describe the accompanying Instagram photos.
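    The statistical comparison described above can be sketched as follows, with hypothetical interpretability scores standing in for the collected assessments; the real scores and subject counts are not reproduced here.

```python
# Minimal sketch of the analysis: an independent-samples t-test between word
# clouds of visually relevant vs. irrelevant images, plus the correlation
# between student and crowd interpretations. All values are hypothetical.
from scipy.stats import ttest_ind, pearsonr

# Mean interpretability score per subject's word cloud (illustrative values).
relevant_scores   = [0.82, 0.75, 0.91, 0.68, 0.77, 0.85]
irrelevant_scores = [0.41, 0.55, 0.38, 0.60, 0.47, 0.52]

t_stat, p_value = ttest_ind(relevant_scores, irrelevant_scores)
print(f"relevant vs irrelevant: t = {t_stat:.2f}, p = {p_value:.4f}")

# Agreement between trained students and the generic crowd on the same clouds.
student_scores = [0.82, 0.75, 0.91, 0.68, 0.77, 0.85]
crowd_scores   = [0.79, 0.70, 0.88, 0.65, 0.80, 0.83]

r, p = pearsonr(student_scores, crowd_scores)
print(f"students vs crowd: Pearson r = {r:.2f}, p = {p:.4f}")
```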

    Evaluating the descriptive power of Instagram hashtags

    Get PDF
    Image tagging is an essential step in developing Automatic Image Annotation (AIA) methods based on the learning-by-example paradigm. However, manual image annotation, even for creating training sets for machine learning algorithms, requires hard effort and contains human judgment errors and subjectivity. Thus, alternative ways of automatically creating training examples, i.e., pairs of images and tags, are pursued. In this work, we investigate whether the tags accompanying photos on Instagram can be considered image annotation metadata. If such a claim is proved, then Instagram could be used as a very rich source of training data, easy to collect automatically, for the development of AIA techniques. Our hypothesis is that Instagram hashtags, and especially those provided by the photo owner/creator, express the content of a photo more accurately than the tags assigned to a photo during explicit image annotation processes such as crowdsourcing. In this context, we explore the descriptive power of hashtags by examining whether other users would use the same hashtags as the owner to annotate an image. For this purpose, 1000 Instagram images were collected and one to four hashtags, considered the most descriptive ones for the image in question, were chosen from among the hashtags used by the photo owner. An online database was constructed to generate online questionnaires, containing 20 images each, which were distributed to experiment participants so they could choose the most suitable hashtag for every image according to their interpretation. Results show that, on average, 66% of the participants’ hashtag choices coincide with those suggested by the photo owners; thus, initial evidence towards the confirmation of our hypothesis can be claimed.
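    A minimal sketch of the agreement measure described above, using hypothetical owner hashtags and participant responses rather than the actual questionnaire data, is given below.

```python
# Minimal sketch of the agreement measure: the share of participant choices
# that coincide with the hashtag(s) pre-selected from the photo owner's own
# hashtags. All data here are hypothetical.
owner_hashtags = {
    "img1": {"sunset", "beach"},      # owner-derived descriptive hashtags
    "img2": {"pizza"},
    "img3": {"dog", "puppy"},
}

# Each participant picked one hashtag per image in the questionnaire.
participant_choices = [
    {"img1": "sunset", "img2": "pizza", "img3": "cat"},
    {"img1": "sea",    "img2": "pizza", "img3": "dog"},
]

matches = total = 0
for answers in participant_choices:
    for image_id, chosen in answers.items():
        total += 1
        matches += chosen in owner_hashtags[image_id]

print(f"agreement: {100 * matches / total:.0f}%")   # 67% for this toy data
```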