
    Less is MORE: a MultimOdal system for tag REfinement

    With the proliferation of image-based social media, an extremely large amount of multimodal data is being produced. Very often, image contents are published together with a set of user-defined metadata such as tags and textual descriptions. Despite being very useful for enhancing traditional image retrieval, user-defined tags on social media have been proven to be ineffective for indexing images, because they are influenced by the owners' personal experiences as well as by their desire to promote the published content. To be analyzed and indexed, multimodal data require algorithms able to deal jointly with textual and visual data. This research presents a multimodal approach to the problem of tag refinement, which consists of separating the relevant descriptors (tags) of images from noisy ones. The proposed method exploits both Natural Language Processing (NLP) and Computer Vision (CV) techniques based on deep learning to find a match between the textual information and the visual content of social media posts. Textual semantic features are represented with (multilingual) word embeddings, while visual ones are obtained with image classification. The proposed system is evaluated on a manually annotated Italian dataset extracted from Instagram, achieving a weighted F1-score of 68%.
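    The matching idea above (compare tag embeddings with the labels predicted by an image classifier) can be illustrated with a minimal sketch. It assumes a hypothetical dictionary `embed` of (multilingual) word vectors and a hypothetical rule that keeps a tag only if its cosine similarity to at least one predicted label reaches a `threshold` of 0.5. This is an illustration of embedding-based matching, not the authors' actual system, whose decision rule is not given in the abstract.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def refine_tags(tags, predicted_labels, embed, threshold=0.5):
    """Keep tags whose embedding is close to at least one predicted visual label.

    `embed` maps words to vectors (assumed: a multilingual embedding table);
    `predicted_labels` are class names from a hypothetical image classifier;
    `threshold` is an illustrative cut-off, not a value from the paper.
    """
    label_vecs = [embed[label] for label in predicted_labels if label in embed]
    kept = []
    for tag in tags:
        vec = embed.get(tag)
        if vec is None:
            continue  # out-of-vocabulary tags are treated as noise in this sketch
        best = max((cosine(vec, lv) for lv in label_vecs), default=0.0)
        if best >= threshold:
            kept.append(tag)
    return kept

# Toy 3-dimensional vectors standing in for real multilingual embeddings.
embed = {
    "cane": [0.9, 0.1, 0.0],   # Italian for "dog"
    "dog": [1.0, 0.0, 0.0],
    "pizza": [0.0, 0.0, 1.0],
    "love": [0.1, 0.9, 0.1],
}
print(refine_tags(["cane", "pizza", "love", "follow4follow"], ["dog"], embed))
```

    With these toy vectors, only the Italian tag "cane" survives, because it aligns with the English visual label "dog"; a multilingual embedding space is what makes such cross-language matches possible.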

    Can machines sense irony? : exploring automatic irony detection on social media


    INEX Tweet Contextualization Task: Evaluation, Results and Lesson Learned

    Microblogging platforms such as Twitter are increasingly used for online client and market analysis. This motivated the proposal of a new Tweet Contextualization track at the CLEF INEX lab. The objective of this task was to help users understand a tweet by providing them with a short explanatory summary (500 words). This summary had to be built automatically using resources such as Wikipedia, by extracting relevant passages and aggregating them into a coherent summary. Over the four years the task ran, results show that the best systems combine NLP techniques with more traditional methods. More precisely, the best-performing systems combine passage retrieval, sentence segmentation and scoring, named entity recognition, part-of-speech (POS) analysis, anaphora detection, content diversity measures, and sentence reordering. This paper provides a full summary report on the four-year task. While the yearly overviews focused on system results, in this paper we provide a detailed report on the approaches proposed by the participants, which can be considered the state of the art for this task. As an important result of the four-year competition, we also describe the open-access resources that have been built and collected. The evaluation measures for automatic summarization designed in DUC or MUC were not appropriate for evaluating tweet contextualization; we explain why, and we describe in detail the LogSim measure used to evaluate the informativeness of the produced contexts or summaries. Finally, we also mention the lessons we learned that are worth considering when designing a task.
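    Several of the components listed above (sentence segmentation, sentence scoring, a word budget, and reordering) can be combined into a very simple extractive baseline. The sketch below assumes the relevant passages have already been retrieved (for example from Wikipedia), scores each sentence by the fraction of tweet terms it contains, and greedily fills the 500-word budget. The function name `contextualize` and the overlap-based score are hypothetical choices; this is a generic baseline, not any participant's system.

```python
import re

def tokenize(text):
    """Lowercase word tokens (naive; no stemming or stop-word removal)."""
    return re.findall(r"\w+", text.lower())

def contextualize(tweet, passages, max_words=500):
    """Greedy extractive summary of pre-retrieved passages within a word budget."""
    query = set(tokenize(tweet))
    # Sentence segmentation: naive split after '.', '!' or '?'.
    sentences = []
    for p in passages:
        for s in re.split(r"(?<=[.!?])\s+", p.strip()):
            if s:
                sentences.append(s)
    # Sentence scoring: fraction of tweet terms that appear in the sentence.
    scored = []
    for idx, s in enumerate(sentences):
        overlap = len(query & set(tokenize(s)))
        score = overlap / len(query) if query else 0.0
        scored.append((score, idx, s))
    # Highest score first; ties broken by original position.
    scored.sort(key=lambda t: (-t[0], t[1]))
    chosen, used = [], 0
    for score, idx, s in scored:
        n = len(tokenize(s))
        if score == 0 or used + n > max_words:
            continue  # skip irrelevant sentences and those that exceed the budget
        chosen.append((idx, s))
        used += n
    # Simple reordering: restore the original sentence order for readability.
    chosen.sort()
    return " ".join(s for _, s in chosen)

passages = [
    "The CLEF lab runs evaluation campaigns. Pizza is popular in Naples.",
    "Tweet contextualization summarizes Wikipedia passages.",
]
print(contextualize("CLEF lab launches tweet contextualization track", passages))
```

    Real systems replace the overlap score with passage retrieval models, named entity recognition, and anaphora handling, and add a diversity criterion so that near-duplicate sentences are not selected twice.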