Detecting Sarcasm in Multimodal Social Platforms
Sarcasm is a peculiar form of sentiment expression, where the surface
sentiment differs from the implied sentiment. The detection of sarcasm in
social media platforms has been applied in the past mainly to textual
utterances where lexical indicators (such as interjections and intensifiers),
linguistic markers, and contextual information (such as user profiles, or past
conversations) were used to detect the sarcastic tone. However, modern social media platforms allow users to create multimodal messages in which audiovisual content is integrated with the text, making the analysis of any single mode in isolation incomplete. In our work, we first study the relationship between the textual and
visual aspects in multimodal posts from three major social media platforms,
i.e., Instagram, Tumblr and Twitter, and we run a crowdsourcing task to
quantify the extent to which images are perceived as necessary by human
annotators. Moreover, we propose two different computational frameworks to
detect sarcasm that integrate the textual and visual modalities. The first
approach exploits visual semantics trained on an external dataset, and
concatenates the semantics features with state-of-the-art textual features. The
second method adapts a visual neural network initialized with parameters
trained on ImageNet to multimodal sarcastic posts. Results show the positive
effect of combining modalities for the detection of sarcasm across platforms
and methods.
Comment: 10 pages, 3 figures, final version published in the Proceedings of ACM Multimedia 201
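The first approach described above can be sketched as simple early fusion: extract a feature vector per modality and concatenate them before classification. The feature extractors below are hypothetical placeholders (deterministic stand-ins), not the paper's actual visual-semantic or textual features.

```python
import zlib
import numpy as np

def text_features(post_text: str, dim: int = 8) -> np.ndarray:
    """Hypothetical textual feature extractor (stand-in for the paper's
    state-of-the-art textual features)."""
    rng = np.random.default_rng(zlib.crc32(post_text.encode()))
    return rng.standard_normal(dim)

def visual_features(image_id: str, dim: int = 8) -> np.ndarray:
    """Hypothetical visual-semantic extractor (stand-in for features from
    a network trained on an external dataset)."""
    rng = np.random.default_rng(zlib.crc32(image_id.encode()))
    return rng.standard_normal(dim)

def fuse(post_text: str, image_id: str) -> np.ndarray:
    # Early fusion: concatenate the per-modality vectors so a downstream
    # sarcasm classifier sees both modalities at once.
    return np.concatenate([text_features(post_text), visual_features(image_id)])

fused = fuse("Great, another Monday...", "post_001.jpg")
print(fused.shape)  # (16,)
```

In practice the concatenated vector would feed a trained classifier (e.g., an SVM or logistic regression) rather than being inspected directly.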
How to present ideas in idea crowdsourcing communities? Pathways for idea convergence and divergence performances
Currently, idea crowdsourcing communities are widely used for solving problems and fostering innovation. However, when confronted with the substantial number of ideas delivered by idea crowdsourcing communities, individuals find it hard to generate novel ideas (i.e., idea divergence) and to evaluate the appropriateness of delivered ideas (i.e., idea convergence). To address this challenge, platform operators tend to improve the idea presentation design. However, the effectiveness of these idea presentation designs for idea crowdsourcing community users remains unclear. Therefore, we aim to uncover the influencing mechanism of four types of idea presentation design (idea tree, slides, lists, and grids) on idea divergence and convergence outcomes. Accordingly, we adopt the dual pathway to creativity model as our theoretical framework and propose an experimental research design. This study will provide insights into platform attribute design and design strategies for improving idea divergence and convergence outcomes.
RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
We study an important, yet largely unexplored problem of large-scale
cross-modal visual localization by matching ground RGB images to a
geo-referenced aerial LIDAR 3D point cloud (rendered as depth images). Prior
works were demonstrated on small datasets and did not lend themselves to
scaling up for large-scale applications. To enable large-scale evaluation, we
introduce a new dataset containing over 550K pairs (covering 143 km^2 area) of
RGB and aerial LIDAR depth images. We propose a novel joint embedding based
method that effectively combines the appearance and semantic cues from both
modalities to handle drastic cross-modal variations. Experiments on the
proposed dataset show that our model achieves a strong result of a median rank
of 5 in matching across a large test set of 50K location pairs collected from a
14km^2 area. This represents a significant advancement over prior works in
performance and scale. We conclude with qualitative results to highlight the
challenging nature of this task and the benefits of the proposed model. Our
work provides a foundation for further research in cross-modal visual
localization.
Comment: ACM Multimedia 202
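The median-rank metric used above can be illustrated with a small sketch: embed queries and references in a shared space, score all pairs by cosine similarity, and take the median rank of each query's true match. This is a generic evaluation sketch, not the paper's model.

```python
import numpy as np

def median_rank(query_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """Median rank of the true match under cosine similarity, assuming
    query i's ground-truth reference is ref i (rank 1 is best)."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sims = q @ r.T                      # pairwise cosine similarities
    true_sims = np.diag(sims)           # similarity to the true match
    # Rank = number of references scoring at least as high as the true one.
    ranks = (sims >= true_sims[:, None]).sum(axis=1)
    return float(np.median(ranks))

# Perfectly aligned embeddings give the ideal median rank of 1.
emb = np.random.default_rng(0).standard_normal((100, 32))
print(median_rank(emb, emb.copy()))  # 1.0
```

A cross-modal model like the one described would produce `query_emb` from RGB images and `ref_emb` from rendered LIDAR depth images before this evaluation step.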
Automated Fact-Checking for Assisting Human Fact-Checkers
The reporting and analysis of current events around the globe has expanded
from professional, editor-led journalism all the way to citizen journalism.
Politicians and other key players enjoy direct access to their audiences
through social media, bypassing the filters of official cables or traditional
media. However, the multiple advantages of free speech and direct communication
are dimmed by the misuse of the media to spread inaccurate or misleading
claims. These phenomena have led to the modern incarnation of the fact-checker
-- a professional whose main aim is to examine claims using available evidence
to assess their veracity. As in other text forensics tasks, the amount of
information available makes the work of the fact-checker more difficult. With
this in mind, starting from the perspective of the professional fact-checker,
we survey the available intelligent technologies that can support the human
expert in the different steps of her fact-checking endeavor. These include
identifying claims worth fact-checking; detecting relevant previously
fact-checked claims; retrieving relevant evidence to fact-check a claim; and
actually verifying a claim. In each case, we pay attention to the challenges in
future work and the potential impact on real-world fact-checking.
Comment: fact-checking, fact-checkers, check-worthiness, detecting previously fact-checked claims, evidence retrieval
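The four steps the survey organizes (check-worthiness, detecting previously fact-checked claims, evidence retrieval, verification) can be sketched as a simple pipeline. Every component below is a hypothetical placeholder for the intelligent technologies surveyed, not an actual system.

```python
from dataclasses import dataclass, field

@dataclass
class CheckResult:
    claim: str
    verdict: str
    evidence: list = field(default_factory=list)

# Hypothetical store of previously fact-checked claims and verdicts.
PREVIOUSLY_CHECKED = {"the earth is flat": "false"}

def is_check_worthy(sentence: str) -> bool:
    """Step 1: placeholder check-worthiness filter."""
    return len(sentence.split()) > 3

def lookup_previous(claim: str):
    """Step 2: detect a relevant previously fact-checked claim."""
    return PREVIOUSLY_CHECKED.get(claim.lower().rstrip("."))

def retrieve_evidence(claim: str) -> list:
    """Step 3: placeholder evidence retrieval."""
    return [f"retrieved document about: {claim}"]

def verify(claim: str, evidence: list) -> str:
    """Step 4: placeholder claim verification."""
    return "needs review" if evidence else "unverified"

def fact_check(sentence: str):
    if not is_check_worthy(sentence):
        return None                      # not worth fact-checking
    prior = lookup_previous(sentence)
    if prior is not None:
        return CheckResult(sentence, prior)  # reuse the earlier verdict
    evidence = retrieve_evidence(sentence)
    return CheckResult(sentence, verify(sentence, evidence), evidence)

print(fact_check("The Earth is flat.").verdict)  # false
```

The hand-off structure mirrors the workflow described: reusing an existing verdict short-circuits retrieval and verification, which is exactly why detecting previously fact-checked claims saves the human expert time.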
Retrieval-augmented Image Captioning
Inspired by retrieval-augmented language generation and pretrained Vision and
Language (V&L) encoders, we present a new approach to image captioning that
generates sentences given the input image and a set of captions retrieved from
a datastore, as opposed to the image alone. The encoder in our model jointly
processes the image and retrieved captions using a pretrained V&L BERT, while
the decoder attends to the multimodal encoder representations, benefiting from
the extra textual evidence from the retrieved captions. Experimental results on
the COCO dataset show that image captioning can be effectively formulated from
this new perspective. Our model, named EXTRA, benefits from using captions
retrieved from the training dataset, and it can also benefit from using an
external dataset without the need for retraining. Ablation studies show that
retrieving a sufficient number of captions (e.g., k=5) can improve captioning
quality. Our work contributes towards using pretrained V&L encoders for
generative tasks, instead of standard classification tasks.
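The retrieval step described above can be sketched independently of the captioning model: find the k datastore entries whose image embeddings are nearest the query image, and hand their captions to the decoder as extra textual evidence. The toy embeddings and captions below are illustrative, not from the EXTRA model or COCO.

```python
import numpy as np

def retrieve_captions(image_emb, datastore_embs, datastore_captions, k=5):
    """Return the captions paired with the k datastore image embeddings
    most similar (cosine) to the query image embedding."""
    q = image_emb / np.linalg.norm(image_emb)
    d = datastore_embs / np.linalg.norm(datastore_embs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity to each entry
    top = np.argsort(-scores)[:k]       # indices of the k best matches
    return [datastore_captions[i] for i in top]

# Toy datastore: three 2-D "image embeddings" with their captions.
store = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
captions = ["a dog on a beach", "a city at night", "a dog by the sea"]
print(retrieve_captions(np.array([1.0, 0.05]), store, captions, k=2))
# ['a dog on a beach', 'a dog by the sea']
```

In the approach described, the retrieved captions and the image are then jointly encoded by a pretrained V&L BERT; swapping the datastore (e.g., for an external dataset) changes only this retrieval step, which is why no retraining is needed.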
Hi, how can I help you?: Automating enterprise IT support help desks
Question answering is one of the primary challenges of natural language
understanding. In realizing such a system, providing complex, long-form answers to questions is more challenging than factoid answering, as the former requires context disambiguation. The methods explored in the literature
can be broadly classified into three categories namely: 1) classification
based, 2) knowledge graph based and 3) retrieval based. Individually, none of
them address the need for an enterprise-wide assistance system in an IT support and maintenance domain. In this domain, the variance of answers is large, ranging from factoids to structured operating procedures; the knowledge is spread across heterogeneous data sources such as application-specific documentation and ticket management systems; and no single technique for general-purpose assistance scales to such a landscape. To address this, we have built a cognitive platform with capabilities adapted for this domain. Further,
we have built a general purpose question answering system leveraging the
platform that can be instantiated for multiple products, technologies in the
support domain. The system uses a novel hybrid answering model that
orchestrates across a deep learning classifier, a knowledge graph based context
disambiguation module and a sophisticated bag-of-words search system. This
orchestration performs context switching for a provided question and also does
a smooth hand-off of the question to a human expert if none of the automated
techniques can provide a confident answer. This system has been deployed across
675 internal enterprise IT support and maintenance projects.
Comment: To appear in IAAI 201
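The orchestration and human hand-off described above can be sketched as confidence-threshold routing across ranked components. The three components below (classifier, knowledge-graph module, bag-of-words search) are hypothetical stand-ins with hard-coded confidences, not the deployed system's models.

```python
def answer(question: str, components, threshold: float = 0.7):
    """Try each automated component in priority order; hand off to a
    human expert if none is confident enough."""
    for component in components:
        response, confidence = component(question)
        if confidence >= threshold:
            return response
    return "handed off to human expert"

# Hypothetical components, each returning (answer, confidence).
def dl_classifier(q):
    return ("Reset it via the self-service portal.",
            0.9 if "password" in q.lower() else 0.2)

def kg_module(q):
    return ("See the VPN configuration runbook.",
            0.8 if "vpn" in q.lower() else 0.1)

def bow_search(q):
    return ("Closest matching document.", 0.4)  # deliberately low confidence

pipeline = [dl_classifier, kg_module, bow_search]
print(answer("How do I reset my password?", pipeline))
print(answer("Why is the coffee machine offline?", pipeline))
```

The second call falls through every component and escalates, mirroring the smooth hand-off to a human expert that the abstract describes when no automated technique is confident.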
Applying Facets of Work as a Source of Knowledge and Insight for Requirements Determination
This conceptual contribution explains how the idea of “facets of work” can bring more knowledge and richer, more evocative ideas to the development of system requirements in organizational settings. Focusing on facets of work potentially provides useful guidance without requiring unnecessary detail, precision, or notation. A background section summarizes how the current research emerged from partial overlaps between separate research efforts. Table 1 identifies 18 facets of work. Five other tables examine a subset of the facets to illustrate concepts associated with specific facets, common success factors and tradeoffs, sub-facets, and other topics. Use of the same subset of the facets to classify quotations from a case study demonstrates the broad relevance of the approach.
JuxtaLearn D3.2 Performance Framework
This deliverable, D3.2, for Work Package 3, incorporating the pedagogy from WP2 and the orchestration factors mapped in D3.1, reviews aspects of performance in the context of participative video making. It reviews the literature on curiosity and on the engagement characteristics of interaction mechanisms for public displays, and anticipates requirements for social network analysis of relevant public videos from WP6 task 6.3. To support JuxtaLearn performance, it therefore proposes a reflective performance framework that encompasses the material environment and objects required, the participants, and the knowledge needed.