Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms
Question categorization and expert retrieval methods have been crucial for
information organization and accessibility in community question answering
(CQA) platforms. Research in this area, however, has dealt only with the text
modality. With the increasing multimodal nature of web content, we focus on
extending these methods for CQA questions accompanied by images. Specifically,
we leverage the success of representation learning for text and images in the
visual question answering (VQA) domain, and adapt the underlying concept and
architecture for automated category classification and expert retrieval on
image-based questions posted on Yahoo! Chiebukuro, the Japanese counterpart of
Yahoo! Answers.
To the best of our knowledge, this is the first work to tackle the
multimodality challenge in CQA, and to adapt VQA models for tasks on a more
ecologically valid source of visual questions. Our analysis of the differences
between visual QA and community QA data drives our proposal of novel
augmentations of an attention method tailored for CQA, and the use of auxiliary
tasks for learning better grounding features. Our final model markedly
outperforms the text-only and VQA model baselines for both tasks of
classification and expert retrieval on real-world multimodal CQA data.
Comment: Submitted for review at CIKM 201
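The core idea adapted from VQA — attending over image-region features guided by the question text — can be sketched roughly as follows. This is a minimal pure-Python illustration, not the paper's actual architecture; the function and variable names are invented for clarity:

```python
import math

def question_guided_attention(region_feats, question_vec):
    """Weight image-region features by their relevance to the question text.

    region_feats: list of region embedding vectors (lists of floats).
    question_vec: question embedding (list of floats).
    Returns one attended image vector (a weighted sum of the regions).
    """
    # Relevance score of each region = dot product with the question vector.
    scores = [sum(r * q for r, q in zip(region, question_vec))
              for region in region_feats]
    # Softmax over regions (shifted by the max for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the region features.
    dim = len(region_feats[0])
    return [sum(w * region[d] for w, region in zip(weights, region_feats))
            for d in range(dim)]
```

The attended vector can then be concatenated with the text representation and fed to a classifier or retrieval model.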
Cultures in Community Question Answering
CQA services are collaborative platforms where users ask and answer
questions. We investigate the influence of national culture on people's online
questioning and answering behavior. For this, we analyze a sample of 200,000
users in Yahoo Answers from 67 countries. We empirically measure a set of
of cultural metrics defined in Geert Hofstede's cultural dimensions and Robert
Levine's Pace of Life and show that behavioral cultural differences exist in
community question answering platforms. We find that national cultures differ
in Yahoo Answers along a number of dimensions such as temporal predictability
of activities, contribution-related behavioral patterns, privacy concerns, and
power inequality.
Comment: Published in the proceedings of the 26th ACM Conference on Hypertext and Social Media (HT'15)
Large Scale Question Paraphrase Retrieval with Smoothed Deep Metric Learning
The goal of a Question Paraphrase Retrieval (QPR) system is to retrieve
equivalent questions that result in the same answer as the original question.
Such a system can be used to understand and answer rare and noisy
reformulations of common questions by mapping them to a set of canonical forms.
This has large-scale applications for community Question Answering (cQA) and
open-domain spoken language question answering systems. In this paper we
describe a new QPR system implemented as a Neural Information Retrieval (NIR)
system consisting of a neural network sentence encoder and an approximate
k-Nearest Neighbour index for efficient vector retrieval. We also describe our
mechanism to generate an annotated dataset for question paraphrase retrieval
experiments automatically from question-answer logs via distant supervision. We
show that the standard loss function in NIR, triplet loss, does not perform
well with noisy labels. We propose smoothed deep metric loss (SDML) and with
our experiments on two QPR datasets we show that it significantly outperforms
triplet loss in the noisy label setting.
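The contrast the abstract draws can be sketched as follows: a standard triplet loss compares one positive against one negative, whereas a smoothed softmax-style loss scores one positive against several negatives and softens the target distribution, so an occasionally mislabeled "positive" is penalized less. This is a rough stand-in for the paper's SDML, whose exact formulation may differ:

```python
import math

def triplet_loss(anchor, pos, neg, margin=0.2):
    """Standard triplet loss: push the positive closer than the negative."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, pos))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, neg))
    return max(0.0, d_pos - d_neg + margin)

def smoothed_softmax_loss(anchor, pos, negs, eps=0.1):
    """Softmax cross-entropy over one positive and several negatives,
    with label smoothing: the target puts 1 - eps on the positive and
    spreads eps over the negatives."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    sims = [dot(anchor, pos)] + [dot(anchor, n) for n in negs]
    m = max(sims)
    log_z = m + math.log(sum(math.exp(s - m) for s in sims))
    log_probs = [s - log_z for s in sims]
    n = len(log_probs)
    target = [1.0 - eps] + [eps / (n - 1)] * (n - 1)  # smoothed one-hot
    return -sum(t * lp for t, lp in zip(target, log_probs))
```

With noisy labels, the smoothed target keeps the gradient from fully committing to a wrong "positive", which is the intuition behind the reported robustness.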
Detecting collusive spamming activities in community question answering
Community Question Answering (CQA) portals provide rich sources of information on a variety of topics. However, the authenticity and quality of questions and answers (Q&As) has proven hard to control. In a troubling development, the widespread growth of crowdsourcing websites has created a large-scale, potentially hard-to-detect workforce that produces malicious content in CQA. Crowd workers who join the same crowdsourcing task for a promotion campaign collusively craft deceptive Q&As to promote a target (product or service), and such a collusive spamming group can fully control the sentiment around the target. How can the structure and the attributes be utilized to detect manipulated Q&As? How can the collusive group be detected, and the group information leveraged for the detection task?
To shed light on these research questions, we propose a unified framework to tackle the challenge of detecting collusive spamming activities in CQA. First, we interpret the questions and answers in CQA as two independent networks. Second, we detect collusive question groups and answer groups from these two networks respectively by measuring the similarity of the contents posted within a short duration. Third, using attributes (individual-level and group-level) and correlations (user-based and content-based), we propose a combined factor graph model to detect deceptive Q&As simultaneously by combining two independent factor graphs. On a large-scale real-world dataset, we find that the proposed framework can detect deceptive content at an early stage, and outperforms a number of competitive baselines.
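The grouping step — clustering posts whose content is highly similar and posted within a short duration — can be sketched as a greedy heuristic. This is only an illustration of that one signal, not the paper's full factor-graph framework; the threshold values are invented:

```python
from difflib import SequenceMatcher

def collusive_groups(posts, sim_threshold=0.8, max_gap_hours=24):
    """Greedily group posts with near-duplicate text posted close in time.

    posts: list of (timestamp_hours, text) tuples, sorted by time.
    Returns a list of groups, each a list of indices into posts.
    """
    groups = []
    for i, (t, text) in enumerate(posts):
        for group in groups:
            t_prev, text_prev = posts[group[-1]]
            close_in_time = (t - t_prev) <= max_gap_hours
            similarity = SequenceMatcher(None, text, text_prev).ratio()
            if close_in_time and similarity >= sim_threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # no matching group: start a new one
    return groups
```

Groups found this way would then feed group-level attributes into the downstream detection model.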
Learning to predict closed questions on stack overflow
The paper deals with the problem of predicting whether a user's question will be closed by the moderators on Stack Overflow, a popular question answering service devoted to software programming. The task, along with data and evaluation metrics, was offered as an open machine learning competition on the Kaggle platform. To solve this problem, we employed a wide range of classification features related to users, their interactions, and post content. Classification was carried out using several machine learning methods. According to the results of the experiment, the most important features are characteristics of the user and topical features of the question. The best results were obtained using Vowpal Wabbit – an implementation of online learning based on stochastic gradient descent. Our results are among the best in the overall ranking, although they were obtained after the official competition was over.
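The online-learning setup mentioned above — stochastic gradient descent over sparse per-example features — can be sketched as a tiny logistic-regression learner. This is a pure-Python illustration of the general technique, not Vowpal Wabbit itself, and the feature name below is invented:

```python
import math

def sgd_logistic(examples, lr=0.1, epochs=5):
    """Tiny online logistic-regression learner trained by SGD.

    examples: list of (features_dict, label) pairs with label in {0, 1}.
    Returns the learned weight dict (feature name -> weight).
    """
    w = {}
    for _ in range(epochs):
        for feats, y in examples:
            z = sum(w.get(f, 0.0) * v for f, v in feats.items())
            p = 1.0 / (1.0 + math.exp(-z))      # predicted probability
            for f, v in feats.items():           # one gradient step per example
                w[f] = w.get(f, 0.0) + lr * (y - p) * v
    return w
```

Because each example updates the weights immediately, the learner scales to large competition datasets in a single streaming pass per epoch.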
Towards Automatic Evaluation of Health-Related CQA Data
The paper reports on an evaluation of Russian community question answering (CQA) data in the health domain. About 1,500 question-answer pairs were manually evaluated by medical professionals; in addition, an automatic evaluation based on reference disease-medicine pairs was performed. Although the results of the manual and automatic evaluation do not fully match, we find the method promising and propose several improvements. Automatic processing can be used to dynamically monitor the quality of CQA content and to compare different data sources. Moreover, the approach can be useful for symptomatic surveillance and health education campaigns.
This work is partially supported by the Russian Foundation for Basic Research, project #14-07-00589 “Data Analysis and User Modelling in Narrow-Domain Social Media”. We also thank the assessors who volunteered for the evaluation and Mail.Ru for granting us access to the data.
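The pair-matching idea — flagging an answer as plausible when it mentions a reference medicine for a disease named in the question — can be sketched as a crude keyword check. The reference pairs below are purely illustrative placeholders (not medical advice), and a real system would need normalization and synonym handling:

```python
# Hypothetical reference pairs: medicines associated with each disease.
REFERENCE_PAIRS = {
    "influenza": {"oseltamivir", "paracetamol"},
    "angina": {"nitroglycerin"},
}

def auto_evaluate(question, answer):
    """Return True if the answer mentions a reference medicine for a
    disease found in the question, False if it does not, and None when
    no known disease appears in the question (cannot evaluate)."""
    q, a = question.lower(), answer.lower()
    for disease, medicines in REFERENCE_PAIRS.items():
        if disease in q:
            return any(med in a for med in medicines)
    return None
```

Run over a stream of Q&A pairs, such a check gives a cheap (if noisy) quality signal that can be tracked over time or compared across data sources.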
Mining Duplicate Questions of Stack Overflow
There has been a significant rise in the use of Community Question
Answering sites (CQAs) over the last decade owing primarily to their ability to
leverage the wisdom of the crowd. Duplicate questions have a crippling effect
on the quality of these sites. Tackling duplicate questions is therefore an
important step towards improving quality of CQAs. In this regard, we propose
two neural network based architectures for duplicate question detection on
Stack Overflow. We also propose explicitly modeling the code present in
questions to achieve results that surpass the state of the art.
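At its core, duplicate detection of this kind encodes each question into a vector and compares the vectors. The sketch below uses a toy bag-of-words "encoder" with cosine similarity in place of the trained neural encoders the abstract describes; the threshold is invented:

```python
import math
from collections import Counter

def encode(text):
    """Toy bag-of-words encoder standing in for a neural sentence encoder;
    returns a sparse term-count vector."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def is_duplicate(q1, q2, threshold=0.7):
    """Flag two questions as duplicates when their vectors are close."""
    return cosine(encode(q1), encode(q2)) >= threshold
```

A learned encoder replaces `encode` with a network trained so that true duplicates land close together, which is where the proposed architectures (including explicit modeling of code snippets) come in.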
CollabCoder: A GPT-Powered Workflow for Collaborative Qualitative Analysis
The Collaborative Qualitative Analysis (CQA) process can be time-consuming
and resource-intensive, requiring multiple discussions among team members to
refine codes and ideas before reaching a consensus. To address these
challenges, we introduce CollabCoder, a system leveraging Large Language Models
(LLMs) to support three CQA stages: independent open coding, iterative
discussions, and the development of a final codebook. In the independent open
coding phase, CollabCoder provides AI-generated code suggestions on demand, and
allows users to record coding decision-making information (e.g. keywords and
certainty) as support for the process. During the discussion phase, CollabCoder
helps to build mutual understanding and productive discussion by sharing coding
decision-making information with the team. It also helps to quickly identify
agreements and disagreements through quantitative metrics, in order to build a
final consensus. During the code grouping phase, CollabCoder employs a top-down
approach for primary code group recommendations, reducing the cognitive burden
of generating the final codebook. An evaluation involving 16 users confirmed
the usability and effectiveness of CollabCoder and offered empirical insights
into the LLMs' roles in CQA.
Essays on the interaction between users and information systems
The role of information systems has evolved from providing decision support to enabling the majority of our daily operations, and the way users interact with information systems has changed dramatically as a result. The goal of this dissertation is to study phenomena that stem from the close interaction between users and information systems using empirical methodologies.
The first essay of this dissertation focuses on the issue of sentiment manipulation. We show that strategic players might be incentivized to manufacture content on social media platforms and opinion forums, in the context of the movie industry. We then identify unusual patterns on Twitter that are consistent with sentiment manipulation.
We study the effectiveness of social media advertising in the second essay. Advertisers on popular social media platforms such as Facebook are able to publish ads with popularity and social information. We design and conduct a randomized field experiment to study the extent to which these types of information have an effect on ad performance.
In the third essay we study how individuals might be biased toward content that appears to be written more politely. We use data from an online question answering platform, StackExchange, to show that an individual who posts a question on the platform tends to prefer polite answers to clear answers.
Information, Risk, and Operations Management (IROM