
    Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms

    Question categorization and expert retrieval methods have been crucial for information organization and accessibility in community question answering (CQA) platforms. Research in this area, however, has dealt only with the text modality. With the increasing multimodal nature of web content, we focus on extending these methods to CQA questions accompanied by images. Specifically, we leverage the success of representation learning for text and images in the visual question answering (VQA) domain, and adapt the underlying concepts and architectures for automated category classification and expert retrieval on image-based questions posted on Yahoo! Chiebukuro, the Japanese counterpart of Yahoo! Answers. To the best of our knowledge, this is the first work to tackle the multimodality challenge in CQA, and to adapt VQA models for tasks on a more ecologically valid source of visual questions. Our analysis of the differences between visual QA and community QA data drives our proposal of novel augmentations of an attention method tailored for CQA, and of auxiliary tasks for learning better grounding features. Our final model markedly outperforms the text-only and VQA model baselines on both classification and expert retrieval over real-world multimodal CQA data. Comment: Submitted for review at CIKM 201
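
    As a hedged illustration of the VQA-style adaptation described above, the sketch below combines a question embedding with pre-extracted image region features through question-guided attention and classifies the result into a category. All dimensions, names, and the single-layer attention form are our assumptions, not the authors' model.

    # A minimal sketch (not the authors' architecture) of question-guided
    # attention over image region features for CQA category classification.
    # Assumes pre-extracted text vectors and image region features as inputs.
    import torch
    import torch.nn as nn

    class TextGuidedAttentionClassifier(nn.Module):
        def __init__(self, text_dim=300, img_dim=2048, hidden=512, n_categories=20):
            super().__init__()
            self.text_proj = nn.Linear(text_dim, hidden)
            self.img_proj = nn.Linear(img_dim, hidden)
            self.att_score = nn.Linear(hidden, 1)
            self.classifier = nn.Sequential(
                nn.Linear(hidden * 2, hidden), nn.ReLU(),
                nn.Linear(hidden, n_categories),
            )

        def forward(self, text_vec, img_regions):
            # text_vec: (B, text_dim); img_regions: (B, R, img_dim)
            t = self.text_proj(text_vec)                             # (B, hidden)
            v = self.img_proj(img_regions)                           # (B, R, hidden)
            scores = self.att_score(torch.tanh(v + t.unsqueeze(1)))  # (B, R, 1)
            weights = torch.softmax(scores, dim=1)                   # attend over regions
            v_att = (weights * v).sum(dim=1)                         # (B, hidden)
            return self.classifier(torch.cat([t, v_att], dim=-1))

    # Toy usage with random tensors standing in for encoder outputs.
    model = TextGuidedAttentionClassifier()
    print(model(torch.randn(4, 300), torch.randn(4, 36, 2048)).shape)  # (4, 20)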

    Cultures in Community Question Answering

    CQA services are collaborative platforms where users ask and answer questions. We investigate the influence of national culture on people's online questioning and answering behavior. For this, we analyzed a sample of 200,000 Yahoo Answers users from 67 countries. We empirically measure a set of cultural metrics derived from Geert Hofstede's cultural dimensions and Robert Levine's Pace of Life, and show that behavioral cultural differences exist in community question answering platforms. We find that national cultures differ in Yahoo Answers along a number of dimensions, such as temporal predictability of activities, contribution-related behavioral patterns, privacy concerns, and power inequality. Comment: Published in the proceedings of the 26th ACM Conference on Hypertext and Social Media (HT'15).
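
    One of the dimensions above, temporal predictability of activities, can be made concrete with a small sketch. As an illustration only (not the paper's exact metric), the entropy of a user's posting-hour distribution is lower when their activity timing is more predictable.

    # Illustrative proxy for "temporal predictability": Shannon entropy (bits)
    # of the hours (0-23) at which a user posts. Lower entropy = more predictable.
    import math
    from collections import Counter

    def hour_entropy(post_hours):
        counts = Counter(post_hours)
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    print(hour_entropy([9, 9, 10, 9, 10]))  # ~0.97: concentrated, predictable
    print(hour_entropy([1, 5, 9, 14, 22]))  # ~2.32: spread out, unpredictable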

    Large Scale Question Paraphrase Retrieval with Smoothed Deep Metric Learning

    The goal of a Question Paraphrase Retrieval (QPR) system is to retrieve equivalent questions that result in the same answer as the original question. Such a system can be used to understand and answer rare and noisy reformulations of common questions by mapping them to a set of canonical forms. This has large-scale applications for community Question Answering (cQA) and open-domain spoken language question answering systems. In this paper, we describe a new QPR system implemented as a Neural Information Retrieval (NIR) system consisting of a neural network sentence encoder and an approximate k-Nearest Neighbour index for efficient vector retrieval. We also describe our mechanism for automatically generating an annotated dataset for question paraphrase retrieval experiments from question-answer logs via distant supervision. We show that the standard loss function in NIR, the triplet loss, does not perform well with noisy labels. We propose a smoothed deep metric loss (SDML) and, with experiments on two QPR datasets, show that it significantly outperforms the triplet loss in the noisy-label setting.
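
    The paper's exact SDML formulation is not reproduced here, but a common way to realize a "smoothed" metric-learning objective is to replace the triplet loss with label-smoothed cross-entropy over in-batch similarities. The sketch below is that generic variant, with temperature and smoothing values chosen arbitrarily.

    # Hedged sketch of a smoothed metric-learning loss: score each question
    # against all paraphrases in the batch and train with label-smoothed
    # cross-entropy, so noisy in-batch negatives are not pushed away too hard.
    import torch
    import torch.nn.functional as F

    def smoothed_batch_loss(q_emb, p_emb, smoothing=0.1, temperature=0.05):
        # q_emb, p_emb: (B, D); q_emb[i] is a paraphrase of p_emb[i], and the
        # other rows of p_emb serve as in-batch negatives.
        q = F.normalize(q_emb, dim=-1)
        p = F.normalize(p_emb, dim=-1)
        logits = q @ p.t() / temperature                    # (B, B) cosine scores
        targets = torch.arange(q.size(0), device=q.device)  # diagonal is positive
        return F.cross_entropy(logits, targets, label_smoothing=smoothing)

    loss = smoothed_batch_loss(torch.randn(8, 128), torch.randn(8, 128))

    At inference time, the encoder's vectors would live in an approximate k-Nearest Neighbour index for efficient retrieval, as the abstract describes.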

    Detecting collusive spamming activities in community question answering

    Community Question Answering (CQA) portals provide rich sources of information on a variety of topics. However, the authenticity and quality of questions and answers (Q&As) have proven hard to control. In a troubling development, the widespread growth of crowdsourcing websites has created a large-scale, potentially difficult-to-detect workforce for planting malicious content in CQA. Crowd workers who join the same crowdsourcing task for a promotion campaign collusively post deceptive Q&As promoting a target product or service, and such a collusive spamming group can fully control the sentiment around the target. How can the structure and attributes of Q&As be utilized to detect manipulated content? How can collusive groups be detected, and how can group information be leveraged for the detection task? To shed light on these research questions, we propose a unified framework to tackle the challenge of detecting collusive spamming activities in CQA. First, we interpret the questions and answers in CQA as two independent networks. Second, we detect collusive question groups and answer groups from these two networks by measuring the similarity of the contents posted within a short duration. Third, using attributes (individual-level and group-level) and correlations (user-based and content-based), we propose a combined factor graph model that detects deceptive questions and answers simultaneously by combining two independent factor graphs. On a large-scale real-world dataset, we find that the proposed framework can detect deceptive content at an early stage and outperforms a number of competitive baselines.
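
    The second step described above (grouping posts by content similarity within a short duration) can be sketched as follows; the thresholds, TF-IDF representation, and connected-components grouping are our simplifications, not the paper's exact method.

    # Link posts whose contents are similar and whose timestamps fall within a
    # short window; connected components of the resulting graph are candidate
    # collusive groups.
    from itertools import combinations
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    import networkx as nx

    def collusive_groups(posts, sim_threshold=0.8, window_hours=24):
        # posts: list of dicts with 'id', 'text', and 'ts' (hours since epoch)
        sims = cosine_similarity(
            TfidfVectorizer().fit_transform([p["text"] for p in posts]))
        g = nx.Graph()
        g.add_nodes_from(p["id"] for p in posts)
        for i, j in combinations(range(len(posts)), 2):
            if (sims[i, j] >= sim_threshold
                    and abs(posts[i]["ts"] - posts[j]["ts"]) <= window_hours):
                g.add_edge(posts[i]["id"], posts[j]["id"])
        # Only multi-post components are candidate collusive clusters.
        return [c for c in nx.connected_components(g) if len(c) > 1]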

    Learning to predict closed questions on stack overflow

    The paper deals with the problem of predicting whether a user's question will be closed by moderators on Stack Overflow, a popular question answering service devoted to software programming. The task, along with data and evaluation metrics, was offered as an open machine learning competition on the Kaggle platform. To solve this problem, we employed a wide range of classification features related to users, their interactions, and post content. Classification was carried out using several machine learning methods. According to the results of the experiments, the most important features are characteristics of the user and topical features of the question. The best results were obtained using Vowpal Wabbit, an implementation of online learning based on stochastic gradient descent. Our results are among the best in the overall ranking, although they were obtained after the official competition had ended.
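
    Vowpal Wabbit consumes a plain-text example format, so feature engineering of the kind described reduces to emitting one line per question. The feature names below are hypothetical stand-ins for the user and content features the paper mentions.

    # Emit training examples in Vowpal Wabbit's text format, with separate
    # namespaces for user-level and post-level features.
    def to_vw_line(label, user_feats, post_feats):
        # label: +1 (closed) or -1 (stays open); feats: dicts of name -> value
        u = " ".join(f"{k}:{v}" for k, v in user_feats.items())
        p = " ".join(f"{k}:{v}" for k, v in post_feats.items())
        return f"{label} |user {u} |post {p}"

    print(to_vw_line(1, {"reputation": 1, "prior_questions": 4},
                        {"title_len": 9, "n_tags": 2, "n_code_blocks": 0}))
    # -> 1 |user reputation:1 prior_questions:4 |post title_len:9 n_tags:2 n_code_blocks:0

    Such a file can then be trained with, for example, vw --loss_function logistic train.vw.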

    Towards Automatic Evaluation of Health-Related CQA Data

    The paper reports on the evaluation of Russian community question answering (CQA) data in the health domain. About 1,500 question-answer pairs were manually evaluated by medical professionals; in addition, an automatic evaluation based on reference disease-medicine pairs was performed. Although the results of the manual and automatic evaluations do not fully match, we find the method promising and propose several improvements. Automatic processing can be used to dynamically monitor the quality of CQA content and to compare different data sources. Moreover, the approach can be useful for symptomatic surveillance and health education campaigns. This work is partially supported by the Russian Foundation for Basic Research, project #14-07-00589 "Data Analysis and User Modelling in Narrow-Domain Social Media". We also thank the assessors who volunteered for the evaluation and Mail.Ru for granting us access to the data.
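
    The automatic evaluation idea can be illustrated with a toy sketch: flag an answer as plausible when it mentions a medicine that a reference list pairs with a disease named in the question. The reference pairs below are invented examples, not the paper's resource.

    # Toy reference-pair check: does the answer mention a medicine that the
    # reference data associates with a disease found in the question?
    REFERENCE_PAIRS = {  # hypothetical disease -> medicines mapping
        "influenza": {"oseltamivir", "zanamivir"},
        "hypertension": {"lisinopril", "amlodipine"},
    }

    def answer_matches_reference(question, answer):
        q, a = question.lower(), answer.lower()
        for disease, medicines in REFERENCE_PAIRS.items():
            if disease in q:
                return any(med in a for med in medicines)
        return None  # no reference disease mentioned in the question

    print(answer_matches_reference(
        "What helps with influenza symptoms?",
        "A doctor may prescribe oseltamivir within 48 hours of onset."))  # True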

    Mining Duplicate Questions of Stack Overflow

    There has been a significant rise in the use of Community Question Answering sites (CQAs) over the last decade, owing primarily to their ability to leverage the wisdom of the crowd. Duplicate questions have a crippling effect on the quality of these sites, so tackling them is an important step towards improving the quality of CQAs. In this regard, we propose two neural network based architectures for duplicate question detection on Stack Overflow. We also propose explicitly modeling the code present in questions to achieve results that surpass the state of the art.
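
    As a much-simplified sketch of "explicitly modeling the code" (the paper proposes neural architectures; this uses TF-IDF instead), a post can be split into prose and code channels that are scored separately and combined.

    # Split a post body into prose and code channels, score each channel's
    # similarity separately, and combine with a fixed weight.
    import re
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    CODE_RE = re.compile(r"<code>(.*?)</code>", re.S)  # assumes HTML-style bodies

    def split_channels(body):
        code = " ".join(CODE_RE.findall(body))
        prose = CODE_RE.sub(" ", body)
        return prose, code

    def duplicate_score(body_a, body_b, w_code=0.4):
        (pa, ca), (pb, cb) = split_channels(body_a), split_channels(body_b)
        def sim(x, y):
            if not x.strip() or not y.strip():
                return 0.0
            m = TfidfVectorizer().fit_transform([x, y])
            return cosine_similarity(m[0], m[1])[0, 0]
        return (1 - w_code) * sim(pa, pb) + w_code * sim(ca, cb)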

    CollabCoder: A GPT-Powered Workflow for Collaborative Qualitative Analysis

    The Collaborative Qualitative Analysis (CQA) process can be time-consuming and resource-intensive, requiring multiple discussions among team members to refine codes and ideas before reaching a consensus. To address these challenges, we introduce CollabCoder, a system leveraging Large Language Models (LLMs) to support three CQA stages: independent open coding, iterative discussions, and the development of a final codebook. In the independent open coding phase, CollabCoder provides AI-generated code suggestions on demand and allows users to record coding decision-making information (e.g., keywords and certainty) as support for the process. During the discussion phase, CollabCoder helps to build mutual understanding and productive discussion by sharing coding decision-making information with the team. It also helps to quickly identify agreements and disagreements through quantitative metrics, in order to build a final consensus. During the code grouping phase, CollabCoder employs a top-down approach for primary code group recommendations, reducing the cognitive burden of generating the final codebook. An evaluation involving 16 users confirmed the usability and effectiveness of CollabCoder and offered empirical insights into the LLMs' roles in CQA.
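
    The on-demand code suggestion step can be sketched as a single LLM call. The prompt, model name, and output format below are our assumptions, not CollabCoder's actual implementation.

    # Ask an LLM for a short qualitative code plus a one-line rationale for an
    # interview excerpt, using the OpenAI Python client.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def suggest_code(excerpt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # stand-in model name
            messages=[
                {"role": "system",
                 "content": "You assist qualitative open coding. Reply with a "
                            "2-5 word code, then a one-line rationale."},
                {"role": "user", "content": f"Excerpt: {excerpt}"},
            ],
        )
        return resp.choices[0].message.content

    print(suggest_code("I stopped using the app because it kept logging me out."))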