92 research outputs found

    QDEE: Question Difficulty and Expertise Estimation in Community Question Answering Sites

    In this paper, we present a framework for Question Difficulty and Expertise Estimation (QDEE) in Community Question Answering sites (CQAs) such as Yahoo! Answers and Stack Overflow, which tackles a fundamental challenge in crowdsourcing: how to appropriately route and assign questions to users with suitable expertise. This problem domain has been the subject of much research and includes both language-agnostic and language-conscious solutions. We bring to bear a key language-agnostic insight: that users gain expertise and therefore tend to ask as well as answer more difficult questions over time. We use this insight within the popular competition (directed) graph model to estimate question difficulty and user expertise by identifying key hierarchical structure within said model. An important and novel contribution here is the application of "social agony" to this problem domain. Difficulty levels of newly posted questions (the cold-start problem) are estimated using our QDEE framework and additional textual features. We also propose a model to route newly posted questions to appropriate users based on the difficulty level of the question and the expertise of the user. Extensive experiments on real-world CQAs such as Yahoo! Answers and Stack Overflow demonstrate the improved efficacy of our approach over contemporary state-of-the-art models. The QDEE framework also allows us to characterize user expertise in novel ways by identifying interesting patterns and roles played by different users in such CQAs.

    Comment: Accepted in the Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM 2018). June 2018. Stanford, CA, USA
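
The competition-graph idea described above can be sketched roughly as follows. This is an illustrative toy (all names and data are invented), and it substitutes a simple longest-chain level for the social-agony machinery the paper actually uses to recover the hierarchy:

```python
from collections import defaultdict

def build_competition_graph(records):
    """records: (asker, best_answerer) pairs. An edge asker -> answerer
    encodes that the answerer out-ranked the asker on one question."""
    beaten_by = defaultdict(set)   # winner -> users they out-ranked
    users = set()
    for asker, answerer in records:
        users.update((asker, answerer))
        if asker != answerer:
            beaten_by[answerer].add(asker)
    return users, beaten_by

def expertise_levels(users, beaten_by):
    """Level = length of the longest chain of wins below a user
    (cycles are broken crudely at 0); a crude stand-in for the
    hierarchy a social-agony solver would recover."""
    memo = {}
    def level(u, seen):
        if u in memo:
            return memo[u]
        if u in seen:                  # cycle guard
            return 0
        below = beaten_by.get(u, ())
        val = 1 + max((level(v, seen | {u}) for v in below), default=-1)
        memo[u] = val
        return val
    return {u: level(u, frozenset()) for u in users}
```

Users who repeatedly out-answer already-expert users end up higher in the hierarchy, mirroring the insight that expertise grows along chains of harder questions.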

    Expert recommendation via tensor factorization with regularizing hierarchical topical relationships

    © Springer Nature Switzerland AG 2018. Knowledge acquisition and exchange are generally crucial yet costly for both businesses and individuals, especially when the knowledge spans various areas. Question answering communities offer an opportunity for sharing knowledge at a low cost, where community users, many of whom are domain experts, can potentially provide high-quality solutions to a given problem. In this paper, we propose a framework for finding experts across multiple collaborative networks. We employ the recent techniques of tree-guided learning (via tensor decomposition) and matrix factorization to explore user expertise from past voted posts. Tensor decomposition enables us to leverage the latent expertise of users, while the posts and related tags help identify the relevant areas. The final result is an expertise score for every user on every knowledge area. We experiment on the Stack Exchange Networks, a set of question answering websites on different topics with a huge number of users and posts. Experiments show that our proposed approach produces stable, high-quality results.
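
As a rough illustration of the factorization step, the sketch below fits a tiny (user × tag) vote matrix with plain SGD. It is a simplified stand-in only: the paper uses tensor decomposition with tree-guided (hierarchical topic) regularization, which is omitted here, and the matrix holds toy values:

```python
import random

def factorize(M, k=1, lr=0.01, epochs=3000, seed=0):
    """SGD factorization M ~ U V^T of a small (user x tag) vote matrix.
    Rows of U act as latent expertise vectors, rows of V as tag vectors."""
    rng = random.Random(seed)
    n, m = len(M), len(M[0])
    U = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n)]
    V = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(m)]
    for _ in range(epochs):
        for i in range(n):
            for j in range(m):
                # prediction error on cell (i, j)
                err = sum(U[i][f] * V[j][f] for f in range(k)) - M[i][j]
                for f in range(k):
                    ui, vj = U[i][f], V[j][f]
                    U[i][f] -= lr * err * vj
                    V[j][f] -= lr * err * ui
    return U, V
```

The product of a user row and a tag row then serves as that user's expertise score on that knowledge area, the same shape of output the abstract describes.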

    ACQUA: Automated Community-based Question Answering through the Discretisation of Shallow Linguistic Features

    This paper addresses the problem of determining the best answer in Community-based Question Answering (CQA) websites by focussing on the content. In particular, we present a novel system, ACQUA (http://acqua.kmi.open.ac.uk), that can be installed in the majority of browsers as a plugin. The service offers a seamless and accurate prediction of the answer to be accepted. Our system is based on a novel approach for processing answers in CQAs. Previous research on this topic relies on the exploitation of community feedback on the answers, which involves ratings of either users (e.g., reputation) or answers (e.g., scores manually assigned to answers). We propose a new technique that leverages the content/textual features of answers in a novel way. Our approach delivers better results than related linguistics-based solutions and manages to match rating-based approaches. More specifically, the gain in performance is achieved by rendering the values of these features into a discretised form. We also show how our technique manages to deliver equally good results in real-time settings, as opposed to having to rely on information not always readily available, such as user ratings and answer scores. We ran an evaluation on 21 Stack Exchange websites covering around 4 million questions and more than 8 million answers. We obtain 84% average precision and 70% recall, which shows that our technique is robust, effective, and widely applicable.
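
The discretisation step can be illustrated with a minimal sketch: a continuous linguistic feature (say, answer length) is binned into a small number of discrete levels before classification. The equal-width scheme below is an assumption for illustration; the abstract does not specify ACQUA's actual binning:

```python
def discretise(values, n_bins=4):
    """Map continuous feature values to equal-width bin indices
    0..n_bins-1. Illustrative only: ACQUA's real binning scheme
    is not given in the abstract."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1   # guard against all-equal values
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

Rendering raw feature values as a handful of levels like this gives a classifier coarse, noise-tolerant inputs, which is the mechanism the abstract credits for the performance gain.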

    Software expert discovery via knowledge domain embeddings in a collaborative network

    © 2018 Elsevier B.V. Community Question Answering (CQA) websites are arguably the most important venues for knowledge sharing and the most effective way of exchanging knowledge at present. Considering that massive numbers of users participate online and generate huge amounts of data, managing this knowledge systematically can be challenging. Expert recommendation is one of the major challenges: it highlights users in CQA with potential expertise, which may help match unresolved questions with existing high-quality answers, and at the same time may help external services such as human resource systems as another reference for evaluating their candidates. In this work we propose to explore experts in CQA websites. We take advantage of recent distributed word representation technology to help summarize text chunks and, from a semantic view, exploit the relationships between natural language phrases to extract latent knowledge domains. Given these domains, users' expertise is determined from their historical performance, and a ranking can be computed to give recommendations accordingly. In particular, Stack Overflow is chosen as our dataset to test and evaluate our work, where extensive experiments demonstrate the effectiveness of our approach.
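
A minimal sketch of the matching step this abstract hints at: once users' histories and knowledge domains are embedded as vectors, candidate experts can be ranked by cosine similarity to a target domain. The vectors and names below are invented toy values; the embedding training itself is not reproduced:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_experts(domain_vec, user_vecs):
    """Rank users by how closely their history embedding matches the
    target knowledge-domain embedding (toy stand-in for the paper's
    scoring over learned knowledge domains)."""
    return sorted(user_vecs,
                  key=lambda u: cosine(user_vecs[u], domain_vec),
                  reverse=True)
```

The top of the ranking is the recommended expert for that domain, which is the shape of output the abstract describes.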

    Effects of Label Usage on Question Lifecycle in Q&A Community

    Community question answering (CQA) sites have developed into vast collections of valuable knowledge. Questions, as CQA's central component, go through several phases after they are posted, often referred to as the question lifecycle or lifespan. Different questions have different lifecycles, which are closely linked to the topics of the questions, which in turn can be determined by their attached labels. We conduct an empirical analysis based on dynamic panel data from a Q&A website and propose a framework for explaining the time sensitivity of topic labels. By applying a Discrete Fourier Transform and a knee point detection method, we demonstrate the existence of three broad label clusters based on their recurring features and four common question lifecycle patterns. We further show that the lifecycles of questions in disparate clusters vary significantly. The findings support our hypothesis that questions with more time-sensitive labels are more likely to hit their saturation point sooner than questions with less time-sensitive labels. The research results could be applied to better CQA interface design and more efficient digital resource management.
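
The two analysis tools named above can be sketched in a few lines: a naive DFT to expose recurring (periodic) activity for a label, and a simple chord-distance knee detector to find the saturation point on a cumulative question-count curve. Both are generic textbook versions, not the paper's exact procedures, and the series values are toy data:

```python
import cmath
import math

def dft_magnitudes(series):
    """Naive O(n^2) DFT magnitude spectrum of a label's activity series
    (a library FFT would normally be used instead)."""
    n = len(series)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series)))
            for k in range(n)]

def knee_index(curve):
    """Kneedle-style knee: index of maximum distance from the chord
    joining the first and last points of a cumulative curve."""
    n = len(curve)
    x0, y0, x1, y1 = 0, curve[0], n - 1, curve[-1]
    def dist(i):
        # un-normalized perpendicular distance from (i, curve[i]) to the chord
        return abs((y1 - y0) * i - (x1 - x0) * curve[i] + x1 * y0 - y1 * x0)
    return max(range(n), key=dist)
```

A strong non-zero frequency bin marks a recurring label; the knee of a question's cumulative answer/view curve marks where its lifecycle saturates.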

    Identifying reputation collectors in community question answering (CQA) sites: Exploring the dark side of social media

    This research aims to identify users who post, as well as encourage others to post, low-quality and duplicate content on community question answering sites. The good guys, called Caretakers, and the bad guys, called Reputation Collectors, are characterised by their behaviour, answering patterns and reputation points. The proposed system is developed and analysed over the publicly available Stack Exchange data dump. A graph-based methodology is employed to derive the characteristics of Reputation Collectors and Caretakers. Results reveal that Reputation Collectors are primary sources of low-quality answers as well as answers to duplicate questions posted on the site. Caretakers answer a limited number of challenging questions and fetch maximum reputation from those questions, whereas Reputation Collectors answer many low-quality and duplicate questions to gain reputation points. We have developed algorithms to identify the Caretakers and Reputation Collectors of the site. Our analysis finds that 1.05% of Reputation Collectors post 18.88% of low-quality answers. This study extends previous research by identifying Reputation Collectors and how they collect their reputation points.

    Is this question going to be closed? Answering question closibility on Stack Exchange

    Community question answering sites (CQAs) are often flooded with questions that are never answered. To cope with the problem, experienced users of Stack Exchange are now allowed to mark newly posted questions as closed if they are of poor quality. Once closed, a question is no longer eligible to receive answers. However, identifying and closing subpar questions takes time. Therefore, the purpose of this paper is to develop a supervised machine learning system that predicts question closibility, the likelihood that a newly posted question will eventually be closed. Building on extant research on CQA question quality, the supervised machine learning system uses 17 features grouped into four categories, namely asker features, community features, question content features, and textual features. The performance of the developed system was tested on questions posted on Stack Exchange from 11 randomly chosen topics. The classification performance was generally promising and outperformed the baseline. Most of the measures of precision, recall, F1-score, and AUC were above 0.90 irrespective of the topic of the questions. By conceptualizing question closibility, the paper extends previous CQA research on question quality. Unlike previous studies, which were mostly limited to programming-related questions from Stack Overflow, this one empirically tests question closibility on questions from 11 randomly selected topics. The set of features used for classification offers a framework of question closibility that is not only more comprehensive but also more parsimonious compared with prior works.
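
As an illustration of the setup, the sketch below assembles hypothetical features from the four named groups (the paper's actual 17 features are not listed in the abstract) and trains a tiny logistic-regression classifier, one plausible choice of supervised learner:

```python
import math

# Hypothetical feature names standing in for the paper's 17 features,
# arranged in its four groups.
FEATURE_GROUPS = {
    "asker":     ["reputation", "account_age_days"],
    "community": ["active_users", "open_question_ratio"],
    "content":   ["has_code_block", "num_tags"],
    "textual":   ["body_length", "title_caps_ratio"],
}

def featurize(question):
    """Flatten a question dict into a fixed-order feature vector;
    missing features default to 0."""
    names = [f for grp in FEATURE_GROUPS.values() for f in grp]
    return [float(question.get(f, 0.0)) for f in names]

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Plain SGD logistic regression; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            p = 1.0 / (1.0 + math.exp(-max(min(z, 30.0), -30.0)))
            g = p - yi                      # gradient of log-loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)] + [w[-1] - lr * g]
    return w

def predict(w, x):
    """Probability that the question will be closed (closibility score)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-max(min(z, 30.0), -30.0)))
```

A threshold on the predicted probability then yields the closed/not-closed decision the paper evaluates with precision, recall, F1 and AUC.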