41,576 research outputs found

    Learning to predict closed questions on stack overflow

    Full text link
    The paper deals with the problem of predicting whether the user’s question will be closed by the moderator on Stack Overflow, a popular question answering service devoted to software programming. The task along with data and evaluation metrics was offered as an open machine learning competition on Kaggle platform. To solve this problem, we employed a wide range of classification features related to users, their interactions, and post content. Classification was carried out using several machine learning methods. According to the results of the experiment, the most important features are characteristics of the user and topical features of the question. The best results were obtained using Vowpal Wabbit – an implementation of online learning based on stochastic gradient descent. Our results are among the best ones in overall ranking, although they were obtained after the official competition was over

    NCeSS Project : Data mining for social scientists

    Get PDF
    We will discuss the work being undertaken on the NCeSS data mining project, a one year project at the University of Manchester which began at the start of 2007, to develop data mining tools of value to the social science community. Our primary goal is to produce a suite of data mining codes, supported by a web interface, to allow social scientists to mine their datasets in a straightforward way and hence, gain new insights into their data. In order to fully define the requirements, we are looking at a range of typical datasets to find out what forms they take and the applications and algorithms that will be required. In this paper, we will describe a number of these datasets and will discuss how easily data mining techniques can be used to extract information from the data that would either not be possible or would be too time consuming by more standard methods

    QDEE: Question Difficulty and Expertise Estimation in Community Question Answering Sites

    Full text link
    In this paper, we present a framework for Question Difficulty and Expertise Estimation (QDEE) in Community Question Answering sites (CQAs) such as Yahoo! Answers and Stack Overflow, which tackles a fundamental challenge in crowdsourcing: how to appropriately route and assign questions to users with the suitable expertise. This problem domain has been the subject of much research and includes both language-agnostic as well as language conscious solutions. We bring to bear a key language-agnostic insight: that users gain expertise and therefore tend to ask as well as answer more difficult questions over time. We use this insight within the popular competition (directed) graph model to estimate question difficulty and user expertise by identifying key hierarchical structure within said model. An important and novel contribution here is the application of "social agony" to this problem domain. Difficulty levels of newly posted questions (the cold-start problem) are estimated by using our QDEE framework and additional textual features. We also propose a model to route newly posted questions to appropriate users based on the difficulty level of the question and the expertise of the user. Extensive experiments on real world CQAs such as Yahoo! Answers and Stack Overflow data demonstrate the improved efficacy of our approach over contemporary state-of-the-art models. The QDEE framework also allows us to characterize user expertise in novel ways by identifying interesting patterns and roles played by different users in such CQAs.Comment: Accepted in the Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM 2018). June 2018. Stanford, CA, US

    Visualizing and Interacting with Concept Hierarchies

    Full text link
    Concept Hierarchies and Formal Concept Analysis are theoretically well grounded and largely experimented methods. They rely on line diagrams called Galois lattices for visualizing and analysing object-attribute sets. Galois lattices are visually seducing and conceptually rich for experts. However they present important drawbacks due to their concept oriented overall structure: analysing what they show is difficult for non experts, navigation is cumbersome, interaction is poor, and scalability is a deep bottleneck for visual interpretation even for experts. In this paper we introduce semantic probes as a means to overcome many of these problems and extend usability and application possibilities of traditional FCA visualization methods. Semantic probes are visual user centred objects which extract and organize reduced Galois sub-hierarchies. They are simpler, clearer, and they provide a better navigation support through a rich set of interaction possibilities. Since probe driven sub-hierarchies are limited to users focus, scalability is under control and interpretation is facilitated. After some successful experiments, several applications are being developed with the remaining problem of finding a compromise between simplicity and conceptual expressivity

    Leveraging Personal Navigation Assistant Systems Using Automated Social Media Traffic Reporting

    Full text link
    Modern urbanization is demanding smarter technologies to improve a variety of applications in intelligent transportation systems to relieve the increasing amount of vehicular traffic congestion and incidents. Existing incident detection techniques are limited to the use of sensors in the transportation network and hang on human-inputs. Despite of its data abundance, social media is not well-exploited in such context. In this paper, we develop an automated traffic alert system based on Natural Language Processing (NLP) that filters this flood of information and extract important traffic-related bullets. To this end, we employ the fine-tuning Bidirectional Encoder Representations from Transformers (BERT) language embedding model to filter the related traffic information from social media. Then, we apply a question-answering model to extract necessary information characterizing the report event such as its exact location, occurrence time, and nature of the events. We demonstrate the adopted NLP approaches outperform other existing approach and, after effectively training them, we focus on real-world situation and show how the developed approach can, in real-time, extract traffic-related information and automatically convert them into alerts for navigation assistance applications such as navigation apps.Comment: This paper is accepted for publication in IEEE Technology Engineering Management Society International Conference (TEMSCON'20), Metro Detroit, Michigan (USA
    corecore