41,576 research outputs found
Learning to predict closed questions on stack overflow
The paper deals with the problem of predicting whether the user’s question will be closed by the moderator on Stack Overflow, a popular question answering service devoted to software programming. The task along with data and evaluation metrics was offered as an open machine learning competition on Kaggle platform. To solve this problem, we employed a wide range of classification features related to users, their interactions, and post content. Classification was carried out using several machine learning methods. According to the results of the experiment, the most important features are characteristics of the user and topical features of the question. The best results were obtained using Vowpal Wabbit – an implementation of online learning based on stochastic gradient descent. Our results are among the best ones in overall ranking, although they were obtained after the official competition was over
NCeSS Project : Data mining for social scientists
We will discuss the work being undertaken on the NCeSS data mining project, a one year project at the University of Manchester which began at the start of 2007, to develop data mining tools of value to the social science community. Our primary goal is to produce a
suite of data mining codes, supported by a web interface, to allow social scientists to mine their datasets in a straightforward way and hence, gain new insights into their data. In order to fully define the requirements, we are looking at a range of typical datasets to find out what
forms they take and the applications and algorithms that will be required. In this paper, we will describe a number of these datasets and will discuss how easily data mining techniques can be used to extract information from the data that would either not be possible or would be
too time consuming by more standard methods
QDEE: Question Difficulty and Expertise Estimation in Community Question Answering Sites
In this paper, we present a framework for Question Difficulty and Expertise
Estimation (QDEE) in Community Question Answering sites (CQAs) such as Yahoo!
Answers and Stack Overflow, which tackles a fundamental challenge in
crowdsourcing: how to appropriately route and assign questions to users with
the suitable expertise. This problem domain has been the subject of much
research and includes both language-agnostic as well as language conscious
solutions. We bring to bear a key language-agnostic insight: that users gain
expertise and therefore tend to ask as well as answer more difficult questions
over time. We use this insight within the popular competition (directed) graph
model to estimate question difficulty and user expertise by identifying key
hierarchical structure within said model. An important and novel contribution
here is the application of "social agony" to this problem domain. Difficulty
levels of newly posted questions (the cold-start problem) are estimated by
using our QDEE framework and additional textual features. We also propose a
model to route newly posted questions to appropriate users based on the
difficulty level of the question and the expertise of the user. Extensive
experiments on real world CQAs such as Yahoo! Answers and Stack Overflow data
demonstrate the improved efficacy of our approach over contemporary
state-of-the-art models. The QDEE framework also allows us to characterize user
expertise in novel ways by identifying interesting patterns and roles played by
different users in such CQAs.Comment: Accepted in the Proceedings of the 12th International AAAI Conference
on Web and Social Media (ICWSM 2018). June 2018. Stanford, CA, US
Visualizing and Interacting with Concept Hierarchies
Concept Hierarchies and Formal Concept Analysis are theoretically well
grounded and largely experimented methods. They rely on line diagrams called
Galois lattices for visualizing and analysing object-attribute sets. Galois
lattices are visually seducing and conceptually rich for experts. However they
present important drawbacks due to their concept oriented overall structure:
analysing what they show is difficult for non experts, navigation is
cumbersome, interaction is poor, and scalability is a deep bottleneck for
visual interpretation even for experts. In this paper we introduce semantic
probes as a means to overcome many of these problems and extend usability and
application possibilities of traditional FCA visualization methods. Semantic
probes are visual user centred objects which extract and organize reduced
Galois sub-hierarchies. They are simpler, clearer, and they provide a better
navigation support through a rich set of interaction possibilities. Since probe
driven sub-hierarchies are limited to users focus, scalability is under control
and interpretation is facilitated. After some successful experiments, several
applications are being developed with the remaining problem of finding a
compromise between simplicity and conceptual expressivity
Leveraging Personal Navigation Assistant Systems Using Automated Social Media Traffic Reporting
Modern urbanization is demanding smarter technologies to improve a variety of
applications in intelligent transportation systems to relieve the increasing
amount of vehicular traffic congestion and incidents. Existing incident
detection techniques are limited to the use of sensors in the transportation
network and hang on human-inputs. Despite of its data abundance, social media
is not well-exploited in such context. In this paper, we develop an automated
traffic alert system based on Natural Language Processing (NLP) that filters
this flood of information and extract important traffic-related bullets. To
this end, we employ the fine-tuning Bidirectional Encoder Representations from
Transformers (BERT) language embedding model to filter the related traffic
information from social media. Then, we apply a question-answering model to
extract necessary information characterizing the report event such as its exact
location, occurrence time, and nature of the events. We demonstrate the adopted
NLP approaches outperform other existing approach and, after effectively
training them, we focus on real-world situation and show how the developed
approach can, in real-time, extract traffic-related information and
automatically convert them into alerts for navigation assistance applications
such as navigation apps.Comment: This paper is accepted for publication in IEEE Technology Engineering
Management Society International Conference (TEMSCON'20), Metro Detroit,
Michigan (USA
- …