How to Ask for Technical Help? Evidence-based Guidelines for Writing Questions on Stack Overflow
Context: The success of Stack Overflow and other community-based
question-and-answer (Q&A) sites depends mainly on the will of their members to
answer others' questions. In fact, when formulating requests on Q&A sites, we
are not simply seeking information. Instead, we are also asking for other
people's help and feedback. Understanding the dynamics of the participation in
Q&A communities is essential to improve the value of crowdsourced knowledge.
Objective: In this paper, we investigate how information seekers can increase
the chance of eliciting a successful answer to their questions on Stack
Overflow by focusing on the following actionable factors: affect, presentation
quality, and time.
Method: We develop a conceptual framework of factors potentially influencing
the success of questions in Stack Overflow. We quantitatively analyze a set of
over 87K questions from the official Stack Overflow dump to assess the impact
of actionable factors on the success of technical requests. The information
seeker reputation is included as a control factor. Furthermore, to understand
the role played by affective states in the success of questions, we
qualitatively analyze questions containing positive and negative emotions.
Finally, a survey is conducted to understand how Stack Overflow users perceive
the guideline suggestions for writing questions.
Results: We found that regardless of user reputation, successful questions
are short, contain code snippets, and do not overuse uppercase characters.
As regards affect, successful questions adopt a neutral emotional style.
Conclusion: We provide evidence-based guidelines for writing effective
questions on Stack Overflow that software engineers can follow to increase the
chance of getting technical help. As for the role of affect, we empirically
confirmed community guidelines that suggest avoiding rudeness in question
writing.
Comment: Preprint, to appear in Information and Software Technology
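The presentation-quality factors the study examines (question length, presence of code snippets, uppercase usage) can be sketched as simple text features. This is a minimal, hypothetical feature extractor, not the paper's actual pipeline; the feature names and the `<code>` tag convention for Stack Overflow bodies are assumptions for illustration.

```python
import re

def question_features(title: str, body: str) -> dict:
    """Extract simple presentation-quality signals from a question
    (hypothetical features inspired by the factors the study examines)."""
    words = re.findall(r"\w+", body)
    letters = [c for c in body if c.isalpha()]
    uppercase_ratio = (
        sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    )
    return {
        "body_length_words": len(words),       # shorter questions fared better
        "has_code_snippet": "<code>" in body,  # assumes HTML-style code markup
        "uppercase_ratio": round(uppercase_ratio, 3),  # flag SHOUTING
        "title_length_chars": len(title),
    }

features = question_features(
    "How do I reverse a list in Python?",
    "I tried <code>reversed(xs)</code> but got an iterator instead of a list.",
)
```

A real replication would compute such features over the 87K-question dump and regress them against answer success, controlling for asker reputation.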
FixMiner: Mining Relevant Fix Patterns for Automated Program Repair
Patching is a common activity in software development. It is generally
performed on a source code base to address bugs or add new functionalities. In
this context, given the recurrence of bugs across projects, the associated
similar patches can be leveraged to extract generic fix actions. While the
literature includes various approaches leveraging similarity among patches to
guide program repair, these approaches often do not yield fix patterns that are
tractable and reusable as actionable input to APR systems. In this paper, we
propose a systematic and automated approach to mining relevant and actionable
fix patterns based on an iterative clustering strategy applied to atomic
changes within patches. The goal of FixMiner is thus to infer separate and
reusable fix patterns that can be leveraged in other patch generation systems.
Our technique, FixMiner, leverages Rich Edit Script which is a specialized tree
structure of the edit scripts that captures the AST-level context of the code
changes. FixMiner uses different tree representations of Rich Edit Scripts for
each round of clustering to identify similar changes. These are abstract syntax
trees, edit actions trees, and code context trees. We have evaluated FixMiner
on thousands of software patches collected from open source projects.
Preliminary results show that we are able to mine accurate patterns,
efficiently exploiting change information in Rich Edit Scripts. We further
integrated the mined patterns to an automated program repair prototype,
PARFixMiner, with which we are able to correctly fix 26 bugs of the Defects4J
benchmark. Beyond this quantitative performance, we show that the mined fix
patterns are sufficiently relevant to produce patches with a high probability
of correctness: 81% of PARFixMiner's generated plausible patches are correct.
Comment: 31 pages, 11 figures
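The iterative clustering strategy described above can be sketched in a few lines: patches are grouped by successively finer views of their edit scripts (tree shape, then edit actions, then code context). The toy tuples and function names below are illustrative only, not FixMiner's actual Rich Edit Script representation or API.

```python
from collections import defaultdict

def cluster_by(patches, key_fn):
    """Partition a list of patches by one view of their edit scripts."""
    groups = defaultdict(list)
    for p in patches:
        groups[key_fn(p)].append(p)
    return list(groups.values())

def iterative_clustering(patches, key_fns):
    """Refine clusters round by round, one tree representation per round."""
    clusters = [patches]
    for key_fn in key_fns:
        clusters = [sub for c in clusters for sub in cluster_by(c, key_fn)]
    return clusters

# Toy "edit scripts": (AST shape, action sequence, context token)
patches = [
    ("if-stmt", ("insert",), "null-check"),
    ("if-stmt", ("insert",), "null-check"),
    ("if-stmt", ("update",), "bound-check"),
    ("call",    ("update",), "api-swap"),
]
key_fns = [lambda p: p[0], lambda p: p[1], lambda p: p[2]]
clusters = iterative_clustering(patches, key_fns)
```

Clusters surviving all rounds with more than one member correspond to recurring, and hence potentially reusable, fix patterns.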
DRIP - Data Rich, Information Poor: A Concise Synopsis of Data Mining
As the production of data grows exponentially at a drastically lower cost, data mining, which is required to extract and discover valuable information, becomes ever more important. To be functional in any business or industry, data must be capable of supporting sound decision-making and plausible prediction. The purpose of this paper is to provide a concise yet broad synopsis of the technology and theory of data mining, offering an enhanced comprehension of the methods by which massive data can be transformed into meaningful information.
Label-Descriptive Patterns and their Application to Characterizing Classification Errors
State-of-the-art deep learning methods achieve human-like performance on many tasks, but still make errors. Characterizing these errors in easily interpretable terms gives insight into whether a model is prone to making systematic errors, and also suggests ways to act on and improve the model. In this paper we propose a method that does so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data partitioned according to correctness of prediction. We show this is an instance of the more general label description problem, which we formulate in terms of the Minimum Description Length principle. To discover good pattern sets, we propose the efficient and hyperparameter-free Premise algorithm. Through an extensive set of experiments on both synthetic and real-world data, we show that it performs very well in practice; unlike existing solutions, it reliably recovers ground-truth patterns, even on highly imbalanced data over many unique items, or where patterns are only weakly associated with labels. Through two real-world case studies we confirm that Premise gives clear and actionable insight into the systematic errors made by modern NLP classifiers.
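The core idea of describing a correctness partition can be illustrated with a much simpler stand-in than the Premise algorithm itself: find tokens that occur disproportionately often in misclassified inputs. This lift-based sketch is an assumption-laden toy, not the MDL-based method the paper proposes.

```python
from collections import Counter

def error_associated_tokens(correct, errors, min_lift=2.0):
    """Toy label description: tokens over-represented in misclassified docs.
    (Illustrative only -- Premise mines pattern *sets* under MDL instead.)"""
    c_freq = Counter(t for doc in correct for t in set(doc))
    e_freq = Counter(t for doc in errors for t in set(doc))
    n_c, n_e = len(correct), len(errors)
    patterns = {}
    for tok, cnt in e_freq.items():
        p_e = cnt / n_e                         # doc frequency among errors
        p_c = c_freq[tok] / n_c if n_c else 0.0  # doc frequency among correct
        lift = p_e / p_c if p_c else float("inf")
        if lift >= min_lift:
            patterns[tok] = lift
    return patterns

correct = [["the", "movie", "was", "great"], ["great", "fun"]]
errors = [["not", "great", "at", "all"], ["not", "bad", "at", "all"]]
patterns = error_associated_tokens(correct, errors)
```

Here negation tokens surface as error-associated, hinting at a systematic weakness, which is precisely the kind of actionable insight the paper targets.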
Measuring Interestingness – Perspectives on Anomaly Detection
We live in a data deluge. Our ability to gather, distribute, and store information has grown immensely over the past two decades. With this overabundance of data, the core knowledge discovery problem is no longer in the gathering of this data, but rather in retrieving relevant data efficiently. While the most common approach is to use rule interestingness to filter results of the association rule generation process, a study of the literature suggests that interestingness is difficult to define quantitatively and is best summarized as: "a record or pattern is interesting if it suggests a change in an established model." In this paper we elaborate on the term interestingness and the surrounding taxonomy of interestingness measures, anomalies, novelty, and surprisingness. We review and summarize the current state of the literature surrounding interestingness and associated approaches. Keywords: interestingness, anomaly detection, rare-class mining, interestingness measures, outliers, surprisingness, novelty
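The working definition quoted above, that a record is interesting if it departs from an established model, admits a minimal numeric sketch: take the "established model" to be the mean and spread of past observations and score new records by their deviation. This z-score stand-in is one of the simplest members of the taxonomy the paper surveys, chosen here purely for illustration.

```python
import statistics

def interestingness(history, value):
    """Deviation-based interestingness: distance of `value` from the
    established model (here just mean/stdev of past observations)."""
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    return abs(value - mu) / sigma  # larger = more surprising

history = [10.0, 11.0, 9.0, 10.5, 9.5]
score_routine = interestingness(history, 10.2)   # close to the model
score_anomaly = interestingness(history, 25.0)   # far from the model
```

Records scoring above a chosen threshold would be flagged as anomalies or surprising patterns worth a closer look.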
Exploring Critical Success Factors for Data Integration and Decision-Making in Law Enforcement
Agencies from various disciplines supporting law enforcement functions and processes have integrated, shared, and communicated data through ad hoc methods to address crime, terrorism, and many other threats in the United States. Data integration in law enforcement plays a critical role in the technical, business, and intelligence processes created by users to combine data from various sources and domains and transform them into valuable information. The purpose of this qualitative phenomenological study was to explore the current conditions of data integration frameworks through user and system interactions among law enforcement organizational processes. Further exploration of the critical success factors used to integrate data more efficiently among systems of systems and user interactions may improve crime and intelligence analysis through modern applications and novel frameworks.
Identification of patterns for increasing production with decision trees in sugarcane mill data
Sugarcane mills in Brazil collect a vast amount of data relating to production on an annual basis. The analysis of this type of database is complex, especially when factors relating to varieties, climate, detailed management techniques, and edaphic conditions are taken into account. The aim of this paper was to perform a decision tree analysis of a detailed database from a production unit and to evaluate the actionable patterns found in terms of their usefulness for increasing production. The decision tree revealed interpretable patterns relating to sugarcane yield (R2 = 0.617), some of which were actionable and had been previously studied and reported in the literature. Based on two actionable patterns relating to soil chemistry, interventions which would increase production by almost 2% were suitable for recommendation. The method was successful in reproducing the knowledge of experts on the factors which influence sugarcane yield, and the decision trees can support the decision-making process in the context of production and the formulation of hypotheses for specific experiments.
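The kind of pattern a regression decision tree surfaces, a threshold on one variable that separates yield levels, can be sketched with a single-split "stump". The soil-chemistry feature, values, and yields below are invented for illustration; the paper's actual trees are deeper and fitted to real mill data.

```python
def best_split(xs, ys):
    """Find the threshold on one feature minimizing total squared error,
    as a decision-tree node would (a one-level regression stump)."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        sse = sum((y - sum(left) / len(left)) ** 2 for y in left) \
            + sum((y - sum(right) / len(right)) ** 2 for y in right)
        if sse < best[1]:
            best = (t, sse)
    return best

soil_ph = [4.8, 5.0, 5.2, 6.0, 6.2, 6.5]        # hypothetical soil feature
yield_t = [70.0, 72.0, 71.0, 85.0, 88.0, 86.0]  # hypothetical yield, t/ha
threshold, error = best_split(soil_ph, yield_t)
```

A rule of the form "if soil pH > threshold then yield is higher" is exactly the kind of interpretable, potentially actionable pattern the study evaluates, since the variable can be intervened on.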