74 research outputs found
Detecting Similar Applications with Collaborative Tagging
Abstract—Detecting similar applications are useful for var-ious purposes ranging from program comprehension, rapid prototyping, plagiarism detection, and many more. McMillan et al. have proposed a solution to detect similar applications based on common Java API usage patterns. Recently, collaborative tagging has impacted software development practices. Various sites allow users to give various tags to software systems. In this study, we would like to complement the study by McMillan et al. by leveraging another source of information aside from API usage patterns, namely software tags. We have performed a user study involving several participants and the results show that collaborative tagging is a promising source of information useful for detecting similar software applications. I
A Tutorial on Software Engineering Intelligence: Case Studies on Model-Driven Engineering
Peer Reviewedhttps://deepblue.lib.umich.edu/bitstream/2027.42/153783/1/MODELS_Tutorial__SEI___Copy_.pd
Stack Overflow in Github: Any Snippets There?
When programmers look for how to achieve certain programming tasks, Stack
Overflow is a popular destination in search engine results. Over the years,
Stack Overflow has accumulated an impressive knowledge base of snippets of code
that are amply documented. We are interested in studying how programmers use
these snippets of code in their projects. Can we find Stack Overflow snippets
in real projects? When snippets are used, is this copy literal or does it
suffer adaptations? And are these adaptations specializations required by the
idiosyncrasies of the target artifact, or are they motivated by specific
requirements of the programmer? The large-scale study presented on this paper
analyzes 909k non-fork Python projects hosted on Github, which contain 290M
function definitions, and 1.9M Python snippets captured in Stack Overflow.
Results are presented as quantitative analysis of block-level code cloning
intra and inter Stack Overflow and GitHub, and as an analysis of programming
behaviors through the qualitative analysis of our findings.Comment: 14th International Conference on Mining Software Repositories, 11
page
Contemporary Approach for Technical Reckoning Code Smells Detection using Textual Analysis
Software Designers should be aware of address design smells that can evident as results of design and decision. In a software project, technical debt needs to be repaid habitually to avoid its accretion. Large technical debt significantly degrades the quality of the software system and affects the productivity of the development team. In tremendous cases, when the accumulated technical reckoning becomes so enormous that it cannot be paid off to any further extent the product has to be abandoned. In this paper, we bridge the gap analyzing to what coverage abstract information, extracted using textual analysis techniques, can be used to identify smells in source code. The proposed textual-based move toward for detecting smells in source code, fabricated as TACO (Textual Analysis for Code smell detection), has been instantiated for detecting the long parameter list smell and has been evaluated on three sampling Java open source projects. The results determined that TACO is able to indentified between 50% and 77% of the smell instances with a exactitude ranging between 63% and 67%. In addition, the results show that TACO identifies smells that are not recognized by approaches based on exclusively structural information
Using Word Embedding and Convolution Neural Network for Bug Triaging by Considering Design Flaws
Resolving bugs in the maintenance phase of software is a complicated task.
Bug assignment is one of the main tasks for resolving bugs. Some Bugs cannot be
fixed properly without making design decisions and have to be assigned to
designers, rather than programmers, to avoid emerging bad smells that may cause
subsequent bug reports. Hence, it is important to refer some bugs to the
designer to check the possible design flaws. Based on our best knowledge, there
are a few works that have considered referring bugs to designers. Hence, this
issue is considered in this work. In this paper, a dataset is created, and a
CNN-based model is proposed to predict the need for assigning a bug to a
designer by learning the peculiarities of bug reports effective in creating bad
smells in the code. The features of each bug are extracted from CNN based on
its textual features, such as a summary and description. The number of bad
samples added to it in the fixing process using the PMD tool determines the bug
tag. The summary and description of the new bug are given to the model and the
model predicts the need to refer to the designer. The accuracy of 75% (or more)
was achieved for datasets with a sufficient number of samples for deep
learning-based model training. A model is proposed to predict bug referrals to
the designer. The efficiency of the model in predicting referrals to the
designer at the time of receiving the bug report was demonstrated by testing
the model on 10 projects
Identification-method research for open-source software ecosystems
In recent years, open-source software (OSS) development has grown, with many developers around the world working on different OSS projects. A variety of open-source software ecosystems have emerged, for instance, GitHub, StackOverflow, and SourceForge. One of the most typical social-programming and code-hosting sites, GitHub, has amassed numerous open-source-software projects and developers in the same virtual collaboration platform. Since GitHub itself is a large open-source community, it hosts a collection of software projects that are developed together and coevolve. The great challenge here is how to identify the relationship between these projects, i.e., project relevance. Software-ecosystem identification is the basis of other studies in the ecosystem. Therefore, how to extract useful information in GitHub and identify software ecosystems is particularly important, and it is also a research area in symmetry. In this paper, a Topic-based Project Knowledge Metrics Framework (TPKMF) is proposed. By collecting the multisource dataset of an open-source ecosystem, project-relevance analysis of the open-source software is carried out on the basis of software-ecosystem identification. Then, we used our Spectral Clustering algorithm based on Core Project (CP-SC) to identify software-ecosystem projects and further identify software ecosystems. We verified that most software ecosystems usually contain a core software project, and most other projects are associated with it. Furthermore, we analyzed the characteristics of the ecosystem, and we also found that interactive information has greater impact on project relevance. Finally, we summarize the Topic-based Project Knowledge Metrics Framework
MuDelta: Delta-Oriented Mutation Testing at Commit Time
To effectively test program changes using mutation testing, one needs to use mutants that are relevant to the altered program behaviours. In view of this, we introduce MuDelta, an approach that identifies commit-relevant mutants; mutants that affect and are affected by the changed program behaviours. Our approach uses machine learning applied on a combined scheme of graph and vector-based representations of static code features. Our results, from 50 commits in 21 Coreutils programs, demonstrate a strong prediction ability of our approach; yielding 0.80 (ROC) and 0.50 (PR Curve) AUC values with 0.63 and 0.32 precision and recall values. These predictions are significantly higher than random guesses, 0.20 (PR-Curve) AUC, 0.21 and 0.21 precision and recall, and subsequently lead to strong relevant tests that kill 45%more relevant mutants than randomly sampled mutants (either sampled from those residing on the changed component(s) or from the changed lines). Our results also show that MuDelta selects mutants with 27% higher fault revealing ability in fault introducing commits. Taken together, our results corroborate the conclusion that commit-based mutation testing is suitable and promising for evolving software
When would this bug get reported?
Abstract—Not all bugs in software would be experienced and reported by end users right away: Some bugs manifest themselves quickly and may be reported by users a few days after they get into the code base; others manifest many months or even years later, and may only be experienced and reported by a small number of users. We refer to the period of time between the time when a bug is introduced into code and the time when it is reported by a user as bug reporting latency. Knowledge of bug reporting latencies has an implication on prioritization of bug fixing activities—bugs with low reporting latencies may be fixed earlier than those with high latencies to shift debugging resources towards bugs highly concerning users. To investigate bug reporting latencies, we analyze bugs from three Java software systems: AspectJ, Rhino, and Lucene. We extract bug reporting data from their version control repositories and bug tracking systems, identify bug locations based on bug fixes, and back-trace bug introducing time based on change histories of the buggy code. Also, we remove nonessential changes, and most importantly, recover root causes of bugs from their treatments/fixes. We then calculate the bug reporting latencies, and find that bugs have diverse reporting latencies. Based on the calculated reporting latencies and features we extract from bugs, we build classification models that can predict whether a bug would be reported early (within 30 days) or later, which may be helpful for prioritizing bug fixing activities. Our evaluation on the three software systems shows that our bug reporting latency prediction models could achieve an AUC (Area Under the Receiving Operating Characteristics Curve) of 70.869%. I
- …