74 research outputs found

    Detecting Similar Applications with Collaborative Tagging

    Get PDF
    Abstract—Detecting similar applications are useful for var-ious purposes ranging from program comprehension, rapid prototyping, plagiarism detection, and many more. McMillan et al. have proposed a solution to detect similar applications based on common Java API usage patterns. Recently, collaborative tagging has impacted software development practices. Various sites allow users to give various tags to software systems. In this study, we would like to complement the study by McMillan et al. by leveraging another source of information aside from API usage patterns, namely software tags. We have performed a user study involving several participants and the results show that collaborative tagging is a promising source of information useful for detecting similar software applications. I

    Interactive fault localization leveraging simple user feedback

    Get PDF
    NSF

    When and Why Your Code Starts to Smell Bad

    Full text link

    A Tutorial on Software Engineering Intelligence: Case Studies on Model-Driven Engineering

    Full text link
    Peer Reviewedhttps://deepblue.lib.umich.edu/bitstream/2027.42/153783/1/MODELS_Tutorial__SEI___Copy_.pd

    Stack Overflow in Github: Any Snippets There?

    Full text link
    When programmers look for how to achieve certain programming tasks, Stack Overflow is a popular destination in search engine results. Over the years, Stack Overflow has accumulated an impressive knowledge base of snippets of code that are amply documented. We are interested in studying how programmers use these snippets of code in their projects. Can we find Stack Overflow snippets in real projects? When snippets are used, is this copy literal or does it suffer adaptations? And are these adaptations specializations required by the idiosyncrasies of the target artifact, or are they motivated by specific requirements of the programmer? The large-scale study presented on this paper analyzes 909k non-fork Python projects hosted on Github, which contain 290M function definitions, and 1.9M Python snippets captured in Stack Overflow. Results are presented as quantitative analysis of block-level code cloning intra and inter Stack Overflow and GitHub, and as an analysis of programming behaviors through the qualitative analysis of our findings.Comment: 14th International Conference on Mining Software Repositories, 11 page

    Contemporary Approach for Technical Reckoning Code Smells Detection using Textual Analysis

    Get PDF
    Software Designers should be aware of address design smells that can evident as results of design and decision. In a software project, technical debt needs to be repaid habitually to avoid its accretion. Large technical debt significantly degrades the quality of the software system and affects the productivity of the development team. In tremendous cases, when the accumulated technical reckoning becomes so enormous that it cannot be paid off to any further extent the product has to be abandoned. In this paper, we bridge the gap analyzing to what coverage abstract information, extracted using textual analysis techniques, can be used to identify smells in source code. The proposed textual-based move toward for detecting smells in source code, fabricated as TACO (Textual Analysis for Code smell detection), has been instantiated for detecting the long parameter list smell and has been evaluated on three sampling Java open source projects. The results determined that TACO is able to indentified between 50% and 77% of the smell instances with a exactitude ranging between 63% and 67%. In addition, the results show that TACO identifies smells that are not recognized by approaches based on exclusively structural information

    Using Word Embedding and Convolution Neural Network for Bug Triaging by Considering Design Flaws

    Full text link
    Resolving bugs in the maintenance phase of software is a complicated task. Bug assignment is one of the main tasks for resolving bugs. Some Bugs cannot be fixed properly without making design decisions and have to be assigned to designers, rather than programmers, to avoid emerging bad smells that may cause subsequent bug reports. Hence, it is important to refer some bugs to the designer to check the possible design flaws. Based on our best knowledge, there are a few works that have considered referring bugs to designers. Hence, this issue is considered in this work. In this paper, a dataset is created, and a CNN-based model is proposed to predict the need for assigning a bug to a designer by learning the peculiarities of bug reports effective in creating bad smells in the code. The features of each bug are extracted from CNN based on its textual features, such as a summary and description. The number of bad samples added to it in the fixing process using the PMD tool determines the bug tag. The summary and description of the new bug are given to the model and the model predicts the need to refer to the designer. The accuracy of 75% (or more) was achieved for datasets with a sufficient number of samples for deep learning-based model training. A model is proposed to predict bug referrals to the designer. The efficiency of the model in predicting referrals to the designer at the time of receiving the bug report was demonstrated by testing the model on 10 projects

    Identification-method research for open-source software ecosystems

    Get PDF
    In recent years, open-source software (OSS) development has grown, with many developers around the world working on different OSS projects. A variety of open-source software ecosystems have emerged, for instance, GitHub, StackOverflow, and SourceForge. One of the most typical social-programming and code-hosting sites, GitHub, has amassed numerous open-source-software projects and developers in the same virtual collaboration platform. Since GitHub itself is a large open-source community, it hosts a collection of software projects that are developed together and coevolve. The great challenge here is how to identify the relationship between these projects, i.e., project relevance. Software-ecosystem identification is the basis of other studies in the ecosystem. Therefore, how to extract useful information in GitHub and identify software ecosystems is particularly important, and it is also a research area in symmetry. In this paper, a Topic-based Project Knowledge Metrics Framework (TPKMF) is proposed. By collecting the multisource dataset of an open-source ecosystem, project-relevance analysis of the open-source software is carried out on the basis of software-ecosystem identification. Then, we used our Spectral Clustering algorithm based on Core Project (CP-SC) to identify software-ecosystem projects and further identify software ecosystems. We verified that most software ecosystems usually contain a core software project, and most other projects are associated with it. Furthermore, we analyzed the characteristics of the ecosystem, and we also found that interactive information has greater impact on project relevance. Finally, we summarize the Topic-based Project Knowledge Metrics Framework

    MuDelta: Delta-Oriented Mutation Testing at Commit Time

    Get PDF
    To effectively test program changes using mutation testing, one needs to use mutants that are relevant to the altered program behaviours. In view of this, we introduce MuDelta, an approach that identifies commit-relevant mutants; mutants that affect and are affected by the changed program behaviours. Our approach uses machine learning applied on a combined scheme of graph and vector-based representations of static code features. Our results, from 50 commits in 21 Coreutils programs, demonstrate a strong prediction ability of our approach; yielding 0.80 (ROC) and 0.50 (PR Curve) AUC values with 0.63 and 0.32 precision and recall values. These predictions are significantly higher than random guesses, 0.20 (PR-Curve) AUC, 0.21 and 0.21 precision and recall, and subsequently lead to strong relevant tests that kill 45%more relevant mutants than randomly sampled mutants (either sampled from those residing on the changed component(s) or from the changed lines). Our results also show that MuDelta selects mutants with 27% higher fault revealing ability in fault introducing commits. Taken together, our results corroborate the conclusion that commit-based mutation testing is suitable and promising for evolving software

    When would this bug get reported?

    Get PDF
    Abstract—Not all bugs in software would be experienced and reported by end users right away: Some bugs manifest themselves quickly and may be reported by users a few days after they get into the code base; others manifest many months or even years later, and may only be experienced and reported by a small number of users. We refer to the period of time between the time when a bug is introduced into code and the time when it is reported by a user as bug reporting latency. Knowledge of bug reporting latencies has an implication on prioritization of bug fixing activities—bugs with low reporting latencies may be fixed earlier than those with high latencies to shift debugging resources towards bugs highly concerning users. To investigate bug reporting latencies, we analyze bugs from three Java software systems: AspectJ, Rhino, and Lucene. We extract bug reporting data from their version control repositories and bug tracking systems, identify bug locations based on bug fixes, and back-trace bug introducing time based on change histories of the buggy code. Also, we remove nonessential changes, and most importantly, recover root causes of bugs from their treatments/fixes. We then calculate the bug reporting latencies, and find that bugs have diverse reporting latencies. Based on the calculated reporting latencies and features we extract from bugs, we build classification models that can predict whether a bug would be reported early (within 30 days) or later, which may be helpful for prioritizing bug fixing activities. Our evaluation on the three software systems shows that our bug reporting latency prediction models could achieve an AUC (Area Under the Receiving Operating Characteristics Curve) of 70.869%. I
    • …
    corecore