    On the Use of Process Trails to Understand Software Development

    Visualization and analysis of software clones

    Code clones are identical or similar fragments of code in a software system. Simple copy-paste programming practices of developers, who reuse existing code fragments instead of implementing them from scratch, together with limitations of both programming languages and developers, are the primary reasons behind code cloning. Despite the maintenance implications of clones, it is not possible to conclude that cloning is harmful, because there are also benefits to using them (e.g., faster and independent development). As a result, researchers at least agree that clones need to be analyzed before aggressively refactoring them. Although a large number of state-of-the-art clone detectors are available today, handling raw clone data is challenging due to its textual nature and large volume. To address this issue, we propose a framework for large-scale clone analysis and develop a maintenance support environment based on the framework, called VisCad. To manage the large volume of clone data, VisCad employs the Visual Information Seeking Mantra: overview first, zoom and filter, then details on demand. With VisCad, users can analyze and identify distinctive code clones through a set of visualization techniques, metrics covering different clone relations, and data filtering operations. The loosely coupled architecture of VisCad allows users to work with any clone detection tool that reports the source coordinates of the found clones. This gives users the opportunity to work with the clone detectors of their choice, which is important because each clone detector has its own strengths and weaknesses. In addition, we extend the support for clone evolution analysis, which is important for understanding the cause and effect of changes at the clone level during the evolution of a software system. Such information can be used to make software maintenance decisions, such as when to refactor clones. We propose and implement a set of visualizations that allow users to analyze the evolution of clones from a coarse-grained to a fine-grained level. Finally, we use VisCad to extract both spatial and temporal clone data to predict changes to clones in a future release or revision of the software, which can be used to rank clone classes as another means of handling a large volume of clone data. We believe that VisCad makes clone comprehension easier and that it can be used as a test-bed for further exploring code cloning, which is necessary for building a successful clone management system.
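    As a rough illustration of the kind of clone data such a tool consumes, the sketch below models clone fragments by their source coordinates and applies a simple zoom-and-filter step in the spirit of the Visual Information Seeking Mantra. All names and thresholds are hypothetical and are not the actual VisCad API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CloneFragment:
    """A clone fragment identified by source coordinates, as reported by a clone detector."""
    file_path: str
    start_line: int
    end_line: int

    @property
    def length(self) -> int:
        return self.end_line - self.start_line + 1

@dataclass
class CloneClass:
    """A group of fragments that a detector reports as clones of each other."""
    fragments: List[CloneFragment]

def filter_clone_classes(classes, min_fragments=2, min_length=10):
    """Zoom-and-filter step: keep only clone classes whose fragments are
    numerous and long enough to be worth inspecting in detail."""
    return [c for c in classes
            if len(c.fragments) >= min_fragments
            and all(f.length >= min_length for f in c.fragments)]
```

    Because the input is just file paths and line ranges, any detector that emits source coordinates could be plugged in by writing a small parser that produces these records.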

    Machine Learning And Deep Learning Based Approaches For Detecting Duplicate Bug Reports With Stack Traces

    Many large software systems rely on bug tracking systems to record submitted bug reports and to track and manage bugs. Handling bug reports is known to be a challenging task, especially in software organizations with a large client base, which tend to receive a considerably large number of bug reports a day. Fortunately, not all reported bugs are new; many are similar or identical to previously reported bugs, also called duplicate bug reports. Automatic detection of duplicate bug reports is an important research topic that helps reduce the time and effort spent by triage and development teams on sorting and fixing bugs. This explains the recent increase in attention to this topic, as evidenced by the number of tools and algorithms that have been proposed in academia and industry. The objective is to automatically detect duplicate bug reports as soon as they arrive in the system. To do so, existing techniques rely heavily on the nature of the bug report data they operate on. This includes both structured information, such as the OS, product version, time and date of the crash, and stack traces, as well as unstructured information, such as bug report summaries and descriptions written in natural language by end users and developers.
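    As a minimal, hypothetical sketch of how structured (stack trace) and unstructured (summary) information might be combined into a duplicate score, consider the following; the field names, weights, and Jaccard-based similarity measures are illustrative assumptions, not the method of any specific tool in this line of work.

```python
def frame_overlap(trace_a, trace_b, top_n=10):
    """Similarity of two stack traces: Jaccard overlap of their top frames."""
    a, b = set(trace_a[:top_n]), set(trace_b[:top_n])
    return len(a & b) / len(a | b) if a | b else 0.0

def text_similarity(summary_a, summary_b):
    """Crude textual similarity on bug summaries: Jaccard overlap of word sets."""
    a, b = set(summary_a.lower().split()), set(summary_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def duplicate_score(report_a, report_b, w_trace=0.7, w_text=0.3):
    """Weighted combination of structured (stack trace) and unstructured
    (summary) similarity; a new report is flagged as a duplicate candidate
    when its score against an existing report exceeds a chosen threshold."""
    return (w_trace * frame_overlap(report_a["frames"], report_b["frames"])
            + w_text * text_similarity(report_a["summary"], report_b["summary"]))
```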

    Mining System Specific Rules from Change Patterns

    A significant percentage of the warnings reported by tools that detect coding standard violations are false positives. Thus, some works are dedicated to providing better rules by mining them from the source code history, analyzing bug fixes or changes between system releases. However, software evolves over time; during development, not only are bugs fixed, but features are also added and code is refactored. In such cases, changes must be applied consistently in the source code to avoid maintenance problems. In this paper, we propose to extract system-specific rules by mining systematic changes over the source code history, i.e., not just from bug fixes or system releases, to ensure that changes are consistently applied over the source code. We focus on structural changes made to support API modification or evolution, with the goal of providing better rules to developers. Also, rules are mined from predefined rule patterns that ensure their quality. In order to assess the precision of such specific rules in detecting real violations, we compare them with the generic rules provided by tools that detect coding standard violations on four real-world systems covering two programming languages. The results show that specific rules are more precise in identifying real violations in source code than generic ones, and can thus complement them.
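    The sketch below illustrates, under strong simplifying assumptions, what mining a systematic change from commit diffs and turning it into a system-specific rule could look like; the regular-expression matching and the support threshold are illustrative stand-ins, not the paper's actual mining algorithm.

```python
import re
from collections import Counter

def mine_call_replacements(commits, min_support=3):
    """Mine systematic change patterns: method calls that were repeatedly
    replaced by another call across many commits in the history.
    Each commit is given as a list of (removed_line, added_line) diff pairs."""
    call = re.compile(r"\w+\.(\w+)\(")
    replacements = Counter()
    for diff_pairs in commits:
        for removed, added in diff_pairs:
            old, new = call.search(removed), call.search(added)
            if old and new and old.group(1) != new.group(1):
                replacements[(old.group(1), new.group(1))] += 1
    # Keep only patterns seen often enough to count as a system-specific rule.
    return {pair: n for pair, n in replacements.items() if n >= min_support}

def find_violations(source_lines, rules):
    """Report lines that still use the 'old' call of a mined rule."""
    return [(i, old, new) for i, line in enumerate(source_lines, 1)
            for (old, new) in rules if f".{old}(" in line]
```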

    PrAIoritize: Learning to Prioritize Smart Contract Bugs and Vulnerabilities

    Smart contract vulnerabilities and bugs have become a key concern for software engineers, as they can lead to significant financial losses, reputational damage, and legal issues. Therefore, prioritizing bug fixing for smart contracts is critical to maintaining trust. Due to the lack of tracking tools, prioritizing reported smart contract bugs is done manually, which is tedious, limits bug triaging, and requires specialized knowledge. Towards this end, we propose PrAIoritize, an automated approach for predicting smart contract bug priorities that assists software engineers in prioritizing highly urgent bug reports. PrAIoritize consists of two main phases: 1) automatic labeling, which involves the automatic construction of a smart contract keyword lexicon and the automatic assignment of priority levels to unlabeled bug reports; 2) model construction, which involves feature engineering and the design of feed-forward neural network (FFNN) and bidirectional long short-term memory (BiLSTM) layers with multi-class classification to better capture the features of the textual descriptions of bugs and predict their priority levels. The model is then trained using smart contract bug reports collected from two data sources: open-source software (OSS) projects available on GitHub and the NVD vulnerability database. Our evaluation demonstrates significant improvement over state-of-the-art baselines and commonly used pre-trained models (e.g., BERT) for similar classification tasks, with a 5.75%-35.29% increase in F-measure, precision, and recall.
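    A minimal sketch of the kind of BiLSTM plus feed-forward classifier described above is shown below, using Keras; the layer sizes, vocabulary size, and number of priority classes are assumptions made for illustration and are not the published PrAIoritize configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_priority_model(vocab_size=20000, max_len=200, embed_dim=128, num_priorities=3):
    """Sketch of a BiLSTM + feed-forward multi-class classifier over
    tokenized bug-report text (hyperparameters are illustrative guesses)."""
    inputs = layers.Input(shape=(max_len,), dtype="int32")
    x = layers.Embedding(vocab_size, embed_dim)(inputs)       # token ids -> dense vectors
    x = layers.Bidirectional(layers.LSTM(64))(x)               # BiLSTM over the description
    x = layers.Dense(64, activation="relu")(x)                 # feed-forward layer
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(num_priorities, activation="softmax")(x)  # priority classes
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```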

    Automated Change Rule Inference for Distance-Based API Misuse Detection

    Developers build on Application Programming Interfaces (APIs) to reuse existing functionalities of code libraries. Despite the benefits of reusing established libraries (e.g., time savings, high quality), developers may diverge from the API's intended usage, potentially causing bugs or, more specifically, API misuses. Recent research focuses on developing techniques to automatically detect API misuses, but many of them suffer from a high false-positive rate. In this article, we improve on this situation by proposing ChaRLI (Change RuLe Inference), a technique for automatically inferring change rules from developers' fixes of API misuses based on API Usage Graphs (AUGs). By subsequently applying graph-distance algorithms, we use change rules to discriminate API misuses from correct usages. This allows developers to reuse others' fixes of an API misuse at other code locations in the same or another project. We evaluated the ability of change rules to detect API misuses on three datasets and found that the best mean relative precision (i.e., for testable usages) ranges from 77.1% to 96.1%, while the mean recall for individual change rules ranges from 0.007% to 17.7%. These results show that ChaRLI and our misuse detection are helpful complements to existing API misuse detectors.
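    As a toy illustration of the distance-based idea, the sketch below compares a usage graph against the "before" (misuse) and "after" (fix) sides of a change rule using graph edit distance; the graph representation is heavily simplified relative to real API Usage Graphs, and all function names are hypothetical rather than the ChaRLI implementation.

```python
import networkx as nx

def usage_graph(calls):
    """Toy stand-in for an API Usage Graph: nodes are labelled API calls,
    edges follow their order in the code."""
    g = nx.DiGraph()
    g.add_nodes_from((c, {"label": c}) for c in calls)
    g.add_edges_from(zip(calls, calls[1:]))
    return g

def classify_usage(usage, misuse_pattern, fixed_pattern):
    """Compare a usage against the 'before' (misuse) and 'after' (fix) sides
    of a change rule; the closer side, by graph edit distance, wins."""
    same_label = lambda a, b: a["label"] == b["label"]
    d_misuse = nx.graph_edit_distance(usage, misuse_pattern, node_match=same_label)
    d_fixed = nx.graph_edit_distance(usage, fixed_pattern, node_match=same_label)
    return "possible misuse" if d_misuse < d_fixed else "looks correct"

# A rule mined from a fix that added a close() call after read():
misuse = usage_graph(["open", "read"])
fixed = usage_graph(["open", "read", "close"])
print(classify_usage(usage_graph(["open", "read"]), misuse, fixed))  # -> possible misuse
```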