229 research outputs found

    Data Mining and Machine Learning for Software Engineering

    Software engineering is one of the research areas where data mining is most readily applied. Developers have attempted to improve software quality by mining and analyzing software data. In any phase of the software development life cycle (SDLC), a huge amount of data is produced, and design, security, or software problems may occur. In the early phases of software development, analyzing software data helps to handle these problems and leads to more accurate and timely delivery of software projects. Various data mining and machine learning studies have been conducted to deal with software engineering tasks such as defect prediction and effort estimation. This study presents the open issues, along with related solutions and recommendations, in applying data mining and machine learning techniques to software engineering.

    Class-Level Refactoring Prediction by Ensemble Learning with Various Feature Selection Techniques

    Background: Refactoring is changing a software system without affecting the software functionality. Current research aims to identify the appropriate method(s) or class(es) that need to be refactored in object-oriented software. Ensemble learning helps to reduce prediction errors by amalgamating different classifiers and their respective performances over the original feature data. This paper additionally examines several ensemble learners, errors, sampling techniques, and feature selection techniques for refactoring prediction at the class level. Objective: This work aims to develop an ensemble-based refactoring prediction model with structural identification of source code metrics, using different feature selection techniques and data sampling techniques to distribute the data uniformly. Our model identifies the best classifier as the one that achieves fewer errors during refactoring prediction at the class level. Methodology: First, our proposed model extracts a total of 125 software metrics computed from object-oriented software systems, which are then processed by a robust multi-phased feature selection method encompassing the Wilcoxon significance test, the Pearson correlation test, and principal component analysis (PCA). The proposed multi-phased feature selection method retains the optimal features characterizing inheritance, size, coupling, cohesion, and complexity. After obtaining the optimal set of software metrics, a novel heterogeneous ensemble classifier is developed, with the following used as base classifiers: artificial neural networks (ANN-Gradient Descent, ANN-Levenberg Marquardt, ANN-GDX, ANN-Radial Basis Function); support vector machines with different kernel functions (LSSVM-Linear, LSSVM-Polynomial, LSSVM-RBF); a Decision Tree algorithm; a Logistic Regression algorithm; and an extreme learning machine (ELM) model. In our paper, we have calculated four different errors, i.e., Mean Absolute Error (MAE), Mean magnitude of Relative Error (MORE), Root Mean Square Error (RMSE), and Standard Error of Mean (SEM). Result: In our proposed model, the maximum voting ensemble (MVE) achieves better accuracy, recall, precision, and F-measure values (99.76, 99.93, 98.96, 98.44) than the base trained ensemble (BTE), and it experiences fewer errors (MAE = 0.0057, MORE = 0.0701, RMSE = 0.0068, and SEM = 0.0107) when used to develop the refactoring model. Conclusions: Our experimental results recommend that MVE with upsampling can be implemented to improve the performance of the refactoring prediction model at the class level. Furthermore, the performance of our model with different data sampling techniques and feature selection techniques is shown in the form of boxplot diagrams of accuracy, F-measure, precision, recall, and area under the curve (AUC) parameters.
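To make the voting and error terms in this abstract concrete, the following is a minimal Python sketch of a plain majority-vote ensemble together with MAE, RMSE, and SEM. It is not the authors' implementation: the function names are hypothetical, SEM is assumed to be the sample standard deviation of the prediction errors divided by the square root of the sample count, and MORE is left out because the abstract does not give its formula.

```python
import math
from collections import Counter
from statistics import stdev


def majority_vote(predictions_per_classifier):
    """Combine per-classifier label lists into one label per sample by majority vote."""
    combined = []
    for votes in zip(*predictions_per_classifier):
        # most_common(1) picks the label with the highest count; on a tie,
        # the label that appears first among the votes wins.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def standard_error_of_mean(actual, predicted):
    """ASSUMPTION: SEM taken over the prediction errors (sample std / sqrt(n))."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return stdev(errors) / math.sqrt(len(errors))


# Hypothetical labels: 1 = class needs refactoring, 0 = it does not.
classifier_outputs = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
actual_labels = [1, 0, 0, 1]
ensemble_labels = majority_vote(classifier_outputs)
```

With these made-up labels the ensemble disagrees with the ground truth on one sample out of four, so the three error functions can be checked by hand.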

    Software module clustering: An in-depth literature analysis

    Software module clustering is an unsupervised learning method used to cluster software entities (e.g., classes, modules, or files) with similar features. The obtained clusters may be used to study, analyze, and understand the software entities' structure and behavior. Implementing software module clustering with optimal results is challenging. Accordingly, researchers have addressed many aspects of software module clustering in the past decade. Thus, it is essential to present the research evidence that has been published in this area. In this study, 143 research papers from well-known literature databases that examined software module clustering were reviewed to extract useful data. The obtained data were then used to answer several research questions regarding state-of-the-art clustering approaches, applications of clustering in software engineering, clustering processes, clustering algorithms, and evaluation methods. Several research gaps and challenges in software module clustering are discussed in this paper to provide a useful reference for researchers in this field.
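As a rough illustration of clustering software entities with similar features, the Python sketch below groups entities whose feature sets overlap strongly. It is only one of many possible approaches and is not taken from the reviewed papers: the Jaccard similarity measure, the single-linkage grouping, the 0.5 threshold, and all entity and feature names are assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_modules(features, threshold=0.5):
    """Single-linkage grouping: entities end up in the same cluster when a
    chain of pairwise similarities >= threshold connects them."""
    names = sorted(features)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if jaccard(features[first], features[second]) >= threshold:
                parent[find(second)] = find(first)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


# Hypothetical entities, each described by the set of things it depends on.
module_features = {
    "OrderService": {"db", "logging", "payment"},
    "InvoiceService": {"db", "logging", "payment", "pdf"},
    "HtmlRenderer": {"templates", "css"},
    "PdfRenderer": {"templates", "css", "pdf"},
}
module_clusters = cluster_modules(module_features)
```

Here the two services share three of four dependencies and the two renderers share two of three, so each pair forms its own cluster, while the single shared `pdf` dependency is too weak to merge the groups.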

    The 6th Conference of PhD Students in Computer Science


    A Hybrid Model for Object-Oriented Software Maintenance

    An object-oriented software system is composed of a collection of communicating objects that co-operate with one another to achieve some desired goals. The object is the basic unit of abstraction in an OO program; objects may model real-world entities or internal abstractions of the system. Similar objects form classes, which encapsulate the data and the operations performed on the data. Therefore, extracting, analyzing, and modelling classes/objects and their relationships is of key importance in understanding and maintaining object-oriented software systems. However, when dealing with large and complex object-oriented systems, maintainers can easily be overwhelmed by the vast number of classes/objects and the high degree of interdependencies among them. In this thesis, we propose a new model, which we call the Hybrid Model, to represent object-oriented systems at a coarse-grained level of abstraction. To promote the comprehensibility of objects as independent units, we group the complete static description of software objects into aggregate components. Each aggregate component logically represents a set of objects, and the components interact with one another through explicitly defined ports. We present and discuss several applications of the Hybrid Model in reverse engineering and software evolution. The Hybrid Model can be used to support a divide-and-conquer strategy for program comprehension. At a low level of abstraction, maintainers can focus on one aggregate component at a time, while at a higher level, each aggregate component can be understood as a whole and be mapped to coarse-grained design abstractions, such as subsystems. Based on the new model, we further propose a set of dependency analysis methods. The analysis results reveal the external properties of aggregate components and lead to a better understanding of the nature of their interdependencies. In addition, we apply the new model in software evolution analysis. We identify a collection of change patterns in terms of changes in aggregate components and their interrelationships. These patterns help to interpret how an evolving system changes at the architectural level, and provide valuable information for understanding why the system is designed the way it is.

    Four small puzzles that Rosetta doesn't solve

    A complete macromolecule modeling package must be able to solve the simplest structure prediction problems. Despite recent successes in high resolution structure modeling and design, the Rosetta software suite fares poorly on deceptively small protein and RNA puzzles, some as small as four residues. To illustrate these problems, this manuscript presents extensive Rosetta results for four well-defined test cases: the 20-residue mini-protein Trp cage, an even smaller disulfide-stabilized conotoxin, the reactive loop of a serine protease inhibitor, and a UUCG RNA tetraloop. In contrast to previous Rosetta studies, several lines of evidence indicate that conformational sampling is not the major bottleneck in modeling these small systems. Instead, approximations and omissions in the Rosetta all-atom energy function currently preclude discriminating experimentally observed conformations from de novo models at atomic resolution. These molecular "puzzles" should serve as useful model systems for developers wishing to make foundational improvements to this powerful modeling suite. Comment: Published in PLoS One as a manuscript for the RosettaCon 2010 Special Collection

    Évaluation de la cohésion des classes : une nouvelle approche basée sur la classification


    An Exploratory Framework for Intelligent Labelling of Fault Datasets

    Software fault prediction (SFP) has become a pivotal aspect of software quality. Nevertheless, the discipline of software quality suffers from a shortage of fault datasets. Most research efforts focus on the type of dataset, its granularity, the metrics used, and the metrics extractors. However, little attention has been paid to the development of fault datasets and the associated challenges. There are very few publicly available datasets, which limits the possibilities for comprehensive experiments aimed at improving the quality of software. The current research aims to address the challenges of collecting and developing a fault dataset when one is not publicly available. It also considers dynamic identification of available resources such as public datasets, open-source software archives, metrics parsers, and intelligent labeling techniques. A framework for the dataset collection and development process is provided, along with an evaluation procedure for the identified resources.