1,186 research outputs found

    Data Mining and Machine Learning for Software Engineering

    Get PDF
    Software engineering is one of the most utilizable research areas for data mining. Developers have attempted to improve software quality by mining and analyzing software data. In any phase of software development life cycle (SDLC), while huge amount of data is produced, some design, security, or software problems may occur. In the early phases of software development, analyzing software data helps to handle these problems and lead to more accurate and timely delivery of software projects. Various data mining and machine learning studies have been conducted to deal with software engineering tasks such as defect prediction, effort estimation, etc. This study shows the open issues and presents related solutions and recommendations in software engineering, applying data mining and machine learning techniques

    Cyber Security

    Get PDF
    This open access book constitutes the refereed proceedings of the 16th International Annual Conference on Cyber Security, CNCERT 2020, held in Beijing, China, in August 2020. The 17 papers presented were carefully reviewed and selected from 58 submissions. The papers are organized according to the following topical sections: access control; cryptography; denial-of-service attacks; hardware security implementation; intrusion/anomaly detection and malware mitigation; social network security and privacy; systems security

    Statistical Analysis for Revealing Defects in Software Projects

    Get PDF
    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies ManagementDefect detection in software is the procedure to identify parts of software that may comprise defects. Software companies always seek to improve the performance of software projects in terms of quality and efficiency. They also seek to deliver the soft-ware projects without any defects to the communities and just in time. The early revelation of defects in software projects is also tried to avoid failure of those projects, save costs, team effort, and time. Therefore, these companies need to build an intelligent model capable of detecting software defects accurately and efficiently. This study seeks to achieve two main objectives. The first goal is to build a statistical model to identify the critical defect factors that influence software projects. The second objective is to build a statistical model to reveal defects early in software pro-jects as reasonable accurately. A bibliometric map (VOSviewer) was used to find the relationships between the common terms in those domains. The results of this study are divided into three parts: In the first part The term "software engineering" is connected to "cluster," "regression," and "neural network." Moreover, the terms "random forest" and "feature selection" are connected to "neural network," "recall," and "software engineering," "cluster," "regression," and "fault prediction model" and "software defect prediction" and "defect density." In the second part We have checked and analyzed 29 manuscripts in detail, summarized their major contributions, and identified a few research gaps. In the third part Finally, software companies try to find the critical factors that affect the detection of software defects and find any of the intelligent or statistical methods that help to build a model capable of detecting those defects with high accuracy. Two statistical models (Multiple linear regression (MLR) and logistic regression (LR)) were used to find the critical factors and through them to detect software defects accurately. MLR is executed by using two methods which are critical defect factors (CDF) and premier list of software defect factors (PLSDF). The accuracy of MLR-CDF and MLR-PLSDF is 82.3 and 79.9 respectively. The standard error of MLR-CDF and MLR-PLSDF is 26% and 28% respectively. In addition, LR is executed by using two methods which are CDF and PLSDF. The accuracy of LR-CDF and LR-PLSDF is 86.4 and 83.8 respectively. The standard error of LR-CDF and LR-PLSDF is 22% and 25% respectively. Therefore, LRCDF outperforms on all the proposed models and state-of-the-art methods in terms of accuracy and standard error

    Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection

    Get PDF
    Software quality ensures that applications that are developed are failure free. Some modern systems are intricate, due to the complexity of their information processes. Software fault prediction is an important quality assurance activity, since it is a mechanism that correctly predicts the defect proneness of modules and classifies modules that saves resources, time and developers’ efforts. In this study, a model that selects relevant features that can be used in defect prediction was proposed. The literature was reviewed and it revealed that process metrics are better predictors of defects in version systems and are based on historic source code over time. These metrics are extracted from the source-code module and include, for example, the number of additions and deletions from the source code, the number of distinct committers and the number of modified lines. In this research, defect prediction was conducted using open source software (OSS) of software product line(s) (SPL), hence process metrics were chosen. Data sets that are used in defect prediction may contain non-significant and redundant attributes that may affect the accuracy of machine-learning algorithms. In order to improve the prediction accuracy of classification models, features that are significant in the defect prediction process are utilised. In machine learning, feature selection techniques are applied in the identification of the relevant data. Feature selection is a pre-processing step that helps to reduce the dimensionality of data in machine learning. Feature selection techniques include information theoretic methods that are based on the entropy concept. This study experimented the efficiency of the feature selection techniques. It was realised that software defect prediction using significant attributes improves the prediction accuracy. A novel MICFastCR model, which is based on the Maximal Information Coefficient (MIC) was developed to select significant attributes and Fast Correlation Based Filter (FCBF) to eliminate redundant attributes. Machine learning algorithms were then run to predict software defects. The MICFastCR achieved the highest prediction accuracy as reported by various performance measures.School of ComputingPh. D. (Computer Science

    Design of software-oriented technician for vehicle’s fault system prediction using AdaBoost and random forest classifiers

    Get PDF
    Detecting and isolating faults on heavy duty vehicles is very important because it helps maintain high vehicle performance, low emissions, fuel economy, high vehicle safety and ensures repair and service efficiency. These factors are important because they help reduce the overall life cycle cost of a vehicle. The aim of this paper is to deliver a Web application model which aids the professional technician or vehicle user with basic automobile knowledge to access the working condition of the vehicles and detect the fault subsystem in the vehicles. The scope of this system is to visualize the data acquired from vehicle, diagnosis the fault component using trained fault model obtained from improvised Machine Learning (ML) classifiers and generate a report. The visualization page is built with plotly python package and prepared with selected parameter from On-board Diagnosis (OBD) tool data. The Histogram data is pre-processed with techniques such as null value Imputation techniques, Standardization and Balancing methods in order to increase the quality of training and it is trained with Classifiers. Finally, Classifier is tested and the Performance Metrics such as Accuracy, Precision, Re-call and F1 measure which are calculated from the Confusion Matrix. The proposed methodology for fault model prediction uses supervised algorithms such as Random Forest (RF), Ensemble Algorithm like AdaBoost Algorithm which offer reasonable Accuracy and Recall. The Python package joblib is used to save the model weights and reduce the computational time. Google Colabs is used as the python environment as it offers versatile features and PyCharm is utilised for the development of Web application. Hence, the Web application, outcome of this proposed work can, not only serve as the perfect companion to minimize the cost of time and money involved in unnecessary checks done for fault system detection but also aids to quickly detect and isolate the faulty system to avoid the propagation of errors that can lead to more dangerous cases

    Statistical Analysis for Revealing Defects in Software Projects: Systematic Literature Review

    Get PDF
    Mahmoud, A. N., & Santos, V. (2021). Statistical Analysis for Revealing Defects in Software Projects: Systematic Literature Review. International Journal of Advanced Computer Science and Applications, 12(11), 237-249. https://doi.org/10.14569/IJACSA.2021.0121128Defect detection in software is the procedure to identify parts of software that may comprise defects. Software companies always seek to improve the performance of software projects in terms of quality and efficiency. They also seek to deliver the soft-ware projects without any defects to the communities and just in time. The early revelation of defects in software projects is also tried to avoid failure of those projects, save costs, team effort, and time. Therefore, these companies need to build an intelligent model capable of detecting software defects accurately and efficiently. The paper is organized as follows. Section 2 presents the materials and methods, PRISMA, search questions, and search strategy. Section 3 presents the results with an analysis, and discussion, visualizing analysis and analysis per topic. Section 4 presents the methodology. Finally, in Section 5, the conclusion is discussed. The search string was applied to all electronic repositories looking for papers published between 2015 and 2021, which resulted in 627 publications. The results focused on finding three important points by linking the results of manuscript analysis and linking them to the results of the bibliometric analysis. First, the results showed that the number of defects and the number of lines of code are among the most important factors used in revealing software defects. Second, neural networks and regression analysis are among the most important smart and statistical methods used for this purpose. Finally, the accuracy metric and the error rate are among the most important metrics used in comparisons between the efficiency of statistical and intelligent models.publishersversionpublishe

    Detection of Non-Technical Losses: The Project MIDAS

    Get PDF
    The MIDAS project began in 2006 as collaboration between Endesa, Sadiel, and the University of Seville. The objective of the MIDAS project is the detection of Non-Technical Losses (NTLs) on power utilities. The NTLs represent the non-billed energy due to faults or illegal manipulations in clients’ fa cilities. Initially, research lines study the application of techniques of data mining and neural networks. After several researches, the studies are expanded to other research fields: expert systems, text mining, statistical techniques, pattern recognition, etc. These techniques have provided an automated system for detection of NTLs on company databases. This system is in the test phase, and it is applied in real cases in company databases
    • …
    corecore