
    On the role of pre and post-processing in environmental data mining

    The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data tend to contain noise, uncertainty, errors, redundancies, and even irrelevant information. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details on how this can be achieved and discusses the role of pre- and post-processing in the overall process of knowledge discovery in environmental systems.
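
    As a concrete illustration of the pre-processing stage (this sketch is not taken from the paper; the columns, values, and thresholds below are invented), the following Python fragment removes an irrelevant column, filters an implausible reading, fills gaps, and rescales the remaining variables:

        # Illustrative pre-processing sketch: handle missing values, outliers,
        # and irrelevant columns in a hypothetical environmental sensor table.
        import pandas as pd
        from sklearn.impute import SimpleImputer
        from sklearn.preprocessing import StandardScaler

        df = pd.DataFrame({
            "temperature": [21.3, None, 19.8, 85.0, 20.1],  # 85.0 looks like a sensor error
            "humidity":    [0.55, 0.58, None, 0.57, 0.56],
            "station_id":  ["A", "A", "A", "A", "A"],        # constant, carries no information
        })

        df = df.drop(columns=["station_id"])                 # remove irrelevant feature
        df = df[df["temperature"].fillna(df["temperature"].median()) < 60]  # crude outlier filter
        values = SimpleImputer(strategy="median").fit_transform(df)         # fill remaining gaps
        values = StandardScaler().fit_transform(values)      # put variables on a common scale
        print(values)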

    Insights on Research Techniques towards Cost Estimation in Software Design

    Software cost estimation is one of the most challenging tasks in project management, as it underpins smooth development operations and target achievement. A variety of standards, tools, and techniques for cost estimation have evolved and are practiced in industry today. However, the overall effectiveness of these techniques has not been systematically investigated to date. This paper first presents a taxonomy of conventional cost-estimation techniques and then investigates research trends towards the problems most frequently addressed in this area. The paper also reviews the existing techniques in a well-structured manner, highlighting the problems addressed, the techniques used, their advantages, and the limitations reported in the literature. Finally, we outline the open research issues identified as an added contribution of this manuscript.
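
    To make "conventional cost-estimation techniques" concrete (the paper surveys such techniques in general; this particular example is ours, not the paper's), the basic COCOMO model estimates effort and schedule from program size in KLOC using fixed coefficients per project mode:

        # Basic COCOMO (Boehm, 1981) as one example of a conventional,
        # algorithmic cost-estimation technique.
        def basic_cocomo(kloc: float, mode: str = "organic"):
            # (a, b, c, d) coefficients for the three classic project modes.
            coeffs = {
                "organic":       (2.4, 1.05, 2.5, 0.38),
                "semi-detached": (3.0, 1.12, 2.5, 0.35),
                "embedded":      (3.6, 1.20, 2.5, 0.32),
            }
            a, b, c, d = coeffs[mode]
            effort = a * kloc ** b          # person-months
            duration = c * effort ** d      # calendar months
            return effort, duration

        effort, months = basic_cocomo(32, mode="organic")
        print(f"effort ~ {effort:.1f} person-months, schedule ~ {months:.1f} months")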

    A New Improved Prediction of Software Defects Using Machine Learning-based Boosting Techniques with NASA Dataset

    Predicting when and where bugs will appear in software can help improve quality and reduce software testing costs. Bugs in individual software modules can be predicted using machine learning methods. There are, however, two major problems with software defect prediction datasets: class imbalance (there are far fewer defective modules than non-defective ones) and noisy characteristics (a result of irrelevant features), both of which make accurate prediction difficult. If these two issues arise, the performance of a machine learning model suffers greatly: overfitting occurs, and biased classification results are the consequence. In this research, we propose machine learning approaches to enhance the usefulness of the CatBoost and Gradient Boosting classifiers for predicting software flaws. The Random Over Sampler and mutual information classification methods address, respectively, the class imbalance and feature selection issues inherent in software fault prediction. Eleven datasets from NASA's "Promise" data repository were used in this study. Using 10-fold cross-validation, we classified these 11 datasets and found that our technique outperformed the baseline by a significant margin. The proposed methods were evaluated on their ability to anticipate software defects using the most important indices available: Accuracy, Precision, Recall, F1 score, ROC values, RMSE, MSE, and MAE. For all 11 datasets evaluated, the proposed methods outperform the baseline classifiers by a significant margin. We also compared our model with other defect-identification methods and found that it outperformed them all. The experiments show that the detection rate of the proposed model is higher than that of conventional models.
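
    A minimal sketch of the described pipeline, assuming scikit-learn and imbalanced-learn and using synthetic data in place of the NASA/Promise defect tables (the actual feature names, dataset files, and hyperparameters are not reproduced from the paper):

        # Oversampling + mutual-information feature selection + boosting,
        # evaluated with 10-fold cross-validation on stand-in data.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import GradientBoostingClassifier   # CatBoost can be swapped in
        from sklearn.feature_selection import SelectKBest, mutual_info_classif
        from sklearn.model_selection import cross_val_score
        from imblearn.over_sampling import RandomOverSampler
        from imblearn.pipeline import Pipeline

        X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                                   weights=[0.9, 0.1], random_state=0)  # imbalanced classes

        pipe = Pipeline([
            ("oversample", RandomOverSampler(random_state=0)),   # address class imbalance
            ("select", SelectKBest(mutual_info_classif, k=10)),  # drop irrelevant features
            ("clf", GradientBoostingClassifier(random_state=0)),
        ])

        # F1 focuses on the minority (defective) class.
        scores = cross_val_score(pipe, X, y, cv=10, scoring="f1")
        print(f"mean F1 over 10 folds: {scores.mean():.3f}")

    The imbalanced-learn Pipeline applies oversampling only to the training folds, so the cross-validation scores are not inflated by duplicated test samples.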

    What Causes My Test Alarm? Automatic Cause Analysis for Test Alarms in System and Integration Testing

    Driven by new software development processes and testing in the cloud, system and integration testing nowadays tends to produce an enormous number of alarms. Such test alarms place an almost unbearable burden on software testing engineers, who have to manually analyze the causes of these alarms. The causes are critical because they determine which stakeholders are responsible for fixing the bugs detected during testing. In this paper, we present a novel approach that aims to relieve this burden by automating the procedure. Our approach, called the Cause Analysis Model, exploits information retrieval techniques to efficiently infer test alarm causes from test logs. We developed a prototype and evaluated our tool on two industrial datasets with more than 14,000 test alarms. Experiments on the two datasets show that our tool achieves accuracies of 58.3% and 65.8%, respectively, outperforming the baseline algorithms by up to 13.3%. Our algorithm is also extremely efficient, spending about 0.1 s per cause analysis. Given these attractive experimental results, our industrial partner, a leading information and communication technology company, has deployed the tool; it achieves an average accuracy of 72% after two months of running, nearly three times as accurate as a previous strategy based on regular expressions.
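
    The paper's Cause Analysis Model itself is not reproduced here; the following is an illustrative information-retrieval baseline in the same spirit, where historical test logs are indexed with TF-IDF and a new alarm inherits the cause of its most similar labeled log (the logs and cause labels below are invented):

        # Nearest-neighbour retrieval over TF-IDF representations of test logs.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        history = [
            ("connection timed out while contacting build server", "environment issue"),
            ("assertion failed: expected 200 got 500 from payment service", "product code defect"),
            ("test data fixture missing: users.csv not found", "obsolete test data"),
        ]
        logs, causes = zip(*history)

        vectorizer = TfidfVectorizer()
        index = vectorizer.fit_transform(logs)                 # index of historical alarm logs

        new_log = ["assertion failed: expected 200 got 404 from payment service"]
        similarity = cosine_similarity(vectorizer.transform(new_log), index)
        print("predicted cause:", causes[similarity.argmax()])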

    Dynamic Detection of Software Defects Using Supervised Learning Techniques

    Software testing is the main means of detecting faults in software by executing it. It is therefore essential to predict the faults that may occur during execution in order to maintain the software. Different artificial intelligence techniques are used to predict future defects, and machine learning is one of the most significant techniques for building predictive models. In this paper, we conduct a systematic review of the supervised machine learning techniques used for software defect prediction and evaluate their performance. Five state-of-the-art supervised machine learning classifiers are evaluated on several datasets to predict software faults, and the performance of these classifiers is compared under various parameter settings. We then carry out a number of experiments to improve defect prediction by modifying the classifiers' default parameters. The results show the ability of supervised machine learning algorithms to classify classes as buggy or not buggy; thus, supervised machine learning models are better suited to predicting software bugs than traditional statistical models. Additionally, using PCA has no noticeable impact on the performance of the prediction systems, while modifying the default parameters positively impacts classifier performance, especially for the Artificial Neural Network (ANN). The main finding of this paper is obtained through the application of ensemble learning methods: Bagging achieves 95.1% accuracy on the Mozilla dataset and Voting achieves 93.79% accuracy on the kc1 dataset.
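
    A small sketch of the ensemble step reported above, assuming scikit-learn and using synthetic data in place of the Mozilla and kc1 datasets (the actual features and tuned parameters are not given in the abstract):

        # Bagging and (soft) Voting ensembles evaluated with 10-fold cross-validation.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import BaggingClassifier, VotingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.neural_network import MLPClassifier
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

        # Bagging uses its default base learner (a decision tree).
        bagging = BaggingClassifier(n_estimators=50, random_state=1)
        voting = VotingClassifier(estimators=[
            ("tree", DecisionTreeClassifier(max_depth=5)),
            ("lr", LogisticRegression(max_iter=1000)),
            ("ann", MLPClassifier(max_iter=500)),        # the ANN whose parameters were tuned
        ], voting="soft")

        for name, model in [("Bagging", bagging), ("Voting", voting)]:
            acc = cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
            print(f"{name}: mean accuracy {acc:.3f}")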