480 research outputs found

    Semi-supervised and Active Learning Models for Software Fault Prediction

    As software continues to insinuate itself into nearly every aspect of our lives, software quality has become an extremely important issue. Software Quality Assurance (SQA) is a process that ensures the development of high-quality software; it addresses the important problem of developing, monitoring, and maintaining quality software. Accurate detection of fault-prone components in software projects is one of the most commonly practiced techniques offering a path to high-quality products without excessive assurance expenditures. This type of quality modeling requires the availability of software modules with known fault content developed in a similar environment. However, collecting fault data at the module level, particularly in new projects, is expensive and time-consuming. Semi-supervised learning and active learning offer solutions to this problem, learning from limited labeled data by utilizing inexpensive unlabeled data. In this dissertation, we investigate semi-supervised learning and active learning approaches to the software fault prediction problem. The role of the base learner in semi-supervised learning is discussed using several state-of-the-art supervised learners. Our results showed that semi-supervised learning with an appropriate base learner leads to better fault-proneness prediction performance than supervised learning. In addition, incorporating a pre-processing technique prior to semi-supervised learning is a promising direction for further improving prediction performance. Active learning, which shares with semi-supervised learning the idea of utilizing unlabeled data, requires human effort to label fault-proneness during its learning process. Empirical results showed that active learning supplemented by a dimensionality reduction technique performs better than supervised learning on release-based data sets.
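The semi-supervised idea described above can be sketched as a self-training loop: train a base learner on the labeled modules, pseudo-label the unlabeled modules it is confident about, and retrain. The sketch below uses invented module metrics and a toy nearest-centroid base learner for illustration; the dissertation evaluates several state-of-the-art base learners, not this one.

```python
# Self-training sketch with a toy nearest-centroid base learner.
# Data, metric scales, and the confidence margin are all illustrative.

def centroid_fit(X, y):
    """Fit one centroid (mean vector) per class label."""
    cents = {}
    for label in set(y):
        pts = [x for x, lab in zip(X, y) if lab == label]
        dim = len(pts[0])
        cents[label] = [sum(p[i] for p in pts) / len(pts) for i in range(dim)]
    return cents

def centroid_predict(cents, x):
    """Return (label, distance) of the nearest class centroid."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    return min(((lab, dist(c, x)) for lab, c in cents.items()), key=lambda t: t[1])

def self_train(X_lab, y_lab, X_unlab, rounds=3, margin=0.3):
    """Iteratively pseudo-label confident unlabeled modules and retrain."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(rounds):
        cents = centroid_fit(X, y)
        keep = []
        for x in pool:
            lab, d = centroid_predict(cents, x)
            if d < margin:          # confident: adopt the pseudo-label
                X.append(x)
                y.append(lab)
            else:                   # ambiguous: leave unlabeled for now
                keep.append(x)
        pool = keep
    return centroid_fit(X, y)

# Toy module metrics, e.g. [LOC/100, cyclomatic complexity/10]:
labeled = [[0.1, 0.1], [0.2, 0.2], [0.9, 0.8], [1.0, 0.9]]
labels = ["clean", "clean", "faulty", "faulty"]
unlabeled = [[0.15, 0.12], [0.95, 0.85], [0.5, 0.5]]

model = self_train(labeled, labels, unlabeled)
print(centroid_predict(model, [0.12, 0.1])[0])  # "clean"
```

The confidence margin plays the role the base learner's own probability estimate would play with a real classifier: only unlabeled modules the current model is sure about are folded into the training set.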

    A fault detection strategy for software projects

    The existing software fault prediction models require metrics and fault data belonging to previous software versions or similar software projects. However, there are cases when previous fault data are not present, such as a software company's transition to a new project domain. In such situations, supervised learning methods that use fault labels cannot be applied, creating the need for new techniques. We proposed a software fault prediction strategy that uses method-level metrics thresholds to predict the fault-proneness of unlabelled program modules. The technique was experimentally evaluated on the NASA datasets KC2 and JM1. Some existing approaches apply clustering techniques to group modules, a process followed by an evaluation phase in which a software quality expert analyses a representative of each cluster and then labels the modules as fault-prone or not fault-prone. Our approach does not require a human expert during the prediction process: it is a fault prediction strategy that combines method-level metrics thresholds as a filtering mechanism and an OR operator as a composition mechanism.
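The filtering-plus-OR composition described above can be sketched directly: each method-level metric is compared against its threshold, and a module is flagged fault-prone if any comparison fires. The metric names and threshold values below are illustrative, not the ones derived in the paper.

```python
# Threshold filtering with OR composition (hypothetical metric
# thresholds; the paper derives its own from method-level metrics).

THRESHOLDS = {
    "loc": 65,                       # lines of code
    "cyclomatic_complexity": 10,     # McCabe complexity
    "halstead_effort": 3000,         # Halstead effort
}

def predict_fault_prone(module):
    """A module is flagged fault-prone if ANY metric exceeds its
    threshold -- the OR composition mechanism."""
    return any(module.get(m, 0) > t for m, t in THRESHOLDS.items())

m1 = {"loc": 40, "cyclomatic_complexity": 4, "halstead_effort": 900}
m2 = {"loc": 40, "cyclomatic_complexity": 14, "halstead_effort": 900}
print(predict_fault_prone(m1))  # False: no threshold exceeded
print(predict_fault_prone(m2))  # True: complexity alone triggers the OR
```

Because the composition is an OR, a single out-of-range metric suffices to flag a module, which is what makes the strategy usable without any labeled fault history.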

    A systematic review of unsupervised learning techniques for software defect prediction

    National Key Basic Research Program of China [2018YFB1004401]; the National Natural Science Foundation of China [61972317, 61402370]

    Improved point center algorithm for K-Means clustering to increase software defect prediction

    The k-means is a widely used and easy-to-use clustering algorithm. It is, however, sensitive to the randomly chosen initial centroids, which can prevent it from producing optimal results. This research aimed to improve the k-means algorithm's performance with a proposed algorithm called point center, which overcomes the random initial centroid value in k-means and is then applied to predicting errors in software defect modules. The point center algorithm determines the initial centroid values for k-means optimization; the selection of the X and Y variables then determines the cluster center members. Ten datasets were used for testing, nine of which were used for predicting software defects. The proposed point center algorithm showed the lowest errors, reducing cluster errors by an average of 12.82% compared to the randomly obtained centroid values of the simple k-means algorithm. The findings are beneficial and contribute to developing a clustering model that handles data such as software defect modules more accurately.
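The core idea, replacing k-means' random initial centroids with deterministically chosen ones, can be illustrated as follows. The "spread" initializer below (k evenly spaced points in sorted order by distance from the origin) is a stand-in for illustration only, not the paper's point center construction, which is not reproduced here.

```python
# k-means with a deterministic centroid initialization (illustrative
# stand-in for the paper's point center algorithm).

def init_spread(points, k):
    """Deterministic init: sort by squared distance from the origin
    and pick k evenly spaced points, so reruns give identical seeds."""
    ordered = sorted(points, key=lambda p: sum(v * v for v in p))
    step = max(len(ordered) // k, 1)
    return [list(ordered[i * step]) for i in range(k)]

def kmeans(points, k, iters=20):
    cents = init_spread(points, k)
    clusters = []
    for _ in range(iters):
        # assign each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, cents[i])))
            clusters[j].append(p)
        # recompute each centroid as its cluster's mean
        for j, cl in enumerate(clusters):
            if cl:
                dim = len(cl[0])
                cents[j] = [sum(p[i] for p in cl) / len(cl) for i in range(dim)]
    return cents, clusters

pts = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
cents, clusters = kmeans(pts, 2)
print(sorted(len(c) for c in clusters))  # [3, 3]: the two groups separate
```

With a deterministic initializer, the clustering error is reproducible across runs, which is the property the point center approach exploits when comparing against randomly seeded simple k-means.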

    Cluster analysis on radio product integration testing faults

    Nowadays, as software systems keep getting larger and more complex, integration testing is necessary to ensure that the different components of a system work together correctly. In large and complex systems, analysing test faults can be difficult, as there are many components that can cause a failure. With the increased use of automated tests, faults can also often be caused by test-environment or test-automation issues. Test data and logs collected during test execution are usually the main source of information used for fault analysis. As multiple studies have shown in recent years, the fault analysis process can be automated by applying text mining, natural language processing, and machine learning methods to this data. In this thesis, an exploratory data study is performed on data collected from radio product integration tests at Nokia. Cluster analysis is used to find the fault types present in each of the collected file types. Different feature extraction methods are used and evaluated in terms of how well they separate the data for fault analysis. This work paves the way for automated fault analysis: the introduced methods can be applied to classifying faults, and the results and findings can be used to determine the next steps toward automated fault analysis applications.
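The pipeline above, feature extraction from log text followed by cluster analysis, can be sketched with TF-IDF vectors and a greedy similarity-based grouping. The log lines are invented examples; the thesis evaluates several feature extraction methods on real Nokia test logs, and the similarity cutoff here is an arbitrary illustration.

```python
# Log clustering sketch: TF-IDF features + greedy single-pass grouping.
# Log lines and the 0.2 similarity cutoff are invented for illustration.
import math
from collections import Counter

logs = [
    "ERROR timeout waiting for radio unit response",
    "ERROR timeout waiting for baseband response",
    "FAIL test environment setup: missing fixture",
    "FAIL test environment setup: port unavailable",
]

docs = [line.lower().split() for line in logs]
vocab = sorted({w for d in docs for w in d})
df = Counter(w for d in docs for w in set(d))   # document frequency

def tfidf(doc):
    """Term frequency scaled by inverse document frequency."""
    tf = Counter(doc)
    return [tf[w] / len(doc) * math.log(len(docs) / df[w]) for w in vocab]

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

vecs = [tfidf(d) for d in docs]

# Greedy clustering: join a log to the first cluster whose seed vector
# is similar enough, otherwise start a new cluster.
clusters = []
for v, line in zip(vecs, logs):
    for seed, members in clusters:
        if cos(seed, v) > 0.2:
            members.append(line)
            break
    else:
        clusters.append((v, [line]))

print(len(clusters))  # 2: timeout faults vs. environment faults
```

Even this crude grouping separates product faults (timeouts) from test-environment faults, which is the distinction the thesis' fault analysis is after.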

    Detection and Classification of Anomalies in Railway Tracks

    In Portugal, railway transport is heavily used. The companies providing these services periodically need to perform maintenance on the tracks and infrastructure, which leads to unavailability or delay of services and equipment, and consequently to monetary losses. It is therefore necessary to prepare a maintenance plan and predict when maintenance will be required, in order to minimize losses. With a predictive maintenance system, maintenance is performed only when it is needed: such a system continuously monitors machines and/or processes and determines when maintenance should take place. One way to perform this analysis is to train machine learning algorithms on large amounts of data from the machines and/or processes. This dissertation contributes to the development of a predictive maintenance system for railway tracks and infrastructure; the specific contribution is the detection and classification of anomalies. Machine learning and deep learning techniques are used, specifically unsupervised and semi-supervised algorithms, since the provided dataset contains few labeled anomalies. The algorithms were chosen from those currently most used and best performing, so the first step of the dissertation was to investigate the most common implementations for detecting and classifying anomalies in predictive maintenance systems. The algorithms that appeared able to adapt to the presented scenario were then trained and tuned in search of their best hyperparameters. Comparing their performance led to the conclusion that the most suitable model for the anomaly identification problem is an autoencoder artificial neural network. From the results of this model, it was possible to define thresholds that are subsequently used to classify the anomaly.
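The thresholding step described above, flagging anomalies from an autoencoder's reconstruction errors, can be sketched independently of the network itself. The error values and the mean-plus-k-sigma rule below are invented for illustration; the dissertation derives its thresholds from the trained model's actual results.

```python
# Threshold-based anomaly detection on autoencoder reconstruction errors.
# Error values and the mean + k*std rule are illustrative assumptions.

def fit_threshold(errors, k=3.0):
    """Set the cutoff at mean + k*std of the (mostly normal)
    training reconstruction errors."""
    n = len(errors)
    mean = sum(errors) / n
    std = (sum((e - mean) ** 2 for e in errors) / n) ** 0.5
    return mean + k * std

def detect(errors, threshold):
    """Flag each sample whose reconstruction error exceeds the cutoff."""
    return [e > threshold for e in errors]

train_errors = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13]  # normal track segments
thr = fit_threshold(train_errors)
test_errors = [0.11, 0.95, 0.10]                     # middle one is anomalous
print(detect(test_errors, thr))  # [False, True, False]
```

The logic relies on the autoencoder reconstructing normal track data well and anomalous data poorly, so a high reconstruction error is itself the anomaly signal; with several thresholds, bands of error magnitude can then be mapped to anomaly classes.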