480 research outputs found
Semi-supervised and Active Learning Models for Software Fault Prediction
As software continues to insinuate itself into nearly every aspect of our lives, software quality has become an extremely important issue. Software Quality Assurance (SQA) is a process that ensures the development of high-quality software; it concerns the important problem of maintaining, monitoring, and developing quality software. Accurate detection of fault-prone components in software projects is one of the most commonly practiced techniques that offer a path to high-quality products without excessive assurance expenditures. This type of quality modeling requires the availability of software modules with known fault content developed in a similar environment. However, collecting fault data at the module level, particularly in new projects, is expensive and time-consuming. Semi-supervised learning and active learning offer solutions to this problem by learning from limited labeled data and utilizing inexpensive unlabeled data. In this dissertation, we investigate semi-supervised learning and active learning approaches to the software fault prediction problem. The role of the base learner in semi-supervised learning is discussed using several state-of-the-art supervised learners. Our results showed that semi-supervised learning with an appropriate base learner leads to better fault-proneness prediction performance than supervised learning. In addition, incorporating a pre-processing technique prior to semi-supervised learning is a promising direction for further improving prediction performance. Active learning, which shares with semi-supervised learning the idea of utilizing unlabeled data, requires human effort to label fault-proneness during its learning process. Empirical results showed that active learning supplemented by a dimensionality reduction technique performs better than supervised learning on release-based data sets.
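The semi-supervised idea described above can be sketched with scikit-learn's self-training wrapper, which iteratively pseudo-labels the unlabeled modules a supervised base learner is most confident about. This is a minimal illustration of the general technique, not the dissertation's exact method; the module metrics and labels below are synthetic placeholders.

```python
# Minimal self-training sketch: learn fault-proneness from a few labeled
# modules plus many unlabeled ones (synthetic stand-in data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
# Synthetic "module metrics": fault-prone modules drawn with higher means.
X_clean = rng.normal(0.0, 1.0, size=(200, 5))
X_fault = rng.normal(1.5, 1.0, size=(200, 5))
X = np.vstack([X_clean, X_fault])
y = np.array([0] * 200 + [1] * 200)

# Pretend fault labels are known for only ~10% of modules; -1 marks unlabeled.
y_semi = y.copy()
unlabelled = rng.random(len(y)) > 0.1
y_semi[unlabelled] = -1

# Self-training wraps a supervised base learner and iteratively adds
# high-confidence pseudo-labels from the unlabeled pool.
model = SelfTrainingClassifier(RandomForestClassifier(random_state=0))
model.fit(X, y_semi)
print(model.score(X, y))  # accuracy against the held-back full labels
```

Swapping the base estimator (e.g. a different classifier with `predict_proba`) is how one would explore the role of the base learner the abstract discusses.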
A fault detection strategy for software projects
The existing software fault prediction models require metrics and fault data belonging to previous software versions or similar software projects. However, there are cases when previous fault data are not present, such as a software company's transition to a new project domain. In such situations, supervised learning methods using fault labels cannot be applied, leading to the need for new techniques. We proposed a software fault prediction strategy using method-level metric thresholds to predict the fault-proneness of unlabelled program modules. This technique was experimentally evaluated on the NASA datasets KC2 and JM1. Some existing approaches apply clustering techniques to group modules, a process followed by an evaluation phase.
This evaluation is performed by a software quality expert, who analyses every representative of each cluster and then labels the modules as fault-prone or not fault-prone. Our approach does not require a human expert during the prediction process. It is a fault prediction strategy that combines method-level metric thresholds as a filtering mechanism with an OR operator as a composition mechanism.
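The threshold-plus-OR composition described above can be sketched in a few lines: a module is flagged fault-prone as soon as any of its metrics exceeds the corresponding threshold. The metric names and threshold values below are illustrative assumptions, not the ones derived in the paper.

```python
# Hypothetical method-level metric thresholds (illustrative values only).
THRESHOLDS = {"loc": 50, "cyclomatic_complexity": 10, "halstead_volume": 1000}

def predict_fault_prone(module_metrics: dict, thresholds: dict) -> bool:
    # OR composition: fault-prone if ANY metric exceeds its threshold.
    return any(module_metrics[m] > t for m, t in thresholds.items())

module = {"loc": 30, "cyclomatic_complexity": 12, "halstead_volume": 400}
print(predict_fault_prone(module, THRESHOLDS))  # True: complexity exceeds 10
```

Because no fault labels are consulted, this predictor works on fully unlabelled modules, which is the point of the strategy.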
A systematic review of unsupervised learning techniques for software defect prediction
National Key Basic Research Program of China [2018YFB1004401]; the National Natural Science Foundation of China [61972317, 61402370]
Improved point center algorithm for K-Means clustering to increase software defect prediction
The k-means is a clustering algorithm that is widely used and easy to apply. The algorithm is sensitive to the randomly chosen initial centroids, which can prevent it from producing optimal results. This research aimed to improve the k-means algorithm's performance by applying a proposed algorithm called point center. The proposed algorithm overcame the random centroid initialization in k-means and was then applied to predict software defect modules. The point center algorithm was proposed to determine the initial centroid values for k-means optimization. Then, the selection of X and Y variables determined the cluster center members. Ten datasets were used to perform the testing, nine of which were used for predicting software defects. The proposed point center algorithm showed the lowest errors. It also improved the k-means algorithm's performance, reducing cluster errors by an average of 12.82% compared to the randomly obtained centroid values of the simple k-means algorithm. The findings are beneficial and contribute to developing clustering models that predict software defect modules more accurately.
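The abstract does not specify the point-center rule itself, but the mechanism it exploits, supplying k-means with explicit initial centroids instead of random ones, can be sketched with scikit-learn, which accepts an array for the `init` parameter. The deterministic seeds below (per-half means of synthetic data) are a hypothetical stand-in for the paper's point-center selection.

```python
# Sketch: k-means with explicit initial centroids vs. random initialization.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated synthetic clusters standing in for defect-metric data.
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(5, 0.5, (100, 2))])

# Baseline: random initial centroids, a single initialization.
km_random = KMeans(n_clusters=2, init="random", n_init=1, random_state=0).fit(X)

# Deterministic seeds (here simply the per-half means) passed as `init`;
# this is the hook a point-center-style method would use.
seeds = np.vstack([X[:100].mean(axis=0), X[100:].mean(axis=0)])
km_seeded = KMeans(n_clusters=2, init=seeds, n_init=1).fit(X)

print(km_random.inertia_, km_seeded.inertia_)  # within-cluster squared errors
```

Lower inertia with the seeded run on harder data is what a better initialization buys; on these well-separated clusters both runs converge to the same partition.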
Cluster analysis on radio product integration testing faults
Abstract. Nowadays, as software systems keep getting larger and more complex, integration testing is necessary to ensure that the different components of a system work together correctly. In large, complex systems, analysing test faults can be difficult, as there are so many components that can cause a failure. With the increased use of automated tests, faults can also often be caused by test-environment or test-automation issues.
Testing data and logs collected during test executions are usually the main sources of information used for test fault analysis. With text mining, natural language processing, and machine learning methods, the fault analysis process can be automated using the data and logs collected from the tests, as multiple studies have shown in recent years.
In this thesis, an exploratory data study is performed on data collected from radio product integration tests at Nokia. Cluster analysis is used to find the different fault types present in each of the collected file types. Different feature extraction methods are used and evaluated in terms of how well they separate the data for fault analysis.
The study done in this thesis paves the way for automated fault analysis in the future. The introduced methods can be applied to classifying the faults, and the results and findings can be used to determine the next steps that can be taken to enable future automated fault analysis applications.
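The general pipeline the thesis describes, vectorising raw log text and clustering it to surface fault types, can be sketched with a TF-IDF vectoriser feeding k-means. The log messages below are invented placeholders, not Nokia test data, and TF-IDF is just one of the feature extraction methods such a study might compare.

```python
# Sketch: cluster log lines by fault type via TF-IDF features + k-means.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

logs = [
    "connection timeout while polling radio unit",   # environment fault
    "timeout waiting for radio unit response",       # environment fault
    "assertion failed in scheduler module",          # software fault
    "scheduler assertion error during setup",        # software fault
]

X = TfidfVectorizer().fit_transform(logs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # the two timeout logs and the two assertion logs pair up
```

Evaluating how well different vectorisers separate such groups (e.g. by cluster purity against known fault types) is the kind of comparison the thesis performs.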
Detection and Classification of Anomalies in Railway Tracks
In Portugal, the railway tracks commonly require maintenance, which leads to stops/delays of the services, and consequently monetary losses and under-utilisation of the equipment. With the use of a predictive maintenance system, these problems can be minimized, since such systems continuously monitor the machines and/or processes and determine when maintenance is required.
Predictive maintenance systems can be combined with machine and/or deep learning algorithms, since these can be trained with high volumes of historical data to provide diagnoses, detect and classify anomalies, and estimate the lifetime of a machine/process.
This dissertation contributes to developing a predictive maintenance system for railway tracks/infrastructure. The main objectives are to detect and classify anomalies in the railway track. To achieve this, unsupervised and semi-supervised algorithms are tested and tuned to determine the one that best adapts to the presented scenario; unsupervised and semi-supervised algorithms are required given the few anomalous labels in the dataset. The first step was to investigate the most common implementations for detecting and classifying anomalies in predictive maintenance systems; the most promising algorithms were then trained and their hyperparameters tuned. Comparing their performance showed that an Autoencoder artificial neural network was the best fit for the anomaly identification problem, and the results of this model were used to define thresholds for the subsequent classification of the anomalies.
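The reconstruction-error thresholding idea behind the autoencoder approach can be sketched compactly: train a reconstruction model on (mostly) normal data, then flag samples whose reconstruction error exceeds a chosen threshold. PCA is used below as a simple linear stand-in for the dissertation's autoencoder, and the sensor-like data is synthetic.

```python
# Sketch: anomaly detection via reconstruction error + threshold.
# PCA plays the role of a (linear) autoencoder on synthetic data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(0, 1, (500, 3))
W = rng.normal(0, 1, (3, 10))
normal = latent @ W + rng.normal(0, 0.1, (500, 10))  # near a 3-dim manifold
anomaly = rng.normal(0, 3, (5, 10))                  # off-manifold samples

pca = PCA(n_components=3).fit(normal)

def reconstruction_error(model, X):
    # Mean squared error between each sample and its reconstruction.
    return np.mean((X - model.inverse_transform(model.transform(X))) ** 2, axis=1)

# Choose the threshold from training data, e.g. the 99th percentile of
# normal reconstruction errors (one of several reasonable conventions).
threshold = np.percentile(reconstruction_error(pca, normal), 99)
flags = reconstruction_error(pca, anomaly) > threshold
print(flags.sum(), "of", len(anomaly), "anomalies flagged")
```

A real autoencoder replaces the linear projection with a learned nonlinear encoder/decoder, but the thresholding step on reconstruction error is the same.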
- …