7 research outputs found

    A fault detection strategy for software projects

    Existing software fault prediction models require metrics and fault data belonging to previous software versions or similar software projects. However, there are cases when previous fault data are not available, such as a software company's transition to a new project domain. In such situations, supervised learning methods using fault labels cannot be applied, leading to the need for new techniques. We propose a software fault prediction strategy that uses method-level metrics thresholds to predict the fault-proneness of unlabelled program modules. The technique was experimentally evaluated on the NASA datasets KC2 and JM1. Some existing approaches apply several clustering techniques to cluster modules, a process followed by an evaluation phase. This evaluation is performed by a software quality expert, who analyses a representative of each cluster and then labels the modules as fault-prone or not fault-prone. Our approach does not require a human expert during the prediction process. It is a fault prediction strategy that combines method-level metrics thresholds as a filtering mechanism and an OR operator as a composition mechanism.
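    A minimal sketch of this strategy, assuming illustrative method-level metrics (lines of code, cyclomatic complexity, unique operators) and hypothetical threshold values rather than the thresholds derived in the paper:

        # Hypothetical method-level metric thresholds (illustrative values only,
        # not the thresholds derived in the paper).
        THRESHOLDS = {
            "loc": 65,                    # lines of code
            "cyclomatic_complexity": 10,
            "unique_operators": 25,
        }

        def is_fault_prone(module_metrics: dict) -> bool:
            """Label an unlabelled module as fault-prone if ANY metric exceeds
            its threshold: each threshold acts as a filter, and the OR operator
            composes the individual verdicts, so neither previous fault data
            nor a human expert is needed."""
            return any(
                module_metrics.get(name, 0) > limit
                for name, limit in THRESHOLDS.items()
            )

        # Example: this module's complexity exceeds its threshold, so it is flagged.
        module = {"loc": 40, "cyclomatic_complexity": 14, "unique_operators": 18}
        print(is_fault_prone(module))  # True, because 14 > 10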

    Classification of software components based on clustering

    This thesis demonstrates how, in different phases of the software life cycle, software components with similar software metrics can be grouped into homogeneous clusters. We use multivariate analysis techniques to group similar software components. The results were applied to several real case studies from NASA and open source software. For these case studies we obtained process- and product-related metrics during requirements specification, product-related metrics at the architectural level, and code metrics from the operational stage. We performed clustering analysis using these metrics and validated the results. This analysis makes it possible to rank the clusters and assign similar development and validation tasks to all the components in a cluster, since the components in a cluster have similar metrics and hence tend to behave alike.
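    A minimal sketch of this idea, using k-means as one possible clustering technique and hypothetical component metrics (size, complexity, coupling); the thesis's multivariate analysis and the actual NASA/open-source metrics are not reproduced here:

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.preprocessing import StandardScaler

        # Rows are software components, columns are metrics
        # (e.g. size, complexity, coupling) -- hypothetical values.
        metrics = np.array([
            [120,  8,  3],
            [115,  7,  4],
            [640, 35, 12],
            [610, 33, 10],
            [ 45,  2,  1],
            [ 50,  3,  1],
        ])

        # Standardise so that no single metric dominates the distances.
        scaled = StandardScaler().fit_transform(metrics)

        # Group components with similar metrics into homogeneous clusters;
        # components sharing a label can share development and validation tasks.
        labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)
        print(labels)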

    Transforming big data into smart data: An insight on the use of the k-nearest neighbors algorithm to obtain quality data

    The k-nearest neighbours algorithm is characterised as a simple yet effective data mining technique. Its main drawback appears when massive amounts of data, likely to contain noise and imperfections, are involved, turning the algorithm into an imprecise and especially inefficient technique. These disadvantages have been the subject of research for many years, and among other approaches, data preprocessing techniques such as instance reduction or missing-value imputation have targeted these weaknesses. As a result, these issues have turned into strengths, and the k-nearest neighbours rule has become a core algorithm for identifying and correcting imperfect data, removing noisy and redundant samples, or imputing missing values, thereby transforming Big Data into Smart Data: data of sufficient quality to expect a good outcome from any data mining algorithm. The role of this smart data gleaning algorithm in a supervised learning context is investigated here. This includes a brief overview of Smart Data, current and future trends for the k-nearest neighbours algorithm in the Big Data context, and the existing data preprocessing techniques based on this algorithm. We present the emerging big-data-ready versions of these algorithms and develop some new methods to cope with Big Data. We carry out a thorough experimental analysis on a series of big datasets that provides guidelines on how to use the k-nearest neighbours algorithm to obtain Smart/Quality Data for a high-quality data mining process. Moreover, multiple Spark Packages have been developed, including all the Smart Data algorithms analysed.
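    A minimal sketch of two of the k-nearest neighbours preprocessing roles mentioned above, missing-value imputation and noise filtering, on a tiny hypothetical dataset; the Spark-based, big-data-ready versions discussed in the paper are not reproduced here:

        import numpy as np
        from sklearn.impute import KNNImputer
        from sklearn.neighbors import KNeighborsClassifier

        # Toy dataset with one missing value (np.nan) and one mislabelled sample.
        X = np.array([[1.0, 2.0], [1.1, 1.9], [0.9, np.nan],
                      [8.0, 8.2], [7.9, 8.1], [8.1, 7.9]])
        y = np.array([0, 0, 0, 1, 1, 0])   # the last label is noise

        # 1) k-NN imputation: fill the missing value from the 2 nearest samples.
        X_filled = KNNImputer(n_neighbors=2).fit_transform(X)

        # 2) Edited-nearest-neighbours style filtering: drop samples whose label
        #    disagrees with the majority vote of their k nearest neighbours.
        knn = KNeighborsClassifier(n_neighbors=3).fit(X_filled, y)
        keep = knn.predict(X_filled) == y
        X_smart, y_smart = X_filled[keep], y[keep]
        print(len(y), "->", len(y_smart), "samples after noise filtering")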