17 research outputs found

    Improvement of software defect prediction methods based on machine learning

    No full text
    The increasing complexity of software systems extends verification activities and increases development costs. Software defects are unevenly distributed within a software system: the majority of defects are concentrated in a small part of the system. This doctoral thesis investigates software defect prediction, which aims to improve the allocation of verification resources. The data collection procedure for software defect prediction is not entirely standardized, and this is a major cause of inconsistencies and research bias. The contributions of this thesis are a systematically defined data collection procedure based on existing industrial standards and a data collection algorithm based on static code attributes, both of which aim to enhance the suitability of data for software defect prediction. The second problem covered in this thesis is data imbalance, an inherent feature of this domain. The contributions here are a method for establishing the critical level of data imbalance for machine learning algorithms and a method for choosing the most appropriate predictive model with respect to the present level of imbalance. The proposed procedures make it possible to better predict defective software modules, improve the verification and development strategies of complex software systems, and improve investment planning for evolving systems.
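
    The abstract describes establishing a critical level of data imbalance for machine learning algorithms. As a minimal illustrative sketch (not the thesis' actual method), one could sweep the minority-class share on synthetic data and watch where cross-validated performance collapses; the dataset, classifier, and acceptance threshold below are assumptions.

    ```python
    # Sketch: probe how classifier performance degrades as class imbalance
    # grows, to locate an approximate "critical" imbalance level.
    # The synthetic dataset and logistic-regression learner are assumptions.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    for minority_share in [0.5, 0.3, 0.2, 0.1, 0.05, 0.02]:
        X, y = make_classification(
            n_samples=5000, n_features=20,
            weights=[1.0 - minority_share, minority_share],
            random_state=42,
        )
        auc = cross_val_score(
            LogisticRegression(max_iter=1000), X, y,
            scoring="roc_auc", cv=5,
        ).mean()
        print(f"minority share {minority_share:5.2f} -> mean AUC {auc:.3f}")
    # The imbalance level at which AUC falls below an acceptable floor
    # (e.g., 0.6, an arbitrary choice here) would mark the critical level.
    ```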

    Manually curated dataset of catalytic peptides for ester hydrolysis

    No full text
    Catalytic peptides are low-cost biomolecules able to catalyse chemical reactions such as ester hydrolysis. This dataset provides a list of the catalytic peptides currently reported in the literature. Several parameters were evaluated, including sequence length, composition, net charge, isoelectric point, hydrophobicity, self-assembly propensity, and mechanism of catalysis. Along with the analysis of physico-chemical properties, the SMILES representation of each sequence was generated to provide an easy-to-use means of training machine learning models. This offers a unique opportunity for the development and validation of proof-of-concept predictive models. As a reliable, manually curated dataset, it also serves as a benchmark for comparing new models or models trained on automatically gathered peptide-oriented datasets. Moreover, the dataset provides insight into the catalytic mechanisms developed so far and can be used as the foundation for the development of next-generation peptide-based catalysts.
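
    The SMILES column described above could plausibly be reproduced from the one-letter sequences; a minimal sketch using RDKit (an assumed tool choice, not named in the abstract) follows. The example tripeptide is illustrative, not taken from the dataset.

    ```python
    # Sketch: derive a SMILES string from a peptide's one-letter sequence.
    from rdkit import Chem

    sequence = "HHQ"  # illustrative histidine-rich tripeptide (an assumption)
    mol = Chem.MolFromSequence(sequence)   # build the peptide molecule
    print(Chem.MolToSmiles(mol))           # canonical SMILES representation
    ```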

    Multivariate logistic regression prediction of fault-proneness in software modules

    No full text
    This paper explores additional features, provided by stepwise logistic regression, which could further improve the performance of a fault prediction model. Three different models were used to predict fault-proneness in the NASA PROMISE data set and were compared in terms of accuracy, sensitivity, and false alarm rate: one with forward stepwise logistic regression, one with backward stepwise logistic regression, and one without stepwise selection in logistic regression. Despite an obvious trade-off between sensitivity and false alarm rate, we can conclude that backward stepwise regression gave the best model.
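
    A minimal sketch of the forward/backward comparison, assuming scikit-learn: SequentialFeatureSelector is a greedy stand-in for classical p-value-based stepwise selection, and the synthetic data stands in for the NASA PROMISE metrics.

    ```python
    # Sketch: forward vs. backward feature selection around logistic
    # regression, scored by sensitivity and false alarm rate.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=21,
                               weights=[0.85, 0.15], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    for direction in ("forward", "backward"):
        clf = LogisticRegression(max_iter=1000)
        sfs = SequentialFeatureSelector(clf, n_features_to_select=8,
                                        direction=direction).fit(X_tr, y_tr)
        clf.fit(sfs.transform(X_tr), y_tr)
        tn, fp, fn, tp = confusion_matrix(
            y_te, clf.predict(sfs.transform(X_te))).ravel()
        print(f"{direction:8s}: sensitivity={tp / (tp + fn):.2f}, "
              f"false alarm={fp / (fp + tn):.2f}")
    ```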

    Deep Learning Approach For Objects Detection in Underwater Pipeline Images

    No full text
    In this paper, we present automatic, deep-learning methods for pipeline detection in underwater environments. Seafloor pipelines are critical infrastructure for oil and gas transport. The inspection of those pipelines is required to verify their integrity and determine the need for maintenance. Underwater conditions present a harsh environment that is challenging for image recognition due to light refraction and absorption, poor visibility, scattering, and attenuation, often causing poor image quality. Modern machine-learning object detectors utilize Convolutional Neural Networks (CNNs), requiring a training dataset of sufficient quality. In the paper, six different deep-learning CNN detectors for underwater object detection were trained and tested: five are based on the You Only Look Once (YOLO) architectures (YOLOv4, YOLOv4-Tiny, CSP-YOLOv4, YOLOv4@Resnet, YOLOv4@DenseNet), and one on the Faster Region-based CNN (RCNN) architecture. The models' performances were evaluated in terms of detection accuracy, mean average precision (mAP), and processing speed measured in Frames Per Second (FPS) on a custom dataset containing underwater pipeline images. In the study, YOLOv4 outperformed the other models for underwater pipeline object detection, achieving an mAP of 94.21% with the ability to detect objects in real time. Based on the literature review, this is one of the pioneering works in this field.
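
    A minimal sketch of YOLOv4 inference and FPS measurement with OpenCV's DNN module (one common way to run Darknet-trained YOLOv4 models; not necessarily the paper's pipeline). The .cfg/.weights/.jpg paths are hypothetical placeholders.

    ```python
    # Sketch: single-image YOLOv4 detection plus a crude FPS estimate.
    import time
    import cv2

    net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")  # placeholder paths
    model = cv2.dnn_DetectionModel(net)
    model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

    frame = cv2.imread("pipeline_frame.jpg")  # placeholder image
    start = time.perf_counter()
    class_ids, scores, boxes = model.detect(frame, confThreshold=0.5,
                                            nmsThreshold=0.4)
    fps = 1.0 / (time.perf_counter() - start)
    print(f"{len(boxes)} detections at ~{fps:.1f} FPS")
    ```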

    Esterase Sequence Composition Patterns for the Identification of Catalytic Triad Microenvironment Motifs

    No full text
    Ester hydrolysis is of wide biomedical interest, spanning from the green synthesis of pharmaceuticals to biomaterials development. Existing peptide-based catalysts exhibit low catalytic efficiency compared to natural enzymes, due to the conformational heterogeneity of peptides. Moreover, there is a lack of understanding of the correlation between the primary sequence and catalytic function. For this purpose, we statistically analyzed 22 EC 3.1 hydrolases with known catalytic triads, characterized by unique and well-defined mechanisms. The aim was to identify patterns at the sequence level that will better inform the creation of short peptides containing important information for catalysis, based on the catalytic triad, oxyanion holes, and the microenvironments of the triad residues. Moreover, fragmentation schemes for the primary sequence of the selected enzymes, alongside the study of their amino acid frequencies, composition, and physicochemical properties, are proposed. The results showed highly conserved catalytic sites with distinct positional patterns and chemical microenvironments that favor catalysis, and revealed variations in catalytic site composition that could be useful for the design of minimalistic catalysts.
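
    A toy sketch of the microenvironment idea: take a window of residues around each catalytic position and tally the amino-acid composition. The sequence and triad indices below are illustrative placeholders, not values from the 22 analyzed hydrolases.

    ```python
    # Sketch: k-residue microenvironments around catalytic positions.
    from collections import Counter

    def microenvironment(seq: str, pos: int, k: int = 3) -> str:
        """Window of k residues on each side of a catalytic residue."""
        return seq[max(0, pos - k): pos + k + 1]

    seq = "MKTLLLTLVVVTIVCLDLGYTGSHSLRYF"   # placeholder sequence
    triad_positions = [4, 12, 25]            # placeholder Ser/His/Asp indices

    windows = [microenvironment(seq, p) for p in triad_positions]
    print(windows)
    print(Counter("".join(windows)).most_common())  # microenvironment composition
    ```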

    Comparing Direct Measurements and Three-Dimensional (3D) Scans for Evaluating Facial Soft Tissue

    Get PDF
    Inspecting patients' soft tissues and the effects of various dental procedures on their facial physiognomy is quite challenging. To minimise discomfort and simplify the process of manual measuring, we performed facial scanning and computer measurement of experimentally determined demarcation lines. Images were acquired using a low-cost 3D scanner. Two consecutive scans were obtained from 39 participants to test the scanner's repeatability. An additional ten persons were scanned before and after forward movement of the mandible (predicted treatment outcome). Sensor technology that combines red, green, and blue (RGB) data with depth information (RGBD) was used to merge frames into a 3D object. For proper comparison, the resulting images were registered together using ICP (Iterative Closest Point)-based techniques. Measurements on the 3D images were performed using the exact distance algorithm. One operator measured the same demarcation lines directly on the participants; repeatability was tested (intra-class correlations). The results showed that the 3D face scans were reproducible with high accuracy (mean difference between repeated scan…
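
    A minimal sketch of the ICP-based registration step, assuming Open3D as the implementation (the abstract does not name a library); the .ply paths and the correspondence tolerance are placeholders.

    ```python
    # Sketch: align two consecutive face scans with point-to-point ICP.
    import open3d as o3d

    source = o3d.io.read_point_cloud("scan_1.ply")  # placeholder paths
    target = o3d.io.read_point_cloud("scan_2.ply")

    result = o3d.pipelines.registration.registration_icp(
        source, target,
        max_correspondence_distance=2.0,  # assumed tolerance in scan units
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
    )
    print(result.fitness, result.inlier_rmse)
    source.transform(result.transformation)  # bring source onto target
    ```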

    The Choice of Time–Frequency Representations of Non-Stationary Signals Affects Machine Learning Model Accuracy: A Case Study on Earthquake Detection from LEN-DB Data

    No full text
    Non-stationary signals are often analyzed using raw waveform data or spectrograms of those data; however, the possibility that alternative time–frequency representations are more informative than the original data or spectrograms has yet to be investigated. This paper tested whether alternative time–frequency representations could be more informative for machine learning classification of seismological data. This hypothesis was evaluated by training three well-established convolutional neural networks on nine time–frequency representations. The results were compared to the base model, which was trained on the raw waveform data. The signals used in the experiment are three-component seismogram instances from the Local Earthquakes and Noise DataBase (LEN-DB). The results demonstrate that the Pseudo Wigner–Ville and Wigner–Ville time–frequency representations yield significantly better results than the base model, while the spectrogram and Margenau–Hill representations perform significantly worse (p < 0.01). Interestingly, the spectrogram, which is often used in signal analysis, performed worse than the base model. The findings presented in this research could have a notable impact in the fields of geophysics and seismology, as phenomena that were previously hidden in the seismic noise can now be identified more easily. Furthermore, the results indicate that applying the Pseudo Wigner–Ville or Wigner–Ville time–frequency representations could substantially increase the number of earthquakes in the catalogs and lessen the need to add new stations, with an overall reduction in costs. Finally, the proposed approach of extracting valuable information through time–frequency representations could be applied in other domains as well, such as electroencephalogram and electrocardiogram signal analysis, speech recognition, gravitational-wave investigation, and so on.
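
    A minimal sketch of computing two of the compared representations for a synthetic non-stationary signal; the tftb package is an assumed choice for the Wigner–Ville distribution (the paper's tooling is not stated here).

    ```python
    # Sketch: spectrogram (baseline) vs. Wigner-Ville for a test chirp.
    import numpy as np
    from scipy.signal import chirp, spectrogram
    from tftb.processing import WignerVilleDistribution  # third-party tftb package

    fs = 100.0
    t = np.arange(0, 10, 1 / fs)
    sig = chirp(t, f0=1.0, f1=20.0, t1=10.0)  # synthetic non-stationary signal

    f, tt, Sxx = spectrogram(sig, fs=fs)                     # baseline TFR
    tfr, times, freqs = WignerVilleDistribution(sig).run()   # alternative TFR
    print(Sxx.shape, tfr.shape)
    ```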