17 research outputs found

    Improvement of software defect prediction methods based on machine learning

    No full text
    The increasing complexity of software systems extends verification activities and increases development costs. Software defects are unevenly distributed within a software system: the majority of defects are concentrated in a small part of the system. This doctoral thesis investigates software defect prediction, which aims to improve the allocation of verification resources. The data collection procedure for software defect prediction is not entirely standardized, and this is a major cause of inconsistencies and research bias. The contributions of this thesis are a systematically defined data collection procedure based on existing industrial standards and a data collection algorithm based on static code attributes, both of which aim to enhance the suitability of data for software defect prediction. The second problem covered in this thesis is data imbalance, an inherent feature of this domain. The contributions here are a method for establishing the critical level of data imbalance for machine learning algorithms and a method for choosing the most appropriate predictive model with respect to the present level of imbalance. The proposed procedures make it possible to better predict defective software modules, improve the verification and development strategies of complex software systems, and improve investment planning for evolving systems.
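
    The abstract describes establishing a critical level of data imbalance for machine learning algorithms. As a minimal illustrative sketch (not the thesis' actual method), one could sweep the minority-class share on synthetic data and watch where cross-validated performance collapses; the dataset, classifier, and acceptance threshold below are assumptions.

    ```python
    # Sketch: probe how classifier performance degrades as class imbalance
    # grows, to locate an approximate "critical" imbalance level.
    # The synthetic dataset and logistic-regression learner are assumptions.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    for minority_share in [0.5, 0.3, 0.2, 0.1, 0.05, 0.02]:
        X, y = make_classification(
            n_samples=5000, n_features=20,
            weights=[1.0 - minority_share, minority_share],
            random_state=42,
        )
        auc = cross_val_score(
            LogisticRegression(max_iter=1000), X, y,
            scoring="roc_auc", cv=5,
        ).mean()
        print(f"minority share {minority_share:5.2f} -> mean AUC {auc:.3f}")
    # The imbalance level at which AUC falls below an acceptable floor
    # (e.g., 0.6, an arbitrary choice here) would mark the critical level.
    ```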

    Manually curated dataset of catalytic peptides for ester hydrolysis

    No full text
    Catalytic peptides are low-cost biomolecules able to catalyse chemical reactions such as ester hydrolysis. This dataset provides a list of the catalytic peptides currently reported in the literature. Several parameters were evaluated, including sequence length, composition, net charge, isoelectric point, hydrophobicity, self-assembly propensity, and mechanism of catalysis. Along with the analysis of physico-chemical properties, the SMILES representation of each sequence was generated to provide an easy-to-use means of training machine learning models. This offers a unique opportunity for the development and validation of proof-of-concept predictive models. As a reliable, manually curated dataset, it also serves as a benchmark for comparing new models or models trained on automatically gathered peptide-oriented datasets. Moreover, the dataset provides insight into the catalytic mechanisms developed so far and can be used as the foundation for the development of next-generation peptide-based catalysts.
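
    The SMILES column described above could plausibly be reproduced from the one-letter sequences; a minimal sketch using RDKit (an assumed tool choice, not named in the abstract) follows. The example tripeptide is illustrative, not taken from the dataset.

    ```python
    # Sketch: derive a SMILES string from a peptide's one-letter sequence.
    from rdkit import Chem

    sequence = "HHQ"  # illustrative histidine-rich tripeptide (an assumption)
    mol = Chem.MolFromSequence(sequence)   # build the peptide molecule
    print(Chem.MolToSmiles(mol))           # canonical SMILES representation
    ```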

    Multivariate logistic regression prediction of fault-proneness in software modules

    No full text
    This paper explores additional features, provided by stepwise logistic regression, which could further improve the performance of a fault prediction model. Three different models were used to predict fault-proneness in the NASA PROMISE data set and were compared in terms of accuracy, sensitivity, and false alarm rate: one with forward stepwise logistic regression, one with backward stepwise logistic regression, and one without stepwise selection in logistic regression. Despite an obvious trade-off between sensitivity and false alarm rate, we can conclude that backward stepwise regression gave the best model.
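
    A minimal sketch of the forward/backward comparison, assuming scikit-learn: SequentialFeatureSelector is a greedy stand-in for classical p-value-based stepwise selection, and the synthetic data stands in for the NASA PROMISE metrics.

    ```python
    # Sketch: forward vs. backward feature selection around logistic
    # regression, scored by sensitivity and false alarm rate.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=21,
                               weights=[0.85, 0.15], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    for direction in ("forward", "backward"):
        clf = LogisticRegression(max_iter=1000)
        sfs = SequentialFeatureSelector(clf, n_features_to_select=8,
                                        direction=direction).fit(X_tr, y_tr)
        clf.fit(sfs.transform(X_tr), y_tr)
        tn, fp, fn, tp = confusion_matrix(
            y_te, clf.predict(sfs.transform(X_te))).ravel()
        print(f"{direction:8s}: sensitivity={tp / (tp + fn):.2f}, "
              f"false alarm={fp / (fp + tn):.2f}")
    ```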

    Deep Learning Approach For Objects Detection in Underwater Pipeline Images

    No full text
    In this paper, we present automatic, deep-learning methods for pipeline detection in underwater environments. Seafloor pipelines are critical infrastructure for oil and gas transport. The inspection of those pipelines is required to verify their integrity and determine the need for maintenance. Underwater conditions present a harsh environment that is challenging for image recognition due to light refraction and absorption, poor visibility, scattering, and attenuation, often causing poor image quality. Modern machine-learning object detectors utilize Convolutional Neural Networks (CNNs), requiring a training dataset of sufficient quality. In the paper, six different deep-learning CNN detectors for underwater object detection were trained and tested: five are based on the You Only Look Once (YOLO) architectures (YOLOv4, YOLOv4-Tiny, CSP-YOLOv4, YOLOv4@Resnet, YOLOv4@DenseNet), and one on the Faster Region-based CNN (RCNN) architecture. The models' performances were evaluated in terms of detection accuracy, mean average precision (mAP), and processing speed measured in Frames Per Second (FPS) on a custom dataset containing underwater pipeline images. In the study, YOLOv4 outperformed the other models for underwater pipeline object detection, achieving an mAP of 94.21% with the ability to detect objects in real time. Based on the literature review, this is one of the pioneering works in this field.
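
    A minimal sketch of YOLOv4 inference and FPS measurement with OpenCV's DNN module (one common way to run Darknet-trained YOLOv4 models; not necessarily the paper's pipeline). The .cfg/.weights/.jpg paths are hypothetical placeholders.

    ```python
    # Sketch: single-image YOLOv4 detection plus a crude FPS estimate.
    import time
    import cv2

    net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")  # placeholder paths
    model = cv2.dnn_DetectionModel(net)
    model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

    frame = cv2.imread("pipeline_frame.jpg")  # placeholder image
    start = time.perf_counter()
    class_ids, scores, boxes = model.detect(frame, confThreshold=0.5,
                                            nmsThreshold=0.4)
    fps = 1.0 / (time.perf_counter() - start)
    print(f"{len(boxes)} detections at ~{fps:.1f} FPS")
    ```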

    Esterase Sequence Composition Patterns for the Identification of Catalytic Triad Microenvironment Motifs

    No full text
    Ester hydrolysis is of wide biomedical interest, spanning from the green synthesis of pharmaceuticals to biomaterials development. Existing peptide-based catalysts exhibit low catalytic efficiency compared to natural enzymes, due to the conformational heterogeneity of peptides. Moreover, there is a lack of understanding of the correlation between the primary sequence and catalytic function. For this purpose, we statistically analyzed 22 EC 3.1 hydrolases with known catalytic triads, characterized by unique and well-defined mechanisms. The aim was to identify patterns at the sequence level that will better inform the creation of short peptides containing important information for catalysis, based on the catalytic triad, oxyanion holes, and the microenvironments of the triad residues. Moreover, fragmentation schemes for the primary sequence of the selected enzymes, alongside the study of their amino acid frequencies, composition, and physicochemical properties, are proposed. The results showed highly conserved catalytic sites with distinct positional patterns and chemical microenvironments that favor catalysis, and revealed variations in catalytic site composition that could be useful for the design of minimalistic catalysts.
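
    A toy sketch of the microenvironment idea: take a window of residues around each catalytic position and tally the amino-acid composition. The sequence and triad indices below are illustrative placeholders, not values from the 22 analyzed hydrolases.

    ```python
    # Sketch: k-residue microenvironments around catalytic positions.
    from collections import Counter

    def microenvironment(seq: str, pos: int, k: int = 3) -> str:
        """Window of k residues on each side of a catalytic residue."""
        return seq[max(0, pos - k): pos + k + 1]

    seq = "MKTLLLTLVVVTIVCLDLGYTGSHSLRYF"   # placeholder sequence
    triad_positions = [4, 12, 25]            # placeholder Ser/His/Asp indices

    windows = [microenvironment(seq, p) for p in triad_positions]
    print(windows)
    print(Counter("".join(windows)).most_common())  # microenvironment composition
    ```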

    Comparing Direct Measurements and Three-Dimensional (3D) Scans for Evaluating Facial Soft Tissue

    Get PDF
    Inspecting patients' soft tissues and the effects of various dental procedures on their facial physiognomy is quite challenging. To minimise discomfort and simplify the process of manual measuring, we performed facial scanning and computer measurement of experimentally determined demarcation lines. Images were acquired using a low-cost 3D scanner. Two consecutive scans were obtained from 39 participants to test the scanner's repeatability. An additional ten persons were scanned before and after forward movement of the mandible (predicted treatment outcome). Sensor technology that combines red, green, and blue (RGB) data with depth information (RGBD) was used to merge frames into a 3D object. For proper comparison, the resulting images were registered together using ICP (Iterative Closest Point)-based techniques. Measurements on the 3D images were performed using the exact distance algorithm. One operator measured the same demarcation lines directly on the participants; repeatability was tested (intra-class correlations). The results showed that the 3D face scans were reproducible with high accuracy (mean difference between repeated scan…
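
    A minimal sketch of the ICP-based registration step, assuming Open3D as the implementation (the abstract does not name a library); the .ply paths and the correspondence tolerance are placeholders.

    ```python
    # Sketch: align two consecutive face scans with point-to-point ICP.
    import open3d as o3d

    source = o3d.io.read_point_cloud("scan_1.ply")  # placeholder paths
    target = o3d.io.read_point_cloud("scan_2.ply")

    result = o3d.pipelines.registration.registration_icp(
        source, target,
        max_correspondence_distance=2.0,  # assumed tolerance in scan units
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
    )
    print(result.fitness, result.inlier_rmse)
    source.transform(result.transformation)  # bring source onto target
    ```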

    The Choice of Time–Frequency Representations of Non-Stationary Signals Affects Machine Learning Model Accuracy: A Case Study on Earthquake Detection from LEN-DB Data

    No full text
    Non-stationary signals are often analyzed using raw waveform data or spectrograms of those data; however, the possibility that alternative time–frequency representations are more informative than the original data or spectrograms has yet to be investigated. This paper tested whether alternative time–frequency representations could be more informative for machine learning classification of seismological data. This hypothesis was evaluated by training three well-established convolutional neural networks on nine time–frequency representations. The results were compared to the base model, which was trained on the raw waveform data. The signals used in the experiment are three-component seismogram instances from the Local Earthquakes and Noise DataBase (LEN-DB). The results demonstrate that the Pseudo Wigner–Ville and Wigner–Ville time–frequency representations yield significantly better results than the base model, while the spectrogram and Margenau–Hill representations perform significantly worse (p < 0.01). Interestingly, the spectrogram, which is often used in signal analysis, performed worse than the base model. The findings presented in this research could have a notable impact in the fields of geophysics and seismology, as phenomena that were previously hidden in the seismic noise can now be identified more easily. Furthermore, the results indicate that applying the Pseudo Wigner–Ville or Wigner–Ville time–frequency representations could substantially increase the number of earthquakes in the catalogs and lessen the need to add new stations, with an overall reduction in costs. Finally, the proposed approach of extracting valuable information through time–frequency representations could be applied in other domains as well, such as electroencephalogram and electrocardiogram signal analysis, speech recognition, gravitational-wave investigation, and so on.
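
    A minimal sketch of computing two of the compared representations for a synthetic non-stationary signal; the tftb package is an assumed choice for the Wigner–Ville distribution (the paper's tooling is not stated here).

    ```python
    # Sketch: spectrogram (baseline) vs. Wigner-Ville for a test chirp.
    import numpy as np
    from scipy.signal import chirp, spectrogram
    from tftb.processing import WignerVilleDistribution  # third-party tftb package

    fs = 100.0
    t = np.arange(0, 10, 1 / fs)
    sig = chirp(t, f0=1.0, f1=20.0, t1=10.0)  # synthetic non-stationary signal

    f, tt, Sxx = spectrogram(sig, fs=fs)                     # baseline TFR
    tfr, times, freqs = WignerVilleDistribution(sig).run()   # alternative TFR
    print(Sxx.shape, tfr.shape)
    ```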