
21 research outputs found

Development of Robust and Scalable Hyperbox based Machine Learning Algorithms

Author: Khuat Thanh Tung
Publication date: 01/01/2021

University of Technology Sydney. Faculty of Engineering and Information Technology.
With the rapid development of digital information and the growth in the amount of data, machine learning (ML) algorithms have evolved constantly to discover new information and knowledge from different data sources. The use of hyperbox fuzzy sets as fundamental representational building blocks in learning algorithms forms an important branch of ML. Hyperbox-based algorithms have great potential for high scalability and incremental adaptation in applications that operate in dynamically changing environments. Additionally, learning algorithms based on hyperbox representations can form interpretable models, which are highly desirable in areas that require safety and trust. This study aims to develop and extend robust, scalable, and transparent learning algorithms for hyperbox-based classification models, with a specific focus on the general fuzzy min-max neural network (GFMMNN). First, a comprehensive survey of hyperbox-based machine learning models was conducted, together with empirical assessments of the GFMMNN on pattern classification problems. Next, a new online learning algorithm was proposed for the GFMMNN, and the robustness of the whole family of GFMMNN learning algorithms on mixed-attribute data was improved by introducing a new learning mechanism for categorical features. In terms of scalability, the main steps of the learning algorithms were reformulated so that they can be executed effectively on graphics processing units using matrix operations, and mathematical lemmas were proposed to reduce redundant hyperbox candidates in the learning process. This thesis also proposed a novel method to enhance the transparency of classifiers while maintaining good classification performance by using hierarchical granular representations built from hyperbox fuzzy sets. The last contribution was a simple but powerful ensemble model built from many individual hyperbox-based classifiers trained on random subsets of both the sample and feature spaces. Extensive empirical analyses indicated that the proposed solutions are highly competitive with the other evaluated learning algorithms.

OPUS - University of Technology Sydney

Ensemble learning for software fault prediction problem with imbalanced data

Author: Khuat Thanh Tung
Le My Hanh
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/08/2019

Fault prediction plays a crucial role in the software development process because it helps reduce defects and supports testing aimed at fault-free software components. Considerable effort has therefore gone into this problem, and static code characteristics are usually adopted to construct fault classification models. One of the challenges affecting the performance of predictive classifiers is the high imbalance among patterns belonging to different classes. This paper integrates sampling techniques with common classification techniques to form an ensemble model for the software defect prediction problem. Empirical results on benchmark datasets of software projects show the promising performance of our proposal in comparison with individual classifiers.

Crossref

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Survey on Mutation-based Test Data Generation

Author: Khuat Thanh Tung
Le Thi My Hanh
Nguyen Thanh Binh
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/10/2015

The critical activity of testing is the systematic selection of suitable test cases that are able to reveal as many faults as possible. Mutation coverage is therefore an effective criterion for generating test data. Because generating test data manually is labor-intensive, time-consuming, and error-prone, automating this process is highly desirable. Research on automatic test data generation has contributed a set of tools, approaches, developments, and empirical results. In this paper, we conduct a comprehensive survey of mutation-based test data generation. The paper also analyses the trends in this field.
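To make the mutation coverage criterion concrete, the following is a minimal sketch rather than anything taken from the surveyed paper. The function `original`, the three hand-written mutants, and the helper `mutation_score` are all hypothetical names chosen for illustration. A mutant counts as "killed" when at least one test input makes its output differ from the original program's output, and the score is the fraction of mutants killed.

```python
def original(a, b):
    """Program under test: returns the larger of two numbers."""
    return a if a > b else b


# Hypothetical mutants, each a small syntactic change to `original`.
MUTANTS = {
    "replace > with >=": lambda a, b: a if a >= b else b,  # equivalent mutant
    "replace > with <": lambda a, b: a if a < b else b,
    "always return b": lambda a, b: b,
}


def mutation_score(tests):
    """Fraction of mutants killed by the given list of (a, b) inputs."""
    killed = 0
    for mutant in MUTANTS.values():
        # A mutant is killed if any test input exposes a different output.
        if any(mutant(a, b) != original(a, b) for a, b in tests):
            killed += 1
    return killed / len(MUTANTS)


weak_tests = [(1, 2)]
stronger_tests = [(1, 2), (3, 1)]
print(mutation_score(weak_tests))      # 1 of 3 mutants killed
print(mutation_score(stronger_tests))  # 2 of 3 mutants killed
```

The `>=` mutant behaves identically to the original on every input, so no test data can kill it. Equivalent mutants of this kind are one reason a perfect score is often unreachable in practice.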

Crossref

Institute of Advanced Engineering and Science

An improved online learning algorithm for general fuzzy min-max neural network

Author: Chen Fang
Gabrys Bogdan
Khuat Thanh Tung
Publication date: 08/01/2020

This paper proposes an improved version of the current online learning algorithm for a general fuzzy min-max neural network (GFMM) to tackle existing issues concerning the expansion and contraction steps as well as the handling of unseen data located on decision boundaries. These drawbacks lower the classification performance of the original algorithm. The proposed approach does not use the contraction process for overlapping hyperboxes, which, as shown in the literature, is likely to increase the error rate. The empirical results indicate improved classification accuracy and stability of the proposed method compared to the original version and other fuzzy min-max classifiers. To reduce the sensitivity of this new online learning algorithm to the presentation order of the training samples, a simple ensemble method is also proposed.
Comment: 9 pages, 8 tables, 6 figures
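As a rough illustration of the hyperbox mechanics this abstract refers to, the sketch below computes a fuzzy membership degree and attempts an expansion step for a single hyperbox. It follows the commonly cited GFMM formulation (a ramp-based membership function and a per-dimension size limit), restricted to point inputs. The names `ramp`, `membership`, `try_expand`, `gamma`, and `theta` are assumptions made for this example, and the code is not the improved algorithm from the paper.

```python
def ramp(r, gamma):
    """Ramp threshold: 0 when r*gamma < 0, 1 when r*gamma > 1, linear between."""
    return min(1.0, max(0.0, r * gamma))


def membership(x, v, w, gamma=1.0):
    """Membership of point x in the hyperbox with min corner v and max corner w.

    Returns 1.0 when x lies inside the box and decays with the distance
    to the box along the worst dimension (gamma is the sensitivity).
    """
    return min(
        min(1.0 - ramp(xi - wi, gamma), 1.0 - ramp(vi - xi, gamma))
        for xi, vi, wi in zip(x, v, w)
    )


def try_expand(x, v, w, theta):
    """Expand the box to cover x if no side would exceed theta.

    Returns the new (v, w) pair, or None when the size limit is violated.
    """
    new_v = [min(vi, xi) for vi, xi in zip(v, x)]
    new_w = [max(wi, xi) for wi, xi in zip(w, x)]
    if all(nw - nv <= theta for nv, nw in zip(new_v, new_w)):
        return new_v, new_w
    return None


box_min, box_max = [0.2, 0.2], [0.4, 0.4]
print(membership([0.3, 0.3], box_min, box_max))        # point inside the box
print(membership([0.5, 0.3], box_min, box_max))        # point just outside
print(try_expand([0.5, 0.3], box_min, box_max, 0.35))  # expansion allowed
print(try_expand([0.9, 0.3], box_min, box_max, 0.35))  # expansion rejected
```

This sketch leaves out the overlap test between hyperboxes of different classes, which is the step where the original algorithm would apply contraction and where the paper's approach differs.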

arXiv.org e-Print Archive

Crossref

A Novel Technique of Optimization for the COCOMO II Model Parameters using Teaching-Learning-Based Optimization Algorithm, Journal of Telecommunications and Information Technology, 2016, nr 1

Author: Khuat Thanh Tung
Le Hanh Le
Publication venue: 'National Institute of Telecommunications'

Software cost estimation is a critical activity in the development life cycle for controlling risks and planning project schedules. Accurate estimation of the cost before the start of a project is essential for both developers and customers. Many models have therefore been proposed to address this issue, among which COCOMO II has been widely employed in real software projects. Good estimation models, such as COCOMO II, help avoid allocating insufficient resources to a project. However, the parameters of the estimation formula in this model have not yet been optimized, so the estimated results are not close to the actual results. In this paper, a novel technique for optimizing the coefficients of the COCOMO II model using the teaching-learning-based optimization (TLBO) algorithm is proposed. The performance of the model after parameter optimization was tested on a NASA software project dataset. The results indicate that the optimized parameters provide better estimation capability than the original COCOMO II model.
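For readers unfamiliar with the estimation formula being tuned, the sketch below evaluates the COCOMO II post-architecture effort equation, PM = A * Size^E * product(EM) with E = B + 0.01 * sum(SF), using the published COCOMO II.2000 constants A = 2.94 and B = 0.91 as defaults. Which coefficients the paper optimizes and which error measure it minimizes are not stated in the abstract, so treating `a` and `b` as the tunable parameters and using the mean magnitude of relative error (`mmre`) as the objective are assumptions here. The function names and the project record are hypothetical.

```python
import math

A_DEFAULT = 2.94  # multiplicative constant, COCOMO II.2000 calibration
B_DEFAULT = 0.91  # exponent base, COCOMO II.2000 calibration


def effort_pm(ksloc, scale_factors, effort_multipliers, a=A_DEFAULT, b=B_DEFAULT):
    """Estimated effort in person-months for a project of `ksloc` KSLOC."""
    exponent = b + 0.01 * sum(scale_factors)
    return a * ksloc ** exponent * math.prod(effort_multipliers)


def mmre(projects, a, b):
    """Mean magnitude of relative error over (ksloc, SF, EM, actual_pm) records."""
    errors = [
        abs(actual - effort_pm(ksloc, sf, em, a, b)) / actual
        for ksloc, sf, em, actual in projects
    ]
    return sum(errors) / len(errors)


# Hypothetical project record, for illustration only (not from the NASA dataset).
projects = [(10.0, [3.0, 3.0, 4.0, 3.0, 4.0], [1.0, 1.1], 40.0)]
print(effort_pm(*projects[0][:3]))
print(mmre(projects, A_DEFAULT, B_DEFAULT))
```

An optimizer such as TLBO would search over candidate `(a, b)` pairs and keep those with the lowest error on historical projects. The search itself is omitted from this sketch.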

Biblioteka Cyfrowa Instytutu Łączności / National Institute of Telecomunications: Digital Library

Application of Machine Learning to Performance Assessment for a class of PID-based Control Systems

Author: Czeczot Jacek
Gabrys Bogdan
Grelewicz Patryk
Khuat Thanh Tung
Klopot Tomasz
Publication date: 08/01/2021

In this paper, a novel machine-learning-derived control performance assessment (CPA) classification system is proposed. It is intended for a class of PID-based control loops with processes exhibiting second-order plus delay time (SOPDT) dynamics. The proposed concept is based on deriving and combining a number of diverse control performance indices (CPIs) that, taken separately, do not provide sufficient information about the control performance. When combined and used as discriminative features of the assessed control system, however, they can provide consistent and accurate CPA information. This concept is discussed in terms of the introduced extended set of CPIs, a comprehensive performance assessment of different machine-learning-based classification methods, and the practical applicability of the suggested solution. The latter is shown and verified by applying the proposed approach to a CPA system for a laboratory heat exchange and distribution setup.
Comment: Submitted to IEEE Transactions on Industrial Electronics

arXiv.org e-Print Archive