Development of Robust and Scalable Hyperbox based Machine Learning Algorithms
University of Technology Sydney. Faculty of Engineering and Information Technology. With the rapid development of digital information and the increase in the amount of data, machine learning (ML) algorithms have been constantly developed and refined to discover new information and knowledge from different data sources. The use of hyperbox fuzzy sets as fundamental representational and building blocks in learning algorithms forms an important branch of ML. Hyperbox-based algorithms have great potential for high scalability and incremental adaptation in applications that operate in dynamically changing environments. Additionally, learning algorithms based on hyperbox representations can form interpretable models, which are highly desirable in areas that require safety and trust. This study aims to develop and expand robust, scalable, and transparent learning algorithms for hyperbox-based classification models, with a specific focus on the general fuzzy min-max neural network (GFMMNN).
First, a comprehensive survey of hyperbox-based machine learning models was conducted, together with empirical assessments of the GFMMNN on pattern classification problems. Next, a new online learning algorithm was proposed for the GFMMNN, and the robustness of the whole family of GFMMNN learning algorithms was improved so that they work effectively with mixed-attribute data by introducing a new learning mechanism for categorical features. In terms of scalability, the main steps of the learning algorithms were reformulated so that they can be executed efficiently on graphics processing units using matrix operations, and mathematical lemmas were proposed to reduce the redundancy of hyperbox candidates in the learning process. This thesis also proposed a novel method to enhance the transparency of classifiers while maintaining good classification performance by using hierarchical granular representations built from hyperbox fuzzy sets. The last contribution was a simple but powerful ensemble model built from many individual hyperbox-based classifiers trained on random subsets of both the sample and feature spaces. Extensive empirical analyses indicated that the proposed solutions are highly competitive with the other learning algorithms evaluated.
Ensemble learning for software fault prediction problem with imbalanced data
Fault prediction plays a crucial role in the software development process because it helps reduce defects and guides the testing process towards fault-free software components. Considerable effort has therefore been devoted to this type of issue, with static code characteristics usually adopted to construct fault classification models. One of the challenging problems affecting the performance of predictive classifiers is the high imbalance among patterns belonging to different classes. This paper aims to integrate sampling techniques with common classification techniques to form a useful ensemble model for the software defect prediction problem. Empirical results on benchmark datasets of software projects show the promising performance of our proposal in comparison with individual classifiers.
Survey on Mutation-based Test Data Generation
The critical activity of testing is the systematic selection of suitable test cases that are highly capable of revealing faults. Therefore, mutation coverage is an effective criterion for generating test data. Because generating test data manually is labor-intensive, time-consuming, and error-prone, automating this process is highly desirable. Research on automatic test data generation has contributed a set of tools, approaches, developments, and empirical results. In this paper, we conduct a comprehensive survey of mutation-based test data generation and analyse the trends in this field.
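As a minimal illustration of mutation coverage as a criterion for test data, the sketch below computes a mutation score (killed mutants divided by all mutants) for a candidate set of test inputs. The function under test, the three hand-written mutants, and the helper names are hypothetical and are not taken from the surveyed work.

```python
# Hypothetical function under test: returns the larger of two numbers.
def original(a, b):
    return a if a > b else b

# Hand-written mutants, each differing from the original by one small change.
def mutant_relational(a, b):   # '>' replaced by '<'
    return a if a < b else b

def mutant_operand(a, b):      # else-branch returns 'a' instead of 'b'
    return a if a > b else a

def mutant_equivalent(a, b):   # '>' replaced by '>='; same output for every input
    return a if a >= b else b

MUTANTS = [mutant_relational, mutant_operand, mutant_equivalent]

def killed_mutants(test_inputs):
    """Return the names of mutants whose output differs from the original
    on at least one test input (such mutants are said to be killed)."""
    killed = set()
    for mutant in MUTANTS:
        for args in test_inputs:
            if mutant(*args) != original(*args):
                killed.add(mutant.__name__)
                break
    return killed

def mutation_score(test_inputs):
    """Fraction of mutants killed by the given test inputs."""
    return len(killed_mutants(test_inputs)) / len(MUTANTS)
```

With the single input `(2, 1)` only `mutant_relational` is killed; adding `(1, 2)` also kills `mutant_operand`. The equivalent mutant can never be killed, which is one reason a mutation score of 1 is not always reachable in practice.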
An improved online learning algorithm for general fuzzy min-max neural network
This paper proposes an improved version of the current online learning
algorithm for a general fuzzy min-max neural network (GFMM) to tackle existing
issues concerning expansion and contraction steps as well as the way of dealing
with unseen data located on decision boundaries. These drawbacks lower its
classification performance, so an improved algorithm is proposed in this study
to address the above limitations. The proposed approach does not use the
contraction process for overlapping hyperboxes, which is more likely to
increase the error rate as shown in the literature. The empirical results
indicated the improvement in the classification accuracy and stability of the
proposed method compared to the original version and other fuzzy min-max
classifiers. To reduce the sensitivity of this new online learning algorithm
to the presentation order of the training samples, a simple ensemble
method is also proposed. Comment: 9 pages, 8 tables, 6 figures
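For readers unfamiliar with the expansion step mentioned above, the following is a minimal sketch of the hyperbox membership function and the expansion test as they are commonly formulated for the general fuzzy min-max network, restricted to crisp point inputs. The function names, the default `gamma`, and the example values are assumptions for illustration. The sketch omits class labels and the overlap test, and it is not the improved algorithm proposed in the paper.

```python
def ramp(r, gamma):
    """Ramp threshold function f(r, gamma): r * gamma clipped to [0, 1]."""
    return min(1.0, max(0.0, r * gamma))

def membership(x, v, w, gamma=1.0):
    """Membership of point x in the hyperbox with min point v and max point w.

    The value is 1.0 inside the box and decreases linearly with the distance
    outside it; gamma (assumed equal for all dimensions) sets the slope.
    """
    return min(
        min(1.0 - ramp(xi - wi, gamma), 1.0 - ramp(vi - xi, gamma))
        for xi, vi, wi in zip(x, v, w)
    )

def can_expand(x, v, w, theta):
    """Expansion test: after including x, no side of the box may exceed theta."""
    return all(
        max(wi, xi) - min(vi, xi) <= theta
        for xi, vi, wi in zip(x, v, w)
    )

def expand(x, v, w):
    """Return the new (min point, max point) after absorbing x into the box."""
    new_v = [min(vi, xi) for xi, vi in zip(x, v)]
    new_w = [max(wi, xi) for xi, wi in zip(x, w)]
    return new_v, new_w
```

For example, with a box spanning `[0.4, 0.4]` to `[0.6, 0.6]`, the point `[0.5, 0.5]` has full membership, while `[0.7, 0.5]` lies outside and can be absorbed only if `theta` permits a side length of about 0.3.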
A Novel Technique of Optimization for the COCOMO II Model Parameters using Teaching-Learning-Based Optimization Algorithm, Journal of Telecommunications and Information Technology, 2016, no. 1
Software cost estimation is a critical activity in the development life cycle for controlling risks and planning project schedules. Accurate estimation of the cost before the start of a project is essential for both developers and customers. Therefore, many models have been proposed to address this issue, among which COCOMO II has been widely employed in actual software projects. Good estimation models, such as COCOMO II, can prevent insufficient resources from being allocated to a project. However, the parameters of the estimation formula in this model have not yet been optimized, so the estimated results are not close to the actual results. In this paper, a novel technique is proposed to optimize the coefficients of the COCOMO II model using the teaching-learning-based optimization (TLBO) algorithm. The performance of the model with optimized parameters was tested on a NASA software project dataset. The results indicated that the optimized parameters provided better estimation capability than the original COCOMO II model.
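The idea can be sketched as follows, under several assumptions: the two coefficients being tuned are A and B of the COCOMO II effort equation, Effort = A × Size^(B + 0.01 × ΣSF) × ΠEM; the objective is the mean magnitude of relative error (MMRE); and TLBO is used in its basic form with a teacher phase and a learner phase. The project tuples are synthetic placeholders generated from made-up coefficients, not the NASA dataset, and all function names are hypothetical.

```python
import random

def cocomo_effort(a, b, size_kloc, sf_sum, em_product):
    """COCOMO II effort in person-months for one project."""
    return a * size_kloc ** (b + 0.01 * sf_sum) * em_product

def mmre(params, projects):
    """Mean magnitude of relative error of the model over all projects."""
    a, b = params
    return sum(
        abs(actual - cocomo_effort(a, b, size, sf, em)) / actual
        for size, sf, em, actual in projects
    ) / len(projects)

def clip(x, bounds):
    """Keep every coordinate of x inside its (low, high) bound."""
    return [min(hi, max(lo, xi)) for xi, (lo, hi) in zip(x, bounds)]

def tlbo(objective, bounds, initial, pop_size=20, iterations=50, seed=0):
    """Basic TLBO minimizer; `initial` seeds one learner in the population."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    pop[0] = list(initial)
    cost = [objective(p) for p in pop]
    for _ in range(iterations):
        # Teacher phase: move each learner towards the best solution,
        # away from the (scaled) population mean.
        teacher = pop[cost.index(min(cost))]
        mean = [sum(p[d] for p in pop) / pop_size for d in range(dim)]
        for i in range(pop_size):
            tf = rng.randint(1, 2)  # teaching factor, 1 or 2
            cand = clip([pop[i][d] + rng.random() * (teacher[d] - tf * mean[d])
                         for d in range(dim)], bounds)
            c = objective(cand)
            if c < cost[i]:  # greedy acceptance
                pop[i], cost[i] = cand, c
        # Learner phase: learn from a randomly chosen peer.
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            sign = 1.0 if cost[i] < cost[j] else -1.0
            cand = clip([pop[i][d] + sign * rng.random() * (pop[i][d] - pop[j][d])
                         for d in range(dim)], bounds)
            c = objective(cand)
            if c < cost[i]:
                pop[i], cost[i] = cand, c
    best = cost.index(min(cost))
    return pop[best], cost[best]

# Synthetic placeholder projects: (size in KLOC, sum of scale factors,
# product of effort multipliers, "actual" effort from made-up coefficients).
PROJECTS = [
    (size, sf, em, cocomo_effort(3.2, 0.95, size, sf, em))
    for size, sf, em in [(10, 15.0, 1.1), (46, 18.0, 0.9),
                         (120, 20.0, 1.3), (300, 12.0, 1.0)]
]
DEFAULT = (2.94, 0.91)  # published COCOMO II.2000 values of A and B
BOUNDS = [(1.0, 5.0), (0.5, 1.5)]  # assumed search ranges for A and B

best_params, best_error = tlbo(lambda p: mmre(p, PROJECTS), BOUNDS, DEFAULT)
```

Because the default coefficients are placed in the initial population and a candidate is accepted only when it lowers the error, the returned MMRE is never worse than that of the default model on the data used for tuning. Whether the tuned coefficients generalize to unseen projects is a separate question that this sketch does not address.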
Application of Machine Learning to Performance Assessment for a class of PID-based Control Systems
In this paper, a novel machine learning derived control performance assessment
(CPA) classification system is proposed. It is dedicated to a class of
PID-based control loops with processes exhibiting second order plus delay time
(SOPDT) dynamical properties. The proposed concept is based on deriving and
combining a number of different, diverse control performance indices (CPIs)
that separately do not provide sufficient information about the control
performance. However, when combined together and used as discriminative
features of the assessed control system, they can provide consistent and
accurate CPA information. This concept is discussed in terms of the introduced
extended set of CPIs, comprehensive performance assessment of different machine
learning based classification methods and practical applicability of the
suggested solution. The latter is shown and verified by practical application
of the proposed approach to a CPA system for a laboratory heat exchange and
distribution setup. Comment: Submitted to IEEE Transactions on Industrial Electronics