Toward a General-Purpose Heterogeneous Ensemble for Pattern Classification
We perform an extensive study of the performance of different classification approaches on twenty-five datasets (fourteen image datasets and eleven UCI data mining datasets). The aim is to find General-Purpose (GP) heterogeneous ensembles (requiring little to no parameter tuning) that perform competitively across multiple datasets. The state-of-the-art classifiers examined in this study include the support vector machine, Gaussian process classifiers, random subspace of AdaBoost, random subspace of rotation boosting, and deep learning classifiers. We demonstrate that a heterogeneous ensemble based on the simple fusion by sum rule of different classifiers performs consistently well across all twenty-five datasets. The most important result of our investigation is demonstrating that some very recent approaches, including the heterogeneous ensemble we propose in this paper, are capable of outperforming an SVM classifier (implemented with LibSVM), even when both kernel selection and SVM parameters are carefully tuned for each dataset.
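The sum-rule fusion the abstract refers to is the simplest combination scheme: each classifier emits (normalized) class scores, the scores are added across classifiers, and the ensemble predicts the class with the largest total. A minimal sketch with made-up posterior matrices (the classifier names and all numbers are illustrative, not the paper's):

```python
import numpy as np

# Hypothetical posterior probabilities from three heterogeneous classifiers
# (e.g., an SVM, a Gaussian process, a boosted ensemble) on 4 samples, 3 classes.
p_svm = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.5, 0.4, 0.1]])
p_gp = np.array([[0.6, 0.3, 0.1],
                 [0.2, 0.6, 0.2],
                 [0.2, 0.2, 0.6],
                 [0.3, 0.5, 0.2]])
p_boost = np.array([[0.8, 0.1, 0.1],
                    [0.3, 0.5, 0.2],
                    [0.1, 0.4, 0.5],
                    [0.4, 0.4, 0.2]])

# Sum rule: add the per-classifier scores, then take the argmax per sample.
fused_scores = p_svm + p_gp + p_boost
fused_pred = fused_scores.argmax(axis=1)
```

Because the rule needs no trained combiner and no per-dataset weights, it fits the "little to no parameter tuning" goal of a general-purpose ensemble.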
Learning to Select Pre-Trained Deep Representations with Bayesian Evidence Framework
We propose a Bayesian evidence framework to facilitate transfer learning from
pre-trained deep convolutional neural networks (CNNs). Our framework is
formulated on top of a least squares SVM (LS-SVM) classifier, which is simple
and fast in both training and testing, and achieves competitive performance in
practice. The regularization parameters in the LS-SVM are estimated
automatically, without grid search or cross-validation, by maximizing the
evidence, which is a
useful measure to select the best performing CNN out of multiple candidates for
transfer learning; the evidence is optimized efficiently by employing Aitken's
delta-squared process, which accelerates convergence of fixed point update. The
proposed Bayesian evidence framework also provides a good solution to identify
the best ensemble of heterogeneous CNNs through a greedy algorithm. Our
Bayesian evidence framework for transfer learning is tested on 12 visual
recognition datasets and consistently delivers state-of-the-art performance
in terms of prediction accuracy and modeling efficiency. Comment: Appearing in
CVPR 2016 (oral presentation).
Characterisation of large changes in wind power for the day-ahead market using a fuzzy logic approach
Wind power has become one of the renewable resources with a major growth in the electricity market. However, due to its inherent variability, forecasting techniques are necessary for the optimum scheduling of the electric grid, especially during ramp events. These large changes in wind power may not be captured by wind power point forecasts even with very high resolution Numerical Weather Prediction (NWP) models. In this paper, a fuzzy approach for wind power ramp characterisation is presented. The main benefit of this technique is that it avoids a binary definition of a ramp event, allowing the identification of changes in power output that can potentially turn into ramp events even when the total percentage of change required to qualify as a ramp event is not met. To study the application of this technique, wind power forecasts were obtained and their corresponding errors estimated using Genetic Programming (GP) and Quantile Regression Forests. The error distributions were incorporated into the characterisation process, which, according to the results, significantly improves ramp capture. Results are presented using colour maps, which provide a useful way to interpret the characteristics of the ramp events.
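The core of the fuzzy idea is to replace the binary rule "ramp iff |ΔP| ≥ threshold" with a graded degree of membership. A minimal sketch with a linear (trapezoidal-edge) membership function; the breakpoints are illustrative assumptions, not the paper's values:

```python
def ramp_membership(delta_power_pct, low=10.0, high=30.0):
    """Fuzzy degree of membership in the 'ramp event' class.

    Instead of a hard cut-off, changes between `low` and `high` percent of
    installed capacity receive a graded membership in [0, 1]; below `low`
    the change is definitely not a ramp, above `high` it definitely is.
    The 10%/30% breakpoints are hypothetical placeholders.
    """
    d = abs(delta_power_pct)  # ramps can be up or down
    if d <= low:
        return 0.0
    if d >= high:
        return 1.0
    return (d - low) / (high - low)
```

A change of 20% of capacity thus gets membership 0.5: not a full ramp event under a binary definition, but flagged as a potential one, which is exactly the near-miss behaviour the abstract describes.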
Training Big Random Forests with Little Resources
Without access to large compute clusters, building random forests on large
datasets is still a challenging problem. This is, in particular, the case if
fully-grown trees are desired. We propose a simple yet effective framework that
allows the efficient construction of ensembles of huge trees for hundreds of
millions or even billions of training instances using a cheap desktop computer
with commodity hardware. The basic idea is to consider a multi-level
construction scheme, which builds top trees for small random subsets of the
available data and which subsequently distributes all training instances to the
top trees' leaves for further processing. While being conceptually simple, the
overall efficiency crucially depends on the particular implementation of the
different phases. The practical merits of our approach are demonstrated using
dense datasets with hundreds of millions of training instances. Comment: 9
pages, 9 figures.
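The multi-level scheme can be caricatured in a few lines: fit a "top tree" on a small random subset, route every training instance to one of its leaves, then build a model per leaf. This is a toy sketch of that control flow, not the paper's implementation — the one-split stump and majority-vote leaf models stand in for real decision trees:

```python
import random
from collections import Counter, defaultdict

def build_top_tree_split(sample):
    """Fit the 'top tree' on a small random subset. Here it is a single
    median split on a 1-D feature; the paper's top trees are ordinary
    decision trees grown on the subset."""
    xs = sorted(x for x, _ in sample)
    return xs[len(xs) // 2]

def train_multilevel(data, subset_size=100, seed=0):
    rng = random.Random(seed)
    # Phase 1: top tree from a small random subset only.
    sample = rng.sample(data, min(subset_size, len(data)))
    threshold = build_top_tree_split(sample)
    # Phase 2: distribute *all* training instances to the top tree's leaves.
    leaves = defaultdict(list)
    for x, y in data:
        leaves[x <= threshold].append(y)
    # Phase 3: per-leaf processing (here a majority-vote label; the paper
    # grows full subtrees on each leaf's now much smaller instance set).
    leaf_models = {leaf: Counter(ys).most_common(1)[0][0]
                   for leaf, ys in leaves.items()}
    return threshold, leaf_models

def predict(model, x):
    threshold, leaf_models = model
    return leaf_models[x <= threshold]
```

The point of the layering is memory: only the small subset is needed to fit the top tree, and each leaf's instance set is a fraction of the full data, so the expensive tree growing never touches all instances at once.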
Pose-Normalized Image Generation for Person Re-identification
Person Re-identification (re-id) faces two major challenges: the lack of
cross-view paired training data and learning discriminative identity-sensitive
and view-invariant features in the presence of large pose variations. In this
work, we address both problems by proposing a novel deep person image
generation model for synthesizing realistic person images conditional on the
pose. The model is based on a generative adversarial network (GAN) designed
specifically for pose normalization in re-id, thus termed pose-normalization
GAN (PN-GAN). With the synthesized images, we can learn a new type of deep
re-id feature free of the influence of pose variations. We show that this
feature is strong on its own and complementary to features learned with the
original images. Importantly, under the transfer learning setting, we show that
our model generalizes well to any new re-id dataset without the need for
collecting any training data for model fine-tuning. The model thus has the
potential to make re-id models truly scalable. Comment: 10 pages, 5 figures.
Random model trees: an effective and scalable regression method
We present and investigate ensembles of randomized model trees as a novel regression method. Such ensembles combine the scalability of tree-based methods with predictive performance rivaling the state of the art in numeric prediction. An extensive empirical investigation shows that Random Model Trees produce predictive performance which is competitive with state-of-the-art methods like Gaussian Process Regression or Additive Groves of Regression Trees. The training
and optimization of Random Model Trees scale better than Gaussian Process Regression to larger datasets, and hold a consistent speed advantage of one to two orders of magnitude over Additive Groves.