
    Optimisation approaches for data mining in biological systems

    The advances in data acquisition technologies have generated massive amounts of data that present a considerable challenge for analysis. How to efficiently and automatically mine through the data and extract the maximum value by identifying the hidden patterns is an active research area, called data mining. This thesis tackles several problems in data mining, including data classification, regression analysis and community detection in complex networks, with considerable applications in various biological systems. First, the problem of data classification is investigated. An existing classifier has been adopted from the literature and two novel solution procedures have been proposed, which are shown to improve the predictive accuracy of the original method and significantly reduce the computational time. Disease classification using high-throughput genomic data is also addressed. To tackle the problem of analysing a large number of genes against a small number of samples, a new approach of incorporating extra biological knowledge and constructing higher-level composite features for classification has been proposed. A novel model has been introduced to optimise the construction of composite features. Subsequently, regression analysis is considered, where two piece-wise linear regression methods have been presented. The first method partitions one feature into multiple complementary intervals and fits each with a distinct linear function. The other method is a more generalised variant of the previous one and performs recursive binary partitioning that permits partitioning of multiple features. Lastly, community detection in complex networks is investigated, where a new optimisation framework is introduced to identify the modular structure hidden in directed networks via optimisation of modularity. A non-linear model is first proposed before its linearised variant is presented. The optimisation framework consists of two major steps: solving the non-linear model to identify a coarse initial partition, and then repeatedly solving the linearised models to refine the network partition.
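
    As an illustration of the community-detection objective described above, the sketch below computes the directed-modularity score of a fixed partition for a small hypothetical directed network; the toy adjacency matrix, partition and use of NumPy are illustrative assumptions, not the thesis's optimisation models.

```python
# A minimal sketch of the directed-modularity objective; data are illustrative.
import numpy as np

def directed_modularity(A, communities):
    """Q = (1/m) * sum_ij [A_ij - k_i_out * k_j_in / m] * delta(c_i, c_j)."""
    m = A.sum()                       # total directed edge weight
    k_out = A.sum(axis=1)             # out-degrees
    k_in = A.sum(axis=0)              # in-degrees
    expected = np.outer(k_out, k_in) / m
    same = np.equal.outer(communities, communities)  # delta(c_i, c_j)
    return ((A - expected) * same).sum() / m

# Toy directed network: two 3-node groups with dense internal links.
A = np.zeros((6, 6))
A[0, 1] = A[1, 2] = A[2, 0] = 1       # cycle inside group 0
A[3, 4] = A[4, 5] = A[5, 3] = 1       # cycle inside group 1
A[2, 3] = 1                           # single link between groups
print(directed_modularity(A, np.array([0, 0, 0, 1, 1, 1])))
```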

    Optimisation Models for Pathway Activity Inference in Cancer

    BACKGROUND: With advances in high-throughput technologies, there has been an enormous increase in data related to profiling the activity of molecules in disease. While such data provide more comprehensive information on cellular actions, their large volume and complexity pose difficulty in accurate classification of disease phenotypes. Therefore, novel modelling methods that can improve accuracy while offering interpretable means of analysis are required. Biological pathways can be used to incorporate a priori knowledge of biological interactions to decrease data dimensionality and increase the biological interpretability of machine learning models. METHODOLOGY: A mathematical optimisation model is proposed for pathway activity inference towards precise disease phenotype prediction and is applied to RNA-Seq datasets. The model is based on mixed-integer linear programming (MILP) mathematical optimisation principles and infers pathway activity as the linear combination of pathway member gene expression, multiplying expression values by model-determined gene weights that are optimised to maximise discrimination of phenotype classes and minimise incorrect sample allocation. RESULTS: The model is evaluated on the transcriptome of breast and colorectal cancer, and yields solutions of good optimality as well as good prediction performance on related cancer subtypes. Two baseline pathway activity inference methods and three advanced methods are used for comparison. Sample prediction accuracy, robustness against noisy expression data, and survival analysis suggest competitive prediction performance of our model while providing interpretability and insight on key pathways and genes. Overall, our work demonstrates that the flexible nature of mathematical programming lends itself well to developing efficient computational strategies for pathway activity inference and disease subtype prediction.
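
    A minimal sketch of the abstract's central idea, assuming a tiny synthetic dataset and the open-source PuLP/CBC solver rather than the authors' exact formulation: pathway activity is a weighted sum of member-gene expression, and a MILP chooses the gene weights and a class threshold so that the number of misallocated samples is minimised.

```python
# Hedged sketch only: illustrative data, margin and big-M, not the published model.
import numpy as np
import pulp

rng = np.random.default_rng(0)
genes, samples = 5, 8
X = rng.normal(size=(samples, genes))          # expression matrix (samples x genes)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])         # phenotype labels
X[y == 1, :2] += 1.5                           # make two genes mildly informative

prob = pulp.LpProblem("pathway_activity", pulp.LpMinimize)
w = [pulp.LpVariable(f"w{g}", -1, 1) for g in range(genes)]           # gene weights
t = pulp.LpVariable("threshold")                                      # class boundary
z = [pulp.LpVariable(f"z{s}", cat="Binary") for s in range(samples)]  # misallocated?

M, margin = 100, 0.5
for s in range(samples):
    activity = pulp.lpSum(w[g] * float(X[s, g]) for g in range(genes))
    if y[s] == 1:   # class-1 samples should score above the threshold
        prob += activity >= t + margin - M * z[s]
    else:           # class-0 samples should score below it
        prob += activity <= t - margin + M * z[s]

prob += pulp.lpSum(z)                          # objective: minimise misallocations
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("weights:", [round(v.value(), 2) for v in w],
      "misallocated:", sum(v.value() for v in z))
```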

    Identification of pathway and gene markers using enhanced directed random walk for multiclass cancer expression data

    Cancer markers play a significant role in the diagnosis of the origin of cancers and in the detection of cancers from initial treatments. This is a challenging task owing to the heterogeneous nature of cancers. Identification of these markers could help improve the survival rate of cancer patients, since dedicated treatment, or even prevention, can be provided according to the diagnosis. Previous investigations show that the use of pathway topology information could help in the detection of cancer markers from gene expression. Such analysis reduces the complexity of the problem from thousands of genes to a few hundred pathways. However, most of the existing methods group different cancer subtypes together as generic disease samples and consider that all pathways contribute equally in the analysis process. Meanwhile, the interaction between multiple genes and the genes with missing edges have been ignored in several other methods, which can lead to poor performance in identifying cancer markers from gene expression. Thus, this research proposes an enhanced directed random walk to identify pathway and gene markers for multiclass cancer gene expression data. Firstly, an improved pathway selection with analysis of variance (ANOVA) that enables the consideration of multiple cancer subtypes is performed, and subsequently the integration of k-means clustering and the average silhouette method into the directed random walk, which considers the interaction of multiple genes, is also conducted. The proposed methods are tested on benchmark gene expression datasets (breast, lung, and skin cancers) and biological pathways. The performance of the proposed methods is then measured and compared in terms of classification accuracy and area under the receiver operating characteristic curve (AUC). The results indicate that the proposed methods are able to identify a list of pathway and gene markers from the datasets with better classification accuracy and AUC. The proposed methods improved classification performance by between 1% and 35% compared with existing methods. The cell cycle and p53 signalling pathways were found to be significantly associated with breast, lung, and skin cancers, while the cell cycle was highly enriched in squamous cell carcinoma and adenocarcinoma.
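
    The sketch below illustrates the directed random walk with restart that underlies the proposed enhanced method, scoring genes on a small hypothetical directed pathway graph; the adjacency matrix, restart probability and seed scores are illustrative assumptions rather than the paper's settings.

```python
# Minimal directed random walk with restart over a toy gene graph (illustrative only).
import numpy as np

def directed_random_walk(W, p0, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W_norm^T p + r * p0 until convergence."""
    out_deg = W.sum(axis=1, keepdims=True)
    W_norm = np.divide(W, out_deg, out=np.zeros_like(W), where=out_deg > 0)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W_norm.T @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p

# Toy 4-gene directed pathway graph; seed weights could come from ANOVA-style scores.
W = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
p0 = np.array([0.4, 0.3, 0.2, 0.1])   # e.g. normalised differential-expression scores
print(directed_random_walk(W, p0))    # walk-based gene importance scores
```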

    Meta Heuristics based Machine Learning and Neural Mass Modelling Allied to Brain Machine Interface

    New understanding of brain function and the increasing availability of low-cost, non-invasive electroencephalogram (EEG) recording devices have made the brain-computer interface (BCI) an alternative option for augmenting human capabilities by providing a new non-muscular channel for sending commands, which could be used to activate electronic or mechanical devices based on the modulation of thoughts. In this project, our emphasis is on how to develop such a BCI using fuzzy rule-based systems (FRBSs), metaheuristics and neural mass models (NMMs). In particular, we treat the BCI system as an integrated problem consisting of mathematical modelling, machine learning and classification. Four main steps are involved in designing a BCI system: 1) data acquisition, 2) feature extraction, 3) classification and 4) transferring the classification outcome into control commands for extended peripheral capability. Our focus has been placed on the first three steps. This research project aims to investigate and develop a novel BCI framework encompassing classification based on machine learning, optimisation and neural mass modelling. The primary aim of this project is to bridge the gap between these three different areas in a bid to design a more reliable and accurate communication path between the brain and the external world. To achieve this goal, the following objectives have been investigated: 1) Steady-State Visual Evoked Potential (SSVEP) EEG data are collected from human subjects and pre-processed; 2) a feature extraction procedure is implemented to detect and quantify the characteristics of brain activity which indicate the intention of the subject; 3) a classification mechanism called the Immune Inspired Multi-Objective Fuzzy Modelling Classification algorithm (IMOFM-C) is adapted as a binary classification approach for classifying binary EEG data, and the DDAG-Distance aggregation approach is then proposed to aggregate the outcomes of IMOFM-C based binary classifiers for multi-class classification; 4) building on IMOFM-C, a preference-based ensemble classification framework known as IMOFM-CP is proposed to enhance the convergence performance and diversity of each individual component classifier, leading to improved overall classification accuracy on multi-class EEG data; and 5) finally, a robust parameterisation approach which combines a single-objective GA and a clustering algorithm with a set of newly devised objective and penalty functions is proposed to obtain robust sets of synaptic connectivity parameters of a thalamic neural mass model (NMM). The parametrisation approach aims to cope with the nonlinear nature normally involved in describing the multifarious features of brain signals.
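
    As a sketch of step 2 (feature extraction), the code below estimates spectral power at candidate stimulation frequencies from a synthetic single-channel SSVEP-like signal; the sampling rate, epoch length and stimulus frequencies are assumptions, not the project's recording protocol.

```python
# Hedged sketch of SSVEP band-power feature extraction on a synthetic signal.
import numpy as np
from scipy.signal import welch

fs = 256                                   # sampling rate (Hz), assumed
t = np.arange(0, 4, 1 / fs)                # 4-second epoch
# Synthetic EEG: a 10 Hz SSVEP-like component buried in noise.
eeg = 2.0 * np.sin(2 * np.pi * 10 * t) + np.random.default_rng(1).normal(0, 1, t.size)

freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)   # Welch power spectral density

def band_power(target, half_width=0.5):
    """Mean PSD in a narrow band around a candidate stimulation frequency."""
    mask = (freqs >= target - half_width) & (freqs <= target + half_width)
    return psd[mask].mean()

stimulation_freqs = [8, 10, 12, 15]        # hypothetical SSVEP stimulus frequencies
features = {f: band_power(f) for f in stimulation_freqs}
print(max(features, key=features.get), features)   # the 10 Hz band should dominate
```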

    Genetic algorithm-neural network: feature extraction for bioinformatics data.

    With the advance of gene expression data in the bioinformatics field, the questions which frequently arise, for both computer and medical scientists, are which genes are significantly involved in discriminating cancer classes and which genes are significant with respect to a specific cancer pathology. Numerous computational analysis models have been developed to identify informative genes from microarray data; however, the integrity of the reported genes is still uncertain. This is mainly due to misconceptions about the objectives of microarray studies. Furthermore, the application of various preprocessing techniques to the microarray data has jeopardised its quality. As a result, the integrity of the findings has been compromised by the improper use of techniques and the ill-conceived objectives of the study. This research proposes an innovative hybridised model based on genetic algorithms (GAs) and artificial neural networks (ANNs) to extract the highly differentially expressed genes for a specific cancer pathology. The proposed method can efficiently extract the informative genes from the original data set, which reduces the gene variability errors incurred by the preprocessing techniques. The novelty of the research comes from two perspectives. Firstly, the research emphasises extracting informative features from a high-dimensional and highly complex data set, rather than improving classification results. Secondly, it uses an ANN to compute the fitness function of the GA, which is rare in the context of feature extraction. Two benchmark microarray data sets have been used to investigate the prominent genes expressed during tumour development, and the results show that the genes respond to different stages of tumourigenesis (i.e. different fitness precision levels), which may be useful for early malignancy detection. The extraction ability of the proposed model is validated against the expected results on synthetic data sets. In addition, two bioassay data sets have been used to examine the efficiency of the proposed model in extracting significant features from large, imbalanced bioassay data with multiple data representations.
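
    A minimal sketch of the GA-ANN idea, assuming synthetic data and scikit-learn in place of the thesis's implementation: a binary chromosome selects genes, and a small neural network's cross-validated accuracy serves as the GA fitness.

```python
# Hedged sketch only: synthetic data, toy GA settings, not the thesis's model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=60, n_features=30, n_informative=5, random_state=0)

def fitness(mask):
    """GA fitness: cross-validated accuracy of a small ANN on the selected genes."""
    if mask.sum() == 0:
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(10, X.shape[1]))           # initial binary population
for generation in range(5):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-4:]]                 # keep the fittest
    children = []
    for _ in range(len(pop) - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, X.shape[1])                  # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(child.size) < 0.05               # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected gene indices:", np.flatnonzero(best))
```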

    Bioinformatics Applications Based On Machine Learning

    The great advances in information technology (IT) have implications for many sectors, such as bioinformatics, and have considerably increased their possibilities. This book presents a collection of 11 original research papers, all of them related to the application of IT-related techniques within the bioinformatics sector: from new applications created from the adaptation and application of existing techniques to the creation of new methodologies to solve existing problems.

    Enhancing remanufacturing automation using deep learning approach

    In recent years, remanufacturing has attracted significant interest from researchers and practitioners as a means to improve efficiency through maximum value recovery of products at end-of-life (EoL). It is the process of returning used products, known as EoL products, to an as-new condition with a matching or higher warranty than new products. However, these remanufacturing processes are complex and time-consuming to implement manually, reducing productivity and posing dangers to personnel. These challenges call for automating the various remanufacturing process stages to achieve higher throughput and reduced lead time, cost and environmental impact while maximising economic gains. Besides, as highlighted by various research groups, there is currently a shortage of adequate remanufacturing-specific technologies to achieve full automation. -- This research explores automating remanufacturing processes to improve competitiveness by analysing and developing deep learning-based models for automating different stages of the remanufacturing process. Analysing deep learning algorithms represents a viable option for investigating and developing technologies capable of overcoming the outlined challenges. Deep learning involves using artificial neural networks to learn high-level abstractions in data. Deep learning (DL) models are inspired by the human brain and have produced state-of-the-art results in pattern recognition, object detection and other applications. The research further investigates empirical data on torque converter components recorded from a remanufacturing facility in Glasgow, UK, using in-case and cross-case analysis to evaluate the remanufacturing inspection, sorting and process control applications. -- The developed algorithms supported capturing, pre-processing, training, deploying and evaluating the performance of the respective processes. The experimental evaluation of the in-case and cross-case analysis using model prediction accuracy, misclassification rate and model loss highlights that the developed models achieved a high prediction accuracy of above 99.9% across the sorting, inspection and process control applications. Furthermore, a low model loss between 3×10⁻³ and 1.3×10⁻⁵ was obtained, alongside a misclassification rate between 0.01% and 0.08% across the three applications investigated, thereby highlighting the capability of the developed deep learning algorithms to perform sorting, process control and inspection in remanufacturing. The results demonstrate the viability of adopting deep learning-based algorithms for automating remanufacturing processes, achieving safer and more efficient remanufacturing. -- Finally, this research is unique because it is the first to investigate the use of deep learning and qualitative torque-converter image data for modelling remanufacturing sorting, inspection and process control applications. It also delivers a custom computational model that has the potential to enhance remanufacturing automation when utilised. The findings and publications benefit both academics and industrial practitioners. Furthermore, the model is easily adaptable to other remanufacturing applications with minor modifications to enhance process efficiency in today's workplaces.
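
    A hedged sketch of the kind of convolutional classifier such a sorting or inspection stage could use, written with TensorFlow/Keras; the input size, number of classes and layer sizes are assumptions, not the thesis's architecture.

```python
# Hedged sketch: a small CNN for classifying component images (assumed shapes/classes).
import tensorflow as tf

num_classes = 4                      # e.g. component types on the sorting line (assumed)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 3)),           # RGB component image (assumed)
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",      # integer class labels
              metrics=["accuracy"])                        # reported alongside model loss
model.summary()
# Training would then follow the usual pattern, e.g.:
# model.fit(train_images, train_labels, validation_data=(val_images, val_labels), epochs=10)
```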

    Experimental investigation and modelling of the heating value and elemental composition of biomass through artificial intelligence

    Abstract: Knowledge advancement in artificial intelligence and blockchain technologies provides new potential for predictive reliability in the biomass energy value chain. However, for a prediction approach to stand against experimental methodology, the prediction accuracy is expected to be high in order to develop high-fidelity and robust software which can serve as a tool in the decision-making process. The global standards related to classification methods and the energetic properties of biomass are still evolving, given the different observations and results which have been reported in the literature. Apart from these, there is a need for a holistic understanding of the effect of particle size and geospatial factors on the physicochemical properties of biomass to increase the uptake of bioenergy. Therefore, this research carried out an experimental investigation of selected bioresources and also developed high-fidelity models built on artificial intelligence capabilities to accurately classify the biomass feedstocks and predict the main elemental composition (carbon, hydrogen, and oxygen) on a dry basis and the heating value (MJ/kg) of biomass... Ph.D. (Mechanical Engineering Science)
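
    A hedged sketch of the kind of ANN regression described above, predicting the heating value (MJ/kg) from dry-basis carbon, hydrogen and oxygen content; the data below are synthetic stand-ins generated from a Dulong-type correlation, not the thesis's experimental measurements.

```python
# Hedged sketch: ANN regression of heating value from elemental composition (synthetic data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 200
C = rng.uniform(40, 55, n)           # carbon, wt% dry basis
H = rng.uniform(4, 7, n)             # hydrogen, wt% dry basis
O = rng.uniform(30, 50, n)           # oxygen, wt% dry basis
# Synthetic target loosely following a Dulong-type correlation, plus noise.
HHV = 0.338 * C + 1.428 * (H - O / 8) + rng.normal(0, 0.3, n)

X = np.column_stack([C, H, O])
X_train, X_test, y_train, y_test = train_test_split(X, HHV, random_state=0)

model = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000, random_state=0)
model.fit(X_train, y_train)
print("R^2 on held-out samples:", round(model.score(X_test, y_test), 3))
```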