8 research outputs found

    Cat Swarm based Optimization of Gene Expression Data Classification

    Abstract: An Artificial Neural Network (ANN) can provide solutions to a variety of complex problems. Its generalization ability, which stems from its massively parallel processing capability, can be used to learn the patterns found in a data set and to represent them as a set of rules. These rules can then be used to solve a classification problem. However, the learning ability of an ANN degrades on high-dimensional datasets. To reduce this risk, we use Principal Component Analysis (PCA) and Factor Analysis (FA) to provide a feature-reduced dataset to the classifier, a Multi Layer Perceptron (MLP). In addition, because the weight matrices are randomly initialized, we use the Cat Swarm Optimization (CSO) method to update the weight values of the weight matrix. The experimental evaluation found that using CSO with the MLP classifier gives better classification accuracy than using the classifier alone.

    An improved bees algorithm local search mechanism for numerical dataset

    The Bees Algorithm (BA), a heuristic optimization procedure, is one of the fundamental search techniques and is based on the food-foraging activities of bees. The algorithm combines an exploitative neighbourhood search with a random explorative search. However, the main issue with BA is that it requires long computational time and numerous computational processes to obtain a good solution, especially for more complicated problems. The approach does not guarantee an optimal solution, mainly because of a lack of accuracy. To address this issue, the local search in BA has been investigated: Simple swap, 2-Opt and 3-Opt were proposed as the Massudi methods for Bees Algorithm Feature Selection (BAFS). This study proposes 4-Opt as an extension of the search neighbourhood. The proposal was implemented, and its performance was comprehensively compared and analysed with respect to accuracy and time. Furthermore, the feature selection algorithm was implemented and tested using the most popular dataset from the UCI Machine Learning Repository. The experimental results confirmed that the proposed extension of the search neighbourhood, including the 4-Opt approach, provides better accuracy in suitable time than the Massudi methods.
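As a rough illustration of two of the neighbourhood moves named above, the sketch below shows a simple swap and a 2-Opt move in their generic form on a sequence (2-Opt is best known from routing problems, where it reverses a segment of a tour). The abstract does not say how BAFS maps these moves onto feature subsets, so the function names and the sequence encoding are assumptions made for illustration only.

```python
def simple_swap(seq, i, j):
    """Return a copy of seq with the items at positions i and j exchanged."""
    out = list(seq)
    out[i], out[j] = out[j], out[i]
    return out


def two_opt_move(seq, i, j):
    """Return a copy of seq with the segment seq[i..j] (inclusive) reversed."""
    if i > j:
        i, j = j, i
    out = list(seq)
    out[i:j + 1] = reversed(out[i:j + 1])
    return out


def neighbourhood(seq, move):
    """Yield every neighbour obtained by applying move to one index pair i < j."""
    n = len(seq)
    for i in range(n - 1):
        for j in range(i + 1, n):
            yield move(seq, i, j)
```

A local search would score each neighbour with the problem's objective and keep the best one; 3-Opt and 4-Opt enlarge the neighbourhood by reconnecting more segments per move, which is why they cost more time per step.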

    Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives

    The term big data characterizes the massive amounts of data generated by advanced technologies in different domains, using the 4Vs (volume, velocity, variety, and veracity) to indicate the amount of data that can only be processed via computationally intensive analysis, the speed of their creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) but a small number of samples, and are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, complex correlations may exist between their features, and many domains, including finance, lack the analytic tools to mine the data for knowledge discovery because of the high dimensionality. Feature selection is an optimization problem: find a minimal subset of relevant features that maximizes classification accuracy and reduces computation. Traditional statistical feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic algorithm and a divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.
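The decomposition idea described above can be sketched in a few lines. This is a minimal toy, not the method of any surveyed paper: the fitness function is a hypothetical stand-in for classifier accuracy with a subset-size penalty, the grouping is a fixed round-robin split, and all names and default parameters are assumptions.

```python
import random


def toy_fitness(mask, relevant, penalty=0.01):
    """Hypothetical stand-in for classifier accuracy minus a subset-size penalty."""
    hits = sum(mask[i] for i in relevant)
    return hits / len(relevant) - penalty * sum(mask)


def assemble(context, group, sub_mask):
    """Place one group's bits into a copy of the full context vector."""
    full = list(context)
    for idx, bit in zip(group, sub_mask):
        full[idx] = bit
    return full


def cooperative_coevolution(n_features, relevant, n_groups=4,
                            pop_size=6, generations=30, seed=0):
    """Toy cooperative co-evolution; assumes n_features >= n_groups."""
    rng = random.Random(seed)
    # Decompose: each group owns a disjoint set of feature indices.
    groups = [list(range(g, n_features, n_groups)) for g in range(n_groups)]
    # One small population of sub-masks per group (sub-problem).
    pops = [[[rng.randint(0, 1) for _ in grp] for _ in range(pop_size)]
            for grp in groups]
    context = [0] * n_features  # best full solution found so far

    for _ in range(generations):
        for grp, pop in zip(groups, pops):
            def score(sub_mask):
                # A sub-mask is evaluated together with the other groups' best bits.
                return toy_fitness(assemble(context, grp, sub_mask), relevant)

            rate = 1.0 / len(grp)
            for k, parent in enumerate(pop):
                # Bit-flip mutation; the child replaces the parent if no worse.
                child = [1 - b if rng.random() < rate else b for b in parent]
                if score(child) >= score(parent):
                    pop[k] = child
            best_sub = max(pop, key=score)
            # Update the shared context only on strict improvement.
            if score(best_sub) > toy_fitness(context, relevant):
                context = assemble(context, grp, best_sub)
    return context, toy_fitness(context, relevant)
```

Because each group is scored independently against a shared context, the per-group evaluations are a natural unit to distribute, which is the role the abstract assigns to MapReduce.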

    Evolutionary approaches for feature selection in biological data

    Data mining techniques have been used widely in many areas such as business, science, engineering and medicine. The techniques allow a vast amount of data to be explored in order to extract useful information. One of the foci in the health area is finding interesting biomarkers in biomedical data. Mass throughput data generated from microarrays and mass spectrometry of biological samples are high dimensional and small in sample size. Examples include DNA microarray datasets with up to 500,000 genes and mass spectrometry data with 300,000 m/z values. While the availability of such datasets can aid in the development of techniques and drugs to improve the diagnosis and treatment of diseases, a major challenge is analysing them to extract useful and meaningful information. The aims of this project are: 1) to investigate and develop feature selection algorithms that incorporate various evolutionary strategies, 2) to use the developed algorithms to find the “most relevant” biomarkers contained in biological datasets, and 3) to evaluate the goodness of the extracted feature subsets for relevance (examined in terms of existing biomedical domain knowledge and of the classification accuracy obtained using different classifiers). The project aims to generate good predictive models for distinguishing diseased samples from controls.

    Mechanisms Regulating HIV-1 Protease Activity

    The Human Immunodeficiency Virus Type 1 (HIV-1) Protease (PR) has no direct involvement in the early steps of HIV-1 replication. Nonetheless, it is the timely and ordered processing of the viral structural proteins by the HIV-1 PR during virion maturation that facilitates the successful completion of virus entry, reverse transcription, and integration. Though a considerable amount of research has been devoted to deciphering how the enzyme prepares a virus particle for infection, the mechanisms regulating its activities continue to remain incompletely defined. RNA serves as one putative regulatory factor, since efficient processing of the maturation intermediate p15NC requires RNA in vitro. Though previously believed relevant to only p15NC cleavage, I demonstrate that RNA enhances HIV-1 proteolysis reactions in a substrate-independent manner. The increased catalytic activity of the HIV-1 PR results from a direct interaction between RNA and the enzyme, with the magnitude of the effect dependent upon the size of the RNA molecule. Large (>400 base) RNAs accelerated proteolytic processing by over 100-fold under near-physiological conditions. This considerable change stemmed from both improved substrate recognition (Km) and turnover rate (kcat). Variability in amino acid sequence also guides HIV-1 PR activity. However, the absence of any overt patterns across HIV-1 cleavage sites has complicated the delineation of why these differences result in diverse processing efficiencies. To address this question, I generated the largest-to-date dataset of globular proteins cleaved by the HIV-1 PR in near-physiological conditions. From these data, I unravel a number of site-specific processing requirements, and identify potentially important relationships shared between multiple cleavage sites. These results additionally enabled the formation of a preliminary conceptual model for explaining processing site amino acid composition. Doctor of Philosoph
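For readers unfamiliar with the two kinetic parameters named in this abstract, the standard Michaelis-Menten rate law shows where Km and kcat enter. The abstract does not state which kinetic model was fitted, so applying this form here is an assumption; the symbols $v$ (initial rate), $[E]_0$ (total enzyme concentration) and $[S]$ (substrate concentration) are introduced only for illustration.

```latex
v = \frac{k_{\mathrm{cat}}\,[E]_0\,[S]}{K_m + [S]},
\qquad
v \approx \frac{k_{\mathrm{cat}}}{K_m}\,[E]_0\,[S]
\quad \text{when } [S] \ll K_m .
```

In the low-substrate limit the rate scales with the ratio $k_{\mathrm{cat}}/K_m$, so a lower $K_m$ and a higher $k_{\mathrm{cat}}$ both raise it, and the two effects multiply.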

    Intelligent Systems Approach for Classification and Management of Patients with Headache

    Primary headache disorders are the most common complaints worldwide. The socioeconomic and personal impact of headache disorders is enormous, as they are the leading cause of workplace absence. Consultations for headache are increasing as the population grows, people live longer, and many have multiple conditions. However, access to specialist services across the UK is currently inequitable because the number of trained consultant neurologists in the UK is 10 times lower than in other European countries. Additionally, more than two thirds of headache cases presented to primary care were labelled as unspecified headache. Therefore, an alternative pathway to diagnose and manage patients with primary headache could be crucial to reducing the need for specialist assessment and increasing capacity within the current service model. Several recent studies have targeted this issue through the development of clinical decision support systems, which can help non-specialist doctors and general practitioners to diagnose patients with primary headache disorders in primary clinics. However, the majority of these studies followed a rule-based approach, in which the rules were summarised and expressed by a computer engineer. This approach carries many downsides, which are discussed later in this dissertation. In this study, we adopt a completely different approach. Machine learning is used for the classification of primary headache disorders, for which a dataset of 832 records of patients with primary headaches was considered, originating from three medical centres located in Turkey.
Three main types of primary headache were derived from the data set: Tension Type Headache in both episodic and chronic forms, Migraine with and without Aura, and Trigeminal Autonomic Cephalalgia, which is further subdivided into Cluster headache, paroxysmal hemicrania and short-lasting unilateral neuralgiform headache attacks with conjunctival injection and tearing. Six popular machine-learning based classifiers, including linear and non-linear ensemble learning, in addition to one regression based procedure, have been evaluated for the classification of primary headaches within a supervised learning setting, achieving highest aggregate performance outcomes of AUC 0.923, sensitivity 0.897, and overall classification accuracy of 0.843. This study also introduces the proposed HydroApp system, an M-health based personalised application for the follow-up of patients with long-term conditions such as chronic headache and hydrocephalus. We developed this system under the supervision of headache specialists at Ashford hospital, London, and neurology experts at Walton Centre and Alder Hey hospital, Liverpool. We investigated the acceptance of such an M-health based system via an online questionnaire, in which 86% of paediatric patients and 60% of adult patients were interested in using the HydroApp system to manage their conditions. Features and functions offered by the HydroApp system, such as recording a headache score, recording general health and well-being, and alerting the treating team, were perceived as very or extremely important from the patients’ point of view. The study concludes that advances in intelligent systems and M-health applications offer a promising environment in which to identify alternative solutions, which in turn increase the capacity of the current service model and improve diagnostic capability in the primary headache domain and beyond.
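The three performance figures quoted above (AUC, sensitivity, accuracy) can be illustrated for the simplest binary case. This is a sketch only: the study is multi-class and the abstract does not say how the aggregate figures were computed, so the binary framing, the function names, and the 0/1 label encoding are all assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Fraction of all cases that were classified correctly."""
    tp, _, tn, _ = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Sensitivity and accuracy depend on a decision threshold, while AUC is computed from the raw scores, which is one reason the three numbers can differ as much as they do in the abstract.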

    Knowledge extraction from biomedical data using machine learning

    PhD Thesis. Thanks to the breakthroughs in biotechnologies that have occurred in recent years, biomedical data is accumulating at a previously unseen pace. In the field of biomedicine, decades-old statistical methods are still commonly used to analyse such data. However, the simplicity of these approaches often limits the amount of useful information that can be extracted from the data. Machine learning methods represent an important alternative due to their ability to capture complex patterns within the data that are likely missed by simpler methods. This thesis focuses on the extraction of useful knowledge from biomedical data using machine learning. Within the biomedical context, the vast majority of machine learning applications focus their effort on the generation and validation of prediction models. The inferred models are rarely used to discover meaningful biomedical knowledge. The work presented in this thesis goes beyond this scenario and devises new methodologies to mine machine learning models for the extraction of useful knowledge. The thesis targets two important and challenging biomedical analytic tasks: (1) the inference of biological networks and (2) the discovery of biomarkers. The first task aims to identify associations between different biological entities, while the second tries to discover sets of variables that are relevant for specific biomedical conditions. Successful solutions for both problems rely on the ability to recognise complex interactions within the data, hence the use of multivariate machine learning methods. The network inference problem is addressed with FuNeL, a protocol to generate networks based on the analysis of rule-based machine learning models. The second task, biomarker discovery, is studied with RGIFE, a heuristic that exploits the information extracted from machine learning models to guide its search for minimal subsets of variables.
The extensive analysis conducted for this dissertation shows that the networks inferred with FuNeL capture relevant knowledge complementary to that extracted by standard inference methods. Furthermore, the associations defined by FuNeL are found to be more pertinent in a disease context. The biomarkers selected by RGIFE are found to be disease-relevant and to have a high predictive power. When applied to osteoarthritis data, RGIFE confirmed the importance of previously identified biomarkers, whilst also extracting novel biomarkers with possible future clinical applications. Overall, the thesis shows new effective methods to leverage the information, often remaining buried, encapsulated within machine learning models and discover useful biomedical knowledge. European Union Seventh Framework Programme (FP7/2007-2013) that funded part of this work under the “D-BOARD” project (grant agreement number 305815)