594 research outputs found

    ACO-based feature selection algorithm for classification

    A dataset with a small number of records but a large number of attributes exhibits the phenomenon known as the “curse of dimensionality”. Classifying this type of dataset requires feature selection (FS) methods to extract useful information. The modified graph clustering ant colony optimisation (MGCACO) algorithm is an effective FS method based on grouping highly correlated features. However, the MGCACO algorithm has three main drawbacks in producing a feature subset, arising from its clustering method, its parameter sensitivity, and its final subset determination. An enhanced graph clustering ant colony optimisation (EGCACO) algorithm is proposed to solve these three problems. The proposed improvements are: (i) an ACO feature clustering method to obtain clusters of highly correlated features; (ii) an adaptive selection technique for subset construction from the clusters of features; and (iii) a genetic-based method for producing the final subset of features. The ACO feature clustering method uses mechanisms such as intensification and diversification for local and global optimisation to provide highly correlated features. The adaptive technique for ant selection lets the parameter change adaptively based on feedback from the search space. The genetic method determines the final subset automatically, based on crossover and subset quality calculation. The performance of the proposed algorithm was evaluated on 18 benchmark datasets from the University of California, Irvine (UCI) repository and nine deoxyribonucleic acid (DNA) microarray datasets against 15 benchmark metaheuristic algorithms. On the UCI datasets, the EGCACO algorithm outperformed the other benchmark optimisation algorithms in the number of selected features for 16 of the 18 datasets (88.89%) and achieved the best classification accuracy on eight of them (44.44%). On the nine DNA microarray datasets, the EGCACO algorithm ranked first in classification accuracy for seven datasets (77.78%) and selected the fewest features in six datasets (66.67%). The proposed EGCACO algorithm can be used for FS in DNA microarray classification tasks that involve large datasets in various application domains.
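The idea of grouping highly correlated features, which both MGCACO and EGCACO build on, can be illustrated with a minimal sketch. This is not the ACO clustering method from the abstract: it swaps in a simple greedy threshold rule, and the names `pearson`, `group_correlated`, the `threshold` value, and the toy data are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def group_correlated(columns, threshold=0.9):
    """Greedy grouping (an assumption, not ACO): each feature joins the first
    cluster whose representative (its first member) has |r| >= threshold."""
    clusters = []
    for j, col in enumerate(columns):
        for cluster in clusters:
            if abs(pearson(columns[cluster[0]], col)) >= threshold:
                cluster.append(j)
                break
        else:
            # No existing cluster is similar enough: start a new one.
            clusters.append([j])
    return clusters


# Toy data: one list of values per feature (hypothetical).
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with feature 0
    [4.0, 1.0, 3.0, 2.0],  # weakly correlated with feature 0
    [8.0, 6.0, 4.0, 2.0],  # perfectly anti-correlated with feature 0
]
clusters = group_correlated(columns)
```

A selection step would then draw at most a few features from each cluster, so that redundant features do not crowd the final subset.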

    Gene selection for cancer classification with the help of bees

    Molecular regulation of neutrophil swarming in health and disease: Lessons from the phagocyte oxidase

    Neutrophil swarming is a complex coordinated process in which neutrophils sensing pathogen or damage signals are rapidly recruited to sites of infections or injuries. This process involves cooperation between neutrophils where autocrine and paracrine positive-feedback loops, mediated by receptor/ligand pairs including lipid chemoattractants and chemokines, amplify localized recruitment of neutrophils. This review will provide an overview of key pathways involved in neutrophil swarming and then discuss the cell intrinsic and systemic mechanisms by which NADPH oxidase 2 (NOX2) regulates swarming, including modulation of calcium signaling, inflammatory mediators, and the mobilization and production of neutrophils. We will also discuss mechanisms by which altered neutrophil swarming in disease may contribute to deficient control of infections and/or exuberant inflammation. Deeper understanding of underlying mechanisms controlling neutrophil swarming and how neutrophil cooperative behavior can be perturbed in the setting of disease may help to guide development of tools for diagnosis and precision medicine

    Mutable composite firefly algorithm for gene selection in microarray based cancer classification

    Cancer classification is critical because of the strenuous effort required in cancer treatment and the rising cancer mortality rate. Recent high-throughput technologies have led to the discovery of biomarkers that have contributed to addressing cancer-related issues. Computational gene selection based on microarray data analysis has been applied to many cancer classification problems. However, existing hybrid approaches that use metaheuristic optimization algorithms for feature selection (specifically gene selection) do not generalize well enough to classify most cancer microarray data efficiently while maintaining a small set of genes. This leads to a trade-off between classification accuracy and gene subset size. Hence, this study modified the Firefly Algorithm (FA) and combined it with the Correlation-based Feature Selection (CFS) filter for the gene selection task. An improved FA was proposed to overcome the slow convergence of FA by generating mutable-size solutions for the firefly population. In addition, a composite position update strategy was designed for the mutable-size solutions. The strategy balances FA exploration and exploitation in order to address the local optima problem. The proposed hybrid algorithm, known as the CFS-Mutable Composite Firefly Algorithm (CFS-MCFA), was evaluated on cancer microarray data for biomarker selection, with a Support Vector Machine (SVM) as the classifier. Evaluation was based on two metrics: classification accuracy and size of the feature set. The results showed that the CFS-MCFA-SVM algorithm outperforms benchmark methods in terms of classification accuracy and gene subset size. In particular, 100 percent accuracy was achieved on all four datasets with only a few biomarkers (between one and four). This result indicates that the proposed algorithm is a competitive alternative for feature selection in the analysis of microarray data.

    Ensemble of heterogeneous flexible neural trees using multiobjective genetic programming

    Machine learning algorithms are inherently multiobjective in nature, where minimizing approximation error and reducing model complexity are two conflicting objectives. We proposed a multiobjective genetic programming (MOGP) approach for creating a heterogeneous flexible neural tree (HFNT), a tree-like flexible feedforward neural network model. Functional heterogeneity in the neural tree nodes was introduced to capture better insight into the data during learning, because each input in a dataset possesses different features. MOGP guided an initial HFNT population towards Pareto-optimal solutions, and the final population was used to build an ensemble system. A diversity index measure, along with approximation error and complexity, was introduced to maintain diversity among the candidates in the population. Hence, the ensemble was created from accurate, structurally simple, and diverse candidates in the final MOGP population. A differential evolution algorithm was applied to fine-tune the underlying parameters of the selected candidates. A comprehensive test over classification, regression, and time-series datasets demonstrated the efficiency of the proposed algorithm compared with other available prediction methods. Moreover, the heterogeneous creation of HFNTs proved to be efficient in building an ensemble system from the final population.
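The notion of Pareto-optimal solutions under two conflicting objectives can be sketched as follows. This is only a brute-force non-dominated filter, not the MOGP procedure from the abstract; the function names and the `(error, complexity)` pairs are hypothetical values chosen for illustration, with both objectives minimised.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (O(n^2) comparison)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates as (approximation error, model complexity).
candidates = [(0.10, 40), (0.12, 25), (0.20, 10), (0.15, 30), (0.25, 12)]
front = pareto_front(candidates)
```

Here `(0.15, 30)` is dropped because `(0.12, 25)` is better on both objectives, and `(0.25, 12)` is dropped because `(0.20, 10)` is. The three survivors each trade error against complexity, which is the kind of set an ensemble could be drawn from.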

    Evolution, epistasis, and the genotype-to-phenotype problem in Myxococcus xanthus.

    The complex social behavior of M. xanthus makes it an excellent model system to study the relationship between genotype and phenotype. Under nutrient-rich conditions, a swarm of M. xanthus cells coordinate their movement outward in search of prey. When starved, cells condense into multicellular structures called aggregates. Taken together, these two aspects of the M. xanthus life cycle display several sub-traits that are used to describe its phenotype. Furthermore, the genome of M. xanthus is large, encoding a predicted 7,314 genes, many of which have been linked to aspects of its multicellular phenotype. The work presented here addresses the genotype-to-phenotype (G2P) problem as it relates to the annotation of a biological process in a model system. The first project addresses G2P from a population genetics approach; we constructed a mutant strain library consisting of 180 single-gene knockouts of the ABC transporter superfamily of genes to examine the distribution of mutant phenotypes among an entire group of genes. While only ~10% of mutants show extreme phenotypic defects, more than three quarters of mutants are parsed into different categories of phenotypic deviation following our analyses. Our results demonstrate that strong mutant phenotypes are uncommon, but the majority of null mutants are phenotypically distinct from wild type in at least one trait. Thus, a more comprehensive understanding of the M. xanthus phenome will help elucidate the biological function of many uncharacterized genes. The second part of this dissertation examines the evolution of M. xanthus as it has been studied as a model organism in different laboratories. Disrupting a gene, or mutating a single nucleotide, may have no discernible impact on the organism's phenotype by itself, but may still substantially affect the phenotypes of additional mutations through epistasis. This is an ongoing phenomenon in M. xanthus; whole-genome resequencing of several inter-laboratory isolates of M. xanthus wild type DK1622 reveals genomic variation that has resulted in significant phenotypic variation. We demonstrate that the naturally occurring genetic variants among wild-type isolates are sufficient to mask the effect of a targeted mutation in one isolate that is significant in another. These results are the first to indicate that isolates of wild type M. xanthus DK1622 have evolved to a functionally significant degree.

    Platelet Diagnostics: A novel liquid biomarker

    The aim of this thesis is to find a novel liquid biomarker for the detection of cancer and to optimize treatment. The first chapter gives an introduction to the oncology biomarker field and focuses on platelets and their role in cancer. In part 1, we evaluate extracellular vesicles (EVs). EVs are small vesicles released by all types of cells, including tumor cells, into the circulation. They carry protein kinases and can be isolated from plasma. We demonstrate that AKT and ERK kinase protein levels in EVs reflect the cellular expression levels and that treatment with kinase inhibitors alters their concentration, depending on the clinical response to the drug. Therefore, EVs may provide a promising biomarker biosource for monitoring treatment responses. Part 2 starts with reviews describing the function and role of platelets in greater depth. Chapter 3 focusses on thrombocytogenesis and several biological processes in which platelets play a role. Furthermore, the RNA processing machineries harboured by platelets are discussed. Chapters 3 and 4 both evaluate the changes platelets undergo after being exposed to a tumor and its environment. The exchange of biomolecules with tumor cells results in so-called tumor-educated platelets (TEPs). TEPs play a role in several hallmarks of cancer and can respond to systemic alterations, making them an interesting biomarker. In chapter 5 the diagnostic potential of platelets is first discussed. We determine this potential by sequencing the RNA of 283 platelet samples, of which 228 are from patients with cancer and 55 are from healthy controls. We reach an accuracy of 96%. Furthermore, we are able to pinpoint the location of the primary tumor with an accuracy of 71%. In part 3, our thromboSeq platform is developed further. Several potential confounding factors, such as age and comorbidity, are taken into account. We show that particle-swarm optimization (PSO)-enhanced algorithms enable efficient selection of RNA biomarker panels. In a validation cohort we apply these algorithms to non-small-cell lung cancer and reach an accuracy of 88% in late-stage disease (n=518) and 81% in early-stage disease. Finally, in chapter 7 we describe our wet- and dry-lab protocols in detail. The wet-lab protocol covers platelet RNA isolation, mRNA amplification, and preparation for next-generation sequencing. The dry-lab protocol describes the automated pre-processing of FASTQ files into quantified gene counts, quality controls, data normalization and correction, and the development of the swarm intelligence-enhanced support vector machine (SVM) algorithm. Part 4 focuses on central nervous system (CNS) malignancies, especially glioblastoma. Chapter 8 gives an overview of the different liquid biomarkers for diffuse glioma, the most common primary CNS malignancy. In chapter 9 we assess the specificity of platelet education by glioblastoma by comparing the RNA profile of TEPs from glioblastoma patients with those of patients with a neuroinflammatory disease and patients with brain metastasis. This results in a detection accuracy of 80%. Secondly, analysis of patients with glioblastoma versus healthy controls in an independent validation series provides a detection accuracy of 95%. Furthermore, we describe the potential value of platelets as a monitoring biomarker for patients with glioma, distinguishing pseudoprogression from real tumor progression. In part 5 thromboSeq is applied to breast cancer diagnostics, both as a screening tool in the general population and in a high-risk population, women with BRCA mutations. In chapter 11 we apply our technique for the first time to an inflammatory condition, multiple sclerosis (MS). Platelet RNA is used as input for the development of a diagnostic MS classifier capable of detecting MS with 80% accuracy in the independent validation series. In the final part we conclude this thesis with a general discussion of the main findings and suggestions for future research.

    Artificial immune systems based committee machine for classification application

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. A new adaptive learning Artificial Immune System (AIS) based committee machine is developed in this thesis. The proposed approach efficiently tackles the general problem of clustering high-dimensional data. In addition, it helps in deriving useful decisions and results related to other application domains such as classification and prediction. AIS is a branch of computational intelligence inspired by the biological immune system, and it has gained increasing interest among researchers developing immune-based models and techniques to solve diverse complex computational or engineering problems. This work presents some applications of AIS techniques to health problems, and a thorough survey of existing AIS models and algorithms. The main focus of this research is building an ensemble model that integrates different AIS techniques (i.e. Artificial Immune Networks, Clonal Selection, and Negative Selection) for classification applications to achieve better classification results. A new AIS-based ensemble architecture with adaptive learning features is proposed by integrating different learning and adaptation techniques to overcome individual limitations and to achieve synergetic effects through the combination of these techniques. Various techniques related to the design and enhancement of the new adaptive learning architecture are studied, including a neuro-fuzzy based detector and an optimizer using the particle swarm optimization method to achieve enhanced classification performance. An evaluation study was conducted to show the performance of the proposed adaptive learning ensemble and to compare it with alternative combining techniques. Several experiments using different medical datasets for the classification problem are presented, and the findings and outcomes are discussed. The new adaptive learning architecture improves the accuracy of the ensemble. Moreover, there is an improvement over the existing aggregation techniques. The thesis concludes with the outcomes, assumptions, and limitations of the proposed methods and their implications for further research in this area.

    Formal modelling of intrusion detection systems (Modélisation formelle des systèmes de détection d'intrusions)

    The cybersecurity ecosystem continuously evolves in the number, diversity, and complexity of attacks. As a result, detection tools become ineffective against certain attacks. Intrusion detection systems (IDS) are generally divided into three types: anomaly-based detection, signature-based detection, and hybrid detection. Anomaly-based detection relies on characterizing the usual behavior of the system, typically in a statistical manner. It can detect known or unknown attacks, but it also generates a very large number of false positives. Signature-based detection detects known attacks by defining rules that describe the known behavior of an attacker. This requires good knowledge of attacker behavior. Hybrid detection relies on several detection methods, including the two above. It has the advantage of being more precise during detection. Tools such as Snort and Zeek offer low-level languages for expressing attack-recognition rules. Because the number of potential attacks is very large, these rule bases quickly become hard to manage and maintain. Moreover, expressing stateful rules to recognize a sequence of events is particularly arduous. In this thesis, we propose a stateful approach based on algebraic state-transition diagrams (ASTDs) to identify complex attacks. ASTDs allow a graphical and modular representation of a specification, which facilitates the maintenance and understanding of rules. We extend the ASTD notation with new features to represent complex attacks. Next, we specify several attacks with the extended notation and run the resulting specifications on event streams using an interpreter to identify attacks. We also evaluate the performance of the interpreter against industrial tools such as Snort and Zeek. Then, we build a compiler to generate executable code from an ASTD specification, able to efficiently identify sequences of events.
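To show why recognizing a sequence of events needs state, here is a minimal sketch of a stateful rule written directly in Python. It does not use the ASTD notation, Snort, or Zeek; the rule (repeated failed logins followed by a success from the same source), the event names, and the function `detect_bruteforce` are all hypothetical.

```python
from collections import defaultdict


def detect_bruteforce(events, max_failures=3):
    """Return the sources whose successful login follows at least
    max_failures failed logins with no success in between."""
    failures = defaultdict(int)  # per-source state: failures since last success
    alerts = []
    for source, kind in events:
        if kind == "login_failed":
            failures[source] += 1
        elif kind == "login_ok":
            if failures[source] >= max_failures:
                alerts.append(source)
            failures[source] = 0  # a success resets the state for this source
    return alerts


# Hypothetical event stream of (source address, event kind) pairs.
events = [
    ("10.0.0.5", "login_failed"),
    ("10.0.0.9", "login_failed"),
    ("10.0.0.5", "login_failed"),
    ("10.0.0.5", "login_failed"),
    ("10.0.0.9", "login_ok"),
    ("10.0.0.5", "login_ok"),
]
alerts = detect_bruteforce(events)
```

Even this small rule needs a counter per source and an explicit reset. Keeping that kind of bookkeeping manageable across many rules is the difficulty the abstract attributes to low-level rule languages.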