Search CORE

876 research outputs found

Machine learning for network based intrusion detection: an investigation into discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural network classifier ensembles from imbalanced data.

Author: Engen Vegard

For the last decade it has become commonplace to evaluate machine learning techniques for network based intrusion detection on the KDD Cup '99 data set. This data set has served well to demonstrate that machine learning can be useful in intrusion detection. However, it has undergone some criticism in the literature, and it is out of date. Therefore, some researchers question the validity of the findings reported based on this data set. Furthermore, as identified in this thesis, there are also discrepancies in the findings reported in the literature. In some cases the results are contradictory. Consequently, it is difficult to analyse the current body of research to determine the value in the findings. This thesis reports on an empirical investigation to determine the underlying causes of the discrepancies. Several methodological factors, such as choice of data subset, validation method and data preprocessing, are identified and are found to affect the results significantly. These findings have also enabled a better interpretation of the current body of research. Furthermore, the criticisms in the literature are addressed and future use of the data set is discussed, which is important since researchers continue to use it due to a lack of better publicly available alternatives. Due to the nature of the intrusion detection domain, there is an extreme imbalance among the classes in the KDD Cup '99 data set, which poses a significant challenge to machine learning. In other domains, researchers have demonstrated that well known techniques such as Artificial Neural Networks (ANNs) and Decision Trees (DTs) often fail to learn the minor class(es) due to class imbalance. However, this has not been recognized as an issue in intrusion detection previously. This thesis reports on an empirical investigation that demonstrates that it is the class imbalance that causes the poor detection of some classes of intrusion reported in the literature. 
An alternative approach to training ANNs is proposed in this thesis, using Genetic Algorithms (GAs) to evolve the weights of the ANNs, referred to as an Evolutionary Neural Network (ENN). When employing evaluation functions that calculate the fitness proportionally to the instances of each class, thereby avoiding a bias towards the major class(es) in the data set, significantly improved true positive rates are obtained whilst maintaining a low false positive rate. These findings demonstrate that the issues of learning from imbalanced data are not due to limitations of the ANNs, but rather to the training algorithm. Moreover, the ENN is capable of detecting a class of intrusion that has been reported in the literature to be undetectable by ANNs. One limitation of the ENN is a lack of control over the classification trade-off the ANNs obtain. This is identified as a general issue with current approaches to creating classifiers. Striving to create a single best classifier that obtains the highest accuracy may give an unfruitful classification trade-off, which is demonstrated clearly in this thesis. Therefore, an extension of the ENN is proposed, using a Multi-Objective GA (MOGA), which treats the classification rate on each class as a separate objective. This approach produces a Pareto front of non-dominated solutions that exhibit different classification trade-offs, from which the user can select one with the desired properties. The multi-objective approach is also utilised to evolve classifier ensembles, which yields an improved Pareto front of solutions. Furthermore, the selection of classifier members for the ensembles is investigated, demonstrating how this affects the performance of the resultant ensembles. This is a key to explaining why some classifier combinations fail to give fruitful solutions.
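The two ideas in the abstract above (a fitness computed proportionally to the instances of each class, and a Pareto front over per-class classification rates) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the use of the mean per-class rate as the fitness, and the toy labels are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def per_class_rates(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[float]:
    """Fraction of correctly classified instances, computed separately per class."""
    correct = [0] * n_classes
    total = [0] * n_classes
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    # Assumption: a class with no instances gets a rate of 0.0.
    return [c / n if n else 0.0 for c, n in zip(correct, total)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Plain accuracy, which is dominated by the major class on imbalanced data."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_fitness(rates: Sequence[float]) -> float:
    """Hypothetical class-proportional fitness: every class counts equally."""
    return sum(rates) / len(rates)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Toy imbalanced data: eight instances of class 0, two of class 1.
y_true = [0] * 8 + [1] * 2
always_major = [0] * 10                    # ignores the minor class entirely
trade_off = [0] * 6 + [1] * 2 + [1] * 2    # gives up some class-0 hits for class 1

rates_major = per_class_rates(y_true, always_major, 2)
rates_trade = per_class_rates(y_true, trade_off, 2)
front = pareto_front([tuple(rates_major), tuple(rates_trade), (0.5, 0.5)])
```

In this toy case both classifiers have the same plain accuracy, yet the class-proportional fitness separates them, and both remain on the Pareto front because each is best on one class.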

Bournemouth University Research Online

Computational prediction of the human-microbial oral interactome

Author: Arrais Joel P.
Barros Marlene
Coelho Edgar D.
Correia Maria J.
Matos Sérgio
Oliveira José L.
Pereira Carlos
Rosa Nuno
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/02/2014

Background: The oral cavity is a complex ecosystem where human chemical compounds coexist with a particular microbiota. However, shifts in the normal composition of this microbiota may result in the onset of oral ailments, such as periodontitis and dental caries. In addition, it is known that the microbial colonization of the oral cavity is mediated by protein-protein interactions (PPIs) between the host and microorganisms. Nevertheless, this kind of PPI is still largely undisclosed. To elucidate these interactions, we have created a computational prediction method that allows us to obtain a first model of the Human-Microbial oral interactome. Results: We collected high-quality experimental PPIs from five major human databases. The obtained PPIs were used to create our positive dataset and, indirectly, our negative dataset. The positive and negative datasets were merged and used for training and validation of a naïve Bayes classifier. For the final prediction model, we used an ensemble methodology combining five distinct PPI prediction techniques, namely: literature mining, primary protein sequences, orthologous profiles, biological process similarity, and domain interactions. Performance evaluation of our method revealed an area under the ROC-curve (AUC) value greater than 0.926, supporting our primary hypothesis, as no single set of features reached an AUC greater than 0.877. After subjecting our dataset to the prediction model, the classified result was filtered for very high confidence PPIs (probability ≥ 1 − 10⁻⁷), leading to a set of 46,579 PPIs to be further explored. Conclusions: We believe this dataset holds not only important pathways involved in the onset of infectious oral diseases, but also potential drug-targets and biomarkers. The dataset used for training and validation, the predictions obtained and the final network are available at http://bioinformatics.ua.pt/software/oralint.
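As a rough illustration of how a naïve Bayes model can combine several independent lines of evidence and then be filtered at the confidence threshold quoted above, here is a minimal Python sketch. It is not the authors' pipeline: the prior, the likelihood ratios and the function name `posterior` are hypothetical values chosen only to show the arithmetic.

```python
import math
from typing import Sequence

# Threshold taken from the abstract: probability >= 1 - 10^-7.
THRESHOLD = 1.0 - 1e-7


def posterior(prior: float, likelihood_ratios: Sequence[float]) -> float:
    """Naive Bayes combination in log-odds form.

    Each likelihood ratio is P(evidence | interaction) / P(evidence | no interaction)
    for one evidence source; the sources are assumed conditionally independent.
    """
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical example: a 1% prior and evidence sources that each favour
# an interaction 100 to 1.
five_sources = posterior(0.01, [100.0] * 5)
four_sources = posterior(0.01, [100.0] * 4)

passes_with_five = five_sources >= THRESHOLD
passes_with_four = four_sources >= THRESHOLD
```

Under these made-up numbers, the candidate pair clears the threshold only when all five evidence sources agree, which shows how strict a cut-off of 1 − 10⁻⁷ is.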

Crossref

Springer - Publisher Connector

PubMed Central

Estudo Geral

Repositório Institucional da Universidade Católica Portuguesa

Bio-inspired computation: where we stand and what's next

Author: Camacho D.
Coello Coello C.
Das S.
Del Ser J.
Herrera F.
Molina D.
Osaba E.
Salcedo-Sanz S.
Suganthan P.
Yang X.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

In recent years, the research community has witnessed an explosion of literature dealing with the adaptation of behavioral patterns and social phenomena observed in nature towards efficiently solving complex computational tasks. This trend has been especially dramatic in what relates to optimization problems, mainly due to the unprecedented complexity of problem instances, arising from a diverse spectrum of domains such as transportation, logistics, energy, climate, social networks, health and industry 4.0, among many others. Notwithstanding this upsurge of activity, research in this vibrant topic should be steered towards certain areas that, despite their eventual value and impact on the field of bio-inspired computation, still remain insufficiently explored to date. The main purpose of this paper is to outline the state of the art and to identify open challenges concerning the most relevant areas within bio-inspired optimization. An analysis and discussion are also carried out over the general trajectory followed in recent years by the community working in this field, thereby highlighting the need for reaching a consensus and joining forces towards achieving valuable insights into the understanding of this family of optimization techniques

Crossref

Middlesex University Research Repository

DR-NTU (Digital Repository of NTU)

Bio-inspired computation: where we stand and what's next

Author: Camacho D.
Coello Coello C.
Das S.
Del Ser J.
Herrera F.
Molina D.
Osaba E.
Salcedo-Sanz S.
Suganthan P.
Yang X.
Publication venue: Elsevier
Publication date: 01/01/2019

Middlesex University Research Repository

Seeking multiple solutions: an updated survey on niching methods and their applications

Author: Deb Kalyanmoy
Engelbrecht Andries
Epitropakis Michael G.
Li Xiaodong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2017

Multi-Modal Optimization (MMO), aiming to locate multiple optimal (or near-optimal) solutions in a single simulation run, has practical relevance to problem solving across many fields. Population-based meta-heuristics have been shown to be particularly effective in solving MMO problems, if equipped with specifically designed diversity-preserving mechanisms, commonly known as niching methods. This paper provides an updated survey on niching methods. The paper first revisits the fundamental concepts about niching and its most representative schemes, then reviews the most recent development of niching methods, including novel and hybrid methods, performance measures, and benchmarks for their assessment. Furthermore, the paper surveys previous attempts at leveraging the capabilities of niching to facilitate various optimization tasks (e.g., multi-objective and dynamic optimization) and machine learning tasks (e.g., clustering, feature selection, and learning ensembles). A list of successful applications of niching methods to real-world problems is presented to demonstrate the capabilities of niching methods in providing solutions that are difficult for other optimization methods to offer. The significant practical value of niching methods is clearly exemplified through these applications. Finally, the paper poses challenges and research questions on niching that are yet to be appropriately addressed. Providing answers to these questions is crucial before we can bring more fruitful benefits of niching to real-world problem solving.
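To make the idea of a diversity-preserving mechanism concrete, the sketch below implements fitness sharing, one classic niching scheme. The abstract does not name this particular scheme, so its choice here is an assumption, as are the one-dimensional search space, the parameter names `sigma` and `alpha`, and the toy population.

```python
from typing import List, Sequence


def sharing(distance: float, sigma: float, alpha: float = 1.0) -> float:
    """Sharing function: 1 at distance 0, falling to 0 at the niche radius sigma."""
    if distance < sigma:
        return 1.0 - (distance / sigma) ** alpha
    return 0.0


def shared_fitness(xs: Sequence[float], fitness: Sequence[float],
                   sigma: float, alpha: float = 1.0) -> List[float]:
    """Divide each raw fitness by its niche count.

    The niche count sums the sharing function over the whole population,
    including the individual itself, so it is always at least 1.
    """
    result = []
    for xi, fi in zip(xs, fitness):
        niche_count = sum(sharing(abs(xi - xj), sigma, alpha) for xj in xs)
        result.append(fi / niche_count)
    return result


# Toy population: two individuals crowd the same peak, one sits alone on another.
population = [0.0, 0.1, 5.0]
raw = [1.0, 1.0, 1.0]
shared = shared_fitness(population, raw, sigma=1.0)
```

With equal raw fitness, the isolated individual keeps its full value while the two crowded individuals are penalised, which is what lets a population hold several optima at once.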

Crossref

Lancaster E-Prints

UPSpace at the University of Pretoria

Machine learning for network based intrusion detection : an investigation into discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural network classifier ensembles from imbalanced data

Author: Engen Vegard
Publication date: 01/01/2010

OpenGrey Repository

Robust Algorithms for Detecting Hidden Structure in Biological Data

Author: Sloutsky Roman
Publication venue: Washington University Open Scholarship
Publication date: 15/08/2017

Biological data, such as molecular abundance measurements and protein sequences, harbor complex hidden structure that reflects its underlying biological mechanisms. For example, high-throughput abundance measurements provide a snapshot of the global state of a living cell, while homologous protein sequences encode the residue-level logic of the proteins' function and provide a snapshot of the evolutionary trajectory of the protein family. In this work I describe algorithmic approaches and analysis software I developed for uncovering hidden structure in both kinds of data. Clustering is an unsupervised machine learning technique commonly used to map the structure of data collected in high-throughput experiments, such as quantification of gene expression by DNA microarrays or short-read sequencing. Clustering algorithms always yield a partitioning of the data, but relying on a single partitioning solution can lead to spurious conclusions. In particular, noise in the data can cause objects to fall into the same cluster by chance rather than due to meaningful association. In the first part of this thesis I demonstrate approaches to clustering data robustly in the presence of noise and apply robust clustering to analyze the transcriptional response to injury in a neuron cell. In the second part of this thesis I describe identifying hidden specificity determining residues (SDPs) from alignments of protein sequences descended through gene duplication from a common ancestor (paralogs) and apply the approach to identify numerous putative SDPs in bacterial transcription factors in the LacI family. Finally, I describe and demonstrate a new algorithm for reconstructing the history of duplications by which paralogs descended from their common ancestor. 
This algorithm addresses the complexity of such reconstruction due to indeterminate or erroneous homology assignments made by sequence alignment algorithms and to the vast prevalence of divergence through speciation over divergence through gene duplication in protein evolution

Washington University St. Louis: Open Scholarship

Phylogenomics and Geometric Morphometrics Define Species Flocks of Snowtrout (Teleostei: Schizothorax) in the Central Himalayas

Author: Regmi Binod
Publication venue: ScholarWorks@UARK
Publication date: 01/05/2019

Schizothorax (Snowtrout) is a genus of medium-sized minnows (Cypriniformes) inhabiting glacier-fed streams, rivers, and lakes in the Himalayas. There are more than 30 species of Schizothorax across the region. The speciation and diversity of the Snowtrout in the vast hinterlands of the Himalayan Region have not been fully explored. Three species in Lake Rara, Western Nepal are considered a species flock, comprising endemic ecotypes that are morphologically differentiated and reproductively isolated. My dissertation research examined the diversity of Schizothorax in the Central Himalayan region and evolutionary relationships among species distributed in Tibet, Central and Southeast Asia. Chapter I describes the historical biogeography and distribution of Schizothorax species in the Himalayas and Tibetan Region. In Chapter II, morphological and genetic variation was examined among Schizothorax collected from three major drainage systems in Nepal using 18 anatomical landmarks (number of images, N=565) and mitochondrial gene (cytochrome b) sequence analysis (n=115). In Chapter III, machine learning algorithms were evaluated to discriminate morphological species based on head and body shape using Procrustes aligned data generated in Chapter II. In Chapter IV, a phylogenetic tree of Schizothorax was constructed comprising Central (Nepal, haplotypes=14) and Eastern (Bhutan, haplotypes=18) Himalayan species to explore their evolutionary relationships within a global species phylogeny based on GenBank data (n=51, outgroups=5). Chapter V employed a phylogenomic approach to examine fine-scale relationships amongst Schizothorax in Nepal and assess uniqueness of endemic forms in Lake Rara. Double digest restriction associated DNA (ddRAD) sequences were generated to extract 20,000 single nucleotide polymorphism (SNP) loci. These data were used to trace the selection driven phenotypic convergence among species isolated largely due to the geographical and ecological barriers. 
Both species and basins were significant predictors of the shape. Classifiers, such as Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM), assigned individuals to morphological species with high accuracy. However, a strong geographic structure was reflected in the mitochondrial (cytochrome b) gene sequence data. Conversely, phylogenomic analysis of SNPs uncovered basin-specific upstream and downstream clades, as well as Lake Rara endemic species as a monophyletic group that mitochondrial gene analyses failed to resolve in previous studies

ScholarWorks@UARK

UARK (University of Arkansas )

Information Theory-based Evolution of Neural Networks for Side-channel Analysis

Author: Domenic Forte
Fatemeh Ganji
Rabin Y. Acharya
Publication venue: 'Universitätsbibliothek der Ruhr-Universität Bochum'
Publication date: 01/11/2022

Profiled side-channel analysis (SCA) leverages leakage from cryptographic implementations to extract the secret key. When combined with advanced methods in neural networks (NNs), profiled SCA can successfully attack even those cryptocores assumed to be protected against SCA. Despite the rise in the number of studies devoted to NN-based SCA, a range of questions has remained unanswered, namely: how to choose an NN with an adequate configuration, how to tune the NN’s hyperparameters, when to stop the training, etc. Our proposed approach, “InfoNEAT,” tackles these issues in a natural way. InfoNEAT relies on the concept of neural structure search, enhanced by information-theoretic metrics to guide the evolution, halt it with novel stopping criteria, and improve time-complexity and memory footprint. The performance of InfoNEAT is evaluated by applying it to publicly available datasets composed of real side-channel measurements. In addition to the considerable advantages regarding the automated configuration of NNs, InfoNEAT demonstrates significant improvements over other approaches for effective key recovery in terms of the number of epochs (e.g., 6× faster) and the number of attack traces compared to both MLPs and CNNs (e.g., up to 1000s fewer traces to break a device) as well as a reduction in the number of trainable parameters compared to MLPs (e.g., by a factor of up to 32). Furthermore, through experiments, it is demonstrated that InfoNEAT’s models are robust against noise and desynchronization in traces.

Directory of Open Access Journals

From genotypes to organisms: state-of-the-art and perspectives of a cornerstone in evolutionary dynamics

Author: Aguirre Jacobo
Ahnert Sebastian E
Altenberg Lee
Cano Alejandro V
Catalán Pablo
Cuesta José A
Diaz-Uriarte Ramon
Elena Santiago F
García-Martín Juan Antonio
Hogeweg Paulien
Khatri Bhavin S
Krug Joachim
Louis Adriaan
Manrubia Susanna
Martin Nora S
Payne Joshua L
Tarnowski Matthew J
Weiß Marcel
Publication venue: Elsevier
Publication date: 21/05/2021

Understanding how genotypes map onto phenotypes, fitness, and eventually organisms is arguably the next major missing piece in a fully predictive theory of evolution. We refer to this generally as the problem of the genotype-phenotype map. Though we are still far from achieving a complete picture of these relationships, our current understanding of simpler questions, such as the structure induced in the space of genotypes by sequences mapped to molecular structures, has revealed important facts that deeply affect the dynamical description of evolutionary processes. Empirical evidence supporting the fundamental relevance of features such as phenotypic bias is mounting as well, while the synthesis of conceptual and experimental progress leads to questioning current assumptions on the nature of evolutionary dynamics, with cancer progression models or synthetic biology approaches being notable examples. This work delves with a critical and constructive attitude into our current knowledge of how genotypes map onto molecular phenotypes and organismal functions, and discusses theoretical and empirical avenues to broaden and improve this comprehension. As a final goal, this community should aim at deriving an updated picture of evolutionary processes soundly relying on the structural properties of genotype spaces, as revealed by modern techniques of molecular and functional analysis.

Oxford University Research Archive