
    Classification of Adversarial Attacks Using Ensemble Clustering Approach

    As more business transactions and information services are implemented over communication networks, both personal and organizational assets face a higher risk of attack. To safeguard these, a perimeter defence such as a NIDS (network-based intrusion detection system) can be effective against known intrusions. The joint security and data science community has paid a great deal of attention to improving machine-learning-based NIDS so that it becomes more accurate against adversarial attacks, where obfuscation techniques are applied to disguise the patterns of intrusive traffic. The current research focuses on non-payload connections at the TCP (transmission control protocol) stack level, which makes it applicable to different network applications. In contrast to the wrapper method introduced with the benchmark dataset, three new filter models are proposed to transform the feature space without knowledge of class labels. These ECT (ensemble clustering based transformation) techniques, i.e., ECT-Subspace, ECT-Noise and ECT-Combined, are developed using the concept of ensemble clustering and three different ensemble generation strategies, i.e., random feature subspace, feature noise injection and their combination. Based on an empirical study with a published dataset and four classification algorithms, the new models usually outperform the original wrapper and other filter alternatives found in the literature. The same conclusion is drawn from both the first experiment, on basic classification of legitimate connections and direct attacks, and the second, which focuses on recognizing obfuscated intrusions. In addition, an analysis of algorithmic parameters, i.e., ensemble size and level of noise, is provided as a guideline for practical use.
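The label-free transformation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `kmeans` and `ect_transform`, the choice of k-means as the base clusterer, and the use of raw cluster labels as the new features are all assumptions made for illustration. Each ensemble member clusters a random feature subspace with optional Gaussian noise added, so setting `noise_std=0` resembles a subspace-only variant and `subspace_frac=1.0` a noise-only variant.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def ect_transform(X, n_members=5, k=2, subspace_frac=0.5, noise_std=0.0, seed=0):
    """Hypothetical sketch: map each sample to its cluster labels across an ensemble.

    No class labels are used anywhere, so this acts as a filter-style transform.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    n_keep = max(1, int(subspace_frac * n_features))
    columns = []
    for _ in range(n_members):
        # Random feature subspace for this ensemble member.
        feats = rng.sample(range(n_features), n_keep)
        # Feature noise injection (a no-op when noise_std is 0).
        view = [[row[f] + rng.gauss(0.0, noise_std) for f in feats] for row in X]
        columns.append(kmeans(view, k, rng))
    # Transpose: one row per sample, one column per ensemble member.
    return [list(row) for row in zip(*columns)]


# Toy data: two well-separated groups of connections with four features each.
X = [[0.0, 0.1, 0.0, 0.2], [0.1, 0.0, 0.1, 0.1],
     [5.0, 5.1, 4.9, 5.2], [5.1, 5.0, 5.0, 4.9]]
Z = ect_transform(X, n_members=3, k=2, subspace_frac=0.5, noise_std=0.01, seed=1)
```

The transformed matrix `Z` has one column per ensemble member and could then be passed to any classifier in place of the original features.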

    Generative Adversarial Networks (GANs): Challenges, Solutions, and Future Directions

    Generative Adversarial Networks (GANs) are a novel class of deep generative models that has recently gained significant attention. GANs learn complex and high-dimensional distributions implicitly over images, audio, and data. However, there exist major challenges in training GANs, i.e., mode collapse, non-convergence and instability, due to inappropriate design of the network architecture, use of the objective function and selection of the optimization algorithm. Recently, to address these challenges, several solutions for better design and optimization of GANs have been investigated, based on re-engineered network architectures, new objective functions and alternative optimization algorithms. To the best of our knowledge, there is no existing survey that has focused specifically on the broad and systematic development of these solutions. In this study, we perform a comprehensive survey of the advancements in GAN design and optimization solutions proposed to handle these challenges. We first identify key research issues within each design and optimization technique and then propose a new taxonomy to structure solutions by key research issue. In accordance with the taxonomy, we provide a detailed discussion of the different GAN variants proposed within each solution and their relationships. Finally, based on the insights gained, we present promising research directions in this rapidly growing field. Comment: 42 pages, Figure 13, Table

    Robust Algorithms for Detecting Hidden Structure in Biological Data

    Biological data, such as molecular abundance measurements and protein sequences, harbor complex hidden structure that reflects the underlying biological mechanisms. For example, high-throughput abundance measurements provide a snapshot of the global state of a living cell, while homologous protein sequences encode the residue-level logic of the proteins' function and provide a snapshot of the evolutionary trajectory of the protein family. In this work I describe algorithmic approaches and analysis software I developed for uncovering hidden structure in both kinds of data. Clustering is an unsupervised machine learning technique commonly used to map the structure of data collected in high-throughput experiments, such as quantification of gene expression by DNA microarrays or short-read sequencing. Clustering algorithms always yield a partitioning of the data, but relying on a single partitioning solution can lead to spurious conclusions. In particular, noise in the data can cause objects to fall into the same cluster by chance rather than due to meaningful association. In the first part of this thesis I demonstrate approaches to clustering data robustly in the presence of noise and apply robust clustering to analyze the transcriptional response to injury in a neuron cell. In the second part of this thesis I describe identifying hidden specificity determining residues (SDPs) from alignments of protein sequences descended through gene duplication from a common ancestor (paralogs) and apply the approach to identify numerous putative SDPs in bacterial transcription factors in the LacI family. Finally, I describe and demonstrate a new algorithm for reconstructing the history of duplications by which paralogs descended from their common ancestor. This algorithm addresses the complexity of such reconstruction, which arises from indeterminate or erroneous homology assignments made by sequence alignment algorithms and from the vast prevalence of divergence through speciation over divergence through gene duplication in protein evolution.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those already associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    Providing contexts for classification of transients in a wide-area sky survey: An application of noise-induced cluster ensemble

    With new sensor systems that capture sky surveys at high quality, analyzing the resulting data within a limited time frame appears to be the next challenge. Specific to the GOTO project, this task is crucial for discovering new transients from a large pool of candidates. Initial works based on the feature-based approach frame this detection as imbalanced classification, where a data-level method can be used to resolve the difference in cardinality between classes. This paper presents a context generation framework to complement the previously proposed model. In particular, samples are clustered to form data contexts to which different learning strategies may be applied. To ensure the quality of data clustering, a noise-induced cluster ensemble technique that has recently been introduced in the literature is employed here. The results with simulated data and the NB, C4.5 and KNN algorithms show that the proposed framework can filter out some negative samples quickly, while making classification of the rest more effective. In particular, it enhances the predictive performance of basic classifiers by lifting F1 scores from less than 0.1 to around 0.3–0.5. A parameter analysis is also given as a guideline for its application.
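The idea of data contexts with different learning strategies can be sketched as follows. This is only an illustration under stated assumptions, not the paper's method: the function name `build_contexts`, the strategy names `"reject"` and `"classify"`, and the rule that a cluster with no positive training samples is filtered out as negative are all hypothetical. Cluster assignments are taken as given (for example, from a cluster ensemble).

```python
from collections import defaultdict


def build_contexts(cluster_ids, y):
    """Group sample indices by cluster and pick a strategy for each context.

    cluster_ids: cluster label per training sample (from any clustering step).
    y: binary class label per sample, 1 = transient (positive), 0 = negative.
    """
    contexts = defaultdict(list)
    for idx, cluster in enumerate(cluster_ids):
        contexts[cluster].append(idx)

    plan = {}
    for cluster, members in contexts.items():
        positives = sum(y[i] for i in members)
        # Assumed rule: a context with no positives is rejected outright,
        # so only mixed contexts are handed to a classifier.
        plan[cluster] = "reject" if positives == 0 else "classify"
    return dict(contexts), plan


# Toy example: five samples in three clusters, one positive sample.
cluster_ids = [0, 0, 1, 1, 2]
y = [0, 0, 0, 1, 0]
contexts, plan = build_contexts(cluster_ids, y)
```

Under this rule, samples falling into a rejected context never reach a classifier, which is one way the imbalance seen by the remaining classifiers could be reduced.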

    Proceedings of Abstracts Engineering and Computer Science Research Conference 2019

    © 2019 The Author(s). This is an open-access work distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. For further details please see https://creativecommons.org/licenses/by/4.0/. Note: Keynote: Fluorescence visualisation to evaluate effectiveness of personal protective equipment for infection control is © 2019 Crown copyright and so is licensed under the Open Government Licence v3.0. Under this licence users are permitted to copy, publish, distribute and transmit the Information; adapt the Information; exploit the Information commercially and non-commercially, for example, by combining it with other Information, or by including it in your own product or application. Where you do any of the above you must acknowledge the source of the Information in your product or application by including or linking to any attribution statement specified by the Information Provider(s) and, where possible, provide a link to this licence: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    This book is the record of abstracts submitted and accepted for presentation at the Inaugural Engineering and Computer Science Research Conference held 17th April 2019 at the University of Hertfordshire, Hatfield, UK. This conference is a local event aiming at bringing together the research students, staff and eminent external guests to celebrate Engineering and Computer Science Research at the University of Hertfordshire. The ECS Research Conference aims to showcase the broad landscape of research taking place in the School of Engineering and Computer Science. The 2019 conference was articulated around three topical cross-disciplinary themes: Make and Preserve the Future; Connect the People and Cities; and Protect and Care.