103,123 research outputs found

    Genetic heterogeneity analysis using genetic algorithm and network science

    Full text link
    Through genome-wide association studies (GWAS), disease susceptible genetic variables can be identified by comparing the genetic data of individuals with and without a specific disease. However, the discovery of these associations poses a significant challenge due to genetic heterogeneity and feature interactions. Genetic variables intertwined with these effects often exhibit lower effect-size, and thus can be difficult to be detected using machine learning feature selection methods. To address these challenges, this paper introduces a novel feature selection mechanism for GWAS, named Feature Co-selection Network (FCSNet). FCS-Net is designed to extract heterogeneous subsets of genetic variables from a network constructed from multiple independent feature selection runs based on a genetic algorithm (GA), an evolutionary learning algorithm. We employ a non-linear machine learning algorithm to detect feature interaction. We introduce the Community Risk Score (CRS), a synthetic feature designed to quantify the collective disease association of each variable subset. Our experiment showcases the effectiveness of the utilized GA-based feature selection method in identifying feature interactions through synthetic data analysis. Furthermore, we apply our novel approach to a case-control colorectal cancer GWAS dataset. The resulting synthetic features are then used to explain the genetic heterogeneity in an additional case-only GWAS dataset

    Comparison of metaheuristic strategies for peakbin selection in proteomic mass spectrometry data

    Get PDF
    Mass spectrometry (MS) data provide a promising strategy for biomarker discovery. For this purpose, the detection of relevant peakbins in MS data is currently under intense research. Data from mass spectrometry are challenging to analyze because of their high dimensionality and the generally low number of samples available. To tackle this problem, the scientific community is becoming increasingly interested in applying feature subset selection techniques based on specialized machine learning algorithms. In this paper, we present a performance comparison of some metaheuristics: best first (BF), genetic algorithm (GA), scatter search (SS) and variable neighborhood search (VNS). Up to now, all the algorithms, except for GA, have been first applied to detect relevant peakbins in MS data. All these metaheuristic searches are embedded in two different filter and wrapper schemes coupled with Naive Bayes and SVM classifiers

    Practical Attacks Against Graph-based Clustering

    Full text link
    Graph modeling allows numerous security problems to be tackled in a general way, however, little work has been done to understand their ability to withstand adversarial attacks. We design and evaluate two novel graph attacks against a state-of-the-art network-level, graph-based detection system. Our work highlights areas in adversarial machine learning that have not yet been addressed, specifically: graph-based clustering techniques, and a global feature space where realistic attackers without perfect knowledge must be accounted for (by the defenders) in order to be practical. Even though less informed attackers can evade graph clustering with low cost, we show that some practical defenses are possible.Comment: ACM CCS 201

    Data-driven design of intelligent wireless networks: an overview and tutorial

    Get PDF
    Data science or "data-driven research" is a research approach that uses real-life data to gain insight about the behavior of systems. It enables the analysis of small, simple as well as large and more complex systems in order to assess whether they function according to the intended design and as seen in simulation. Data science approaches have been successfully applied to analyze networked interactions in several research areas such as large-scale social networks, advanced business and healthcare processes. Wireless networks can exhibit unpredictable interactions between algorithms from multiple protocol layers, interactions between multiple devices, and hardware specific influences. These interactions can lead to a difference between real-world functioning and design time functioning. Data science methods can help to detect the actual behavior and possibly help to correct it. Data science is increasingly used in wireless research. To support data-driven research in wireless networks, this paper illustrates the step-by-step methodology that has to be applied to extract knowledge from raw data traces. To this end, the paper (i) clarifies when, why and how to use data science in wireless network research; (ii) provides a generic framework for applying data science in wireless networks; (iii) gives an overview of existing research papers that utilized data science approaches in wireless networks; (iv) illustrates the overall knowledge discovery process through an extensive example in which device types are identified based on their traffic patterns; (v) provides the reader the necessary datasets and scripts to go through the tutorial steps themselves

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
    • …
    corecore