81 research outputs found

    A Survey of Feature Selection Strategies for DNA Microarray Classification

    Classification tasks in bioinformatics are challenging; they are used to predict or diagnose disease in patients at an early stage by means of DNA microarray technology. However, DNA microarray data characteristically have a large number of features and small sample sizes, so classification confronts the "curse of dimensionality": computation is expensive and biomarkers are difficult to discover. Feature selection algorithms can reduce the dimensionality by finding the significant features without degrading classification performance. Feature selection also decreases computational time by removing irrelevant and redundant features from the data. This study briefly surveys popular feature selection methods for classifying DNA microarray data, namely filter, wrapper, embedded, and hybrid approaches. Furthermore, it describes the steps of the feature selection process used to accomplish classification tasks and their relationships to other components such as datasets, cross-validation, and classifier algorithms. In the case study, we apply four different feature selection methods to two DNA microarray datasets and evaluate and discuss their performance in terms of classification accuracy, stability, and the size of the selected feature subset. Keywords: Brief survey; DNA microarray data; feature selection; filter methods; wrapper methods; embedded methods; hybrid methods. DOI: 10.7176/CEIS/14-2-01 Publication date: March 31st 202
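
As a rough illustration of the filter family named in the abstract above, the sketch below ranks features by a Welch t-score between two classes and keeps the top k. The scoring function, the synthetic data, and the names `t_score` and `filter_select` are assumptions made for this example; the abstract does not say which filter methods the survey evaluates.

```python
import random
from statistics import mean, variance


def t_score(a, b):
    """Absolute Welch t-statistic between two groups of values."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def filter_select(X, y, k):
    """Score each feature independently of any classifier and keep the top k."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(t_score(a, b))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical "microarray": 20 samples, 200 genes, few samples and many features.
# Only genes 0 and 1 are shifted in class 1; the rest are pure noise.
rng = random.Random(0)
y = [0] * 10 + [1] * 10
X = [[rng.gauss(0, 1) for _ in range(200)] for _ in y]
for row, label in zip(X, y):
    if label == 1:
        row[0] += 6.0
        row[1] += 6.0

selected = filter_select(X, y, k=2)
print("selected feature indices:", sorted(selected))
```

When a filter like this is combined with cross-validation, the selection step would normally be repeated inside each training fold so that the held-out samples do not influence which features are kept.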

    Metaproteogenomic analysis of saliva samples from Parkinson's disease patients with cognitive impairment

    Cognitive impairment (CI) is very common in patients with Parkinson's disease (PD) and develops progressively along a spectrum from mild cognitive impairment (PD-MCI) to full dementia (PDD). Identifying PD patients at risk of cognitive decline is therefore an unmet clinical need in managing the disease. Previous studies reported that the oral microbiota of PD patients is altered even at early stages and that poor oral hygiene is associated with dementia. However, data from single modalities are often unable to explain complex chronic diseases of the brain and cannot reliably predict the risk of disease progression. Here, we performed an integrative metaproteogenomic characterization of the salivary microbiota and tested the hypothesis that the biological molecules of saliva and the saliva microbiota shift dynamically in association with the progression of cognitive decline and harbor discriminatory key signatures across the spectrum of CI in PD. We recruited a cohort of 115 participants in a multi-center study and employed multi-omics factor analysis (MOFA) to integrate amplicon sequencing and metaproteomic analysis in order to identify signature taxa and proteins in saliva. Our baseline analyses revealed a contrasting interplay between the genus Neisseria and the genera Lactobacillus and Ligilactobacillus across the spectrum of CI. The group-specific signature profiles enabled us to identify bacterial genera and protein groups associated with CI stages in PD. Our study describes the compositional dynamics of saliva across the spectrum of CI in PD and paves the way for developing non-invasive biomarker strategies to predict the risk of CI progression in PD. FEMS Research and Training Gran

    Group testing performance evaluation for SARS-CoV-2 massive scale screening and testing

    Background: The capacity of the current molecular testing convention does not allow high-throughput, community-level scans for COVID-19 infections. The diameter in the current paradigm of shallow tracing is unlikely to reach the silent clusters that might be as important as the symptomatic cases in the spread of the disease. Group testing is a feasible and promising approach when resources are scarce and prevalence in the population is relatively low. Methods: We employed group testing with a sparse random pooling scheme and conventional group test decoding algorithms for both exact and inexact recovery. Results: Our simulations showed that a significant reduction in the number of tests per case (or, equivalently, an expansion in the total number of people tested for the same number of actual tests conducted) is available in very sparse prevalence regimes. Currently proposed COVID-19 group testing schemes offer a scale-up of up to 15X-20X. There is a good probability that the scale-up required to achieve massive-scale testing might be greater in certain scenarios. We investigated whether further improvement is available, especially in sparse prevalence regimes where outbreaks need to be averted through population scans. Conclusion: Our simulations show that sparse random pooling can provide improved efficiency gains compared to conventional group testing or Reed-Solomon error-correcting codes. Therefore, we propose that special designs could be made available for different scenarios and that it is possible to scale up testing capabilities significantly.
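
The general idea behind the Methods of the abstract above can be sketched as a small noiseless simulation: each individual is placed in a few randomly chosen pools, and a simple decoder clears everyone who appears in a negative pool. The decoder used here is COMP (combinatorial orthogonal matching pursuit), chosen as one common conventional decoder; the abstract does not name the algorithms it used. The population size, number of pools, pools per individual, and prevalence are hypothetical values picked for illustration.

```python
import random


def make_pools(n_items, n_pools, pools_per_item, rng):
    """Sparse random pooling: place each item in a few distinct random pools."""
    pools = [set() for _ in range(n_pools)]
    for item in range(n_items):
        for p in rng.sample(range(n_pools), pools_per_item):
            pools[p].add(item)
    return pools


def run_tests(pools, infected):
    """Noiseless model: a pool is positive if it holds at least one infected item."""
    return [bool(pool & infected) for pool in pools]


def comp_decode(n_items, pools, outcomes):
    """COMP: clear every item seen in a negative pool; the rest remain suspects."""
    cleared = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            cleared |= pool
    return set(range(n_items)) - cleared


# Hypothetical scenario: 1000 people, 100 tests, 0.5% prevalence.
rng = random.Random(0)
n_items, n_pools, pools_per_item = 1000, 100, 3
infected = set(rng.sample(range(n_items), 5))

pools = make_pools(n_items, n_pools, pools_per_item, rng)
outcomes = run_tests(pools, infected)
suspects = comp_decode(n_items, pools, outcomes)

print(len(suspects), "suspects after", n_pools, "tests")
print("all infected retained:", infected <= suspects)
```

In this noiseless setting COMP never clears an infected individual, so errors can only be false positives. That corresponds to inexact recovery; exact recovery would need more pools or a follow-up round of individual tests on the remaining suspects.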

    Information Theoretic Metagenome Assembly Allows the Discovery of Disease Biomarkers in Human Microbiome

    Quantitative metagenomics is an important field that has delivered successful microbiome biomarkers associated with host phenotypes. The current convention depends mainly on unsupervised assembly of metagenomic contigs, which may leave interesting genetic material unassembled. Additionally, biomarkers are commonly defined by the differential relative abundance of compositional or functional units. Accumulating evidence suggests that microbial genetic variations are as important as differential abundance, implying the need for novel methods that account for genetic variations in metagenomics studies. We propose an information-theoretic metagenome assembly algorithm that discovers genomic fragments with maximal self-information, defined by the empirical distributions of nucleotides across the phenotypes and quantified with the help of statistical tests. Our algorithm infers fragments that gather the most informative genetic variants in a single contig, named supervariant fragments. Experiments on simulated metagenomes, as well as on a colorectal cancer dataset and an atherosclerotic cardiovascular disease dataset, consistently discovered sequences strongly associated with the disease phenotypes. Moreover, the discriminatory power of these putative biomarkers was attributed mainly to the genetic variations rather than to relative abundance. Our results suggest that metagenomics methods that consider microbiome population genetics might be useful in discovering disease biomarkers with great potential for translation to molecular diagnostics and biotherapeutics applications.