316 research outputs found

    Benchmark of structured machine learning methods for microbial identification from mass-spectrometry data

    Microbial identification is a central issue in microbiology, in particular in the fields of infectious disease diagnosis and industrial quality control. The concept of species is tightly linked to the concept of biological and clinical classification, where the proximity between species is generally measured in terms of evolutionary distances and/or clinical phenotypes. Surprisingly, the information provided by this well-known hierarchical structure is rarely used by machine learning-based automatic microbial identification systems. Structured machine learning methods were recently proposed to take into account the structure embedded in a hierarchy and use it as additional a priori information, and could therefore improve microbial identification systems. We test and compare several state-of-the-art machine learning methods for microbial identification on a new Matrix-Assisted Laser Desorption/Ionization Time-of-Flight mass spectrometry (MALDI-TOF MS) dataset. The benchmark includes both standard methods and structured methods that leverage knowledge of the underlying hierarchical structure in the learning process. Our results show that although some methods perform better than others, structured methods do not consistently perform better than their "flat" counterparts. We postulate that this is partly because standard methods already reach a high level of accuracy in this context, and because they mainly confuse species close to each other in the tree, a case where using the known hierarchy is not helpful.

    Construction of a potato consensus map and QTL meta-analysis offer new insights into the genetic architecture of late blight resistance and plant maturity traits

    Background: Integrating QTL results from independent experiments performed on related species helps to survey the genetic diversity of loci/alleles underlying complex traits, and to highlight potential targets for breeding or QTL cloning. Potato (Solanum tuberosum L.) late blight resistance has been thoroughly studied, generating mapping data for many Rpi-genes (R-genes to Phytophthora infestans) and QTLs (quantitative trait loci). Moreover, late blight resistance was often associated with plant maturity. To get insight into the genomic organization of late blight resistance loci as compared to maturity QTLs, a QTL meta-analysis was performed for both traits. Results: Nineteen QTL publications for late blight resistance were considered, seven of which reported maturity QTLs. Twenty-one QTL maps and eight reference maps were compiled to construct a 2,141-marker consensus map on which QTLs were projected and clustered into meta-QTLs. The whole-genome QTL meta-analysis reduced the number of late blight resistance QTLs six-fold (by clustering 144 QTLs into 24 meta-QTLs), the number of maturity QTLs ca. five-fold (by clustering 42 QTLs into eight meta-QTLs), and the mean QTL confidence interval ca. two-fold. Late blight resistance meta-QTLs were observed on every chromosome and maturity meta-QTLs on only six chromosomes. Conclusions: Meta-analysis helped to refine the frequently described genomic regions of interest, and provided the closest flanking markers. Meta-QTLs of late blight resistance and maturity were juxtaposed along chromosomes IV, V and VIII, and overlapped on chromosomes VI and XI. The distribution of late blight resistance meta-QTLs is significantly independent from those of Rpi-genes, resistance gene analogs and defence-related loci. The anchorage of meta-QTLs to the recently released potato genome sequence will especially improve candidate gene selection to determine the genes underlying meta-QTLs. All mapping data are available from the Sol Genomics Network (SGN) database.
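
    The fold reductions quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the abstract; the helper name `fold_reduction` is a hypothetical name introduced here for illustration, not something from the paper.

```python
def fold_reduction(n_before: int, n_after: int) -> float:
    """Ratio of the number of QTLs before clustering to the number of meta-QTLs after."""
    if n_after <= 0:
        raise ValueError("n_after must be positive")
    return n_before / n_after


# Counts taken from the abstract.
late_blight = fold_reduction(144, 24)  # 144 QTLs clustered into 24 meta-QTLs
maturity = fold_reduction(42, 8)       # 42 QTLs clustered into 8 meta-QTLs

print(f"late blight resistance: {late_blight:.2f}-fold")
print(f"plant maturity: {maturity:.2f}-fold")
```

    The first ratio is exactly six and the second is a little over five, which matches the "six-fold" and "ca. five-fold" wording.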

    Bayesian Model Selection in Complex Linear Systems, as Illustrated in Genetic Association Studies

    Motivated by examples from genetic association studies, this paper considers the model selection problem in a general complex linear model system and in a Bayesian framework. We discuss formulating model selection problems and incorporating context-dependent a priori information through different levels of prior specifications. We also derive analytic Bayes factors and their approximations to facilitate model selection and discuss their theoretical and computational properties. We demonstrate our Bayesian approach based on an implemented Markov Chain Monte Carlo (MCMC) algorithm in simulations and a real data application of mapping tissue-specific eQTLs. Our novel results on Bayes factors provide a general framework to perform efficient model comparisons in complex linear model systems.

    Association mapping reveals gene action and interactions in the determination of flowering time in barley

    The interaction between members of a gene network has an important impact on the variation of quantitative traits, and can influence the outcome of phenotype/genotype association studies. Three genes (Ppd-H1, HvCO1, HvFT1) known to play an essential role in the regulation of flowering time under long days in barley were subjected to an analysis of nucleotide diversity in a collection of 220 spring barley accessions. The coding region of Ppd-H1 was highly diverse, while both HvCO1 and HvFT1 showed a rather limited level of diversity. Within all three genes, the extent of linkage disequilibrium was variable, but on average only moderate. Ppd-H1 is strongly associated with flowering time across four environments, showing a difference of five to ten days between the most extreme haplotypes. The association between flowering time and the variation at HvFT1 and HvCO1 was strongly dependent on the haplotype present at Ppd-H1. The interaction between HvCO1 and Ppd-H1 was statistically significant, but this association disappeared when the analysis was corrected for the geographical origin of the accessions. No association existed between flowering time and allelic variation at HvFT1. In contrast to Ppd-H1, functional variation at both HvCO1 and HvFT1 is limited in cultivated barley

    De novo finished 2.8 Mbp Staphylococcus aureus genome assembly from 100 bp short and long range paired-end reads

    Motivation: Paired-end sequencing circumvents the shortness of the reads produced by second generation sequencers and is essential for de novo assembly of genomes. However, obtaining a finished genome from short reads is still an open challenge. We present an algorithm that exploits the pairing information issued from inserts of potentially any length. The method determines paths through an overlaps graph by using a constrained search tree. We also present a method that automatically determines suitable overlap cutoffs according to the contextual coverage, thus reducing the need for manual parameterization. Finally, we introduce an interactive mode that allows querying an assembly at targeted regions. Results: We assess our methods by assembling two Staphylococcus aureus strains that were sequenced on the Illumina platform. Using 100 bp paired-end reads and minimal manual curation, we produce a finished genome sequence for the previously undescribed isolate SGH-10-168. Availability and implementation: The presented algorithms are implemented in the standalone Edena software, freely available under the General Public License (GPLv3) at www.genomic.ch/edena.php. Contact: [email protected] Supplementary Information: Supplementary data are available at Bioinformatics online.

    Gene Expression Levels Are a Target of Recent Natural Selection in the Human Genome

    Changes in gene expression may represent an important mode of human adaptation. However, to date, there are relatively few known examples in which selection has been shown to act directly on levels or patterns of gene expression. In order to test whether single nucleotide polymorphisms (SNPs) that affect gene expression in cis are frequently targets of positive natural selection in humans, we analyzed genome-wide SNP and expression data from cell lines associated with the International HapMap Project. Using a haplotype-based test for selection that was designed to detect incomplete selective sweeps, we found that SNPs showing signals of selection are more likely than random SNPs to be associated with gene expression levels in cis. This signal is significant in the Yoruba (which is the population that shows the strongest signals of selection overall) and shows a trend in the same direction in the other HapMap populations. Our results argue that selection on gene expression levels is an important type of human adaptation. Finally, our work provides an analytical framework for tackling a more general problem that will become increasingly important: namely, testing whether selection signals overlap significantly with SNPs that are associated with phenotypes of interest

    Gene set of nuclear-encoded mitochondrial regulators is enriched for common inherited variation in obesity

    There are hints of an altered mitochondrial function in obesity. Nuclear-encoded genes are relevant for mitochondrial function (3 gene sets of known relevant pathways: (1) 16 nuclear regulators of mitochondrial genes, (2) 91 genes for oxidative phosphorylation and (3) 966 nuclear-encoded mitochondrial genes). Gene set enrichment analysis (GSEA) showed no association with type 2 diabetes mellitus in these gene sets. Here we performed a GSEA for the same gene sets for obesity. Genome-wide association study (GWAS) data from a case-control approach on 453 extremely obese children and adolescents and 435 lean adult controls were used for GSEA. For independent confirmation, we analyzed 705 obesity GWAS trios (extremely obese child and both biological parents) and a population-based GWAS sample (KORA F4, n = 1,743). A meta-analysis was performed on all three samples. In each sample, the distribution of significance levels between the respective gene set and those of all genes was compared using the leading-edge-fraction-comparison test (cut-offs between the 50th and 95th percentile of the set of all gene-wise corrected p-values) as implemented in the MAGENTA software. In the case-control sample, significant enrichment of associations with obesity was observed above the 50th percentile for the set of the 16 nuclear regulators of mitochondrial genes (p(GSEA,50) = 0.0103). This finding was not confirmed in the trios (p(GSEA,50) = 0.5991), but was confirmed in KORA (p(GSEA,50) = 0.0398). The meta-analysis again indicated a trend for enrichment (p(MAGENTA,50) = 0.1052, p(MAGENTA,75) = 0.0251). The GSEA revealed that weak association signals for obesity might be enriched in the gene set of 16 nuclear regulators of mitochondrial genes.
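
    The percentile-cutoff comparison described in this abstract can be illustrated with a small permutation sketch. This is not the MAGENTA implementation: the function names, the synthetic scores, and the convention that a higher score means a stronger association are all assumptions made here for illustration.

```python
import random


def percentile_cutoff(values, percentile):
    """Score at the given percentile (0-100) of all gene scores."""
    ranked = sorted(values)
    idx = min(int(len(ranked) * percentile / 100), len(ranked) - 1)
    return ranked[idx]


def fraction_above(scores, genes, cutoff):
    """Fraction of the given genes whose score is at or above the cutoff."""
    return sum(scores[g] >= cutoff for g in genes) / len(genes)


def enrichment_p_value(scores, gene_set, percentile=50, n_perm=2000, seed=0):
    """Permutation p-value: how often a random gene set of the same size
    has at least as large a fraction of genes above the cutoff."""
    rng = random.Random(seed)
    cutoff = percentile_cutoff(scores.values(), percentile)
    observed = fraction_above(scores, gene_set, cutoff)
    all_genes = sorted(scores)
    at_least = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(gene_set))
        if fraction_above(scores, sample, cutoff) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least + 1) / (n_perm + 1)


# Synthetic example (hypothetical data): 100 genes with scores 0..99.
scores = {f"g{i}": float(i) for i in range(100)}
top_set = [f"g{i}" for i in range(90, 100)]   # all high-scoring genes
bottom_set = [f"g{i}" for i in range(10)]     # all low-scoring genes

p_top = enrichment_p_value(scores, top_set, percentile=50)
p_bottom = enrichment_p_value(scores, bottom_set, percentile=50)
print(p_top, p_bottom)
```

    Varying `percentile` between 50 and 95 mirrors the range of cut-offs mentioned in the abstract; a real analysis would use gene-wise corrected association scores rather than synthetic values.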

    Exon-Specific QTLs Skew the Inferred Distribution of Expression QTLs Detected Using Gene Expression Array Data

    Mapping of expression quantitative trait loci (eQTLs) is an important technique for studying how genetic variation affects gene regulation in natural populations. In a previous study using Illumina expression data from human lymphoblastoid cell lines, we reported that cis-eQTLs are especially enriched around transcription start sites (TSSs) and immediately upstream of transcription end sites (TESs). In this paper, we revisit the distribution of eQTLs using additional data from Affymetrix exon arrays and from RNA sequencing. We confirm that most eQTLs lie close to the target genes; that transcribed regions are generally enriched for eQTLs; that eQTLs are more abundant in exons than introns; and that the peak density of eQTLs occurs at the TSS. However, we find that the intriguing TES peak is greatly reduced or absent in the Affymetrix and RNA-seq data. Instead our data suggest that the TES peak observed in the Illumina data is mainly due to exon-specific QTLs that affect 3′ untranslated regions, where most of the Illumina probes are positioned. Nonetheless, we do observe an overall enrichment of eQTLs in exons versus introns in all three data sets, consistent with an important role for exonic sequences in gene regulation