    De novo assembly of viral quasispecies using overlap graphs

    Baaijens JA, Aabidine AZE, Rivals E, Schönhuth A. De novo assembly of viral quasispecies using overlap graphs. Genome Research. 2017;27(5):835-848

    Co2Vis: A Visual Analytics Tool for Mining Co-Expressed and Co-Regulated Genes Implied in HIV Infections

    One of the key challenges in human health is the identification of genes implicated in diseases such as AIDS (Acquired ImmunoDeficiency Syndrome). Numerous studies have addressed this challenge through gene expression analysis. Given the amount of data available, processing DNA microarrays in a way that makes biomedical sense is still a major issue. Statistical methods and data mining techniques play a key role in discovering previously unknown knowledge. However, applying such techniques in this context is difficult because the number of measurement points (i.e., gene expression levels) is much higher than the number of samples, which leads to the well-known curse of dimensionality, also called the high feature-to-sample ratio problem. We propose a combination of data mining and visual analytics methods to identify and render groups of genes that are implicated in HIV infections and share common behaviors.

    Colib'read on galaxy: a tools suite dedicated to biological information extraction from raw NGS reads

    Background: With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data. Classical analysis pipelines for such data often begin with an assembly step, which requires large amounts of computing resources and can remove or modify part of the biological information contained in the data. Our approach instead focuses directly on biological questions, working on raw, unassembled NGS data through a suite of six command-line tools.
    Findings: Dedicated to 'whole-genome assembly-free' treatments, the Colib'read tools suite uses optimized algorithms for various analyses of NGS datasets, such as variant calling or read set comparison. Because these analyses are based on a de Bruijn graph and a Bloom filter, they can be performed in a few hours using small amounts of memory. Applications to real data demonstrate the good accuracy of these tools compared with classical approaches. To facilitate data analysis and tool dissemination, we developed Galaxy tools and Tool Shed repositories.
    Conclusions: With the Colib'read Galaxy tools suite, we enable a broad range of life scientists to analyze raw NGS data. More importantly, our approach retains the maximum biological information in the data and has a very low memory footprint.
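
    The abstract names a de Bruijn graph and a Bloom filter as the basis of the Colib'read tools. As an illustration only, the following Python sketch shows the general idea of a Bloom-filter-backed de Bruijn graph: k-mers from the reads are inserted into a Bloom filter, and the successors of a node are found by testing the four possible one-base extensions for membership. The class and function names, the filter size, the hash scheme and the toy reads are all assumptions, not Colib'read's implementation; the sketch also ignores reverse complements and does not correct for Bloom filter false positives, which real tools must handle.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (hypothetical, illustrative parameters)."""

    def __init__(self, n_bits: int = 1 << 16, n_hashes: int = 3) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive n_hashes bit positions by salting a SHA-256 digest with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May return True for items never added (false positive), never the reverse.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def kmers(read: str, k: int):
    """Yield every k-length substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_graph(reads, k: int) -> BloomFilter:
    """Store the de Bruijn graph's nodes (k-mers) implicitly in a Bloom filter."""
    bf = BloomFilter()
    for read in reads:
        for kmer in kmers(read, k):
            bf.add(kmer)
    return bf


def successors(bf: BloomFilter, kmer: str):
    # A successor of a k-mer is its (k-1)-suffix extended by one base;
    # edges are not stored but inferred by membership queries.
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bf]


reads = ["ACGTACGGA", "CGTACGGAT"]  # hypothetical toy reads
graph = build_graph(reads, k=4)
print(successors(graph, "ACGT"))
```

    Because a Bloom filter never yields false negatives, every k-mer actually present in the reads is always found; the trade-off is a small, tunable rate of spurious extra edges in exchange for a compact memory footprint.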

    De novo assembly of viral quasispecies using overlap graphs

    A viral quasispecies, the ensemble of viral strains populating an infected person, can be highly diverse. For optimal assessment of virulence, pathogenesis, and therapy selection, determining the haplotypes of the individual strains can play a key role. As many viruses are subject to high mutation and recombination rates, high-quality reference genomes are often not available at the time of a new disease outbreak. We present SAVAGE, a computational tool for reconstructing individual haplotypes of intra-host virus strains without the need for a high-quality reference genome. SAVAGE uses either FM-index-based data structures or an ad hoc consensus reference sequence to construct overlap graphs from patient sample data. In this overlap graph, nodes represent reads and/or contigs, while an edge indicates that two reads/contigs, based on sound statistical considerations, represent identical haplotypic sequence. Following an iterative scheme, a new overlap assembly algorithm, based on the enumeration of statistically well-calibrated groups of reads/contigs, then efficiently reconstructs the individual haplotypes from this overlap graph. In benchmark experiments on simulated and real deep-coverage data, SAVAGE drastically outperforms generic de novo assemblers as well as the only specialized de novo viral quasispecies assembler available so far. When run on an ad hoc consensus reference sequence, SAVAGE performs very favorably in comparison with state-of-the-art reference genome-guided tools. We also apply SAVAGE to two deep-coverage samples from patients infected with Zika virus and hepatitis C virus, respectively, which sheds light on the genetic structures of the respective viral quasispecies.
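
    To make the overlap-graph idea concrete, here is a minimal Python sketch under simplifying assumptions: nodes are reads, and a directed edge is added when a suffix of one read matches a prefix of another over at least `min_len` bases with a mismatch rate no higher than `max_mismatch_rate`. Both parameters, the function names and the toy reads are hypothetical. This is not SAVAGE's method: SAVAGE builds its graph with FM-index-based data structures (or an ad hoc consensus reference) and decides edges from statistical considerations, whereas this sketch uses brute-force pairwise comparison and a fixed threshold.

```python
from itertools import permutations


def best_overlap(a: str, b: str, min_len: int, max_mismatch_rate: float) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`, or 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        mismatches = sum(x != y for x, y in zip(suffix, prefix))
        # Hypothetical fixed threshold standing in for a statistical test.
        if mismatches <= max_mismatch_rate * length:
            return length
    return 0


def overlap_graph(reads, min_len: int = 4, max_mismatch_rate: float = 0.0):
    """Directed overlap graph: keys are (i, j) read-index pairs, values are overlap lengths."""
    edges = {}
    for i, j in permutations(range(len(reads)), 2):
        length = best_overlap(reads[i], reads[j], min_len, max_mismatch_rate)
        if length:
            edges[(i, j)] = length
    return edges


reads = ["ACGTTGCA", "TTGCAGGA", "CAGGATCC"]  # hypothetical toy reads
print(overlap_graph(reads))  # -> {(0, 1): 5, (1, 2): 5}
```

    On these toy reads, the path from read 0 through read 1 to read 2 spells one contiguous sequence; an assembler would then walk such paths to extend contigs, and raising `max_mismatch_rate` would tolerate sequencing errors at the cost of possibly merging distinct haplotypes.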

    Resistance of olive tree to Spilocaea oleagina is mediated by the synthesis of phenolic compounds

    INRA publication included in the bibliometric analysis of worldwide scientific publications on fruits, vegetables and potatoes, 2000-2012 (http://prodinra.inra.fr/record/256699).
    To understand the resistance of the olive tree to the leaf-spot disease caused by Spilocaea oleagina, the constitutive and post-infection phenolic compounds of the leaves were analyzed by HPLC in 110 F1 genotypes (susceptible cultivar “Picholine marocaine” × resistant cultivar “Picholine du Languedoc”) showing differential responses to this disease (highly resistant, resistant, intermediate, susceptible and highly susceptible genotypes). HPLC analysis distinguished 15 major phenolic compounds, which were grouped by their chromatographic and spectral characteristics into five phenolic families (hydroxycinnamic derivatives, flavonoids, verbascoside derivatives, tyrosol derivatives, oleuropein derivatives). No qualitative difference was observed between cultivars. Principal component analysis (PCA) highlighted three multifactorial components distinguishing the genotypes according to their response to the disease. These components were determined by the post-infection contents of oleuropein and rutin and by the constitutive contents of tyrosol and its derivatives. Tyrosol and its derivatives were associated with constitutive resistance, whereas oleuropein and rutin were associated with induced resistance. These results suggest that the relative activity of the enzymes involved in the various biosynthetic pathways of these phenolic compounds, and/or the expression levels of the corresponding genes, may determine the degree of resistance of the olive tree to S. oleagina.

    How can we efficiently characterize genes of agronomic interest in Olive: towards the genetic association studies?

    Despite the socio-economic importance of olive oil and the need for olive breeding, genetic studies on agronomic traits are restricted to a few biparental populations, which limits the efficiency of the QTL (Quantitative Trait Loci) mapping strategy. Association mapping based on a diversified collection of olive germplasm can be proposed as a complementary strategy to genetically map agronomic traits. Here, we aimed to develop tools for association mapping studies by defining a Mediterranean olive core collection and characterizing a large set of microsatellite markers (SSRs). The Worldwide Olive Germplasm Bank (WOGB) of Marrakech, Morocco (561 accessions from 14 Mediterranean countries) was characterized using 17 nuclear SSRs and cpDNA markers and classified into east, centre and west Mediterranean gene pools. By combining two sampling methods that maximize the capture of diversity and genetic distance, we proposed two core collections of 50 and 94 accessions that include all nuclear SSR alleles, cpDNA haplotypes and agro-morphological character states (from Olea databases) of the WOGB Marrakech. These core collections include the cultivars considered most important in Mediterranean olive-producing countries and display limited genetic structure between the east and west/centre gene pools. Hence, they are efficient candidates for phenotyping agronomic traits and for exploring the largest variability under distinct environmental conditions. Concurrently, we developed a set of genomic and Expressed Sequence Tag (EST)-derived SSRs that were used to complete the genetic map of the ‘Olivière’ × ‘Arbequina’ segregating population. Molecular markers will be used on the proposed core collections to assess linkage disequilibrium decay as a function of genetic distance and to further develop association mapping studies.