111 research outputs found

    Do you cov me? Effect of coverage reduction on species identification and genome reconstruction in complex biological matrices by metagenome shotgun high-throughput sequencing

    Shotgun metagenomics sequencing is a powerful tool for the characterization of complex biological matrices, enabling analysis of prokaryotic and eukaryotic organisms and viruses in a single experiment, with the possibility of reconstructing de novo the whole metagenome or a set of genes of interest. One of the main factors limiting the use of shotgun metagenomics in wide-scale projects is the high cost associated with the approach. We set out to determine whether shallow shotgun metagenomics can be used to characterize complex biological matrices while reducing costs. We measured the variation of several summary statistics while simulating a decrease in sequencing depth by randomly subsampling reads. The main statistics compared were alpha diversity estimates, species abundance, and the ability to reconstruct the metagenome de novo, in terms of length and completeness. Our results show that diversity indices of complex prokaryotic, eukaryotic and viral communities can be accurately estimated with 500,000 reads or fewer, although particularly complex samples may require 1,000,000 reads. In contrast, any task involving the reconstruction of the metagenome performed poorly, even with the largest simulated subsample (1,000,000 reads). The reconstructed assembly was shorter than the assembly obtained with the full dataset, and the proportion of conserved genes identified in the metagenome was drastically reduced compared to the full sample. Shallow shotgun metagenomics can be a useful tool to describe the structure of complex matrices, but it is not adequate to reconstruct, even partially, the metagenome.
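The subsampling idea in this abstract can be illustrated with a minimal sketch. It assumes each read has already been assigned a species label and uses the Shannon index as one example of an alpha diversity estimate. The function names, the toy community, and the depths are hypothetical and are not taken from the study.

```python
import math
import random
from collections import Counter


def shannon_index(species_labels):
    """Shannon diversity H = -sum(p_i * ln p_i) over observed species."""
    counts = Counter(species_labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def subsample_diversity(read_labels, depths, seed=0):
    """Estimate Shannon diversity at several simulated sequencing depths.

    Reads are drawn at random without replacement, mimicking a shallower run.
    """
    rng = random.Random(seed)
    results = {}
    for depth in depths:
        if depth > len(read_labels):
            raise ValueError("depth exceeds the number of available reads")
        results[depth] = shannon_index(rng.sample(read_labels, depth))
    return results


# Toy community (hypothetical): one label per classified read.
reads = ["species_A"] * 6000 + ["species_B"] * 3000 + ["species_C"] * 1000
full_depth_h = shannon_index(reads)
h_by_depth = subsample_diversity(reads, depths=[100, 1000, 5000])
```

Comparing `h_by_depth` with `full_depth_h` shows how quickly the estimate stabilises as depth grows. A real analysis would start from classifier output rather than a hand-built list.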

    Physiological and RNA sequencing data of white lupin plants grown under Fe and P deficiency

    This DIB article provides details about the transcriptional and physiological response of Fe- and P-deficient white lupin roots. An extensive and complete description of the plant response is given in the research article “Physiological and transcriptomic data highlight common features between iron and phosphorus acquisition mechanisms in white lupin roots” by Venuti et al. [1]. White lupin plants were grown in a hydroponic system under three different nutritional regimes: Fe deficiency (-Fe), P deficiency (-P), or Fe and P sufficiency (+P + Fe). Depending on the nutritional treatment, white lupin plants showed changes in fresh weight, root external acidification and FeIII-reductase activity. Moreover, the transcriptomic changes occurring in apices and clusters of Fe-deficient lupin roots were investigated and compared with the differences in gene expression occurring in P-deficient plants (-P) and in Fe- and P-sufficient plants (+P + Fe). Transcriptomic data are available in the public repository Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under the series entry GSE112220. The annotation, mapping and enrichment analyses of differentially modulated transcripts were assessed.

    Haplotypic structure of the X chromosome in the COGA population sample and the quality of its reconstruction by extant software packages

    BACKGROUND: The haplotypes of the X chromosome are accessible to direct count in males, whereas the diplotypes of females may be inferred from the haplotypes of their sons or fathers. Here, we investigated: 1) the possible large-scale haplotypic structure of the X chromosome in a Caucasian population sample, given the single-nucleotide polymorphism (SNP) maps and genotypes provided by Illumina and Affymetrix for Genetic Analysis Workshop 14, and 2) the performance of widely used programs in reconstructing haplotypes from population genotypic data, given their known distribution in a sample of unrelated individuals. RESULTS: All possible unrelated mother-son pairs of Caucasian ancestry (N = 104) were selected from the 143 families of the Collaborative Study on the Genetics of Alcoholism pedigree files, and the diplotypes of the mothers were inferred from the X chromosomes of their sons. The marker set included 313 SNPs at an average density of 0.47 Mb. Linkage disequilibrium between pairs of markers was computed by the parameter D', whereas for measuring multilocus disequilibrium we developed an index called D* and applied it to all possible sliding windows of 5 markers each. Results showed a complex pattern of haplotypic structure, with regions of low linkage disequilibrium separated by regions of high values of D*. The following programs were evaluated for their accuracy in inferring population haplotype frequencies: 1) ARLEQUIN 2.001; 2) PHASE 2.1.1; 3) SNPHAP 1.1; 4) HAPLOBLOCK 1.2; 5) HAPLOTYPER 1.0. Performance was evaluated by the Pearson correlation coefficient (r) between the true and the inferred distribution of haplotype frequencies. CONCLUSION: The SNP haplotypic structure of the X chromosome is complex, with regions of high haplotype conservation interspersed among regions of higher haplotype diversity. All the tested programs were accurate (r = 1) in reconstructing the distribution of haplotype frequencies in the case of high D* values. However, only the program PHASE achieved a high correlation coefficient (r > 0.7) in conditions of low linkage disequilibrium.
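Two of the quantities named in this abstract, pairwise D' and the Pearson correlation between true and inferred haplotype frequencies, can be sketched as follows. This is a generic illustration using the standard textbook definitions, reporting the absolute value of D'. The multilocus index D* is defined in the paper itself and is not reproduced here. All function names and example frequencies are hypothetical.

```python
import math


def d_prime(p_ab, p_a, p_b):
    """Absolute Lewontin's D' for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at marker 1 and
          allele B at marker 2; p_a, p_b: the two allele frequencies.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else abs(d) / d_max


def pearson_r(xs, ys):
    """Pearson correlation between two equally long frequency vectors."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical haplotype frequencies: counted directly vs. inferred by software.
true_freqs = [0.50, 0.30, 0.15, 0.05]
inferred_freqs = [0.48, 0.31, 0.16, 0.05]
accuracy = pearson_r(true_freqs, inferred_freqs)
```

A value of `accuracy` close to 1 corresponds to the near-perfect reconstruction the abstract reports for regions of high disequilibrium.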

    Metagenomic profiles of different types of Italian high-moisture Mozzarella cheese

    The microbiota of different types of Italian high-moisture Mozzarella cheese produced using cow or buffalo milk, acidified with natural or selected cultures, and sampled at the dairy or at mass retail, was evaluated using a Next Generation Sequencing approach in order to identify possible drivers of bacterial diversity. Cow Mozzarella and buffalo Mozzarella acidified with commercial cultures were dominated by Streptococcus thermophilus, while buffalo samples acidified with natural whey cultures showed similar prevalence of L. delbrueckii subsp. bulgaricus, L. helveticus and S. thermophilus. Moreover, several species of non-starter lactic acid bacteria were frequently detected. The diversity of the cow Mozzarella microbiota was much higher than that of the water buffalo samples. Cluster analysis clearly separated cow cheeses from buffalo cheeses, the former having a higher prevalence of psychrophilic taxa and the latter of Lactobacillus and Streptococcus. A higher prevalence of psychrophilic species and potential spoilers was observed in samples collected at mass retail, suggesting that longer exposure to cooling temperatures and longer production-to-consumption times could significantly affect microbiota diversity. Our results could help in detecting thermal abuse during the production or storage of Mozzarella cheese.

    Characterization of the Poplar Pan-Genome by Genome-Wide Identification of Structural Variation

    Many recent studies have emphasized the important role of structural variation (SV) in determining human genetic and phenotypic variation. In plants, studies aimed at elucidating the extent of SV are still in their infancy. Evidence has indicated a high presence and an active role of SV in driving plant genome evolution in different plant species. With the aim of characterizing the size and the composition of the poplar pan-genome, we performed a genome-wide analysis of structural variation in three intercrossable poplar species: Populus nigra, Populus deltoides, and Populus trichocarpa. We detected a total of 7,889 deletions and 10,586 insertions relative to the P. trichocarpa reference genome, covering respectively 33.2 Mb and 62.9 Mb of genomic sequence, and 3,230 genes affected by copy number variation (CNV). The majority of the detected variants are inter-specific, in agreement with a recent origin following separation of species. Insertions and deletions (INDELs) were preferentially located in low-gene density regions of the poplar genome and were, for the majority, associated with the activity of transposable elements. Genes affected by SV showed lower-than-average expression levels and higher levels of dN/dS, suggesting that they are subject to relaxed selective pressure or correspond to pseudogenes. Functional annotation of genes affected by INDELs showed over-representation of categories associated with transposable element activity, while genes affected by genic CNVs showed enrichment in categories related to resistance to stress and pathogens. This study provides a genome-wide catalogue of SV and the first insight into the functional and structural properties of the poplar pan-genome.

    A genome-wide association scan of RR and QT interval duration in 3 European genetically isolated populations: the EUROSPAN project

    We set out to identify common genetic determinants of the length of the RR and QT intervals in 2325 individuals from isolated European populations. We analyzed the heart rate at rest, measured as the RR interval, and the length of the corrected QT interval for association with 318 237 single-nucleotide polymorphisms. The RR interval was associated with common variants within GPR133, a G-protein-coupled receptor (rs885389, P = 3.9 × 10^-8). The QT interval was associated with the earlier reported NOS1AP gene (rs2880058, P = 2.00 × 10^-10) and with a region on chromosome 13 (rs2478333, P = 4.34 × 10^-8), which is 100 kb from the closest known transcript LOC730174 and has previously not been associated with the length of the QT interval. Our results suggested an association between the RR interval and GPR133 and confirmed an association between the QT interval and NOS1AP.

    Importance of Different Types of Prior Knowledge in Selecting Genome‐Wide Findings for Follow‐Up

    Biological plausibility and other prior information could help select genome‐wide association (GWA) findings for further follow‐up, but there is no consensus on which types of knowledge should be considered or how to weight them. We used experts’ opinions and empirical evidence to estimate the relative importance of 15 types of information at the single‐nucleotide polymorphism (SNP) and gene levels. Opinions were elicited from 10 experts using a two‐round Delphi survey. Empirical evidence was obtained by comparing the frequency of each type of characteristic in SNPs established as being associated with seven disease traits through GWA meta‐analysis and independent replication, with the corresponding frequency in a randomly selected set of SNPs. SNP and gene characteristics were retrieved using a specially developed bioinformatics tool. Both the expert and the empirical evidence rated previous association in a meta‐analysis or more than one study as conferring the highest relative probability of true association, whereas previous association in a single study ranked much lower. High relative probabilities were also observed for location in a functional protein domain, although location in a region evolutionarily conserved in vertebrates was ranked high by the data but not by the experts. Our empirical evidence did not support the importance attributed by the experts to whether the gene encodes a protein in a pathway or shows interactions relevant to the trait. Our findings provide insight into the selection and weighting of different types of knowledge in SNP or gene prioritization, and point to areas requiring further research. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/96262/1/gepi21705.pd

    SNP Prioritization Using a Bayesian Probability of Association

    Prioritization is the process whereby a set of possible candidate genes or SNPs is ranked so that the most promising can be taken forward into further studies. In a genome‐wide association study, prioritization is usually based on the P‐values alone, but researchers sometimes take account of external annotation information about the SNPs, such as whether the SNP lies close to a good candidate gene. Using external information in this way is inherently subjective and is often not formalized, making the analysis difficult to reproduce. Building on previous work that has identified 14 important types of external information, we present an approximate Bayesian analysis that produces an estimate of the probability of association. The calculation combines four sources of information: the genome‐wide data, SNP information derived from bioinformatics databases, empirical SNP weights, and the researchers’ subjective prior opinions. The calculation is fast enough that it can be applied to millions of SNPs and, although it does rely on subjective judgments, those judgments are made explicit so that the final SNP selection can be reproduced. We show that the resulting probability of association is intuitively more appealing than the P‐value because it is easier to interpret and it makes allowance for the power of the study. We illustrate the use of the probability of association for SNP prioritization by applying it to a meta‐analysis of kidney function genome‐wide association studies and demonstrate that SNP selection performs better using the probability of association compared with P‐values alone. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/96317/1/gepi21704.pd
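The abstract does not give the authors' formula. As a stand-in, the sketch below uses Wakefield's approximate Bayes factor, a well-known way to turn a SNP's effect estimate and standard error into a posterior probability of association. It should be read as an illustration of the general idea, not as the paper's calculation. The single `prior_prob` argument is assumed to summarise all external information about the SNP, and `prior_sd` is an assumed prior standard deviation for the true effect size.

```python
import math


def approx_bayes_factor(beta_hat, se, prior_sd):
    """Wakefield-style approximate Bayes factor in favour of association."""
    v = se ** 2              # sampling variance of the effect estimate
    w = prior_sd ** 2        # prior variance of the true effect under H1
    z_squared = (beta_hat / se) ** 2
    shrinkage = w / (v + w)
    # Bayes factor for "no association" versus "association" ...
    bf_null = math.sqrt((v + w) / v) * math.exp(-0.5 * z_squared * shrinkage)
    # ... inverted so that larger values favour association.
    return 1.0 / bf_null


def probability_of_association(beta_hat, se, prior_prob, prior_sd=0.2):
    """Posterior probability of association from data plus a prior."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = approx_bayes_factor(beta_hat, se, prior_sd) * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Same test statistic (z = 4), different prior plausibility (hypothetical values).
poa_annotated = probability_of_association(0.4, 0.1, prior_prob=1e-3)
poa_anonymous = probability_of_association(0.4, 0.1, prior_prob=1e-5)
```

Because the standard error enters through `v`, the same z-score from a small, imprecise study moves the posterior less than one from a large study. This is one way a probability of association can allow for power in the sense the abstract describes.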

    Genetic Determinants of Circulating Sphingolipid Concentrations in European Populations

    Sphingolipids have essential roles as structural components of cell membranes and in cell signalling, and disruption of their metabolism causes several diseases, with diverse neurological, psychiatric, and metabolic consequences. Increasingly, variants within a few of the genes that encode enzymes involved in sphingolipid metabolism are being associated with complex disease phenotypes. Direct experimental evidence supports a role of specific sphingolipid species in several common complex chronic disease processes including atherosclerotic plaque formation, myocardial infarction (MI), cardiomyopathy, pancreatic beta-cell failure, insulin resistance, and type 2 diabetes mellitus. Therefore, sphingolipids represent novel and important intermediate phenotypes for genetic analysis, yet little is known about the major genetic variants that influence their circulating levels in the general population. We performed a genome-wide association study (GWAS) between 318,237 single-nucleotide polymorphisms (SNPs) and levels of circulating sphingomyelin (SM), dihydrosphingomyelin (Dih-SM), ceramide (Cer), and glucosylceramide (GluCer) single lipid species (33 traits); and 43 matched metabolite ratios measured in 4,400 subjects from five diverse European populations. Associated variants (32) in five genomic regions were identified with genome-wide significant corrected p-values ranging down to 9.08 × 10^-66. The strongest associations were observed in or near 7 genes functionally involved in ceramide biosynthesis and trafficking: SPTLC3, LASS4, SGPP1, ATP10D, and FADS1-3. Variants in 3 loci (ATP10D, FADS3, and SPTLC3) associate with MI in a series of three German MI studies. An additional 70 variants across 23 candidate genes involved in sphingolipid-metabolizing pathways also demonstrate association (p = 10^-4 or less). Circulating concentrations of several key components in sphingolipid metabolism are thus under strong genetic control, and variants in these loci can be tested for a role in the development of common cardiovascular, metabolic, neurological, and psychiatric diseases.

    On the origin of European sheep as revealed by the diversity of the Balkan breeds and by optimizing population-genetic analysis tools

    Background: In the Neolithic, domestic sheep migrated into Europe and subsequently spread in westerly and northwesterly directions. Reconstruction of these migrations and subsequent genetic events requires a more detailed characterization of the current phylogeographic differentiation. Results: We collected 50K single nucleotide polymorphism (SNP) profiles of Balkan sheep that are currently found near the major Neolithic point of entry into Europe, and combined these data with published genotypes from southwest-Asian, Mediterranean, central-European and north-European sheep and from Asian and European mouflons. We detected clines, ancestral components and admixture by using variants of common analysis tools: geography-informative supervised principal component analysis (PCA), breed-specific admixture analysis, across-breed f4 profiles and phylogenetic analysis of regional pools of breeds. The regional Balkan sheep populations exhibit considerable genetic overlap, but are clearly distinct from the breeds in surrounding regions. The Asian mouflon did not influence the differentiation of the European domestic sheep and is only distantly related to present-day sheep, including those from Iran where the mouflons were sampled. We demonstrate the occurrence, from southeast to northwest Europe, of a continuously increasing ancestral component of up to 20% contributed by the European mouflon, which is assumed to descend from the original Neolithic domesticates. The overall patterns indicate that the Balkan region and Italy served as post-domestication migration hubs, from which wool sheep reached Spain and north Italy with subsequent migrations northwards. The documented dispersal of Tarentine wool sheep during the Roman period may have been part of this process. Our results also reproduce the documented 18th century admixture of Spanish Merino sheep into several central-European breeds. Conclusions: Our results contribute to a better understanding of the events that have created the present diversity pattern, which is relevant for the management of the genetic resources represented by the European sheep population.