
    How and why DNA barcodes underestimate the diversity of microbial eukaryotes

    Background: Because many picoplanktonic eukaryotic species cannot currently be maintained in culture, direct sequencing of PCR-amplified 18S ribosomal gene (18S rDNA) fragments from filtered seawater has been used successfully to investigate the astounding diversity of these organisms. The recognition of many novel planktonic organisms is thus based solely on their 18S rDNA sequence. However, a species delimited by its 18S rDNA sequence might contain many cryptic species that are highly differentiated in their protein-coding sequences. Principal Findings: Here, we investigate the issue of species identification from a single gene to the whole genome sequence. Using 52 whole-genome DNA sequences, we estimated the global genetic divergence in protein-coding genes between organisms from different lineages and compared it with their ribosomal gene sequence divergence. We show that the relationship between proteome divergence and 18S divergence is lineage-dependent. Unicellular lineages have especially low 18S divergence relative to their protein sequence divergence, suggesting that 18S ribosomal genes are too conservative to assess planktonic eukaryotic diversity. We provide an explanation for this lineage dependency, which suggests that most species with large effective population sizes will show far less divergence in 18S than in protein-coding sequences. Conclusions: There is therefore a trade-off between using genes that are easy to amplify in all species but, by their nature, are highly conserved and underestimate the true number of species, and using genes that give a better description of the number of species but are more difficult to amplify. We have shown that this trade-off differs between unicellular and multicellular organisms, likely as a consequence of differences in effective population size. We anticipate that the biodiversity of microbial eukaryotic species is underestimated and that numerous "cryptic species" will become discernible with the future acquisition of genomic and metagenomic sequences.
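    As a rough illustration of the comparison described in this abstract, the sketch below computes an uncorrected p-distance (the proportion of aligned, gap-free sites that differ) for a pair of aligned 18S fragments and a pair of aligned protein fragments, then takes their ratio. The toy sequences are hypothetical, and the uncorrected p-distance is an assumption made for simplicity; the study's own divergence measures may differ (for example, model-corrected distances computed over whole proteomes).

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned, gap-free sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip alignment columns where either sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable (gap-free) sites")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


# Hypothetical toy alignments (not data from the study).
rdna_a = "ACGTACGTACGTACGTACGT"
rdna_b = "ACGTACGTACGAACGTACGT"  # 1 difference in 20 sites
prot_a = "MKTAYIAKQRQISFVKSHFS"
prot_b = "MKTAYLAKQRQVSFVKSHYS"  # 3 differences in 20 sites

d_18s = p_distance(rdna_a, rdna_b)
d_prot = p_distance(prot_a, prot_b)
print(f"18S p-distance: {d_18s:.2f}")              # 0.05
print(f"protein p-distance: {d_prot:.2f}")         # 0.15
print(f"protein/18S ratio: {d_prot / d_18s:.1f}")  # 3.0
```

    In this toy case the protein pair looks three times as divergent as the 18S pair; a lineage in which such ratios are systematically high is one where 18S alone would tend to lump together distinct species.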

    Allele-specific miRNA-binding analysis identifies candidate target genes for breast cancer risk

    Most breast cancer (BC) risk-associated single-nucleotide polymorphisms (raSNPs) identified in genome-wide association studies (GWAS) are believed to cis-regulate the expression of genes. We hypothesise that cis-regulatory variants contributing to disease risk may affect microRNA (miRNA) genes and/or miRNA binding. To test this, we adapted two miRNA-binding prediction algorithms, TargetScan and miRanda, to perform allele-specific queries, and integrated differential allelic expression (DAE) and expression quantitative trait loci (eQTL) data to query 150 genome-wide significant (P ≤ 5×10⁻⁸) raSNPs, plus proxies. We found that no raSNP mapped to a miRNA gene, suggesting that altered miRNA targeting is an unlikely mechanism involved in BC risk. In addition, 6 of the 52 raSNPs (11.5%) located in the 3' untranslated regions (3'UTRs) of putative miRNA target genes were predicted to alter the binding stability of miRNA::mRNA (messenger RNA) pairs in five candidate target genes. Of these, we propose RNF115, at locus 1q21.1, as a strong novel target gene associated with BC risk, and we reinforce the role of miRNA-mediated cis-regulation at locus 19p13.11. We believe that integrating allele-specific querying into miRNA-binding prediction, together with data supporting cis-regulation of expression, improves the identification of candidate target genes in BC risk, as well as in other common cancers and complex diseases.
    Funding Agency: Portuguese Foundation for Science and Technology, CRESC ALGARVE 2020, European Union (EU) 303745, Maratona da Saude Award, DL 57/2016/CP1361/CT0042, SFRH/BPD/99502/2014, CBMR-UID/BIM/04773/2013, POCI-01-0145-FEDER-022184.
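    The allele-specific idea in this abstract can be illustrated with a deliberately simplified check: does swapping the reference allele for the alternative allele create or destroy a canonical 7mer-m8 seed match (a UTR motif complementary to miRNA nucleotides 2 to 8)? This sketch is not TargetScan or miRanda, which also weigh site context, conservation and predicted binding energy; it only counts exact seed-complementary sites. The UTR fragment, SNP position and helper names are hypothetical, and the miRNA sequence (let-7a-5p) is used purely as an example.

```python
# Watson-Crick complements in the RNA alphabet.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Normalise a DNA or RNA string to upper-case RNA (T -> U)."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(to_rna(seq)))


def seed_site(mirna: str) -> str:
    """Target motif complementary to miRNA nucleotides 2-8 (a 7mer-m8 site)."""
    return reverse_complement(to_rna(mirna)[1:8])


def count_sites(utr: str, site: str) -> int:
    """Count exact matches of `site` in `utr`, allowing overlaps."""
    k = len(site)
    return sum(1 for i in range(len(utr) - k + 1) if utr[i:i + k] == site)


def allele_specific_sites(utr: str, pos: int, ref: str, alt: str, mirna: str):
    """Return (site count with ref allele, site count with alt allele).

    `utr` carries the reference allele; `pos` is the 0-based SNP position.
    """
    utr, ref, alt = to_rna(utr), to_rna(ref), to_rna(alt)
    if utr[pos] != ref:
        raise ValueError("reference allele does not match the UTR sequence")
    alt_utr = utr[:pos] + alt + utr[pos + 1:]
    site = seed_site(mirna)
    return count_sites(utr, site), count_sites(alt_utr, site)


# Example miRNA (let-7a-5p); the UTR fragment and SNP are hypothetical.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr_ref = "AAGCTACCTCAAGT"  # contains CTACCTC (CUACCUC in RNA)
print(seed_site(mirna))                                    # CUACCUC
print(allele_specific_sites(utr_ref, 5, "A", "G", mirna))  # (1, 0)
```

    A result such as (1, 0) flags a variant whose alternative allele removes a predicted site; real pipelines would go on to score binding stability rather than simply count matches.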

    A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes

    GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts of about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available. © 2013 Capra et al.
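    To make "favors the fixation of G/C alleles" concrete, the sketch below uses a standard population-genetic scaling, assumed here for illustration (this abstract does not give phastBias's exact parameterization): with a population-scaled conversion strength B, neutral weak-to-strong (A/T to G/C) substitution rates are multiplied by f(B) = B / (1 - exp(-B)) and strong-to-weak rates by f(-B). The function names are hypothetical.

```python
import math


def fixation_factor(b: float) -> float:
    """Relative fixation probability B / (1 - exp(-B)); tends to 1 as B -> 0."""
    if abs(b) < 1e-8:
        return 1.0 + b / 2.0  # first-order Taylor expansion near B = 0
    # -expm1(-b) == 1 - exp(-b), computed accurately for small b.
    return b / -math.expm1(-b)


def gbgc_rate_multipliers(b: float) -> tuple[float, float]:
    """Return (weak->strong, strong->weak) multipliers on a neutral rate."""
    return fixation_factor(b), fixation_factor(-b)


for b in (0.0, 1.0, 3.0):
    ws, sw = gbgc_rate_multipliers(b)
    print(f"B={b}: W->S x{ws:.3f}, S->W x{sw:.3f}")
```

    At B = 0 both multipliers are 1 (no bias); as B grows, W->S substitutions speed up and S->W substitutions slow down, and the two multipliers always differ by exactly B, which is one way a gBGC tract leaves a detectable excess of G/C fixations.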

    Targeted genetic testing for familial hypercholesterolaemia using next-generation sequencing: a population-based study

    Background: Familial hypercholesterolaemia (FH) is a common Mendelian condition which, untreated, results in premature coronary heart disease. An estimated 88% of FH cases in the UK are undiagnosed. We previously validated a method for FH mutation detection in a lipid clinic population using next-generation sequencing (NGS), but this did not address the challenge of identifying index cases in primary care, where most undiagnosed patients receive healthcare. Here, we evaluate the targeted use of NGS as a potential route to diagnosis of FH in a primary care population subset selected for hypercholesterolaemia. Methods: We used microfluidics-based PCR amplification coupled with NGS and multiplex ligation-dependent probe amplification (MLPA) to detect mutations in LDLR, APOB and PCSK9 in three phenotypic groups within the Generation Scotland: Scottish Family Health Study, comprising 193 individuals with high total cholesterol, 232 with moderately high total cholesterol despite cholesterol-lowering therapy, and 192 normocholesterolaemic controls. Results: Pathogenic mutations were found in 2.1% of hypercholesterolaemic individuals, in 2.2% of subjects on cholesterol-lowering therapy and in 42% of their available first-degree relatives. In addition, variants of uncertain clinical significance (VUCS) were detected in 1.4% of the hypercholesterolaemic and cholesterol-lowering therapy groups. No pathogenic variants or VUCS were detected in controls. Conclusions: We demonstrated that population-based genetic testing using these protocols can deliver definitive molecular diagnoses of FH in individuals with high cholesterol or on cholesterol-lowering therapy. The lower cost and labour associated with NGS-based testing may make a population-based approach to FH detection more attractive than genetic testing with conventional sequencing. This could provide one route to increasing the currently low percentage of FH cases with a genetic diagnosis.

    Quantitative analysis of chromatin interaction changes upon a 4.3 Mb deletion at mouse 4E2

    BACKGROUND: Circular chromosome conformation capture (4C) has provided important insights into three-dimensional (3D) genome organization and its critical impact on the regulation of gene expression. We developed a new quantitative framework based on polymer physics for the analysis of paired-end sequencing 4C (PE-4Cseq) data, and applied this strategy to study chromatin interaction changes upon a 4.3 Mb DNA deletion in mouse region 4E2. RESULTS: A significant number of differentially interacting regions (DIRs) and chromatin compaction changes were detected in the deletion chromosome compared with a wild-type (WT) control. Selected DIRs were validated by 3D DNA FISH experiments, demonstrating the robustness of our pipeline. Interestingly, DIRs overlapped significantly with CTCF/Smc1 binding sites and differentially expressed genes. CONCLUSIONS: Altogether, our PE-4Cseq analysis pipeline provides a comprehensive characterization of the effects of DNA deletion on chromatin structure and function.
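    The abstract does not spell out its polymer-physics framework. One common polymer-physics summary of chromatin contact data, assumed here purely for illustration, is a power-law decay of contact frequency with genomic distance, P(s) proportional to s^-alpha; comparing fitted alpha between a deletion and a wild-type chromosome is one crude way to quantify compaction changes. The sketch fits alpha by least squares in log-log space; the function name and the contact counts are hypothetical.

```python
import math


def fit_power_law(distances, contacts):
    """Least-squares fit of log(contacts) = log(c) - alpha * log(distance).

    Returns (alpha, c). Bins with zero contacts are skipped because
    log(0) is undefined.
    """
    points = [(math.log(s), math.log(n))
              for s, n in zip(distances, contacts) if n > 0]
    if len(points) < 2:
        raise ValueError("need at least two non-zero bins")
    xs, ys = zip(*points)
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("distances must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return -slope, math.exp(intercept)


# Hypothetical contact counts at increasing genomic distances (kb);
# these are illustrative numbers, not data from the study.
distances_kb = [50, 100, 200, 400, 800]
wild_type = [4000, 2000, 1000, 500, 250]  # decays exactly as s^-1
deletion = [4000, 2400, 1450, 870, 520]   # decays more slowly

alpha_wt, c_wt = fit_power_law(distances_kb, wild_type)
alpha_del, c_del = fit_power_law(distances_kb, deletion)
print(f"wild type: alpha = {alpha_wt:.2f}")
print(f"deletion:  alpha = {alpha_del:.2f}")
```

    A smaller fitted alpha means contacts fall off more slowly with distance, which is often read as a more compact fibre; a real analysis would also need normalisation and a statistical test per region rather than a single global exponent.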

    Extending reference assembly models

    The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.

    Sequence locally, think globally: The Darwin Tree of Life Project

    The goals of the Earth Biogenome Project, to sequence the genomes of all eukaryotic life on Earth, are as daunting as they are ambitious. The Darwin Tree of Life Project was founded to demonstrate the credibility of these goals and to deliver at-scale genome sequences of unprecedented quality for a biogeographic region: the archipelago of islands that constitute Britain and Ireland. The Darwin Tree of Life Project is a collaboration between biodiversity organizations (museums, botanical gardens, and biodiversity institutes) and genomics institutes. Together, we have built a workflow that collects specimens from the field, robustly identifies them, sequences them, generates high-quality curated assemblies, and releases these openly for the global community to use as a foundation for future science and conservation efforts.

    Deriving a mutation index of carcinogenicity using protein structure and protein interfaces

    With the advent of next-generation sequencing, the identification of mutations in the genomes of healthy and diseased tissues has become commonplace. While much progress has been made in elucidating the aetiology of disease processes in cancer, the contributions that many individual mutations make to disease remain to be characterised, and their downstream consequences for cancer phenotypes remain to be understood. Missense mutations commonly occur in cancers, and their consequences remain challenging to predict. However, this knowledge is becoming increasingly vital, both for assessing disease progression and for stratifying drug treatment regimes. Coupled with structural data, comprehensive genomic databases of mutations such as the 1000 Genomes Project and COSMIC provide an opportunity to investigate general principles of how cancer mutations disrupt proteins and their interactions at the molecular and network level. We describe a comprehensive comparison of cancer and neutral missense mutations; by combining features derived from structural and interface properties, we have developed a carcinogenicity predictor, InCa (Index of Carcinogenicity). Upon comparison with other methods, we observe that InCa can predict mutations that might not be detected by other methods. We also discuss general limitations shared by all predictors that attempt to predict driver mutations, and how these could affect high-throughput predictions. A web interface to a server implementation is publicly available at http://inca.icr.ac.uk/