The importance of replicating genomic analyses to verify phylogenetic signal for recently evolved lineages
Genomewide SNP data generated by nontargeted methods such as RAD and GBS are increasingly being used in phylogenetic and phylogeographic analyses. When these methods are used in the absence of a reference genome, however, little is known about the locations and evolution of the SNPs. In using such data to address phylogenetic questions, researchers risk drawing false conclusions, particularly if a representative number of SNPs is not obtained. Here, we empirically test the robustness of phylogenetic inference based on SNP data for closely related lineages. We conducted a genomewide analysis of 75 712 SNPs, generated via GBS, of southern bull-kelp (Durvillaea). Durvillaea chathamensis co-occurs with D. antarctica on Chatham Island, but the two species have previously been found to be so genetically similar that the status of the former has been questioned. Our results show that D. chathamensis, which differs from D. antarctica ecologically as well as morphologically, is indeed a reproductively isolated species. Furthermore, our replicated analyses show that D. chathamensis cannot be reliably distinguished phylogenetically from closely related D. antarctica using subsets (ranging in size from 400 to 10 000 sites) of the 40 912 parsimony-informative SNPs in our data set and that bootstrap values alone can give misleading impressions of the strength of phylogenetic inferences. These results highlight the importance of independently replicating SNP analyses to verify that phylogenetic inferences based on nontargeted SNP data are robust. Our study also demonstrates that modern genomic approaches can be used to identify cases of recent or incipient speciation that traditional approaches (e.g. Sanger sequencing of a few loci) may be unable to detect or resolve. This research was supported by an Australian Research Council Discovery Early Career Research Award (DE140101715 to CIF) and University of Otago Performance Based Research Funding (to JMW).
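The replicated-subset design mentioned in the abstract above can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the function name `draw_replicate_subsets`, the number of replicates, and the intermediate subset size are assumptions. Only the pool of 40 912 parsimony-informative sites and the 400 to 10 000 size range come from the abstract.

```python
import random


def draw_replicate_subsets(n_sites, subset_sizes, n_replicates, seed=0):
    """Draw independent random subsets of site indices for each subset size.

    Returns a dict mapping each size to a list of n_replicates sorted
    index lists, each sampled without replacement from range(n_sites).
    """
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    subsets = {}
    for size in subset_sizes:
        if size > n_sites:
            raise ValueError(f"subset size {size} exceeds {n_sites} sites")
        subsets[size] = [
            sorted(rng.sample(range(n_sites), size))
            for _ in range(n_replicates)
        ]
    return subsets


# Hypothetical settings: three subset sizes, five replicates each.
subsets = draw_replicate_subsets(40_912, [400, 2_000, 10_000], n_replicates=5)
```

Each index list would then select alignment columns for a separate tree inference, so that topologies can be compared across replicates instead of relying on bootstrap support from a single run.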
Species trees from consensus single nucleotide polymorphism (SNP) data: Testing phylogenetic approaches with simulated and empirical data
Datasets of hundreds or thousands of SNPs (Single Nucleotide Polymorphisms) from multiple individuals per species are increasingly used to study population structure, species delimitation and shallow phylogenetics. The principal software tool to infer species or population trees from SNP data is currently the BEAST template SNAPP which uses a Bayesian coalescent analysis. However, it is computationally extremely demanding and tolerates only small amounts of missing data. We used simulated and empirical SNPs from plants (Australian Craspedia, Asteraceae, and Pelargonium, Geraniaceae) to compare species trees produced (1) by SNAPP, (2) using SVD quartets, and (3) using Bayesian and parsimony analysis with several different approaches to summarising data from multiple samples into one set of traits per species. Our aims were to explore the impact of tree topology and missing data on the results, and to test which data summarising and analyses approaches would best approximate the results obtained from SNAPP for empirical data. SVD quartets retrieved the correct topology from simulated data, as did SNAPP except in the case of a very unbalanced phylogeny. Both methods failed to retrieve the correct topology when large amounts of data were missing. Bayesian analysis of species level summary data scoring the two alleles of each SNP as independent characters and parsimony analysis of data scoring each SNP as one character produced trees with branch length distributions closest to the true trees on which SNPs were simulated. For empirical data, Bayesian inference and Dollo parsimony analysis of data scored allele-wise produced phylogenies most congruent with the results of SNAPP. 
In the case of study groups divergent enough for missing data to be phylogenetically informative (because of additional mutations preventing amplification of genomic fragments or bioinformatic establishment of homology), scoring of SNP data as a presence/absence matrix irrespective of allele content might be an additional option. As this depends on sampling across species being reasonably even and a random distribution of non-informative instances of missing data, however, further exploration of this approach is needed. Properly chosen data summary approaches to inferring species trees from SNP data may represent a potential alternative to currently available individual-level coalescent analyses, especially for quick data exploration and when dealing with computationally demanding or patchy datasets. This study was partly supported by a Centre of Biodiversity Analysis Ignition Grant to A.N.S.-L. and Justin Borevitz in 2013/14.
Magnetic field generation by pointwise zero-helicity three-dimensional steady flow of incompressible electrically conducting fluid
We introduce six families of three-dimensional space-periodic steady solenoidal flows, whose kinetic helicity density is zero at any point. Four families are analytically defined. Flows in four families have zero helicity spectrum. Sample flows from five families are used to demonstrate numerically that neither zero kinetic helicity density nor zero helicity spectrum prohibits generation of large-scale magnetic field by the two most prominent dynamo mechanisms: the magnetic α-effect and negative eddy diffusivity. Our computations also attest that such flows often generate small-scale field for sufficiently small magnetic molecular diffusivity. These findings indicate that kinetic helicity and helicity spectrum are not the quantities controlling the dynamo properties of a flow regardless of whether scale separation is present or not. Comment: 37 pages, 11 figures, 54 references
Recurrent miscalling of missense variation from short-read genome sequence data
Background: Short-read resequencing of genomes produces abundant information on the genetic variation of individuals. Because these variants are so numerous, they are rarely exhaustively validated. Furthermore, low levels of undetected variant miscalling will have a systematic and disproportionate impact on the interpretation of individual genome sequence information, especially should these also be carried through into reference databases of genomic variation.
Results: We find that sequence variation from short-read sequence data is subject to recurrent-yet-intermittent miscalling that occurs in a sequence intrinsic manner and is very sensitive to sequence read length. The miscalls arise from difficulties aligning short reads to redundant genomic regions, where the rate of sequencing error approaches the sequence diversity between redundant regions. We find the resultant miscalled variants to be sensitive to small sequence variations between genomes, and thereby are often intrinsic to an individual, pedigree, strain or human ethnic group. In human exome sequences, we identify 2–300 recurrent false positive variants per individual, almost all of which are present in public databases of human genomic variation. From the exomes of non-reference strains of inbred mice, we identify 3–5000 recurrent false positive variants per mouse – the number of which increasing with greater distance between an individual mouse strain and the reference C57BL6 mouse genome. We show that recurrently miscalled variants may be reproduced for a given genome from repeated simulation rounds of read resampling, realignment and recalling. As such, it is possible to identify more than two-thirds of false positive variation from only ten rounds of simulation.
Conclusion: Identification and removal of recurrent false positive variants from specific individual variant sets will improve overall data quality. Variant miscalls arising are highly sequence intrinsic and are often specific to an individual, pedigree or ethnicity. Further, read length is a strong determinant of whether given false variants will be called for any given genome – which has profound significance for cohort studies that pool datasets collected and sequenced at different points in time
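The idea of finding recurrent miscalls through repeated simulation rounds can be illustrated with a small sketch. It assumes each round has already been reduced to a set of false-positive calls. The abstract does not say how recurrence is scored, so the `min_rounds` threshold, the function names, and the toy variants below are all hypothetical.

```python
from collections import Counter


def recurrent_variants(rounds, min_rounds=2):
    """Return variants miscalled in at least min_rounds simulation rounds."""
    counts = Counter(v for calls in rounds for v in set(calls))
    return {v for v, n in counts.items() if n >= min_rounds}


def cumulative_recovery(rounds, known_false_positives):
    """Fraction of known false positives seen after each successive round."""
    if not known_false_positives:
        raise ValueError("known_false_positives must not be empty")
    seen = set()
    fractions = []
    for calls in rounds:
        seen |= set(calls) & known_false_positives
        fractions.append(len(seen) / len(known_false_positives))
    return fractions


# Toy data: variants as (chromosome, position, alternate allele) tuples.
rounds = [
    {("chr1", 100, "A"), ("chr2", 200, "T")},
    {("chr1", 100, "A")},
    {("chr1", 100, "A"), ("chr3", 300, "G")},
]
known = {
    ("chr1", 100, "A"),
    ("chr2", 200, "T"),
    ("chr3", 300, "G"),
    ("chr4", 400, "C"),
}

recurrent = recurrent_variants(rounds, min_rounds=2)
recovery = cumulative_recovery(rounds, known)
```

Tracking the cumulative fraction per round is one way to see how quickly additional rounds stop revealing new miscalls.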
Dynamic Interplay of Innate and Adaptive Immunity During Sterile Retinal Inflammation: Insights From the Transcriptome
The pathogenesis of many retinal degenerations, such as age-related macular degeneration (AMD), is punctuated by an ill-defined network of sterile inflammatory responses. The delineation of innate and adaptive immune milieu among the broad leukocyte infiltrate, and the gene networks, which construct these responses, are poorly described in the eye. Using photo-oxidative damage in a rodent model of subretinal inflammation, we employed a novel RNA-sequencing framework to map the global gene network signature of retinal leukocytes. This revealed a previously uncharted interplay of adaptive immunity during subretinal inflammation, including prolonged enrichment of myeloid and lymphocyte migration, antigen presentation, and the alternative arm of the complement cascade involving Factor B. We demonstrate Factor B-deficient mice are protected against macrophage infiltration and subretinal inflammation. Suppressing the drivers of retinal leukocyte proliferation, or their capacity to elicit complement responses, may help preserve retinal structure and function during sterile inflammation in diseases such as AMD
Isolation by distance and isolation by environment contribute to population differentiation in Protea repens (Proteaceae L.), a widespread South African species
PREMISE OF THE STUDY: The Cape Floristic Region (CFR) of South Africa is renowned for its botanical diversity, but the evolutionary origins of this diversity remain controversial. Both neutral and adaptive processes have been implicated in driving diversification, but population-level studies of plants in the CFR are rare. Here, we investigate the limits to gene flow and potential environmental drivers of selection in Protea repens L. (Proteaceae L.), a widespread CFR species. METHODS: We sampled 19 populations across the range of P. repens and used genotyping by sequencing to identify 2066 polymorphic loci in 663 individuals. We used a Bayesian FST outlier analysis to identify single-nucleotide polymorphisms (SNPs) marking genomic regions that may be under selection; we used those SNPs to identify potential drivers of selection and excluded them from analyses of gene flow and genetic structure. RESULTS: A pattern of isolation by distance suggested limited gene flow between nearby populations. The populations of P. repens fell naturally into two or three groupings, which corresponded to an east-west split. Differences in rainfall seasonality contributed to diversification in highly divergent loci, as do barriers to gene flow that have been identified in other species. CONCLUSIONS: The strong pattern of isolation by distance is in contrast to the findings in the only other widespread species in the CFR that has been similarly studied, while the effects of rainfall seasonality are consistent with well-known patterns. Assessing the generality of these results will require investigations of other CFR species. This work was supported by the National Science Foundation (DEB-1046328). Seeds were collected under Cape Nature permits AAA005-00214-0028 and AAA005-00224-0028 and Eastern Cape Province permit CRO 4/11 C
Systems-guided forward genetic screen reveals a critical role of the replication stress response protein ETAA1 in T cell clonal expansion
T-cell immunity requires extremely rapid clonal proliferation of rare, antigen-specific T lymphocytes to form effector cells. Here we identify a critical role for ETAA1 in this process by surveying random germ line mutations in mice using exome sequencing and bioinformatic annotation to prioritize mutations in genes of unknown function with potential effects on the immune system, followed by breeding to homozygosity and testing for immune system phenotypes. Effector CD8+ and CD4+ T-cell formation following immunization, lymphocytic choriomeningitis virus (LCMV) infection, or herpes simplex virus 1 (HSV1) infection was profoundly decreased despite normal immune cell development in adult mice homozygous for two different Etaa1 mutations: an exon 2 skipping allele that deletes Gly78-Leu119, and a Cys166Stop truncating allele that eliminates most of the 877-aa protein. ETAA1 deficiency decreased clonal expansion cell autonomously within the responding T cells, causing no decrease in their division rate but increasing TP53-induced mRNAs and phosphorylation of H2AX, a marker of DNA replication stress induced by the ATM and ATR kinases. Homozygous ETAA1-deficient adult mice were otherwise normal, healthy, and fertile, although slightly smaller, and homozygotes were born at lower frequency than expected, consistent with partial lethality after embryonic day 12. Taken together with recently reported evidence in human cancer cell lines that ETAA1 activates ATR kinase through an exon 2-encoded domain, these findings reveal a surprisingly specific requirement for this ATR activator in adult mice restricted to rapidly dividing effector T cells. This specific requirement may provide new ways to suppress pathological T-cell responses in transplantation or autoimmunity. This work was funded by National Institutes of Health Grant U19-AI100627; by the National Health and Medical Research Council through Program Grants 1016953 and 1113904, Australia Fellowship 585490, Senior Principal Research Fellowship 1081858, and C. J. Martin Early Career Fellowship 585518 (to I.A.P.); and by the National Collaborative Research Infrastructure Strategy.
Dichloroacetate prevents cisplatin-induced nephrotoxicity without compromising cisplatin anticancer properties
Cisplatin is an effective anticancer drug; however, cisplatin use often leads to nephrotoxicity, which limits its clinical effectiveness. In this study, we determined the effect of dichloroacetate, a novel anticancer agent, in a mouse model of cisplatin-induced AKI. Pretreatment with dichloroacetate significantly attenuated the cisplatin-induced increase in BUN and serum creatinine levels, renal tubular apoptosis, and oxidative stress. Additionally, pretreatment with dichloroacetate accelerated tubular regeneration after cisplatin-induced renal damage. Whole transcriptome sequencing revealed that dichloroacetate prevented mitochondrial dysfunction and preserved the energy-generating capacity of the kidneys by preventing the cisplatin-induced downregulation of fatty acid and glucose oxidation, and of genes involved in the Krebs cycle and oxidative phosphorylation. Notably, dichloroacetate did not interfere with the anticancer activity of cisplatin in vivo. These data provide strong evidence that dichloroacetate preserves renal function when used in conjunction with cisplatin
Tissue and cell-specific transcriptomes in cotton reveal the subtleties of gene regulation underlying the diversity of plant secondary cell walls
Background
Knowledge of plant secondary cell wall (SCW) regulation and deposition is mainly based on the Arabidopsis model of a ‘typical’ lignocellulosic SCW. However, SCWs in other plants can vary from this. The SCW of mature cotton seed fibres is highly cellulosic and lacks lignification whereas xylem SCWs are lignocellulosic. We used cotton as a model to study different SCWs and the expression of the genes involved in their formation via RNA deep sequencing and chemical analysis of stem and seed fibre.
Results
Transcriptome comparisons from cotton xylem and pith as well as from a developmental series of seed fibres revealed tissue-specific and developmentally regulated expression of several NAC transcription factors some of which are likely to be important as top tier regulators of SCW formation in xylem and/or seed fibre. A so far undescribed hierarchy was identified between the top tier NAC transcription factors SND1-like and NST1/2 in cotton. Key SCW MYB transcription factors, homologs of Arabidopsis MYB46/83, were practically absent in cotton stem xylem. Lack of expression of other lignin-specific MYBs in seed fibre relative to xylem could account for the lack of lignin deposition in seed fibre. Expression of a MYB103 homolog correlated with temporal expression of SCW CesAs and cellulose synthesis in seed fibres. FLAs were highly expressed and may be important structural components of seed fibre SCWs. Finally, we made the unexpected observation that cell walls in the pith of cotton stems contained lignin and had a higher S:G ratio than in xylem, despite that tissue’s lacking many of the gene transcripts normally associated with lignin biosynthesis.
Conclusions
Our study in cotton confirmed some features of the currently accepted gene regulatory cascade for ‘typical’ plant SCWs, but also revealed substantial differences, especially with key downstream NACs and MYBs. The lignocellulosic SCW of cotton xylem appears to be achieved differently from that in Arabidopsis. Pith cell walls in cotton stems are compositionally very different from those reported for other plant species, including Arabidopsis. The current definition of a ‘typical’ primary or secondary cell wall might not be applicable to all cell types in all plant species. CPM was funded by Cotton Breeding Australia, a joint venture between Cotton Seed Distributors and CSIRO (Project No. CBA19). HB was funded by the CSIRO’s Office of the Chief Executive (OCE) Postdoctoral Fellowship program. YT and JR were funded in part by Stanford University’s Global Climate and Energy Program, and in part by the DOE Great Lakes Bioenergy Research Center (DOE BER Office of Science, DE-FC02–07ER6449
Recurrent miscalling of missense variation from short-read genome sequence data
Background
Short-read resequencing of genomes produces abundant information on the genetic variation of individuals. Because these variants are so numerous, they are rarely exhaustively validated. Furthermore, low levels of undetected variant miscalling will have a systematic and disproportionate impact on the interpretation of individual genome sequence information, especially should these also be carried through into reference databases of genomic variation.
Results
We find that sequence variation from short-read sequence data is subject to recurrent-yet-intermittent miscalling that occurs in a sequence intrinsic manner and is very sensitive to sequence read length. The miscalls arise from difficulties aligning short reads to redundant genomic regions, where the rate of sequencing error approaches the sequence diversity between redundant regions. We find the resultant miscalled variants to be sensitive to small sequence variations between genomes, and thereby are often intrinsic to an individual, pedigree, strain or human ethnic group. In human exome sequences, we identify 2–300 recurrent false positive variants per individual, almost all of which are present in public databases of human genomic variation. From the exomes of non-reference strains of inbred mice, we identify 3–5000 recurrent false positive variants per mouse – the number of which increasing with greater distance between an individual mouse strain and the reference C57BL6 mouse genome. We show that recurrently miscalled variants may be reproduced for a given genome from repeated simulation rounds of read resampling, realignment and recalling. As such, it is possible to identify more than two-thirds of false positive variation from only ten rounds of simulation.
Conclusion
Identification and removal of recurrent false positive variants from specific individual variant sets will improve overall data quality. Variant miscalls arising are highly sequence intrinsic and are often specific to an individual, pedigree or ethnicity. Further, read length is a strong determinant of whether given false variants will be called for any given genome – which has profound significance for cohort studies that pool datasets collected and sequenced at different points in time. This work has been funded by National Institutes of Health Grant AI100627 and the National Collaborative Research Infrastructure Strategy (Australia). Publication costs are funded by National Institutes of Health Grant AI100627 and the National Collaborative Research Infrastructure Strategy (Australia).