243 research outputs found
Effect of the assignment of ancestral CpG state on the estimation of nucleotide substitution rates in mammals
<p>Abstract</p> <p>Background</p> <p>Molecular evolutionary studies in mammals often estimate nucleotide substitution rates within and outside CpG dinucleotides separately. Frequently, in alignments of two sequences, the division of sites into CpG and non-CpG classes is based simply on the presence or absence of a CpG dinucleotide in either sequence, a procedure that we refer to as CpG/non-CpG assignment. Although it likely that this procedure is biased, it is generally assumed that the bias is negligible if species are very closely related.</p> <p>Results</p> <p>Using simulations of DNA sequence evolution we show that assignment of the ancestral CpG state based on the simple presence/absence of the CpG dinucleotide can seriously bias estimates of the substitution rate, because many true non-CpG changes are misassigned as CpG. Paradoxically, this bias is most severe between closely related species, because a minimum of two substitutions are required to misassign a true ancestral CpG site as non-CpG whereas only a single substitution is required to misassign a true ancestral non-CpG site as CpG in a two branch tree. We also show that CpG misassignment bias differentially affects fourfold degenerate and noncoding sites due to differences in base composition such that fourfold degenerate sites can appear to be evolving more slowly than noncoding sites. We demonstrate that the effects predicted by our simulations occur in a real evolutionary setting by comparing substitution rates estimated from human-chimp coding and intronic sequence using CpG/non-CpG assignment with estimates derived from a method that is largely free from bias.</p> <p>Conclusion</p> <p>Our study demonstrates that a common method of assigning sites into CpG and non CpG classes in pairwise alignments is seriously biased and recommends against the adoption of <it>ad hoc </it>methods of ancestral state assignment.</p
Recommended from our members
Selective Constraints in Experimentally Defined Primate Regulatory Regions
Changes in gene regulation may be important in evolution. However, the evolutionary properties of regulatory mutations are currently poorly understood. This is partly the result of an incomplete annotation of functional regulatory DNA in many species. For example, transcription factor binding sites (TFBSs), a major component of eukaryotic regulatory architecture, are typically short, degenerate, and therefore difficult to differentiate from randomly occurring, nonfunctional sequences. Furthermore, although sites such as TFBSs can be computationally predicted using evolutionary conservation as a criterion, estimates of the true level of selective constraint (defined as the fraction of strongly deleterious mutations occurring at a locus) in regulatory regions will, by definition, be upwardly biased in datasets that are a priori evolutionarily conserved. Here we investigate the fitness effects of regulatory mutations using two complementary datasets of human TFBSs that are likely to be relatively free of ascertainment bias with respect to evolutionary conservation but, importantly, are supported by experimental data. The first is a collection of almost >2,100 human TFBSs drawn from the literature in the TRANSFAC database, and the second is derived from several recent high-throughput chromatin immunoprecipitation coupled with genomic microarray (ChIP-chip) analyses. We also define a set of putative cis-regulatory modules (pCRMs) by spatially clustering multiple TFBSs that regulate the same gene. We find that a relatively high proportion (∼37%) of mutations at TFBSs are strongly deleterious, similar to that at a 2-fold degenerate protein-coding site. However, constraint is significantly reduced in human and chimpanzee pCRMS and ChIP-chip sequences, relative to macaques. We estimate that the fraction of regulatory mutations that have been driven to fixation by positive selection in humans is not significantly different from zero. We also find that the level of selective constraint in our TFBSs, pCRMs, and ChIP-chip sequences is negatively correlated with the expression breadth of the regulated gene, whereas the opposite relationship holds at that gene's nonsynonymous and synonymous sites. Finally, we find that the rate of protein evolution in a transcription factor appears to be positively correlated with the breadth of expression of the gene it regulates. Our study suggests that strongly deleterious regulatory mutations are considerably more likely (1.6-fold) to occur in tissue-specific than in housekeeping genes, implying that there is a fitness cost to increasing “complexity” of gene expression.</p
Cell reprogramming shapes the mitochondrial DNA landscape.
Individual induced pluripotent stem cells (iPSCs) show considerable phenotypic heterogeneity, but the reasons for this are not fully understood. Comprehensively analysing the mitochondrial genome (mtDNA) in 146 iPSC and fibroblast lines from 151 donors, we show that most age-related fibroblast mtDNA mutations are lost during reprogramming. However, iPSC-specific mutations are seen in 76.6% (108/141) of iPSC lines at a mutation rate of 8.62 × 10-5/base pair. The mutations observed in iPSC lines affect a higher proportion of mtDNA molecules, favouring non-synonymous protein-coding and tRNA variants, including known disease-causing mutations. Analysing 11,538 single cells shows stable heteroplasmy in sub-clones derived from the original donor during differentiation, with mtDNA variants influencing the expression of key genes involved in mitochondrial metabolism and epidermal cell differentiation. Thus, the dynamic mtDNA landscape contributes to the heterogeneity of human iPSCs and should be considered when using reprogrammed cells experimentally or as a therapy
Perspective: Is It Time for Advocacy Training in Medical Education?
As the modern medical system becomes increasingly complex, a debate has arisen over the place of advocacy efforts within the medical profession. The authors argue that advocacy can help physicians fulfill their social contract. For physicians to become competent in patient-centered, clinical, administrative, or legislative advocacy, they require professional training. Many professional organizations have called for curricular reform to meet society's health needs during the past 30 years, and the inclusion of advocacy training in undergraduate, graduate, and continuing medical education is supported on both pragmatic and ethical grounds. Undergraduate medical education, especially, is an ideal time for this training because a standard competency can be instilled across all specialties. Although the Accreditation Council for Graduate Medical Education includes advocacy training in curricula for residency programs, few medical schools or residency programs have advocacy electives. By understanding the challenges of the health care system and how to change it for the better, physicians can experience increased professional satisfaction and effectiveness in improving patient care, systems-based practice, and public health
Gene Expression and Isoform Variation Analysis using Affymetrix Exon Arrays
Correction to Bemmo A, Benovoy D, Kwan T, Gaffney DJ, Jensen RV, Majewski J: Gene expression and isoform variation analysis using Affymetrix Exon Arrays. BMC Genomics 2008, 9: 529
Recommended from our members
Souporcell: robust clustering of single-cell RNA-seq data by genotype without reference genotypes.
Methods to deconvolve single-cell RNA-sequencing (scRNA-seq) data are necessary for samples containing a mixture of genotypes, whether they are natural or experimentally combined. Multiplexing across donors is a popular experimental design that can avoid batch effects, reduce costs and improve doublet detection. By using variants detected in scRNA-seq reads, it is possible to assign cells to their donor of origin and identify cross-genotype doublets that may have highly similar transcriptional profiles, precluding detection by transcriptional profile. More subtle cross-genotype variant contamination can be used to estimate the amount of ambient RNA. Ambient RNA is caused by cell lysis before droplet partitioning and is an important confounder of scRNA-seq analysis. Here we develop souporcell, a method to cluster cells using the genetic variants detected within the scRNA-seq reads. We show that it achieves high accuracy on genotype clustering, doublet detection and ambient RNA estimation, as demonstrated across a range of challenging scenarios.We acknowledge the Wellcome Sanger Institute’s DNA Pipelines for construction of the 10x sequencing libraries. We thank Allan Muhwezi and Andrew Russell for assistance with parasite culture and 10x Single-cell 3’ RNA-seq respectively. In addition, we would like to thank Matthew Young for useful conversations about ambient RNA, Mirjana Efremova for providing information about the maternal/fetal data, and Katie Gray for assistance in interpreting the previously unannotated cluster. The Wellcome Sanger Institute is funded by the Wellcome Trust (grant 206194/Z/17/Z), which supports MKNL
and MH. This work was supported by an MRC Career Development Award (G1100339) to MKNL. We would like to acknowledge the Wellcome Trust Sanger Institute as the source of the human induced pluripotent cell lines that were generated under the Human Induced Pluripotent Stem Cell Initiative funded by a grant from the Wellcome Trust and Medical Research Council, supported by the Wellcome Trust (WT098051) and the NIHR/Wellcome Trust Clinical Research Facility, and acknowledges Life Science Technologies Corporation as the provider of Cytotune (HipSci.org). The Cardiovascular Epidemiology Unit is supported by core funding from the UK Medical Research Council (MR/L003120/1), the British Heart Foundation (RG/13/13/30194; RG/18/13/33946) and the National Institute for Health Research [Cambridge Biomedical Research Centre at the Cambridge University Hospital’s NHS Foundation Trust]. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care
Genomic Selective Constraints in Murid Noncoding DNA
Recent work has suggested that there are many more selectively constrained, functional noncoding than coding sites in mammalian genomes. However, little is known about how selective constraint varies amongst different classes of noncoding DNA. We estimated the magnitude of selective constraint on a large dataset of mouse-rat gene orthologs and their surrounding noncoding DNA. Our analysis indicates that there are more than three times as many selectively constrained, nonrepetitive sites within noncoding DNA as in coding DNA in murids. The majority of these constrained noncoding sites appear to be located within intergenic regions, at distances greater than 5 kilobases from known genes. Our study also shows that in murids, intron length and mean intronic selective constraint are negatively correlated with intron ordinal number. Our results therefore suggest that functional intronic sites tend to accumulate toward the 5' end of murid genes. Our analysis also reveals that mean number of selectively constrained noncoding sites varies substantially with the function of the adjacent gene. We find that, among others, developmental and neuronal genes are associated with the greatest numbers of putatively functional noncoding sites compared with genes involved in electron transport and a variety of metabolic processes. Combining our estimates of the total number of constrained coding and noncoding bases we calculate that over twice as many deleterious mutations have occurred in intergenic regions as in known genic sequence and that the total genomic deleterious point mutation rate is 0.91 per diploid genome, per generation. This estimated rate is over twice as large as a previous estimate in murids
Controls of nucleosome positioning in the human genome
Nucleosomes are important for gene regulation because their arrangement on the genome can control which proteins bind to DNA. Currently, few human nucleosomes are thought to be consistently positioned across cells; however, this has been difficult to assess due to the limited resolution of existing data. We performed paired-end sequencing of micrococcal nuclease-digested chromatin (MNase-seq) from seven lymphoblastoid cell lines and mapped over 3.6 billion MNase-seq fragments to the human genome to create the highest-resolution map of nucleosome occupancy to date in a human cell type. In contrast to previous results, we find that most nucleosomes have more consistent positioning than expected by chance and a substantial fraction (8.7%) of nucleosomes have moderate to strong positioning. In aggregate, nucleosome sequences have 10 bp periodic patterns in dinucleotide frequency and DNase I sensitivity; and, across cells, nucleosomes frequently have translational offsets that are multiples of 10 bp. We estimate that almost half of the genome contains regularly spaced arrays of nucleosomes, which are enriched in active chromatin domains. Single nucleotide polymorphisms that reduce DNase I sensitivity can disrupt the phasing of nucleosome arrays, which indicates that they often result from positioning against a barrier formed by other proteins. However, nucleosome arrays can also be created by DNA sequence alone. The most striking example is an array of over 400 nucleosomes on chromosome 12 that is created by tandem repetition of sequences with strong positioning properties. In summary, a large fraction of nucleosomes are consistently positioned--in some regions because they adopt favored sequence positions, and in other regions because they are forced into specific arrangements by chromatin remodeling or DNA binding proteins
Recommended from our members
Controls of Nucleosome Positioning in the Human Genome
Nucleosomes are important for gene regulation because their arrangement on the genome can control which proteins bind to DNA. Currently, few human nucleosomes are thought to be consistently positioned across cells; however, this has been difficult to assess due to the limited resolution of existing data. We performed paired-end sequencing of micrococcal nuclease-digested chromatin (MNase–seq) from seven lymphoblastoid cell lines and mapped over 3.6 billion MNase–seq fragments to the human genome to create the highest-resolution map of nucleosome occupancy to date in a human cell type. In contrast to previous results, we find that most nucleosomes have more consistent positioning than expected by chance and a substantial fraction (8.7%) of nucleosomes have moderate to strong positioning. In aggregate, nucleosome sequences have 10 bp periodic patterns in dinucleotide frequency and DNase I sensitivity; and, across cells, nucleosomes frequently have translational offsets that are multiples of 10 bp. We estimate that almost half of the genome contains regularly spaced arrays of nucleosomes, which are enriched in active chromatin domains. Single nucleotide polymorphisms that reduce DNase I sensitivity can disrupt the phasing of nucleosome arrays, which indicates that they often result from positioning against a barrier formed by other proteins. However, nucleosome arrays can also be created by DNA sequence alone. The most striking example is an array of over 400 nucleosomes on chromosome 12 that is created by tandem repetition of sequences with strong positioning properties. In summary, a large fraction of nucleosomes are consistently positioned—in some regions because they adopt favored sequence positions, and in other regions because they are forced into specific arrangements by chromatin remodeling or DNA binding proteins.</p
- …