202 research outputs found

    Genome-wide comparative analysis reveals human-mouse regulatory landscape and evolution.

    BACKGROUND: Because species-specific gene expression is driven by species-specific regulation, understanding the relationship between sequence and function of the regulatory regions in different species will help elucidate how differences among species arise. Despite active experimental and computational research, relationships among sequence, conservation, and function are still poorly understood. RESULTS: We compared transcription factor occupied segments (TFos) for 116 human and 35 mouse TFs in 546 human and 125 mouse cell types and tissues from the Human and the Mouse ENCODE projects. We based the map between human and mouse TFos on a one-to-one nucleotide cross-species mapper, bnMapper, that utilizes whole genome alignments (WGA). Our analysis shows that TFos are under evolutionary constraint, but a substantial portion (25.1% of mouse and 25.85% of human on average) of the TFos does not have a homologous sequence on the other species; this portion varies among cell types and TFs. Furthermore, 47.67% and 57.01% of the homologous TFos sequence shows binding activity on the other species for human and mouse respectively. However, 79.87% and 69.22% is repurposed such that it binds the same TF in different cells or different TFs in the same cells. Remarkably, within the set of repurposed TFos, the corresponding genome regions in the other species are preferred locations of novel TFos. These events suggest exaptation of some functional regulatory sequences into new function. Despite TFos repurposing, we did not find substantial changes in their predicted target genes, suggesting that CRMs buffer evolutionary events allowing little or no change in the TFos - target gene associations. Thus, the small portion of TFos with strictly conserved occupancy underestimates the degree of conservation of regulatory interactions. CONCLUSION: We mapped regulatory sequences from an extensive number of TFs and cell types between human and mouse using WGA. A comparative analysis of this correspondence unveiled the extent of the shared regulatory sequence across TFs and cell types under study. Importantly, a large part of the shared regulatory sequence is repurposed on the other species. This sequence, fueled by turnover events, provides a strong case for exaptation in regulatory elements.

    Approximating Weighted Duo-Preservation in Comparative Genomics

    Motivated by comparative genomics, Chen et al. [9] introduced the Maximum Duo-preservation String Mapping (MDSM) problem in which we are given two strings $s_1$ and $s_2$ from the same alphabet and the goal is to find a mapping $\pi$ between them so as to maximize the number of duos preserved. A duo is any two consecutive characters in a string and it is preserved in the mapping if its two consecutive characters in $s_1$ are mapped to same two consecutive characters in $s_2$. The MDSM problem is known to be NP-hard and there are approximation algorithms for this problem [3, 5, 13], but all of them consider only the "unweighted" version of the problem in the sense that a duo from $s_1$ is preserved by mapping to any same duo in $s_2$ regardless of their positions in the respective strings. However, it is well-desired in comparative genomics to find mappings that consider preserving duos that are "closer" to each other under some distance measure [19]. In this paper, we introduce a generalized version of the problem, called the Maximum-Weight Duo-preservation String Mapping (MWDSM) problem that captures both duos-preservation and duos-distance measures in the sense that mapping a duo from $s_1$ to each preserved duo in $s_2$ has a weight, indicating the "closeness" of the two duos. The objective of the MWDSM problem is to find a mapping so as to maximize the total weight of preserved duos. In this paper, we give a polynomial-time 6-approximation algorithm for this problem. Comment: Appeared in proceedings of the 23rd International Computing and Combinatorics Conference (COCOON 2017)
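
The definition of a preserved duo can be made concrete with a small sketch. This is only an illustration of the objective being maximized, not the approximation algorithm from the paper. It assumes the usual setting in which the two strings have equal length and the mapping is a character-preserving bijection given as a list of positions. The function names and the `closeness` weight are hypothetical, chosen purely for illustration; the abstract does not specify a weight function.

```python
from typing import Callable, Sequence


def preserved_duos(s1: str, s2: str, pi: Sequence[int]) -> list[int]:
    """Return the start positions i in s1 whose duo (i, i+1) is preserved.

    pi[i] is the position in s2 that position i of s1 is mapped to.
    Assumption: pi is a bijection that maps each character to an equal one.
    """
    if len(s1) != len(s2) or sorted(pi) != list(range(len(s2))):
        raise ValueError("pi must be a bijection between equal-length strings")
    if any(s1[i] != s2[pi[i]] for i in range(len(s1))):
        raise ValueError("pi must map every character to an identical character")
    # A duo is preserved when its two characters land on consecutive positions.
    return [i for i in range(len(s1) - 1) if pi[i + 1] == pi[i] + 1]


def total_weight(s1: str, s2: str, pi: Sequence[int],
                 weight: Callable[[int, int], float]) -> float:
    """Sum a caller-supplied weight over all preserved duos (weighted objective)."""
    return sum(weight(i, pi[i]) for i in preserved_duos(s1, s2, pi))


def closeness(i: int, j: int) -> float:
    """Hypothetical weight: duos mapped to nearby positions score higher."""
    return 1.0 / (1 + abs(i - j))


s1, s2 = "abcab", "ababc"
pi = [2, 3, 4, 0, 1]  # "abc" -> positions 2..4 of s2, "ab" -> positions 0..1
duos = preserved_duos(s1, s2, pi)
score = total_weight(s1, s2, pi, closeness)
print(duos, round(score, 4))
```

With a constant weight of 1 the weighted score reduces to the plain count of preserved duos, which is the unweighted MDSM objective.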

    Platypus globin genes and flanking loci suggest a new insertional model for beta-globin evolution in birds and mammals

    Background: Vertebrate alpha (α)- and beta (β)-globin gene families exemplify the way in which genomes evolve to produce functional complexity. From tandem duplication of a single globin locus, the α- and β-globin clusters expanded, and then were separated onto different chromosomes. The previous finding of a fossil β-globin gene (ω) in the marsupial α-cluster, however, suggested that duplication of the α-β cluster onto two chromosomes, followed by lineage-specific gene loss and duplication, produced paralogous α- and β-globin clusters in birds and mammals. Here we analyse genomic data from an egg-laying monotreme mammal, the platypus (Ornithorhynchus anatinus), to explore haemoglobin evolution at the stem of the mammalian radiation. Results: The platypus α-globin cluster (chromosome 21) contains embryonic and adult α-globin genes, a β-like ω-globin gene, and the GBY globin gene with homology to cytoglobin, arranged as 5'-ζ-ζ'-αD-α3-α2-α1-ω-GBY-3'. The platypus β-globin cluster (chromosome 2) contains single embryonic and adult globin genes arranged as 5'-ε-β-3'. Surprisingly, all of these globin genes were expressed in some adult tissues. Comparison of flanking sequences revealed that all jawed vertebrate α-globin clusters are flanked by MPG-C16orf35 and LUC7L, whereas all bird and mammal β-globin clusters are embedded in olfactory genes. Thus, the mammalian α- and β-globin clusters are orthologous to the bird α- and β-globin clusters respectively. Conclusion: We propose that α- and β-globin clusters evolved from an ancient MPG-C16orf35-α-β-GBY-LUC7L arrangement 410 million years ago. A copy of the original β (represented by ω in marsupials and monotremes) was inserted into an array of olfactory genes before the amniote radiation (>315 million years ago), then duplicated and diverged to form orthologous clusters of β-globin genes with different expression profiles in different lineages. Vidushi S. Patel, Steven J.B. Cooper, Janine E. Deakin, Bob Fulton, Tina Graves, Wesley C. Warren, Richard K. Wilson and Jennifer A.M. Graves

    Hb H disease resulting from the association of an α0-thalassemia allele [-(α)20.5] with an unstable α-globin variant [Hb Icaria]: First report on the occurrence in Brazil

    Hb H disease is caused by the loss or inactivation of three of the four functional α-globin genes. Patients present with chronic hemolytic anemia and splenomegaly. In some cases, occasional blood transfusions are required. Deletions are the main cause of this type of thalassemia (α-thalassemia). We describe here an unusual case of Hb H disease caused by the combination of a common α0 deletion [-(α)20.5] with a rare point mutation (c.427T>A), resulting in an elongated and unstable α-globin variant, Hb Icaria (X142K), with 31 additional amino-acid residues. Very high levels of Hb H and Hb Bart's were detected in the patient's red blood cells (14.7 and 19.0%, respectively). This is the first description of this infrequent association in the Brazilian population.

    Conversion events in gene clusters

    Background: Gene clusters containing multiple similar genomic regions in close proximity are of great interest for biomedical studies because of their associations with inherited diseases. However, such regions are difficult to analyze due to their structural complexity and their complicated evolutionary histories, reflecting a variety of large-scale mutational events. In particular, conversion events can mislead inferences about the relationships among these regions, as traced by traditional methods such as construction of phylogenetic trees or multi-species alignments. Results: To correct the distorted information generated by such methods, we have developed an automated pipeline called CHAP (Cluster History Analysis Package) for detecting conversion events. We used this pipeline to analyze the conversion events that affected two well-studied gene clusters (α-globin and β-globin) and three gene clusters for which comparative sequence data were generated from seven primate species: CCL (chemokine ligand), IFN (interferon), and CYP2abf (part of cytochrome P450 family 2). CHAP is freely available at http://www.bx.psu.edu/miller_lab. Conclusions: These studies reveal the value of characterizing conversion events in the context of studying gene clusters in complex genomes.

    The Hellenic type of nondeletional hereditary persistence of fetal hemoglobin results from a novel mutation (g.-109G>T) in the HBG2 gene promoter

    Nondeletional hereditary persistence of fetal hemoglobin (nd-HPFH), a rare hereditary condition resulting in elevated levels of fetal hemoglobin (Hb F) in adults, is associated with promoter mutations in the human fetal globin (HBG1 and HBG2) genes. In this paper, we report a novel type of nd-HPFH due to an HBG2 gene promoter mutation (HBG2:g.-109G>T). This mutation, located at the 3′ end of the HBG2 distal CCAAT box, was initially identified in an adult female subject of Central Greek origin and results in elevated Hb F levels (4.1%) and significantly increased Gγ-globin chain production (79.2%). Family studies and DNA analysis revealed that the HBG2:g.-109G>T mutation is also found in the family members in compound heterozygosity with the HBG2:g.-158C>T single nucleotide polymorphism or the silent HBB:g.-101C>T β-thalassemia mutation, resulting in the latter case in significantly elevated Hb F levels (14.3%). Electrophoretic mobility shift analysis revealed that the HBG2:g.-109G>T mutation abolishes a transcription factor binding site, consistent with previous observations using DNA footprinting analysis, suggesting that guanine at position HBG2/1:g.-109 is critical for NF-E3 binding. These data suggest that the HBG2:g.-109G>T mutation has a functional role in increasing HBG2 transcription and is responsible for the HPFH phenotype observed in our index cases.

    A Common Genetic Variant (97906C>A) of DAB2IP/AIP1 Is Associated with an Increased Risk and Early Onset of Lung Cancer in Chinese Males

    DOC-2/DAB2 interactive protein (DAB2IP) is a newly identified tumor suppressor gene that inhibits cell growth and facilitates cell apoptosis. One genetic variant in the DAB2IP gene was recently reported to be associated with an increased risk of aggressive prostate cancer. Since DAB2IP is involved in the development of lung cancer and low expression of DAB2IP is observed in lung cancer, we hypothesized that variations in the DAB2IP gene can increase genetic susceptibility to lung cancer. In a case-control study of 1056 lung cancer cases and 1056 sex and age frequency-matched cancer-free controls, we investigated the association between two common polymorphisms in the DAB2IP gene (−1420T>G, rs7042542; 97906C>A, rs1571801) and the risk of lung cancer. We found that, compared with the 97906CC genotype, carriers of variant genotypes (97906AC+AA) had a significantly increased risk of lung cancer (adjusted odds ratio [OR] = 1.33, 95%CI = 1.04–1.70, P = 0.023) and the number of variant (risk) alleles acted in a dose-response manner (Ptrend = 0.0158). Further stratification analysis showed that the risk association was more pronounced in subjects aged less than 60 years, males, non-smokers, non-drinkers, overweight groups and those with a family history of cancer in first- or second-degree relatives, and that the 97906A allele interacted with overweight on lung cancer risk. We further found that the number of risk alleles (97906A allele) was negatively correlated with age at diagnosis of lung cancer in male patients (P = 0.003). However, no significant association was observed for the −1420T>G polymorphism. Our data suggest that the 97906A variant genotypes are associated with an increased risk and early onset of lung cancer, particularly in males.
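
The reported odds ratio, confidence interval and P value can be checked against each other with a short calculation. The sketch below assumes the interval is a Wald interval that is symmetric on the log-odds scale and that the P value is two-sided; the abstract states neither, so this is a consistency check under those assumptions rather than a reconstruction of the authors' analysis. The function name `wald_check` is hypothetical.

```python
import math


def wald_check(or_est: float, ci_low: float, ci_high: float,
               z_crit: float = 1.959964) -> tuple[float, float, float]:
    """Recover SE(log OR), the z statistic and a two-sided p-value.

    Assumes a Wald interval: log(OR) +/- z_crit * SE, with z_crit for 95%.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(or_est) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values taken from the abstract: OR = 1.33, 95% CI = 1.04-1.70.
se, z, p = wald_check(1.33, 1.04, 1.70)
print(f"SE(log OR) = {se:.4f}, z = {z:.3f}, two-sided p = {p:.4f}")
```

Under these assumptions the recovered p-value lands close to the reported P = 0.023, so the three figures in the abstract are mutually consistent.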

    Local conservation scores without a priori assumptions on neutral substitution rates

    Background: Comparative genomics aims to detect signals of evolutionary conservation as an indicator of functional constraint. Surprisingly, results of the ENCODE project revealed that about half of the experimentally verified functional elements found in non-coding DNA were classified as unconstrained by computational predictions. Following this observation, it has been hypothesized that this may be partly explained by biased estimates of the neutral evolutionary rates used by existing sequence conservation metrics. All methods we are aware of rely on a comparison with the neutral rate, and conservation is estimated by measuring the deviation of a particular genomic region from this rate. Consequently, it is a reasonable assumption that inaccurate neutral rate estimates may lead to biased conservation and constraint estimates. Results: We propose a conservation signal that is produced by local Maximum Likelihood estimation of evolutionary parameters using an optimized sliding window and present a Kullback-Leibler projection that allows multiple different estimated parameters to be transformed into a conservation measure. This conservation measure does not rely on assumptions about neutral evolutionary substitution rates, and few a priori assumptions on the properties of the conserved regions are imposed. We show the accuracy of our approach (KuLCons) on synthetic data and compare it to the scores generated by state-of-the-art methods (phastCons, GERP, SCONE) in an ENCODE region. We find that KuLCons is most often in agreement with the conservation/constraint signatures detected by GERP and SCONE, while qualitatively very different patterns from phastCons are observed. In contrast to standard methods, KuLCons can be extended to more complex evolutionary models, e.g. taking insertion and deletion events into account, and corresponding results show that scores obtained under this model can diverge significantly from scores using the simpler model. Conclusion: Our results suggest that discriminating among the different degrees of conservation is possible without making assumptions about neutral rates. We find, however, that it cannot be expected to discover considerably different constraint regions than GERP and SCONE. Consequently, we conclude that the reported discrepancies between experimentally verified functional and computationally identified constraint elements are likely not to be explained by biased neutral rate estimates.

    Systematic documentation and analysis of human genetic variation in hemoglobinopathies using the microattribution approach

    We developed a series of interrelated locus-specific databases to store all published and unpublished genetic variation related to hemoglobinopathies and thalassemia and implemented microattribution to encourage submission of unpublished observations of genetic variation to these public repositories. A total of 1,941 unique genetic variants in 37 genes, encoding globins and other erythroid proteins, are currently documented in these databases, with reciprocal attribution of microcitations to data contributors. Our project provides the first example of implementing microattribution to incentivise submission of all known genetic variation in a defined system. It has demonstrably increased the reporting of human variants, leading to a comprehensive online resource for systematically describing human genetic variation in the globin genes and other genes contributing to hemoglobinopathies and thalassemias. The principles established here will serve as a model for other systems and for the analysis of other common and/or complex human genetic diseases.

    Cataloguing functionally relevant polymorphisms in gene DNA ligase I: a computational approach

    A computational approach for identifying functionally relevant SNPs in the gene LIG1 is proposed. LIG1 is a crucial gene involved in excision repair pathways, and mutations in this gene may lead to increased sensitivity towards DNA damaging agents. A total of 792 SNPs were reported to be associated with LIG1 in dbSNP. Several web servers, namely SIFT, PolyPhen, CUPSAT, FASTSNP, MAPPER and dbSMR, were used to identify potentially functional SNPs in LIG1. The SIFT, PolyPhen and CUPSAT servers predicted eleven nsSNPs to be intolerant, thirteen nsSNPs to be damaging and two nsSNPs to have the potential to destabilize protein structure. The nsSNP rs11666150 was predicted to be damaging by all three servers, and its mutant structure showed a significant increase in overall energy. FASTSNP predicted twenty SNPs to be present in splicing modifier binding sites, while the rSNP module of the MAPPER server predicted nine SNPs to influence the binding of transcription factors. The results of the study may provide vital clues in establishing the effect of polymorphism on phenotype and in elucidating drug response.