8 research outputs found

    Partial correlation analysis indicates causal relationships between GC-content, exon density and recombination rate in the human genome

    Background: Several features are known to correlate with the GC-content in the human genome, including recombination rate, gene density and distance to telomere. However, by testing for pairwise correlation only, it is impossible to distinguish direct associations from indirect ones and to distinguish between causes and effects. Results: We use partial correlations to construct partially directed graphs for the following four variables: GC-content, recombination rate, exon density and distance-to-telomere. Recombination rate and exon density are unconditionally uncorrelated, but become inversely correlated by conditioning on GC-content. This pattern indicates a model where recombination rate and exon density are two independent causes of GC-content variation. Conclusions: Causal inference and graphical models are useful methods to understand genome evolution and the mechanisms of isochore evolution in the human genome.
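
The pattern described in the abstract above (two variables that are uncorrelated pairwise but become inversely correlated once a common effect is conditioned on) can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the authors' analysis: the variable names, the unit-variance Gaussian model, and the use of the first-order partial correlation formula are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic model (an assumption): two independent causes and one common effect.
rng = random.Random(0)
n = 20000
recombination = [rng.gauss(0, 1) for _ in range(n)]
exon_density = [rng.gauss(0, 1) for _ in range(n)]
gc_content = [r + e + rng.gauss(0, 1) for r, e in zip(recombination, exon_density)]

pairwise = pearson(recombination, exon_density)
conditional = partial_corr(recombination, exon_density, gc_content)
print(f"pairwise r = {pairwise:.3f}, partial r given GC = {conditional:.3f}")
```

Under this toy model the pairwise correlation is close to zero, while the partial correlation given the simulated GC-content is clearly negative, which is the signature the abstract interprets as two independent causes of a shared effect.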

    Mapping Recombination Rate on the Autosomal Chromosomes Based on the Persistency of Linkage Disequilibrium Phase Among Autochthonous Beef Cattle Populations in Spain

    In organisms with sexual reproduction, genetic diversity and genome evolution are governed by meiotic recombination caused by crossing-over, which is known to vary within the genome. In this study, we propose a simple method to estimate the recombination rate that makes use of the persistency of linkage disequilibrium (LD) phase among closely related populations. The biological material comprised 171 triplets (sire/dam/offspring) from seven populations of autochthonous beef cattle in Spain (Asturiana de los Valles, Avileña-Negra Ibérica, Bruna dels Pirineus, Morucha, Pirenaica, Retinta, and Rubia Gallega), which were genotyped for 777,962 SNPs with the BovineHD BeadChip. After standard quality filtering, we reconstructed the haplotype phases in the parental individuals and calculated the LD by the correlation -r- between each pair of markers that had a genetic distance < 1 Mb. Subsequently, these correlations were used to calculate the persistency of LD phase between each pair of populations along the autosomal genome. Therefore, the distribution of the recombination rate along the genome can be inferred since the effect of the number of generations of divergence should be equivalent throughout the genome. In our study, the recombination rate was highest in the largest chromosomes and at the distal portion of the chromosomes. In addition, the persistency of LD phase was highly heterogeneous throughout the genome, with a 25.4-fold ratio between the recombination rate estimates for the genomic regions that had the highest (BTA18-7.1 Mb) and the lowest (BTA12-42.4 Mb) estimates. Finally, an overrepresentation enrichment analysis (ORA) showed differences in the enriched gene ontology (GO) terms between the genes located in the genomic regions with estimates of the recombination rate over (or below) the 95th (or 5th) percentile throughout the autosomal genome.
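
The abstract above does not give formulas, so the following is only a hedged sketch of how persistency of LD phase is commonly computed: the signed correlation r is calculated for each marker pair within each population, and persistency is taken as the Pearson correlation of those r values between two populations. The function names `ld_r` and `persistency_of_phase` and the toy numbers are hypothetical, not taken from the study.

```python
import math


def ld_r(locus_a, locus_b):
    """Signed LD correlation r between two biallelic loci.

    Each argument lists the allele (0 or 1) carried by every phased haplotype.
    """
    n = len(locus_a)
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(a * b for a, b in zip(locus_a, locus_b)) / n
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    if denom == 0:
        raise ValueError("monomorphic locus: r is undefined")
    return (p_ab - p_a * p_b) / denom


def persistency_of_phase(r_pop1, r_pop2):
    """Pearson correlation of r values for the same marker pairs in two populations."""
    n = len(r_pop1)
    m1, m2 = sum(r_pop1) / n, sum(r_pop2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(r_pop1, r_pop2))
    s11 = sum((a - m1) ** 2 for a in r_pop1)
    s22 = sum((b - m2) ** 2 for b in r_pop2)
    return s12 / math.sqrt(s11 * s22)


# Toy r values (invented) for four marker pairs in one genomic window.
r_pop1 = [0.8, 0.5, -0.3, 0.1]
r_pop2 = [0.7, 0.6, -0.2, 0.0]
window_persistency = persistency_of_phase(r_pop1, r_pop2)
print(f"persistency of LD phase in window: {window_persistency:.3f}")
```

In this reading, windows where the value stays high across populations would be those where recombination has eroded the shared LD phase least, which is the intuition the abstract relies on.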

    Fine-Scale Population Recombination Rates, Hotspots, and Correlates of Recombination in the Medicago truncatula Genome

    Recombination rates vary across the genome and in many species show significant relationships with several genomic features, including distance to the centromere, gene density, and GC content. Studies of fine-scale recombination rates have also revealed that in several species, there are recombination hotspots, that is, short regions with recombination rates 10–100 times greater than those in surrounding regions. In this study, we analyzed whole-genome resequence data from 26 accessions of the model legume Medicago truncatula to gain insight into the genomic features that are related to high- and low-recombination rates and recombination hotspots at 1 kb scales. We found that high-recombination regions (1-kb windows among those in the highest 5% of the distribution) on all three chromosomes were significantly closer to the centromere, had higher gene density, and lower GC content than low-recombination windows. High-recombination windows are also significantly overrepresented among some gene functional categories, most strongly NB–ARC and LRR genes, both of which are important in plant defense against pathogens. Similar to high-recombination windows, recombination hotspots (1-kb windows with significantly higher recombination than the surrounding region) are significantly nearer to the centromere than nonhotspot windows. By contrast, we detected no difference in gene density or GC content between hotspot and nonhotspot windows. Using linear model wavelet analysis to examine the relationship between recombination and genomic features across multiple spatial scales, we find a significant negative correlation with distance to the centromere across scales up to 512 kb, whereas gene density and GC content show significantly positive and negative correlations, respectively, only up to 64 kb. Correlations between recombination and genomic features, particularly gene density and polymorphism, suggest that they are scale dependent and need to be assessed at scales relevant to the evolution of those features.

    Mapping recombination rate on the autosomal chromosomes based on the persistency of linkage disequilibrium phase among autochthonous beef cattle populations in Spain

    Mouresan, Elena Flavia. Universidad de Zaragoza. Departamento de Anatomía, Embriología y Genética Animal. Zaragoza, Spain.
    González Rodríguez, Aldemar. Universidad de Zaragoza. Departamento de Anatomía, Embriología y Genética Animal. Zaragoza, Spain.
    Cañas Alvarez, Jhon J. Universitat Autònoma de Barcelona. Departament de Ciència Animal i dels Aliments. Barcelona, Spain.
    Munilla Leguizamón, Sebastián. Universidad de Zaragoza. Departamento de Anatomía, Embriología y Genética Animal. Zaragoza, Spain.
    Altarriba, Juan. Universidad de Zaragoza. Departamento de Anatomía, Embriología y Genética Animal. Zaragoza, Spain.
    Díaz, Clara. Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA). Departamento de Mejora Genética Animal. Madrid, Spain.
    Baro, Jesus A. Instituto Agroalimentario de Aragón (IA2). Zaragoza, Spain.
    Molina, Antonio. Universidad de Valladolid. Departamento de Ciencias Agroforestales. Valladolid, Spain.

    In organisms with sexual reproduction, genetic diversity and genome evolution are governed by meiotic recombination caused by crossing-over, which is known to vary within the genome. In this study, we propose a simple method to estimate the recombination rate that makes use of the persistency of linkage disequilibrium (LD) phase among closely related populations. The biological material comprised 171 triplets (sire/dam/offspring) from seven populations of autochthonous beef cattle in Spain (Asturiana de los Valles, Avileña-Negra Ibérica, Bruna dels Pirineus, Morucha, Pirenaica, Retinta, and Rubia Gallega), which were genotyped for 777,962 SNPs with the BovineHD BeadChip. After standard quality filtering, we reconstructed the haplotype phases in the parental individuals and calculated the LD by the correlation -r- between each pair of markers that had a genetic distance smaller than 1 Mb. Subsequently, these correlations were used to calculate the persistency of LD phase between each pair of populations along the autosomal genome. Therefore, the distribution of the recombination rate along the genome can be inferred since the effect of the number of generations of divergence should be equivalent throughout the genome. In our study, the recombination rate was highest in the largest chromosomes and at the distal portion of the chromosomes. In addition, the persistency of LD phase was highly heterogeneous throughout the genome, with a ratio of 25.4 times between the estimates of the recombination rates from the genomic regions that had the highest (BTA18-7.1 Mb) and the lowest (BTA12-42.4 Mb) estimates. Finally, an overrepresentation enrichment analysis (ORA) showed differences in the enriched gene ontology (GO) terms between the genes located in the genomic regions with estimates of the recombination rate over (or below) the 95th (or 5th) percentile throughout the autosomal genome.

    Conditional Distance Correlation Test for Gene Expression Level, DNA Methylation Level and Copy Number

    Over the past years, efforts have been devoted to the genome-wide analysis of genetic and epigenetic profiles to better understand the underlying biological mechanisms of complex diseases such as cancer. It is of great importance to unravel the complex dependence structure between biological factors, and many conditional dependence tests have been developed to meet this need. The traditional partial correlation method can capture only linear partial correlation, not nonlinear correlation. To overcome this limitation, we propose to use the conditional distance correlation (CDC), which measures the conditional dependence between random vectors and detects nonlinear relations. In this thesis, the CDC measure is applied to the Cancer Genome Atlas (TCGA) ovarian cancer data, and we identify a list of interesting genes with nonlinear features. We integrate three important types of molecular features, namely gene expression, DNA methylation and copy number variation, and implement the partial correlation test and CDC test to infer the relations between the three measurements for each gene. Out of 196 candidate oncogenes and tumor suppressors, we identify 19 genes in which two of the molecular features are nonlinearly dependent given the third variable. Of these 19 genes, many were reported to be associated with ovarian cancer or breast cancer in the literature. Our findings could shed new light on the biological relations between the three important molecular aspects. This thesis is structured as follows: we begin with a brief introduction to ovarian cancer, TCGA data, the three molecular measurements, and two testing methods in Chapter 1. In the second chapter, we review different statistical methods including Pearson’s partial correlation and conditional distance correlation. In Chapter 3, we conduct an extensive simulation study to compare the empirical performance of different methods. In Chapter 4, we apply the new method to the TCGA ovarian data. We conclude the thesis with future directions in Chapter 5.

    Global target mRNA specification and regulation by the RNA-binding protein ZFP36

    BACKGROUND: ZFP36, also known as tristetraprolin or TTP, and ELAVL1, also known as HuR, are two disease-relevant RNA-binding proteins (RBPs) that both interact with AU-rich sequences but have antagonistic roles. While ELAVL1 binding has been profiled in several studies, the precise in vivo binding specificity of ZFP36 has not been investigated on a global scale. We determined ZFP36 binding preferences using cross-linking and immunoprecipitation in human embryonic kidney cells, and examined the combinatorial regulation of AU-rich elements by ZFP36 and ELAVL1. RESULTS: Targets bound and negatively regulated by ZFP36 include transcripts encoding proteins necessary for immune function and cancer, and transcripts encoding other RBPs. Using partial correlation analysis, we were able to quantify the association between ZFP36 binding sites and differential target RNA abundance upon ZFP36 overexpression independent of effects from confounding features. Genes with increased mRNA half-lives in ZFP36 knockout versus wild-type mouse cells were significantly enriched for our human ZFP36 targets. We identified thousands of overlapping ZFP36 and ELAVL1 binding sites, in 1,313 genes, and found that ZFP36 degrades transcripts through specific AU-rich sequences, representing a subset of the U-rich sequences ELAVL1 interacts with to stabilize transcripts. CONCLUSIONS: ZFP36-RNA target specificities in vivo are quantitatively similar to previously reported in vitro binding affinities. ZFP36 and ELAVL1 bind an overlapping spectrum of RNA sequences, yet with differential relative preferences that dictate combinatorial regulatory potential. Our findings and methodology delineate an approach to unravel in vivo combinatorial regulation by RNA-binding proteins.

    A computational characterisation of the relationship between genome structure and disease genes

    Magister Scientiae - MSc
    This is a pilot study to investigate the relationship between disease gene status and the structure of the human genome with specific reference to regions of recombination. It compares certain characteristics of a control set of genes, with no reported association or function in any known disease, with a second set of well-curated genes with a known association to a disease. One of the benefits of recombination is the introduction of new combinations of genetic variation in the genome. Recombination hotspots are regions on the chromosome where higher than normal frequencies of breaking and rejoining between homologous chromosomes occur during meiosis. The hotspot regions exhibit both a non-random distribution across the human genome and varying frequencies of breaking and rejoining. The study analyzed a set of features that represent general properties of human genes, namely base composition (percentage GC content), genetic variation (single nucleotide polymorphisms - SNPs), gene length, and positional effect (distance from chromosome end), in both the disease-associated gene set and the control set. These features were linked to recombination hotspots in the human genome and the frequency of recombination at these hotspots. Descriptive statistics were used to determine differences between the occurrences of these features in disease-associated genes compared to the control set, as well as differences in the occurrence of these same features in the subset of genes containing an internal recombination hotspot compared to the genes with no internal recombination hotspot. The study found that disease-associated genes are generally longer than those in the control set, which is consistent with previous studies. It also found that disease-associated genes are much more likely to contain a recombination hotspot than those genes with no disease association. The study did not, however, find any association between disease gene status and the other features, namely GC content, SNP numbers or the position of a gene on the chromosome. Further analysis of the data suggested that the increased probability of disease-associated genes containing a recombination hotspot is most likely an effect of longer gene length and that the presence of a recombination hotspot is not sufficient in its own right to cause disease gene status.

    Isolation and Genomic Analysis of the Cetacean Y-chromosome

    The male-specific mammalian Y-chromosome represents a powerful tool for studying male-mediated gene flow and genome evolution. Here it was possible to identify 7 polymorphic microsatellites for the first time in an odontocete species, using a combination of cell culture, cytogenetics and molecular approaches. Initially, the development of an efficient and repeatable methodology for obtaining a growing lymphocyte culture that facilitated the isolation of individual chromosomes is described. Flow karyotypic characterization and isolation of individual chromosomes via flow sorting or microdissection is reported for the killer whale (Orcinus orca). Microdissected Y-chromosomes from the killer whale and bottlenose dolphin (Tursiops truncatus) were screened for sequences containing microsatellite motifs. 15 and 10 male-specific microsatellites were identified from the killer whale and bottlenose dolphin, respectively. Additional microsatellite loci were identified from previously published fin whale Y-chromosome sequence. 6 markers designed from heterologous sequences amplified from sperm whales (Physeter macrocephalus) were also screened for variation. All 31 markers were monomorphic in the bottlenose dolphin, only 2 loci showed 2 variants in the killer whale and 7 were polymorphic in the sperm whale. In addition, 162 anonymous regions of the Y-chromosome, isolated from the delphinid species, were used to characterize the comparative composition of the ‘Y’ relative to the autosomes in these species. Characteristics are discussed in the context of the genome as a whole, species-specific history and with reference to the expected patterns of mammalian Y-chromosome evolution.