223 research outputs found

    Using machine learning to detect the differential usage of novel gene isoforms

    BACKGROUND: Differential isoform usage is an important driver of inter-individual phenotypic diversity and is linked to various diseases and traits. However, accurately detecting the differential usage of different gene transcripts between groups can be difficult, in particular in less well annotated genomes where the spectrum of transcript isoforms is largely unknown. RESULTS: We investigated whether machine learning approaches can detect differential isoform usage based purely on the distribution of reads across a gene region. We illustrate that gradient boosting and elastic net approaches can successfully identify large numbers of genes showing potential differential isoform usage between Europeans and Africans, that are enriched among relevant biological pathways and significantly overlap those identified by previous approaches. We demonstrate that diversity at the 3′ and 5′ ends of genes is a primary driver of these differences between populations. CONCLUSION: Machine learning methods can effectively detect differential isoform usage from read fraction data, and can provide novel insights into the biological differences between groups. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04576-3

    Chromatin structure and evolution in the human genome

    BACKGROUND: Evolutionary rates are not constant across the human genome but genes in close proximity have been shown to experience similar levels of divergence and selection. The higher-order organisation of chromosomes has often been invoked to explain such phenomena but previously there has been insufficient data on chromosome structure to investigate this rigorously. Using the results of a recent genome-wide analysis of open and closed human chromatin structures we have investigated the global association between divergence, selection and chromatin structure for the first time. RESULTS: In this study we have shown that, paradoxically, synonymous site divergence (dS) at non-CpG sites is highest in regions of open chromatin, primarily as a result of an increased number of transitions, while the rates of other traditional measures of mutation (intergenic, intronic and ancient repeat divergence as well as SNP density) are highest in closed regions of the genome. Analysis of human-chimpanzee divergence across intron-exon boundaries indicates that although genes in relatively open chromatin generally display little selection at their synonymous sites, those in closed regions show markedly lower divergence at their fourfold degenerate sites than in neighbouring introns and intergenic regions. Exclusion of known Exonic Splice Enhancer hexamers has little effect on the divergence observed at fourfold degenerate sites across chromatin categories; however, we show that closed chromatin is enriched with certain classes of ncRNA genes whose RNA secondary structure may be particularly important. CONCLUSION: We conclude that, overall, non-CpG mutation rates are lowest in open regions of the genome and that regions of the genome with a closed chromatin structure have the highest background mutation rate. This might reflect lower rates of DNA damage or enhanced DNA repair processes in regions of open chromatin. Our results also indicate that dS is a poor measure of mutation rates, particularly when used in closed regions of the genome, as genes in closed regions generally display relatively strong levels of selection at their synonymous sites.

    The conservation of human functional variants and their effects across mammals

    Despite the clear potential of livestock models of human functional variants to provide important insights into the biological mechanisms driving human diseases and traits, their use to date has been limited. Generating such models via genome editing is costly and time consuming, and it is unclear which variants will have conserved effects across species. In this study we address these issues by studying naturally occurring livestock models of human functional variants. We show that orthologues of over 1.6 million human variants are already segregating in domesticated mammalian species, including several hundred previously directly linked to human traits and diseases. Models of variants linked to particular phenotypes, including metabolomic disorders and height, are preferentially shared across species, meaning studying the genetic basis of these phenotypes is particularly tractable in livestock. Using machine learning we demonstrate it is possible to identify human variants that are more likely to have an existing livestock orthologue, and, importantly, we show that the effects of functional variants are often conserved in livestock, acting on orthologous genes with the same direction of effect. Consequently, this work demonstrates the substantial potential of naturally occurring livestock carriers of orthologues of human functional variants to disentangle their functional impacts

    Sequence level mechanisms of human epigenome evolution

    DNA methylation and chromatin states play key roles in development and disease. However, the extent of recent evolutionary divergence in the human epigenome and the influential factors that have shaped it are poorly understood. To determine the links between genome sequence and human epigenome evolution, we examined the divergence of DNA methylation and chromatin states following segmental duplication events in the human lineage. Chromatin and DNA methylation states were found to have been generally well conserved following a duplication event, with the evolution of the epigenome largely uncoupled from the total number of genetic changes in the surrounding DNA sequence. However, the epigenome at tissue-specific, distal regulatory regions was observed to be unusually prone to diverge following duplication, with particular sequence differences, altering known sequence motifs, found to be associated with divergence in patterns of DNA methylation and chromatin. Alu elements were found to have played a particularly prominent role in shaping human epigenome evolution, and we show that human-specific AluY insertion events are strongly linked to the evolution of the DNA methylation landscape and gene expression levels, including at key neurological genes in the human brain. Studying paralogous regions within the same sample enables the study of the links between genome and epigenome evolution while controlling for biological and technical variation. We show DNA methylation and chromatin divergence between duplicated regions are linked to the divergence of particular genetic motifs, with Alu elements having played a disproportionate role in the evolution of the epigenome in the human lineage

    Sequencing and analysis of an Irish human genome

    BACKGROUND: Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence. RESULTS: Using sequence data from a branch of the European ancestral tree as yet unsequenced, we identify variants that may be specific to this population. Through comparisons with HapMap and previous genetic association studies, we identified novel disease-associated variants, including a novel nonsense variant putatively associated with inflammatory bowel disease. We describe a novel method for improving SNP calling accuracy at low genome coverage using haplotype information. This analysis has implications for future re-sequencing studies and validates the imputation of Irish haplotypes using data from the current Human Genome Diversity Cell Line Panel (HGDP-CEPH). Finally, we identify gene duplication events as constituting significant targets of recent positive selection in the human lineage. CONCLUSIONS: Our findings show that there remains utility in generating whole genome sequences to illustrate both general principles and reveal specific instances of human biology. With increasing access to low cost sequencing we would predict that even armed with the resources of a small research group a number of similar initiatives geared towards answering specific biological questions will emerge

    Clinical evaluation of Corridor disease in Bos indicus (Boran) cattle naturally infected with buffalo-derived Theileria parva

    Corridor disease (CD) is a fatal condition of cattle caused by buffalo-derived Theileria parva. Unlike the related condition, East Coast fever, which results from infection with cattle-derived T. parva, CD has not been extensively studied. We describe in detail the clinical and laboratory findings in cattle naturally infected with buffalo-derived T. parva. Forty-six cattle were exposed to buffalo-derived T. parva under field conditions at the Ol Pejeta Conservancy, Kenya, between 2013 and 2018. The first signs of disease observed in all animals were nasal discharge (mean day of onset was 9 days post-exposure), enlarged lymph nodes (10 days post-exposure), and pyrexia (13.7 days post-exposure). Coughing and labored breathing were observed in more than 50% of animals (14 days post-exposure). Less commonly observed signs, corneal edema (22%) and diarrhea (11%), were observed later in the disease progression (19 days post-exposure). All infections were considered clinically severe, and 42 animals succumbed to infection. The mean time to death across all studies was 18.4 days. The mean time from onset of clinical signs to death was 9 days and from pyrexia to death was 4.8 days, indicating a relatively short duration of clinical illness. There were significant relationships between days to death and the days to first temperature (χ² = 4.00, p = 0.046) and days to peak temperature (χ² = 25.81, p = 0.001); animals with earlier-onset pyrexia died sooner. These clinical indicators may be useful for assessing the severity of disease in the future. All infections were confirmed by the presence of macroschizonts in lymph node biopsies (mean time to parasitosis was 11 days). Piroplasms were detected in the blood of two animals (4%) and 20 (43%) animals seroconverted. In this study, we demonstrate a successful approach to an experimental field study for CD in cattle. We also describe the clinical progression of CD in naturally infected cattle, including the onset and severity of clinical signs and pathology. Laboratory diagnoses based on examination of blood samples are unreliable, and alternatives may not be available to cattle keepers. The rapid development of CD requires recognition of the clinical signs, which may be useful for early diagnosis of the disease and effective intervention for affected animals.