495 research outputs found

    Doctor of Philosophy

    Endogenous retroviruses (ERVs), derived from exogenous retroviruses (XRVs), comprise about 5 to 10% of most mammalian genomes. By working on ERVs, we can study retroviral infections that originated millions of years ago and understand the long-term evolution of infectious viruses. At the same time, it has been suggested that multiple newly emerging viruses that infect human populations have come from different bat species, and bats have become recognized as a reservoir of zoonotic viruses. However, we know little about retroviruses in bats. Here, we mined ERVs in the little brown bat genome and found that the overall ERV amount in the little brown bat is comparable to that of other mammals. However, we still find hundreds of lineage-specific ERVs in the little brown bat genome. With the identified bat ERVs, we subsequently investigated whether there was any related retroviral cross-species transmission and independent endogenization. Using a sequence homology method to search bat ERV sequences against 107 available mammalian genomes, we found highly similar sequences in cat, tiger, and pangolin genomes in addition to related bat genomes. We found that the ERV sequence is patchily distributed among mammalian lineages, and that the high sequence similarity is incongruent with host divergence. We also narrowed down the ERV insertion time to 10 to 20 million years ago. To understand how they evolved in different lineages, we investigated their evolution after integration in both bat and cat genomes. In the cat genome, the ERV lost its envelope domain and transformed into an intracellular retrotransposon. In the bat genome, by contrast, multiple related infectious viruses became endogenized and, in at least one lineage, the infectious capability has been maintained. Finally, I developed a computational pipeline and statistical framework that allows our method to be applied to the ERV population of virtually any species. When applied to 53 available vertebrate genomes, the approach identified ERVs previously known to have spread by reinfection in humans, mouse, and pig, as well as additional ERV families carrying signatures of recent infections in these and other species, including nonhuman primates, revealing their potential for zoonotic transmission.

    Amniotes co-opt intrinsic genetic instability to protect germ-line genome integrity

    Unlike PIWI-interacting RNAs (piRNAs) in other species, which mostly target transposable elements (TEs), >80% of piRNAs in adult mammalian testes lack obvious targets. However, mammalian piRNA sequences and piRNA-producing loci evolve more rapidly than the rest of the genome for unknown reasons. Here, through comparative studies of chickens, ducks, mice, and humans, as well as long-read nanopore sequencing on diverse chicken breeds, we find that piRNA loci across amniotes experience: (1) a high local mutation rate of structural variations (SVs, mutations ≥ 50 bp in size); (2) positive selection to suppress young and actively mobilizing TEs commencing at the pachytene stage of meiosis during germ cell development; and (3) negative selection to purge deleterious SV hotspots. Our results indicate that genetic instability at pachytene piRNA loci, while producing certain pathogenic SVs, also protects genome integrity against TE mobilization by driving the formation of rapidly evolving piRNA sequences.

    Epigenomic differences in the human and chimpanzee genomes are associated with structural variation

    Structural variation (SV), including insertions and deletions (indels), is a primary mechanism of genome evolution. However, the mechanism by which SV contributes to epigenome evolution is poorly understood. In this study, we characterized the association between lineage-specific indels and epigenome differences between human and chimpanzee to investigate how SVs might have shaped the epigenetic landscape. By intersecting medium-to-large human-chimpanzee indels (20 bp to 50 kb) with putative promoters and enhancers in cranial neural crest cells (CNCC) and repressed regions in induced pluripotent cells (iPSC), we found that ~12% of indels overlap putative regulatory and repressed regions (RRRs), and 15% of these indels are associated with lineage-biased RRRs. Indel-associated putative enhancer and repressive regions are ~1.3 and ~3 times as likely to be lineage-biased, respectively, as those not associated with indels. We found a 2-fold enrichment of medium-sized indels (20 bp to 50 bp) in CpG island (CGI)-containing promoters relative to what is expected by chance. Lastly, from human-specific transposable element insertions, we identified putative regulatory elements, including NR2F1-bound putative CNCC enhancers derived from SVAs and putative iPSC promoters derived from LTR5s. Our results demonstrate that different types of indels are associated with specific epigenomic diversity between human and chimpanzee.

    Regulatory transposable elements in the encyclopedia of DNA elements

    Transposable elements (TEs) comprise ~50% of our genome, but knowledge of how TEs affect genome evolution remains incomplete. Leveraging ENCODE4 data, we provide the most comprehensive study to date of TE contributions to the regulatory genome. We find 236,181 (~25%) human candidate cis-regulatory elements (cCREs) are TE-derived, with over 90% lineage-specific since the human-mouse split, accounting for 8-36% of lineage-specific cCREs. Except for SINEs, cCRE-associated transcription factor (TF) motifs in TEs are derived from ancestral TE sequence more than expected by chance. We show that TEs may adopt regulatory activities similar to those of elements near their integration site. Since human-mouse divergence, TEs have contributed 3-56% of TF binding site turnover events across 30 examined TFs. Finally, TE-derived cCREs are similar to non-TE cCREs in terms of MPRA activity and GWAS variant enrichment. Overall, our results substantiate the notion that TEs have played an important role in shaping the human regulatory genome.

    Conserved and divergent gene regulatory programs of the mammalian neocortex

    Divergence of cis-regulatory elements drives species-specific trait

    LawBench: Benchmarking Legal Knowledge of Large Language Models

    Large language models (LLMs) have demonstrated strong capabilities in various aspects. However, when applying them to the highly specialized, safety-critical legal domain, it is unclear how much legal knowledge they possess and whether they can reliably perform legal-related tasks. To address this gap, we propose a comprehensive evaluation benchmark, LawBench. LawBench has been meticulously crafted to provide a precise assessment of the LLMs' legal capabilities at three cognitive levels: (1) Legal knowledge memorization: whether LLMs can memorize needed legal concepts, articles and facts; (2) Legal knowledge understanding: whether LLMs can comprehend entities, events and relationships within legal text; (3) Legal knowledge applying: whether LLMs can properly utilize their legal knowledge and make the necessary reasoning steps to solve realistic legal tasks. LawBench contains 20 diverse tasks covering 5 task types: single-label classification (SLC), multi-label classification (MLC), regression, extraction and generation. We perform extensive evaluations of 51 LLMs on LawBench, including 20 multilingual LLMs, 22 Chinese-oriented LLMs and 9 legal-specific LLMs. The results show that GPT-4 remains the best-performing LLM in the legal domain, surpassing the others by a significant margin. While fine-tuning LLMs on legal-specific text brings certain improvements, we are still a long way from obtaining usable and reliable LLMs for legal tasks. All data, model predictions and evaluation code are released at https://github.com/open-compass/LawBench/. We hope this benchmark provides an in-depth understanding of the LLMs' domain-specific capabilities and speeds up the development of LLMs in the legal domain.

    Functional characterization of enhancer activity during a long terminal repeat's evolution

    Many transposable elements (TEs) contain transcription factor binding sites and are implicated as potential regulatory elements. However, TEs are rarely functionally tested for regulatory activity, which in turn limits our understanding of how TE regulatory activity has evolved. We systematically tested the human LTR18A subfamily for regulatory activity using a massively parallel reporter assay (MPRA) and found AP-1- and CEBP-related binding motifs to be drivers of enhancer activity. Functional analysis of evolutionarily reconstructed ancestral sequences revealed that LTR18A elements have generally lost regulatory activity over time through sequence changes, with the largest effects owing to mutations in the AP-1 and CEBP motifs. We observed that the two motifs are conserved at higher rates than expected under neutral evolution. Finally, we identified LTR18A elements as potential enhancers in the human genome, primarily in epithelial cells. Together, our results provide a model for the origin, evolution, and co-option of TE-derived regulatory elements.

    Comparing genomic and epigenomic features across species using the WashU Comparative Epigenome Browser

    Genome browsers have become an intuitive and critical tool to visualize and analyze genomic features and data. Conventional genome browsers display data/annotations on a single reference genome/assembly; there are also genomic alignment viewers/browsers that help users visualize alignment, mismatch, and rearrangement between syntenic regions. However, there is a growing need for a comparative epigenome browser that can display genomic and epigenomic data sets across different species and enable users to compare them between syntenic regions. Here, we present the WashU Comparative Epigenome Browser. It allows users to load functional genomic data sets/annotations mapped to different genomes and display them over syntenic regions simultaneously. The browser also displays genetic differences between the genomes, from single-nucleotide variants (SNVs) to structural variants (SVs), to visualize the association between epigenomic differences and genetic differences. Instead of anchoring all data sets to the reference genome coordinates, it creates independent coordinates for different genome assemblies to faithfully present features and data mapped to different genomes. It uses a simple, intuitive genome-align track to illustrate the syntenic relationship between different species. It extends the widely used WashU Epigenome Browser infrastructure and can be expanded to support multiple species. This new browser function will greatly facilitate comparative genomic/epigenomic research, as well as support the growing need to directly compare and benchmark the T2T CHM13 assembly and other human genome assemblies.

    NSs, the Silencing Suppressor of Tomato Spotted Wilt Orthotospovirus, Interferes with JA-Regulated Host Terpenoids Expression to Attract Frankliniella occidentalis

    Tomato spotted wilt orthotospovirus (TSWV) causes serious crop losses worldwide and is transmitted by Frankliniella occidentalis (Pergande) (Thysanoptera: Thripidae). The NSs protein is the silencing suppressor of TSWV and plays an important role in virus infection, cycling, and transmission. In this research, we used transgenic Arabidopsis thaliana to investigate the influence of the NSs protein on the interaction of TSWV, plants, and F. occidentalis. Compared with the wild-type Col-0 plant, F. occidentalis showed an increased number and induced feeding behavior on transgenic Arabidopsis thaliana expressing exogenous NSs. Further analysis showed that NSs reduced the expression of terpenoid synthesis-related genes and the content of monoterpene volatiles in Arabidopsis. These monoterpene volatiles played a repellent role with respect to F. occidentalis. In addition, the expression level of plant immune-related genes and the content of the plant resistance hormone jasmonic acid (JA) in transgenic Arabidopsis were reduced. NSs, the silencing suppressor of TSWV, alters the emission of plant volatiles and reduces the JA-regulated plant defenses, resulting in enhanced attractiveness of plants to F. occidentalis, which may increase the transmission probability of TSWV.

    Mechanistic understanding of N-glycosylation in Ebola virus glycoprotein maturation and function

    The Ebola virus (EBOV) trimeric envelope glycoprotein (GP) precursors are cleaved into the receptor-binding GP1 and the fusion-mediating GP2 subunits and incorporated into virions to initiate infection. GP1 and GP2 form heterodimers and have 15 and two N-glycosylation sites (NGSs), respectively. Here we investigated the mechanism of how N-glycosylation contributes to GP expression, maturation, and function. As reported before, we found that, although GP1 NGSs are not critical, the two GP2 NGSs, Asn563 and Asn618, are essential for GP function. Further analysis uncovered that Asn563 and Asn618 regulate GP processing, demannosylation, oligomerization, and conformation. Consequently, these two NGSs are required for GP incorporation into EBOV-like particles and HIV type 1 (HIV-1) pseudovirions and determine viral transduction efficiency. Using CRISPR/Cas9 technology, we knocked out the two classical endoplasmic reticulum chaperones calnexin (CNX) and/or calreticulin (CRT) and found that both CNX and CRT increase GP expression. Nevertheless, NGSs are not required for the GP interaction with CNX or CRT. Together, we conclude that, although Asn563 and Asn618 are not required for EBOV GP expression, they synergistically regulate its maturation, which determines its functionality.