495 research outputs found
Doctor of Philosophy dissertation
Endogenous retroviruses (ERVs), derived from exogenous retroviruses (XRVs), comprise about 5 to 10% of most mammalian genomes. By studying ERVs, we can examine retroviral infections that originated millions of years ago and understand the long-term evolution of infectious viruses. At the same time, it has been suggested that multiple newly emerging viruses that infect human populations have come from different bat species, and bats have become recognized as a reservoir of zoonotic viruses. However, we know little about retroviruses in bats. Here, we mined ERVs in the little brown bat genome and found that the overall ERV content in the little brown bat is comparable to that of other mammals. Nevertheless, we found hundreds of lineage-specific ERVs in the little brown bat genome. With the identified bat ERVs, we subsequently investigated whether any related retroviral cross-species transmission and independent endogenization had occurred. Using a sequence homology search of bat ERV sequences against 107 available mammalian genomes, we found highly similar sequences in cat, tiger, and pangolin genomes in addition to related bat genomes. We found that the ERV sequence is patchily distributed among mammalian lineages, and that its high sequence similarity is incongruent with host divergence. We also narrowed down the ERV insertion time to 10 to 20 million years ago. To understand how these ERVs evolved in different lineages, we investigated their evolution after integration in both bat and cat genomes. In the cat genome, the ERV lost its envelope domain and transformed into an intracellular retrotransposon. In the bat genome, by contrast, multiple related infectious viruses became endogenized, and, in at least one lineage, the infectious capability has been maintained. Finally, I developed a computational pipeline and statistical framework that allow our method to be applied to the ERV population of virtually any species. When applied to 53 available vertebrate genomes, the approach identified ERVs previously known to have spread by reinfection in human, mouse, and pig, as well as additional ERV families carrying signatures of recent infection in these and other species, including nonhuman primates, revealing their potential for zoonotic transmission.
Amniotes co-opt intrinsic genetic instability to protect germ-line genome integrity
Unlike PIWI-interacting RNAs (piRNAs) in other species, which mostly target transposable elements (TEs), >80% of piRNAs in adult mammalian testes lack obvious targets. However, mammalian piRNA sequences and piRNA-producing loci evolve more rapidly than the rest of the genome for unknown reasons. Here, through comparative studies of chickens, ducks, mice, and humans, as well as long-read nanopore sequencing of diverse chicken breeds, we find that piRNA loci across amniotes experience: (1) a high local mutation rate of structural variations (SVs, mutations ≥ 50 bp in size); (2) positive selection to suppress young and actively mobilizing TEs commencing at the pachytene stage of meiosis during germ cell development; and (3) negative selection to purge deleterious SV hotspots. Our results indicate that genetic instability at pachytene piRNA loci, while producing certain pathogenic SVs, also protects genome integrity against TE mobilization by driving the formation of rapidly evolving piRNA sequences.
Epigenomic differences in the human and chimpanzee genomes are associated with structural variation
Structural variation (SV), including insertions and deletions (indels), is a primary mechanism of genome evolution. However, the mechanism by which SV contributes to epigenome evolution is poorly understood. In this study, we characterized the association between lineage-specific indels and epigenome differences between human and chimpanzee to investigate how SVs might have shaped the epigenetic landscape. By intersecting medium-to-large human-chimpanzee indels (20 bp to 50 kb) with putative promoters and enhancers in cranial neural crest cells (CNCC) and repressed regions in induced pluripotent cells (iPSC), we found that ~12% of indels overlap putative regulatory and repressed regions (RRRs), and 15% of these indels are associated with lineage-biased RRRs. Indel-associated putative enhancer and repressive regions are ~1.3 and ~3 times as likely to be lineage-biased, respectively, as those not associated with indels. We found a 2-fold enrichment of medium-sized indels (20 bp to 50 bp) in CpG island (CGI)-containing promoters relative to what is expected by chance. Lastly, from human-specific transposable element insertions, we identified putative regulatory elements, including NR2F1-bound putative CNCC enhancers derived from SVAs and putative iPSC promoters derived from LTR5s. Our results demonstrate that different types of indels are associated with specific epigenomic diversity between human and chimpanzee.
Regulatory transposable elements in the encyclopedia of DNA elements
Transposable elements (TEs) comprise ~50% of our genome, but knowledge of how TEs affect genome evolution remains incomplete. Leveraging ENCODE4 data, we provide the most comprehensive study to date of TE contributions to the regulatory genome. We find that 236,181 (~25%) human candidate cis-regulatory elements (cCREs) are TE-derived, with over 90% lineage-specific since the human-mouse split, accounting for 8-36% of lineage-specific cCREs. Except for SINEs, cCRE-associated transcription factor (TF) motifs in TEs are derived from ancestral TE sequence more often than expected by chance. We show that TEs may adopt regulatory activities similar to those of elements near their integration sites. Since human-mouse divergence, TEs have contributed 3-56% of TF binding site turnover events across 30 examined TFs. Finally, TE-derived cCREs are similar to non-TE cCREs in terms of MPRA activity and GWAS variant enrichment. Overall, our results substantiate the notion that TEs have played an important role in shaping the human regulatory genome.
Conserved and divergent gene regulatory programs of the mammalian neocortex
Divergence of cis-regulatory elements drives species-specific trait
LawBench: Benchmarking Legal Knowledge of Large Language Models
Large language models (LLMs) have demonstrated strong capabilities in various aspects. However, when applying them to the highly specialized, safety-critical legal domain, it is unclear how much legal knowledge they possess and whether they can reliably perform legal-related tasks. To address this gap, we propose a comprehensive evaluation benchmark, LawBench. LawBench has been meticulously crafted to provide a precise assessment of LLMs' legal capabilities at three cognitive levels: (1) Legal knowledge memorization: whether LLMs can memorize needed legal concepts, articles and facts; (2) Legal knowledge understanding: whether LLMs can comprehend entities, events and relationships within legal text; (3) Legal knowledge applying: whether LLMs can properly utilize their legal knowledge and make the necessary reasoning steps to solve realistic legal tasks. LawBench contains 20 diverse tasks covering 5 task types: single-label classification (SLC), multi-label classification (MLC), regression, extraction and generation. We perform extensive evaluations of 51 LLMs on LawBench, including 20 multilingual LLMs, 22 Chinese-oriented LLMs and 9 legal-specific LLMs. The results show that GPT-4 remains the best-performing LLM in the legal domain, surpassing the others by a significant margin. While fine-tuning LLMs on legal-specific text brings certain improvements, we are still a long way from obtaining usable and reliable LLMs for legal tasks. All data, model predictions and evaluation code are released at https://github.com/open-compass/LawBench/. We hope this benchmark provides an in-depth understanding of LLMs' domain-specific capabilities and speeds up the development of LLMs in the legal domain.
Functional characterization of enhancer activity during a long terminal repeat's evolution
Many transposable elements (TEs) contain transcription factor binding sites and are implicated as potential regulatory elements. However, TEs are rarely functionally tested for regulatory activity, which in turn limits our understanding of how TE regulatory activity has evolved. We systematically tested the human LTR18A subfamily for regulatory activity using a massively parallel reporter assay (MPRA) and found AP-1- and CEBP-related binding motifs to be drivers of enhancer activity. Functional analysis of evolutionarily reconstructed ancestral sequences revealed that LTR18A elements have generally lost regulatory activity over time through sequence changes, with the largest effects arising from mutations in the AP-1 and CEBP motifs. We observed that the two motifs are conserved at higher rates than expected under neutral evolution. Finally, we identified LTR18A elements as potential enhancers in the human genome, primarily in epithelial cells. Together, our results provide a model for the origin, evolution, and co-option of TE-derived regulatory elements.
Comparing genomic and epigenomic features across species using the WashU Comparative Epigenome Browser
Genome browsers have become an intuitive and critical tool to visualize and analyze genomic features and data. Conventional genome browsers display data and annotations on a single reference genome or assembly; there are also genomic alignment viewers and browsers that help users visualize alignment, mismatch, and rearrangement between syntenic regions. However, there is a growing need for a comparative epigenome browser that can display genomic and epigenomic data sets across different species and enable users to compare them between syntenic regions. Here, we present the WashU Comparative Epigenome Browser. It allows users to load functional genomic data sets and annotations mapped to different genomes and display them over syntenic regions simultaneously. The browser also displays genetic differences between the genomes, from single-nucleotide variants (SNVs) to structural variants (SVs), to visualize the association between epigenomic differences and genetic differences. Instead of anchoring all data sets to the reference genome coordinates, it creates independent coordinates for different genome assemblies to faithfully present features and data mapped to different genomes. It uses a simple, intuitive genome-align track to illustrate the syntenic relationship between different species. It extends the widely used WashU Epigenome Browser infrastructure and can be expanded to support multiple species. This new browser function will greatly facilitate comparative genomic and epigenomic research, as well as support the growing need to directly compare and benchmark the T2T CHM13 assembly and other human genome assemblies.
NSs, the Silencing Suppressor of Tomato Spotted Wilt Orthotospovirus, Interferes with JA-Regulated Host Terpenoids Expression to Attract Frankliniella occidentalis
Tomato spotted wilt orthotospovirus (TSWV) causes serious crop losses worldwide and is transmitted by Frankliniella occidentalis (Pergande) (Thysanoptera: Thripidae). The NSs protein is the silencing suppressor of TSWV and plays an important role in virus infection, cycling, and transmission. In this research, we used transgenic Arabidopsis thaliana to investigate the influence of the NSs protein on the interaction among TSWV, plants, and F. occidentalis. Compared with the wild-type Col-0 plant, F. occidentalis showed an increased number and induced feeding behavior on transgenic Arabidopsis thaliana expressing exogenous NSs. Further analysis showed that NSs reduced the expression of terpenoid synthesis-related genes and the content of monoterpene volatiles in Arabidopsis. These monoterpene volatiles played a repellent role with respect to F. occidentalis. In addition, the expression level of plant immune-related genes and the content of the plant resistance hormone jasmonic acid (JA) in transgenic Arabidopsis were reduced. The TSWV silencing suppressor NSs alters the emission of plant volatiles and reduces JA-regulated plant defenses, which enhances the attractiveness of plants to F. occidentalis and may increase the transmission probability of TSWV.
Mechanistic understanding of N-glycosylation in Ebola virus glycoprotein maturation and function
The Ebola virus (EBOV) trimeric envelope glycoprotein (GP) precursors are cleaved into the receptor-binding GP1 and the fusion-mediating GP2 subunits and incorporated into virions to initiate infection. GP1 and GP2 form heterodimers and have 15 and two N-glycosylation sites (NGSs), respectively. Here we investigated how N-glycosylation contributes to GP expression, maturation, and function. As reported before, we found that, although GP1 NGSs are not critical, the two GP2 NGSs, Asn563 and Asn618, are essential for GP function. Further analysis uncovered that Asn563 and Asn618 regulate GP processing, demannosylation, oligomerization, and conformation. Consequently, these two NGSs are required for GP incorporation into EBOV-like particles and HIV type 1 (HIV-1) pseudovirions and determine viral transduction efficiency. Using CRISPR/Cas9 technology, we knocked out the two classical endoplasmic reticulum chaperones calnexin (CNX) and/or calreticulin (CRT) and found that both CNX and CRT increase GP expression. Nevertheless, NGSs are not required for the GP interaction with CNX or CRT. Together, we conclude that, although Asn563 and Asn618 are not required for EBOV GP expression, they synergistically regulate its maturation, which determines its functionality.
- …