Genomic Studies of Gene Expression Errors and Their Evolutionary Ramifications
Gene expression produces biologically functional RNAs and proteins and is essential for life. Nevertheless, gene expression is subject to several types of errors that are generally harmful. Despite the prevalence and significant consequences of expression errors, their genome-wide patterns are not well characterized. Furthermore, the evolutionary ramifications of such errors are poorly understood. In my dissertation, I address these questions using novel computational approaches. I focus on two types of gene expression errors: (i) stochastic gene expression, which causes variation in expression level among isogenic cells in the same environment (gene expression noise), and (ii) mistranslation, which induces protein misfolding and can be toxic to cells.
My thesis has three main chapters in addition to the introduction and conclusion. First, in Chapter 2, I studied the expression noise of individual genes. I decomposed the noise of 3,975 mouse genes into intrinsic and extrinsic components and studied their biological mechanisms and evolutionary consequences. Next, in Chapter 3, I considered expression noise for pairs of genes simultaneously. I discovered chromosome-wide co-fluctuation in the expression of linked genes, which is partly due to chromatin co-accessibility of linked loci attributable to three-dimensional proximity. I further found that genes encoding components of the same protein complex are more likely to become linked during evolution, owing to natural selection for intracellular dosage balance among components. Thus, selection for mitigating the harm of expression noise drives the nonrandom genomic distribution of genes. Finally, in Chapter 4, I studied another kind of expression error: mistranslation. I focused on the relationship between mistranslation and codon usage. Specifically, I provided the first direct and global evidence for a prominent but unresolved hypothesis: preferred codons are translated more accurately. Furthermore, I showed that this proposition generally holds across the three domains of life. Interestingly, the relative translational accuracies of synonymous codons vary drastically among species, mainly because of variation in tRNA composition. Together with other information, these findings suggest that codon usage coevolves with the cellular tRNA pool to maximize translational accuracy and efficiency.
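The decomposition of noise into intrinsic and extrinsic parts can be illustrated with the classic dual-reporter formulas of Elowitz et al. (2002), treating the two alleles of a gene in allele-specific single-cell data as the two reporters. This is a minimal sketch under that assumption; the dissertation's actual estimator (for example, how it handles sequencing depth or dropouts) may differ, and the function name `decompose_noise` is hypothetical.

```python
from statistics import mean


def decompose_noise(allele_a, allele_b):
    """Split the expression noise of one gene into intrinsic and extrinsic parts.

    allele_a[i] and allele_b[i] are the expression levels of the two alleles
    of the same gene in cell i (hypothetical input layout). Uses the
    dual-reporter formulas, with the two alleles playing the two reporters.
    Returns (intrinsic, extrinsic, total) squared noise.
    """
    if len(allele_a) != len(allele_b) or len(allele_a) < 2:
        raise ValueError("need paired allele measurements from at least two cells")
    ma, mb = mean(allele_a), mean(allele_b)
    norm = ma * mb
    # Intrinsic noise: differences between the two alleles within the same cell.
    intrinsic = mean((a - b) ** 2 for a, b in zip(allele_a, allele_b)) / (2 * norm)
    # Extrinsic noise: covariation of the two alleles across cells.
    extrinsic = (mean(a * b for a, b in zip(allele_a, allele_b)) - norm) / norm
    return intrinsic, extrinsic, intrinsic + extrinsic
```

For example, `decompose_noise([1, 3], [3, 1])` gives an intrinsic component of 0.5 and an extrinsic component of -0.25; with finite samples, the extrinsic estimate can be negative.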
In conclusion, my dissertation documents the genome-wide patterns of gene expression errors and demonstrates their profound impacts on both molecular and phenotypic evolution. The knowledge gained has implications beyond expression errors because of the universality of molecular errors in cellular life.
PhD, Ecology and Evolutionary Biology, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/169993/1/mengysun_1.pd
DoDo-Code: a Deep Levenshtein Distance Embedding-based Code for IDS Channel and DNA Storage
Recently, DNA storage has emerged as a promising data storage solution,
offering significant advantages in storage density, maintenance cost
efficiency, and parallel replication capability. Mathematically, the DNA
storage pipeline can be viewed as an insertion, deletion, and substitution
(IDS) channel. Because the mathematical structure of the Levenshtein distance
remains largely uncharted, designing an IDS-correcting code is still a
challenge. In this paper,
we propose an innovative approach that utilizes deep Levenshtein distance
embedding to bypass these mathematical challenges. By representing the
Levenshtein distance between two sequences as a conventional distance between
their corresponding embedding vectors, the inherent structural properties of
the Levenshtein distance are revealed in a more tractable embedding space. Leveraging
this embedding space, we introduce the DoDo-Code, an IDS-correcting code that
incorporates deep embedding of Levenshtein distance, deep embedding-based
codeword search, and deep embedding-based segment correcting. To address the
requirements of DNA storage, we also present a preliminary algorithm for long
sequence decoding. As far as we know, the DoDo-Code is the first IDS-correcting
code designed using plausible deep learning methodologies, potentially paving
the way for a new direction in error-correcting code research. It is also the
first IDS code that exhibits characteristics of being 'optimal' in terms of
redundancy, significantly outperforming the mainstream IDS-correcting codes of
the Varshamov-Tenengolts code family in code rate.
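To make the IDS channel and the Levenshtein distance concrete, here is a minimal Python sketch: the standard dynamic-programming edit distance (the quantity that the embedding is meant to represent) and a toy channel that applies random insertions, deletions, and substitutions to a DNA string. The uniform error model and the names `levenshtein`, `ids_channel`, and `ALPHABET` are illustrative assumptions, not the paper's channel model or code.

```python
import random

ALPHABET = "ACGT"  # DNA bases (assumed alphabet for this toy example)


def levenshtein(s, t):
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute or match
        prev = cur
    return prev[-1]


def ids_channel(seq, n_edits, rng):
    """Toy IDS channel: apply n_edits uniformly random edits to seq."""
    out = list(seq)
    for _ in range(n_edits):
        op = rng.choice(("ins", "del", "sub"))
        if op == "ins" or not out:  # an empty sequence can only grow
            out.insert(rng.randrange(len(out) + 1), rng.choice(ALPHABET))
        elif op == "del":
            del out[rng.randrange(len(out))]
        else:
            pos = rng.randrange(len(out))
            out[pos] = rng.choice([b for b in ALPHABET if b != out[pos]])
    return "".join(out)
```

Because each edit changes the distance by at most one, `levenshtein(x, ids_channel(x, k, rng))` never exceeds `k`. The quadratic cost of this exact computation is what makes embedding-based approximations attractive.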
Levenshtein Distance Embedding with Poisson Regression for DNA Storage
Efficient computation or approximation of Levenshtein distance, a widely-used
metric for evaluating sequence similarity, has attracted significant attention
with the emergence of DNA storage and other biological applications. Sequence
embedding, which maps Levenshtein distance to a conventional distance between
embedding vectors, has emerged as a promising solution. In this paper, a novel
neural network-based sequence embedding technique using Poisson regression is
proposed. We first provide a theoretical analysis of the impact of embedding
dimension on model performance and present a criterion for selecting an
appropriate embedding dimension. Under this embedding dimension, the Poisson
regression is introduced by assuming that the Levenshtein distance between
sequences of fixed length follows a Poisson distribution, which naturally aligns with
the definition of Levenshtein distance. Moreover, from the perspective of the
distribution of embedding distances, Poisson regression approximates the
negative log-likelihood of the chi-squared distribution and helps remove the
skewness. Through comprehensive experiments on real DNA storage
data, we demonstrate the superior performance of the proposed method compared
to state-of-the-art approaches.
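The Poisson-regression idea can be sketched as a loss function: treat an embedding distance as the rate λ of a Poisson distribution and penalize the negative log-likelihood of the true Levenshtein distance d, that is, λ − d·log λ + log d!. This is a minimal sketch, assuming that the squared Euclidean distance between embeddings serves as λ (a guess prompted by the chi-squared connection mentioned above); the paper's exact distance, network, and scaling may differ, and all function names are hypothetical.

```python
import math


def squared_euclidean(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def poisson_nll(rate, k):
    """Negative log-likelihood of observing count k under Poisson(rate)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return rate - k * math.log(rate) + math.lgamma(k + 1)  # lgamma(k+1) = log k!


def embedding_loss(pairs):
    """Mean Poisson NLL over (emb_x, emb_y, levenshtein_distance) triples.

    Assumption: the squared Euclidean distance between the two embeddings
    is used as the Poisson rate for their Levenshtein distance.
    """
    eps = 1e-8  # keeps the rate strictly positive for identical embeddings
    losses = [poisson_nll(squared_euclidean(u, v) + eps, d) for u, v, d in pairs]
    return sum(losses) / len(losses)
```

For d > 0, the per-pair loss is minimized when λ equals d, so training pushes embedding distances toward the true edit distances.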
CDSD: Chinese Dysarthria Speech Database
We present the Chinese Dysarthria Speech Database (CDSD) as a valuable
resource for dysarthria research. This database comprises speech data from 24
participants with dysarthria. Each participant recorded one hour of speech,
and one of them recorded an additional 10 hours, resulting in a total of
34 hours of speech material. To accommodate participants with varying cognitive
levels, our text pool primarily consists of content from the AISHELL-1 dataset
and speeches by primary and secondary school students. Participants read
these texts aloud and recorded them with either a mobile device or the ZOOM F8n
multi-track field recorder. In this paper, we describe the data
collection and annotation processes and present an approach for establishing a
baseline for dysarthric speech recognition. Furthermore, we conducted a
speaker-dependent dysarthric speech recognition experiment using an additional
10 hours of speech data from one of our participants. Our findings
indicate that, after extensive data-driven model training, fine-tuning on a
limited amount of an individual speaker's data yields good results in
speaker-dependent dysarthric speech recognition. However, we observe
significant variation in recognition results among different dysarthric
speakers. These insights provide useful reference points for
speaker-dependent dysarthric speech recognition.
Comment: 9 pages, 3 figures
Promoting reading comprehension and critical–analytic thinking: A comparison of three approaches with fourth and fifth graders
Comprehending and critically analyzing complex, content-rich text is an essential requirement of academic excellence as well as a life-long skill for students. Unfortunately, students often struggle to comprehend print and digital media, and consequently they are unable to complete essential tasks, such as identifying information, making inferences, examining arguments, or vetting sources. In the present study, we compared the effectiveness of three reading interventions (i.e., Quality Talk (QT), Think before reading, think While reading, think After reading (TWA), and a TWA/QT Hybrid) in promoting fourth- and fifth-grade students' reading comprehension and critical–analytic thinking. Specifically, teachers in each intervention delivered the respective instructional mini-lessons in their language arts classes and conducted weekly text-based discussions. The results suggested that the Hybrid and QT interventions were effective at promoting high-level comprehension among fourth- and fifth-grade students. The evidence showed that students participating in the Hybrid and QT interventions engaged in more critical–analytic thinking during text-based discussions than those who received the TWA intervention, as indicated by statistically significantly greater numbers of student-generated authentic questions and elaborated explanations. The Hybrid and QT interventions were also found to boost students' oral reading fluency in both grades across two phases. Moreover, fifth-grade students who participated in the Hybrid intervention outperformed their peers from the TWA group on the post-discussion reading comprehension assessments.
Annealing novel nucleobase-lipids with oligonucleotides or plasmid DNA based on H-bonding or π-π interaction: Assemblies and transfections
Lipid derivatives of nucleoside analogs have been highlighted for their potential for effective gene delivery. A novel class of nucleobase-lipids is rationally designed and readily synthesized, comprising thymine/cytosine, an ester/amide linker and an oleyl lipid. The diversity of four nucleobase-lipids, termed DXBAs (DOTA, DNTA, DOCA and DNCA), is investigated. In addition, DNCA is shown to be an effective neutral transfection material for nucleic acid delivery: it binds to oligonucleotides via H-bonding and π-π stacking and has reduced toxicity in vitro and in vivo. Several kinds of nucleic acid drugs, including aptamers, ssRNA, antisense oligonucleotides and plasmid DNAs, can be delivered by DXBAs, especially DNCA. In particular, the G4-aptamer AS1411 encapsulated by DNCA exhibits enhanced cellular uptake, reduced lysosomal degradation, increased apoptosis and altered cell cycle phases in vitro, as well as prolonged duration in vivo, resulting in significant anti-proliferative activity. Our results demonstrate that DNCA is a promising transfection agent for G4-aptamers, with good prospects for improving the permeation of single-stranded oligonucleotides or plasmid DNAs.
Integrated rocksalt–polyanion cathodes with excess lithium and stabilized cycling
Co- and Ni-free disordered rocksalt cathodes utilize oxygen redox to increase the energy density of lithium-ion batteries, but it is challenging to achieve good cycle life at high voltages >4.5 V (versus Li/Li+). Here we report a family of Li-excess Mn-rich cathodes that integrates rocksalt- and polyanion-type structures. Following design rules for cation filling and ordering, we demonstrate the bulk incorporation of polyanion groups into the rocksalt lattice. This integration bridges the two primary families of lithium-ion battery cathodes (layered/spinel and phosphate oxides), dramatically enhancing the cycling stability of disordered rocksalt cathodes with a 4.8 V upper cut-off voltage. The cathode exhibits high gravimetric energy densities above 1,100 Wh kg−1 and >70% retention over 100 cycles. This study opens up a broad compositional space for developing battery cathodes using earth-abundant elements such as Mn and Fe.
Associations of HLA-DP Variants with Hepatitis B Virus Infection in Southern and Northern Han Chinese Populations: A Multicenter Case-Control Study
The HLA-DP locus has been reported to be associated with hepatitis B virus (HBV) infection in populations of Japan and Thailand. We aimed to examine whether the association can be replicated in Han Chinese populations. = 0.097∼0.697 and 0.198∼0.615 in northern Chinese population, respectively). HLA-DP loci were strongly associated with HBV infection in southern and northern Han Chinese populations, but not with HBV progression.
Native and recombinant production of the glucuronoyl esterase from the litter decomposing fungus Stropharia coronilla
As sustainable energy becomes increasingly important, the utilization of lignocellulosic biomass for biofuel production has become central to this field. Fungal enzymes play an important role in lignocellulose degradation. Glucuronoyl esterase (GE) is a less-studied fungal enzyme that cleaves the ester linkage between lignin alcohols and the 4-O-methyl-D-glucuronic acid side chains of hemicellulose. Genes encoding GE have been identified in various fungal species and expressed in different production systems so that their biochemical properties can be studied in detail.
The gene encoding GE from the basidiomycete litter-decomposing fungus Stropharia coronilla (ScGE) was cloned and heterologously expressed in the yeast Pichia pastoris. Native expression and secretion of GE were induced by growing S. coronilla in lignocellulose-supplemented cultures. ScGE activity was detectable from the fifth day of cultivation and peaked on the 14th day. Heterologous expression in P. pastoris showed that ScGE was produced as an enzymatically active protein. GE activity was determined using the commercial K-URONIC kit supplemented with the GE-specific substrate benzyl-D-glucuronate.
Chromosome-wide co-fluctuation of stochastic gene expression in mammalian cells.
Gene expression is subject to stochastic noise, but to what extent and by which means such stochastic variations are coordinated among different genes are unclear. We hypothesize that neighboring genes on the same chromosome co-fluctuate in expression because of their common chromatin dynamics, and verify it at the genomic scale using allele-specific single-cell RNA-sequencing data of mouse cells. Unexpectedly, the co-fluctuation extends to genes that are over 60 million bases apart. We provide evidence that this long-range effect arises in part from chromatin co-accessibilities of linked loci attributable to three-dimensional proximity, which is much closer intra-chromosomally than inter-chromosomally. We further show that genes encoding components of the same protein complex tend to be chromosomally linked, likely resulting from natural selection for intracellular among-component dosage balance. These findings have implications for both the evolution of genome organization and optimal design of synthetic genomes in the face of gene expression noise.
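One way to test for co-fluctuation with allele-specific data is to compare cis pairs (alleles of two genes on the same chromosome copy) with trans pairs (alleles on different homologs) measured in the same cells: cell-wide extrinsic factors affect both kinds of pairs alike, so an excess of cis over trans correlation points to coordination at the chromosome level. The following Python sketch assumes per-cell allele counts stored under the hypothetical keys `"mat"` and `"pat"`; it illustrates the comparison and is not the paper's exact statistic.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def cis_minus_trans(gene1, gene2):
    """Mean cis correlation minus mean trans correlation for two genes.

    gene1 and gene2 map the hypothetical allele keys "mat" and "pat" to
    per-cell expression values measured in the same cells.
    """
    cis = (pearson(gene1["mat"], gene2["mat"])
           + pearson(gene1["pat"], gene2["pat"])) / 2
    trans = (pearson(gene1["mat"], gene2["pat"])
             + pearson(gene1["pat"], gene2["mat"])) / 2
    return cis - trans
```

A positive value means that alleles sharing a chromosome copy fluctuate together more than alleles on different homologs do; in practice, one would compare this quantity between linked and unlinked gene pairs.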