73 research outputs found

    Genomic Studies of Gene Expression Errors and Their Evolutionary Ramifications

    Gene expression produces biologically functional RNAs and proteins and is essential for life. Nevertheless, gene expression is subject to several types of errors that are generally harmful. Despite the prevalence and significant consequences of expression errors, their genome-wide patterns are not well characterized. Furthermore, the evolutionary ramifications of such errors are poorly understood. In my dissertation, I address the above questions using novel computational approaches. I focus on two types of gene expression errors: (i) stochastic gene expression, which leads to variation in expression level among isogenic cells in the same environment (gene expression noise), and (ii) mistranslation, which induces protein misfolding and can be toxic to cells. My thesis has three main chapters in addition to the introduction and conclusion chapters. First, in Chapter 2, I studied the gene expression noise of individual genes. I decomposed the noise of 3975 mouse genes into intrinsic and extrinsic components and studied their biological mechanisms and evolutionary consequences. Next, in Chapter 3, I moved on to consider the gene expression noise of pairs of genes simultaneously. I discovered chromosome-wide co-fluctuation in expression for linked genes, which is partly due to chromatin co-accessibilities of linked loci attributable to three-dimensional proximity. I further found that genes encoding components of the same protein complex are more likely to become linked during evolution due to natural selection for intracellular among-component dosage balance. Thus, selection for mitigating the harm of expression noise drives the nonrandom genomic distributions of genes. Finally, in Chapter 4, I studied yet another kind of expression error: mistranslation. I focused on the relationship between mistranslation and codon usage.
Specifically, I provide the first direct and global evidence for a prominent but unresolved hypothesis: preferred codons are translated more accurately. Furthermore, I showed that this proposition is generally true across three domains of life. Interestingly, the relative translational accuracies of synonymous codons vary drastically among species, which is mainly explained by the variation of tRNA compositions. Together with other information, these findings suggest that codon usage coevolves with the cellular tRNA pool to maximize translational accuracy and efficiency. In conclusion, my dissertation documents the genome-wide patterns of gene expression errors and demonstrates their profound impacts on both molecular and phenotypic evolution. The knowledge gained has implications beyond expression errors because of the universality of molecular errors in cellular life.
PhD, Ecology and Evolutionary Biology, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169993/1/mengysun_1.pd

    DoDo-Code: a Deep Levenshtein Distance Embedding-based Code for IDS Channel and DNA Storage

    Recently, DNA storage has emerged as a promising data storage solution, offering significant advantages in storage density, maintenance cost efficiency, and parallel replication capability. Mathematically, the DNA storage pipeline can be viewed as an insertion, deletion, and substitution (IDS) channel. Because of the mathematical terra incognita of the Levenshtein distance, designing an IDS-correcting code is still a challenge. In this paper, we propose an innovative approach that utilizes deep Levenshtein distance embedding to bypass these mathematical challenges. By representing the Levenshtein distance between two sequences as a conventional distance between their corresponding embedding vectors, the inherent structural property of Levenshtein distance is revealed in the friendly embedding space. Leveraging this embedding space, we introduce the DoDo-Code, an IDS-correcting code that incorporates deep embedding of Levenshtein distance, deep embedding-based codeword search, and deep embedding-based segment correcting. To address the requirements of DNA storage, we also present a preliminary algorithm for long sequence decoding. As far as we know, the DoDo-Code is the first IDS-correcting code designed using plausible deep learning methodologies, potentially paving the way for a new direction in error-correcting code research. It is also the first IDS code that exhibits characteristics of being 'optimal' in terms of redundancy, significantly outperforming the mainstream IDS-correcting codes of the Varshamov-Tenengolts code family in code rate.
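
For readers unfamiliar with the metric this abstract builds on, the following is a minimal sketch of the classical dynamic-programming computation of Levenshtein distance, which counts the insertions, deletions, and substitutions of an IDS channel. It is the textbook algorithm, not the DoDo-Code or its deep embedding; the function name `levenshtein` is chosen here for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string is i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # One deletion separates these two short DNA-like strings.
    print(levenshtein("ACGT", "AGT"))
```

The quadratic cost of this exact computation is one reason approximate, embedding-based distances are attractive for large sequence sets.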

    Levenshtein Distance Embedding with Poisson Regression for DNA Storage

    Efficient computation or approximation of Levenshtein distance, a widely-used metric for evaluating sequence similarity, has attracted significant attention with the emergence of DNA storage and other biological applications. Sequence embedding, which maps Levenshtein distance to a conventional distance between embedding vectors, has emerged as a promising solution. In this paper, a novel neural network-based sequence embedding technique using Poisson regression is proposed. We first provide a theoretical analysis of the impact of embedding dimension on model performance and present a criterion for selecting an appropriate embedding dimension. Under this embedding dimension, the Poisson regression is introduced by assuming that the Levenshtein distance between sequences of fixed length follows a Poisson distribution, which naturally aligns with the definition of Levenshtein distance. Moreover, from the perspective of the distribution of embedding distances, Poisson regression approximates the negative log likelihood of the chi-squared distribution and offers advancements in removing the skewness. Through comprehensive experiments on real DNA storage data, we demonstrate the superior performance of the proposed method compared to state-of-the-art approaches.
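
The Poisson-regression idea in this abstract can be illustrated with a small sketch: an observed Levenshtein distance is treated as a Poisson count whose rate is predicted from the distance between two embedding vectors. Using the squared Euclidean distance as the rate is an assumption made here for illustration, as are the names `squared_euclidean` and `poisson_nll`; the abstract does not specify the paper's exact loss or network.

```python
import math
from typing import Sequence


def squared_euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors (assumed rate)."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def poisson_nll(rate: float, count: int, eps: float = 1e-8) -> float:
    """Negative log-likelihood of observing `count` under Poisson(rate)."""
    rate = max(rate, eps)  # guard against log(0) when embeddings coincide
    return rate - count * math.log(rate) + math.lgamma(count + 1)


if __name__ == "__main__":
    # Hypothetical embeddings of two sequences whose true Levenshtein distance is 3.
    u, v = [0.0, 1.0, 2.0], [1.0, 2.0, 3.0]
    print(poisson_nll(squared_euclidean(u, v), count=3))
```

For a fixed observed count, this loss is smallest when the predicted rate equals the count, which is what pushes embedding distances toward the true Levenshtein distances during training.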

    CDSD: Chinese Dysarthria Speech Database

    We present the Chinese Dysarthria Speech Database (CDSD) as a valuable resource for dysarthria research. This database comprises speech data from 24 participants with dysarthria. Each participant recorded one hour of speech, and one participant recorded an additional 10 hours, resulting in 34 hours of speech material. To accommodate participants with varying cognitive levels, our text pool primarily consists of content from the AISHELL-1 dataset and speeches by primary and secondary school students. Participants read these texts aloud and record their speech using a mobile device or the ZOOM F8n multi-track field recorder. In this paper, we elucidate the data collection and annotation processes and present an approach for establishing a baseline for dysarthric speech recognition. Furthermore, we conducted a speaker-dependent dysarthric speech recognition experiment using an additional 10 hours of speech data from one of our participants. Our research findings indicate that, through extensive data-driven model training, fine-tuning limited quantities of specific individual data yields commendable results in speaker-dependent dysarthric speech recognition. However, we observe significant variations in recognition results among different dysarthric speakers. These insights provide valuable reference points for speaker-dependent dysarthric speech recognition.
    Comment: 9 pages, 3 figures

    Promoting reading comprehension and critical–analytic thinking: A comparison of three approaches with fourth and fifth graders

    Comprehending and critically analyzing complex, content-rich text is an essential requirement of academic excellence as well as a life-long skill for students. Unfortunately, students often struggle to comprehend print and digital media, and subsequently, they are unable to complete essential tasks, such as identifying information, making inferences, examining arguments, or vetting sources. In the present study, we compared the effectiveness of three reading interventions (i.e., Quality Talk (QT), Think before reading, think While reading, think After reading (TWA), and TWA/QT Hybrid) in promoting fourth- and fifth-grade students’ reading comprehension and critical–analytic thinking. Specifically, teachers in each intervention delivered the respective instructional mini-lessons in their language arts classes and conducted weekly text-based discussions. The results suggested that the Hybrid and QT interventions were effective at promoting high-level comprehension among fourth- and fifth-grade students. Evidence supported that students participating in the Hybrid and QT interventions engaged in more critical–analytic thinking during text-based discussions than those who received the TWA intervention, as evidenced by statistically significantly greater numbers of student-generated authentic questions and elaborated explanations. The Hybrid and QT interventions were also found to effectively boost students’ oral reading fluency in both grades across two phases. Moreover, fifth-grade students who participated in the Hybrid intervention outperformed their peers from the TWA group on the post-discussion reading comprehension assessments

    Annealing novel nucleobase-lipids with oligonucleotides or plasmid DNA based on H-bonding or π-π interaction: Assemblies and transfections

    Lipid derivatives of nucleoside analogs have been highlighted for their potential for effective gene delivery. A novel class of nucleobase-lipids is rationally designed and readily synthesized, comprising thymine/cytosine, an ester/amide linker and an oleyl lipid. The diversity of four nucleobase-lipids termed DXBAs (DOTA, DNTA, DOCA and DNCA) is investigated. In addition, DNCA is demonstrated to be an effective neutral transfection material for nucleic acid delivery, which can bind to oligonucleotides via H-bonding and π-π stacking with reduced toxicity in vitro and in vivo. Several kinds of nucleic acid drugs including aptamer, ssRNA, antisense oligonucleotide, and plasmid DNAs can be delivered by DXBAs, especially DNCA. In particular, G4-aptamer AS1411 encapsulated by DNCA exhibits cellular uptake enhancement, lysosome degradation reduction, cell apoptosis promotion, cell cycle phase alteration in vitro and duration prolongation in vivo, resulting in significant anti-proliferative activity. Our results demonstrate that DNCA is a promising transfection agent for G4-aptamers and shows promise for improving the permeation of single-stranded oligonucleotides or plasmid DNAs.

    Integrated rocksalt–polyanion cathodes with excess lithium and stabilized cycling

    Co- and Ni-free disordered rocksalt cathodes utilize oxygen redox to increase the energy density of lithium-ion batteries, but it is challenging to achieve good cycle life at high voltages >4.5 V (versus Li/Li+). Here we report a family of Li-excess Mn-rich cathodes that integrates rocksalt- and polyanion-type structures. Following design rules for cation filling and ordering, we demonstrate the bulk incorporation of polyanion groups into the rocksalt lattice. This integration bridges the two primary families of lithium-ion battery cathodes—layered/spinel and phosphate oxides—dramatically enhancing the cycling stability of disordered rocksalt cathodes with 4.8 V upper cut-off voltage. The cathode exhibits high gravimetric energy densities above 1,100 Wh kg−1 and >70% retention over 100 cycles. This study opens up a broad compositional space for developing battery cathodes using earth-abundant elements such as Mn and Fe

    Associations of HLA-DP Variants with Hepatitis B Virus Infection in Southern and Northern Han Chinese Populations: A Multicenter Case-Control Study

    ) locus has been reported to be associated with hepatitis B virus (HBV) infection in populations of Japan and Thailand. We aimed to examine whether the association can be replicated in Han Chinese populations. = 0.097∼0.697 and 0.198∼0.615 in northern Chinese population, respectively). loci were strongly associated with HBV infection in southern and northern Han Chinese populations, but not with HBV progression

    Native and recombinant production of the glucuronoyl esterase from the litter decomposing fungus Stropharia coronilla

    As sustainable energy becomes increasingly important, the utilization of lignocellulosic biomass for biofuel production is a central part of this area. Fungal enzymes play an important role in lignocellulose degradation. Glucuronoyl esterase (GE) is a less studied fungal enzyme which degrades the ester linkage between lignin alcohols and the 4-O-methyl D-glucuronic acid side chains of hemicellulose. Genes encoding GE have been identified from various fungal species and have been expressed in different production systems so that their biochemical properties can be studied in detail. The gene encoding GE from the basidiomycete litter-decomposing fungus Stropharia coronilla was cloned and heterologously expressed in the yeast Pichia pastoris. The expression and secretion of GE were induced by growing S. coronilla in cultivations supplemented with lignocellulose. ScGE activity could be detected after the fifth day of cultivation and peaked on the 14th day. The heterologous expression of ScGE in P. pastoris showed that ScGE was produced as an enzymatically active protein. The commercial K-URONIC kit, supplemented with the GE-specific substrate benzyl-D-glucuronate, was used to determine GE activity.

    Chromosome-wide co-fluctuation of stochastic gene expression in mammalian cells.

    Gene expression is subject to stochastic noise, but to what extent and by which means such stochastic variations are coordinated among different genes are unclear. We hypothesize that neighboring genes on the same chromosome co-fluctuate in expression because of their common chromatin dynamics, and verify it at the genomic scale using allele-specific single-cell RNA-sequencing data of mouse cells. Unexpectedly, the co-fluctuation extends to genes that are over 60 million bases apart. We provide evidence that this long-range effect arises in part from chromatin co-accessibilities of linked loci attributable to three-dimensional proximity, which is much closer intra-chromosomally than inter-chromosomally. We further show that genes encoding components of the same protein complex tend to be chromosomally linked, likely resulting from natural selection for intracellular among-component dosage balance. These findings have implications for both the evolution of genome organization and optimal design of synthetic genomes in the face of gene expression noise