
    Statistical Algorithms and Bioinformatics Tools Development for Computational Analysis of High-throughput Transcriptomic Data

    Next-Generation Sequencing technologies allow for a substantial increase in the amount of data available for various biological studies. In order to effectively and efficiently analyze these data, computational approaches combining mathematics, statistics, computer science, and biology are implemented. Even with the substantial efforts devoted to the development of these approaches, numerous issues and pitfalls remain. One of these issues is mapping uncertainty, in which read alignment results are biased due to the inherent difficulties of accurately aligning RNA-Sequencing reads. GeneQC is an alignment quality control tool that provides insight into the severity of mapping uncertainty in each annotated gene from alignment results. GeneQC uses feature extraction to identify three levels of information for each gene and implements elastic net regularization and mixture model fitting to provide insight into the severity of mapping uncertainty and the quality of read alignment. In combination with GeneQC, the Ambiguous Reads Mapping (ARM) algorithm re-aligns ambiguous reads by integrating motif prediction from metabolic pathways to establish co-regulatory gene modules, then assigning reads with a negative binomial distribution-based probabilistic approach. These two tools work in tandem to address the issue of mapping uncertainty and provide more accurate read alignments, and thus more accurate expression estimates. Also presented in this dissertation are two approaches to interpreting the expression estimates. The first is IRIS-EDA, an integrated Shiny web server that combines numerous analyses to investigate gene expression data generated from RNA-Sequencing experiments. The second is ViDGER, an R/Bioconductor package that quickly generates high-quality visualizations of differential gene expression results to assist users in their comprehensive interpretation, which is a non-trivial task. These four tools cover a variety of aspects of modern RNA-Seq analyses and aim to address bottlenecks related to algorithmic and computational issues, as well as to provide more efficient and effective implementation methods.
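
    The GeneQC idea summarized above can be sketched in a hedged, minimal form: per-gene alignment features feed an elastic net model to produce a mapping-uncertainty score, and a mixture model then groups genes into severity classes. This is not the published GeneQC code; the feature names, the simulated response, and the use of scikit-learn are assumptions made purely for illustration.

```python
# Minimal sketch (assumed, not the GeneQC implementation): score per-gene mapping
# uncertainty from extracted features with an elastic net, then separate genes
# into severity classes with a mixture model over the scores.
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_genes = 500

# Hypothetical per-gene features: fraction of multi-mapped reads, mean mapping
# quality (MAPQ), and a sequence-similarity level to other annotated genes.
X = np.column_stack([
    rng.beta(2, 8, n_genes),
    rng.normal(30, 5, n_genes),
    rng.beta(1.5, 6, n_genes),
])
# Hypothetical uncertainty response; in practice this would be derived from
# the alignment results themselves.
y = 0.7 * X[:, 0] - 0.01 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.05, n_genes)

# Elastic net with cross-validated regularization yields a per-gene score.
enet = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X, y)
scores = enet.predict(X).reshape(-1, 1)

# A Gaussian mixture over the scores assigns each gene a severity class.
gmm = GaussianMixture(n_components=3, random_state=0).fit(scores)
severity = gmm.predict(scores)
print("genes per severity class:", np.bincount(severity))
```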

    Dwelling Quietly in the Rich Club: Brain Network Determinants of Slow Cortical Fluctuations

    For more than a century, cerebral cartography has been driven by investigations of structural and morphological properties of the brain across spatial scales and the temporal/functional phenomena that emerge from these underlying features. The next era of brain mapping will be driven by studies that consider both of these components of brain organization simultaneously, elucidating their interactions and dependencies. Using this guiding principle, we explored the origin of slowly fluctuating patterns of synchronization within the topological core of brain regions known as the rich club, implicated in the regulation of mood and introspection. We find that a constellation of densely interconnected regions that constitute the rich club (including the anterior insula, amygdala, and precuneus) plays a central role in promoting a stable, dynamical core of spontaneous activity in the primate cortex. The slow time scales are well matched to the regulation of internal visceral states, corresponding to the somatic correlates of mood and anxiety. In contrast, the topology of the surrounding "feeder" cortical regions shows unstable, rapidly fluctuating dynamics likely crucial for fast perceptual processes. We discuss these findings in relation to psychiatric disorders and the future of connectomics. Comment: 35 pages, 6 figures
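
    For readers unfamiliar with the rich-club concept invoked above, the sketch below computes a basic, unnormalized rich-club coefficient from a binary connectivity matrix: the edge density among nodes whose degree exceeds a threshold k. The random matrix stands in for real connectome data, and this is an illustrative assumption rather than the authors' analysis pipeline.

```python
# Illustrative sketch: unnormalized rich-club coefficient phi(k) for a binary,
# undirected graph given as an adjacency matrix.
import numpy as np

def rich_club_coefficient(adj: np.ndarray, k: int) -> float:
    """Edge density among nodes with degree > k."""
    degree = adj.sum(axis=1)
    rich = np.where(degree > k)[0]
    n = len(rich)
    if n < 2:
        return float("nan")
    sub = adj[np.ix_(rich, rich)]
    edges = sub.sum() / 2                 # undirected: each edge counted twice
    return edges / (n * (n - 1) / 2)

# Random symmetric adjacency matrix as a stand-in for a structural connectome.
rng = np.random.default_rng(1)
upper = np.triu((rng.random((90, 90)) < 0.15).astype(int), 1)
adj = upper + upper.T

for k in (10, 14, 18):
    print(f"phi(k={k}) = {rich_club_coefficient(adj, k):.3f}")
```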

    Improved Computational Prediction of Function and Structural Representation of Self-Cleaving Ribozymes with Enhanced Parameter Selection and Library Design

    Biomolecules could be engineered to solve many societal challenges, including disease diagnosis and treatment, environmental sustainability, and food security. However, our limited understanding of how mutational variants alter molecular structures and functional performance has constrained the potential of important technological advances, such as high-throughput sequencing and gene editing. Ribonucleic Acid (RNA) sequences are thought to play a central role within many of these challenges. Their continual discovery throughout all domains of life is evidence of their significant biological importance (Weinreb et al., 2016). The self-cleaving ribozyme is a class of noncoding RNA (ncRNA) that has been useful for relating sequence variants to structural features and their associated catalytic activities. Self-cleaving ribozymes possess tractable sequence spaces, perform easily identifiable catalytic functions, and have well-documented structures. The determination of a self-cleaving ribozyme's structure and catalytic activity in the laboratory is typically a slow and expensive process. Most current explorations of structure and function come from these empirical processes. Computational approaches to the prediction of catalytic activity and structure are fast and inexpensive, but have failed both to achieve atomic accuracy and to correctly identify all base-pair interactions (Watkins et al., 2018). One prominent impediment to computational approaches is the lack of existing structural and functional data typically required by predictive models (Jumper et al., 2021). Using data from deep mutational scanning experiments and high-throughput sequencing technology, it is possible to computationally map mutational variants to their observed catalytic activity for a range of self-cleaving ribozymes. The resulting map reveals important base-pairing relationships that, in turn, facilitate accurate predictions of higher-order variants. Using sequence data from three experimental replicates of five model self-cleaving ribozymes, I will identify and map all single- and double-mutation variants to their observed cleavage activity. These mappings will be used to identify structural features within each ribozyme. Next, I will show, within a training tool, how observed cleavage at multiple reaction times can be used to identify the catalytic rates of our model ribozymes. Finally, I will predict the functional activity of model ribozyme variants of various mutational orders using machine learning models trained only on functionally labeled sequence variants. Together, these three dissertation chapters represent the kind of analysis needed to further the implementation of more accurate structural and functional prediction algorithms.
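
    One step mentioned above, inferring catalytic rates from cleavage measured at multiple reaction times, can be sketched as a single-exponential fit. The model form, time points, and cleaved fractions below are illustrative assumptions, not data from the dissertation.

```python
# Hedged sketch: estimate an observed cleavage rate constant k_obs by fitting
# f(t) = f_max * (1 - exp(-k * t)) to cleaved fractions at several time points.
import numpy as np
from scipy.optimize import curve_fit

def cleaved_fraction(t, k, f_max):
    """Single-exponential cleavage model."""
    return f_max * (1.0 - np.exp(-k * t))

times = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 30.0])           # minutes (hypothetical)
fractions = np.array([0.18, 0.31, 0.50, 0.75, 0.86, 0.90])   # hypothetical observations

(k_obs, f_max), _ = curve_fit(cleaved_fraction, times, fractions, p0=[0.5, 0.9])
print(f"k_obs = {k_obs:.3f} per minute, f_max = {f_max:.2f}")
```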

    Improving the Ribozyme Toolbox: From Structure-Function Insights to Synthetic Biology Applications

    Self-cleaving ribozymes are a naturally occurring class of catalytically active RNA molecules which cleave their own phosphate backbone. In nature, self-cleaving ribozymes are best known for their role in processing concatemers of viral genomes into monomers during viral replication in some RNA viruses, but to a lesser degree they have also been implicated in mRNA regulation and processing in bacteria and eukaryotes. In addition to their biological relevance, these RNA enzymes have been harnessed as important biomolecular tools with a variety of applications in fields such as bioengineering. Self-cleaving ribozymes are relatively small and easy to generate in the lab using common molecular biology approaches, and have therefore been accessible and well-exploited model systems for interrogating RNA sequence-structure-function relationships. Furthermore, self-cleaving ribozymes are also being implemented as parts in the development of various biomolecular tools such as biosensors and gene regulatory elements. While much progress has been made in these areas, there are still challenges associated with the performance and implementation of such tools. The work contained in this dissertation aims to address several of these challenges and improve the ribozyme toolbox in several diverse areas. Chapter one provides an introduction to pertinent background information for this dissertation. Chapter two aims to improve the ribozyme toolbox by providing and analyzing new high-throughput sequence-structure-function data sets on five different self-cleaving ribozymes, and identifying how trends in epistasis relate to distinct structural elements. Chapter three uses such high-throughput data to train machine learning models that accurately predict the historically difficult-to-predict functional effects of higher-order mutations in functional RNAs. Finally, in chapter four, I develop a biologically relevant platform to study the real-time performance and kinetics of self-cleaving ribozyme-based gene regulatory elements directly at the site of transcription in mammalian cells.
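
    The pairwise epistasis analysis referenced for chapter two can be sketched as follows: given relative activities of the wild type, two single mutants, and the corresponding double mutant, epistasis is the deviation of the double mutant's log-activity from the sum of the single-mutant log-effects. The numbers are invented for illustration and are not measurements from the dissertation.

```python
# Minimal sketch of pairwise epistasis from relative activities (wild type = 1.0).
import numpy as np

def pairwise_epistasis(wt, single_a, single_b, double_ab):
    """Epistasis in log space: log(AB/WT) - [log(A/WT) + log(B/WT)]."""
    return np.log(double_ab / wt) - (np.log(single_a / wt) + np.log(single_b / wt))

# Hypothetical relative cleavage activities.
eps = pairwise_epistasis(wt=1.0, single_a=0.6, single_b=0.5, double_ab=0.45)
print(f"pairwise epistasis = {eps:.3f}")   # positive values suggest compensatory interactions
```

    A positive value for a pair of positions whose double mutation restores activity, for example by restoring a base pair, is the kind of signal used to relate epistasis trends to structural elements.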

    Quantification of C-type lectin receptors signal transduction

    Ubiquitous glycans facilitate a plethora of important interactions, namely cancer-host, host-pathogen, and host-self interactions. Interaction with these carbohydrates is enabled by lectins, and the effects of these interactions can range from redundant to essential. Lectins are exposed on mammalian cell surfaces, where they identify the information encoded in glycans and transfer it into signal transduction pathways. Such signal transduction pathways are complex and difficult to analyse. However, quantitative data with single-cell resolution provide means to disentangle the associated signalling cascades. C-type lectin receptors (CLRs) expressed on immune cells were chosen as a model system to study their capacity to transmit information encoded in the glycans of incoming particles. To this end, monocytic cell lines expressing DC-SIGN, MCL, dectin-1, dectin-2, and mincle, as well as TNFAR and TLR-1&2, were established. Based on the study of Cheong et al. (2011), the amount of transmitted information was quantified by following NFκB-dependent GFP expression. While most receptors had a channel capacity of at least 1 bit, it was found that dectin-2 has a lower capacity to transmit information than other lectins. The comparison to the related lectin mincle is especially interesting, since mincle uses the same pathway effectively. Furthermore, information transmission by dectin-2 could not be enhanced by other lectins or signalling molecules. Yet, upon closer analysis, it was found that the sensitivity of the dectin-2 signal transduction pathway can be enhanced by overexpression of its co-receptor FcRγ, but surprisingly its transmitted information cannot. Moreover, it was suggested how potential autoimmunity might be a cause of dectin-2's inefficient signalling. The question of signal integration was also approached: how do cells combine the flow of information from multiple receptors? It was shown that the signals of dectin-2 and dectin-1 are integrated as a compromise between the two receptors. The reason for this compromise might be the activity of the phosphoprotein SYK, present in both the dectin-1 and dectin-2 signal transduction pathways. By using the established assays and cell lines, soluble beta glucans (SBGs) were discovered to be potent stimulators of dectin-1, where sensitivity to the SBGs was highly variable and dependent on their β-glucan side chains. Various ligands for mincle, on the other hand, resulted in similar signalling behaviour. Building on insight into targeted delivery to lectins, it was shown how nucleic acids can be delivered to Langerin-expressing cells and used to reprogramme the cells, a technology of tremendous potential for vaccination strategies and (non-germline) genetic editing. Taken together, the concepts of information theory combined with single-cell resolved data enabled the quantification of CLR signalling behaviour and signal integration. Using dectin-2 and other lectins as examples, it was demonstrated how the receptor itself determines the efficiency and therefore the outcome of the signal transduction pathways. Moreover, the potential to exploit glycan-lectin interactions in drug targeting was exemplified by delivering mRNA via Langerin and by demonstrating the dependency of dectin-1 sensitivity upon the β-glucan side chains of its ligands.
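
    The channel-capacity quantification described above can be illustrated with a small Blahut-Arimoto computation: given an estimated conditional distribution of binned NFκB-dependent GFP output for each ligand dose, the algorithm finds the input distribution that maximizes mutual information. The 4x4 channel matrix below is a made-up stand-in; in the study, such distributions would be estimated from single-cell measurements.

```python
# Hedged sketch: channel capacity (in bits) of a discrete channel via the
# Blahut-Arimoto algorithm. Rows of p_y_given_x are inputs (ligand doses),
# columns are binned outputs (GFP expression levels).
import numpy as np

def blahut_arimoto(p_y_given_x, tol=1e-9, max_iter=1000):
    n_in = p_y_given_x.shape[0]
    p_x = np.full(n_in, 1.0 / n_in)                     # start from a uniform input distribution
    for _ in range(max_iter):
        q_y = p_x @ p_y_given_x                         # output marginal under current p_x
        ratio = np.where(p_y_given_x > 0, p_y_given_x / q_y, 1.0)
        d = np.sum(p_y_given_x * np.log(ratio), axis=1) # D(P(.|x) || q) in nats
        c = np.exp(d)
        p_new = p_x * c / np.sum(p_x * c)
        if np.max(np.abs(p_new - p_x)) < tol:
            p_x = p_new
            break
        p_x = p_new
    q_y = p_x @ p_y_given_x
    ratio = np.where(p_y_given_x > 0, p_y_given_x / q_y, 1.0)
    mi_nats = np.sum(p_x[:, None] * p_y_given_x * np.log(ratio))
    return mi_nats / np.log(2)                          # convert nats to bits

# Hypothetical P(GFP bin | dose) for 4 doses and 4 expression bins.
channel = np.array([
    [0.70, 0.20, 0.08, 0.02],
    [0.30, 0.40, 0.20, 0.10],
    [0.10, 0.25, 0.40, 0.25],
    [0.02, 0.08, 0.25, 0.65],
])
print(f"estimated channel capacity: {blahut_arimoto(channel):.2f} bits")
```

    A receptor whose output distributions overlap heavily across doses, as reported here for dectin-2, corresponds to a matrix with similar rows and hence a lower capacity.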

    Data Mining On Retail Banking Simulation Model To Devise Productivity Improvement Strategies

    Retail banking is a service-oriented business. Customers visit different branches to procure services such as transaction inquiries, bank account processing, etc. In a typical scenario, operators at various positions in a branch have to render counter services and meet certain customer service levels. Customers arrive at the counter in different patterns, and information can be collected, such as customer waiting time, operator service time, and the number of tickets received at a particular time. Such information allows data mining to be performed to discover patterns that could segregate the types of services. The study aims to demonstrate this possibility through computer simulation. Specifically, data mining is investigated to determine the clustering of branches and the classification of operator productivity, respectively. The research methodology involves seven steps. First, business understanding is performed to gain insight into the motive for initiating the data mining exercise; two levels, macro and micro, were defined to differentiate inter- and intra-branch comparisons. Second, a computer simulation was constructed in WITNESS Horizon V.21, largely based on the description of a real branch, with different scenarios built to reflect the operating behaviours of different branches. The simulation stores data (about six thousand records) in a database. The following steps involve different data preparation strategies (selection, cleaning, transformation). Next is the data mining, performed primarily in Python Orange V.3.20. The last step is pattern evaluation to develop suitable productivity improvement strategies. In this research, a variety of data mining tools were deployed and multiple insights were generated. Notably, branches that adopted lean management showed improved general productivity, and operator performance could be differentiated based on years of experience. This research provides opportunities for researchers to examine the productivity of branches with other data mining techniques. It also helps bankers to focus on the right areas for improvement and to improve decision making. There are two main limitations to the research. First, the simulation model does not capture the whole intricacy of retail banking front-end operations. Second, software limitations prevented more complex data mining tasks from being performed.
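
    The two mining tasks described, clustering branches and classifying operator productivity, can be sketched as below. scikit-learn stands in for the Orange workflow used in the study, and every feature name and value is a hypothetical placeholder rather than output of the WITNESS simulation.

```python
# Hedged sketch: cluster branch-level service metrics, then classify operator
# productivity from operator-level records. All data are synthetic placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Branch-level features: mean customer waiting time (min), mean service time (min),
# and tickets handled per hour (assumed features, 30 branches).
branches = np.column_stack([
    rng.normal(8, 3, 30),
    rng.normal(5, 1, 30),
    rng.normal(12, 4, 30),
])
branch_clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(branches)
print("branches per cluster:", np.bincount(branch_clusters))

# Operator-level records: years of experience and mean service time, with a
# hypothetical "high productivity" label to classify.
X_ops = np.column_stack([rng.integers(1, 20, 200), rng.normal(5, 1.5, 200)])
y_ops = (X_ops[:, 0] > 8).astype(int)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_ops, y_ops)
print("training accuracy:", clf.score(X_ops, y_ops))
```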