13 research outputs found

    Research News. Publications, 2019. Volume 1

    Publications that appeared during the period January 1 through March 31, 201

    Computational methods for the discovery and analysis of genes and other functional DNA sequences

    The need for automating genome analysis is a result of the tremendous amount of genomic data. As of today, a high-throughput DNA sequencing machine can run millions of sequencing reactions in parallel, and it is becoming faster and cheaper to sequence the entire genome of an organism. Public databases containing genomic data are growing exponentially, and hence the rise in demand for intuitive automated methods of DNA analysis and subsequent gene identification. However, the complexity of gene organization makes automation a challenging task, and smart algorithm design and parallelization are necessary to perform accurate analyses in reasonable amounts of time. This work describes two such automated methods for the identification of novel genes within given DNA sequences. The first method utilizes negative selection patterns as an evolutionary rationale for the identification of additional members of a gene family. As input it requires a known protein-coding gene in that family. The second method is a massively parallel data mining algorithm that searches a whole genome for inverted repeats (palindromic sequences) and identifies potential precursors of non-coding RNA genes. Both methods were validated successfully on the fully sequenced and well-studied plant species Arabidopsis thaliana. --Abstract, page iv
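
    The inverted-repeat search mentioned in the abstract above can be illustrated with a minimal sketch. It is not the massively parallel algorithm of the thesis, only a naive single-threaded scan; the function names and the `arm` and `max_gap` parameters are assumptions introduced here for illustration.

```python
from typing import Iterator, Tuple

# Translation table for complementing uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_inverted_repeats(
    seq: str, arm: int = 4, max_gap: int = 3
) -> Iterator[Tuple[int, int, str]]:
    """Yield (left_start, right_start, left_arm) for each inverted repeat.

    Hypothetical helper: a hit is a left arm of length `arm` followed,
    after a gap of 0..max_gap bases, by its exact reverse complement.
    """
    seq = seq.upper()
    n = len(seq)
    for i in range(n - 2 * arm + 1):
        left = seq[i:i + arm]
        target = reverse_complement(left)
        for gap in range(max_gap + 1):
            j = i + arm + gap          # start of the candidate right arm
            if j + arm > n:
                break                  # right arm would run past the end
            if seq[j:j + arm] == target:
                yield i, j, left


if __name__ == "__main__":
    # Toy hairpin-like sequence: GGGC, a 4-base spacer, then GCCC.
    for hit in find_inverted_repeats("GGGCAAAAGCCC", arm=4, max_gap=4):
        print(hit)
```

    A whole-genome search would need indexing and parallel execution, as the abstract notes; this scan only shows what counts as a hit.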

    A Predictive Model Which Uses Descriptors of RNA Secondary Structures Derived from Graph Theory.

    The secondary structures of ribonucleic acid (RNA) have been successfully modeled with graph-theoretic structures. Often, simple graphs are used to represent secondary RNA structures; however, in this research, a multigraph representation of RNA is used, in which vertices represent stems and edges represent the internal motifs. Any type of RNA secondary structure may be represented by a graph in this manner. We define novel graphical invariants to quantify the multigraphs and obtain characteristic descriptors of the secondary structures. These descriptors are used to train an artificial neural network (ANN) to recognize the characteristics of secondary RNA structure. Using the ANN, we classify the multigraphs as either RNA-like or not RNA-like. This classification method produced results similar to other classification methods. Given the expanding library of secondary RNA motifs, this method may provide a tool to help identify new structures and to guide the rational design of RNA molecules
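
    As a rough illustration of the multigraph representation described above (vertices for stems, edges for the motifs joining them), the sketch below computes two generic graph invariants. These are standard textbook quantities, not the novel invariants defined by the authors, and the toy edge list is hypothetical rather than taken from a real RNA structure.

```python
from collections import Counter
from typing import List, Tuple

Edge = Tuple[int, int]  # pair of stem (vertex) indices


def degree_sequence(num_vertices: int, edges: List[Edge]) -> List[int]:
    """Return vertex degrees in non-increasing order.

    Parallel edges are counted separately; a self-loop adds 2.
    """
    degrees = [0] * num_vertices
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return sorted(degrees, reverse=True)


def max_edge_multiplicity(edges: List[Edge]) -> int:
    """Return the largest number of parallel edges between any vertex pair."""
    counts = Counter(tuple(sorted(edge)) for edge in edges)
    return max(counts.values(), default=0)


if __name__ == "__main__":
    # Hypothetical structure: three stems, with two parallel edges
    # between stems 0 and 1 and a single edge between stems 1 and 2.
    toy_edges = [(0, 1), (1, 0), (1, 2)]
    print(degree_sequence(3, toy_edges))
    print(max_edge_multiplicity(toy_edges))
```

    Descriptors of this kind could be assembled into a feature vector for a classifier such as the ANN mentioned in the abstract.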

    Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics.

    Recent advances in high-throughput methodologies offer researchers the ability to understand complex systems via high-dimensional and multi-relational data. One example is the realm of molecular biology, where disparate data (such as gene sequence, gene expression, and interaction information) are available for various snapshots of biological systems. This type of high-dimensional and multi-relational data allows for unprecedentedly detailed analysis, but also presents challenges in accounting for all the variability. High-dimensional data often has a multitude of underlying relationships, each represented by a separate clustering structure, where the number of structures is typically unknown a priori. To address the challenges faced by traditional clustering methods on high-dimensional and multi-relational data, we developed three feature selection and cross-clustering methods: 1) infinite relational model with feature selection (FIRM), which incorporates the rich information of multi-relational data; 2) Bayesian Hierarchical Cross-Clustering (BHCC), a deterministic approximation to Cross Dirichlet Process mixture (CDPM) and to cross-clustering; and 3) randomized approximation (RBHCC), based on a truncated hierarchy. An extension of BHCC, Bayesian Congruence Measuring (BCM), is proposed to measure incongruence between genes and to identify sets of congruent loci with identical evolutionary histories. We adapt our BHCC algorithm to the inference of BCM, where the intended structure of each view (congruent loci) represents consistent evolutionary processes. We consider an application of FIRM to categorizing mRNA and microRNA. The model uses latent structures to encode the expression pattern and the gene ontology annotations. We also apply FIRM to recover the categories of ligands and proteins, and to predict unknown drug-target interactions, where latent categorization structure encodes drug-target interaction, chemical compound similarity, and amino acid sequence similarity. BHCC and RBHCC are shown to have improved predictive performance (both in terms of cluster membership and missing value prediction) compared to traditional clustering methods. Our results suggest that these novel approaches to integrating multi-relational information have a promising future in the biological sciences where incorporating data related to varying features is often regarded as a daunting task

    Targeting protein kinases to manage or prevent Alzheimer’s disease

    Due to the pressing need for new disease-modifying drugs for Alzheimer’s disease (AD), new treatment strategies and alternative drug targets are currently being heavily researched. One such strategy is to modulate protein kinases such as cyclin-dependent kinase 1 (CDK1), cyclin-dependent kinase 5 (CDK5), glycogen synthase kinase-3 (GSK-3α and GSK-3β), and the protein kinase RNA-like endoplasmic reticulum kinase (PERK). AD intervention by reduction of amyloid beta (Aβ) levels is also possible through development of protein kinase C-epsilon (PKC-ϵ) activators to recover α-secretase levels and decrease toxic Aβ levels, thereby restoring synaptogenesis and cognitive function. In this way, we aim to develop new AD drugs by targeting kinases that participate in AD pathophysiology. In our studies, comparative modeling was performed to construct 3D models for kinases whose crystal structures have not yet been identified. The information from structurally similar proteins was used to define the amino acid residues in the ATP binding site as well as other important sites and motifs. We searched for the common structural motifs and domains of GSK-3β, CDK5 and PERK. Further, we identified the conserved water molecules in GSK-3β, CDK5 and PERK through calculation of the degree of water conservation. We investigated the protein-ligand interaction profiles of CDK1, CDK5, GSK-3α, GSK-3β and PERK based on molecular dynamics (MD) simulations, which provided a time-dependent demonstration of the interactions and contacts for each ligand. In addition, we explored the protein-protein interactions between CDK5 and p25. Small molecules which target this interaction may offer a prospective therapeutic benefit for AD. In order to identify new modulators for protein kinase targets in AD, we implemented three virtual screening protocols. The first protocol was a combined ligand- and protein structure-based approach to find new PERK inhibitors. In the second protocol, protein structure-based virtual screening was applied to find multiple-kinase inhibitors through parallel docking simulations into validated models of CDK1, CDK5 and GSK-3 kinases. In the third protocol, we searched for potential activators of PKC-ϵ based on the structure of its C1B domain

    A NOVEL COMPUTATIONAL FRAMEWORK FOR TRANSCRIPTOME ANALYSIS WITH RNA-SEQ DATA

    Advances in high-throughput sequencing technologies and their application to mRNA transcriptome sequencing (RNA-seq) have enabled comprehensive and unbiased profiling of the landscape of transcription in a cell. To address current limitations in the accuracy and scalability of transcriptome analysis, a novel computational framework has been developed for large-scale RNA-seq datasets with no dependence on transcript annotations. Directly from raw reads, a probabilistic approach is first applied to infer the best transcript fragment alignments from paired-end reads. Empowered by the identification of alternative splicing modules, this framework then performs precise and efficient differential analysis at automatically detected alternative splicing variants, which circumvents the need for full transcript reconstruction and quantification. Beyond the scope of classical group-wise analysis, a clustering scheme is further described for mining prominent consistency among samples in transcription, breaking the restriction of presumed grouping. The performance of the framework has been demonstrated by a series of simulation studies and real datasets, including The Cancer Genome Atlas (TCGA) breast cancer analysis. These successful applications suggest an unprecedented opportunity to use differential transcription analysis to reveal variations in the mRNA transcriptome in response to cellular differentiation or effects of diseases