319 research outputs found

    Genomics and proteomics: a signal processor's tour

    The theory and methods of signal processing are becoming increasingly important in molecular biology. Digital filtering techniques, transform domain methods, and Markov models have played important roles in gene identification, biological sequence analysis, and alignment. This paper contains a brief review of molecular biology, followed by a review of the applications of signal processing theory. This includes the problem of gene finding using digital filtering, and the use of transform domain methods in the study of protein binding spots. The relatively new topic of noncoding genes, and the associated problem of identifying ncRNA buried in DNA sequences, are also described. This includes a discussion of hidden Markov models and context-free grammars. Several new directions in genomic signal processing are briefly outlined at the end.

    Evolutionary Modeling and Prediction of Non-Coding RNAs in Drosophila

    We performed benchmarks of phylogenetic grammar-based ncRNA gene prediction, experimenting with eight different models of structural evolution and two different programs for genome alignment. We evaluated our models using alignments of twelve Drosophila genomes. We find that ncRNA prediction performance can vary greatly between different gene predictors and subfamilies of ncRNA gene. Our estimates for false positive rates are based on simulations which preserve local islands of conservation; using these simulations, we predict a higher rate of false positives than previous computational ncRNA screens have reported. Using one of the tested prediction grammars, we provide an updated set of ncRNA predictions for D. melanogaster and compare them to previously-published predictions and experimental data. Many of our predictions show correlations with protein-coding genes. We found significant depletion of intergenic predictions near the 3′ end of coding regions and furthermore depletion of predictions in the first intron of protein-coding genes. Some of our predictions are colocated with larger putative unannotated genes: for example, 17 of our predictions showing homology to the RFAM family snoR28 appear in a tandem array on the X chromosome; the 4.5 Kbp spanned by the predicted tandem array is contained within a FlyBase-annotated cDNA

    The long noncoding RNA neuroLNC regulates presynaptic activity by interacting with the neurodegeneration-associated protein TDP-43

    The cellular and the molecular mechanisms by which long noncoding RNAs (lncRNAs) may regulate presynaptic function and neuronal activity are largely unexplored. Here, we established an integrated screening strategy to discover lncRNAs implicated in neurotransmitter and synaptic vesicle release. With this approach, we identified neuroLNC, a neuron-specific nuclear lncRNA conserved from rodents to humans. NeuroLNC is tuned by synaptic activity and influences several other essential aspects of neuronal development including calcium influx, neuritogenesis, and neuronal migration in vivo. We defined the molecular interactors of neuroLNC in detail using chromatin isolation by RNA purification, RNA interactome analysis, and protein mass spectrometry. We found that the effects of neuroLNC on synaptic vesicle release require interaction with the RNA-binding protein TDP-43 (TAR DNA binding protein-43) and the selective stabilization of mRNAs encoding for presynaptic proteins. These results provide the first proof of an lncRNA that orchestrates neuronal excitability by influencing presynaptic function

    Big data analytics in computational biology and bioinformatics

    Big data analytics in computational biology and bioinformatics refers to an array of operations including biological pattern discovery, classification, prediction, inference, clustering as well as data mining in the cloud, among others. This dissertation addresses big data analytics by investigating two important operations, namely pattern discovery and network inference. The dissertation starts by focusing on biological pattern discovery at a genomic scale. Research reveals that the secondary structure in non-coding RNA (ncRNA) is more conserved during evolution than its primary nucleotide sequence. Using a covariance model approach, the stems and loops of an ncRNA secondary structure are represented as a statistical image against which an entire genome can be efficiently scanned for matching patterns. The covariance model approach is then further extended, in combination with a structural clustering algorithm and a random forests classifier, to perform genome-wide search for similarities in ncRNA tertiary structures. The dissertation then presents methods for gene network inference. Vast bodies of genomic data containing gene and protein expression patterns are now available for analysis. One challenge is to apply efficient methodologies to uncover more knowledge about the cellular functions. Very little is known concerning how genes regulate cellular activities. A gene regulatory network (GRN) can be represented by a directed graph in which each node is a gene and each edge or link is a regulatory effect that one gene has on another gene. By evaluating gene expression patterns, researchers perform in silico data analyses in systems biology, in particular GRN inference, where the “reverse engineering” is involved in predicting how a system works by looking at the system output alone. Many algorithmic and statistical approaches have been developed to computationally reverse engineer biological systems. 
However, there are no known bioinformatics tools capable of performing perfect GRN inference. Here, extensive experiments are conducted to evaluate and compare recent bioinformatics tools for inferring GRNs from time-series gene expression data. Standard performance metrics for these tools based on both simulated and real data sets are generally low, suggesting that further efforts are needed to develop more reliable GRN inference tools. It is also observed that using multiple tools together can help identify true regulatory interactions between genes, a finding consistent with those reported in the literature. Finally, the dissertation discusses and presents a framework for parallelizing GRN inference methods using Apache Hadoop in a cloud environment.

    Analysis of Machine Learning Based Methods for Identifying MicroRNA Precursors

    MicroRNAs are a type of non-coding RNA that were discovered less than a decade ago but are now known to be incredibly important in regulating gene expression despite their small size. However, due to their small size, and several other limiting factors, experimental procedures have had limited success in discovering new microRNAs. Computational methods are therefore vital to discovering novel microRNAs. Many different approaches have been used to scan genomic sequences for novel microRNAs with varying degrees of success. This work provides an overview of these computational methods, focusing particularly on those methods based on machine learning techniques. The results of experiments performed on several of the machine learning based microRNA detectors are provided along with an analysis of their performance

    In silico prediction of active RNA genes in legumes

    Accumulating evidence suggests that non-coding RNAs (ncRNAs) play key roles in gene regulation and may form the basis of an inter-gene communication system. MicroRNAs are a class of small non-coding RNAs found in both plants and animals that regulate the expression of other genes. Identification and analysis of microRNAs enhances our understanding of the important roles that microRNAs play in this complex regulatory network. The work presented in this thesis constitutes the first large-scale prediction and characterization of both ncRNAs and miRNAs in the model legumes Medicago truncatula and Lotus japonicus, and provides a basis for further research on elucidating ncRNA function in legume genomics.

    DESIGNING SECONDARY STRUCTURE PROFILES FOR FAST NCRNA IDENTIFICATION


    Computational studies of RNA modification-dependent RNA binding protein networks

    The covalent modification of RNA nucleotides is a powerful layer of post-transcriptional control of gene expression across the tree of life. Historically, only abundant modifications on abundant RNAs such as tRNA and rRNA could be studied, due to methodological limitations. In the past decade, leaps forward in biochemistry and high throughput sequencing methods have enabled mapping of RNA modifications across all RNA species. In particular this thesis focuses on the most abundant internal modification of mRNA, N6-methyladenosine (m6A), and how RNA binding proteins (RBPs) interact with RNA modifications to impact RNA life cycle. Alongside these experimental developments have come new computational challenges. Integration of many datasets must be approached carefully, with a view to extract as much biological information as possible. Throughout this work I describe the development of open source computational tools for the analysis and visualisation of CLIP data. A computational pipeline based on hierarchical pre-mapping steps enables accurate quantification of non-coding RNAs from individual nucleotide resolution crosslinking and immunoprecipitation (iCLIP) datasets. Using the pipeline I describe novel tRNA binding for the DEAH-box helicase DDX3X and identify widespread binding of NSun2 and Trmt2A to pre-tRNAs. In collaboration with the lab of Prof. Folkert van Werven, I integrate m6A miCLIP with m6A-reader protein iCLIP data, alongside functional datasets in WT and methyltransferase deletion conditions in order to uncover the role of m6A in early budding yeast meiosis. Surprisingly, we find that the sole yeast m6A-binding protein, Pho92p, binds in both an m6A-dependent and an m6A-independent manner. m6A-dependent Pho92p binding partners are implicated in mRNA decay coupled to translation. Taken together I present powerful computational tools that will be of use to the wider community, alongside the interesting biological insights they have already enabled

    Discovering cancer-associated transcripts by RNA sequencing

    High-throughput sequencing of poly-adenylated RNA (RNA-Seq) in human cancers shows remarkable potential to identify uncharacterized aspects of tumor biology, including gene fusions with therapeutic significance and disease markers such as long non-coding RNA (lncRNA) species. However, the analysis of RNA-Seq data places unprecedented demands upon computational infrastructures and algorithms, requiring novel bioinformatics approaches. To meet these demands, we present two new open-source software packages - ChimeraScan and AssemblyLine - designed to detect gene fusion events and novel lncRNAs, respectively. RNA-Seq studies utilizing ChimeraScan led to discoveries of new families of recurrent gene fusions in breast cancers and solitary fibrous tumors. Further, ChimeraScan was one of the key components of the repertoire of computational tools utilized in data analysis for MI-ONCOSEQ, a clinical sequencing initiative to identify potentially informative and actionable mutations in cancer patients’ tumors. AssemblyLine, by contrast, reassembles RNA sequencing data into full-length transcripts ab initio. In head-to-head analyses AssemblyLine compared favorably to existing ab initio approaches and unveiled abundant novel lncRNAs, including antisense and intronic lncRNAs disregarded by previous studies. Moreover, we used AssemblyLine to define the prostate cancer transcriptome from a large patient cohort and discovered myriad lncRNAs, including 121 prostate cancer-associated transcripts (PCATs) that could potentially serve as novel disease markers. Functional studies of two PCATs - PCAT-1 and SChLAP1 - revealed cancer-promoting roles for these lncRNAs. PCAT1, a lncRNA expressed from chromosome 8q24, promotes cell proliferation and represses the tumor suppressor BRCA2. SChLAP1, located in a chromosome 2q31 ‘gene desert’, independently predicts poor patient outcomes, including metastasis and cancer-specific mortality. 
Mechanistically, SChLAP1 antagonizes the genome-wide localization and regulatory functions of the SWI/SNF chromatin-modifying complex. Collectively, this work demonstrates the utility of ChimeraScan and AssemblyLine as open-source bioinformatics tools. Our applications of ChimeraScan and AssemblyLine led to the discovery of new classes of recurrent and clinically informative gene fusions, and established a prominent role for lncRNAs in coordinating aggressive prostate cancer, respectively. We expect that the methods and findings described herein will establish a precedent for RNA-Seq-based studies in cancer biology and assist the research community at large in making similar discoveries. PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120814/1/mkiyer_1.pd

    Nanosafety: an evolving concept to bring the safest possible nanomaterials to society and environment

    The use of nanomaterials has been increasing in recent times, and they are widely used in industries such as cosmetics, drugs, food, water treatment, and agriculture. The rapid development of new nanomaterials demands a set of approaches to evaluate the potential toxicity and risks related to them. In this regard, nanosafety has been using and adapting already existing methods (toxicological approach), but the unique characteristics of nanomaterials demand new approaches (nanotoxicology) to fully understand the potential toxicity, immunotoxicity, and (epi)genotoxicity. In addition, new technologies, such as organs-on-chips and sophisticated sensors, are under development and/or adaptation. All the information generated is used to develop new in silico approaches trying to predict the potential effects of newly developed materials. The overall evaluation of nanomaterials from their production to their final disposal chain is completed using the life cycle assessment (LCA), which is becoming an important element of nanosafety considering sustainability and environmental impact. In this review, we give an overview of all these elements of nanosafety. Funding: European Union’s H2020 project Sinfonia (N.857253). SbDToolBox, with reference NORTE-01-0145-FEDER-000047, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fun