
138 research outputs found

A computational study of off-target effects of RNA interference

Author: Adema Coen M.
Lane Terran
Qiu Shibin
Publication venue: Oxford University Press
Publication date: 01/01/2005

RNA interference (RNAi) is an intracellular mechanism for post-transcriptional gene silencing that is frequently used to study gene function. RNAi is initiated by short interfering RNA (siRNA) of ∼21 nt in length, either generated from double-stranded RNA (dsRNA) by the enzyme Dicer or introduced experimentally. Following association with an RNAi silencing complex, siRNA targets mRNA transcripts that have sequence identity for destruction. A phenotype resulting from this knockdown of expression may inform about the function of the targeted gene. However, ‘off-target effects’ compromise the specificity of RNAi if sequence identity between siRNA and random mRNA transcripts causes RNAi to knock down expression of non-targeted genes. The complete off-target effects must be investigated systematically on each gene in a genome by adjusting a group of parameters, which is too expensive to conduct experimentally and motivates a study in silico. This computational study examined the potential for off-target effects of RNAi, employing the genome and transcriptome sequence data of Homo sapiens, Caenorhabditis elegans and Schizosaccharomyces pombe. The chance for RNAi off-target effects proved considerable, ranging from 5 to 80% for each of the organisms, when using as parameter the exact identity between any possible siRNA sequences (arbitrary length ranging from 17 to 28 nt) derived from a dsRNA (range 100–400 nt) representing the coding sequences of target genes and all other siRNAs within the genome. Remarkably, high sequence specificity and low probability for off-target reactivity were optimally balanced for siRNA of 21 nt, the length observed mostly in vivo. The chance for off-target RNAi increased (although not always significantly) with greater length of the initial dsRNA sequence, inclusion into the analysis of available untranslated region sequences and allowing for mismatches between siRNA and target sequences.
siRNA sequences from within 100 nt of the 5′ termini of coding sequences had low chances for off-target reactivity. This may be owing to coding constraints for signal peptide-encoding regions of genes relative to regions that encode mature proteins. Off-target distribution varied along the chromosomes of C. elegans, apparently owing to the use of more unique sequences in gene-dense regions. Finally, biological and thermodynamical descriptors of effective siRNA reduced the number of potential siRNAs compared with those identified by sequence identity alone, but off-target RNAi remained likely, with an off-target error rate of ∼10%. These results also suggest a direction for future in vivo studies that could help both in calibrating true off-target rates in living organisms and in contributing evidence toward the debate of whether siRNA efficacy is correlated with, or independent of, the target molecule. In summary, off-target effects present a real but not prohibitive concern that should be considered for RNAi experiments.
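The exact-identity criterion described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the dictionary of transcripts and the toy sequences are hypothetical, and the sketch ignores strand orientation, mismatches and untranslated regions, all of which the study treats as separate parameters.

```python
def kmers(seq, k):
    """Yield every length-k window of seq (candidate siRNA sequences)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def off_target_fraction(target_id, dsrna, transcripts, k=21):
    """Fraction of k-nt windows of dsrna found exactly in a non-target transcript.

    transcripts maps a (hypothetical) gene identifier to its sequence.
    """
    # Collect every k-mer that occurs in any transcript other than the target.
    others = set()
    for gene_id, seq in transcripts.items():
        if gene_id != target_id:
            others.update(kmers(seq, k))

    windows = list(kmers(dsrna, k))
    if not windows:
        return 0.0
    hits = sum(1 for w in windows if w in others)
    return hits / len(windows)


if __name__ == "__main__":
    # Toy sequences with a short k so that the overlap is visible.
    toy = {"A": "ACGTACGTAA", "B": "TTACGTACCC"}
    print(off_target_fraction("A", toy["A"], toy, k=5))
```

Varying `k` between 17 and 28 and the dsRNA length between 100 and 400 nt would mirror the parameter ranges named in the abstract, although real transcriptome data would need a more memory-efficient index than a Python set.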

CiteSeerX

Crossref

PubMed Central

An efficient algorithm for systematic analysis of nucleotide strings suitable for siRNA design

Author: A Apostolico
A Verdel
AC Hsieh
AL Jackson
AM Chalk
Ancha Baranova
CF Hung
E Ukkonen
EM McCreight
F Fernandes
F Tilesi
Ganiraju Manyam
IT Li
J Na
Jonathan Bode
K Ui-Tei
M Scherr
Maria Emelianenko
MH Schulz
P Saetrom
P Svoboda
P Weiner
PB Hajeri
PC Scacheri
R Giegerich
SA Manavski
T Alsheddi
W Cui
X Dai
Y Naito
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The "off-target" silencing effect hinders the development of siRNA-based therapeutic and research applications. Existing solutions for finding possible locations of siRNA seats within a large database of genes are either too slow, miss a portion of the targets, or are simply not designed to handle a very large number of queries. We propose a new approach that reduces the computational time as compared to existing techniques. Findings: The proposed method employs tree-based storage in the form of a modified truncated suffix tree to sort all possible short substrings within a given set of strings (i.e. the transcriptome). Using the new algorithm, we pre-computed a list of the best siRNA locations within each human gene ("siRNA seats"). siRNAs designed to reside within siRNA seats are less likely to hybridize off-target. These siRNA seats could be used as an input for the traditional "set-of-rules" type of siRNA design software. The list of siRNA seats is available through a public database located at http://web.cos.gmu.edu/~gmanyam/siRNA_db/search.php. Conclusions: In an attempt to perform top-down prediction of human siRNA with minimized off-target hybridization, we developed an efficient algorithm that employs suffix-tree-based storage of the substrings. Applications of this approach are not limited to optimal siRNA design, but can also be useful for other tasks involving selection of characteristic strings specific to individual genes. These strings could then be used as siRNA seats, as specific probes for gene expression studies by oligonucleotide-based microarrays, for the design of molecular beacon probes for real-time PCR and, generally, for any type of PCR primers.
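The idea of selecting strings that are specific to individual genes can be sketched in a few lines of Python. This is an illustration only: a plain dictionary of fixed-length substrings stands in for the modified truncated suffix tree described in the abstract, and the function name, gene identifiers and sequences are hypothetical.

```python
from collections import defaultdict


def gene_specific_kmers(genes, k):
    """Return, for each gene, the sorted k-mers that occur in that gene only.

    genes maps a (hypothetical) gene identifier to its sequence.
    """
    # Record which genes contain each k-mer.
    owners = defaultdict(set)
    for gene_id, seq in genes.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(gene_id)

    # Keep only k-mers owned by exactly one gene.
    specific = {gene_id: [] for gene_id in genes}
    for kmer, ids in owners.items():
        if len(ids) == 1:
            specific[next(iter(ids))].append(kmer)
    return {gene_id: sorted(found) for gene_id, found in specific.items()}


if __name__ == "__main__":
    toy = {"g1": "AAACCC", "g2": "CCCGGG"}
    for gene_id, found in gene_specific_kmers(toy, k=3).items():
        print(gene_id, found)
```

A hash-based index like this is easy to read but stores every substring separately. A truncated suffix tree shares common prefixes between substrings, which is presumably part of why the authors chose it for transcriptome-scale input.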

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Predicting the Fission Yeast Protein Interaction Network

Author: Beyer Andreas
Bähler Jürg
Gould Kathleen
McLean Janel R.
Pancaldi Vera
Převorovský Martin
Rallis C.
Saraç Ömer S.
Publication venue: Genetics Society of America
Publication date: 01/01/2012

A systems-level understanding of biological processes and information flow requires the mapping of cellular component interactions, among which protein–protein interactions are particularly important. Fission yeast (Schizosaccharomyces pombe) is a valuable model organism for which no systematic protein-interaction data are available. We exploited gene and protein properties, global genome regulation datasets, and conservation of interactions between budding and fission yeast to predict fission yeast protein interactions in silico. We have extensively tested our method in three ways: first, by predicting with 70–80% accuracy a selected high-confidence test set; second, by recapitulating interactions between members of the well-characterized SAGA co-activator complex; and third, by verifying predicted interactions of the Cbf11 transcription factor using mass spectrometry of TAP-purified protein complexes. Given the importance of the pathway in cell physiology and human disease, we explore the predicted sub-networks centered on the Tor1/2 kinases. Moreover, we predict the histidine kinases Mak1/2/3 to be vital hubs in the fission yeast stress response network, and we suggest interactors of argonaute 1, the principal component of the siRNA-mediated gene silencing pathway, lost in budding yeast but preserved in S. pombe. Of the new high-quality interactions that were discovered after we started this work, 73% were found in our predictions. Even though any predicted interactome is imperfect, the protein network presented here can provide a valuable basis to explore biological processes and to guide wet-lab experiments in fission yeast and beyond. Our predicted protein interactions are freely available through PInt, an online resource on our website (www.bahlerlab.info/PInt)

UEL Research Repository at University of East London

University of Essex Research Repository

Crossref

PubMed Central

UCL Discovery

Prediction of Transposons in DNA

Author: Černohub Jan
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2014

The aim of this thesis is to become familiar with how information is stored in DNA and to survey transposons and the bioinformatics tools and algorithms used to detect them in sequenced genomes, thereby providing a brief introduction to this extensive topic and placing it in the context of ongoing research in the field. Based on an overview of existing algorithms and tools for transposon detection, a tool for finding so-called LTR transposons is designed and implemented.
The paper offers a brief introduction to DNA with a focus on transposable elements, also known as transposons, and how they relate to ongoing research in biology, seen mainly from the bioinformatics point of view. The goal is to survey past and current tools and algorithms that were developed for transposon detection in sequenced genomes. Based on the surveyed designs, a tool targeting long terminal repeat (LTR) transposons is proposed and implemented.

Digital library of Brno University of Technology

National Repository of Grey Literature

Parallelization of dynamic programming recurrences in computational biology

Author: Jacob Arpith
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2010

The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years the DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays (FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15 and 130 times faster than a modern dual-core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 3 GHz Intel Core 2 Duo processors. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors.
Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms.
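For readers unfamiliar with the Nussinov recurrence named in this abstract, a minimal sequential Python sketch of its textbook form follows. It is not the thesis's FPGA design: the pairing rules (Watson-Crick plus G-U wobble), the `min_loop` parameter and the function name are assumptions of this sketch.

```python
def nussinov(seq, min_loop=0):
    """Maximum number of nested base pairs in an RNA sequence.

    dp[i][j] holds the best pair count for the subsequence seq[i..j].
    A pair (i, j) is allowed only if j - i > min_loop.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]

    # Fill by increasing span; cells of equal span do not depend on each other.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            if (seq[i], seq[j]) in pairs and j - i > min_loop:
                best = max(best, dp[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):  # split into two independent parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov("GGGAAACCC", min_loop=3))
```

Because every cell with the same span depends only on cells with shorter spans, all cells on one anti-diagonal of the table can be computed concurrently. This is the kind of fine-grained parallelism that array-based accelerators can exploit.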

Washington University St. Louis: Open Scholarship

Bioinformatics

Author
Publication venue: IntechOpen
Publication date: 20/04/2021

This book is divided into research areas relevant to bioinformatics, such as biological networks, next-generation sequencing, high-performance computing, molecular modeling, structural bioinformatics and intelligent data analysis. Each section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research presented here.

Directory of Open Access Books (DOAB)


Virus discovery using current and novel methods

Author: Mühlemann Barbara Franziska
Publication venue: University of Cambridge
Publication date: 06/04/2020

Next Generation Sequencing (NGS) technology allows researchers to sequence genetic material from a wide range of sources, including patient and environmental samples, and ancient remains. The recovery of viruses from such datasets can provide insights into the diversity and evolution of both novel and already known viruses. This thesis focuses on two aspects of virus discovery in NGS datasets. In the first part of this thesis, I present ancient viral sequences from hepatitis B virus, human parvovirus B19, and variola virus. The sequences were recovered from NGS datasets from individuals living in Eurasia between ∼150 and ∼31,630 years ago, using standard sequence matching tools. The data show the past existence of viruses similar to variants circulating today. The sequences reveal a complexity of virus evolution that is not evident when considering modern sequences alone, including revised substitution rates and most recent common ancestor dates, as well as geographic movement and extinction of strains. The identification of viral sequences in NGS datasets relies heavily on sequence-based matching of unknown sequences to a database of known sequences. Comparisons are usually done at the nucleotide or amino acid level. However, those methods only work well on sequences closely related to those already present in the database. With the aim of identifying more diverged viral sequences, in the second part of this thesis, I present an algorithm to compare sequences based on predicted structural features, such as secondary structures and conserved amino acids. The algorithm is modelled after the music-matching algorithm ‘Shazam’. While initial results of the algorithm are somewhat encouraging, problems remain, in particular with the identification of adequate structural features. Identifying highly diverged viral sequences is thus still a challenging problem, hopefully to be solved in the future.

Apollo (Cambridge)

Safety quantification in gene editing experiments using machine learning on rationally designed feature spaces

Author: Störtz Florian Michael
Publication venue
Publication date: 16/02/2024

With ongoing development of the CRISPR/Cas programmable nuclease system, applications in the area of in vivo therapeutic gene editing are increasingly within reach. However, non-negligible off-target effects remain a major concern for clinical applications. Even though a multitude of off-target cleavage datasets have been published, a comprehensive, transparent overview tool has not yet been established. The first part of this thesis presents the creation of crisprSQL (http://www.crisprsql.com), a large, diverse, interactive and bioinformatically enhanced collection of CRISPR/Cas9 off-target cleavage studies aimed at enriching the fields of cleavage profiling, gene editing safety analysis and transcriptomics. Having established this data source, we use it to train novel deep learning algorithms and explore feature encodings for off-target prediction, systematically sampling the resulting model space in order to find optimal models and inform future modelling efforts. We lay emphasis on physically informed features which capture the biological environment of the cleavage site, hence terming our approach piCRISPR. We find that our best-performing model highlights the importance of sequence context and chromatin accessibility for cleavage prediction and compares favourably with state-of-the-art prediction performance. We further show that our novel, environmentally sensitive features are crucial to accurate prediction on sequence-identical locus pairs, making them highly relevant for clinical guide design. We then turn our attention to the cell-intrinsic repair mechanisms that follow CRISPR/Cas-induced cleavage and provide a prediction algorithm for the outcome genotype distribution based on thermodynamic features of the DNA repair process. In a pioneering approach, we utilise structural calculations for the generation of these features and show that this novel approach surpasses published outcome prediction algorithms within our testing regime.
Through interpretation of the trained model, we elucidate the thermodynamic factors driving DNA repair and provide a computational tool that allows experts to assess the severity of the genotypic changes predicted for a given edit. Together, these efforts provide a comprehensive, one-stop computational source to assess and improve CRISPR/Cas9 gene editing safety.

Oxford University Research Archive