ASPIC: a novel method to predict the exon-intron structure of a gene that is optimally compatible to a set of transcript sequences
BACKGROUND: Currently available methods to predict splice sites are mainly based on the independent and progressive alignment of transcript data (mostly ESTs) to the genomic sequence. Apart from often being computationally expensive, this approach is vulnerable to several problems, hence the need to develop novel strategies. RESULTS: We propose a method for the detection of splice sites based on a novel multiple genome-EST alignment algorithm. To avoid the limitations of splice-site prediction (mainly over-prediction) that arise from aligning single ESTs independently to the genomic sequence, our approach performs a multiple alignment of transcript data to the genomic sequence based on the combined analysis of all available data. We recast the problem of predicting constitutive and alternative splicing as an optimization problem, in which the optimal multiple transcript alignment minimizes the number of exons and hence of splice-site observations. We have implemented a splice-site predictor based on this algorithm in the software tool ASPIC (Alternative Splicing PredICtion). It differs from other methods based on BLAST-like tools in that it incorporates entirely new ad hoc procedures for accurate and computationally efficient transcript alignment, and it uses dynamic programming to refine intron boundaries. ASPIC also provides the minimal set of non-mergeable transcript isoforms compatible with the detected splicing events. The ASPIC web resource is dynamically interconnected with the Ensembl and Unigene databases and also implements an upload facility. CONCLUSION: Extensive benchmarking shows that ASPIC outperforms other existing methods in the detection of novel splicing isoforms and in the minimization of over-predictions. ASPIC also requires less computation time to process a single gene and an EST cluster. The ASPIC web resource is available at
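The optimization objective stated above (choose the multiple transcript alignment that minimizes the number of exons) can be illustrated on a toy input. The following is a minimal sketch, not ASPIC's algorithm: it assumes each EST already comes with a short list of candidate alignments (each a list of hypothetical genomic (start, end) exon intervals) and simply searches all combinations exhaustively, which is only feasible for very small inputs. The function name `min_exon_alignment` is hypothetical.

```python
from itertools import product

# An exon is a (start, end) interval on the genomic sequence (hypothetical coordinates).
Exon = tuple[int, int]


def min_exon_alignment(
    candidates: list[list[list[Exon]]],
) -> tuple[list[list[Exon]], int]:
    """Pick one candidate alignment per EST so that the set of distinct exons is smallest.

    candidates[i] lists the alternative alignments of EST i; each alignment is a
    list of exons. Exhaustive search over all combinations: toy inputs only.
    """
    best_choice, best_count = None, None
    for choice in product(*candidates):
        # Exons shared by several ESTs are counted once.
        distinct = {exon for alignment in choice for exon in alignment}
        if best_count is None or len(distinct) < best_count:
            best_choice, best_count = list(choice), len(distinct)
    return best_choice, best_count


# EST A has a single alignment; ESTs B and C each have an alternative alignment
# whose junction is shifted by 4 nt (e.g. an ambiguous repeat at the boundary).
A = [[(100, 200), (300, 400)]]
B = [[(100, 200), (300, 400)], [(100, 196), (296, 400)]]
C = [[(300, 400), (500, 600)], [(296, 400), (500, 600)]]
choice, n_exons = min_exon_alignment([A, B, C])
print(n_exons, choice)
```

Choosing the boundaries shared with EST A yields three distinct exons, whereas the shifted alternatives would introduce extra exons and hence extra, probably spurious, splice sites, which is the kind of over-prediction the abstract describes.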
An approach to improved microbial eukaryotic genome annotation
New DNA sequencing technologies have accelerated the rate at which genomic data are generated.
Once these new genomic sequences have been sequenced and assembled, however, their accurate
structural annotation remains an ongoing challenge. Sequencing and assembling the
transcriptome (RNA-Seq) of the same organism can improve the accuracy of genome annotation,
because RNA-Seq reads and assembled transcripts provide precise information on gene structure.
Several current bioinformatics pipelines incorporate information from RNA-Seq, as well as protein
sequence similarity data, to automate the structural annotation of a genome so that its quality
approaches that of expert annotation. The pipelines generally follow a similar workflow. First,
repetitive regions are identified in order to avoid biasing sequence alignments and gene
predictions. Second, a database is built containing experimental data, such as alignments of
sequence reads, transcripts and proteins, which informs gene predictions based on generalized
Hidden Markov Models. The final step is to consolidate sequence alignments and gene predictions
into a high-quality consensus. Yet existing pipelines are complex and therefore susceptible to
biases and errors, which can poison gene predictions and the construction of consensus models.
We have developed an improved approach for the annotation of microbial eukaryotic genomes. Our
approach has two main aspects. The first focuses on creating the most complete and diverse set
of extrinsic evidence possible in order to better inform gene predictions. The second concerns
the construction of consensus gene models using the extrinsic evidence and the HMM predictions,
such that the influence of their potential biases is reduced. A comparison of our new tool with
three popular pipelines shows significant gains in the sensitivity and specificity of gene,
transcript, exon and intron models in the structural annotation of microbial eukaryotic genomes.

New sequencing technologies have considerably accelerated the rate at which genomic data is
being generated. One ongoing challenge is the accurate structural annotation of those novel
genomes once sequenced and assembled, in particular if the organism does not have close
relatives with well-annotated genomes. Whole-transcriptome sequencing (RNA-Seq) and
assembly, which are analogous to whole-genome sequencing and assembly, respectively, have been
shown to dramatically increase the accuracy of gene annotation. Read
coverage, inferred splice junctions and assembled transcripts can provide valuable information
about gene structure. Several annotation pipelines have been developed to automate structural
annotation by incorporating information from RNA-Seq, as well as protein sequence similarity
data, with the goal of reaching the accuracy of an expert curator. Annotation pipelines follow a
similar workflow. The first step is to identify repetitive regions to prevent misinformed sequence
alignments and gene predictions. The next step is to construct a database of evidence from
experimental data such as RNA-Seq mapping and assembly, and protein sequence alignments,
which are used to inform the generalised Hidden Markov Models of gene prediction software.
The final step is to consolidate sequence alignments and gene predictions into a high-confidence
consensus set. These multi-step pipelines are complex and therefore susceptible to incomplete
and erroneous use of information, which can poison gene predictions and consensus model
building. Here, we present an improved approach to microbial eukaryotic genome annotation.
Its conception was based on identifying and mitigating potential sources of error and bias that
are present in available pipelines. Our approach has two main aspects. The first is to create a
more complete and diverse set of extrinsic evidence to better inform gene predictions. The
second is to use extrinsic evidence in tandem with predictions such that the influence of their
respective biases in the consensus gene models is reduced. We benchmarked our new tool
against three known pipelines, showing significant gains in gene, transcript, exon and intron
sensitivity and specificity in the genome annotation of microbial eukaryotes.
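The abstract reports gene-, transcript-, exon- and intron-level sensitivity and specificity without defining them. A minimal sketch of the intron-level computation follows, assuming the convention common in gene-finding benchmarks, where sensitivity = TP / (TP + FN) and specificity = TP / (TP + FP) (i.e. what other fields call precision). Each intron is represented as a hypothetical (sequence id, start, end) tuple and counts as a true positive only on an exact match; the function name `sn_sp` is also hypothetical.

```python
# Hypothetical representation: (sequence id, start, end); exact-match comparison.
Intron = tuple[str, int, int]


def sn_sp(predicted: set[Intron], reference: set[Intron]) -> tuple[float, float]:
    """Feature-level sensitivity and specificity, gene-finding convention."""
    tp = len(predicted & reference)                         # correctly predicted introns
    sn = tp / len(reference) if reference else 0.0          # TP / (TP + FN)
    sp = tp / len(predicted) if predicted else 0.0          # TP / (TP + FP)
    return sn, sp


reference = {("chr1", 100, 200), ("chr1", 300, 400), ("chr1", 500, 650), ("chr2", 50, 120)}
predicted = {("chr1", 100, 200), ("chr1", 300, 400), ("chr1", 505, 650)}
sn, sp = sn_sp(predicted, reference)
print(f"intron Sn={sn:.3f} Sp={sp:.3f}")
```

The same function applies unchanged to exons or whole transcripts once they are encoded as hashable tuples (for a transcript, e.g. its sequence id plus the tuple of its intron chain); an intron whose donor is off by a few bases, like the third prediction here, counts as both a false positive and a false negative.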
Unusual Intron Conservation near Tissue-Regulated Exons Found by Splicing Microarrays
Alternative splicing contributes to both gene regulation and protein diversity. To discover broad relationships between regulation of alternative splicing and sequence conservation, we applied a systems approach, using oligonucleotide microarrays designed to capture splicing information across the mouse genome. In a set of 22 adult tissues, we observe differential expression of RNA containing at least two alternative splice junctions for about 40% of the 6,216 alternative events we could detect. Statistical comparisons identify 171 cassette exons whose inclusion or skipping is different in brain relative to other tissues and another 28 exons whose splicing is different in muscle. A subset of these exons is associated with unusual blocks of intron sequence whose conservation in vertebrates rivals that of protein-coding exons. By focusing on sets of exons with similar regulatory patterns, we have identified new sequence motifs implicated in brain and muscle splicing regulation. Of note is a motif that is strikingly similar to the branchpoint consensus but is located downstream of the 5′ splice site of exons included in muscle. Analysis of three paralogous membrane-associated guanylate kinase genes reveals that each contains a paralogous tissue-regulated exon with a similar tissue inclusion pattern. While the intron sequences flanking these exons remain highly conserved among mammalian orthologs, the paralogous flanking intron sequences have diverged considerably, suggesting unusually complex evolution of the regulation of alternative splicing in multigene families.
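To make "branchpoint-like motif downstream of the 5′ splice site" concrete, the sketch below scans the first part of a downstream intron for matches to the textbook mammalian branchpoint consensus YNYURAY (written in DNA letters as YNYTRAY). This is an assumption for illustration: the abstract does not give the exact muscle motif, so the generic consensus stands in for it, and the function name `branchpoint_like_hits` and the 200-nt default window are hypothetical.

```python
import re

# Mammalian branchpoint consensus YNYURAY in DNA letters (U -> T):
# Y = C/T, N = any base, R = A/G. The lookahead lets overlapping hits be reported.
BRANCHPOINT_LIKE = re.compile(r"(?=([CT][ACGT][CT]T[AG]A[CT]))")


def branchpoint_like_hits(downstream_intron: str, window: int = 200) -> list[tuple[int, str]]:
    """Return (offset, match) pairs for branchpoint-like sites lying entirely within
    the first `window` nt of the intron downstream of a 5' splice site.

    Offset 0 is the first intron base (the G of the GT donor dinucleotide).
    """
    region = downstream_intron[:window].upper()
    return [(m.start(), m.group(1)) for m in BRANCHPOINT_LIKE.finditer(region)]


intron = "GTAAGTCCTCTAACGG"  # toy sequence, not from the study
print(branchpoint_like_hits(intron))
```

On the toy sequence this reports a single site, `CTCTAAC`, starting 7 nt into the intron. Comparing hit frequencies between muscle-included exons and a background exon set would be the natural next step, but the statistics used in the study are not described here.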
NOVEL COMPUTATIONAL METHODS FOR TRANSCRIPT RECONSTRUCTION AND QUANTIFICATION USING RNA-SEQ DATA
The advent of RNA-seq technologies provides an unprecedented opportunity to precisely profile the mRNA transcriptome of a specific cell population, helping to reveal the characteristics of the cell under a particular condition, such as a disease. It is now possible to discover mRNA transcripts not cataloged in existing databases, in addition to assessing the identities and quantities of known transcripts in a given sample or cell. However, each sequence read obtained from an RNA-seq experiment is only a short fragment of the original transcript, and how to reconstruct the mRNA transcriptome from short RNA-seq reads remains a challenging problem. We have proposed two methods that directly address this challenge. First, we developed a novel method, MultiSplice, to accurately estimate the abundance of well-annotated transcripts. Second, to detect novel isoforms, we designed Astroid, a max-flow-min-cost algorithm that simultaneously discovers the presence and quantities of all possible transcripts in the transcriptome. We further extend an ab initio pipeline of transcriptome analysis to large-scale datasets, which may contain hundreds of samples. The effectiveness of the proposed methods is supported by a series of simulation studies, and their application to real datasets suggests a promising opportunity for reconstructing the mRNA transcriptome, which is critical for revealing variations among cells (e.g. disease vs. normal).
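The core difficulty in quantifying known transcripts is that a short read is often compatible with several isoforms. A common generic way to resolve this is expectation-maximization (EM): each ambiguous read is split among its compatible transcripts in proportion to the current abundance estimates, and the estimates are then recomputed from those fractional counts. The sketch below shows that generic EM; it is not MultiSplice or Astroid (whose algorithms are not described here), it assumes every transcript has the same effective length and every read is compatible with at least one transcript, and the function name `em_abundance` is hypothetical.

```python
def em_abundance(
    compat: list[set[str]], transcripts: list[str], n_iter: int = 200
) -> dict[str, float]:
    """Estimate the fraction of reads originating from each transcript.

    compat[r] is the (non-empty) set of transcripts that read r is compatible with.
    Simplifying assumption: all transcripts have the same effective length.
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    n_reads = len(compat)
    for _ in range(n_iter):
        counts = {t: 0.0 for t in transcripts}
        # E-step: split each read among its compatible transcripts
        # in proportion to the current abundance estimates.
        for read_ts in compat:
            total = sum(theta[t] for t in read_ts)
            for t in read_ts:
                counts[t] += theta[t] / total
        # M-step: new abundances are the normalized expected read counts.
        theta = {t: c / n_reads for t, c in counts.items()}
    return theta


# 6 reads unique to T1, 2 unique to T2, 4 shared by both isoforms.
reads = [{"T1"}] * 6 + [{"T2"}] * 2 + [{"T1", "T2"}] * 4
theta = em_abundance(reads, ["T1", "T2"])
print({t: round(v, 4) for t, v in theta.items()})
```

Here the shared reads end up split 3:1 in favour of T1, matching the ratio implied by the unique reads, so the estimates converge to 0.75 and 0.25. Real quantifiers additionally model transcript length, fragment-length distributions and sequencing biases.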
Bioinformatics
This book is divided into different research areas relevant to Bioinformatics, such as biological networks, next-generation sequencing, high-performance computing, molecular modeling, structural bioinformatics and intelligent data analysis. Each book section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here.
Ultrasensitive detection of Toxocara canis excretory-secretory antigens by a nanobody electrochemical magnetosensor assay.
Peer reviewed
Human toxocariasis (HT) is a zoonotic disease caused by the migration
of the larval stage of the roundworm Toxocara canis in the human host.
Despite HT being the most cosmopolitan helminthiasis worldwide, its
diagnosis remains elusive. Currently, the detection of specific IgG
immunoglobulins against Toxocara excretory-secretory antigens (TES),
combined with clinical and epidemiological criteria, is the only strategy
to diagnose HT. Cross-reactivity with other parasites and the inability
to distinguish between past and active infections are the main limitations
of this approach. Here, we present a novel, sensitive and specific strategy
to detect and quantify TES, aiming to identify active cases of HT. High
specificity is achieved by using nanobodies (Nbs), recombinant single
variable domain antibodies obtained from camelids, which, owing to
their small molecular size (15 kDa), can recognize hidden epitopes not
accessible to conventional antibodies. High sensitivity is attained by
designing an electrochemical magnetosensor with an amperometric readout,
in which all components of the assay are mixed in a single step. This
strategy achieved 10-fold higher sensitivity than a conventional sandwich
ELISA. The assay reached limits of detection of 2 and 15 pg/ml in
TES-spiked PBST20 0.05% and TES-spiked serum, respectively. These
limits of detection are sufficient to detect clinically relevant toxocaral
infections. Furthermore, our nanobodies showed no cross-reactivity
with antigens from Ascaris lumbricoides or Ascaris suum. To our
knowledge, this is the most sensitive method to detect and quantify TES
so far, and it has great potential to significantly improve the diagnosis
of HT. Moreover, the characteristics of our electrochemical assay are
promising for the development of point-of-care diagnostic systems using
nanobodies as a versatile and innovative alternative to antibodies. The
next step will be the validation of the assay in clinical and
epidemiological contexts.