92 research outputs found

miROrtho: computational survey of microRNA genes

Author: Gerlach Daniel
Kriventseva Evgenia V.
Rahman Nazim
Vejnar Charles E.
Zdobnov Evgeny M.
Publication date: 02/08/2017

MicroRNAs (miRNAs) are short, non-protein coding RNAs that direct the widespread phenomenon of post-transcriptional regulation of metazoan genes. The mature ∼22-nt long RNA molecules are processed from genome-encoded stem-loop structured precursor genes. Hundreds of such genes have been experimentally validated in vertebrate genomes, yet their discovery remains challenging, and substantially higher numbers have been estimated. The miROrtho database (http://cegg.unige.ch/mirortho) presents the results of a comprehensive computational survey of miRNA gene candidates across the majority of sequenced metazoan genomes. We designed and applied a three-tier analysis pipeline: (i) an SVM-based ab initio screen for potent hairpins, plus homologs of known miRNAs, (ii) an orthology delineation procedure and (iii) an SVM-based classifier of the ortholog multiple sequence alignments. The web interface provides direct access to putative miRNA annotations, ortholog multiple alignments, RNA secondary structure conservation, and sequence data. The miROrtho data are conceptually complementary to the miRBase catalog of experimentally verified miRNA sequences, providing a consistent comparative genomics perspective as well as identifying many novel miRNA genes with strong evolutionary support.

RERO DOC Digital Library

OrthoDB: the hierarchical catalog of eukaryotic orthologs

Author: Altschul
Castresana
Chen
Dayhoff
Duret
E. M. Zdobnov
E. V. Kriventseva
Edgar
Fitch
Guindon
Henikoff
Jones
Koonin
Li
Merkeev
Merkeev
N. Rahman
O. Espinosa
Sonnhammer
Tatusov
van der Heijden
Waterhouse
Zdobnov
Zdobnov
Publication venue: Oxford University Press
Publication date: 01/01/2008

The concept of orthology is widely used to relate genes across different species using comparative genomics, and it provides the basis for inferring gene function. Here we present the web accessible OrthoDB database that catalogs groups of orthologous genes in a hierarchical manner, at each radiation of the species phylogeny, from more general groups to more fine-grained delineations between closely related species. We used a COG-like and Inparanoid-like ortholog delineation procedure on the basis of all-against-all Smith-Waterman sequence comparisons to analyze 58 eukaryotic genomes, focusing on vertebrates, insects and fungi to facilitate further comparative studies. The database is freely available at http://cegg.unige.ch/orthod
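The all-against-all Smith-Waterman comparison mentioned in this abstract can be illustrated with a minimal sketch. The scoring values (`match`, `mismatch`, `gap`) and the function names are assumptions chosen for illustration; the abstract does not state the scoring scheme, and real protein comparisons normally use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b (linear gap penalty).

    The scoring parameters are illustrative assumptions, not the values
    used by any particular database.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def all_against_all(seqs):
    """Score every unordered pair of named sequences once."""
    names = sorted(seqs)
    return {
        (p, q): smith_waterman_score(seqs[p], seqs[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from any genome.
    toy = {"geneA": "TTACGTTT", "geneB": "GGACGGG", "geneC": "AAAA"}
    for pair, score in all_against_all(toy).items():
        print(pair, score)
```

Pairwise scores of this kind are the usual input to best-reciprocal-hit style ortholog delineation, which is not shown here.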

Crossref

PubMed Central

Archive ouverte UNIGE

miROrtho: computational survey of microRNA genes

Author: Aravin
Barbarotto
Bartel
Bentwich
Berezikov
Boffelli
Brennecke
C. E. Vejnar
Calin
D. Gerlach
Do
Du
E. M. Zdobnov
E. V. Kriventseva
Edgar
Grad
Griffiths-Jones
Hofacker
Hofacker
Kim
Lai
Lewis
Lim
Miranda
N. Rahman
Nam
Saebo
Sewer
Stark
Weaver
Xie
Xue
Zhang
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2008

CiteSeerX

Crossref

PubMed Central

Archive ouverte UNIGE

Genome-Wide Comparative Gene Family Classification

Author: A Barriere
A Heger
A Jaccard
A Kelil
A Krause
A Krause
A Paccanaro
AJ Enright
AJ Enright
AJ Vilella
C-Y Chen
CF Higgins
CH Wu
Christian Frech
CP Ponting
D Lee
E Bolten
E Jacoby
ER Troemel
ES Lander
EV Kriventseva
EV Kriventseva
EV Kriventseva
F Abascal
F Tekaia
G Yona
H Li
HM Robertson
HM Robertson
HM Robertson
HM Robertson
IV Tetko
J Huerta-Cepas
J Schultz
JA Sheps
JC Venter
JD Thompson
JH Thomas
JH Thomas
JH Thomas
JP Demuth
K Tamura
LD Stein
MO Dayhoff
N Chen
N Hulo
N Kaplan
Nansheng Chen
P Pipenbacher
P Sperisen
PK Wall
Q Ma
RD Finn
Robert DeSalle
S Aftab
S Kim
S Nakanishi
SA Rahman
T Meinel
T Wittkop
TJ Harlow
Y Chen
Y Loewenstein
Z Zhao
Publication venue: Public Library of Science
Publication date: 01/01/2010

Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done with the same high standard. This project has been designed to develop a strategy to effectively and accurately classify gene families across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter setting, i.e. different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized accordingly. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species
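The Markov clustering idea behind TRIBE-MCL, and the parameter the abstract describes as sensitive, can be sketched as follows. This is a simplified pure-Python version of the expansion and inflation loop, not the TRIBE-MCL program itself; the function names, the default `inflation` value, the fixed iteration count, and the membership threshold are all assumptions made for illustration.

```python
def _normalise_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=30):
    """Simplified Markov clustering on a symmetric similarity matrix.

    `inflation` controls granularity: larger values tend to give more,
    smaller clusters. The default here is an illustrative assumption.
    """
    n = len(adjacency)
    # Add self-loops, then convert similarities to transition probabilities.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = _normalise_columns(m)
    for _ in range(iterations):
        m = _matmul(m, m)                                  # expansion
        m = [[v ** inflation for v in row] for row in m]   # inflation
        m = _normalise_columns(m)
    # Rows that keep non-negligible mass act as attractors; the columns
    # they cover form one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Two disconnected triangles of hypothetical genes.
    graph = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(mcl(graph))
```

A comparative strategy of the kind described would rerun such a clustering over a range of `inflation` values and keep the one that best reproduces curated families in a reference species; that search loop is omitted here.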

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Simon Fraser University Institutional Repository

Structural implication of splicing stochastics

Author: Altschul
Arinobu
Chen
Deshpande
E. Melamud
Florea
Graveley
Hamosh
Hillman
Homma
Imanishi
J. Moult
Johnson
Kan
Kozak
Kriventseva
Lamba
Lander
Liu
Maniatis
Matthews
Miki
Mironov
Modrek
Modrek
Nagy
Nurtdinov
Pagon
Pan
Pan
Pan
Penalva
Rehwinkel
Resch
Resch
Sorek
Stenson
Veldhoen
Wang
Wheeler
Wheeler
Winn
Winter
Wittmann
Wu
Xing
Xu
Publication venue: Oxford University Press

Even though nearly every human gene has at least one alternative splice form, very little is so far known about the structure and function of resulting protein products. It is becoming increasingly clear that a significant fraction of all isoforms are products of noisy selection of splice sites and thus contribute little to actual functional diversity, and may potentially be deleterious. In this study, we examine the impact of alternative splicing on protein sequence and structure in three datasets: alternative splicing events conserved across multiple species, alternative splicing events in genes that are strongly linked to disease and all observed alternative splicing events. We find that the vast majority of all alternative isoforms result in unstable protein conformations. In contrast to that, the small subset of isoforms conserved across species tends to maintain protein structural integrity to a greater extent. Alternative splicing in disease-associated genes produces unstable structures just as frequently as all other genes, indicating that selection to reduce the effects of alternative splicing on this set is not especially pronounced. Overall, the properties of alternative spliced proteins are consistent with the outcome of noisy selection of splice sites by splicing machinery

Crossref

PubMed Central

Coding potential of the products of alternative splicing in human

Author: A Mortazavi
Anna Tramontano
B Boeckmann
Domenico Raimondo
E Melamud
ET Wang
EV Kriventseva
EW Deutsch
F Beaussart
F Birzele
Fabrizio Ferrè
Guido Leoni
HM Berman
J Stetefeld
J Söding
L Cavallo
Loredana Le Pera
M Floris
M Sultan
ML Tress
ML Tress
N Eswar
N Pattabiraman
NR Voss
P Blakeley
P Mallick
PJ Gardina
Q Pan
Robert D. Finn
S Tanner
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Analysis of the human genome has revealed that as much as an order of magnitude more of the genomic sequence is transcribed than accounted for by the predicted and characterized genes. A number of these transcripts are alternatively spliced forms of known protein coding genes; however, it is becoming clear that many of them do not necessarily correspond to a functional protein. Results: In this study we analyze alternative splicing isoforms of human gene products that are unambiguously identified by mass spectrometry and compare their properties with those of isoforms of the same genes for which no peptide was found in publicly available mass spectrometry datasets. We analyze them in detail for the presence of uninterrupted functional domains, active sites as well as the plausibility of their predicted structure. We report how well each of these strategies and their combination can correctly identify translated isoforms and derive a lower limit for their specificity, that is, their ability to correctly identify non-translated products. Conclusions: The most effective strategy for correctly identifying translated products relies on the conservation of active sites, but it can only be applied to a small fraction of isoforms, while a reasonably high coverage, sensitivity and specificity can be achieved by analyzing the presence of non-truncated functional domains. Combining the latter with an assessment of the plausibility of the modeled structure of the isoform increases both coverage and specificity with a moderate cost in terms of sensitivity
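The sensitivity and specificity figures discussed in this abstract follow the standard definitions, sketched below. The function names and the example counts are hypothetical and are not results from the paper.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of translated isoforms that a strategy correctly identifies."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-translated isoforms that a strategy correctly rejects."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, for illustration only.
    print(sensitivity(true_pos=8, false_neg=2))
    print(specificity(true_neg=9, false_pos=1))
```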

Crossref

Springer - Publisher Connector

PubMed Central

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Archivio della ricerca- Università di Roma La Sapienza

Partitioning clustering algorithms for protein sequence data sets

Author: A Enright
A Enright
A Herger
A Krause
DW Mount
E Bolten
E Kriventseva
F Can
G Yona
H Cathy
H Spath
J Hartigan
J Shi
KJ Anil
L Kaufman
Mohamed Limam
N Essoussi
Nadia Essoussi
O Sasson
P Cabena
P Clote
P Pipenbacher
P Sperisen
R Ng
R Tatusov
RC Dubes
S Altschul
S Henikoff
S Schneckener
S Van Dongen
SB Needleman
SE Brenner
Sondes Fayech
TF Smith
UM Fayyad
V Faber
V Guralnik
WR Pearson
Z Wu
Publication venue: BioMed Central
Publication date: 01/04/2009

Background: Genome-sequencing projects are currently producing an enormous amount of new sequences and are driving the rapid growth of protein sequence databases. The unsupervised classification of these data into functional groups or families, clustering, has become one of the principal research objectives in structural and functional genomics. Computer programs to automatically and accurately classify sequences into families have become a necessity. A significant number of methods have addressed the clustering of protein sequences and most of them can be categorized in three major groups: hierarchical, graph-based and partitioning methods. Among the various sequence clustering methods in the literature, hierarchical and graph-based approaches have been widely used. Although partitioning clustering techniques are widely used in other fields, few applications have been found in the field of protein sequence clustering. It has not been fully demonstrated whether partitioning methods can be applied to protein sequence data and whether these methods can be efficient compared to the published clustering methods. Methods: We developed four partitioning clustering approaches using the Smith-Waterman local-alignment algorithm to determine pair-wise similarities of sequences. Four different sets of protein sequences were used as evaluation data sets for the proposed methods. Results: We show that these methods outperform several other published clustering methods in terms of correctly predicting a classifier and especially in terms of the correctness of the provided prediction. The software is available to academic users from the authors upon request.

Crossref

Directory of Open Access Journals

PubMed Central

Insights into corn genes derived from large-scale cDNA sequencing

Author: A Beletskii
A Grigoriev
B Ewing
BB Wang
CT Bull
DA Petrov
DA Samarsky
DJ Galas
EV Kriventseva
G Haberer
GE Crooks
H Walia
HC Wang
Hongyu Zhang
I Tirosh
J Jia
JD Kittle
John Bouck
Kenneth A. Feldmann
M Gidekel
M Jain
M Strathmann
Maxim E. Troukhan
MB Soares
Nickolai N. Alexandrov
NN Alexandrov
QC Cronk
Richard B. Flavell
S Fujimori
SS Merchant
Stanislav Freidin
Tatiana V. Tatarinova
Timothy J. Swaller
TZ Berardini
Vyacheslav V. Brover
WH Campbell
Yu-Ping Lu
Publication venue: Springer Netherlands
Publication date: 01/01/2008

We present a large portion of the transcriptome of Zea mays, including ESTs representing 484,032 cDNA clones from 53 libraries and 36,565 fully sequenced cDNA clones, out of which 31,552 clones are non-redundant. These and other previously sequenced transcripts have been aligned with available genome sequences and have provided new insights into the characteristics of gene structures and promoters within this major crop species. We found that although the average number of introns per gene is about the same in corn and Arabidopsis, corn genes have more alternatively spliced isoforms. Examination of the nucleotide composition of coding regions reveals that corn genes, as well as genes of other Poaceae (Grass family), can be divided into two classes according to the GC content at the third position in the amino acid encoding codons. Many of the transcripts that have lower GC content at the third position have dicot homologs but the high GC content transcripts tend to be more specific to the grasses. The high GC content class is also enriched with intronless genes. Together this suggests that an identifiable class of genes in plants is associated with the Poaceae divergence. Furthermore, because many of these genes appear to be derived from ancestral genes that do not contain introns, this evolutionary divergence may be the result of horizontal gene transfer from species not only with different codon usage but possibly that did not have introns, perhaps outside of the plant kingdom. By comparing the cDNAs described herein with the non-redundant set of corn mRNAs in GenBank, we estimate that there are about 50,000 different protein coding genes in Zea. All of the sequence data from this study have been submitted to DDBJ/GenBank/EMBL under accession numbers EU940701–EU977132 (FLI cDNA) and FK944382-FL482108 (EST)
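The GC content at the third codon position (often called GC3) used in this abstract to separate two gene classes can be computed as in the sketch below. The function names and the `threshold` of 0.5 are assumptions for illustration; the abstract does not give the cutoff that separates the two classes.

```python
def gc3(cds):
    """Fraction of G or C bases at the third position of each complete codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # ignore a trailing partial codon
    thirds = cds[2:usable:3]              # bases at positions 3, 6, 9, ...
    if not thirds:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def gc3_class(cds, threshold=0.5):
    """Label a coding sequence 'high' or 'low' GC3 (threshold is hypothetical)."""
    return "high" if gc3(cds) >= threshold else "low"


if __name__ == "__main__":
    # Hypothetical toy coding sequences.
    for seq in ("ATGGCCGAC", "ATAGCTGAT"):
        print(seq, gc3(seq), gc3_class(seq))
```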

Crossref

Springer - Publisher Connector

PubMed Central

clusterMaker: a multi-algorithm clustering plugin for Cytoscape

Background: In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. Results: Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. Conclusions: The Cytoscape plugin clusterMaker provides a number of clustering algorithms and visualizations that can be used independently or in combination for analysis and visualization of biological data sets, and for confirming or generating hypotheses about biological function. Several of these visualizations and algorithms are only available to Cytoscape users through the clusterMaker plugin. clusterMaker is available via the Cytoscape plugin manager.

University of Toronto Research Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

MPG.PuRe

Deep Blue Documents at the University of Michigan

The effects of multiple features of alternatively spliced exons on the K(A)/K(S) ratio test

Author: A Nekrutenko
A Nekrutenko
B Modrek
C Lee
DL Philipps
E Quevillon
EV Kriventseva
F Wen
FC Chen
FC Chen
Feng-Chi Chen
G Yeo
GW Yeo
J Wang
K Iida
L Cartegni
L Cartegni
LC Filip
LD Hurst
M Karnaugh
MS Cline
NJ Mulder
R Sorek
R Sorek
S Stamm
SM Berget
TA Thanaraj
Trees-Juen Chuang
U Ohler
WG Fairbrother
WG Fairbrother
WG Fairbrother
XH Zhang
XH Zhang
Y Xing
Y Xing
Y Xing
Y Xing
Z Wang
Z Yang
Z Yang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The evolution of alternatively spliced exons (ASEs) is of primary interest because these exons are suggested to be a major source of functional diversity of proteins. Many exon features have been suggested to affect the evolution of ASEs. However, previous studies have relied on the K(A)/K(S) ratio test without taking into consideration information sufficiency (i.e., exon length > 75 bp, cross-species divergence > 5%) of the studied exons, leading to potentially biased interpretations. Furthermore, which exon feature dominates the results of the K(A)/K(S) ratio test and whether multiple exon features have additive effects have remained unexplored. RESULTS: In this study, we collect two different datasets for analysis – the ASE dataset (which includes lineage-specific ASEs and conserved ASEs) and the ACE dataset (which includes only conserved ASEs). We first show that information sufficiency can significantly affect the interpretation of the relationship between exon features and the K(A)/K(S) ratio test results. After discarding exons with insufficient information, we use a Boolean method to analyze the relationship between test results and four exon features (namely length, protein domain overlapping, inclusion level, and exonic splicing enhancer (ESE) frequency) for the ASE dataset. We demonstrate that length and protein domain overlapping are dominant factors, and they have similar impacts on test results of ASEs. In addition, despite the weak impacts of inclusion level and ESE motif frequency when considered individually, the combination of these two factors still has minor additive effects on test results. However, the ACE dataset shows a slightly different result in that inclusion level has a marginally significant effect on test results. Lineage-specific ASEs may have contributed to the difference. Overall, in both ASEs and ACEs, protein domain overlapping is the most dominant exon feature while ESE frequency is the weakest one in affecting test results. CONCLUSION: The proposed method can easily find additive effects of individual or multiple factors on the K(A)/K(S) ratio test results of exons. Therefore, the system can analyze complex conditions in evolution where multiple features are involved. More factors can also be added into the system to extend the scope of evolutionary analysis of exons. In addition, our method may be useful when orthologous exons cannot be found for the K(A)/K(S) ratio test.
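The information-sufficiency criterion quoted in this abstract (exon length > 75 bp, cross-species divergence > 5%) amounts to a simple filter applied before the ratio test. The sketch below assumes a hypothetical `Exon` record with divergence stored as a fraction; the class, field, and function names are illustrative and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Exon:
    name: str
    length_bp: int
    divergence: float  # cross-species divergence as a fraction, e.g. 0.08 for 8%


def has_sufficient_information(exon, min_length_bp=75, min_divergence=0.05):
    """Apply the thresholds quoted in the abstract (both are strict inequalities)."""
    return exon.length_bp > min_length_bp and exon.divergence > min_divergence


def filter_exons(exons):
    """Keep only exons that pass the information-sufficiency check."""
    return [e for e in exons if has_sufficient_information(e)]


if __name__ == "__main__":
    # Hypothetical exons, for illustration only.
    exons = [
        Exon("exon1", 120, 0.08),   # passes both thresholds
        Exon("exon2", 60, 0.10),    # too short
        Exon("exon3", 200, 0.02),   # too little divergence
    ]
    print([e.name for e in filter_exons(exons)])
```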

Crossref

Springer - Publisher Connector

National Health Research Institutes

Directory of Open Access Journals

PubMed Central