19 research outputs found

Applying negative rule mining to improve genome annotation

Author: Artamonova Irena I
Frishman Dmitrij
Frishman Goar
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: Unsupervised annotation of proteins by software pipelines suffers from very high error rates. Spurious functional assignments are usually caused by unwarranted homology-based transfer of information from existing database entries to the new target sequences. We have previously demonstrated that data mining in large sequence annotation databanks can help identify annotation items that are strongly associated with each other, and that exceptions from strong positive association rules often point to potential annotation errors. Here we investigate the applicability of negative association rule mining to revealing erroneously assigned annotation items. Results: Almost all exceptions from strong negative association rules are connected to at least one wrong attribute in the feature combination making up the rule. The fraction of annotation features flagged by this approach as suspicious is strongly enriched in errors and constitutes about 0.6% of the whole body of the similarity-transferred annotation in the PEDANT genome database. Positive rule mining does not identify two thirds of these errors. The approach based on exceptions from negative rules is much more specific than positive rule mining, but its coverage is significantly lower. Conclusion: Mining of both negative and positive association rules is a potent tool for finding significant trends in protein annotation and flagging doubtful features for further inspection.
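
The abstract describes flagging entries that are exceptions from a strong negative rule (entries with feature set X normally do not carry feature Y). The Python sketch below only illustrates that idea and is not the authors' implementation. The function name, the data layout (a dict from entry id to a set of annotation features) and the feature labels are hypothetical.

```python
from typing import Dict, FrozenSet, List


def negative_rule_exceptions(
    entries: Dict[str, FrozenSet[str]],
    lhs: FrozenSet[str],
    excluded: str,
) -> List[str]:
    """Return ids of entries that violate the negative rule "lhs => NOT excluded".

    An entry is an exception if it carries every feature in `lhs`
    and also carries the `excluded` feature.
    """
    return [
        entry_id
        for entry_id, features in entries.items()
        if lhs <= features and excluded in features
    ]


# Hypothetical toy database: entry id -> set of annotation features.
toy_entries = {
    "p1": frozenset({"A", "C"}),
    "p2": frozenset({"A", "B"}),  # carries A and B together: an exception
    "p3": frozenset({"B"}),
}

# Assumed strong negative rule: entries with feature A do not have feature B.
suspicious = negative_rule_exceptions(toy_entries, frozenset({"A"}), "B")
print(suspicious)
```

In a setting like the one the abstract describes, the flagged entries would be candidates for manual inspection. The sketch does not say which feature in the combination is the wrong one.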

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

PuSH

Asymmetric and non-uniform evolution of recently duplicated human genes

Author: Artamonova Irena I
Gelfand Mikhail S
Panchin Alexander Y
Ramensky Vasily E
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Gene duplications are a source of new genes and protein functions. The innovative role of duplication events makes families of paralogous genes an interesting target for studies in evolutionary biology. Here we study global trends in the evolution of human genes that resulted from recent duplications. Results: The pressure of negative selection is weaker during a short time immediately after a duplication event. Roughly one fifth of genes in paralogous gene families are evolving asymmetrically: one of the proteins encoded by two closest paralogs accumulates amino acid substitutions significantly faster than its partner. This asymmetry cannot be explained by differences in gene expression levels. In asymmetric gene pairs the number of deleterious mutations is increased in one copy and decreased in the other, compared with genes in non-asymmetrically evolving pairs. The asymmetry in the rate of synonymous substitutions is much weaker and not significant. Conclusions: The increase of negative selection pressure over time after a duplication event seems to be a major trend in the evolution of human paralogous gene families. The observed asymmetry in the evolution of paralogous genes shows that in many cases one of two gene copies remains practically unchanged, while the other accumulates functional mutations. This supports the hypothesis that slowly evolving gene copies preserve their original functions, while fast evolving copies obtain new specificities or functions. Reviewers: This article was reviewed by Dr. Igor Rogozin (nominated by Dr. Arcady Mushegian), Dr. Fyodor Kondrashov, and Dr. Sergei Maslov.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Comparative analysis of CRISPR cassettes from the human gut metagenomic contigs

Author: Anna A Gogleva
Irena I Artamonova
Mikhail S Gelfand
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Crossref

Springer - Publisher Connector

PEDANT genome database: 10 years online

Author: Artamonova Irena I.
Frishman Dmitrij
Heumann Klaus
Mewes Hans-Werner
Riley M. Louise
Schmidt Thorsten
Volz Andreas
Wagner Christian
Publication venue: Oxford University Press
Publication date: 05/12/2006

The PEDANT genome database provides exhaustive annotation of 468 genomes by a broad set of bioinformatics algorithms. We describe recent developments of the PEDANT Web server. The all-new Graphical User Interface (GUI) implemented in Java™ allows for more efficient navigation of the genome data, extended search capabilities, user customization and export facilities. The DNA and Protein viewers have been made highly dynamic and customizable. We also provide Web Services to access the entire body of PEDANT data programmatically. Finally, we report on the application of association rule mining for automatic detection of potential annotation errors. PEDANT is freely accessible to academic users at

Comparative Genomics and Evolution of Alternative Splicing: The Pessimists' Science

Author: Irena I. Artamonova
Mikhail S. Gelfand
Publication venue: 'American Chemical Society (ACS)'

Crossref

Evolutionary Dynamics of Clustered Irregularly Interspaced Short Palindromic Repeat Systems in the Ocean Metagenome

Author: Artamonova Irena I.
Gelfand Mikhail S.
Sorokin Valery A.
Publication venue: American Society for Microbiology (ASM)
Publication date: 01/01/2009

Clustered regularly interspaced short palindromic repeats (CRISPRs) form a recently characterized type of prokaryotic antiphage defense system. The phage-host interactions involving CRISPRs have been studied in experiments with selected bacterial or archaeal species and, computationally, in completely sequenced genomes. However, these studies do not allow one to take prokaryotic population diversity and phage-host interaction dynamics into account. This gap can be filled by using metagenomic data: in particular, the largest existing data set, generated from the Sorcerer II Global Ocean Sampling expedition. The application of three publicly available CRISPR recognition programs to the Global Ocean metagenome produced a large proportion of false-positive results. To address this problem, a filtering procedure was designed. It resulted in about 200 reliable CRISPR cassettes, which were then studied in detail. The repeat consensuses were clustered into several stable classes that differed from the existing classification. Short fragments of DNA similar to the cassette spacers were more frequently present in the same geographical location than in other locations (P < 0.0001). We developed a catalogue of elementary CRISPR-forming events and reconstructed the likely evolutionary history of cassettes that had common spacers. Metagenomic collections allow for relatively unbiased analysis of phage-host interactions and CRISPR evolution. The results of this study demonstrate that CRISPR cassettes retain the memory of the local virus population at a particular ocean location. CRISPR evolution may be described using a limited vocabulary of elementary events that have a natural biological interpretation.

CiteSeerX

Crossref

PubMed Central

Mining sequence annotation databanks for association patterns

Author: Dmitrij Frishman
Goar Frishman
Irena I. Artamonova
Mikhail S. Gelfand
Publication date: 01/01/2005

Data and text mining. Vol. 21 Suppl. 3 2005, pages iii49–iii57. doi:10.1093/bioinformatics/bti1206. Mining sequence annotation databanks for association patterns

CiteSeerX

PuSH

(probability that a given database entry will satisfy the right side of the rule given that it satisfies the left side of the rule)

Author: Dmitrij Frishman
Goar Frishman
Irena I Artamonova

Copyright information: Taken from "Applying negative rule mining to improve genome annotation", http://www.biomedcentral.com/1471-2105/8/261. BMC Bioinformatics 2007;8:261. Published online 21 Jul 2007. PMCID: PMC1940032. Minimal coverage counts (number of entries in the database that possess all features from the left-hand side of the rule) used are 100 (blue), 200 (pink), and 500 (green). The threshold for minimal leverage count (difference between the actual rule frequency and the probability of finding it by chance given the frequencies of its RHS and LHS) was set to 100 in all calculations.
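
The rule measures named in this caption and in the preceding entry title (coverage count, the conditional probability of the right side given the left side, and leverage count) can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the authors' code. The function name and the set-based data layout are hypothetical, and the leverage count is assumed here to be the observed joint count minus the joint count expected if LHS and RHS were independent.

```python
from typing import Dict, FrozenSet, List


def rule_metrics(
    entries: List[FrozenSet[str]],
    lhs: FrozenSet[str],
    rhs: FrozenSet[str],
) -> Dict[str, float]:
    """Compute simple measures for the rule lhs => rhs over a list of entries."""
    n = len(entries)
    # Coverage count: entries possessing all features of the left-hand side.
    lhs_count = sum(1 for e in entries if lhs <= e)
    rhs_count = sum(1 for e in entries if rhs <= e)
    # Entries satisfying both sides of the rule.
    both_count = sum(1 for e in entries if (lhs | rhs) <= e)
    # Probability that an entry satisfies the RHS given that it satisfies the LHS.
    confidence = both_count / lhs_count if lhs_count else 0.0
    # Assumption: expected joint count under independence of LHS and RHS.
    expected = lhs_count * rhs_count / n if n else 0.0
    return {
        "coverage_count": lhs_count,
        "confidence": confidence,
        "leverage_count": both_count - expected,
    }


# Hypothetical toy database of four entries with abstract feature labels.
toy = [
    frozenset({"a", "b"}),
    frozenset({"a", "b"}),
    frozenset({"a"}),
    frozenset({"c"}),
]
metrics = rule_metrics(toy, frozenset({"a"}), frozenset({"b"}))
print(metrics)
```

With thresholds like those in the caption, a rule would be kept only if its coverage count and leverage count both reach the chosen minimum values.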

The Francis Crick Institute