46 research outputs found

RAId_DbS: Peptide Identification using Database Searches with Realistic Statistics

Author: Alves Gelio
Ogurtsov Aleksey Y
Yu Yi-Kuo
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: The key to mass-spectrometry-based proteomics is peptide identification. A major challenge in peptide identification is to obtain realistic E-values when assigning statistical significance to candidate peptides. Results: Using a simple scoring scheme, we propose a database search method with theoretically characterized statistics. Taking into account possible skewness in the random variable distribution and the effect of finite sampling, we provide a theoretical derivation for the tail of the score distribution. For every experimental spectrum examined, we collect the scores of peptides in the database, and find good agreement between the collected score statistics and our theoretical distribution. Using Student's t-tests, we quantify the degree of agreement between the theoretical distribution and the score statistics collected. The t-tests may be used to measure the reliability of reported statistics. When combined with the reported P-value for a peptide hit using a score distribution model, this new measure prevents exaggerated statistics. Another feature of RAId_DbS is its capability of detecting multiple co-eluted peptides. The peptide identification performance and statistical accuracy of RAId_DbS are assessed and compared with several other search tools. The executables and data related to RAId_DbS are freely available upon request.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

RAId_DbS: mass-spectrometry based peptide identification web server with knowledge integration

Author: Alves Gelio
Ogurtsov Aleksey Y
Yu Yi-Kuo
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Existing scientific literature is a rich source of biological information such as disease markers. Integration of this information with data analysis may help researchers to identify possible controversies and to form useful hypotheses for further validation. In the context of proteomics studies, the era of individualized proteomics may be approached through consideration of amino acid substitutions/modifications as well as information from disease studies. Integration of such information with peptide searches facilitates speedy, dynamic information retrieval that may significantly benefit clinical laboratory studies. Description: We have integrated from various sources annotated single amino acid polymorphisms, post-translational modifications, and their documented disease associations (if they exist) into one enhanced database per organism. We have also augmented our peptide identification software RAId_DbS to take this information into account while analyzing a tandem mass spectrum. In principle, one may choose to respect or ignore the correlation of amino acid polymorphisms/modifications within each protein. The former leads to targeted searches and avoids scoring of unnecessary polymorphism/modification combinations; the latter explores possible polymorphisms in a controlled fashion. To facilitate new discoveries, RAId_DbS also allows users to conduct searches permitting novel polymorphisms as well as to search a knowledge database created by the users. Conclusion: We have finished constructing enhanced databases for 17 organisms. The web link to RAId_DbS and the enhanced databases is http://www.ncbi.nlm.nih.gov/CBBResearch/qmbp/RAId_DbS/index.html. The relevant databases and binaries of RAId_DbS for Linux, Windows, and Mac OS X are available for download from the same web page.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Evolution of alternative and constitutive regions of mammalian 5'UTRs

Author: Koonin Eugene V
Ogurtsov Aleksey Y
Resch Alissa M
Rogozin Igor B
Shabalina Svetlana A
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Alternative splicing (AS) in protein-coding sequences has emerged as an important mechanism of regulation and diversification of animal gene function. By contrast, the extent and roles of alternative events including AS and alternative transcription initiation (ATI) within the 5'-untranslated regions (5'UTRs) of mammalian genes are not well characterized. Results: We evaluated the abundance, conservation and evolution of putative regulatory control elements, namely, upstream start codons (uAUGs) and open reading frames (uORFs), in the 5'UTRs of human and mouse genes impacted by alternative events. For genes with alternative 5'UTRs, the fraction of alternative sequences (those present in a subset of the transcripts) is much greater than that in the corresponding coding sequence, conceivably because 5'UTRs are not bound by constraints on protein structure that limit AS in coding regions. Alternative regions of mammalian 5'UTRs evolve faster and are subject to weaker purifying selection than constitutive portions. This relatively weak selection results in over-abundance of uAUGs and uORFs in the alternative regions of 5'UTRs compared to constitutive regions. Nevertheless, even in alternative regions, uORFs evolve under stronger selection than the rest of the sequences, indicating that some of the uORFs are conserved regulatory elements; some of the non-conserved uORFs could be involved in species-specific regulation. Conclusion: The findings on the evolution and selection in alternative and constitutive regions presented here are consistent with the hypothesis that alternative events, namely, AS and ATI, in 5'UTRs of mammalian genes are likely to contribute to the regulation of translation.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Calibrating E-values for MS2 database search methods

Author: Alves Gelio
Ogurtsov Aleksey Y
Shen Rong-Fong
Wang Guanghui
Wu Wells W
Yu Yi-Kuo
Publication venue: BioMed Central
Publication date: 01/01/2007

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licens

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Distinct Patterns of Expression and Evolution of Intronless and Intron-Containing Mammalian Genes

Author: Koonin Eugene V.
Novichkov Pavel S.
Ogurtsov Aleksey Y.
Shabalina Svetlana A.
Spiridonov Alexey N.
Spiridonov Nikolay A.
Publication venue: Oxford University Press
Publication date: 01/04/2010

Comparison of expression levels and breadth and evolutionary rates of intronless and intron-containing mammalian genes shows that intronless genes are expressed at lower levels, tend to be tissue specific, and evolve significantly faster than spliced genes. By contrast, monomorphic spliced genes that are not subject to detectable alternative splicing and polymorphic alternatively spliced genes show similar statistically indistinguishable patterns of expression and evolution. Alternative splicing is most common in ancient genes, whereas intronless genes appear to have relatively recent origins. These results imply tight coupling between different stages of gene expression, in particular, transcription, splicing, and nucleocytosolic transport of transcripts, and suggest that formation of intronless genes is an important route of evolution of novel tissue-specific functions in animals

DSpace@MIT

PubMed Central

Detection of co-eluted peptides using database search methods

Author: Alves Gelio
Kwok Siwei
Ogurtsov Aleksey Y
Shen Rong-Fong
Wang Guanghui
Wu Wells W
Yu Yi-Kuo
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Current experimental techniques, especially those applying liquid chromatography mass spectrometry, have made high-throughput proteomic studies possible. The increase in throughput, however, also raises concerns about the accuracy of identification or quantification. Most experimental procedures select in a given MS scan only a few of the most intense parent ions, each to be fragmented (MS2) separately; most other minor co-eluted peptides that have similar chromatographic retention times are ignored and their information is lost. Results: We have computationally investigated the possibility of enhancing the information retrieval during a given LC/MS experiment by selecting the two or three most intense parent ions for simultaneous fragmentation. A set of spectra is created by superimposing a number of MS2 spectra, each of which can be identified with high confidence by all search methods tested, to mimic the spectra of co-eluted peptides. The generated convoluted spectra were used to evaluate the capability of several database search methods (SEQUEST, Mascot, X!Tandem, OMSSA, and RAId_DbS) in identifying true peptides from superimposed spectra of co-eluted peptides. We show that, using these simulated spectra, all the database search methods will eventually gain in the number of true peptides identified by using the compound spectra of co-eluted peptides. Open peer review: Reviewed by Vlad Petyuk (nominated by Arcady Mushegian), King Jordan and Shamil Sunyaev. For the full reviews, please go to the Reviewers' comments section.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Repository at the University of Maryland

Comparison of approaches for rational siRNA design leading to a new efficient and transparent method

Author: Atkins John F.
Matveeva Olga
Moore Barry
Nechipurenko Yury
Ogurtsov Aleksey Y.
Rossi Leo
Shabalina Svetlana A.
Sætrom Pål
Publication venue: Oxford University Press
Publication date: 01/01/2007

Current literature describes several methods for the design of efficient siRNAs with 19 perfectly matched base pairs and 2 nt overhangs. Using four independent databases totaling 3336 experimentally verified siRNAs, we compared how well several of these methods predict siRNA cleavage efficiency. According to receiver operating characteristics (ROC) and correlation analyses, the best programs were BioPredsi, ThermoComposition and DSIR. We also studied individual parameters that significantly and consistently correlated with siRNA efficacy in different databases. As a result of this work we developed a new method which utilizes linear regression fitting with local duplex stability, nucleotide position-dependent preferences and total G/C content of siRNA duplexes as input parameters. The new method's ability to discriminate between efficient and inefficient siRNAs is comparable with that of the best methods identified, but its parameters are more obviously related to the mechanisms of siRNA action than those of BioPredsi. This permits insight into the underlying physical features and relative importance of the parameters. The new method of predicting siRNA efficiency is faster than ThermoComposition because it does not employ time-consuming RNA secondary structure calculations, and it has far fewer parameters than DSIR. It is available as a web tool called ‘siRNA scales’.

Cork Open Research Archive

Expansion of the human μ-opioid receptor gene architecture: novel functional variants

Author: Aleksey Y. Ogurtsov
Bikashkumar Mishra
Carly Kiselycznyk
David Goldman
Dmitri V. Zaykin
Inna Belfer
Inna E. Tchivileva
Josee Gauthier
Kyoko Shibata
Luda Diatchenko
Margaret R. Wallace
Mitchell B. Max
Nikolay A. Spiridonov
Pavel Gris
Roger B. Fillingim
Roland Staud
Svetlana A. Shabalina
William Maixner
Publication venue: Oxford University Press
Publication date: 01/01/2009

The μ-opioid receptor (OPRM1) is the principal receptor target for both endogenous and exogenous opioid analgesics. There are substantial individual differences in human responses to painful stimuli and to opiate drugs that are attributed to genetic variations in OPRM1. In searching for new functional variants, we employed comparative genome analysis and obtained evidence for the existence of an expanded human OPRM1 gene locus with new promoters, alternative exons and regulatory elements. Examination of polymorphisms within the human OPRM1 gene locus identified a strong association between single nucleotide polymorphism (SNP) rs563649 and individual variations in pain perception. SNP rs563649 is located within a structurally conserved internal ribosome entry site (IRES) in the 5′-UTR of novel exon 13-containing OPRM1 isoforms (MOR-1K) and affects both mRNA levels and translation efficiency of these variants. Furthermore, rs563649 exhibits very strong linkage disequilibrium throughout the entire OPRM1 gene locus and thus affects the functional contribution of the corresponding haplotype that includes other functional OPRM1 SNPs. Our results provide evidence for an essential role for MOR-1K isoforms in nociceptive signaling and suggest that genetic variations in alternative OPRM1 isoforms may contribute to individual differences in opiate responses.

Crossref

PubMed Central

Carolina Digital Repository

RAId_aPS: MS/MS analysis with multiple scoring functions and spectrum-specific statistics

Author: Aleksey Y. Ogurtsov
Gelio Alves
Vladimir N. Uversky
Yi-Kuo Yu
Publication venue
Publication date: 16/09/2010

Statistically meaningful comparison/combination of peptide identification results from various search methods is impeded by the lack of a universal statistical standard. Providing an E-value calibration protocol, we demonstrated earlier the feasibility of translating either the score or heuristic E-value reported by any method into the textbook-defined E-value, which may serve as the universal statistical standard. This protocol, although robust, may lose spectrum-specific statistics and might require a new calibration when changes in experimental setup occur. To mitigate these issues, we developed a new MS/MS search tool, RAId_aPS, that is able to provide spectrum-specific E-values for additive scoring functions. Given a selection of scoring functions out of RAId score, K-score, Hyperscore and XCorr, RAId_aPS generates the corresponding score histograms of all possible peptides using dynamic programming. Using these score histograms to assign E-values enables a calibration-free protocol for accurate significance assignment for each scoring function. RAId_aPS features four different modes: (i) compute the total number of possible peptides for a given molecular mass range, (ii) generate the score histogram given an MS/MS spectrum and a scoring function, (iii) reassign E-values for a list of candidate peptides given an MS/MS spectrum and the scoring functions chosen, and (iv) perform database searches using selected scoring functions. In modes (iii) and (iv), RAId_aPS is also capable of combining results from different scoring functions using spectrum-specific statistics. The web link is http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/raid_aps/index.html. Relevant binaries for Linux, Windows, and Mac OS X are available from the same page. Comment: 34 pages, 10 figures, 1 supplementary information file (RAId_aPS_support.pdf). To view the supplementary file, please download and extract the gzipped tar source file listed under "Other formats

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Expression Patterns of Protein Kinases Correlate with Gene Architecture and Evolutionary Rates

Author: Aleksey Y. Ogurtsov
David Landsman
Gibbes R. Johnson
Leonardo Mariño-Ramírez
Nikolay A. Spiridonov
Sridhar Hannenhalli
Svetlana A. Shabalina
Publication venue: Public Library of Science
Publication date: 31/10/2008

Protein kinase (PK) genes comprise the third largest superfamily, which occupies ∼2% of the human genome. They encode regulatory enzymes that control a vast variety of cellular processes through phosphorylation of their protein substrates. Expression of PK genes is subject to complex transcriptional regulation which is not fully understood. Our comparative analysis demonstrates that the genomic organization of regulatory PK genes differs from the organization of other protein coding genes. PK genes occupy larger genomic loci, have longer introns and spacer regions, and encode larger proteins. The primary transcript length of PK genes, similar to other protein coding genes, inversely correlates with gene expression level and expression breadth, which is likely due to the necessity to reduce metabolic costs of transcription for abundant messages. On average, PK genes evolve slower than other protein coding genes. Breadth of PK expression negatively correlates with the rate of non-synonymous substitutions in protein coding regions. This rate is lower for high expression and ubiquitous PKs, relative to low expression PKs, and correlates with divergence in untranslated regions. Conversely, the rate of silent mutations is uniform in different PK groups, indicating that differing rates of non-synonymous substitutions reflect variations in selective pressure. Brain and testis employ a considerable number of tissue-specific PKs, indicating high complexity of the phosphorylation-dependent regulatory network in these organs. There are considerable differences in genomic organization between PKs up-regulated in the testis and brain. PK genes up-regulated in the highly proliferative testicular tissue are fast evolving and small, with short introns and transcribed regions. In contrast, genes up-regulated in the minimally proliferative nervous tissue carry long introns and extended transcribed regions, and evolve slowly. PK genomic architecture, the size of gene functional domains and evolutionary rates correlate with the pattern of gene expression. The structure and evolutionary divergence of tissue-specific PK genes are related to the proliferative activity of the tissue where these genes are predominantly expressed. Our data provide evidence that physiological requirements for transcription intensity, ubiquitous expression, and tissue-specific regulation shape gene structure and affect rates of evolution.

Public Library of Science (PLOS)

Crossref

PubMed Central