141 research outputs found

Estimates of statistical significance for comparison of individual positions in multiple sequence alignments

Authors: Grishin Nick V; Sadreyev Ruslan I
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: Profile-based analysis of multiple sequence alignments (MSA) allows for accurate comparison of protein families. Here, we address the problems of detecting statistically confident dissimilarities between (1) an MSA position and a set of predicted residue frequencies, and (2) two MSA positions. These problems are important for (i) evaluation and optimization of methods predicting residue occurrence at protein positions; (ii) detection of potentially misaligned regions in automatically produced alignments and their further refinement; and (iii) detection of sites that determine functional or structural specificity in two related families. RESULTS: For problems (1) and (2), we propose analytical estimates of the P-value and apply them to the detection of significant positional dissimilarities in various experimental situations. (a) We compare structure-based predictions of residue propensities at a protein position to the actual residue frequencies in the MSA of homologs. (b) We evaluate our method by the ability to detect erroneous position matches produced by an automatic sequence aligner. (c) We compare MSA positions that correspond to residues aligned by automatic structure aligners. (d) We compare MSA positions that are aligned by high-quality manual superposition of structures. Detected dissimilarities reveal shortcomings of the automatic methods for residue frequency prediction and alignment construction. For the high-quality structural alignments, the dissimilarities suggest sites of potential functional or structural importance. CONCLUSION: The proposed computational method is of significant potential value for the analysis of protein families.
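
To make problem (1) concrete, the sketch below tests whether the residues observed in one MSA column are compatible with a set of predicted frequencies. It is not the analytical P-value estimate proposed in the paper: it swaps in a likelihood-ratio (G) statistic with a Monte Carlo P-value, and the names `g_statistic`, `monte_carlo_p_value`, and the example frequencies are hypothetical. It also assumes that every observed residue has a positive predicted frequency.

```python
import math
import random
from collections import Counter


def g_statistic(counts, freqs):
    """Likelihood-ratio statistic G = 2 * sum(o * ln(o / e)) over observed residues."""
    n = sum(counts.values())
    g = 0.0
    for residue, observed in counts.items():
        if observed > 0:
            # Assumption: freqs[residue] > 0 for every residue seen in the column.
            expected = n * freqs[residue]
            g += observed * math.log(observed / expected)
    return 2.0 * g


def monte_carlo_p_value(column, freqs, n_sim=2000, seed=0):
    """Fraction of columns simulated from `freqs` at least as dissimilar as `column`."""
    rng = random.Random(seed)
    residues = list(freqs)
    weights = [freqs[r] for r in residues]
    observed_g = g_statistic(Counter(column), freqs)
    hits = 0
    for _ in range(n_sim):
        sample = rng.choices(residues, weights=weights, k=len(column))
        if g_statistic(Counter(sample), freqs) >= observed_g:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible P = 0.
    return (hits + 1) / (n_sim + 1)


# Hypothetical predicted frequencies for a three-residue alphabet.
predicted = {"A": 0.5, "L": 0.3, "V": 0.2}
print(monte_carlo_p_value("AAAAALLLVV", predicted))  # column matching the prediction
print(monte_carlo_p_value("VVVVVVVVVV", predicted))  # column dominated by one residue
```

A small P-value for a column flags it as significantly dissimilar from the prediction. A simulation of this kind is far slower than a closed-form estimate, which is presumably one reason analytical estimates are attractive for scanning many positions.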

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Exploring dynamics of protein structure determination and homology-based prediction to estimate the number of superfamilies and folds

Authors: Grishin Nick V; Sadreyev Ruslan I
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: As tertiary structure is currently available only for a fraction of known protein families, it is important to assess what parts of sequence space have been structurally characterized. We consider protein domains whose structure can be predicted by sequence similarity to proteins with solved structure and address the following questions. Do these domains represent an unbiased random sample of all sequence families? Do targets solved by structural genomic initiatives (SGI) provide such a sample? What are approximate total numbers of structure-based superfamilies and folds among soluble globular domains? RESULTS: To make these assessments, we combine two approaches: (i) sequence analysis and homology-based structure prediction for proteins from complete genomes; and (ii) monitoring dynamics of the assigned structure set in time, with the accumulation of experimentally solved structures. In the Clusters of Orthologous Groups (COG) database, we map the growing population of structurally characterized domain families onto the network of sequence-based connections between domains. This mapping reveals a systematic bias suggesting that target families for structure determination tend to be located in highly populated areas of sequence space. In contrast, the subset of domains whose structure is initially inferred by SGI is similar to a random sample from the whole population. To account for the observed bias, we propose a new non-parametric approach to the estimation of the total numbers of structural superfamilies and folds, which does not rely on a specific model of the sampling process. Based on dynamics of robust distribution-based parameters in the growing set of structure predictions, we estimate the total numbers of superfamilies and folds among soluble globular proteins in the COG database. CONCLUSION: The set of currently solved protein structures allows for structure prediction in approximately a third of sequence-based domain families. The choice of targets for structure determination is biased towards domains with many sequence-based homologs. The growing SGI output in the future should further contribute to the reduction of this bias. The total numbers of structural superfamilies and folds in the COG database are estimated at ~4000 and ~1700, respectively. These numbers are, respectively, four and three times higher than the numbers of superfamilies and folds that can currently be assigned to COG proteins.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A tale of two ferredoxins: sequence similarity and structural differences

Authors: Grishin Nick V; Krishna S Sri; Sadreyev Ruslan I
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Sequence similarity between proteins is usually considered a reliable indicator of homology. Pyruvate-ferredoxin oxidoreductase and quinol-fumarate reductase contain ferredoxin domains that bind [Fe-S] clusters and are involved in electron transport. Profile-based methods for sequence comparison, such as PSI-BLAST and HMMer, suggest statistically significant similarity between these domains. RESULTS: The sequence similarity between these ferredoxin domains resides in the area of the [Fe-S] cluster-binding sites. Although overall folds of these ferredoxins bear no obvious similarity, the regions of sequence similarity display a remarkable local structural similarity. These short regions with pronounced sequence motifs are incorporated in completely different structural environments. In pyruvate-ferredoxin oxidoreductase (bacterial ferredoxin), the hydrophobic core of the domain is completed by two β-hairpins, whereas in quinol-fumarate reductase (α-helical ferredoxin), the cluster-binding motifs are part of a larger all-α-helical globin-like fold core. CONCLUSION: Functionally meaningful sequence similarity may sometimes be reflected only in local structural similarity, but not in global fold similarity. If detected and used naively, such similarities may lead to incorrect fold predictions.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

X-chromosome hyperactivation in mammals via nonlinear relationships between chromatin states and transcription

Authors: Lee Jeannie T.; Pinter Stefan F.; Sadreyev Ruslan I.; Yildirim Eda
Publication venue: Springer Science and Business Media LLC
Publication date: 01/03/2014

Dosage compensation in mammals occurs at two levels. In addition to balancing X-chromosome dosage between males and females via X-inactivation, mammals also balance dosage of Xs and autosomes. It has been proposed that X-autosome equalization occurs by upregulation of Xa (active X). To investigate the mechanism, we perform allele-specific ChIP-seq for chromatin epitopes and analyze RNA-seq data. The hypertranscribed Xa demonstrates enrichment of active chromatin marks relative to autosomes. We derive predictive models for relationships among POL-II, active mark densities, and gene expression, and suggest that Xa upregulation involves increased transcription initiation and elongation. Enrichment of active marks on Xa does not scale proportionally with transcription output, a disparity explained by nonlinear quantitative dependencies among active histone marks, POL-II occupancy, and transcription. Significantly, the trend of nonlinear upregulation also occurs on autosomes. Thus, Xa upregulation involves combined increases of active histone marks and POL-II occupancy, without invoking X-specific dependencies between chromatin states and transcription.

Harvard University - DASH

COMPASS server for remote homology inference

Authors: Grishin Nick V.; Kim Bong-Hyun; Sadreyev Ruslan I.; Tang Ming
Publication venue: Oxford University Press
Publication date: 01/01/2007

COMPASS is a method for homology detection and local alignment construction based on the comparison of multiple sequence alignments (MSAs). The method derives numerical profiles from given MSAs, constructs local profile-profile alignments and analytically estimates E-values for the detected similarities. Until now, COMPASS was only available for download and local installation. Here, we present a new web server featuring the latest version of COMPASS, which provides (i) increased sensitivity and selectivity of homology detection; (ii) longer, more complete alignments; and (iii) faster computational speed. After submission of the query MSA or single sequence, the server performs searches against a user-specified database. The server includes detailed and intuitive control of the search parameters. A flexible output format, structured similarly to BLAST and PSI-BLAST, provides an easy way to read and analyze the detected profile similarities. Brief help sections are available for all input parameters and output options, along with detailed documentation. To illustrate the value of this tool for protein structure-function prediction, we present two examples of detecting distant homologs for uncharacterized protein families. Available at http://prodata.swmed.edu/compas

CiteSeerX

Crossref

PubMed Central

Cell of origin dictates aggression and stem cell number in acute lymphoblastic leukemia

Authors: Garcia Elaine G; Garcia Sara P; Iyer Sowmya; Langenau David M; Loontiens Siebe; Sadreyev Ruslan I; Speleman Franki
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Ghent University Academic Bibliography

A comprehensive system for evaluation of remote sequence similarity detection

Authors: Grishin Nick V; Kim Bong-Hyun; Qi Yuan; Sadreyev Ruslan I; Wang Yong
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Accurate and sensitive performance evaluation is crucial both for effective development of better structure prediction methods based on sequence similarity, and for the comparative analysis of existing methods. To date, there has been no satisfactory comprehensive evaluation method that (i) is based on a large and statistically unbiased set of proteins with clearly defined relationships; and (ii) covers all performance aspects of sequence-based structure predictors, such as sensitivity and specificity, alignment accuracy and coverage, and structure template quality. RESULTS: With the aim of designing such a method, we (i) select a statistically balanced set of divergent protein domains from SCOP, and define similarity relationships for the majority of these domains by complementing the best information available in SCOP with a rigorous SVM-based algorithm; and (ii) develop protocols for the assessment of similarity detection and alignment quality from several complementary perspectives. The evaluation of similarity detection is based on ROC-like curves and includes several complementary approaches to the definition of true/false positives. Reference-dependent approaches use the 'gold standard' of pre-defined domain relationships and structure-based alignments. Reference-independent approaches assess the quality of structural match predicted by the sequence alignment, with respect to the whole domain length (global mode) or to the aligned region only (local mode). Similarly, the evaluation of alignment quality includes several reference-dependent and -independent measures, in global and local modes. As an illustration, we use our benchmark to compare the performance of several methods for the detection of remote sequence similarities, and show that different aspects of evaluation reveal different properties of the evaluated methods, highlighting their advantages, weaknesses, and potential for further development. CONCLUSION: The presented benchmark provides a new tool for a statistically unbiased assessment of methods for remote sequence similarity detection, from various complementary perspectives. This tool should be useful both for users choosing the best method for a given purpose, and for developers designing new, more powerful methods. The benchmark set, reference alignments, and evaluation codes can be downloaded from ftp://iole.swmed.edu/pub/evaluation/.
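
As a rough illustration of what a ROC-like curve for similarity detection can look like, the sketch below ranks hits by E-value and accumulates true and false positives down the ranked list. It is a minimal sketch, not the benchmark's evaluation code: the function names, the `(e_value, is_true_positive)` input format, and the example hits are all hypothetical, and the true/false labels are assumed to come from some reference definition of domain relationships.

```python
def roc_like_curve(hits):
    """Return cumulative (false_positives, true_positives) points for ranked hits.

    `hits` is a list of (e_value, is_true_positive) pairs; lower E-values rank first.
    """
    points = [(0, 0)]
    tp = fp = 0
    for _, is_true in sorted(hits, key=lambda h: h[0]):
        if is_true:
            tp += 1
        else:
            fp += 1
        points.append((fp, tp))
    return points


def true_positives_before(points, max_false_positives):
    """Largest number of true positives reached with at most `max_false_positives` errors."""
    return max(tp for fp, tp in points if fp <= max_false_positives)


# Hypothetical hits from one detection method, labeled against a reference.
hits = [(1e-30, True), (1e-10, True), (1e-5, False), (0.01, True), (0.5, False)]
curve = roc_like_curve(hits)
print(curve)
print(true_positives_before(curve, 0))
```

Comparing two methods then amounts to comparing their curves, for example by the number of true positives each reaches before a fixed number of false positives. Changing how a hit is labeled true or false changes the curve, which is where the reference-dependent and reference-independent definitions described in the abstract would come in.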

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Carolina Digital Repository

Identifying candidate ncRNAs that direct changes in chromatin structure

Authors: Borowsky Mark L; Kingston Robert Edward; Ray Mridula Kumari; Sadreyev Ruslan; Wang Yanqun
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Harvard University - DASH

Springer - Publisher Connector

H3K27 modifications define segmental regulatory domains in the Drosophila bithorax complex

Authors: Bender Welcome; Bowman Sarah K; Deaton Aimee M; Domingues Heber; Kingston Robert E; Sadreyev Ruslan I; Wang Peggy I
Publication venue: eLife Sciences Publications, Ltd
Publication date: 08/09/2014

The bithorax complex (BX-C) in Drosophila melanogaster is a cluster of homeotic genes that determine body segment identity. Expression of these genes is governed by cis-regulatory domains, one for each parasegment. Stable repression of these domains depends on Polycomb Group (PcG) functions, which include trimethylation of lysine 27 of histone H3 (H3K27me3). To search for parasegment-specific signatures that reflect PcG function, chromatin from single parasegments was isolated and profiled. The H3K27me3 profiles across the BX-C in successive parasegments showed a ‘stairstep’ pattern that revealed sharp boundaries of the BX-C regulatory domains. Acetylated H3K27 was broadly enriched across active domains, in a pattern complementary to H3K27me3. The CCCTC-binding protein (CTCF) bound the borders between H3K27 modification domains; it was retained even in parasegments where adjacent domains lack H3K27me3. These findings provide a molecular definition of the homeotic domains, and implicate precisely positioned H3K27 modifications as a central determinant of segment identity. DOI: http://dx.doi.org/10.7554/eLife.02833.00

Harvard University - DASH

Loss of muscleblind splicing factor shortens Caenorhabditis elegans lifespan by reducing the activity of p38 MAPK/PMK-1 and transcription factors ATF-7 and Nrf/SKN-1

Authors: Cetinbas Murat; Garcia Susana M. D. A.; Matilainen Olli; Ribeiro Ana R. S.; Sadreyev Ruslan; Sood Heini; Verbeeren Jens
Publication date: 01/10/2021

Muscleblind-like splicing regulators (MBNLs) are RNA-binding factors that have an important role in developmental processes. Dysfunction of these factors is a key contributor to different neuromuscular degenerative disorders, including Myotonic Dystrophy type 1 (DM1). Since DM1 is a multisystemic disease characterized by symptoms resembling accelerated aging, we asked which cellular processes regulated by MBNLs make them necessary for normal lifespan. By utilizing the model organism Caenorhabditis elegans, we found that loss of MBL-1 (the sole ortholog of mammalian MBNLs), which is known to be required for normal lifespan, shortens lifespan by decreasing the activity of p38 MAPK/PMK-1 as well as the function of transcription factors ATF-7 and SKN-1. Furthermore, we show that mitochondrial stress caused by the knockdown of mitochondrial electron transport chain components promotes the longevity of mbl-1 mutants in a partially PMK-1-dependent manner. Together, the data establish a mechanism by which DM1-associated loss of muscleblind affects lifespan. Furthermore, this study suggests that mitochondrial stress could alleviate symptoms caused by the dysfunction of muscleblind splicing factor, creating a potential approach to investigate for therapy.

Helsingin yliopiston digitaalinen arkisto