
Oxford University Research Archive

De-DUFing the DUFs: Deciphering distant evolutionary relationships of Domains of Unknown Function using sensitive homology detection methods

Publication venue: BioMed Central
Publication date: 31/07/2015

Springer - Publisher Connector

Realm of PD-(D/E)XK nuclease superfamily revisited: detection of novel families with modified transitive meta profile searches

Author: Ginalski Krzysztof
Grishin Nick V
Kinch Lisa N
Knizewski Lukasz
Rychlewski Leszek
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: PD-(D/E)XK nucleases constitute a large and highly diverse superfamily of enzymes that display little sequence similarity despite retaining a common core fold and a few critical active site residues. This makes identification of new PD-(D/E)XK nuclease families a challenging task, as they usually escape detection with standard sequence-based methods. We developed a modified transitive meta profile search approach and, to consider the structural diversity of the PD-(D/E)XK nuclease fold more thoroughly, also analyzed below-threshold Meta-BASIC hits to select potentially correct predictions placed among unreliable or incorrect ones. Results: Application of modified transitive Meta-BASIC searches to updated PFAM families and PDB structures resulted in detection of five new PD-(D/E)XK nuclease families encompassing hundreds of so far uncharacterized and poorly annotated proteins. These include four families catalogued in the PFAM database as domains of unknown function (DUF506, DUF524, DUF1626 and DUF1703) and the YhgA-like family of putative transposases. Three of these families represent extremely distant homologs (DUF506, DUF524, and YhgA-like), while two are newly defined in the updated database (DUF1626 and DUF1703). In addition, we confidently identified an extended AAA-ATPase domain in the N-terminal region of DUF1703 family proteins. Conclusion: The results suggest that detailed analysis of below-threshold Meta-BASIC hits may push the limits of distant homology detection further in the 'midnight zone' of homology. All identified families conserve the core evolutionary fold, secondary structure and hydrophobic patterns common to existing PD-(D/E)XK nucleases and maintain critical active site motifs that contribute to nucleic acid cleavage. Further experimental investigations should address the predicted activity and clarify potential substrates, providing further insight into the detailed biological role of these newly detected nucleases.

Homology-extended sequence alignment

Author: Jaap Heringa
Jens Kleinjung
John Romein
Kuang Lin
Publication venue: Oxford University Press
Publication date: 01/01/2005

We present a profile–profile multiple alignment strategy that uses database searching to collect homologues for each sequence in a given set, in order to enrich their available evolutionary information for the alignment. For each of the alignment sequences, the putative homologous sequences that score above a pre-defined threshold are incorporated into a position-specific pre-alignment profile. The enriched position-specific profile is used for standard progressive alignment, thereby more accurately describing the characteristic features of the given sequence set. We show that owing to the incorporation of the pre-alignment information into a standard progressive multiple alignment routine, the alignment quality between distant sequences increases significantly and outperforms state-of-the-art methods, such as T-COFFEE and MUSCLE. We also show that although entirely sequence-based, our novel strategy is better at aligning distant sequences when compared with a recent contact-based alignment method. Therefore, our pre-alignment profile strategy should be advantageous for applications that rely on high alignment accuracy such as local structure prediction, comparative modelling and threading.

CiteSeerX

Distant homologs of anti-apoptotic factor HAX1 encode parvalbumin-like calcium binding proteins

Abstract. Background: Apoptosis is a highly ordered and orchestrated multiphase process controlled by numerous cellular and extracellular signals, which executes programmed cell death via release of cytochrome c, alterations in calcium signaling, caspase-dependent limited proteolysis and DNA fragmentation. Besides the general modifiers of apoptosis, several tissue-specific regulators of this process have been identified, including HAX1 (HS-1 associated protein X-1), an anti-apoptotic factor active in myeloid cells. Although HAX1 has been the subject of various experimental studies, the mechanisms of its action and its functional link to the regulation of apoptosis remain highly speculative. Findings: Here we provide data suggesting that HAX1 may act as a regulator or as a sensor of calcium. On the basis of iterative similarity searches, we identified a set of distant homologs of HAX1 in insects. The applied fold recognition protocol gives strong evidence that the distant insect homologs of HAX1 are novel parvalbumin-like calcium binding proteins. Although the whole three-EF-hand fold is not preserved in vertebrates, our analysis suggests the existence of a potential single EF-hand calcium binding site in HAX1. The molecular mechanism of its action remains to be identified, but the proposed hypothesis is consistent with previously reported data on HAX1 biology and provides a direct link to the regulation of apoptosis. Moreover, we report that another family of myeloid-specific apoptosis regulators, the myeloid leukemia factors (MLF1, MLF2), shares the homologous C-terminal domain and taxonomic distribution with HAX1. Conclusions: The structural and active site analyses gave new insights into the mechanisms of the HAX1 and MLF families in apoptosis and suggested a possible role of HAX1 in calcium binding; the analyses still require further experimental verification.

Springer - Publisher Connector

Molecular determinants archetypical to the phylum Nematoda

Author: Abubucker Sahar
Martin John
McCarter James P
Mitreva Makedonka
Rychlewski Leszek
Wang Zhengyuan
Wilson Richard K
Wyrwicz Lucjan
Yin Yong
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Nematoda diverged from other animals between 600 and 1,200 million years ago and has become one of the most diverse animal phyla on earth. Most nematodes are free-living animals, but many are parasites of plants and animals including humans, posing major ecological and economic challenges around the world. Results: We investigated phylum-specific molecular characteristics in Nematoda by exploring over 214,000 polypeptides from 32 nematode species including 27 parasites. Over 50,000 nematode protein families were identified based on primary sequence, including ~10% with members from at least three different species. Nearly 1,600 of the multi-species families did not share homology to Pfam domains, including a total of 758 restricted to Nematoda. The majority of the 462 families that were conserved among both free-living and parasitic species contained members from multiple nematode clades, yet ~90% of the 296 parasite-specific families originated from only a single clade. Features of these protein families were revealed through extrapolation of essential functions from observed RNAi phenotypes in C. elegans, bioinformatics-based functional annotations, identification of distant homology based on protein folds, and prediction of expression at accessible nematode surfaces. In addition, we identified a group of nematode-restricted sequence features in energy-generating electron transfer complexes as potential targets for new chemicals with minimal or no toxicity to the host. Conclusion: This study identified and characterized the molecular determinants that help define the phylum Nematoda, thereby improving our understanding of nematode protein evolution and providing novel insights for the development of next-generation parasite control strategies.

Springer - Publisher Connector

UGD Academic Repository

Pcons.net: protein structure prediction meta server

Author: Elofsson Arne
Larsson Per
Wallner Björn
Publication venue: Oxford University Press

The Pcons.net Meta Server (http://pcons.net) provides improved automated tools for protein structure prediction and analysis using consensus. It essentially implements all the steps necessary to produce a high quality model of a protein. The whole process is fully automated and a potential user only submits the protein sequence. For PSI-BLAST detectable targets, an accurate model is generated within minutes of submission. For more difficult targets, the sequence is automatically submitted to publicly available fold-recognition servers that use more advanced approaches to find distant structural homologs. The results from these servers are analyzed and assessed for structural correctness using Pcons and ProQ, and the user is presented with a ranked list of possible models. In addition, if the protein sequence contains more than one domain, these are automatically parsed out and resubmitted to the server as individual queries.

PRALINE: a multiple sequence alignment toolbox that integrates homology-extended and secondary structure information

Author: Heringa J.
Simossis V. A.
Publication venue: Oxford University Press
Publication date: 27/06/2005

PRofile ALIgNEment (PRALINE) is a fully customizable multiple sequence alignment application. In addition to a number of available alignment strategies, PRALINE can integrate information from database homology searches to generate a homology-extended multiple alignment. PRALINE also provides a choice of seven different secondary structure prediction programs that can be used individually or in combination as a consensus for integrating structural information into the alignment process. The program can be used through two separate interfaces: one has been designed to cater to more advanced needs of researchers in the field, and the other for standard construction of high confidence alignments. The web-based output is designed to facilitate the comprehensive visualization of the generated alignments by means of five default colour schemes based on: residue type, position conservation, position reliability, residue hydrophobicity and secondary structure, depending on the options set. A user can also define a custom colour scheme by selecting which colour will represent one or more amino acids in the alignment. All generated alignments are also made available in the PDF format for easy figure generation for publications. The grouping of sequences, on which the alignment is based, can also be visualized as a dendrogram. PRALINE is available at

Accurate statistical model of comparison between multiple sequence alignments

Author: Nick V. Grishin
Ruslan I. Sadreyev
Publication venue: Oxford University Press
Publication date: 01/01/2008

Comparison of multiple protein sequence alignments (MSA) reveals unexpected evolutionary relations between protein families and leads to exciting predictions of spatial structure and function. The power of MSA comparison critically depends on the quality of the statistical model used to rank the similarities found in a database search, so that biologically relevant relationships are discriminated from spurious connections. Here, we develop an accurate statistical description of MSA comparison that does not originate from conventional models of single sequence comparison and captures essential features of protein families. As a final result, we compute E-values for the similarity between any two MSA using a mathematical function that depends on MSA lengths and sequence diversity. To develop these estimates of statistical significance, we first establish a procedure for generating realistic alignment decoys that reproduce natural patterns of sequence conservation dictated by protein secondary structure. Second, since similarity scores between these alignments do not follow the classic Gumbel extreme value distribution, we propose a novel distribution that yields statistically perfect agreement with the data. Third, we apply this random model to database searches and show that it surpasses conventional models in the accuracy of detecting remote protein similarities.

CiteSeerX