Protein Fold Recognition Using Neural Networks
Accurately predicting the three-dimensional (3D) structures of proteins from their amino acid sequences alone remains a challenging problem. However, using protein fold recognition tools, it is often possible to achieve good models or at least to gain some more information to aid scientists in their research. This thesis describes the development of TUNE (Threading Using Neural Networks), a fold recognition program using artificial neural network (ANN) models. A new method to generate amino acid substitution matrices is described in chapter two. It uses an ANN to generalise amino acid substitutions observed in protein structure alignments. Matrices for alignment scoring from this approach were compared with classic alignment scoring schemes. From these neural network models, a series of encoding schemes were constructed. These schemes describe the amino acid types with a few numbers. They were generated to replace the orthogonal encoding scheme, so that smaller, faster and more accurate neural network models can be applied to bioinformatics problems. The TUNE model was introduced in chapter four to measure protein sequence-structure compatibility. Given the integrated residue structural environment descriptions, the model predicts probabilities of observing amino acid types in such environments. Using this model, a scoring function to measure the fitness of a residue in a protein structure model can be made for protein threading programs. The model in chapter two was extended by including the residue structural environment descriptions for predictions. A simple protein fold recognition program with a dynamic programming algorithm was developed using this model. The program was then tested in the fourth round of the Critical Assessment of protein Structure Prediction methods (CASP4) and produced reasonably good results
Homology-extended sequence alignment
We present a profile–profile multiple alignment strategy that uses database searching to collect homologues for each sequence in a given set, in order to enrich their available evolutionary information for the alignment. For each of the alignment sequences, the putative homologous sequences that score above a pre-defined threshold are incorporated into a position-specific pre-alignment profile. The enriched position-specific profile is used for standard progressive alignment, thereby more accurately describing the characteristic features of the given sequence set. We show that owing to the incorporation of the pre-alignment information into a standard progressive multiple alignment routine, the alignment quality between distant sequences increases significantly and outperforms state-of-the-art methods, such as T-COFFEE and MUSCLE. We also show that although entirely sequence-based, our novel strategy is better at aligning distant sequences when compared with a recent contact-based alignment method. Therefore, our pre-alignment profile strategy should be advantageous for applications that rely on high alignment accuracy such as local structure prediction, comparative modelling and threading
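The pre-alignment profile step described above can be illustrated with a minimal sketch. It assumes that each database hit has already been aligned to the query (same length, with `-` marking gaps) and carries a score; the function name `build_profile`, the input layout and the use of plain residue frequencies are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def build_profile(query, hits, threshold):
    """Build a position-specific frequency profile for one query sequence.

    query     -- ungapped query sequence
    hits      -- list of (aligned_sequence, score); each aligned sequence has
                 the same length as the query, with '-' marking gaps
                 (hypothetical input format)
    threshold -- only hits scoring strictly above this value are included
    """
    # The query itself always contributes to its own profile.
    selected = [query] + [seq for seq, score in hits if score > threshold]
    profile = []
    for col in range(len(query)):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(seq[col] for seq in selected if seq[col] != "-")
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()})
    return profile


example = build_profile(
    "ACD",
    [("ACE", 50.0), ("GCD", 10.0), ("A-D", 40.0)],
    threshold=30.0,
)
```

In this toy input the hit `GCD` falls below the threshold and is left out, so only the query and two homologues shape the profile. A progressive aligner would then score columns of two such profiles against each other rather than comparing single residues.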
Consensus structural models for the amino terminal domain of the retrovirus restriction gene Fv1 and the Murine Leukaemia Virus capsid proteins
BACKGROUND: The mouse Fv1 (Friend virus) susceptibility gene inhibits the development of the murine leukaemia virus (MLV) by interacting with its capsid (CA) protein. As no structures are available for these proteins, we have constructed molecular models based on distant sequence similarity to other retroviral capsid proteins. RESULTS: Molecular models were constructed for the amino terminal domains of the probable capsid-like structure for the mouse Fv1 gene product and the capsid protein of the MLV. The models were based on sequence alignments with a variety of other retrovirus capsid proteins. As the sequence similarity of these proteins with MLV and especially Fv1 is very distant, a threading method was employed that incorporates predicted secondary structure and multiple sequence information. The resulting models were compared with equivalent models constructed using the sequences of the capsid proteins of known structure. CONCLUSIONS: These comparisons suggested that the MLV model should be accurate in the core but with significant uncertainty in the loop regions. The Fv1 model may have some additional errors in the core packing of its helices, but the resulting model gave some support to the hypothesis that it adopts a capsid-like structure
SAND, a New Protein Family: From Nucleic Acid to Protein Structure and Function Prediction
As a result of genome, EST and cDNA sequencing projects, there are huge numbers of
predicted and/or partially characterised protein sequences compared with a relatively small
number of proteins with experimentally determined function and structure. Thus,
considerable attention is focused on the accurate prediction of gene function and structure
from sequence by using bioinformatics. In the course of our analysis of genomic sequence
from Fugu rubripes, we identified a novel gene, SAND, with significant sequence identity to
hypothetical proteins predicted in Saccharomyces cerevisiae, Schizosaccharomyces pombe,
Caenorhabditis elegans, a Drosophila melanogaster gene, and mouse and human cDNAs.
Here we identify a further SAND homologue in human and Arabidopsis thaliana by use of
standard computational tools. We describe the genomic organisation of SAND in these
evolutionarily divergent species and identify sequence homologues from EST database
searches confirming the expression of SAND in over 20 different eukaryotes. We confirm
the expression of two different SAND paralogues in mammals and determine expression of
one SAND in other vertebrates and eukaryotes. Furthermore, we predict structural
properties of SAND, and characterise conserved sequence motifs in this protein family
Detection and Architecture of Small Heat Shock Protein Monomers
BACKGROUND: Small Heat Shock Proteins (sHSPs) are chaperone-like proteins involved in the prevention of the irreversible aggregation of misfolded proteins. Although many studies have already been conducted on sHSPs, the molecular mechanisms and structural properties of these proteins remain unclear. Here, we provide a better understanding of the architecture, organization and properties of the sHSP family through structural and functional annotations. We focused on the Alpha Crystallin Domain (ACD), a sandwich fold that is the hallmark of the sHSP family. METHODOLOGY/PRINCIPAL FINDINGS: We developed a new approach for detecting sHSPs and delineating ACDs based on an iterative Hidden Markov Model algorithm using a multiple alignment profile generated from structural data on ACD. Using this procedure on the UniProt databank, we found 4478 sequences identified as sHSPs, showing very good coverage of the corresponding PROSITE and Pfam profiles. ACD was then delimited and structurally annotated. We showed that taxonomic-based groups of sHSPs (animals, plants, bacteria) have unique features regarding the length of their ACD and, more specifically, the length of a large loop within ACD. We detailed highly conserved residues and patterns specific to the whole family or to some groups of sHSPs. For 96% of studied sHSPs, we identified in the C-terminal region a conserved I/V/L-X-I/V/L motif that acts as an anchor in the oligomerization process. The fragment defined from the end of ACD to the end of this motif has a mean length of 14 residues and was named the C-terminal Anchoring Module (CAM). CONCLUSIONS/SIGNIFICANCE: This work annotates structural components of ACD and quantifies properties of several thousand sHSPs. It gives a more accurate overview of the architecture of sHSP monomers
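As a small illustration of the I/V/L-X-I/V/L motif mentioned above, the sketch below scans the region after a given ACD end position for that three-residue pattern. It assumes X stands for any single residue and that the ACD boundary is already known; the function name, the toy sequence and the boundary index are hypothetical, and this is not the HMM-based procedure used by the authors.

```python
import re

# Assumption: I/V/L-X-I/V/L is read as [IVL], any residue, [IVL].
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=([IVL].[IVL]))")


def find_motif_positions(sequence, acd_end):
    """Return 0-based start positions of the motif at or after acd_end.

    sequence -- protein sequence in one-letter upper-case code
    acd_end  -- index of the first residue after the ACD (hypothetical input)
    """
    c_terminal = sequence[acd_end:]
    return [acd_end + m.start() for m in MOTIF.finditer(c_terminal)]


# Toy sequence, not a real sHSP.
positions = find_motif_positions("MKTAAAAGGIPIKE", acd_end=7)
overlapping = find_motif_positions("AAAVLIL", acd_end=0)
```

Given the ACD end and the first motif position, the length of the fragment the abstract calls the CAM would follow as `position + 3 - acd_end` under these assumptions.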
Multidisciplinary ecosystem to study lifecourse determinants and prevention of early-onset burdensome multimorbidity (MELD-B) – protocol for a research collaboration
Background: Most people living with multiple long-term condition multimorbidity (MLTC-M) are under 65 (defined as ‘early onset’). Earlier and greater accrual of long-term conditions (LTCs) may be influenced by the timing and nature of exposure to key risk factors, wider determinants or other LTCs at different life stages. We have established a research collaboration titled ‘MELD-B’ to understand how wider determinants, sentinel conditions (the first LTC in the lifecourse) and LTC accrual sequence affect risk of early-onset, burdensome MLTC-M, and to inform prevention interventions. Aim: Our aim is to identify critical periods in the lifecourse for prevention of early-onset, burdensome MLTC-M, identified through the analysis of birth cohorts and electronic health records, including artificial intelligence (AI)-enhanced analyses. Design: We will develop deeper understanding of ‘burdensomeness’ and ‘complexity’ through a qualitative evidence synthesis and a consensus study. Using safe data environments for analyses across large, representative routine healthcare datasets and birth cohorts, we will apply AI methods to identify early-onset, burdensome MLTC-M clusters and sentinel conditions, develop semi-supervised learning to match individuals across datasets, identify determinants of burdensome clusters, and model trajectories of LTC and burden accrual. We will characterise early-life (under 18 years) risk factors for early-onset, burdensome MLTC-M and sentinel conditions. Finally, using AI and causal inference modelling, we will model potential ‘preventable moments’, defined as time periods in the life course where there is an opportunity for intervention on risk factors and early determinants to prevent the development of MLTC-M. Patient and public involvement is integrated throughout
: peel it
Three-dimensional structures of proteins underpin their biological functions. Their folds are maintained by inter-residue interactions, which are a main focus in understanding the mechanisms of protein folding and stability. Furthermore, protein structures can be composed of single or multiple functional domains that can fold and function independently. Hence, dividing a protein into domains is useful for obtaining an accurate structure and function determination. In previous studies, we characterised protein contact properties according to different definitions and developed a novel methodology named Protein Peeling. Within protein structures, Protein Peeling characterizes small successive compact units along the sequence called protein units (PUs). The cutting done by Protein Peeling maximizes the number of contacts within the PUs and minimizes the number of contacts between them. This method is thus a relevant tool in the context of protein folding research, particularly regarding the hierarchical model proposed by George Rose. Here, we analyze the PUs in detail at different levels of cutting, using a non-redundant protein databank. Distributions of PU sizes, numbers of PUs and their accessibility are screened to determine their common and different features. Moreover, we highlight the preferential amino acid interactions inside and between PUs. Our results show that PUs are clearly an intermediate level between secondary structures and protein structural domains
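The cutting criterion described above (many contacts within units, few between them) can be sketched for the simplest case of a single cut. The code assumes a symmetric binary contact matrix and simply picks the cut with the fewest contacts crossing it; this is a simplified stand-in, since the abstract does not spell out the actual Protein Peeling scoring function, and all names are hypothetical.

```python
def contact_counts(contacts, cut):
    """Count contacts within and between segments [0, cut) and [cut, n)."""
    n = len(contacts)
    within = between = 0
    for i in range(n):
        for j in range(i + 1, n):
            if contacts[i][j]:
                # Both residues on the same side of the cut: intra-unit contact.
                if (i < cut) == (j < cut):
                    within += 1
                else:
                    between += 1
    return within, between


def best_single_cut(contacts, min_size=2):
    """Return (cut, between, within) for the cut with the fewest crossing contacts.

    min_size is an assumed lower bound on the length of each resulting unit.
    Ties are resolved in favour of the earliest cut position.
    """
    n = len(contacts)
    best = None
    for cut in range(min_size, n - min_size + 1):
        within, between = contact_counts(contacts, cut)
        if best is None or between < best[1]:
            best = (cut, between, within)
    return best


# Toy example: residues 0-2 and 3-5 form two compact blocks linked by one contact.
pairs = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
matrix = [[0] * 6 for _ in range(6)]
for a, b in pairs:
    matrix[a][b] = matrix[b][a] = 1

result = best_single_cut(matrix)
```

Applying the same search recursively inside each resulting segment would give successive levels of cutting, in the spirit of the hierarchical analysis the abstract describes.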
ARGO: a web system for the detection of degenerate motifs and large-scale recognition of eukaryotic promoters
Reliable recognition of the promoters in eukaryotic genomes remains an open issue. This is largely owing to the poor understanding of the features of the structural–functional organization of the eukaryotic promoters essential for their function and recognition. However, it was demonstrated that detection of ensembles of regulatory signals characteristic of specific promoter groups increases the accuracy of promoter recognition and prediction of specific expression features of the queried genes. The ARGO_Motifs package was developed for the detection of sets of region-specific degenerate oligonucleotide motifs in the regulatory regions of the eukaryotic genes. The ARGO_Viewer package was developed for the recognition of tissue-specific gene promoters based on the presence and distribution of oligonucleotide motifs obtained by the ARGO_Motifs program. Analysis and recognition of tissue-specific promoters in five gene samples demonstrated high quality of promoter recognition. The public version of the ARGO system is available at and
Serverification of Molecular Modeling Applications: the Rosetta Online Server that Includes Everyone (ROSIE)
The Rosetta molecular modeling software package provides experimentally
tested and rapidly evolving tools for the 3D structure prediction and
high-resolution design of proteins, nucleic acids, and a growing number of
non-natural polymers. Despite its free availability to academic users and
improving documentation, use of Rosetta has largely remained confined to
developers and their immediate collaborators due to the code's difficulty of
use, the requirement for large computational resources, and the unavailability
of servers for most of the Rosetta applications. Here, we present a unified web
framework for Rosetta applications called ROSIE (Rosetta Online Server that
Includes Everyone). ROSIE provides (a) a common user interface for Rosetta
protocols, (b) a stable application programming interface for developers to add
additional protocols, (c) a flexible back-end to allow leveraging of computer
cluster resources shared by RosettaCommons member institutions, and (d)
centralized administration by the RosettaCommons to ensure continuous
maintenance. This paper describes the ROSIE server infrastructure, a
step-by-step 'serverification' protocol for use by Rosetta developers, and the
deployment of the first nine ROSIE applications by six separate developer
teams: Docking, RNA de novo, ERRASER, Antibody, Sequence Tolerance,
Supercharge, Beta peptide design, NCBB design, and VIP redesign. As illustrated
by the number and diversity of these applications, ROSIE offers a general and
speedy paradigm for serverification of Rosetta applications that incurs
negligible cost to developers and lowers barriers to Rosetta use for the
broader biological community. ROSIE is available at
http://rosie.rosettacommons.org