502 research outputs found

Protein Fold Recognition Using Neural Networks

Author: Lin Guang
Publication date: 01/01/2003

Accurately predicting the three-dimensional (3D) structures of proteins from their amino acid sequences alone remains a challenging problem. However, using protein fold recognition tools, it is often possible to obtain good models, or at least to gain additional information to aid scientists in their research. This thesis describes the development of TUNE (Threading Using Neural Networks), a fold recognition program using artificial neural network (ANN) models. A new method to generate amino acid substitution matrices is described in chapter two. It uses an ANN to generalise amino acid substitutions observed in protein structure alignments. Matrices for alignment scoring from this approach were compared with classic alignment scoring schemes. From these neural network models, a series of encoding schemes were constructed. These schemes describe the amino acid types with a few numbers. They were generated to replace the orthogonal encoding scheme, so that smaller, faster and more accurate neural network models can be applied to bioinformatic problems. The TUNE model was introduced in chapter four to measure protein sequence-structure compatibility. Given the integrated residue structural environment descriptions, the model predicts probabilities of observing amino acid types in such environments. Using this model, a scoring function to measure the fitness of a residue in a protein structure model can be made for protein threading programs. The model in chapter two was extended by including the residue structural environment descriptions for predictions. A simple protein fold recognition program with a dynamic programming algorithm was developed using this model. The program was then tested in the fourth round of the Critical Assessment of protein Structure Prediction methods (CASP4) and produced reasonably good results.
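The dynamic programming step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a precomputed matrix `score[i][j]` giving the fitness of sequence residue `i` at structure position `j` (for example, a log-probability from an environment model) and a linear gap penalty. The function name, the gap value and the matrix layout are illustrative assumptions, not details taken from the thesis.

```python
def thread_score(score, gap=-1.0):
    """Best global alignment score of a sequence against a structure.

    score[i][j] is an assumed, precomputed fitness of sequence residue i
    placed at structure position j; gap is an assumed linear gap penalty.
    """
    n = len(score)
    m = len(score[0]) if score else 0
    # dp[i][j]: best score aligning the first i residues to the first j positions
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score[i - 1][j - 1],  # residue i at position j
                dp[i - 1][j] + gap,                      # residue i left unaligned
                dp[i][j - 1] + gap,                      # position j left unfilled
            )
    return dp[n][m]
```

Ranking candidate folds then amounts to calling `thread_score` once per template and sorting by the returned value. A real threading program would also need a traceback to recover the alignment itself.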

Open Research Online (The Open University)

OpenGrey Repository

TSTMP: target selection for structural genomics of human transmembrane proteins

Author: Dobson László
Reményi István
Tusnády Gábor
Varga Julia Kornélia
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2017

Repository of the Academy's Library

Homology-extended sequence alignment

Author: Jaap Heringa
Jens Kleinjung
John Romein
Kuang Lin
Publication venue: Oxford University Press
Publication date: 01/01/2005

We present a profile–profile multiple alignment strategy that uses database searching to collect homologues for each sequence in a given set, in order to enrich their available evolutionary information for the alignment. For each of the alignment sequences, the putative homologous sequences that score above a pre-defined threshold are incorporated into a position-specific pre-alignment profile. The enriched position-specific profile is used for standard progressive alignment, thereby more accurately describing the characteristic features of the given sequence set. We show that owing to the incorporation of the pre-alignment information into a standard progressive multiple alignment routine, the alignment quality between distant sequences increases significantly and outperforms state-of-the-art methods, such as T-COFFEE and MUSCLE. We also show that although entirely sequence-based, our novel strategy is better at aligning distant sequences than a recent contact-based alignment method. Therefore, our pre-alignment profile strategy should be advantageous for applications that rely on high alignment accuracy such as local structure prediction, comparative modelling and threading.
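The pre-alignment profile idea can be sketched roughly as follows. The sketch assumes each database hit has already been aligned to the query (same length, `-` for gaps) and carries a search score. The names `build_profile` and `column_score`, and the dot-product column score, are illustrative assumptions. The abstract does not specify the scoring scheme the authors used.

```python
from collections import Counter

def build_profile(query, hits, threshold):
    """Position-specific frequency profile for one query sequence.

    hits is an assumed list of (aligned_sequence, score) pairs; only hits
    scoring at or above threshold are merged with the query.
    """
    kept = [query] + [seq for seq, score in hits if score >= threshold]
    profile = []
    for column in zip(*kept):
        counts = Counter(c for c in column if c != "-")  # ignore gap symbols
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()})
    return profile

def column_score(p, q):
    """Assumed similarity of two profile columns: dot product of frequencies."""
    return sum(freq * q.get(aa, 0.0) for aa, freq in p.items())
```

A progressive aligner could then use `column_score` in place of a residue-to-residue substitution score when aligning two enriched profiles.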

CiteSeerX

Crossref

PubMed Central

Consensus structural models for the amino terminal domain of the retrovirus restriction gene Fv1 and the Murine Leukaemia Virus capsid proteins

Author: Stoye Jonathan P
Taylor William R
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: The mouse Fv1 (Friend virus) susceptibility gene inhibits the development of the murine leukaemia virus (MLV) by interacting with its capsid (CA) protein. As no structures are available for these proteins, we have constructed molecular models based on distant sequence similarity to other retroviral capsid proteins. RESULTS: Molecular models were constructed for the amino terminal domains of the probable capsid-like structure for the mouse Fv1 gene product and the capsid protein of the MLV. The models were based on sequence alignments with a variety of other retrovirus capsid proteins. As the sequence similarity of these proteins with MLV and especially Fv1 is very distant, a threading method was employed that incorporates predicted secondary structure and multiple sequence information. The resulting models were compared with equivalent models constructed using the sequences of the capsid proteins of known structure. CONCLUSIONS: These comparisons suggested that the MLV model should be accurate in the core but with significant uncertainty in the loop regions. The Fv1 model may have some additional errors in the core packing of its helices, but the resulting model gave some support to the hypothesis that it adopts a capsid-like structure.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

SAND, a New Protein Family: From Nucleic Acid to Protein Structure and Function Prediction

Author: Amanda Cottage
Greg Elgar
Yvonne J. K. Edwards
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2001

As a result of genome, EST and cDNA sequencing projects, there are huge numbers of predicted and/or partially characterised protein sequences compared with a relatively small number of proteins with experimentally determined function and structure. Thus, considerable attention is focused on the accurate prediction of gene function and structure from sequence by using bioinformatics. In the course of our analysis of genomic sequence from Fugu rubripes, we identified a novel gene, SAND, with significant sequence identity to hypothetical proteins predicted in Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, a Drosophila melanogaster gene, and mouse and human cDNAs. Here we identify a further SAND homologue in human and Arabidopsis thaliana by use of standard computational tools. We describe the genomic organisation of SAND in these evolutionarily divergent species and identify sequence homologues from EST database searches confirming the expression of SAND in over 20 different eukaryotes. We confirm the expression of two different SAND paralogues in mammals and determine expression of one SAND in other vertebrates and eukaryotes. Furthermore, we predict structural properties of SAND, and characterise conserved sequence motifs in this protein family.

Crossref

Directory of Open Access Journals

PubMed Central

Detection and Architecture of Small Heat Shock Protein Monomers

Author: Flatters Delphine
Gelly Jean-Christophe
Poulain Pierre
Publication venue: Public Library of Science
Publication date: 01/01/2010

BACKGROUND: Small Heat Shock Proteins (sHSPs) are chaperone-like proteins involved in the prevention of the irreversible aggregation of misfolded proteins. Although many studies have already been conducted on sHSPs, the molecular mechanisms and structural properties of these proteins remain unclear. Here, we aim to provide a better understanding of the architecture, organization and properties of the sHSP family through structural and functional annotations. We focused on the Alpha Crystallin Domain (ACD), a sandwich fold that is the hallmark of the sHSP family. METHODOLOGY/PRINCIPAL FINDINGS: We developed a new approach for detecting sHSPs and delineating ACDs based on an iterative Hidden Markov Model algorithm using a multiple alignment profile generated from structural data on ACD. Using this procedure on the UniProt databank, we found 4478 sequences identified as sHSPs, showing very good coverage with the corresponding PROSITE and Pfam profiles. ACD was then delimited and structurally annotated. We showed that taxonomic-based groups of sHSPs (animals, plants, bacteria) have unique features regarding the length of their ACD and, more specifically, the length of a large loop within ACD. We detailed highly conserved residues and patterns specific to the whole family or to some groups of sHSPs. For 96% of studied sHSPs, we identified in the C-terminal region a conserved I/V/L-X-I/V/L motif that acts as an anchor in the oligomerization process. The fragment defined from the end of ACD to the end of this motif has a mean length of 14 residues and was named the C-terminal Anchoring Module (CAM). CONCLUSIONS/SIGNIFICANCE: This work annotates structural components of ACD and quantifies properties of several thousand sHSPs. It gives a more accurate overview of the architecture of sHSP monomers.
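As a rough illustration of the I/V/L-X-I/V/L motif described above, the pattern can be written as a regular expression and searched for downstream of the ACD. The function name and the `acd_end` index (0-based position where the ACD ends) are hypothetical. The abstract does not say how the authors located the motif, so this is only a sketch of the pattern itself.

```python
import re

# I, V or L, then any residue, then I, V or L.
# The lookahead lets overlapping occurrences be reported as well.
ANCHOR_MOTIF = re.compile(r"(?=([IVL].[IVL]))")

def find_anchor_motifs(sequence, acd_end):
    """Return (position, tripeptide) pairs for motif hits after acd_end.

    acd_end is an assumed, externally supplied boundary of the ACD.
    """
    tail = sequence[acd_end:]
    return [(acd_end + m.start(), m.group(1)) for m in ANCHOR_MOTIF.finditer(tail)]
```

For example, `find_anchor_motifs("MKAAAIPIGG", 3)` reports the `IPI` tripeptide starting at position 5.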

Public Library of Science (PLOS)

CiteSeerX

HAL-Inserm

Directory of Open Access Journals

PubMed Central

Hal-Diderot

Multidisciplinary ecosystem to study lifecourse determinants and prevention of early-onset burdensome multimorbidity (MELD-B) – protocol for a research collaboration

Author: Ashley Akbari
Rhiannon Owen
Roberta Chiovoloni
Publication venue: SAGE Publications
Publication date: 01/01/2023

Background: Most people living with multiple long-term condition multimorbidity (MLTC-M) are under 65 (defined as ‘early onset’). Earlier and greater accrual of long-term conditions (LTCs) may be influenced by the timing and nature of exposure to key risk factors, wider determinants or other LTCs at different life stages. We have established a research collaboration titled ‘MELD-B’ to understand how wider determinants, sentinel conditions (the first LTC in the lifecourse) and LTC accrual sequence affect risk of early-onset, burdensome MLTC-M, and to inform prevention interventions. Aim: Our aim is to identify critical periods in the lifecourse for prevention of early-onset, burdensome MLTC-M, identified through the analysis of birth cohorts and electronic health records, including artificial intelligence (AI)-enhanced analyses. Design: We will develop deeper understanding of ‘burdensomeness’ and ‘complexity’ through a qualitative evidence synthesis and a consensus study. Using safe data environments for analyses across large, representative routine healthcare datasets and birth cohorts, we will apply AI methods to identify early-onset, burdensome MLTC-M clusters and sentinel conditions, develop semi-supervised learning to match individuals across datasets, identify determinants of burdensome clusters, and model trajectories of LTC and burden accrual. We will characterise early-life (under 18 years) risk factors for early-onset, burdensome MLTC-M and sentinel conditions. Finally, using AI and causal inference modelling, we will model potential ‘preventable moments’, defined as time periods in the life course where there is an opportunity for intervention on risk factors and early determinants to prevent the development of MLTC-M. Patient and public involvement is integrated throughout

Cronfa at Swansea University

: peel it

Author: Bornot Aurélie
De Brevern Alexandre
Faure Guilhem
Publication venue: 'Elsevier BV'
Publication date: 01/07/2009

Three-dimensional structures of proteins are the support of their biological functions. Their folds are maintained by inter-residue interactions, which are one of the main focuses for understanding the mechanisms of protein folding and stability. Furthermore, protein structures can be composed of single or multiple functional domains that can fold and function independently. Hence, dividing a protein into domains is useful for obtaining an accurate structure and function determination. In previous studies, we elucidated protein contact properties according to different definitions and developed a novel methodology named Protein Peeling. Within protein structures, Protein Peeling characterizes small successive compact units along the sequence called protein units (PUs). The cutting done by Protein Peeling maximizes the number of contacts within the PUs and minimizes the number of contacts between them. This method is thus a relevant tool in protein folding research, particularly with regard to the hierarchical model proposed by George Rose. Here, we accurately analyze the PUs at different levels of cutting, using a non-redundant protein databank. The distribution of PU sizes, the number of PUs and their accessibility are screened to determine their common and different features. Moreover, we highlight the preferential amino acid interactions inside and between PUs. Our results show that PUs are clearly an intermediate level between secondary structures and protein structural domains.

HAL-Inserm

Hal-Diderot

ARGO: a web system for the detection of degenerate motifs and large-scale recognition of eukaryotic promoters

Author: Kolchanov Nikolay A.
Vishnevsky Oleg V.
Publication venue: Oxford University Press
Publication date: 01/01/2005

Reliable recognition of the promoters in eukaryotic genomes remains an open issue. This is largely owing to the poor understanding of the features of the structural–functional organization of the eukaryotic promoters essential for their function and recognition. However, it was demonstrated that detection of ensembles of regulatory signals characteristic of specific promoter groups increases the accuracy of promoter recognition and prediction of specific expression features of the queried genes. The ARGO_Motifs package was developed for the detection of sets of region-specific degenerate oligonucleotide motifs in the regulatory regions of the eukaryotic genes. The ARGO_Viewer package was developed for the recognition of tissue-specific gene promoters based on the presence and distribution of oligonucleotide motifs obtained by the ARGO_Motifs program. Analysis and recognition of tissue-specific promoters in five gene samples demonstrated high quality of promoter recognition. The public version of the ARGO system is available at and
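To make the notion of a degenerate oligonucleotide motif concrete, here is a small sketch that expands a motif written in the standard IUPAC nucleotide codes into a regular expression and lists where it occurs in a sequence. This is not the ARGO_Motifs algorithm, which the abstract does not describe. The function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def motif_to_regex(motif):
    """Compile a degenerate motif into a regex that also finds overlapping hits."""
    body = "".join("[" + IUPAC[base] + "]" for base in motif.upper())
    return re.compile("(?=" + body + ")")

def motif_positions(motif, sequence):
    """0-based start positions of every occurrence of motif in sequence."""
    return [m.start() for m in motif_to_regex(motif).finditer(sequence.upper())]
```

Counting such positions across promoter and non-promoter regions is one simple way to see whether a motif is enriched in a given region.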

CiteSeerX

Crossref

PubMed Central

Serverification of Molecular Modeling Applications: the Rosetta Online Server that Includes Everyone (ROSIE)

Author: Bonneau Richard
Borgo Benjamin
Chou Fang-Chieh
Conchúir Shane Ó
Das Rhiju
Der Bryan S.
Drew Kevin
Gray Jeffrey J.
Havranek James J.
Kortemme Tanja
Kuhlman Brian
Kuroda Daisuke
Lyskov Sergey
Renfrew P. Douglas
Sripakdeevong Parin
Weitzner Brian D.
Xu Jianqing
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

The Rosetta molecular modeling software package provides experimentally tested and rapidly evolving tools for the 3D structure prediction and high-resolution design of proteins, nucleic acids, and a growing number of non-natural polymers. Despite its free availability to academic users and improving documentation, use of Rosetta has largely remained confined to developers and their immediate collaborators due to the code's difficulty of use, the requirement for large computational resources, and the unavailability of servers for most of the Rosetta applications. Here, we present a unified web framework for Rosetta applications called ROSIE (Rosetta Online Server that Includes Everyone). ROSIE provides (a) a common user interface for Rosetta protocols, (b) a stable application programming interface for developers to add additional protocols, (c) a flexible back-end to allow leveraging of computer cluster resources shared by RosettaCommons member institutions, and (d) centralized administration by the RosettaCommons to ensure continuous maintenance. This paper describes the ROSIE server infrastructure, a step-by-step 'serverification' protocol for use by Rosetta developers, and the deployment of the first nine ROSIE applications by six separate developer teams: Docking, RNA de novo, ERRASER, Antibody, Sequence Tolerance, Supercharge, Beta peptide design, NCBB design, and VIP redesign. As illustrated by the number and diversity of these applications, ROSIE offers a general and speedy paradigm for serverification of Rosetta applications that incurs negligible cost to developers and lowers barriers to Rosetta use for the broader biological community. ROSIE is available at http://rosie.rosettacommons.org

arXiv.org e-Print Archive

Directory of Open Access Journals

Digital Commons@Becker

PubMed Central

Carolina Digital Repository