    Fast protein superfamily classification using principal component null space analysis.

    The protein family classification problem, which consists of determining the family memberships of unknown protein sequences, is important to biologists for many practical reasons, such as drug discovery, prediction of molecular function and medical diagnosis. Neural networks and Bayesian methods have performed well on the protein classification problem, achieving accuracies of 90% to 98%, but they are relatively slow in the learning stage. In this thesis, we apply a principal component null space analysis (PCNSA) linear classifier to the problem and report excellent results compared with those of neural networks and support vector machines. The two main parameters of PCNSA are linked to the high dimensionality of the dataset used, and were optimized exhaustively to maximize accuracy. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .F74. Source: Masters Abstracts International, Volume: 44-03, page: 1400. Thesis (M.Sc.)--University of Windsor (Canada), 2005
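The classifier named above can be sketched in a few lines of NumPy. This is a minimal sketch assuming the standard PCNSA formulation (a global PCA step, then for each class an approximate null space spanned by the lowest-variance eigenvectors of that class's covariance, with classification by the smallest distance to a class mean measured inside that class's null space). It is not the thesis's own code. The function names `fit_pcnsa`/`predict_pcnsa` and the parameters `n_pcs` (PCA dimension) and `n_null` (null-space dimension) are hypothetical; we assume they correspond to the two main parameters mentioned in the abstract. Protein sequences would first have to be encoded as fixed-length feature vectors, which is also an assumption here.

```python
import numpy as np

def fit_pcnsa(X, y, n_pcs, n_null):
    """Fit a minimal PCNSA-style classifier (hypothetical sketch).

    X: (n_samples, n_features) feature vectors; y: class labels.
    n_pcs: dimensions kept by the global PCA step (assumed parameter).
    n_null: dimensions of each class's approximate null space (assumed parameter).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Step 1: global PCA to reduce dimensionality.
    global_mean = X.mean(axis=0)
    Xc = X - global_mean
    # Rows of vt are principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    P = vt[:n_pcs].T  # (n_features, n_pcs)
    Z = Xc @ P
    model = {"mean": global_mean, "P": P, "classes": []}
    # Step 2: per-class approximate null space in the PCA space.
    for label in np.unique(y):
        Zi = Z[y == label]
        mu = Zi.mean(axis=0)
        cov = np.atleast_2d(np.cov(Zi, rowvar=False))
        # eigh returns eigenvalues in ascending order, so the first n_null
        # eigenvectors span the class's lowest-variance directions.
        _, vecs = np.linalg.eigh(cov)
        model["classes"].append((label, mu, vecs[:, :n_null]))
    return model

def predict_pcnsa(model, X):
    """Assign each row of X to the class with the smallest null-space distance."""
    Z = (np.asarray(X, dtype=float) - model["mean"]) @ model["P"]
    dists = np.stack(
        [np.linalg.norm((Z - mu) @ W, axis=1) for _, mu, W in model["classes"]],
        axis=1,
    )
    labels = np.array([label for label, _, _ in model["classes"]])
    return labels[np.argmin(dists, axis=1)]
```

The null-space distance ignores each class's high-variance directions, which is why classes that overlap heavily along those directions can still be separated. An exhaustive search over `n_pcs` and `n_null`, as the abstract describes, would simply wrap these two functions in a grid loop scored on held-out data.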

    Crystal structure of archaeal RNase HII: a homologue of human major RNase H

    Background: RNases H are present in all organisms and cleave RNA in RNA/DNA hybrids. There are two major types of RNase H that have little similarity in sequence, size and specificity. The structure of RNase HI, the smaller enzyme and the most abundant in bacteria, has been extensively studied. However, no structural information is available for the larger RNase H, which is most abundant in eukaryotes and archaea. Mammalian RNase H participates in DNA replication, in the removal of Okazaki fragments and possibly in DNA repair. Results: The crystal structure of RNase HII from the hyperthermophile Methanococcus jannaschii, which is homologous to mammalian RNase H, was solved at 2 Å resolution using multiwavelength anomalous dispersion (MAD) phasing. The structure contains two compact domains. Despite the absence of sequence similarity, the large N-terminal domain shares a similar fold with bacterial RNase HI. The active site of RNase HII contains three aspartates: Asp7, Asp112 and Asp149. The nucleotide-binding site is located in the cleft between the N-terminal and C-terminal domains. Conclusions: Despite the lack of any detectable similarity in primary structure, RNase HII shares a similar structural domain with RNase HI, suggesting that the two classes of RNase H have a common catalytic mechanism and possibly a common evolutionary origin. The involvement of the unique C-terminal domain in substrate recognition explains the different reaction specificities observed between the two classes of RNase H.

    In Silico Prediction and Analysis of Caenorhabditis EF-hand Containing Proteins

    Calcium (Ca2+) is a ubiquitous messenger in eukaryotes, including Caenorhabditis. Ca2+-mediated signalling is usually carried out through well-characterized proteins such as calmodulin (CaM) and other Ca2+-binding proteins (CaBPs). These proteins interact with different targets and activate them by inducing conformational changes. The majority of EF-hand proteins in Caenorhabditis contain Ca2+-binding motifs. Here, we performed homology modelling of CaM-like proteins using the crystal structure of Drosophila melanogaster CaM as a template. Molecular docking was applied to explore the binding mechanism of CaM-like proteins and the IQ1 motif, a ∼25-residue motif conforming to the consensus sequence (I,L,V)QXXXRXXXX(R,K) that serves as a binding site for different EF-hand proteins. By analysing the complete genome sequence, we attempted to identify all EF-hand-containing proteins in Caenorhabditis (the EF-hand is a helix-loop-helix structure characterized by a 12-residue loop involved in metal coordination), together with their Ca2+-binding affinities. Docking studies revealed that residues F165, F169, L29, E33, F44, L57, M61, M96, M97, M108, G65, V115, F93, N104 and E144 of the CaM-like protein are involved in the interaction with the IQ1 motif. A maximum of 170 EF-hand proteins and 39 non-EF-hand proteins with a Ca2+/metal-binding motif were identified. Diverse proteins, including enzymes, proteins involved in transcription and translation, and a large number of uncharacterized proteins, have one or more putative EF-hands. Phylogenetic analysis revealed seven major classes/groups containing several protein families. The various domains we identified in the uncharacterized EF-hand proteins should help elucidate their functions. This is the first report of its kind in which the calcium-binding loop sequences of EF-hand proteins were analysed to decipher their calcium affinities. Variation in the Ca2+-binding affinity of EF-hand CaBPs could be used in further studies of the behaviour of these proteins. Our analyses suggest that Ca2+ is likely a key player in Caenorhabditis cell signalling.
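The IQ consensus quoted above maps directly onto a regular expression, which is one way such motifs can be located in protein sequences. This is a minimal Python sketch, assuming X stands for any single residue; it matches only the 11-residue consensus core, not the full ∼25-residue IQ1 motif, and the function name `find_iq_motifs` is hypothetical, not a tool used in the study.

```python
import re
from typing import List, Tuple

# Consensus from the abstract: (I,L,V)QXXXRXXXX(R,K), with X = any residue.
# The zero-width lookahead lets overlapping hits be reported.
IQ_CONSENSUS = re.compile(r"(?=([ILV]Q.{3}R.{4}[RK]))")

def find_iq_motifs(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched 11-residue core) for each consensus hit."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in IQ_CONSENSUS.finditer(seq)]
```

For example, on the made-up test string `MAAIQKAARGYLARKQ` the function returns `[(3, "IQKAARGYLAR")]`. A pattern scan like this only flags candidates; it says nothing about binding affinity, which the study addressed by docking.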

    Pervasive Cryptic Epistasis in Molecular Evolution

    The functional effects of most amino acid replacements accumulated during molecular evolution are unknown, because most are not observed naturally and the possible combinations are too numerous. We created 168 single mutations in wild-type Escherichia coli isopropylmalate dehydrogenase (IMDH) that match the differences found in wild-type Pseudomonas aeruginosa IMDH. Of these, 104 mutant enzymes performed similarly to E. coli wild-type IMDH, one was functionally enhanced, and 63 were functionally compromised. The transition from E. coli IMDH, or an ancestral form, to the functional wild-type P. aeruginosa IMDH therefore requires extensive epistasis to ameliorate the combined effects of the deleterious mutations. This result stands in marked contrast to a basic assumption of molecular phylogenetics: that sites in sequences evolve independently of each other. Residues that affect function are scattered haphazardly throughout the IMDH structure. We screened for compensatory mutations at three sites, all of which lie near the active site and all of which are among the least active mutants. No compensatory mutations were found at two of these sites, indicating that a single site may engage in compound epistatic interactions. One complete and three partial compensatory mutations for the third site are remote from it and lie in a different domain, demonstrating that epistatic interactions can occur between distant (>20 Å) sites. Phylogenetic analysis shows that incompatible mutations were fixed in different lineages.
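The independence assumption contrasted above has a simple quantitative form: if two sites act independently, the double mutant's relative activity is the product of the single-mutant effects, and epistasis is the deviation from that product on a log scale. A minimal Python sketch of this standard definition follows; it is not the method of this study, and the function name and activity values are hypothetical.

```python
import math

def log_epistasis(w_wt: float, w_a: float, w_b: float, w_ab: float) -> float:
    """Pairwise epistasis on a log scale (multiplicative null model).

    Inputs are activities (or fitnesses) of the wild type, the two single
    mutants and the double mutant. Returns 0 when the double mutant equals
    the product of the single-mutant effects, i.e. the sites act independently.
    """
    # Log effect of each mutant genotype relative to the wild type.
    d_a = math.log(w_a / w_wt)
    d_b = math.log(w_b / w_wt)
    d_ab = math.log(w_ab / w_wt)
    # Observed double-mutant effect minus the independent (additive-in-log) expectation.
    return d_ab - (d_a + d_b)
```

For instance, if two hypothetical mutations each halve activity (0.5 of wild type), a double mutant at 0.25 gives zero epistasis, whereas a double mutant restored to full activity (1.0) gives 2 ln 2 ≈ 1.39, the positive, compensatory kind of interaction the screen above looked for.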

    MESSM: a framework for protein threading by neural networks and support vector machines

    Protein threading, also referred to as fold recognition, aligns a probe amino acid sequence onto a library of representative folds of known structure to identify structural similarity. Following the structural profile approach to threading, this research focused on developing and evaluating a new framework, Mixed Environment Specific Substitution Mapping (MESSM), for protein threading by artificial neural networks (ANNs) and support vector machines (SVMs). MESSM provides a new process for building an efficient protein fold recognition tool: it improves efficiency while retaining prediction effectiveness. MESSM has three key components, each of which is a step in the protein threading framework. First, building the fold profile library: given a protein structure with a residue-level environmental description, neural networks are used to generate an environment-specific amino acid substitution (3D-1D) mapping. Second, mixed substitution mapping: a mixed environment-specific substitution mapping is developed by combining the structure-derived substitution score with a sequence profile from well-developed amino acid substitution matrices. Third, confidence evaluation: a support vector machine is employed to measure the significance of the sequence-structure alignment. Four computational experiments, on the Fischer, ProSup, Lindahl and Wallner benchmarks, were carried out to verify the performance of MESSM. On the Fischer, Lindahl and Wallner benchmarks, MESSM achieved fold recognition performance comparable to that of energy-potential-based threading models. On the Fischer benchmark, MESSM correctly recognized 56 of 68 pairs, the same as COBLATH and SPARKS. The computational experiments show that MESSM is fast: it aligned a probe sequence of 150 amino acids against the profiles of 4775 template proteins in 30 seconds on a Pentium IV PC with 1 GB of memory. On the ProSup benchmark, MESSM achieved an alignment accuracy of 59.7%, which is better than that of current models. The research was extended to develop a threading score following the contact potential approach to threading. A TES (Threading with Environment-specific Score) model was constructed using neural networks.
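The core threading step, scoring a probe sequence against a fold profile whose per-position scores blend a structure-derived mapping with a sequence-derived one, can be illustrated with a minimal Python sketch. It assumes a simple linear blend with a hypothetical weight `w` and a plain global (Needleman-Wunsch) dynamic-programming alignment with a linear gap penalty. MESSM's actual neural-network-derived mappings, mixing rule and SVM confidence step are not reproduced here, and all names are hypothetical.

```python
from typing import Dict, List

# Hypothetical profile: profile[i][aa] = score for residue `aa` at template position i.
Profile = List[Dict[str, float]]

def mix_profiles(struct_profile: Profile, seq_profile: Profile, w: float) -> Profile:
    """Linearly blend structure-derived and sequence-derived scores (assumed mixing rule)."""
    return [
        {aa: w * s[aa] + (1.0 - w) * q[aa] for aa in s}
        for s, q in zip(struct_profile, seq_profile)
    ]

def align_score(probe: str, profile: Profile, gap: float = -2.0) -> float:
    """Global alignment score of a probe sequence onto a profile (linear gap penalty)."""
    n, m = len(probe), len(profile)
    # dp[i][j] = best score aligning probe[:i] with the first j profile positions.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Residues missing from a profile column score 0.
            match = dp[i - 1][j - 1] + profile[j - 1].get(probe[i - 1], 0.0)
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]
```

For example, with a toy three-position profile whose positions favour A, L and K (+3, other residues -1) and `w = 1.0`, the probe `ALK` scores 9.0 and `KLA` scores 1.0. A real threading tool would add traceback, affine gaps and a significance estimate on top of this recurrence.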