18 research outputs found

    Protein Secondary Structure Prediction Based on Physicochemical Features and PSSM by KNN

    In this paper, we propose a protein secondary structure prediction method based on the k-nearest neighbor (KNN) technique with position-specific scoring matrix (PSSM) profiles, a propensity matrix of amino acids in three conformations (H, E, C), and three physicochemical features: hydrophobicity, net charge, and side chain mass. First, the optimal k-value for the KNN is found. Then, the Euclidean distances from the 26-dimensional data vector of each amino acid of a protein to the data vectors of all other proteins are computed. The conformations of the nearest seven amino acids are pooled. The majority vote of the pool is assigned to the amino acid of the query protein as its conformation: H, E, or C. Finally, we use a filter to refine the results predicted by the KNN. After filtering, the prediction accuracy reaches 90% for some proteins. This suggests that combining PSSM, the propensity matrix, and physicochemical features can improve performance.
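
    The voting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, toy 2-dimensional vectors stand in for the 26-dimensional ones, and the final filtering step is not covered because the abstract does not specify it.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(query, reference, k=7):
    """Assign H, E, or C to one residue by majority vote of its k nearest neighbors.

    reference: list of (feature_vector, label) pairs taken from other proteins.
    """
    nearest = sorted(reference, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (assumption): 2-dimensional vectors instead of 26-dimensional ones.
reference = [
    ([0.0, 0.1], "H"), ([0.1, 0.0], "H"), ([0.2, 0.1], "H"),
    ([5.0, 5.1], "E"), ([5.2, 4.9], "E"),
    ([9.0, 0.0], "C"),
]
print(knn_predict([0.1, 0.1], reference, k=3))  # H
```

    The abstract pools the nearest seven residues, which corresponds to the default `k=7` here.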

    Efficient Algorithm for Primer Design

    PCR is one of the most popular techniques for amplifying a specific region of the genome. It enables a variety of analyses and applications, including DNA cloning for sequencing, functional analysis of genes, and the diagnosis of hereditary diseases. Primer design is one of the most fundamental steps in PCR-based methods, and many different parameters need to be taken into account. Many commercial programs exist for primer design, some of them available online. In this paper we present an algorithm that is not aimed at analyzing large sequences but is intended for infrequent users.
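
    The abstract does not list the parameters it considers. As an illustration only, two quantities commonly checked in primer design are GC content and an approximate melting temperature; the sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is a rough estimate for short oligonucleotides. The function names are hypothetical and nothing here is taken from the paper's algorithm.

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Approximate melting temperature in degrees Celsius (Wallace rule)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


primer = "ATGCATGCATGCATGCATGC"  # made-up 20-mer, for illustration only
print(gc_content(primer))  # 0.5
print(wallace_tm(primer))  # 60
```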

    Predicting the Secondary Structure of Proteins Using Artificial Neural Networks

    A method for protein secondary structure prediction based on artificial neural networks (ANN) is presented. The amino acid sequences of seven proteins and their secondary structures, obtained from the National Center for Biotechnology Information (NCBI) and the online tool on the Chou-Fasman website, are concatenated to create a sequence of 15536 residues. A neural network with only an input and an output layer is used, and the back-propagation technique is adopted to tune the synaptic weights. The data are divided into two sets, one for training and one for testing. The average success rate of the method was 90.64% in training and 89.13% in testing on three types of secondary structure (α-helix, β-sheet, and coil), with correct identification coefficients of . These quality indices are all comparable with those of previous methods. We conclude from computational experiments on real and artificial structures that no method based solely on local information in the protein sequence is likely to produce significantly better results for proteins.

    Three Variable Cancer Angiogenesis models

    In this paper we present several mathematical models of tumor growth and angiogenesis, expressed as systems of ODEs that encode the most essential observations and assumptions about the complex, hierarchical, interactive processes of tumor neo-vascularization (angiogenesis). The simplest model captures only three variables: tumor size N, total vessel volume V, and the amount of protein P. We modify this model by assuming that the protein is additionally consumed by growing vessels, which gives a model with protein consumption. Next, models with time delays are introduced. To make our models more realistic, two more compartments representing more complex vascularity and protein effects are introduced.

    Regression Analysis to Predict the Secondary Structure of Proteins

    A method is presented for protein secondary structure prediction based on multidimensional regression. 200 proteins are chosen from the RCSB Protein Database. Their secondary structures, obtained through x-ray crystallography, are downloaded from the same source. The primary and secondary structures of the proteins are concatenated separately to create a sequence of 169 026 residues. The first 150 000 amino acid residues and their corresponding secondary structures are used to build a regression model. The remaining 19 026 residues are used for testing. Since we expect three outputs, α-helices "S", β-sheets "H", and coiled coils "C", our regression model consists of  parameters. These parameters are tuned and a correct classification rate of 62.50% is achieved on the test data. Furthermore, the regression model is compared with online secondary structure estimation algorithms on 14 unused proteins, and its performance is found to be comparable with that of the online estimation tools.

    Virus Induced Gene Silencing of prolyl-4-hydroxylase Induce alteration in Plant Growth

    Prolyl 4-hydroxylases (P4Hs) belong to the family of 2-oxoglutarate-dependent dioxygenases and catalyze the formation of 4-hydroxyproline, requiring 2-oxoglutarate and O2 as co-substrates and Fe2+ as a co-factor. Nine putative P4Hs have been identified so far in the tomato genome. A reverse genetics approach, Virus-Induced Gene Silencing (VIGS), was used to silence two P4Hs, P4H3 and P4H8. Transient silencing of P4H3 and P4H8 altered tomato plant growth. P4H3- and P4H8-VIGS plants had shorter stems, reduced fresh weight, and smaller leaves. The effects of VIGS application on different plant organs, measured as root, shoot, root fresh weight, shoot fresh weight, and leaf area, were analyzed. Linear correlation between these variables was assessed by calculating the Pearson correlation coefficient. These results indicate that P4H3 and P4H8 play a significant role in tomato plant growth, and that VIGS is a useful tool to study the function of tomato gene families such as P4Hs in growth and development and to elucidate, in the long run, the physiological significance of substrate proteins such as hydroxyproline-rich glycoproteins.
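
    For reference, the Pearson correlation coefficient mentioned above is the covariance of two variables divided by the product of their standard deviations. A minimal sketch follows; the function name and the sample values are illustrative assumptions, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values: a perfectly linear relationship gives r = 1.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(pearson_r(x, y))  # 1.0
```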

    Predicting the Secondary Structure of Proteins by the Use of Hamming Distances and Alignment Scores

    Researchers are confident about the validity of the basic hypothesis that the secondary and tertiary structures of a protein are uniquely determined by its sequence of amino acids, that is, its primary structure. In this article we use a database of 200 proteins. To find the secondary structure of a new protein, the first thirteen residues of this protein are taken as a substring. Then the conformations of the central amino acids of the thirteen-residue substrings of the proteins in the database whose Hamming distances are less than a given threshold, or whose alignment scores exceed a given limit, are collected in a basin. The most common conformation in this basin is assigned as the conformation of the central amino acid of the substring of the unknown protein. Using this technique, for MHsim threshold 3.0 a correct estimation rate of 53.4% is obtained with 4.74% indecisives, and for MHsim threshold 5.0 the success rate was 56.93% with 76.59% indecisives. When the half of the proteins with the higher secondary structure estimation rates is subjected to the same calculation, the following results are obtained: for MHsim threshold 3.0, the correct estimation rate is 79.52% with 58.87% indecisives, and for MHsim threshold 5.0, the correct estimation rate is 65.52% with 5.02% indecisives. The average correct estimation rate for the alignment scores was 54%.
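
    The basin procedure can be sketched as below. Several points are assumptions: the abstract does not define the MHsim measure, so a plain Hamming distance is used instead; an empty basin is treated as an indecisive case; and the function names and the toy database are hypothetical.

```python
from collections import Counter

WINDOW = 13           # substring length used in the abstract
CENTER = WINDOW // 2  # index of the central residue


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_center(window, database, threshold):
    """Predict the conformation of the central residue of `window`.

    database: list of (sequence, structure) string pairs of equal length.
    Returns None when no database substring is close enough (assumed
    here to be what "indecisive" means).
    """
    basin = Counter()
    for seq, ss in database:
        for i in range(len(seq) - WINDOW + 1):
            if hamming(window, seq[i:i + WINDOW]) < threshold:
                basin[ss[i + CENTER]] += 1
    if not basin:
        return None
    return basin.most_common(1)[0][0]


# Toy database with one made-up protein and its made-up structure string.
database = [("ACDEFGHIKLMNPQRSTVWY", "CCHHHHHHHHHHEEEEECCC")]
print(predict_center("ACDEFGAIKLMNP", database, threshold=3))  # H
```

    Sliding the window one residue at a time along the query protein would give a prediction for each central residue in turn.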

    Protein Secondary Structure Prediction by Using PSSM Pseudo Digital Image of Proteins

    Protein secondary structure prediction is one of the hot topics of bioinformatics and computational biology. In this article we present a new method to predict the secondary structure of proteins. The PSSMs of proteins are used to generate pseudo images of the proteins. Digital image features are extracted from these protein images, and the resulting feature vectors are used for similarity analysis. We believe that PSSM pseudo digital images of proteins can represent global intrinsic information about a protein, which helps to find globally similar proteins and use them during prediction. The highest Q3 prediction accuracy recorded with the system is 72.1%. Besides its high accuracy, this method shortens the computational time needed to predict the secondary structure of proteins.

    Secondary Structure Segments are Much More Conserved than Primary Sequence Segments

    To be biologically functional, all proteins must adopt specific folded three-dimensional structures. Some believe that the genetic information for a protein specifies only the primary structure, the linear sequence of amino acids in the polypeptide backbone; since most purified proteins can spontaneously refold in vitro after being completely unfolded, the three-dimensional structure must be determined by the primary structure (Creighton, 1990). How this occurs has come to be known as 'the protein folding problem'. As a part of the protein folding problem, the existence of similar substrings in diverse proteins is remarkable. Some scientists call this the “conserved core”, which echoes the claim that all proteins diversified from a common ancestor protein and that these similar pieces of two or more proteins are the substrings that resisted evolutionary pressure. Due to naturally occurring errors (DNA fails to copy accurately) and external influences such as ultraviolet radiation, electromagnetic fields, and atomic radiation, protein-coding genes and proteins may change over time in response to mutations. The rate of these mutations is strongly correlated with the intensity of the environmental conditions, so it is not possible to estimate a constant rate as in the case of radioactive decay. Also, there is not much evidence that the diversity of proteins relies only on these mutations. For this reason we prefer the term "similar substrings". In this paper we focus on the relation between primary and secondary structure mismatches of substrings of length seventeen residues. We find that the mismatches in the corresponding secondary structure substrings of the same length lag behind the primary mismatches. We constructed a conditional probability landscape that represents the conditional probability of a certain secondary substring mismatch given the primary substring mismatch. This landscape shows that even when 6-7 mismatches exist between two primary substrings of length 17 that belong to two different proteins, the probability of a full match of the corresponding secondary structure substrings is remarkable. We downloaded the primary and secondary sequences of all 303,524 proteins in the PDB protein databank. After eliminating the duplicates and the proteins shorter than 30 residues, we obtained a non-redundant database of 80,592 proteins. We developed a search algorithm, FIND-SIM, to find similar primary sequence substrings in a query protein and target proteins. Some examples of full secondary structure matches between short substrings whose primary structure substrings have many mismatches are given.
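
    One way such a conditional probability landscape could be estimated is by counting, over pairs of aligned substrings, how often each secondary mismatch count occurs for a given primary mismatch count. The sketch below is an assumption about the bookkeeping only; it does not reproduce FIND-SIM, whose details the abstract does not give, and the names and toy strings (length 5 rather than 17) are hypothetical.

```python
from collections import defaultdict


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def conditional_landscape(pairs):
    """Estimate P(secondary mismatches = s | primary mismatches = p).

    pairs: iterable of ((seq1, ss1), (seq2, ss2)) aligned substrings.
    Returns a nested dict: landscape[p][s] = estimated probability.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for (seq1, ss1), (seq2, ss2) in pairs:
        counts[mismatches(seq1, seq2)][mismatches(ss1, ss2)] += 1
    landscape = {}
    for p, row in counts.items():
        total = sum(row.values())
        landscape[p] = {s: n / total for s, n in row.items()}
    return landscape


# Toy pairs: both have one primary mismatch; one keeps its structure.
pairs = [
    (("AAAAA", "HHHHH"), ("AAAAC", "HHHHH")),
    (("AAAAA", "HHHHH"), ("AAAAG", "HHHHC")),
]
print(conditional_landscape(pairs)[1][0])  # 0.5
```

    In this layout, `landscape[p][0]` is the estimated probability of a full secondary structure match given `p` primary mismatches.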

    Leaf Area Assessment By Image Analysis

    Measurement of leaf area is apparently simple and fundamental. Various methods are available to measure leaf area, but they are time-consuming, tedious, and laborious. Less expensive methods, which use image processing of video camera images and computer programs to analyze these images, have become an alternative to the other techniques for leaf area assessment. In this paper we introduce a computer program that calculates leaf area from the pixel count by image processing, quickly and with high accuracy. This program provides very fast, inexpensive, and highly accurate measurement.
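
    The pixel-count idea can be sketched as follows, assuming the image has already been segmented into a binary mask (1 for leaf, 0 for background) and that the image scale is known from a calibration object. The segmentation step, the function name, and the numbers are assumptions made for illustration; the abstract does not describe how the program does this.

```python
def leaf_area(mask, pixels_per_cm):
    """Leaf area in square centimetres from a binary mask.

    mask: 2D list of 0/1 values, where 1 marks a leaf pixel.
    pixels_per_cm: image scale, e.g. obtained from a reference object.
    """
    leaf_pixels = sum(sum(row) for row in mask)
    return leaf_pixels / (pixels_per_cm ** 2)


# Toy 4x4 mask with 8 leaf pixels at 2 pixels per centimetre.
mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(leaf_area(mask, pixels_per_cm=2))  # 2.0
```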