8 research outputs found

    Protein secondary structure prediction with classifier fusion

    The number of known protein sequences is increasing very rapidly. However, determining protein structure experimentally is costly and slow, so the number of proteins with known sequence but unknown structure keeps growing. Computational methods that predict the structure of a protein from its amino acid sequence are therefore very useful. In this thesis, we focus on a particular type of protein structure prediction, secondary structure prediction. The problem falls into two categories: some sequences can be enriched by forming multiple alignment profiles, whereas others are single sequences for which no profile can be formed. We examine different aspects of both cases. The first case is when multiple sequence alignment information exists. We introduce a novel feature extraction technique that extracts unigram, bigram and positional features from profiles using dimension reduction and feature selection techniques. We use both these novel features and regular raw features for classification. We experimented with the following types of first-level classifiers: Linear Discriminant Classifiers (LDCs), Support Vector Machines (SVMs) and Hidden Markov Models (HMMs). We also introduce a novel method that combines these classifiers. The second case is protein secondary structure prediction for single sequences. We explored different methods of training set reduction in order to increase the prediction accuracy of the previously introduced IPSSP (Iterative Protein Secondary Structure Prediction) algorithm [34]. Results show that composition-based training set reduction is useful for predicting the secondary structures of orphan proteins.

    Training set reduction methods for single sequence protein secondary structure prediction (Yetim proteinlerde ikincil yapı öngörüsü için eğitim kümesi indirgeme yöntemleri)

    Orphan proteins are characterized by the lack of significant sequence similarity to almost all proteins in the database. To infer the functional properties of orphans, more elaborate techniques that utilize structural information are required. In this regard, protein structure prediction gains considerable importance. Secondary structure prediction algorithms designed for orphan proteins (also known as single-sequence algorithms) cannot utilize multiple alignments or alignment profiles, which are derived from similar proteins. This limits prediction accuracy. One way to improve the performance of a single-sequence algorithm is to perform re-training. In this approach, the models used by the algorithm are first trained on a representative set of proteins and a secondary structure prediction is computed. Then, using a distance measure, the original training set is refined by removing proteins that are dissimilar to the initial prediction. This step is followed by re-estimation of the model parameters and a new prediction of the secondary structure. In this paper, we compare training set reduction methods that are used to re-train the hidden semi-Markov models employed by the IPSSP algorithm. We found that the composition-based reduction method has the highest performance among the reduction methods compared. In addition, threshold-based reduction performed better than the reduction technique that selects the first 80% of the dataset proteins.

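    Creating a database for performance analysis of object and event recognition in surveillance videos (Gözetim videolarında nesne ve olay tanımlama başarım analizi için veritabanı oluşturulması)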

    In our era, surveillance systems are widely employed for security and data gathering. The main driving force behind the expansion of visual surveillance systems is their active use in security applications. For these systems to be developed further and to operate in real time, the performance of object and event detection algorithms must be improved. Objective comparison of detection algorithms provides a concrete basis for research on this topic and makes it possible to measure real progress. Such comparisons require evaluation metrics and databases that are accessible for research purposes. A few evaluation metrics are defined in the literature, but databases accessible for research purposes are not common. This paper presents an analysis of a database, created at Sabancı University, for evaluating the object and event detection algorithms used in surveillance systems. Finally, the performance analysis of an object detection algorithm tested on the database is presented.

    Training set reduction methods for protein secondary structure prediction in single-sequence condition

    Orphan proteins are characterized by the lack of significant sequence similarity to database proteins. To infer the functional properties of orphans, more elaborate techniques that utilize structural information are required. In this regard, protein structure prediction gains considerable importance. Secondary structure prediction algorithms designed for orphan proteins (also known as single-sequence algorithms) cannot utilize multiple alignments or alignment profiles, which are derived from similar proteins. This limits prediction accuracy. One way to improve the performance of a single-sequence algorithm is to perform re-training. In this approach, the models used by the algorithm are first trained on a representative set of proteins and a secondary structure prediction is computed. Then, using a distance measure, the original training set is refined by removing proteins that are dissimilar to the given protein. This step is followed by re-estimation of the model parameters and a new prediction of the secondary structure. In this paper, we compare training set reduction methods that are used to re-train the hidden semi-Markov models employed by the IPSSP algorithm [1]. We found that the composition-based reduction method has the highest performance compared to the alignment-based and the Chou-Fasman-based reduction methods. In addition, threshold-based reduction performed better than the reduction technique that selects the first 80% of the dataset proteins.
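
    The reduction step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the distance measure, so the three-state alphabet `HEL`, the L1 distance between secondary structure composition vectors, and the function names `reduce_by_threshold` and `reduce_by_fraction` are all assumptions made for illustration. The sketch only contrasts the two selection rules mentioned above (a distance threshold versus keeping the closest 80% of the proteins); re-training the hidden semi-Markov models is out of scope.

```python
from collections import Counter

STATES = "HEL"  # assumed three-state alphabet: helix, strand, loop


def composition(ss):
    """Return the fraction of each state in a secondary structure string."""
    counts = Counter(ss)
    return [counts[s] / len(ss) for s in STATES]


def distance(a, b):
    """L1 distance between two composition vectors (an assumed measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def reduce_by_threshold(training, predicted_ss, threshold):
    """Keep training proteins whose composition is within `threshold`
    of the composition of the initial prediction."""
    target = composition(predicted_ss)
    return [p for p in training
            if distance(composition(p["ss"]), target) <= threshold]


def reduce_by_fraction(training, predicted_ss, fraction=0.8):
    """Keep the closest `fraction` of the training proteins."""
    target = composition(predicted_ss)
    ranked = sorted(training,
                    key=lambda p: distance(composition(p["ss"]), target))
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical toy data: identifiers and structures are invented.
training = [
    {"id": "a", "ss": "HHHHHHLL"},
    {"id": "b", "ss": "EEEELLLL"},
    {"id": "c", "ss": "HHHHLLLL"},
    {"id": "d", "ss": "HHHLLLEE"},
    {"id": "e", "ss": "EEEEEELL"},
]
initial_prediction = "HHHHHLLL"  # stands in for the first-pass prediction

by_threshold = reduce_by_threshold(training, initial_prediction, 0.5)
by_fraction = reduce_by_fraction(training, initial_prediction, 0.8)
print([p["id"] for p in by_threshold])
print([p["id"] for p in by_fraction])
```

    In this toy example the threshold rule discards both strand-rich proteins, while the fixed-fraction rule has to retain one of them to fill its quota. That is one plausible reason a threshold can behave differently from a fixed 80% cut, although the abstract itself does not give the explanation.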

    Analyses of Allele-Specific Gene Expression in Highly Divergent Mouse Crosses Identifies Pervasive Allelic Imbalance

    Complex human traits are influenced by variation in regulatory DNA through mechanisms that are not fully understood. Because regulatory elements are conserved between humans and mice, a thorough annotation of cis regulatory variants in mice could aid in further characterizing these mechanisms. Here we provide a detailed portrait of mouse gene expression across multiple tissues in a three-way diallel. More than 80% of mouse genes have cis regulatory variation. Effects from these variants influence complex traits and usually extend to the human ortholog. Further, we estimate that at least one in every thousand SNPs creates a cis regulatory effect. We also observe two types of parent-of-origin effects, including classical imprinting and a new global allelic imbalance in expression favoring the paternal allele. We conclude that, as with humans, pervasive regulatory variation influences complex genetic traits in mice, and we provide a new resource toward understanding the genetic control of transcription in mammals.

    Analyses of allele-specific gene expression in highly divergent mouse crosses identifies pervasive allelic imbalance

    Complex human traits are influenced by variation in regulatory DNA through mechanisms that are not fully understood. Since regulatory elements are conserved between humans and mice, a thorough annotation of cis regulatory variants in mice could aid in understanding these mechanisms. Here we provide a detailed portrait of mouse gene expression across multiple tissues in a three-way diallel. More than 80% of mouse genes have cis regulatory variation. The effects of this variation influence complex traits and usually extend to the human ortholog. Further, we estimate that at least one in every thousand SNPs creates a cis regulatory effect. We also observe two types of parent-of-origin effects, including classical imprinting and a novel, global allelic imbalance in favor of the paternal allele. We conclude that, as with humans, pervasive regulatory variation influences complex genetic traits in mice, and we provide a new resource toward understanding the genetic control of transcription in mammals.