5 research outputs found

    Partially-supervised protein subclass discovery with simultaneous annotation of functional residues

    Abstract
    Background: The study of functional subfamilies of protein domain families and the identification of the residues which determine substrate specificity is an important question in the analysis of protein domains. One way to address this question is to cluster protein sequence data and then predict functional residues based on such clusterings. The locations of putative functional residues in known protein structures provide insights into how different substrate specificities are reflected on the protein structure level.
    Results: We have developed an extension of the context-specific independence mixture model clustering framework which allows for the integration of experimental data. As these are usually known only for a few proteins, our algorithm implements a partially-supervised learning approach. To demonstrate the usefulness of our approach, we discover domain subfamilies and predict functional residues for four protein domain families: phosphatases, pyridoxal-dependent decarboxylases, WW and SH3 domains.
    Conclusion: The partially-supervised clustering revealed biologically meaningful subfamilies even for highly heterogeneous domains, and the predicted functional residues provide insights into the basis of the different substrate specificities.

    Length control of the "Yersinia" injectisome

    Many pathogenic bacteria harbor a type III secretion system to translocate effector proteins from the bacterium into the host cell cytosol. This one-step translocation requires a nanomachine termed the injectisome. It consists of a basal body, spanning the two bacterial membranes, and a needle-like structure, bridging the distance between the bacterium and the target cell. Control of the length of the type III secretion injectisome needle is crucial for correct function. In Yersinia, the YscP protein is involved in needle length control: the number of YscP residues directly correlates with needle length. In this thesis, this correlation was shown to be dependent on the secondary structure of YscP. By substitution of individual residues, needle length could be altered without changing the number of residues in YscP. The molecular ruler model was proposed for length control of the Yersinia injectisome needle. There are, however, two possibilities for the molecular ruler model regarding the amount of YscP needed to regulate the needle length of one injectisome. In the static model, only one molecule of YscP is required for length control of one needle; in a more dynamic model, several molecules are required. Here, it was demonstrated that partially diploid bacteria expressing a short and a long YscP simultaneously assemble distinct sets of short and long needles. These results suggest that only one YscP molecule is required for length control of one needle. In Yersinia, the YscU protein (a member of the export machinery) was suggested to be involved in the substrate specificity switch. Here, YscU was demonstrated to play a role in substrate recognition but not in substrate switching. Taken together, a refined model for length control of the Yersinia injectisome needle is proposed in this thesis, confirming the role of YscP as a molecular ruler.

    PyMix - The Python mixture package - a tool for clustering of heterogeneous biological data

    Abstract
    Background: Cluster analysis is an important technique for the exploratory analysis of biological data. Such data is often high-dimensional, inherently noisy and contains outliers. This makes clustering challenging. Mixtures are versatile and powerful statistical models which perform robustly for clustering in the presence of noise and have been successfully applied in a wide range of applications.
    Results: PyMix, the Python mixture package, implements algorithms and data structures for clustering with basic and advanced mixture models. The advanced models include context-specific independence mixtures, mixtures of dependence trees and semi-supervised learning. PyMix is licensed under the GNU General Public License (GPL). PyMix has been successfully used for the analysis of biological sequence, complex disease and gene expression data.
    Conclusions: PyMix is a useful tool for cluster analysis of biological data. Due to the general nature of the framework, PyMix can be applied to a wide range of applications and data sets.
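
As a rough illustration of the kind of model this abstract describes, the sketch below fits a one-dimensional Gaussian mixture with EM and clamps the responsibilities of the few points that carry a known label, which is one simple way to make the clustering semi-supervised. It uses only the Python standard library and is not PyMix's API: the names `normal_pdf` and `fit_mixture`, the initialisation scheme and the toy data are all assumptions made for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_mixture(data, labels, k=2, iterations=200):
    """EM for a k-component 1-D Gaussian mixture (hypothetical helper).

    labels[i] is a component index for a labelled point, or None for an
    unlabelled one. Labelled points keep a fixed 0/1 responsibility.
    """
    lo, hi = min(data), max(data)
    # Spread the initial means evenly over the data range (arbitrary choice).
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    variances = [((hi - lo) / k) ** 2 or 1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iterations):
        # E-step: posterior component memberships for every point.
        resp = []
        for x, label in zip(data, labels):
            if label is not None:
                # Known label: clamp the responsibility to that component.
                resp.append([1.0 if j == label else 0.0 for j in range(k)])
                continue
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid collapse
    # Hard assignment: most probable component for each point.
    assignments = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, assignments


if __name__ == "__main__":
    # Toy data: two groups, with one labelled example in each.
    data = [0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.3]
    labels = [0, None, None, None, None, None, None, 1]
    means, variances, weights, assignments = fit_mixture(data, labels)
    print(means, assignments)
```

Clamping labelled points in the E-step also pins down which component index corresponds to which known class, so the cluster numbering is no longer arbitrary.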