28 research outputs found

    Structural pattern detection and domain recognition for protein function prediction

    Proteins are essential components of the cell that control and affect all of its functions. In proteins, structural patterns consist of a few amino acids that assemble in a specific arrangement. Because of their specific structures, they are recognized as functionally important sites of proteins and are conserved even in distantly related proteins. Moreover, several structural patterns merge to form domains, which are also associated with the protein's function. In this work, we introduce a method for finding structural patterns common to a protein pair by using graphlet mappings. We represent protein structures as graphs and then generate graphlets. Local alignments are produced by mapping the generated graphlets between the two proteins of a pair. By merging these local alignments, we then try to recognize functionally important domains. These common domains are very useful in protein function prediction, fold classification, and homology relationship detection. Our algorithm was first applied to the fold classification problem, where 80% accuracy was observed. It was also used for protein function prediction, where 97% accuracy was observed.

    Evolutionary approaches for feature selection in biological data

    Data mining techniques have been used widely in many areas such as business, science, engineering and medicine. These techniques allow vast amounts of data to be explored in order to extract useful information. One focus in the health area is finding interesting biomarkers in biomedical data. High-throughput data generated by microarrays and mass spectrometry from biological samples are high-dimensional and small in sample size. Examples include DNA microarray datasets with up to 500,000 genes and mass spectrometry data with 300,000 m/z values. While the availability of such datasets can aid the development of techniques and drugs to improve the diagnosis and treatment of diseases, a major challenge is analysing them to extract useful and meaningful information. The aims of this project are: 1) to investigate and develop feature selection algorithms that incorporate various evolutionary strategies, 2) to use the developed algorithms to find the “most relevant” biomarkers contained in biological datasets, and 3) to evaluate the goodness of the extracted feature subsets for relevance (examined in terms of existing biomedical domain knowledge and of the classification accuracy obtained using different classifiers). The project aims to generate good predictive models for distinguishing diseased samples from controls.

    College of Engineering

    Cornell University Courses of Study Vol. 94 2002/200

    Myelin is remodeled cell-autonomously by oligodendroglial macroautophagy

    Myelination of axons in the CNS by oligodendrocytes (OLs) is critical for the rapid and reliable conduction of action potentials down neuronal axons, as evidenced by the severe disabilities associated with myelin loss in multiple sclerosis and other diseases of myelin. The specification, differentiation, and maturation of OLs, along with myelin formation by OLs, have been thoroughly characterized. How myelin is turned over, however, remains unclear. It is unsurprising that little is known about myelin turnover, considering that for decades following their discovery, myelin and OLs were considered static elements in the adult nervous system. Recent evidence, however, shows that myelin in the CNS is actually plastic. Moreover, myelin remodeling in humans has been suggested to be mediated by mature OLs. As mature OLs have limited capacity to generate new myelin sheaths, we must ask whether mature OLs can remodel the myelin of preexisting myelin sheaths. One intriguing but unproven possibility is that myelin at individual internodes may be remodeled cell-autonomously by mature OLs to modulate neuronal circuit function. Macroautophagy (MA) is responsible for the lysosome-mediated elimination of cytosolic proteins, lipids, and organelles. MA achieves this by capturing cargo, in bulk or selectively, in a transient, multilamellar structure known as an autophagosome (AP). In this study, we used a combination of in vivo and cellular approaches to test the hypothesis that MA in OLs may be important for myelin remodeling in the adult CNS. We establish that the myelin of individual internodes is remodeled through the coordinated efforts of endocytosis and MA. We found that the autophagy protein Atg7 is essential for myelin remodeling in vivo: loss of Atg7 in OLs leads to an age-dependent increase of myelin at the internode and the formation of aberrant myelin structures, most notably myelin outfoldings. In addition, we find that MA has the potential to occur throughout the mature OL, and examination of OLs in culture suggests that formation of a mature AP structure, the amphisome, is required to facilitate the efficient degradation of myelin-containing endocytic structures. Together, we propose that myelin is a dynamic structure that is regularly remodeled through the cooperative efforts of MA and endocytosis. These findings raise the possibility that myelin remodeling is involved in neural plasticity and the tuning of neural circuits.

    Novel approaches to the integration and analysis of systems biology data

    The opportunity to investigate whole cellular systems using experimental and computational high-throughput methods leads to the generation of unprecedented amounts of data. Processing these data often results in large lists of genes or proteins that need to be analyzed and interpreted in the context of all other biological information that is already available. To support such analyses, repositories that aggregate and merge the biological information contained in different databases are required. To address this need, we created an integrative data warehouse containing millions of up-to-date annotations related to human genes and proteins from over thirty major molecular biology databases. In particular, this data warehouse was instrumental in assessing the data quality of human protein interactions and in predicting an important, but largely unidentified, group of proteins that function as molecular scaffolds in the formation of signaling cascades. Additionally, the data warehouse enabled us to devise the novel computational method BioSim for the discovery of biological relationships based on the functional similarity of gene and protein annotations. Furthermore, we showed how this method allows disease-associated genes to be identified. To facilitate the analysis and interpretation of large lists of genes or proteins derived from high-throughput methods, we built the new web portal BioMyn. It provides a powerful search engine with public access to the data warehouse and the BioSim method. BioMyn also offers a number of useful tools for users' own functional enrichment analyses and for the visualization of the results.

    Efficient algorithms and architectures for protein 3-D structure comparison

    Protein Structure Comparison (PSC) is a well-developed field of computational proteomics that attracts active interest because it is widely used in structural biology and drug discovery. The fast-growing computational demand for all-to-all protein structure comparison results mainly from three factors: rapidly expanding structural proteomics databases, the high computational complexity of pairwise PSC algorithms, and the trend towards using multiple comparison criteria and combining their results (multi-criteria protein structure comparison, MCPSC), since no single PSC method is commonly accepted as the best. In this thesis we have developed a software framework that exploits many-core and multi-core CPUs to implement efficient parallel MCPSC schemes on modern processors, based on three popular PSC methods, namely TMalign, CE, and USM. We evaluate and compare the performance and efficiency of two parallel MCPSC implementations using Intel’s experimental many-core Single-Chip Cloud Computer (SCC), a Network-on-Chip processor, as well as Intel’s Core i7 multi-core processor. Further, we have developed a dataset processing pipeline and implemented it in a Python utility, called pyMCPSC, that allows users to perform MCPSC efficiently on multi-core CPUs. pyMCPSC, which combines five PSC methods and five different consensus scoring schemes, facilitates the analysis of similarities in protein domain datasets and can easily be extended to incorporate more PSC methods in the consensus scoring as they become available.

    Recent Developments in Annexin Biology

    Discovered over 40 years ago, the annexin proteins were found to be a structurally conserved subgroup of Ca2+-binding proteins. While the initial research on annexins focused on their signature feature of Ca2+-dependent binding to membranes, over the years the biennial “Annexin” conference series has highlighted additional diversity in the functions attributed to the annexin family of proteins. The roles of these proteins now extend from basic science to biomedical research, and are being translated into clinical settings. Research on annexins involves a global network of researchers, and the 10th biennial Annexin conference brought together over 80 researchers from ten European countries, USA, Brazil, Singapore, Japan, and Australia for 3 days in September 2019. At this conference, the discussions focused on two distinct themes: the role of annexins in cellular organization, and their role in health and disease. The articles published in this Special Issue cover these two main themes, offering a glimpse into some of the notable findings in the field of annexin biology.

    Enhancing systems biology models through semantic data integration

    Studying and modelling biology at a systems level requires a large amount of data of different experimental types. Historically, each of these types has been stored in its own distinct format, with its own internal structure for holding the data produced by those experiments. While the use of community data standards can reduce the need for specialised, independent formats by providing a common syntax, standards uptake is not universal and a single standard cannot yet describe all biological data. In the work described in this thesis, a variety of integrative methods have been developed to reuse and restructure already extant systems biology data. SyMBA is a simple Web interface which stores experimental metadata in a published, common format. The creation of accurate quantitative SBML models is a time-intensive manual process. Modellers need to understand both the systems they are modelling and the intricacies of the SBML format. However, the amount of relevant data for even a relatively small and well-scoped model can be overwhelming. Saint is a Web application which accesses a number of external Web services and provides suggested annotation for SBML and CellML models. MFO was developed to formalise all of the knowledge within the multiple SBML specification documents in a manner which is accessible to both humans and computers. Rule-based mediation, a form of semantic data integration, is a useful way of reusing and re-purposing heterogeneous datasets which cannot be, or are not, structured according to a common standard. This method of ontology-based integration is generic and can be used in any context, but has been implemented specifically to integrate systems biology data and to enrich systems biology models through the creation of new biological annotations. The work described in this thesis is one step towards the formalisation of biological knowledge useful to systems biology. Experimental metadata has been transformed into common structures, a Web application has been created for the retrieval of data appropriate to the annotation of systems biology models, and multiple data models have been formalised and made accessible to semantic integration techniques.
    EThOS - Electronic Theses Online Service, BBSRC, EPSRC, GB, United Kingdom