
    An Approach to Developing Benchmark Datasets for Protein Secondary Structure Segmentation from Cryo-EM Density Maps

    More and more deep learning approaches have been proposed to segment secondary structures from cryo-electron microscopy (cryo-EM) density maps in the medium resolution range (5-10 Å). Although these approaches show great potential, only a few small experimental data sets have been used to test them, and there is limited understanding of the factors in the data that affect segmentation performance. We propose an approach to generate data sets with desired specifications in three potential factors: protein sequence identity, structural content, and data quality. The approach was implemented and used to generate a test set and various training sets to study the effect of secondary structure content and data quality on the performance of DeepSSETracer, a deep learning method that segments regions of protein secondary structure from cryo-EM map components. Results show that both the level of secondary structure content and data quality influence the segmentation performance of DeepSSETracer.
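    The abstract does not spell out how the data sets are assembled. As a rough illustration only, the filter below selects entries that fall inside a secondary-structure content range and above a quality threshold, then keeps an entry only if it stays below a sequence-identity cutoff against every entry already chosen. The `Entry` fields, the quality score, and the use of `difflib.SequenceMatcher` as a stand-in for alignment-based sequence identity are all assumptions, not the authors' procedure.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Entry:
    # Hypothetical record describing one candidate protein chain / map region.
    name: str
    sequence: str
    helix_fraction: float  # fraction of residues in helices (0..1)
    quality: float         # hypothetical data-quality score (higher is better)


def identity(a: str, b: str) -> float:
    # Stand-in for alignment-based sequence identity: difflib similarity ratio.
    return SequenceMatcher(None, a, b).ratio()


def select(entries, max_identity=0.25, helix_range=(0.0, 1.0), min_quality=0.0):
    """Greedily pick entries that meet the content and quality specifications
    and are mutually below the sequence-identity cutoff."""
    chosen = []
    lo, hi = helix_range
    # Visit higher-quality entries first so they win when two entries are redundant.
    for e in sorted(entries, key=lambda item: item.quality, reverse=True):
        if not (lo <= e.helix_fraction <= hi) or e.quality < min_quality:
            continue
        if all(identity(e.sequence, c.sequence) <= max_identity for c in chosen):
            chosen.append(e)
    return chosen
```

    Changing `helix_range` or `min_quality` while holding the identity cutoff fixed yields training sets that differ mainly in the factor being studied.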

    Tracing Actin Filament Bundles in Three-Dimensional Electron Tomography Density Maps of Hair Cell Stereocilia

    Cryo-electron tomography (cryo-ET) is a powerful method for visualizing the three-dimensional organization of supramolecular complexes, such as the cytoskeleton, in their native cell and tissue contexts. Because of the minimal electron dose it uses and the reconstruction artifacts arising from the missing wedge during data collection, cryo-ET typically produces noisy density maps with anisotropic resolution (XY versus Z). Molecular crowding further complicates the automatic detection of supramolecular complexes, such as the actin bundles in hair cell stereocilia. Stereocilia are pivotal to mechanoelectrical transduction in the sensory epithelial hair cells of the inner ear. Given the complexity and dense arrangement of actin bundles, traditional approaches to filament detection and tracing have failed in these cases. In this study, we introduce BundleTrac, an effective method for tracing hundreds of filaments in a bundle. A comparison with manual tracing of the actin filaments in a stereocilium showed that BundleTrac accurately built 326 of 330 filaments (98.8%), with an overall cross-distance of 1.3 voxels for the 330 filaments. BundleTrac is a semi-automatic modeling approach: a seed point is provided for each filament, and the rest of the filament is identified computationally. We also demonstrate the potential of a denoising method that uses polynomial regression to address the anisotropic resolution and high noise of the density map.
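    The polynomial-regression denoising is only named in the abstract. A minimal sketch, assuming the polynomial is fit independently along the Z axis of every (y, x) column of the map (the axis with the poorer resolution), could look like this; the function name, the per-column formulation, and the default degree are hypothetical.

```python
import numpy as np


def denoise_along_z(volume, degree=3):
    """Fit a low-degree polynomial to every Z column of a (nz, ny, nx) volume
    and return the fitted (smoothed) values in the same shape."""
    vol = np.asarray(volume, dtype=float)
    nz = vol.shape[0]
    z = np.arange(nz, dtype=float)
    columns = vol.reshape(nz, -1)                # one column per (y, x) position
    coeffs = np.polyfit(z, columns, deg=degree)  # shape (degree + 1, ny * nx), highest power first
    fitted = np.vander(z, degree + 1) @ coeffs   # evaluate every column's polynomial at once
    return fitted.reshape(vol.shape)
```

    A low degree smooths noise along Z but cannot follow sharp density changes, so the degree trades noise suppression against detail.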

    Deep Learning for Segmentation of 3D Cryo-EM Images

    Cryo-electron microscopy (cryo-EM) is an emerging biophysical technique for determining the structures of protein complexes. However, accurate detection of secondary structures remains challenging when cryo-EM density maps are at medium resolution (5-10 Å). Most existing methods are image processing methods that do not fully utilize the images available in the cryo-EM database. In this paper, we present a deep learning approach to segment secondary structure elements, such as helices and β-sheets, from medium-resolution density maps. The proposed 3D convolutional neural network detects secondary structure locations with an F1 score between 0.79 and 0.88 for six simulated test cases. The architecture was also applied to experimentally derived cryo-EM density regions of 571 protein chains. In a test involving seven cryo-EM density regions, the average F1 score is 0.747 for helix detection and 0.674 for β-sheet detection. Additionally, we extend an arc-length association method to β-strands and show that this method of measuring error is superior to many popular methods. An interactive tool that visualizes the results of the arc-length association method is also presented.
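    The F1 scores combine the precision and recall of the detected secondary-structure regions. A sketch of one common way to compute them, assuming voxel-wise counting over boolean masks (the paper may count differently) and a hypothetical label encoding of 0 for background, 1 for helix and 2 for β-sheet:

```python
import numpy as np


def f1_score(pred, truth):
    """Voxel-wise F1 for one class, given boolean masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)   # voxels correctly assigned to the class
    fp = np.count_nonzero(pred & ~truth)  # predicted voxels absent from the ground truth
    fn = np.count_nonzero(~pred & truth)  # ground-truth voxels the prediction missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

    For label volumes `pred` and `truth`, `f1_score(pred == 1, truth == 1)` scores helices and `f1_score(pred == 2, truth == 2)` scores β-sheets.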

    Cryogenic Electron Microscopy protein structure modelling

    In recent years, the number of cryogenic electron microscopy (cryo-EM) structures in the Protein Data Bank (PDB) has grown exponentially. At the same time, recent advances in hardware and informatics have allowed cryo-EM to reach atomic resolution. Cryo-EM helps researchers determine the structures of proteins and large complexes quickly and easily in order to understand and study molecular mechanisms, for either academic purposes or pharmaceutical applications. This thesis attempts to solve the problem of modelling protein structures from cryo-EM data as quickly and accurately as possible. To achieve this, we use deep learning techniques for image segmentation and introduce three models that predict different elements of the protein structure: CA atoms, the backbone and side chains, and secondary structure elements. The models are trained on an exemplar cryo-EM dataset that we constructed. The pipeline also contains pre-processing steps that prepare the cryo-EM data for the neural network and post-processing steps that build the final protein model from the network's predictions. All steps run automatically and require no information other than the experimental cryo-EM data. We applied our pipeline to 50 previously published experimental maps with resolutions between 2.6 and 4.4 Å. Compared with the ground-truth structures deposited in the PDB, we report an average CA match of 77.4%, with an average root-mean-square deviation (RMSD) of 0.84 Å.
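    The thesis reports a CA match percentage and an RMSD but not the exact matching rule. A simple sketch, assuming a reference CA atom counts as matched when the nearest predicted CA atom lies within a hypothetical 3 Å cutoff, and computing the RMSD over those matched pairs only:

```python
import numpy as np


def ca_match(predicted, reference, cutoff=3.0):
    """Return (percentage of reference CA atoms matched, RMSD over matched pairs).

    A reference CA counts as matched when its nearest predicted CA lies within
    `cutoff` angstroms (hypothetical default). Both inputs need at least one atom."""
    pred = np.asarray(predicted, dtype=float).reshape(-1, 3)
    ref = np.asarray(reference, dtype=float).reshape(-1, 3)
    # Pairwise distances, shape (n_ref, n_pred).
    d = np.linalg.norm(ref[:, None, :] - pred[None, :, :], axis=-1)
    nearest = d.min(axis=1)  # distance from each reference CA to its closest prediction
    matched = nearest <= cutoff
    pct = 100.0 * matched.mean()
    rmsd = float(np.sqrt(np.mean(nearest[matched] ** 2))) if matched.any() else float("nan")
    return pct, rmsd
```

    This matching is many-to-one (one predicted atom may match several reference atoms); a stricter evaluation would enforce a one-to-one assignment.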

    A preliminary study of micro-gestures: dataset collection and analysis with multi-modal dynamic networks

    Micro-gestures (MG) are gestures that people perform spontaneously in communication situations. This thesis presents a preliminary exploration of micro-gestures. An MG dataset is built with Kinect V2 by recording sequences of body gestures performed spontaneously during games, and the new term ‘micro-gesture’ is proposed based on an analysis of the dataset's properties. Two neural network architectures are implemented for micro-gesture segmentation and recognition: a DBN-HMM model for skeleton data and a 3DCNN-HMM model for RGB-D data. We also explore a method for extracting the neutral states used in the HMM structure by detecting the activity level of the gesture sequences; the method is simple to derive and implement, and proved effective. Both architectures are evaluated on the MG dataset and optimized for the properties of micro-gestures. Experimental results show that the two models achieve micro-gesture segmentation and recognition with satisfactory accuracy. This work also opens a new research path for gesture recognition, and we believe it could be widely used as a baseline for future research on micro-gestures.
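    The neutral-state extraction is described only as detecting the activity level of the gesture sequences. One plausible reading, sketched below under that assumption, scores each skeleton frame by the mean joint displacement from the previous frame, smooths the scores with a moving average, and marks frames below a threshold as neutral. The function name, window, and threshold are hypothetical.

```python
import numpy as np


def neutral_frames(joints, threshold, window=5):
    """Flag low-activity ("neutral") frames in a skeleton sequence.

    joints: array of shape (n_frames, n_joints, 3) holding joint positions,
    with n_frames >= window and n_frames >= 2.
    Activity of frame t is the mean joint displacement from frame t-1,
    smoothed with a centred moving average over `window` frames."""
    joints = np.asarray(joints, dtype=float)
    # Mean over joints of the per-joint displacement between consecutive frames.
    step = np.linalg.norm(np.diff(joints, axis=0), axis=-1).mean(axis=1)
    activity = np.concatenate([[step[0]], step])  # pad so frame 0 also gets a score
    kernel = np.ones(window) / window
    smoothed = np.convolve(activity, kernel, mode="same")
    return smoothed < threshold
```

    Runs of `True` in the returned mask mark candidate neutral segments that could serve as the resting states between gestures.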

    Prediction of Secondary Protein Structure

    This project presents an attempt to solve the problem of protein secondary structure prediction using deep residual neural networks and other methods. Proteins are among the most vital components of every living being. They play a very important role because they determine the functions of an organism, so knowing a protein's structure is of great importance. Protein structure is described at four levels: primary, secondary, tertiary, and quaternary. The most significant is the three-dimensional structure, the tertiary structure, because it determines the biological role of the protein; knowledge of protein function, in turn, can help in treating many diseases. Unfortunately, the methods developed so far for determining protein structure are very complex and time-consuming. Determining the secondary structure is a necessary step toward deriving the tertiary structure, which is why it is studied. The secondary structure is predicted from the primary structure, that is, the amino acid sequence. This project mainly analyzes deep residual networks and how they can help predict protein secondary structure. Such networks belong to the class of deep neural networks and essentially consist of convolutional layers with additive connections between them.
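    The additive connections are what make a network residual: a block outputs its input plus a learned correction, y = ReLU(x + F(x)). The NumPy sketch below shows that structure for per-residue feature sequences; the conv-ReLU-conv form of F, the shapes, and the function names are illustrative assumptions, not the architecture used in the thesis. As in deep-learning libraries, the "convolution" is implemented as a cross-correlation.

```python
import numpy as np


def relu(v):
    return np.maximum(v, 0.0)


def conv1d_same(x, w):
    """1D convolution (cross-correlation) along the sequence with zero 'same' padding.

    x: (length, c_in) features per residue; w: (k, c_in, c_out) with odd k."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad both ends of the sequence axis
    # Tap i of the kernel sees the input shifted by (i - pad) positions.
    return sum(xp[i:i + x.shape[0]] @ w[i] for i in range(k))


def residual_block(x, w1, w2):
    """y = ReLU(x + F(x)) with F = conv -> ReLU -> conv; w2 must map back to c_in channels."""
    f = conv1d_same(relu(conv1d_same(x, w1)), w2)
    return relu(x + f)  # the additive identity shortcut
```

    Because the shortcut adds `x` directly, a block whose weights are zero reduces to ReLU(x), so each block only has to learn a correction to its input.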

    Predicting the Future

    Due to the increased capabilities of microprocessors and the advent of graphics processing units (GPUs) in recent decades, machine learning methodologies have become popular in many fields of science and technology. Together with the availability of large amounts of data, this has given machine learning and Big Data an important presence in the energy field. This Special Issue, entitled “Predicting the Future—Big Data and Machine Learning”, focuses on applications of machine learning methodologies in the field of energy. Topics include, but are not limited to: big data architectures of power supply systems, energy-saving and efficiency models, environmental effects of energy consumption, prediction of occupational health and safety outcomes in the energy industry, price forecasting of raw materials, and energy management of smart buildings.

    Acceleration of Image Processing Algorithms for Single Particle Analysis with Electron Microscopy

    Unpublished doctoral thesis, co-supervised by Masaryk University (Czech Republic) and the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Defense date: 24-10-2022.
    Cryogenic electron microscopy (cryo-EM) is a vital field in current structural biology. Unlike X-ray crystallography and nuclear magnetic resonance, it can be used to analyze membrane proteins and other samples with overlapping spectral peaks. However, one of the significant limitations of cryo-EM is its computational complexity. Modern electron microscopes can produce terabytes of data per session, from which hundreds of thousands of particles must be extracted and processed to obtain a near-atomic-resolution reconstruction of the original sample. Many existing software solutions use high-performance computing (HPC) techniques to bring these computations into the realm of practical usability. The common approach to acceleration is to parallelize the processing, but in practice this raises many complications, such as problem decomposition, data distribution, load scheduling, balancing, and synchronization. Using various accelerators further complicates the situation, as heterogeneous hardware brings additional caveats, for example, limited portability, under-utilization due to synchronization, and sub-optimal code performance due to missing specialization. This dissertation, structured as a compendium of articles, aims to improve the algorithms used in cryo-EM, especially in single particle analysis (SPA). We focus on single-node performance optimizations, using techniques either available or developed in the HPC field, such as heterogeneous computing or autotuning, which may require formulating novel algorithms. The secondary goal of the dissertation is to identify the limitations of state-of-the-art HPC techniques.
Since the cryo-EM pipeline consists of multiple distinct steps targeting different types of data, there is no single bottleneck to be solved; the presented articles therefore take a holistic approach to performance optimization. First, we give details on the GPU acceleration of specific programs. The achieved speedup is due to the higher performance of the GPU, adjustments of the original algorithms to it, and the application of novel algorithms. More specifically, we provide implementation details of programs for movie alignment, 2D classification, and 3D reconstruction that have been sped up by an order of magnitude compared to their original multi-CPU implementations, or sufficiently to be used on the fly. In addition to these three programs, multiple other programs from XMIPP, an actively used open-source software package, have been accelerated and improved. Second, we discuss our contribution to HPC in the form of autotuning, the ability of software to adapt to a changing environment, i.e., its input or the executing hardware. Towards that goal, we present cuFFTAdvisor, a tool that proposes and, through autotuning, finds the best configuration of the cuFFT library for given constraints on input size and plan settings. We also introduce a benchmark set of ten autotunable kernels for important computational problems, implemented in OpenCL or CUDA, together with the introduction of complex dynamic autotuning to the KTT tool. Third, we propose Umpalumpa, an image processing framework that combines a task-based runtime system, a data-centric architecture, and dynamic autotuning. The framework allows writing complex workflows that automatically use the available hardware resources and adjust to different hardware and data, while remaining easy to maintain.
The project that gave rise to these results received the support of a fellowship from the “la Caixa” Foundation (ID 100010434). The fellowship code is LCF/BQ/DI18/11660021.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 71367
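    Autotuning, as described in the thesis abstract, means measuring candidate configurations and keeping the fastest. The sketch below illustrates the idea on the CPU: it times NumPy real FFTs of a prime-length signal zero-padded to a few candidate sizes and picks the fastest. NumPy's FFT stands in for cuFFT and the candidate sizes are hypothetical; this is not the interface of cuFFTAdvisor or KTT, and padding is only acceptable where the surrounding algorithm tolerates a zero-padded transform.

```python
import time

import numpy as np


def autotune(candidates, measure):
    """Return the candidate with the lowest measured cost, plus all costs."""
    costs = {c: measure(c) for c in candidates}
    best = min(costs, key=costs.get)
    return best, costs


def fft_time(n_padded, signal, repeats=5):
    """Best-of-`repeats` wall time of a real FFT of `signal` zero-padded to n_padded."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.fft.rfft(signal, n=n_padded)  # n > len(signal) zero-pads the input
        best = min(best, time.perf_counter() - start)
    return best


signal = np.random.default_rng(0).standard_normal(4093)  # prime length
sizes = [4093, 4096, 4100, 4200]                          # hypothetical padded sizes
best_size, timings = autotune(sizes, lambda n: fft_time(n, signal))
```

    The winning size depends on the machine and library version, which is why the choice is measured rather than hard-coded.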