124 research outputs found
Validation of structural heterogeneity in cryo-EM data by cluster ensembles (Validação de heterogeneidade estrutural em dados de Crio-ME por comitês de agrupadores)
Advisors: Fernando José Von Zuben, Rodrigo Villares Portugal
Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação
Abstract: Single Particle Analysis is a technique that allows the study of the three-dimensional structure of proteins and other macromolecular assemblies of biological interest. Its primary data consist of transmission electron microscopy images of multiple copies of the molecule in random orientations. Such images are very noisy because of the low electron dose employed. A 3D reconstruction of the macromolecule can be obtained by combining many images of particles in similar orientations and estimating their relative angles. However, heterogeneous conformational states often coexist in the sample, because the molecular complexes can be flexible and may also interact with other particles. Heterogeneity poses a challenge to the reconstruction of reliable 3D models and degrades their resolution. Among the most popular algorithms used for structural classification are k-means clustering, hierarchical clustering, self-organizing maps and maximum-likelihood estimators. Such approaches are usually intertwined with the reconstruction of the 3D models. Nevertheless, recent work indicates that it is possible to infer information about the structure of the molecules directly from the set of 2D projections. Among these findings is the relationship between structural variability and manifolds in a multidimensional feature space.
This dissertation investigates whether an ensemble of unsupervised classification algorithms is able to separate these "conformational manifolds". Ensemble or "consensus" methods tend to provide more accurate classification than individual algorithms and may achieve satisfactory performance across a wide range of datasets. We investigate the behavior of six clustering algorithms, both individually and combined in ensembles, for the task of classifying conformational heterogeneity. The approach was tested on synthetic and real datasets containing a mixture of projection images of the Mm-cpn chaperonin in the "open" and "closed" states. It is shown that cluster ensembles can provide useful information for validating structural partitionings independently of 3D reconstruction methods.
Degree: Mestrado (master's), Engenharia de Computação (Computer Engineering), Mestre em Engenharia Elétric
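The ensemble idea in the abstract above can be illustrated with a co-association (evidence accumulation) consensus. This is a minimal sketch under stated assumptions: the abstract does not say which consensus function the dissertation uses, and the function names, the toy labelings and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of base clusterings that place each pair of items together."""
    n = len(labelings[0])
    m = len(labelings)
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    return [[c / m for c in row] for row in counts]


def consensus_clusters(labelings, threshold=0.5):
    """Link items whose co-association exceeds the (assumed) threshold and
    return the connected components as consensus labels."""
    coassoc = co_association(labelings)
    n = len(coassoc)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if coassoc[i][j] > threshold:
            parent[find(i)] = find(j)

    roots = {}
    # Number the components in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three hypothetical base clusterings of six particle images.
# The second uses swapped label names; the third misplaces item 2.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
print(consensus_clusters(base))
```

Because the co-association matrix only records whether two items share a cluster, it is insensitive to how each base algorithm names its clusters, which is what makes combining different algorithms straightforward.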
A Geometric Approach for Deciphering Protein Structure from Cryo-EM Volumes
Electron Cryo-Microscopy, or cryo-EM, is an area that has received much attention in the recent past. Compared to the traditional methods of X-ray crystallography and NMR spectroscopy, cryo-EM can be used to image much larger complexes, in many different conformations, and under a wide range of biochemical conditions. This is because it does not require the complex to be crystallisable. However, cryo-EM reconstructions are limited to intermediate resolutions, with the state of the art being 3.6 Å, where secondary structure elements can be visually identified but individual amino acid residues cannot. This lack of atomic-level resolution creates new computational challenges for protein structure identification. In this dissertation, we present a suite of geometric algorithms to address several aspects of protein modeling using cryo-EM density maps. Specifically, we develop novel methods to capture the shape of density volumes as geometric skeletons. We then use these skeletons to find the secondary structure elements (SSEs) of a given protein, to identify the correspondence between these SSEs and those predicted from the primary sequence, and to register high-resolution protein structures onto the density volume. In addition, we designed and developed Gorgon, an interactive molecular modeling system that integrates the above methods with other interactive routines to generate reliable and accurate protein backbone models.
Automatic approaches for microscopy imaging based on machine learning and spatial statistics
Vision is one of the most frequent ways of interacting with the surrounding
environment, so imaging is a very common way to gain information about the
environment and to learn from it. In cellular biology in particular, imaging
is applied to gain insight into the minute world of cellular complexes. As
a result, much recent research has focused on developing new image
processing approaches that facilitate the extraction of meaningful
quantitative information from image data sets. Recent progress
notwithstanding, the huge volume of acquired images and the demand for
increasing precision mean that digital image processing and statistical
analysis are gaining more and more importance in this field.
There are still limitations in bioimaging techniques that prevent sophisticated
optical methods from reaching their full potential. For instance, in the 3D
Electron Microscopy (3DEM) process nearly all acquired images require manual
post-processing to enhance performance, which should be replaced by an automatic
and reliable approach (addressed in Part I). Furthermore, the algorithms that
localize individual fluorophores in 3D super-resolution microscopy data are still
in their initial phase (discussed in Part II). In general, biologists currently
lack automated, high-throughput methods for quantitative global analysis of 3D
gene structures.
This thesis focuses mainly on microscopy imaging approaches based on Machine
Learning, statistical analysis and image processing, with the aim of handling and
improving the quantitative analysis of very large image data sets. The main task
is to build a novel paradigm for microscopy image processing that works in an
automatic, accurate and reliable way.
The specific contributions of this thesis can be summarized as follows:
• Replacement of the time-consuming, subjective and laborious task of manual
post-picking in the Cryo-EM process by a fully automatic particle post-picking
routine based on Machine Learning methods (Part I).
• Improved quality of the 3D reconstruction as a result of the high performance
of the automatic post-picking steps (Part I).
• Development of a fully automatic tool for detecting subcellular objects in
multichannel 3D fluorescence images (Part II).
• Extension of known colocalization analysis with spatial statistics in order
to investigate the surrounding point distribution and to analyze
colocalization together with its statistical significance (Part II).
All introduced approaches are implemented and provided as toolboxes that are
freely available for research purposes.
A Bayesian approach to initial model inference in cryo-electron microscopy
Single-particle cryo-electron microscopy (cryo-EM) is widely used to study the structure of macromolecular assemblies. Tens of thousands of noisy two-dimensional images of the macromolecular assembly viewed from different directions are used to infer its three-dimensional structure. The first step is to estimate a low-resolution initial model and initial image orientations. This is a challenging ill-posed inverse problem with many unknowns, including an unknown orientation for each two-dimensional image. Obtaining a good initial model is crucial for the success of the subsequent refinement step. In this thesis we introduce new algorithms for estimating an initial model in cryo-EM, based on a coarse representation of the electron density. The contribution of the thesis can be divided into two parts: one relating to the model, and the other to the algorithms. The first main contribution of the thesis is the use of Gaussian mixture models to represent electron densities in reconstruction algorithms. We use spherical (isotropic) mixture components with unknown positions, sizes and weights. We show that this representation offers many advantages over the traditional grid-based representation used by other reconstruction algorithms. For example, far fewer parameters are needed to represent the three-dimensional electron density, which leads to fast and robust algorithms.
The second main contribution of the thesis is developing Markov Chain Monte Carlo (MCMC) algorithms within a Bayesian framework for estimating the parameters of the mixture models. The first algorithm is a Gibbs sampling algorithm. It is derived by starting with the standard Gibbs sampling algorithm for fitting Gaussian mixture models to point clouds, and extending it to work with images, to handle projections from three dimensions to two dimensions, and to account for unknown rotations and translations.
The second algorithm takes a different approach. It modifies the forward model to work with Gaussian noise, and uses sampling algorithms such as Hamiltonian Monte Carlo (HMC) to sample the positions of the mixture components and the image orientations.
We provide extensive tests of our algorithms using simulated and experimental data, and compare them to other initial-model algorithms.
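One reason a spherical Gaussian mixture is convenient here is that its projection is again a Gaussian mixture: integrating an isotropic 3D Gaussian along the viewing axis gives an isotropic 2D Gaussian with the same width, centred at the rotated position with the depth coordinate dropped. The sketch below illustrates only that forward projection, not the thesis's samplers. The function names, the choice of z as the viewing axis and the example values are assumptions made for illustration.

```python
import math


def rotation_z(angle):
    """3x3 rotation matrix about the z axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def project_mixture(centers, sigmas, weights, rotation):
    """Rotate each 3D component centre, then drop z (the assumed viewing axis).

    Isotropic components keep their sigma and weight under projection.
    """
    projected = []
    for mu, sigma, w in zip(centers, sigmas, weights):
        r = [sum(rotation[a][b] * mu[b] for b in range(3)) for a in range(3)]
        projected.append(((r[0], r[1]), sigma, w))
    return projected


def image_value(projected, x, y):
    """Noise-free projection image intensity at pixel position (x, y)."""
    total = 0.0
    for (cx, cy), sigma, w in projected:
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        norm = 2.0 * math.pi * sigma ** 2
        total += w * math.exp(-d2 / (2.0 * sigma ** 2)) / norm
    return total


# Hypothetical one-component model, viewed after a 90 degree in-plane rotation.
proj = project_mixture([(1.0, 0.0, 5.0)], [1.0], [2.0], rotation_z(math.pi / 2))
print(proj[0][0])                  # projected centre, close to (0, 1)
print(image_value(proj, 0.0, 1.0))  # peak value, weight / (2 * pi * sigma^2)
```

Each component contributes only a centre, a width and a weight, which is the parameter saving over a voxel grid that the abstract refers to.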
Inferring Biological Structures from Super-Resolution Single Molecule Images Using Generative Models
Localization-based super-resolution imaging is presently limited by sampling requirements for dynamic measurements of biological structures. Generating an image requires serial acquisition of individual molecular positions at sufficient density to define a biological structure, which increases the acquisition time. Efficient analysis of biological structures from sparse localization data could substantially improve the dynamic imaging capabilities of these methods. Using a feature extraction technique called the Hough Transform, simple biological structures are identified from both simulated and real localization data. We demonstrate that these generative models can efficiently infer biological structures in the data from far fewer localizations than are required for complete spatial sampling. Analysis at partial data densities revealed efficient recovery of clathrin vesicle size distributions and microtubule orientation angles with as little as 10% of the localization data. This approach significantly increases the temporal resolution for dynamic imaging and provides quantitatively useful biological information.
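As an illustration of the Hough Transform mentioned above, the sketch below votes for straight lines directly from a list of point localizations, which is the kind of computation that could recover a filament orientation angle. It is a generic textbook line Hough transform, not the authors' implementation. The function name, the bin sizes and the sample points are hypothetical.

```python
import math
from collections import Counter


def hough_line(points, n_theta=180, rho_step=1.0):
    """Return (normal angle in degrees, rho, votes) of the best-supported line.

    A line is parameterised as x*cos(theta) + y*sin(theta) = rho, so theta is
    the direction of the line's normal, not of the line itself.
    """
    votes = Counter()
    for x, y in points:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            # Each localization votes for every (theta, rho) bin it could lie on.
            votes[(k, round(rho / rho_step))] += 1
    (k, rho_bin), count = votes.most_common(1)[0]
    return math.degrees(math.pi * k / n_theta), rho_bin * rho_step, count


# Ten sparse, hypothetical localizations along the horizontal line y = 5.
points = [(10.0 * i, 5.0) for i in range(10)]
angle, rho, count = hough_line(points)
print(angle, rho, count)
```

With coarse bins or closely spaced points, several neighbouring bins can tie for the maximum, so a practical version would smooth the accumulator or refine the peak. Circular structures such as vesicles would need a circle parameterisation (centre and radius) instead.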