118 research outputs found

    Advanced source separation methods with applications to spatio-temporal datasets

    Latent variable models are useful tools for statistical data analysis in many applications. Examples of popular models include factor analysis, state-space models and independent component analysis. These types of models can be used for solving the source separation problem, in which the latent variables should have a meaningful interpretation and represent the actual sources generating the data. Source separation methods are the main focus of this work. Bayesian statistical theory provides a principled way to learn latent variable models and therefore to solve the source separation problem. The first part of this work studies variational Bayesian methods and their application to different latent variable models. The properties of variational Bayesian methods are investigated both theoretically and experimentally using linear source separation models. A new nonlinear factor analysis model, which restricts the generative mapping to the practically important case of post-nonlinear mixtures, is presented. The variational Bayesian approach to learning nonlinear state-space models is studied as well. This method is applied to the practical problem of detecting changes in the dynamics of complex nonlinear processes. The main drawback of Bayesian methods is their high computational burden. This complicates their use for exploratory data analysis, in which observed data regularities often suggest what kind of models could be tried. Therefore, the second part of this work proposes several faster source separation algorithms implemented in a common algorithmic framework. The proposed approaches separate the sources by analyzing their spectral contents, decoupling their dynamic models or optimizing their prominent variance structures. These algorithms are applied to spatio-temporal datasets containing global climate measurements from a long period of time.

    Mathematical Modeling and Dimension Reduction in Dynamical Systems

    Principled machine learning

    We introduce the underlying concepts which give rise to some of the commonly used machine learning methods, excluding deep-learning machines and neural networks. We point to their advantages, limitations and potential use in various areas of photonics. The main methods covered include parametric and non-parametric regression and classification techniques, kernel-based methods and support vector machines, decision trees, probabilistic models, Bayesian graphs, mixture models, Gaussian processes, message passing methods and visual informatics.

    Hyperspectral Data Acquisition and Its Application for Face Recognition

    Current face recognition systems face serious challenges in uncontrolled conditions, e.g., unrestrained lighting, pose variations and accessories. Hyperspectral imaging (HI) is typically employed to counter many of those challenges by incorporating the spectral information within different bands. Although numerous methods based on hyperspectral imaging have been developed for face recognition with promising results, three fundamental challenges remain: 1) low signal-to-noise ratios and low intensity values in the bands of the hyperspectral image, specifically near the blue bands; 2) high dimensionality of hyperspectral data; and 3) inter-band misalignment (IBM) correlated with subject motion during data acquisition. This dissertation concentrates mainly on addressing these challenges in HI. First, to address the low quality of the bands of the hyperspectral image, we utilize a custom light source that has more radiant power at shorter wavelengths and adjust camera exposure times to compensate for the lower transmittance of the filter and the lower radiant power of our light source. Second, the high dimensionality of spectral data imposes limitations on numerical analysis. As such, there is an emerging demand for robust data compression techniques that discard only the less relevant information in order to manage real spectral data. To cope with these problems, we describe a reduced-order data modeling technique based on local proper orthogonal decomposition, which computes low-dimensional models by projecting high-dimensional clusters onto subspaces spanned by local reduced-order bases. Third, we investigate 11 leading alignment approaches to address IBM correlated with subject motion during data acquisition. To overcome the limitations of the considered alignment approaches, we propose an accurate alignment approach (A3) that incorporates the strengths of point correspondence and a low-rank model.
In addition, we develop two qualitative prediction models that assess the alignment quality of hyperspectral images and determine which of the alignment approaches gives the better alignment. Finally, we show that the proposed alignment approach leads to a promising improvement in the face recognition performance of a probabilistic linear discriminant analysis approach.
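
The local proper orthogonal decomposition step described in this abstract can be sketched roughly as follows. This is a minimal illustration only, assuming that cluster labels are already available and that each local basis is taken from the SVD of the centered cluster; the function names `local_pod` and `reconstruct` are hypothetical and are not taken from the dissertation.

```python
import numpy as np

def local_pod(X, labels, rank):
    """Fit one POD basis per cluster (illustrative sketch).

    X: (n_samples, n_features) data; labels: (n_samples,) integer cluster ids.
    Returns a dict: cluster id -> (mean, basis), basis of shape (n_features, r).
    """
    models = {}
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean = Xc.mean(axis=0)
        # Rows of Vt are orthonormal POD modes of the centered cluster.
        _, _, Vt = np.linalg.svd(Xc - mean, full_matrices=False)
        models[int(c)] = (mean, Vt[:rank].T)
    return models

def reconstruct(X, labels, models):
    """Project each sample onto its cluster's basis and map it back."""
    X_hat = np.empty(X.shape, dtype=float)
    for c, (mean, basis) in models.items():
        mask = labels == c
        coeffs = (X[mask] - mean) @ basis      # low-dimensional coordinates
        X_hat[mask] = mean + coeffs @ basis.T  # back in the original space
    return X_hat
```

When the data in each cluster lie close to a low-dimensional affine subspace, a small `rank` already reproduces them well, which is the motivation for using local rather than global bases.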

    A Multivariate Approach to Functional Neuro Modeling

    This Ph.D. thesis, A Multivariate Approach to Functional Neuro Modeling, deals with the analysis and modeling of data from functional neuroimaging experiments. A multivariate dataset description is provided which facilitates efficient representation of typical datasets and, more importantly, provides the basis for a generalization theoretical framework relating model performance to model complexity and dataset size. Briefly summarized, the major topics discussed in the thesis include:
    • An introduction of the representation of functional datasets by pairs of neuronal activity patterns and overall conditions governing the functional experiment, via associated micro- and macroscopic variables. The description facilitates an efficient microscopic re-representation, as well as a handle on the link between brain and behavior; the latter is obtained by hypothesizing variations in the micro- and macroscopic variables to be manifestations of an underlying system.
    • A review of two micros..

    Semi-supervised and unsupervised kernel-based novelty detection with application to remote sensing images

    The main challenge of new information technologies is to retrieve intelligible information from the large volume of digital data gathered every day. Among the variety of existing data sources, the satellites continuously observing the surface of the Earth are key to the monitoring of our environment. The new generation of satellite sensors is tremendously increasing the range of possible applications, but also the need for efficient processing methodologies that extract information relevant to the users' needs in an automatic or semi-automatic way. This is where machine learning comes into play, transforming complex data into simplified products such as maps of land-cover changes or classes by learning from data examples annotated by experts. These annotations, also called labels, may be difficult or costly to obtain since they are established on the basis of ground surveys. As an example, it is extremely difficult to access a region recently flooded or affected by wildfires. In these situations, the detection of changes has to be done with annotations from unaffected regions only. Similarly, it is difficult to have information on all the land-cover classes present in an image when only a single class is of interest. These challenging situations are called novelty detection or one-class classification in machine learning. Here the learning phase has to rely on a very limited set of annotations, but can exploit the large set of unlabeled pixels available in the images. This setting, called semi-supervised learning, significantly improves the detection. In this thesis we address the development of methods for novelty detection and one-class classification with few or no labeled information.
The proposed methodologies build upon kernel methods, which provide a principled but flexible framework for learning from data showing potentially non-linear feature relations. The thesis is divided into two parts, each one making a different assumption on the data structure and both addressing unsupervised (automatic) and semi-supervised (semi-automatic) learning settings. The first part assumes the data to be formed by arbitrary-shaped and overlapping clusters and studies the use of kernel machines, such as Support Vector Machines or Gaussian Processes. Emphasis is put on the robustness to noise and outliers and on the automatic retrieval of parameters. Experiments on multi-temporal multispectral images for change detection are carried out using only information from unchanged regions or none at all. The second part assumes high-dimensional data to lie on multiple low-dimensional structures, called manifolds. We propose a method seeking a sparse and low-rank representation of the data mapped in a non-linear feature space. This representation allows us to build a graph, which is cut into several groups using spectral clustering. For the semi-supervised case, where few labels of one class of interest are available, we study several approaches incorporating the graph information. The class labels can be propagated on the graph, used to constrain spectral clustering, or used to train a one-class classifier regularized by the given graph. Experiments on the unsupervised and one-class classification of hyperspectral images demonstrate the effectiveness of the proposed approaches.

    Validation of structural heterogeneity in cryo-EM data by cluster ensembles

    Advisors: Fernando José Von Zuben, Rodrigo Villares Portugal. Dissertation (master's), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Abstract: Single Particle Analysis is a technique that allows the study of the three-dimensional structure of proteins and other macromolecular assemblies of biological interest. Its primary data consist of transmission electron microscopy images of multiple copies of the molecule in random orientations. Such images are very noisy due to the low electron dose employed. A reconstruction of the macromolecule can be obtained by averaging many images of particles in similar orientations and estimating their relative angles. However, heterogeneous conformational states often coexist in the sample, because the molecular complexes can be flexible and may also interact with other particles. Heterogeneity poses a challenge to the reconstruction of reliable 3D models and degrades their resolution. Among the most popular algorithms used for structural classification are k-means clustering, hierarchical clustering, self-organizing maps and maximum-likelihood estimators. Such approaches are usually intertwined with the reconstruction of the 3D models. Nevertheless, recent works indicate that it is possible to infer information about the structure of the molecules directly from the dataset of 2D projections. Among these findings is the relationship between structural variability and manifolds in a multidimensional feature space. This dissertation investigates whether an ensemble of unsupervised classification algorithms is able to separate these "conformational manifolds".
Ensemble or "consensus" methods tend to provide more accurate classification and may achieve satisfactory performance across a wide range of datasets when compared with individual algorithms. We investigate the behavior of six clustering algorithms, both individually and combined in ensembles, for the task of structural heterogeneity classification. The approach was tested on synthetic and real datasets containing a mixture of projection images of the Mm-cpn chaperonin in the "open" and "closed" states. It is shown that cluster ensembles can provide useful information for validating structural partitionings independently of 3D reconstruction methods. Master's degree. Computer Engineering. Master in Electrical Engineering.
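
One common way to combine several clusterings into a consensus is a co-association matrix; the abstract does not say which combination rule the dissertation uses, so the sketch below is an assumption for illustration only. The function names `co_association` and `consensus_labels`, and the 0.5 threshold, are hypothetical.

```python
import numpy as np

def co_association(partitions):
    """Fraction of partitions that place each pair of samples together.

    partitions: sequence of label vectors, one per clustering run.
    """
    P = np.asarray(partitions)                 # (n_partitions, n_samples)
    # same[k, i, j] is True when run k puts samples i and j in one cluster.
    same = P[:, :, None] == P[:, None, :]
    return same.mean(axis=0)

def consensus_labels(C, threshold=0.5):
    """Group samples linked by co-association above the threshold.

    Consensus clusters are the connected components of the thresholded graph.
    """
    n = C.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            # Visit unlabeled samples that usually co-occur with sample i.
            for j in np.flatnonzero((C[i] > threshold) & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels
```

Because the co-association matrix only compares labels within each run, it is insensitive to how the individual algorithms number their clusters.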

    Principles and theory of protein-based pattern formation

    Biological systems perform functions by the orchestrated interplay of many small components without a "conductor." Such self-organization pervades life on many scales, from the subcellular level to populations of many organisms and whole ecosystems. On the intracellular level, protein-based pattern formation coordinates and instructs functions like cell division, differentiation and motility. A key feature of protein-based pattern formation is that the total numbers of the involved proteins remain constant on the timescale of pattern formation. The overarching theme of this thesis is the profound impact of this mass-conservation property on pattern formation and how one can harness mass conservation to understand the underlying physical principles. The central insight is that changes in local densities shift local reactive equilibria, and thus induce concentration gradients which, in turn, drive diffusive transport of mass. For two-component systems, this dynamic interplay can be captured by simple geometric objects in the (low-dimensional) phase space of chemical concentrations. On this phase-space level, physical insight can be gained from geometric criteria and graphical constructions. Moreover, we introduce the notion of regional (in)stabilities, which allows one to characterize the dynamics in the highly nonlinear regime and reveals an inherent connection between Turing instability and stimulus-induced pattern formation. The insights gained for conceptual two-component systems can be generalized to systems with more components and several conserved masses. In the minimal setting of two diffusively coupled "reactors," the full dynamics can be embedded in the phase space of redistributed masses, where the phase-space flow is organized by surfaces of local reactive equilibria. Building on the phase-space analysis for two-component systems, we develop a new approach to the important open problem of wavelength selection in the highly nonlinear regime.
We show that two-component reaction–diffusion systems always exhibit uninterrupted coarsening (the continual growth of the characteristic length scale) of patterns if they are strictly mass conserving. Selection of a finite wavelength emerges due to weakly broken mass conservation, or coupling to additional components, which counteract and stop the competition instability that drives coarsening. For complex dynamical phenomena like wave patterns and the transition to spatiotemporal chaos, an analysis in terms of local equilibria and their stability properties provides a powerful tool to interpret data from numerical simulations and experiments, and to reveal the underlying physical mechanisms. In collaborations with different experimental labs, we studied the Min system of Escherichia coli. A central insight from these investigations is that bulk-surface coupling imparts a strong dependence of pattern formation on the geometry of the spatial confinement, which explains the qualitatively different dynamics observed inside cells compared to in vitro reconstitutions. By theoretically studying the polarization machinery in budding yeast and testing predictions in collaboration with experimentalists, we found that this functional module implements several redundant polarization mechanisms that depend on different subsets of proteins. Taken together, our work reveals unifying principles underlying biological self-organization and elucidates how microscopic interaction rules and physical constraints collectively lead to specific biological functions.
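
The mass-conservation property at the center of this abstract can be illustrated with a small numerical sketch. It assumes a generic two-component system in which the reaction term f only converts one species into the other, so that it cancels in the sum u + v; the kinetics, the parameter values and the function names below are hypothetical choices for illustration, not the models studied in the thesis.

```python
import numpy as np

def laplacian(a, dx):
    # Second-order finite difference with periodic boundaries.
    return (np.roll(a, 1) - 2.0 * a + np.roll(a, -1)) / dx**2

def simulate(u, v, steps, dt=0.01, dx=1.0, Du=0.1, Dv=10.0):
    """Explicit Euler steps for u_t = Du u_xx + f, v_t = Dv v_xx - f."""
    u, v = u.copy(), v.copy()
    for _ in range(steps):
        # Hypothetical kinetics: v converts to u (with feedback), u back to v.
        f = (0.07 + u**2 / (1.0 + u**2)) * v - u
        u_new = u + dt * (Du * laplacian(u, dx) + f)
        v_new = v + dt * (Dv * laplacian(v, dx) - f)
        u, v = u_new, v_new
    return u, v

def total_mass(u, v, dx=1.0):
    # Conserved quantity: integral of u + v over the periodic domain.
    return float(np.sum(u + v) * dx)
```

Because f enters the two equations with opposite signs and the periodic Laplacian sums to zero, `total_mass` stays constant up to rounding error regardless of how the densities redistribute.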