    Time Series Analysis and Classification with State-Space Models for Industrial Processes and the Life Sciences

    This thesis investigates the use of state-space models for the analysis and classification of time series data gathered from industrial manufacturing processes and the life sciences. To overcome hitherto unsolved problems in both application domains, the temporal behavior of the data is captured using state-space models. Industrial laser welding processes are monitored with a high-speed camera, and the appearance of unusual events in the image sequences correlates with errors on the produced part. Novel classification frameworks are therefore developed to detect these unusual events robustly and with a small false positive rate. For classifier learning, class labels are by default available only for the complete image sequence, since scanning the sequences for anomalies is expensive. The first framework combines appearance-based features and state-space models for unusual event detection in image sequences. For the first time, ideas adapted from face recognition are used for the automatic dimension reduction of images recorded from laser welding processes. The state-space model is trained incrementally and can learn from erroneous sequences without the need to manually label the position of the error event within the sequences. In addition, a second framework for the object-based detection of sputter events in laser welding processes is developed. This framework successfully combines, for the first time, temporal change detection, object tracking and trajectory classification for the detection of weak sputter events. For the application in the life sciences, the improvement and further development of data analysis methods for Single Molecule Fluorescence Spectroscopy (SMFS) is considered. SMFS experiments allow biochemical processes to be studied on a single-molecule basis. The single molecule is excited with a laser, and the photons subsequently emitted by fluorescence contain important information about conformational changes of the molecule. Advanced statistical analysis techniques are necessary to infer state changes of the molecule from changes in the photon emissions. By using state-space models, it is possible to extract information from recorded photon streams that would be lost with traditional analysis techniques.
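
    As a rough illustration of how a state-space model can infer state changes from photon emissions, the sketch below runs a forward filter for a two-state hidden Markov model with Poisson-distributed photon counts per time bin. It is not the method of the thesis: the function names, the emission rates, the transition matrix and the example counts are all assumptions chosen for illustration.

```python
import math


def poisson_pmf(k, rate):
    """Probability of observing k photons in a bin with the given mean rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def forward_filter(counts, rates, trans, prior):
    """Return P(state_t | counts_1..t) for every time bin (HMM forward filter)."""
    n = len(rates)
    belief = list(prior)
    filtered = []
    for t, k in enumerate(counts):
        if t > 0:
            # Prediction step: propagate the belief through the transition matrix.
            belief = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
        # Update step: weight each state by the Poisson likelihood of the count.
        weighted = [belief[j] * poisson_pmf(k, rates[j]) for j in range(n)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        filtered.append(belief)
    return filtered


# Hypothetical photon counts per bin: a dim state followed by a bright state.
counts = [1, 0, 2, 1, 9, 11, 8, 10]
rates = [1.0, 10.0]                   # assumed mean photons per bin in each state
trans = [[0.95, 0.05], [0.05, 0.95]]  # assumed state transition probabilities
prior = [0.5, 0.5]

filtered = forward_filter(counts, rates, trans, prior)
states = [max(range(len(rates)), key=lambda j: b[j]) for b in filtered]
print(states)
```

    The most probable state per bin switches where the count level changes. Simple thresholding of individual bins ignores the temporal dependence that the transition matrix encodes.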

    Modeling of evolving textures using granulometries

    This chapter describes a statistical approach to the classification of dynamic texture images, called parallel evolution functions (PEFs). Traditional classification methods predict texture class membership by comparing an image with a finite set of predefined texture classes and identifying the closest class. However, where texture images arise from a dynamic texture evolving over time, estimation of a time state in a continuous evolutionary process is required instead. The PEF approach does this using regression modeling techniques to predict the time state. It is a flexible approach which may be based on any suitable image features. Many textures are well suited to a morphological analysis, and the PEF approach uses image texture features derived from a granulometric analysis of the image. The method is illustrated using both simulated images of Boolean processes and real images of corrosion. The PEF approach has particular advantages for training sets containing limited numbers of observations, which is the case in many real-world industrial inspection scenarios and for which other methods can fail or perform badly.
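
    To make the notion of a granulometric feature concrete, the following sketch computes a granulometry and its pattern spectrum for a one-dimensional binary signal by morphological opening with line segments of increasing length. It is a simplified illustration and not the chapter's PEF implementation: the restriction to one dimension, the function names and the example signal are assumptions.

```python
def opening(signal, size):
    """Morphological opening of a 1-D signal with a flat segment of `size` samples."""
    n = len(signal)
    # Erosion: minimum over each window; windows that leave the signal count as 0.
    eroded = [min(signal[i:i + size]) if i + size <= n else 0 for i in range(n)]
    # Dilation with the reflected segment: maximum over the preceding window.
    return [max(eroded[max(0, i - size + 1):i + 1]) for i in range(n)]


def granulometry(signal, max_size):
    """Mass remaining after opening with segments of length 1..max_size."""
    return [sum(opening(signal, k)) for k in range(1, max_size + 1)]


def pattern_spectrum(signal, max_size):
    """Mass removed between successive openings, i.e. mass at each structure size."""
    areas = granulometry(signal, max_size + 1)
    return [areas[k] - areas[k + 1] for k in range(max_size)]


# Hypothetical binary signal with runs of length 2, 4, 1 and 2.
signal = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1]
areas = granulometry(signal, 5)
spectrum = pattern_spectrum(signal, 4)
print(areas, spectrum)
```

    Each entry of the pattern spectrum gives the mass held in structures of exactly that size. A vector of this kind is one example of a texture feature that a regression model could map to a time state.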

    Filter-Based Probabilistic Markov Random Field Image Priors: Learning, Evaluation, and Image Analysis

    Markov random fields (MRFs) based on linear filter responses are among the most popular models for image priors, owing to their rigorous probabilistic interpretation and their versatility across applications. In this dissertation, we propose an application-independent method to quantitatively evaluate MRF image priors using model samples. To this end, we developed efficient auxiliary-variable Gibbs samplers for a general class of MRFs with flexible potentials. We found that the popular pairwise and high-order MRF priors capture image statistics only roughly and exhibit poor generative properties. We further developed new learning strategies and obtained high-order MRFs that capture well the statistics of the inbuilt features, thus being real maximum-entropy models, as well as other important statistical properties of natural images, outlining the capabilities of MRFs. We suggest a multi-modal extension of MRF potentials that not only allows more expressive priors to be trained but also yields further insight into MRF variants. Based on this, we are able to train compact, fully convolutional restricted Boltzmann machines (RBMs) that can model visual repetitive textures even better than more complex and deep models. The learned high-order MRFs allow us to develop new methods for various real-world image analysis problems. For denoising of natural images and deconvolution of microscopy images, the MRF priors are employed in a purely generative setting. We propose efficient sampling-based methods to infer Bayesian minimum mean squared error (MMSE) estimates, which substantially outperform maximum a posteriori (MAP) estimates and can compete with state-of-the-art discriminative methods. For non-rigid registration of live cell nuclei in time-lapse microscopy images, we propose a global optical flow-based method. The statistics of noise in fluorescence microscopy images are studied to derive an adaptive weighting scheme that increases model robustness. High-order MRFs are also employed to train image filters for extracting important features of cell nuclei, and the deformations of the nuclei are then estimated in the learned feature spaces. The developed method outperforms previous approaches in terms of both registration accuracy and computational efficiency.

    Multi Sensor Multi Target Perception and Tracking for Informed Decisions in Public Road Scenarios

    Multi-target tracking in public traffic calls for a tracking system with automated track initiation and termination facilities in a randomly evolving driving environment. In addition, the key problem of data association needs to be handled effectively, given the limited computational resources on board an autonomous car. The tracking problem becomes more challenging with high-resolution automotive sensors, which return multiple detections per object. Furthermore, it is customary to use multiple sensors that cover different and/or overlapping fields of view and to fuse sensor detections to provide robust and reliable tracking. As a consequence, in high-resolution multi-sensor settings, the data association uncertainty and the corresponding tracking complexity increase, which calls for a systematic approach to handling and processing sensor detections. In this work, we present a multi-target tracking system that addresses target birth/initiation and death/termination processes with automatic track management features. These tracking functionalities can facilitate perception during common events in public traffic, as participants (suddenly) change lanes, navigate intersections, overtake and/or brake in emergencies. Various tracking approaches, including those based on the Joint Integrated Probabilistic Data Association (JIPDA) filter, the Linear Multi-target Integrated Probabilistic Data Association (LMIPDA) filter, and their multi-detection variants, are adapted to specifically include algorithms that handle track initiation and termination, clutter density estimation and track management. The utility of the filtering module is further demonstrated by integrating it into a trajectory tracking problem based on model predictive control. To cope with tracking complexity in the case of multiple high-resolution sensors, we propose a hybrid scheme that combines data clustering at the local sensor with multiple-detection tracking schemes at the fusion layer. We implement a track-to-track fusion scheme that de-correlates local (sensor) tracks to avoid double counting, and we apply a measurement partitioning scheme to re-purpose the LMIPDA tracking algorithm for multi-detection cases. In addition to the measurement partitioning approach, a joint extent and kinematic state estimation scheme is integrated into the LMIPDA approach to facilitate perception and tracking of individual as well as group targets in multi-lane public traffic. We formulate the tracking problem as a two-layer hierarchy. This arrangement enhances multi-target tracking performance in situations including, but not limited to, target initialization (birth process), target occlusion, missed detections, unresolved measurements and target maneuvers. In addition, target groups expose complex interactions between individual targets, which aids situation assessment and is otherwise challenging to capture. The simulation studies are complemented by experimental studies performed on single and multiple (group) targets. Target detections are collected from a high-resolution radar at a frequency of 20 Hz, whereas RTK-GPS data is made available as ground truth for the trajectory of one of the target vehicles.
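
    A common preprocessing step before any probabilistic data association is validation gating, which keeps only the detections that are statistically close to a track's predicted measurement. The sketch below shows ellipsoidal gating with the squared Mahalanobis distance for two-dimensional measurements. It is a generic illustration and not the JIPDA or LMIPDA algorithm of this work: the function names, the innovation covariance and the detections are assumptions.

```python
GATE_99_2DOF = 9.21  # 0.99 quantile of the chi-square distribution with 2 degrees of freedom


def mahalanobis_sq(z, z_pred, S):
    """Squared Mahalanobis distance of a 2-D innovation with 2x2 covariance S."""
    dx, dy = z[0] - z_pred[0], z[1] - z_pred[1]
    a, b, c, d = S[0][0], S[0][1], S[1][0], S[1][1]
    det = a * d - b * c
    # The inverse of S is (1 / det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def gate(detections, z_pred, S, threshold=GATE_99_2DOF):
    """Return the detections that fall inside the validation gate of one track."""
    return [z for z in detections if mahalanobis_sq(z, z_pred, S) <= threshold]


# Hypothetical predicted measurement and innovation covariance of a single track.
z_pred = (10.0, 5.0)
S = [[1.0, 0.0], [0.0, 4.0]]
detections = [(10.5, 5.5), (13.0, 5.0), (10.0, 12.0), (14.0, 5.0)]

gated = gate(detections, z_pred, S)
print(gated)
```

    With high-resolution sensors, several detections from the same object can fall inside one gate, which is the situation that multi-detection association schemes are designed to handle.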

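    Separation and Count Estimation of Audio Signal Sources with Time and Frequency Overlap (original German title: Trennung und Schätzung der Anzahl von Audiosignalquellen mit Zeit- und Frequenzüberlappung)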

    Everyday audio recordings involve mixture signals: music contains a mixture of instruments; in a meeting or conference, there is a mixture of human voices. For these mixtures, automatically separating the sources or estimating their number is a challenging task. A common assumption when processing mixtures in the time-frequency domain is that sources are not fully overlapped. However, in this work we consider some cases where the overlap is severe, for instance when instruments play the same note (unison) or when many people speak concurrently ("cocktail party"), which highlights the need for new representations and more powerful models. To address the problems of source separation and count estimation, we use conventional signal processing techniques as well as deep neural networks (DNNs). We first address the source separation problem for unison instrument mixtures, studying the distinct spectro-temporal modulations caused by vibrato. To exploit these modulations, we developed a method based on time warping, informed by an estimate of the fundamental frequency. For cases where such estimates are not available, we present an unsupervised model inspired by the way humans group time-varying sources (common fate). This contribution comes with a novel representation that improves separation for overlapped and modulated sources in unison mixtures and also improves vocal and accompaniment separation when used as an input to a DNN model. Then, we focus on estimating the number of sources in a mixture, which is important for real-world scenarios. Our work on count estimation was motivated by a study on how humans address this task, which led us to conduct listening experiments confirming that humans can correctly estimate the number of sources only up to four. To answer the question of whether machines can perform similarly, we present a DNN architecture trained to estimate the number of concurrent speakers. Our results show improvements compared to other methods, and the model even outperformed humans on the same task. In both the source separation and source count estimation tasks, the key contribution of this thesis is the concept of "modulation", which is important for computationally mimicking human performance. Our proposed Common Fate Transform is an adequate representation to disentangle overlapping signals for separation, and an inspection of our DNN count estimation model revealed that it learns modulation-like intermediate features.

    Statistical inference for periodic and partially observable Poisson processes

    This thesis develops practical Bayesian estimators and exploration methods for count data collected over long periods of time by autonomous robots with unreliable sensors. It addresses the problems of drawing inferences from temporally incomplete and unreliable count data. The first main contribution is a set of statistical models with spectral analysis which are able to capture the periodic structure of count data on extended temporal scales from temporally sparse observations. It is shown how to use these patterns to (i) predict the human activity level at particular times and places and (ii) categorize locations based on their periodic patterns. The second main contribution is a set of inference methods for a Poisson process which take into account the unreliability of the detection algorithms used to count events. Two tractable approximations to the posterior of such Poisson processes are presented to cope with the absence of a conjugate density. Variations of these processes are presented, in which (i) sensors are uncorrelated, (ii) sensors are correlated, and (iii) the unreliability of the observation model, when built from data, is accounted for. A simulation study shows that these partially observable Poisson process (POPP) filters correct the over- and under-counts produced by sensors. The third main contribution is a set of exploration methods which bring together the spectral models and the POPP filters to drive exploration by a mobile robot for a series of nine-week deployments. This leads to (i) a labelled data set and (ii) solving an exploration-exploitation trade-off: the robot must explore to find out where activities congregate, so as to then exploit that by observing as many activities
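
    The effect of an unreliable counter on rate inference can be illustrated with a deliberately simplified special case. If the sensor only misses events (each true event is detected independently with a known probability and there are no false positives), the observed counts are again Poisson with a thinned rate, and a Gamma prior on the event rate stays conjugate. The sketch below shows that update. It is not the POPP filter of the thesis, which also handles false positives and therefore needs approximations; the function names and all numbers are assumptions.

```python
def update_gamma(shape, rate, observed_count, duration, detect_prob=1.0):
    """Gamma posterior for a Poisson rate when only a fraction of events is detected.

    Assumes no false positives: observed counts ~ Poisson(detect_prob * lam * duration).
    """
    return shape + observed_count, rate + detect_prob * duration


def posterior_mean(shape, rate):
    """Mean of a Gamma(shape, rate) distribution."""
    return shape / rate


# Hypothetical prior and observation: 6 detections in 2 hours, half of all events missed.
prior_shape, prior_rate = 1.0, 1.0
corrected = update_gamma(prior_shape, prior_rate, observed_count=6, duration=2.0,
                         detect_prob=0.5)
naive = update_gamma(prior_shape, prior_rate, observed_count=6, duration=2.0)

print(posterior_mean(*corrected), posterior_mean(*naive))
```

    Treating the sensor as perfect gives a lower rate estimate than the corrected update, which is the under-counting bias that a sensor model is meant to remove.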

    Estimation and Detection

