795 research outputs found

    Multimodal methods for blind source separation of audio sources

    Get PDF
    The enhancement of the performance of frequency domain convolutive blind source separation (FDCBSS) techniques when applied to the problem of separating audio sources recorded in a room environment is the focus of this thesis. This challenging application is termed the cocktail party problem and the ultimate aim would be to build a machine which matches the ability of a human being to solve this task. Human beings exploit both their eyes and their ears in solving this task and hence they adopt a multimodal approach, i.e. they exploit both audio and video modalities. New multimodal methods for blind source separation of audio sources are therefore proposed in this work as a step towards realizing such a machine. The geometry of the room environment is initially exploited to improve the separation performance of a FDCBSS algorithm. The positions of the human speakers are monitored by video cameras and this information is incorporated within the FDCBSS algorithm in the form of constraints added to the underlying cross-power spectral density matrix-based cost function which measures separation performance. [Continues.

    A multimodal approach to blind source separation of moving sources

    Get PDF
    A novel multimodal approach is proposed to solve the problem of blind source separation (BSS) of moving sources. The challenge of BSS for moving sources is that the mixing filters are time varying; thus, the unmixing filters should also be time varying, which are difficult to calculate in real time. In the proposed approach, the visual modality is utilized to facilitate the separation for both stationary and moving sources. The movement of the sources is detected by a 3-D tracker based on video cameras. Positions and velocities of the sources are obtained from the 3-D tracker based on a Markov Chain Monte Carlo particle filter (MCMC-PF), which results in high sampling efficiency. The full BSS solution is formed by integrating a frequency domain blind source separation algorithm and beamforming: if the sources are identified as stationary for a certain minimum period, a frequency domain BSS algorithm is implemented with an initialization derived from the positions of the source signals. Once the sources are moving, a beamforming algorithm which requires no prior statistical knowledge is used to perform real time speech enhancement and provide separation of the sources. Experimental results confirm that by utilizing the visual modality, the proposed algorithm not only improves the performance of the BSS algorithm and mitigates the permutation problem for stationary sources, but also provides a good BSS performance for moving sources in a low reverberant environment

    Multimodal blind source separation for moving sources

    Get PDF
    A novel multimodal approach is proposed to solve the problem of blind source separation (BSS) of moving sources. The challenge of BSS for moving sources is that the mixing filters are time varying, thus the unmixing filters should also be time varying, which are difficult to track in real time. In the proposed approach, the visual modality is utilized to facilitate the separation for both stationary and moving sources. The movement of the sources is detected by a 3-D tracker based on particle filtering. The full BSS solution is formed by integrating a frequency domain blind source separation algorithm and beamforming: if the sources are identified as stationary, a frequency domain BSS algorithm is implemented with an initialization derived from the visual information. Once the sources are moving, a beamforming algorithm is used to perform real time speech enhancement and provide separation of the sources. Experimental results show that by utilizing the visual modality, the proposed algorithm can not only improve the performance of the BSS algorithm and mitigate the permutation problem for stationary sources, but also provide a good BSS performance for moving sources in a low reverberant environment

    Methods for three-dimensional Registration of Multimodal Abdominal Image Data

    Get PDF
    Multimodal image registration benefits the diagnosis, treatment planning and the performance of image-guided procedures in the liver, since it enables the fusion of complementary information provided by pre- and intrainterventional data about tumor localization and access. Although there exist various registration methods, approaches which are specifically optimized for the registration of multimodal abdominal scans are only scarcely available. The work presented in this thesis aims to tackle this problem by focusing on the development, optimization and evaluation of registration methods specifically for the registration of multimodal liver scans. The contributions to the research field of medical image registration include the development of a registration evaluation methodology that enables the comparison and optimization of linear and non-linear registration algorithms using a point-based accuracy measure. This methodology has been used to benchmark standard registration methods as well as novel approaches that were developed within the frame of this thesis. The results of the methodology showed that the employed similarity measure used during the registration has a major impact on the registration accuracy of the method. Due to this influence, two alternative similarity metrics bearing the potential to be used on multimodal image data are proposed and evaluated. The first metric relies on the use of gradient information in form of Histograms of Oriented Gradients (HOG) whereas the second metric employs a siamese neural network to learn a similarity measure directly on the image data. The evaluation showed, that both metrics could compete with state of the art similarity measures in terms of registration accuracy. The HOG-metric offers the advantage that it does not require ground truth data to learn a similarity estimation, but instead it is applicable to various data sets with the sole requirement of distinct gradients. However, the Siamese metric is characterized by a higher robustness for large rotations than the HOG-metric. To train such a network, registered ground truth data is required which may be critical for multimodal image data. Yet, the results show that it is possible to apply models trained on registered synthetic data on real patient data. The last part of this thesis focuses on methods to learn an entire registration process using neural networks, thereby offering the advantage to replace the traditional, time-consuming iterative registration procedure. Within the frame of this thesis, the so-called VoxelMorph network which was originally proposed for monomodal, non-linear registration learning is extended for affine and multimodal registration learning tasks. This extension includes the consideration of an image mask during metric evaluation as well as loss functions for multimodal data, such as the pretrained Siamese metric and a loss relying on the comparison of deformation fields. Based on the developed registration evaluation methodology, the performance of the original network as well as the extended variants are evaluated for monomodal and multimodal registration tasks using multiple data sets. With the extended network variants, it is possible to learn an entire multimodal registration process for the correction of large image displacements. As for the Siamese metric, the results imply a general transferability of models trained with synthetic data to registration tasks including real patient data. Due to the lack of multimodal ground truth data, this transfer represents an important step towards making Deep Learning based registration procedures clinically usable

    Enhanced independent vector analysis for audio separation in a room environment

    Get PDF
    Independent vector analysis (IVA) is studied as a frequency domain blind source separation method, which can theoretically avoid the permutation problem by retaining the dependency between different frequency bins of the same source vector while removing the dependency between different source vectors. This thesis focuses upon improving the performance of independent vector analysis when it is used to solve the audio separation problem in a room environment. A specific stability problem of IVA, i.e. the block permutation problem, is identified and analyzed. Then a robust IVA method is proposed to solve this problem by exploiting the phase continuity of the unmixing matrix. Moreover, an auxiliary function based IVA algorithm with an overlapped chain type source prior is proposed as well to mitigate this problem. Then an informed IVA scheme is proposed which combines the geometric information of the sources from video to solve the problem by providing an intelligent initialization for optimal convergence. The proposed informed IVA algorithm can also achieve a faster convergence in terms of iteration numbers and better separation performance. A pitch based evaluation method is defined to judge the separation performance objectively when the information describing the mixing matrix and sources is missing. In order to improve the separation performance of IVA, an appropriate multivariate source prior is needed to better preserve the dependency structure within the source vectors. A particular multivariate generalized Gaussian distribution is adopted as the source prior. The nonlinear score function derived from this proposed source prior contains the fourth order relationships between different frequency bins, which provides a more informative and stronger dependency structure compared with the original IVA algorithm and thereby improves the separation performance. Copula theory is a central tool to model the nonlinear dependency structure. The t copula is proposed to describe the dependency structure within the frequency domain speech signals due to its tail dependency property, which means if one variable has an extreme value, other variables are expected to have extreme values. A multivariate student's t distribution constructed by using a t copula with the univariate student's t marginal distribution is proposed as the source prior. Then the IVA algorithm with the proposed source prior is derived. The proposed algorithms are tested with real speech signals in different reverberant room environments both using modelled room impulse response and real room recordings. State-of-the-art criteria are used to evaluate the separation performance, and the experimental results confirm the advantage of the proposed algorithms

    Experimental and Data-driven Workflows for Microstructure-based Damage Prediction

    Get PDF
    Materialermüdung ist die häufigste Ursache für mechanisches Versagen. Die Degradationsmechanismen, welche die Lebensdauer von Bauteilen bei vergleichsweise ausgeprägten zyklischen Belastungen bestimmen, sind gut bekannt. Bei Belastungen im makroskopisch elastischen Bereich hingegen, der (sehr) hochzyklischen Ermüdung, bestimmen die innere Struktur eines Werkstoffs und die Wechselwirkung kristallografischer Defekte die Lebensdauer. Unter diesen Umständen sind die inneren Degradationsphänomene auf der mikroskopischen Skala weitgehend reversibel und führen nicht zur Bildung kritischer Schädigungen, die kontinuierlich wachsen können. Allerdings sind einige Kornensembles in polykristallinen Metallen, je nach den lokalen mikrostrukturellen Gegebenheiten, anfällig für Schädigungsinitiierung, Rissbildung und -wachstum und wirken daher als Schwachstellen. Daher weisen Bauteile, die solchen Belastungen ausgesetzt sind, oft eine ausgeprägte Lebensdauerstreuung auf. Die Tatsache, dass ein umfassendes mechanistisches Verständnis für diese Degradationsprozesse in verschiedenen Werkstoffen nicht vorliegt, hat zur Folge, dass die derzeitigen Modellierungsbemühungen die mittlere Lebensdauer und ihre Varianz in der Regel nur mit unbefriedigender Genauigkeit vorhersagen. Dies wiederum erschwert die Bauteilauslegung und macht die Nutzung von Sicherheitsfaktoren während des Dimensionierungsprozesses erforderlich. Abhilfe kann geschaffen werden, indem umfangreiche Daten zu Einflussfaktoren und deren Wirkung auf die Bildung initialer Ermüdungsschädigungen erhoben werden. Die Datenknappheit wirkt sich nach wie vor negativ auf Datenwissenschaftler und Modellierungsexperten aus, die versuchen, trotz geringer Stichprobengröße und unvollständigen Merkmalsräumen, mikrostrukturelle Abhängigkeiten abzuleiten, datengetriebene Vorhersagemodelle zu trainieren oder physikalische, regelbasierte Modelle zu parametrisieren. Die Tatsache, dass nur wenige kritische Schädigungen bezogen auf das gesamte Probenvolumen auftreten und die hochzyklische Ermüdung eine Vielzahl unterschiedlicher Abhängigkeiten aufweist, impliziert einige Anforderungen an die Datenerfassung und -verarbeitung. Am wichtigsten ist, dass die Messtechniken so empfindlich sind, dass nuancierte Schwankungen im Probenzustand erfasst werden können, dass die gesamte Routine effizient ist und dass die korrelative Mikroskopie räumliche Informationen aus verschiedenen Messungen miteinander verbindet. Das Hauptziel dieser Arbeit besteht darin, einen Workflow zu etablieren, der den Datenmangel behebt, so dass die zukünftige virtuelle Auslegung von Komponenten effizienter, zuverlässiger und nachhaltiger gestaltet werden kann. Zu diesem Zweck wird in dieser Arbeit ein kombinierter experimenteller und datenverarbeitender Workflow vorgeschlagen, um multimodale Datensätze zu Ermüdungsschädigungen zu erzeugen. Der Schwerpunkt liegt dabei auf dem Auftreten von lokalen Gleitbändern, der Rissinitiierung und dem Wachstum mikrostrukturell kurzer Risse. Der Workflow vereint die Ermüdungsprüfung von mesoskaligen Proben, um die Empfindlichkeit der Schädigungsdetektion zu erhöhen, die ergänzende Charakterisierung, die multimodale Registrierung und Datenfusion der heterogenen Daten, sowie die bildverarbeitungsbasierte Schädigungslokalisierung und -bewertung. Mesoskalige Biegeresonanzprüfung ermöglicht das Erreichen des hochzyklischen Ermüdungszustands in vergleichsweise kurzen Zeitspannen bei gleichzeitig verbessertem Auflösungsvermögen der Schädigungsentwicklung. Je nach Komplexität der einzelnen Bildverarbeitungsaufgaben und Datenverfügbarkeit werden entweder regelbasierte Bildverarbeitungsverfahren oder Repräsentationslernen gezielt eingesetzt. So sorgt beispielsweise die semantische Segmentierung von Schädigungsstellen dafür, dass wichtige Ermüdungsmerkmale aus mikroskopischen Abbildungen extrahiert werden können. Entlang des Workflows wird auf einen hohen Automatisierungsgrad Wert gelegt. Wann immer möglich, wurde die Generalisierbarkeit einzelner Workflow-Elemente untersucht. Dieser Workflow wird auf einen ferritischen Stahl (EN 1.4003) angewendet. Der resultierende Datensatz verknüpft unter anderem große verzerrungskorrigierte Mikrostrukturdaten mit der Schädigungslokalisierung und deren zyklischer Entwicklung. Im Zuge der Arbeit wird der Datensatz wird im Hinblick auf seinen Informationsgehalt untersucht, indem detaillierte, analytische Studien zur einzelnen Schädigungsbildung durchgeführt werden. Auf diese Weise konnten unter anderem neuartige, quantitative Erkenntnisse über mikrostrukturinduzierte plastische Verformungs- und Rissstopmechanismen gewonnen werden. Darüber hinaus werden aus dem Datensatz abgeleitete kornweise Merkmalsvektoren und binäre Schädigungskategorien verwendet, um einen Random-Forest-Klassifikator zu trainieren und dessen Vorhersagegüte zu bewerten. Der vorgeschlagene Workflow hat das Potenzial, die Grundlage für künftiges Data Mining und datengetriebene Modellierung mikrostrukturempfindlicher Ermüdung zu legen. Er erlaubt die effiziente Erhebung statistisch repräsentativer Datensätze mit gleichzeitig hohem Informationsgehalt und kann auf eine Vielzahl von Werkstoffen ausgeweitet werden

    Multimodal Three Dimensional Scene Reconstruction, The Gaussian Fields Framework

    Get PDF
    The focus of this research is on building 3D representations of real world scenes and objects using different imaging sensors. Primarily range acquisition devices (such as laser scanners and stereo systems) that allow the recovery of 3D geometry, and multi-spectral image sequences including visual and thermal IR images that provide additional scene characteristics. The crucial technical challenge that we addressed is the automatic point-sets registration task. In this context our main contribution is the development of an optimization-based method at the core of which lies a unified criterion that solves simultaneously for the dense point correspondence and transformation recovery problems. The new criterion has a straightforward expression in terms of the datasets and the alignment parameters and was used primarily for 3D rigid registration of point-sets. However it proved also useful for feature-based multimodal image alignment. We derived our method from simple Boolean matching principles by approximation and relaxation. One of the main advantages of the proposed approach, as compared to the widely used class of Iterative Closest Point (ICP) algorithms, is convexity in the neighborhood of the registration parameters and continuous differentiability, allowing for the use of standard gradient-based optimization techniques. Physically the criterion is interpreted in terms of a Gaussian Force Field exerted by one point-set on the other. Such formulation proved useful for controlling and increasing the region of convergence, and hence allowing for more autonomy in correspondence tasks. Furthermore, the criterion can be computed with linear complexity using recently developed Fast Gauss Transform numerical techniques. In addition, we also introduced a new local feature descriptor that was derived from visual saliency principles and which enhanced significantly the performance of the registration algorithm. The resulting technique was subjected to a thorough experimental analysis that highlighted its strength and showed its limitations. Our current applications are in the field of 3D modeling for inspection, surveillance, and biometrics. However, since this matching framework can be applied to any type of data, that can be represented as N-dimensional point-sets, the scope of the method is shown to reach many more pattern analysis applications

    Shape/image registration for medical imaging : novel algorithms and applications.

    Get PDF
    This dissertation looks at two different categories of the registration approaches: Shape registration, and Image registration. It also considers the applications of these approaches into the medical imaging field. Shape registration is an important problem in computer vision, computer graphics and medical imaging. It has been handled in different manners in many applications like shapebased segmentation, shape recognition, and tracking. Image registration is the process of overlaying two or more images of the same scene taken at different times, from different viewpoints, and/or by different sensors. Many image processing applications like remote sensing, fusion of medical images, and computer-aided surgery need image registration. This study deals with two different applications in the field of medical image analysis. The first one is related to shape-based segmentation of the human vertebral bodies (VBs). The vertebra consists of the VB, spinous, and other anatomical regions. Spinous pedicles, and ribs should not be included in the bone mineral density (BMD) measurements. The VB segmentation is not an easy task since the ribs have similar gray level information. This dissertation investigates two different segmentation approaches. Both of them are obeying the variational shape-based segmentation frameworks. The first approach deals with two dimensional (2D) case. This segmentation approach starts with obtaining the initial segmentation using the intensity/spatial interaction models. Then, shape model is registered to the image domain. Finally, the optimal segmentation is obtained using the optimization of an energy functional which integrating the shape model with the intensity information. The second one is a 3D simultaneous segmentation and registration approach. The information of the intensity is handled by embedding a Willmore flow into the level set segmentation framework. Then the shape variations are estimated using a new distance probabilistic model. The experimental results show that the segmentation accuracy of the framework are much higher than other alternatives. Applications on BMD measurements of vertebral body are given to illustrate the accuracy of the proposed segmentation approach. The second application is related to the field of computer-aided surgery, specifically on ankle fusion surgery. The long-term goal of this work is to apply this technique to ankle fusion surgery to determine the proper size and orientation of the screws that are used for fusing the bones together. In addition, we try to localize the best bone region to fix these screws. To achieve these goals, the 2D-3D registration is introduced. The role of 2D-3D registration is to enhance the quality of the surgical procedure in terms of time and accuracy, and would greatly reduce the need for repeated surgeries; thus, saving the patients time, expense, and trauma
    corecore