    3D Shape Matching and Registration: A Probabilistic Perspective

    Dense correspondence is a key area in computer vision and medical image analysis, with applications in registration and shape analysis. In this thesis, we develop a technique to recover dense correspondences between the surfaces of neuroanatomical objects across heterogeneous populations of individuals. The correspondences are recovered by 3D shape matching, which we formulate within the framework of Markov Random Fields (MRFs). We represent the surfaces of neuroanatomical objects as genus-zero voxel-based meshes, which are projected into an MRF space. The projection carries both geometric and topological information from the original space to the random field space: Gaussian curvature is projected to the nodes of the MRF, and the mesh neighbourhood structure is projected to its edges. 3D shape matching between two surface meshes is then performed by minimising an energy function formulated with MRFs, and the outcome is a set of dense point-to-point correspondences. However, minimising this energy function is NP-hard. We therefore use belief propagation to perform probabilistic inference for 3D shape matching. We propose a sparse update loopy belief propagation algorithm, adapted to 3D shape matching, that obtains an approximate global solution. This algorithm is significantly more efficient than standard belief propagation, and we also analyse its computational complexity and convergence properties. In addition, we investigate randomised algorithms for minimising the energy function. To enhance the shape matching rate and enlarge the inlier support set, we propose a novel clamping technique.
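The inference step can be illustrated with plain min-sum loopy belief propagation. This is a sketch under stated assumptions, not the thesis's sparse-update algorithm, whose update schedule the abstract does not specify. The unary costs stand in for Gaussian-curvature mismatches between a source vertex and a candidate target vertex. The pairwise term is a hypothetical Potts-style smoothness cost, and the function name is illustrative.

```python
# Hypothetical sketch: standard synchronous min-sum loopy BP on a pairwise MRF,
# not the sparse-update variant described in the thesis.
def loopy_bp_min_sum(unary, edges, pairwise, iters=20):
    """Approximately minimise sum_i unary[i][x_i] + sum_(i,j) pairwise(x_i, x_j).

    unary[i][a]:    cost of giving node i label a (e.g. a curvature mismatch)
    edges:          undirected (i, j) node pairs (e.g. the mesh neighbourhood)
    pairwise(a, b): smoothness cost for labels a, b on neighbouring nodes
    Returns one label index per node.
    """
    n, n_labels = len(unary), len(unary[0])
    labels = range(n_labels)
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # msg[(i, j)][b]: message from node i to node j about j taking label b
    msg = {(i, j): [0.0] * n_labels for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for i, j in msg:
            # cost of each label at i, using messages from all neighbours except j
            h = [unary[i][a] + sum(msg[(k, i)][a] for k in nbrs[i] if k != j)
                 for a in labels]
            m = [min(h[a] + pairwise(a, b) for a in labels) for b in labels]
            floor = min(m)
            new[(i, j)] = [v - floor for v in m]  # normalise so values stay bounded
        msg = new  # synchronous ("flooding") schedule
    beliefs = [[unary[i][a] + sum(msg[(k, i)][a] for k in nbrs[i]) for a in labels]
               for i in range(n)]
    return [min(labels, key=bel.__getitem__) for bel in beliefs]
```

On a tree-shaped graph, such as a three-node chain, min-sum BP returns the exact minimiser. A closed genus-zero mesh has loops, so the result is only approximate and convergence is not guaranteed. Each message costs O(L²) for L labels. When the labels are candidate target vertices (as assumed here), updating every message on every iteration quickly becomes expensive.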
The clamping technique combines the loopy belief propagation message update rule with feedback from 3D rigid body registration, and it increases the correct shape matching rate significantly. Finally, we investigate 3D shape registration techniques based on the 3D shape matching result. Using the dense point-to-point correspondences obtained from 3D shape matching, a three-point transformation estimation technique is combined with the RANdom SAmple Consensus (RANSAC) algorithm to obtain the inlier support set. This global registration approach depends purely on point-wise correspondences between two meshed surfaces. Its advantages are that no orientation initialisation is needed and that it applies to all shapes of spherical topology. In the experiments, our MRF-based 3D registration approach is compared with a state-of-the-art registration algorithm, the first-order ellipsoid template. The experiments show dense correspondences for pairs of hippocampi from two different data sets, each of around 20 healthy individuals aged 60 and over.
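The abstract does not say how the rigid transformation is estimated from three correspondences. The sketch below assumes the SVD-based least-squares fit (Kabsch algorithm) inside a basic RANSAC loop. The inlier threshold `thresh`, the iteration count and the function names are hypothetical choices, not values from the thesis.

```python
import numpy as np

def rigid_from_points(P, Q):
    """Least-squares rotation R and translation t with R @ p + t ~= q (Kabsch/SVD)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)  # 3 x 3 cross-covariance of the centred point sets
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # rule out reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

def ransac_rigid(src, dst, iters=200, thresh=0.05, seed=0):
    """RANSAC over putative correspondences src[i] <-> dst[i] (N x 3 arrays).

    Hypothetical parameters: iters, thresh (in mesh units) and seed.
    Returns the refitted (R, t) and a boolean mask of the inlier support set.
    """
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal 3-point sample
        R, t = rigid_from_points(src[idx], dst[idx])
        inliers = np.linalg.norm(src @ R.T + t - dst, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("no consensus set with at least three inliers")
    R, t = rigid_from_points(src[best], dst[best])  # refit on all inliers
    return R, t, best
```

With noise-free correspondences, any all-inlier sample reproduces the true motion exactly, so the largest consensus set is the inlier support set. On real data, the final refit over all inliers averages out per-point noise that a three-point sample cannot.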

    Coping With New Challenges for Density-Based Clustering

    Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. The core step of the KDD process is the application of a data mining algorithm to produce a particular enumeration of patterns and relationships in large databases. Clustering is one of the major data mining tasks. It aims at grouping the data objects into meaningful classes (clusters) such that the similarity of objects within a cluster is maximized and the similarity of objects from different clusters is minimized. Among many approaches, the recently proposed density-based clustering notion is one of the most successful. It underlies the algorithm DBSCAN and its hierarchical extension OPTICS. In this thesis, our aim is to advance the state of the art in clustering, especially density-based clustering, by identifying novel challenges for density-based clustering and proposing innovative and solid solutions to them. We describe the development of the industrial prototype BOSS (Browsing OPTICS plots for Similarity Search). BOSS is a first step towards a comprehensive, scalable and distributed computing solution designed to make the efficiency and analytical capabilities of OPTICS available to a broader audience. The development of BOSS requires several key enhancements of OPTICS, which are addressed in this thesis. We develop incremental versions of OPTICS that efficiently reconstruct the hierarchical clustering structure in frequently updated databases, in particular when a set of objects is inserted into or deleted from the database. We empirically show that these incremental algorithms yield significant speed-up factors over the original OPTICS algorithm.
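OPTICS does not output a flat partition. Instead, it outputs an ordering of the objects with a reachability distance for each, and valleys in the resulting reachability plot correspond to clusters. The following is a minimal, non-incremental sketch with brute-force ε-range queries; the incremental algorithms of the thesis, which repair this ordering after insertions and deletions, are not shown. Function and parameter names are illustrative.

```python
import heapq
import math

def optics(points, eps, min_pts):
    """Return (cluster ordering, reachability distance of each object in that order).

    Brute-force neighbourhood queries make this O(n^2); a real system would use
    an index. As in OPTICS, the eps-neighbourhood includes the object itself.
    """
    n = len(points)

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def core_dist(i, nb):
        if len(nb) < min_pts:
            return None  # not a core object
        return sorted(math.dist(points[i], points[j]) for j in nb)[min_pts - 1]

    reach = [math.inf] * n
    done = [False] * n
    order = []
    for start in range(n):
        if done[start]:
            continue
        seeds = [(math.inf, start)]  # priority queue keyed by reachability
        while seeds:
            _, i = heapq.heappop(seeds)
            if done[i]:
                continue  # outdated entry; the object was already processed
            done[i] = True
            order.append(i)
            nb = neighbours(i)
            cd = core_dist(i, nb)
            if cd is None:
                continue
            for j in nb:
                if not done[j]:
                    new_r = max(cd, math.dist(points[i], points[j]))
                    if new_r < reach[j]:
                        reach[j] = new_r
                        heapq.heappush(seeds, (new_r, j))
    return order, [reach[i] for i in order]
```

Two well-separated groups yield two valleys of low reachability, each starting with an infinite value where the ordering jumps to a new group.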
Furthermore, we propose a novel algorithm for the automatic extraction of clusters from hierarchical clustering representations that outperforms comparative methods. We also introduce two novel approaches for selecting meaningful cluster representatives; they use the density-based concepts of OPTICS and produce better results than the related medoid approach. Another major challenge for density-based clustering is coping with high-dimensional data. Many of today's real-world data sets contain a large number of measurements (features) for each data object. Usually, global feature reduction techniques cannot be applied to these data sets, so the task of feature selection must be combined with, and incorporated into, the clustering process. In this thesis, we present original extensions and enhancements of the density-based clustering notion to cope with high-dimensional data. In particular, we propose an algorithm called SUBCLU (density-based SUBspace CLUstering) that extends DBSCAN to the problem of subspace clustering. SUBCLU efficiently computes all clusters that would have been found if DBSCAN were applied to all possible subspaces of the feature space. An experimental evaluation on real-world data sets shows that SUBCLU is more effective than existing subspace clustering algorithms, because it finds clusters of arbitrary size and shape and produces deterministic results. We also propose RIS (Ranking Interesting Subspaces), a semi-hierarchical extension of SUBCLU. RIS does not compute the subspace clusters directly but generates a list of subspaces ranked by their clustering characteristics. A hierarchical clustering algorithm can then be applied to these interesting subspaces to compute a hierarchical (subspace) clustering. A comparative evaluation of RIS and SUBCLU shows that RIS in combination with OPTICS can achieve an information gain over SUBCLU.
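Running DBSCAN on all 2^d subspaces would be prohibitive, so a bottom-up search needs a pruning rule. The sketch below assumes the monotonicity property used by SUBCLU: a density-connected cluster in a subspace S is contained in a cluster in every subspace of S. Under that assumption, (k+1)-dimensional candidates only need to be built from k-dimensional subspaces that contain clusters, in Apriori style. The function name is hypothetical, and DBSCAN still has to be run on every surviving candidate.

```python
from itertools import combinations

def subclu_candidates(clustered):
    """Candidate (k+1)-dimensional subspaces from the k-dimensional subspaces that
    contain at least one cluster. Subspaces are sorted tuples of attribute indices.
    Hypothetical helper illustrating an Apriori-style join-and-prune step."""
    candidates = set()
    for a in clustered:
        for b in clustered:
            # join: same first k-1 attributes, and a's last attribute precedes b's
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                cand = a + (b[-1],)
                # prune: every k-dimensional projection must itself contain a cluster
                if all(s in clustered for s in combinations(cand, len(cand) - 1)):
                    candidates.add(cand)
    return candidates
```

For example, from {(0, 1), (0, 2), (1, 2), (1, 3)} the join produces (0, 1, 2) and (1, 2, 3). The second candidate is pruned because its projection (2, 3) contains no cluster.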
In addition, we propose the algorithm 4C (Computing Correlation Connected Clusters), which extends the concepts of DBSCAN to compute density-based correlation clusters. 4C benefits from an innovative, well-defined and effective clustering model and outperforms related approaches in terms of clustering quality on real-world data sets.

    Knowledge Discovery in Databases (KDD) is the process of (semi-)automatically extracting knowledge from databases that is valid, previously unknown and potentially useful for a given application. The central step of the KDD process is data mining. One of the most important data mining tasks is clustering, in which the objects of a database are partitioned into groups (clusters). Objects in the same cluster should be as similar as possible, and objects in different clusters as dissimilar as possible. Among a multitude of clustering approaches, the density-based cluster model and the algorithms DBSCAN and OPTICS built on it are among the most successful. In this dissertation, we aim to advance the state of the art in clustering, and in density-based clustering in particular. To this end, we identify new challenges for the density-based cluster model and propose innovative solutions to them. The first focus of this work is the development of the industrial prototype BOSS (Browsing OPTICS plots for Similarity Search). BOSS is a first contribution towards a comprehensive, scalable and distributed software solution. It makes the efficiency advantages and analytical capabilities of the density-based hierarchical clustering algorithm OPTICS available to a broad audience.
Three key extensions of OPTICS are required for the development of BOSS. We develop an incremental version of OPTICS that efficiently reorganizes the hierarchical clustering structure after a database update (insertion or deletion of a set of objects). Experiments on synthetic and real data show that the proposed incremental algorithms achieve considerable speed-up factors over the original OPTICS algorithm. Furthermore, we propose a new algorithm for the automatic extraction of clusters from hierarchical representations and two innovative methods for automatically selecting suitable cluster representatives. In tests on several real databases, our new techniques achieve better results than competing methods. High-dimensional feature spaces pose a further challenge for clustering methods. Thanks to modern data acquisition techniques, real data sets often contain a very large number of features. Some of these features are often subject to noise or dependencies. They usually cannot be filtered out in advance, because these effects vary across different parts of the database. The choice of features must therefore be coupled with the data mining method. In this work, we present innovative extensions of the density-based cluster model for high-dimensional data. We develop SUBCLU (density-based SUBspace CLUstering), a subspace clustering algorithm based on DBSCAN. SUBCLU efficiently generates all clusters that would be found by applying DBSCAN to all possible subspaces of the data set. Experiments on real data show that SUBCLU is more effective than comparable algorithms.
We also propose RIS (Ranking Interesting Subspaces), a semi-hierarchical extension of SUBCLU. RIS no longer computes the subspace clusters directly but generates a list of subspaces ordered by their clustering quality. This makes it possible to compute hierarchical partitionings on selected subspaces. Experiments confirm that RIS in combination with OPTICS achieves an information gain over SUBCLU. Finally, we present the novel correlation clustering algorithm 4C (Computing Correlation Connected Clusters). 4C is based on an innovative and well-defined cluster model, and in our experiments on real data it achieves better results than comparable clustering approaches.
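For each object, 4C needs to know how many directions its ε-neighbourhood actually spans. A line-shaped neighbourhood has one dominant principal direction and a planar one has two. The sketch below estimates this "correlation dimensionality" with a local PCA. It assumes δ is an absolute threshold on the eigenvalues of the neighbourhood covariance matrix; the function name is hypothetical, and the remaining steps of 4C (core-object test, adapted distance, cluster expansion) are not shown.

```python
import numpy as np

def correlation_dimensionality(points, center, eps, delta):
    """Count the 'strong' principal directions of the eps-neighbourhood of center.

    Assumption: an eigenvalue of the local covariance matrix counts as strong
    if it exceeds the absolute threshold delta.
    """
    pts = np.asarray(points, dtype=float)
    c = np.asarray(center, dtype=float)
    nb = pts[np.linalg.norm(pts - c, axis=1) <= eps]  # eps-neighbourhood
    cov = np.cov(nb, rowvar=False, bias=True)        # d x d covariance matrix
    eigvals = np.linalg.eigvalsh(cov)                 # real eigenvalues (symmetric)
    return int((eigvals > delta).sum())
```

Points on a line through the origin give 1, points on a plane give 2, and the corners of a cube give 3. A correlation cluster in the sense of 4C would then group density-connected objects whose neighbourhoods share the same low correlation dimensionality.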

    Efficient and Effective Similarity Search on Complex Objects

    Due to the rapid development of computer technology and new methods for data extraction in recent years, more and more database applications have emerged for which efficient and effective similarity search is of great importance. Application areas of similarity search include multimedia, computer-aided engineering, marketing, image processing and many more. Of particular interest is the task of finding similar objects in large amounts of data with complex representations, such as set-valued objects and tree- or graph-structured objects. The grouping of similar objects, so-called clustering, is a fundamental analysis technique that makes it possible to explore extensive data sets. The goal of this dissertation is to develop new efficient and effective methods for similarity search in large collections of complex objects. A further goal is to improve the efficiency of existing density-based clustering algorithms when they are applied to complex objects. The first part of this work motivates the use of vector sets for similarity modeling. For this purpose, a metric distance function is defined that is suitable for various application domains but time-consuming to compute. Therefore, a filter-refinement technique is proposed to efficiently process range queries and k-nearest neighbor queries, the two basic query types in similarity search. Several filter distances are presented that approximate the exact object distance and can be computed efficiently. Moreover, a multi-step query processing approach is described that can be integrated directly into the well-known density-based clustering algorithms DBSCAN and OPTICS. In the second part of this work, new application areas for density-based hierarchical clustering with OPTICS are discussed.
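The abstract does not name the vector-set distance. As an assumption, the sketch below uses the minimal matching distance for sets of equal size k, that is, the cheapest one-to-one assignment of vectors under Euclidean distance. It pairs this with a centroid filter. By the triangle inequality, sum_i ||x_i - y_pi(i)|| >= ||sum_i (x_i - y_pi(i))|| = k * ||c(X) - c(Y)||, where c(.) is the centroid. The scaled centroid distance therefore never exceeds the exact distance, so it can prune a range query without false dismissals. All function names are illustrative.

```python
import math
from itertools import permutations

def minimal_matching_distance(X, Y):
    """Exact distance between equal-size vector sets: the cheapest one-to-one
    matching under Euclidean distance. Brute force over permutations (tiny sets
    only); the Hungarian algorithm would be used for realistic set sizes."""
    if len(X) != len(Y):
        raise ValueError("this sketch assumes vector sets of equal size")
    return min(sum(math.dist(x, Y[j]) for x, j in zip(X, perm))
               for perm in permutations(range(len(Y))))

def centroid(X):
    return [sum(c) / len(X) for c in zip(*X)]

def centroid_filter(X, Y):
    """Cheap lower bound: k * ||c(X) - c(Y)|| <= minimal matching distance."""
    return len(X) * math.dist(centroid(X), centroid(Y))

def range_query(db, q, eps):
    """Filter-refinement range query; also reports how many exact distances ran."""
    result, refinements = [], 0
    for key, X in db.items():
        if centroid_filter(q, X) > eps:
            continue  # safely pruned: the exact distance is at least as large
        refinements += 1
        if minimal_matching_distance(q, X) <= eps:
            result.append(key)
    return result, refinements
```

The design point is that the filter is evaluated for every object but costs only O(k), whereas the exact distance is evaluated only for objects the filter cannot exclude.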
A prototype developed for these new application areas is introduced; it is based on the aforementioned similarity models and accelerated clustering algorithms for complex objects. This prototype facilitates interactive, semi-automatic cluster analysis and allows visual search for similar objects in multimedia databases. Another prototype extends these concepts and enables the user to analyze multi-represented and multi-instance data. Finally, music genre classification is addressed as another application involving multi-represented and multi-instance data objects. An extensive experimental evaluation on real-world data examines the efficiency and effectiveness of the presented techniques and points out their advantages over conventional approaches.

    New Techniques for Clustering Complex Objects

    The tremendous amount of data produced nowadays in application domains such as molecular biology or geography can only be fully exploited by efficient and effective data mining tools. One of the primary data mining tasks is clustering. Clustering partitions the points of a data set into distinct groups (clusters) such that two points from the same cluster are similar to each other, whereas two points from distinct clusters are not. Thanks to modern database technology, e.g. object-relational databases, huge numbers of complex objects from scientific, engineering or multimedia applications are stored in database systems. Modelling such complex data often results in very high-dimensional vector data ("feature vectors"). In the context of clustering, this causes many fundamental problems, commonly subsumed under the term "Curse of Dimensionality". As a result, traditional clustering algorithms often fail to generate meaningful results, because in such high-dimensional feature spaces the data no longer forms clusters. Usually, however, clusters are embedded in lower-dimensional subspaces, i.e. meaningful clusters can be found if only a certain subset of features is considered for clustering. This subset of features may even differ from cluster to cluster. In this thesis, we present original extensions and enhancements of the density-based clustering notion to cope with high-dimensional data. In particular, we propose an algorithm called SUBCLU (density-connected Subspace Clustering) that extends DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to the problem of subspace clustering. SUBCLU efficiently computes all clusters of arbitrary shape and size that would have been found if DBSCAN were applied to all possible subspaces of the feature space. Two subspace selection techniques called RIS (Ranking Interesting Subspaces) and SURFING (SUbspaces Relevant For clusterING) are proposed.
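SUBCLU's building block is DBSCAN restricted to a subspace. The sketch below uses brute-force region queries and an optional `dims` tuple, a hypothetical parameter that projects each point onto the selected attributes, so the same routine can be run on any candidate subspace. It is an illustration, not the thesis's implementation.

```python
import math

def dbscan(points, eps, min_pts, dims=None):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    dims (hypothetical): tuple of attribute indices defining the subspace;
    None means the full feature space.
    """
    if dims is None:
        dims = range(len(points[0]))
    proj = [[p[d] for d in dims] for p in points]  # projection onto the subspace
    n = len(proj)

    def region(i):
        return [j for j in range(n) if math.dist(proj[i], proj[j]) <= eps]

    labels = [None] * n  # None = not yet visited
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = region(i)
        if len(nb) < min_pts:
            labels[i] = -1  # noise for now; may later turn out to be a border point
            continue
        labels[i] = cluster
        queue = list(nb)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = region(j)
            if len(nb_j) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nb_j)
        cluster += 1
    return labels
```

Adding a third attribute that spreads the same points far apart turns every point into noise in the full space. The clusters reappear when `dims=(0, 1)` selects the original two attributes, which illustrates why clusters can hide in subspaces.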
RIS and SURFING do not compute the subspace clusters directly but generate a list of subspaces ranked by their clustering characteristics. A hierarchical clustering algorithm can be applied to these interesting subspaces to compute a hierarchical (subspace) clustering. In addition, we propose the algorithm 4C (Computing Correlation Connected Clusters), which extends the concepts of DBSCAN to compute density-based correlation clusters. 4C searches for groups of objects that exhibit an arbitrary but uniform correlation. Often, the traditional approach of modelling data as high-dimensional feature vectors can no longer capture the intuitive notion of similarity between complex objects. Thus, objects such as chemical compounds, CAD drawings, XML data or color images are often modelled using more complex representations such as graphs or trees. If a metric distance function, such as the edit distance for graphs and trees, is used as the similarity measure, traditional clustering approaches like density-based clustering are applicable to these data. However, a single distance calculation can be very expensive. Since clustering performs many distance calculations, approaches such as filter-refinement and metric indexing become important. The second part of this thesis deals with specialized approaches for clustering in application domains with complex similarity models. We show how appropriate filters can improve the performance of query processing and, thus, of clustering hierarchical objects. Furthermore, we describe how the two paradigms of filtering and metric indexing can be combined. As complex objects can often be represented using different similarity models, we present a new clustering approach that can cluster objects with several different complex representations.
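Filter-refinement for k-nearest-neighbour queries can be sketched with any cheap lower bound on an expensive distance. As a stand-in for tree or graph edit distance, the sketch uses the unit-cost string edit distance with the length difference as the filter. The same argument holds for unit-cost tree edit distance and node counts, since insertions and deletions change the size by one and relabelings by zero. The multi-step scheme visits objects by ascending filter distance and stops once the filter exceeds the current k-th exact distance. All names are illustrative.

```python
import heapq

def levenshtein(a, b):
    """Exact (expensive) unit-cost edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution / match
        prev = cur
    return prev[-1]

def size_filter(a, b):
    """Cheap lower bound: each edit operation changes the length by at most 1."""
    return abs(len(a) - len(b))

def knn_multistep(db, q, k, exact=levenshtein, lower=size_filter):
    """Multi-step k-NN query. No false dismissals as long as lower <= exact.
    Returns the sorted (distance, object) pairs and the number of refinements."""
    ranked = sorted(db, key=lambda o: lower(q, o))
    best = []  # max-heap via negated distances: the k best candidates so far
    refinements = 0
    for o in ranked:
        if len(best) == k and lower(q, o) > -best[0][0]:
            break  # no remaining object can beat the current k-th distance
        refinements += 1
        d = exact(q, o)
        if len(best) < k:
            heapq.heappush(best, (-d, o))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, o))
    return sorted((-nd, o) for nd, o in best), refinements
```

For the query "cat" with k = 2 over six words, the exact distance is computed for only four of them. The loop stops as soon as the length filter alone rules out the longer words.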

    Spatial Database Support for Virtual Engineering

    The development, design, manufacturing and maintenance of modern engineering products is a very expensive and complex task. Shorter product cycles and a greater diversity of models are becoming decisive competitive factors in the fiercely contested automobile and aircraft markets. To support engineers in creating complex products under time pressure, systems are required that answer collision and similarity queries effectively and efficiently. To achieve industrial strength, the required specialized functionality has to be integrated into fully-fledged database systems. Their fundamental services, including transactions, concurrency control and recovery, can then be fully reused. This thesis aims at developing theoretically sound and practically realizable algorithms that effectively and efficiently detect colliding and similar complex spatial objects. After a short introductory Part I, we examine different spatial index structures in Part II and discuss their integrability into object-relational database systems. Based on this discussion, we present two generic approaches for accelerating collision queries. The first approach exploits available statistical information to accelerate query processing. The second approach is based on a cost-based decomposition of complex spatial objects. In a broad experimental evaluation based on real-world test data sets, we demonstrate the usefulness of the presented techniques, which allow interactive query response times even for large data sets of complex objects. In Part III of the thesis, we discuss several similarity models for spatial objects. Using a new evaluation method, we show that data-partitioning similarity models yield more meaningful results than space-partitioning similarity models. We introduce a very effective similarity model based on a new paradigm in similarity search, namely the representation of objects as vector sets.
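The idea behind decomposition-based collision filtering can be sketched for voxelized objects. Each object is approximated by bounding boxes, boxes are tested first as a conservative filter, and only surviving pairs are checked exactly. This is a simplified illustration under stated assumptions, not the thesis's method. In place of a cost-based decomposition, objects are split into a fixed number of equal-width slabs along x (the hypothetical `parts` parameter). A cost model would instead choose how finely to decompose, trading filter selectivity against the number of boxes to test.

```python
def bbox(voxels):
    """Axis-aligned bounding box ((xmin, ymin, zmin), (xmax, ymax, zmax))."""
    xs, ys, zs = zip(*voxels)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def boxes_intersect(a, b):
    (alo, ahi), (blo, bhi) = a, b
    return all(alo[d] <= bhi[d] and blo[d] <= ahi[d] for d in range(3))

def decompose(voxels, parts):
    """Hypothetical decomposition: split into `parts` equal-width slabs along x
    and return one bounding box per non-empty slab."""
    xs = [v[0] for v in voxels]
    lo, width = min(xs), max(xs) - min(xs) + 1
    slabs = {}
    for v in voxels:
        slabs.setdefault((v[0] - lo) * parts // width, []).append(v)
    return [bbox(s) for s in slabs.values()]

def collide(a_voxels, b_voxels, parts=1):
    """Return (passed_filter, collides): box filter first, exact voxel test second."""
    a_boxes, b_boxes = decompose(a_voxels, parts), decompose(b_voxels, parts)
    if not any(boxes_intersect(p, q) for p in a_boxes for q in b_boxes):
        return False, False  # pruned by the filter; no exact test needed
    return True, bool(set(a_voxels) & set(b_voxels))
```

For an L-shaped object, a single bounding box also covers the empty corner, so a voxel placed there passes the filter and needs an exact test. Splitting the object into two slabs lets the filter alone rule out that voxel.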
To guarantee efficient query processing, suitable filters are introduced for accelerating similarity queries on complex spatial objects. Based on clustering and the introduced similarity models, we present an industrial prototype that helps the user navigate through massive data sets.

    A fast and smooth development process for new products is an important factor in the economic success of many companies, particularly in the aerospace and automotive industries. To enable engineers to develop increasingly sophisticated products in ever shorter time, effective and efficient collision and similarity queries on complex spatial objects are needed. To meet the high demands of productive use, appropriately specialized access methods must be integrated into fully-fledged database systems. This ensures central database services such as transactions, concurrency control and recovery. The goal of this dissertation is therefore to develop effective and efficient algorithms for collision and similarity queries on complex spatial objects and to integrate them into commercial object-relational database systems. The first part of the work discusses various spatial index structures for efficiently processing collision queries and examines whether they can be integrated into object-relational database systems. Building on this, two generic methods for accelerating collision queries are presented. The first method uses statistical information from spatial index structures to accelerate a given query. The second method is based on a cost-based decomposition of complex spatial database objects. The two methods complement each other and can be used independently or together.
An extensive experimental evaluation shows that the two presented methods enable interactive collision queries on large data sets and complex objects. The second part of the work presents various similarity models for spatial objects. It is shown experimentally that data-partitioning models are more effective than space-partitioning methods. Furthermore, suitable filter techniques for accelerating the query process are developed and examined experimentally. Based on clustering and the developed similarity models, an industrial-strength prototype is presented that helps users navigate through large data sets.