1,650 research outputs found

    Density-based algorithms for active and anytime clustering

    Data-intensive fields such as biology, medicine, and neuroscience require effective and efficient data mining technologies. Advanced data acquisition methods produce data of constantly increasing volume and complexity. As a consequence, the need for new data mining technologies to deal with complex data has emerged during the last decades. In this thesis, we focus on the data mining task of clustering, in which objects are separated into different groups (clusters) such that objects inside a cluster are more similar to each other than to objects in different clusters. In particular, we consider density-based clustering algorithms and their applications in biomedicine. The core idea of the density-based clustering algorithm DBSCAN is that each object within a cluster must have a certain number of other objects inside its neighborhood. Compared with other clustering algorithms, DBSCAN has many attractive benefits: for example, it can detect clusters of arbitrary shape and is robust to outliers. DBSCAN has therefore attracted considerable research interest during the last decades, with many extensions and applications. In the first part of this thesis, we aim to develop new algorithms based on the DBSCAN paradigm to deal with the new challenges of complex data, particularly expensive distance measures and incomplete availability of the distance matrix. Like many other clustering algorithms, DBSCAN suffers from poor performance when facing expensive distance measures for complex data. To tackle this problem, we propose a new algorithm based on the DBSCAN paradigm, called Anytime Density-based Clustering (A-DBSCAN), that works in an anytime scheme: in contrast to the original batch scheme of DBSCAN, A-DBSCAN first produces a quick approximation of the clustering result and then continuously refines the result as it continues to run.
Experts can interrupt the algorithm, examine the results, and choose between (1) stopping the algorithm at any time, whenever they are satisfied with the result, to save runtime, and (2) continuing the algorithm to achieve better results. Such an anytime scheme has proven in the literature to be a very useful technique for time-consuming problems. We also introduce an extended version of A-DBSCAN, called A-DBSCAN-XS, which is more efficient and effective than A-DBSCAN when dealing with expensive distance measures. Since DBSCAN relies on the cardinality of the neighborhood of objects, it requires the full distance matrix to operate. For complex data, these distances are usually expensive, time-consuming, or even impossible to acquire due to high cost, high time complexity, or noisy and missing data. Motivated by these potential difficulties of acquiring the distances among objects, we propose another approach for DBSCAN, called Active Density-based Clustering (Act-DBSCAN). Given a budget limitation B, Act-DBSCAN may use at most B pairwise distances, ideally producing the same result as if it had the entire distance matrix at hand. The general idea of Act-DBSCAN is that it actively selects the most promising pairs of objects for which to calculate the distance, and tries to approximate the desired clustering result as closely as possible with each distance calculation. This scheme provides an efficient way to reduce the total cost needed to perform the clustering, and thus mitigates the potential weakness of DBSCAN when dealing with the distance-sparseness problem of complex data. As a fundamental data clustering paradigm, density-based clustering has many applications in diverse fields. In the second part of this thesis, we focus on an application of density-based clustering in neuroscience: the segmentation of white matter fiber tracts in the human brain acquired from Diffusion Tensor Imaging (DTI).
We propose a model to evaluate the similarity between two fibers as a combination of the structural similarity and the connectivity-related similarity of fiber tracts. Various distance measures from fields such as time-sequence mining are adapted to calculate the structural similarity of fibers. Density-based clustering is used as the segmentation algorithm. We show how A-DBSCAN and A-DBSCAN-XS serve as novel solutions for the segmentation of massive fiber datasets and provide unique features to assist experts during the fiber segmentation process.
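The density criterion at the heart of DBSCAN, as described above, can be sketched in a few lines: a point is a core point if at least min_pts points (including itself) lie within eps of it, and clusters are grown by expanding from core points. The following is a minimal illustration only, not the thesis's A-DBSCAN or Act-DBSCAN variants; the point set and parameter values are made up for the example.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch. Label -1 marks noise."""
    n = len(points)
    # Precompute eps-neighborhoods (each point neighbors itself).
    nbrs = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
            for i in range(n)]
    labels = [-1] * n
    visited = [False] * n
    cluster = 0
    for i in range(n):
        if visited[i] or len(nbrs[i]) < min_pts:
            continue                     # not an unvisited core point
        stack = [i]                      # grow a new cluster from i
        while stack:
            j = stack.pop()
            if visited[j]:
                continue
            visited[j] = True
            labels[j] = cluster
            if len(nbrs[j]) >= min_pts:  # only core points keep expanding
                stack.extend(nbrs[j])
        cluster += 1
    return labels
```

Note that every pairwise distance is computed up front, which is exactly the cost that A-DBSCAN and Act-DBSCAN are designed to avoid for expensive distance measures.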

    Improving the Tractography Pipeline: on Evaluation, Segmentation, and Visualization

    Recent advances in tractography allow connectomes to be constructed in vivo. These have applications, for example, in brain tumor surgery and in understanding brain development and diseases. The large size of the data produced by these methods leads to a variety of problems, including how to evaluate tractography outputs, how to develop faster processing algorithms for tractography and clustering, and how to develop advanced visualization methods for verification and exploration. This thesis presents several advances in these fields. First, an evaluation of the robustness to noise of multiple commonly used tractography algorithms is presented. It employs a Monte Carlo simulation of measurement noise on a constructed ground-truth dataset. As a result of this evaluation, evidence for the robustness of global tractography is found, and algorithmic sources of uncertainty are identified. The second contribution is a fast clustering algorithm for tractography data based on k-means and vector fields for representing the flow of each cluster. It is demonstrated that this algorithm can handle large tractography datasets due to its linear time and memory complexity, and that it can effectively integrate interrupted fibers that would be rejected as outliers by other algorithms. Furthermore, a visualization for the exploration of structural connectomes is presented. It uses illustrative rendering techniques for efficient presentation of connecting fiber bundles in context in anatomical space. Visual hints are employed to improve the perception of spatial relations. Finally, a visualization method with application to the exploration and verification of probabilistic tractography is presented, which improves on the previously presented Fiber Stippling technique. It is demonstrated that the method is able to show multiple overlapping tracts in context and to correctly present crossing fiber configurations.
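The fast clustering contribution above rests on a common reduction: once each streamline is resampled to a fixed number of points and flattened into an equal-length vector, plain k-means applies with Euclidean distance and centroid averaging. A minimal sketch of that k-means step, assuming the resampling has already been done (the thesis's vector-field cluster representation is omitted, and the data and k are illustrative):

```python
from math import dist
from random import seed, sample

def kmeans(vectors, k, iters=20, rng_seed=0):
    """Plain k-means on equal-length vectors (e.g. flattened,
    resampled streamlines). Returns a cluster label per vector."""
    seed(rng_seed)
    centroids = [list(v) for v in sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            c = min(range(k), key=lambda i: dist(v, centroids[i]))
            groups[c].append(v)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a group emptied out
                centroids[i] = [sum(x) / len(g) for x in zip(*g)]
    return [min(range(k), key=lambda i: dist(v, centroids[i]))
            for v in vectors]
```

Each iteration touches every vector once, which is the source of the linear time complexity claimed above.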

    Synchronization Inspired Data Mining

    Advances in modern technologies produce huge amounts of data in various fields, increasing the need for efficient and effective data mining tools to uncover the information contained implicitly in the data. This thesis mainly aims to propose innovative and solid algorithms for data mining from a novel perspective: synchronization. Synchronization is a prevalent phenomenon in nature in which a group of events spontaneously comes into co-occurrence with a common rhythm through mutual interactions. The mechanism of synchronization allows complex processes to be controlled by simple operations based on interactions between objects. The first main part of this thesis focuses on developing innovative algorithms for data mining. Inspired by the concept of synchronization, this thesis presents Sync (Clustering by Synchronization), a novel approach to clustering. In combination with the Minimum Description Length principle (MDL), it allows discovering the intrinsic clusters without any data distribution assumptions or parameter settings. In addition, relying on the different dynamic behaviors of objects during the process towards synchronization, the algorithm SOD (Synchronization-based Outlier Detection) is further proposed. Outlier objects can be naturally flagged by the definition of a Local Synchronization Factor (LSF). To address the curse of dimensionality in clustering, a subspace clustering algorithm, ORSC, is introduced, which automatically detects clusters in subspaces of the original feature space. This approach proposes a weighted local interaction model to ensure that all objects in a common cluster, which may reside in an arbitrarily oriented subspace, naturally move together. In order to reveal the underlying patterns in graphs, a graph partitioning approach, RSGC (Robust Synchronization-based Graph Clustering), is presented. The key philosophy of RSGC is to consider graph clustering as a dynamic process towards synchronization.
Inherited from the powerful concept of synchronization, RSGC shows several desirable properties that do not exist in other competitive methods. For all presented algorithms, their efficiency and effectiveness are thoroughly analyzed. The benefits over traditional approaches are further demonstrated by evaluating them on synthetic as well as real-world data sets. Beyond the theoretical research on novel data mining algorithms, the second main part of the thesis focuses on brain network analysis based on Diffusion Tensor Imaging (DTI). A new framework for automated white matter tract clustering is first proposed to identify meaningful fiber bundles in the human brain by combining ideas from time series mining with density-based clustering. Subsequently, enhancements and variations of this approach are discussed, allowing for a more robust, efficient, or effective way to find hierarchies of fiber bundles. Based on the structural connectivity network, an automated prediction framework is proposed to analyze and understand the abnormal patterns in patients with Alzheimer's Disease.
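The synchronization dynamic underlying the clustering idea above can be illustrated with a toy interaction model: each object repeatedly moves toward the mean of the objects inside its eps-neighborhood, so nearby objects collapse onto a common location ("synchronize"), and each distinct final location is read off as one cluster. This is only a schematic of the mechanism; the thesis's Sync combines a different interaction term with an MDL criterion, and eps and the data here are illustrative.

```python
from math import dist

def sync_cluster(points, eps, steps=50):
    """Toy synchronization-inspired clustering: iterate a local
    interaction (move toward the eps-neighborhood mean), then group
    objects whose final positions coincide."""
    pts = [list(p) for p in points]
    for _ in range(steps):
        new = []
        for p in pts:
            nb = [q for q in pts if dist(p, q) <= eps]  # includes p
            new.append([sum(c) / len(nb) for c in zip(*nb)])
        pts = new                       # synchronous update of all objects
    labels, reps = [], []
    for p in pts:                       # one label per synchronized location
        for i, r in enumerate(reps):
            if dist(p, r) < 1e-6:
                labels.append(i)
                break
        else:
            reps.append(p)
            labels.append(len(reps) - 1)
    return labels
```

An isolated object never interacts with anyone and keeps its own label, which is the intuition behind flagging outliers by their dynamic behavior in SOD.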

    Homogeneity based segmentation and enhancement of Diffusion Tensor Images : a white matter processing framework

    In diffusion magnetic resonance imaging (DMRI), the Brownian motion of water molecules within biological tissue is measured through a series of images. In diffusion tensor imaging (DTI), this diffusion is represented using tensors. DTI describes, in a non-invasive way, the local anisotropy pattern, enabling the reconstruction of nerve fibers, a process dubbed tractography. DMRI constitutes a powerful tool not only to analyse the structure of the white matter within a voxel, but also to investigate the anatomy of the brain and its connectivity. DMRI has proved useful for characterizing brain disorders and for analysing differences in white matter and their consequences for brain function. These procedures usually involve the virtual dissection of white matter tracts of interest. The manual isolation of these bundles requires a great deal of neuroanatomical knowledge and can take up to several hours of work. This thesis focuses on the development of techniques able to automatically perform the identification of white matter structures. To segment such structures in a tensor field, the similarity of diffusion tensors must be assessed in order to partition the data into regions that are homogeneous in terms of tensor characteristics. This concept of tensor homogeneity is explored in order to achieve new methods for segmenting, filtering and enhancing diffusion images. First, this thesis presents a novel approach to semi-automatically define the similarity measures that best suit the data. Next, a multi-resolution watershed framework is presented, where the tensor field’s homogeneity is used to automatically achieve a hierarchical representation of white matter structures in the brain, allowing the simultaneous segmentation of different structures with different sizes. The stochastic process of water diffusion within tissues can be modeled, inferring the homogeneity characteristics of the diffusion field.
This thesis presents an accelerated convolution method for diffusion images, where these models enable the contextual processing of diffusion images for noise reduction, regularization and enhancement of structures. These new methods are analysed and compared on the basis of their accuracy, robustness, speed and usability, key points for their application in a clinical setting. The described methods enrich the visualization and exploration of white matter structures, fostering the understanding of the human brain.
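The local anisotropy pattern that these methods build on is commonly summarized per voxel by the fractional anisotropy (FA) of the diffusion tensor, a standard DTI index computed from the tensor's eigenvalues: 0 for isotropic diffusion and approaching 1 for strongly directional diffusion, as along a coherent fiber bundle. A minimal sketch (the eigenvalues in the example are illustrative):

```python
from math import sqrt

def fractional_anisotropy(l1, l2, l3):
    """FA from the three eigenvalues of a diffusion tensor:
    sqrt(3/2) * ||lambda - mean|| / ||lambda||."""
    m = (l1 + l2 + l3) / 3
    num = (l1 - m) ** 2 + (l2 - m) ** 2 + (l3 - m) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    return sqrt(1.5 * num / den) if den > 0 else 0.0
```

Scalar indices like this are one natural ingredient for the tensor-similarity measures that drive the segmentation and filtering described above.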

    Characterising population variability in brain structure through models of whole-brain structural connectivity

    Models of whole-brain connectivity are valuable for understanding neurological function. This thesis seeks to develop an optimal framework for extracting models of whole-brain connectivity from clinically acquired diffusion data. We propose new approaches for studying these models. The aim is to develop techniques which can take models of brain connectivity and use them to identify biomarkers or phenotypes of disease. The models of connectivity are extracted using a standard probabilistic tractography algorithm, modified to assess the structural integrity of tracts through estimates of white matter anisotropy. Connections are traced between 77 regions of interest, automatically extracted by label propagation from multiple brain atlases followed by classifier fusion. The estimates of tissue integrity for each tract are entered as indices in 77x77 "connectivity" matrices, extracted for large populations of clinical data. These are compared in subsequent studies. To date, most whole-brain connectivity studies have characterised population differences using graph theory techniques. However, these can be limited in their ability to pinpoint the locations of differences in the underlying neural anatomy. This thesis therefore proposes new techniques. These include a spectral clustering approach for comparing population differences in the clustering properties of weighted brain networks. In addition, machine learning approaches are suggested for the first time. These are particularly advantageous, as they allow classification of subjects and extraction of the features which best represent the differences between groups. One limitation of the proposed approach is that errors propagate from the segmentation and registration steps prior to tractography. This can culminate in the assignment of false-positive connections, where the contribution of these factors may vary across populations, causing the appearance of population differences where there are none.
The final contribution of this thesis is therefore to develop a common co-ordinate space approach. This combines probabilistic models of voxel-wise diffusion for each subject into a single probabilistic model of diffusion for the population. This allows tractography to be performed only once, ensuring that there is one model of connectivity. Cross-subject differences can then be identified by mapping individual subjects’ anisotropy data to this model. The approach is used to compare populations separated by age and gender.
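The matrix construction described above reduces to a simple rule: for each traced connection between a pair of regions, store a tissue-integrity estimate at the corresponding entry of a region-by-region matrix. A schematic sketch of that step; the tract encoding, the anisotropy samples, and treating connections as undirected are assumptions of the example, and the real pipeline derives its estimates from probabilistic tractography:

```python
def connectivity_matrix(tracts, n_regions=77):
    """Build a region-by-region "connectivity" matrix. Each tract is
    given as ((region_a, region_b), anisotropy_samples); its mean
    anisotropy becomes the matrix entry for that pair of regions."""
    M = [[0.0] * n_regions for _ in range(n_regions)]
    for (a, b), samples in tracts:
        value = sum(samples) / len(samples)   # mean integrity estimate
        M[a][b] = M[b][a] = value             # undirected connection
    return M
```

Matrices of this form, one per subject, are what the spectral clustering and machine learning comparisons above operate on.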

    Identifying Changes of Functional Brain Networks using Graph Theory

    This thesis gives an overview of how to estimate changes in functional brain networks using graph-theoretical measures. It explains the assessment and definition of functional brain networks derived from fMRI data. More explicitly, this thesis provides examples and newly developed methods for the measurement and visualization of changes due to pathology, external electrical stimulation, or ongoing internal thought processes. These changes can occur on long as well as on short time scales and might be a key to understanding brain pathologies and their development. Furthermore, this thesis describes new methods to investigate and visualize these changes on both time scales and provides a more complete picture of the brain as a dynamic and constantly changing network.
    Contents:
    1 Introduction
    1.1 General Introduction
    1.2 Functional Magnetic Resonance Imaging
    1.3 Resting-state fMRI
    1.4 Brain Networks and Graph Theory
    1.5 White-Matter Lesions and Small Vessel Disease
    1.6 Transcranial Direct Current Stimulation
    1.7 Dynamic Functional Connectivity
    2 Publications
    2.1 Resting developments: a review of fMRI post-processing methodologies for spontaneous brain activity
    2.2 Early small vessel disease affects fronto-parietal and cerebellar hubs in close correlation with clinical symptoms - A resting-state fMRI study
    2.3 Dynamic modulation of intrinsic functional connectivity by transcranial direct current stimulation
    2.4 Three-dimensional mean-shift edge bundling for the visualization of functional connectivity in the brain
    2.5 Dynamic network participation of functional connectivity hubs assessed by resting-state fMRI
    3 Summary
    4 Bibliography
    5 Appendix
    5.1 Erklärung über die eigenständige Abfassung der Arbeit (declaration of independent authorship)
    5.2 Curriculum vitae
    5.3 Publications
    5.4 Acknowledgement
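One of the standard graph-theoretical measures used for tracking such network changes, the local clustering coefficient, can be computed directly from a thresholded functional-connectivity (correlation) matrix. A minimal sketch, assuming a binary graph obtained by thresholding; the matrix values and the threshold are illustrative:

```python
def clustering_coefficients(corr, threshold):
    """Threshold a correlation matrix into a binary graph, then return
    each node's local clustering coefficient: the fraction of possible
    links among its neighbors that actually exist."""
    n = len(corr)
    adj = [[1 if i != j and corr[i][j] >= threshold else 0
            for j in range(n)] for i in range(n)]
    coeffs = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]]
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)          # coefficient undefined; use 0
            continue
        links = sum(adj[u][v] for u in nbrs for v in nbrs if u < v)
        coeffs.append(2 * links / (k * (k - 1)))
    return coeffs
```

Comparing such per-node measures between conditions (e.g. before and after stimulation, or across sliding time windows) is one way the changes described above can be quantified.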

    A New Shape Similarity Framework for brain fibers classification

    Diffusion Magnetic Resonance Imaging (dMRI) techniques provide a non-invasive way to explore the organization and integrity of the white matter structures in the human brain. dMRI quantifies, in each voxel, the diffusion process of water molecules, which are mechanically constrained in their motion by the axons of neurons. This technique can be used in surgical planning and in the study of anatomical connectivity, brain changes and mental disorders. From dMRI data, white matter fiber tracts can be reconstructed using a class of techniques called tractography. The dataset derived by tractography is composed of a large number of streamlines, which are sequences of points in 3D space. To simplify the visualization and analysis of the white matter fiber tracts obtained from tracking algorithms, it is often necessary to group them into larger clusters or bundles. This step is called clustering. In order to perform clustering, a mathematical definition of fiber similarity (or, more commonly, a fiber distance) must be specified. On the basis of this metric, pairwise fiber distances can be computed and used as input for a clustering algorithm. The most common distance metrics are able to capture only the local relationship between streamlines, but not the global structure of the fiber. The global structure refers to the variability of the shape. Together, local and global information can define a better similarity metric. We have extracted the global information using a mathematical representation based on the study of the tract with the Frenet equations. In particular, we have defined some intrinsic parameters of the fibers that lead to a classification of the tracts based on global geometrical characteristics. Using these parameters, a new distance metric for fiber similarity has been developed. To evaluate the quality of the new metric, indices were used for a qualitative study of the results.
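The intrinsic Frenet parameters mentioned above include the curvature along the fiber. For a streamline given as a discrete sequence of 3D points, a toy discrete analogue is the turning angle between successive segments divided by the mean segment length; a straight fiber gives zeros and a bent fiber gives positive values. This is only an illustrative discretization, not the thesis's exact formulation:

```python
from math import acos, sqrt

def discrete_curvature(streamline):
    """Per-interior-point discrete curvature of a polyline in 3D:
    turning angle between consecutive segments / mean segment length."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))
    def norm(v):
        return sqrt(sum(x * x for x in v))
    curv = []
    for p, q, r in zip(streamline, streamline[1:], streamline[2:]):
        u, v = sub(q, p), sub(r, q)
        c = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
        angle = acos(max(-1.0, min(1.0, c)))   # clamp for fp safety
        curv.append(angle / ((norm(u) + norm(v)) / 2))
    return curv
```

Summary statistics of such per-point values (mean, maximum, variability) are the kind of global shape descriptors a fiber-distance metric can combine with local pointwise distances.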