
    Robust and Scalable Learning of Complex Dataset Topologies via ElPiGraph

    Large datasets represented by multidimensional data point clouds often have non-trivial distributions with branching trajectories and excluded regions; recent single-cell transcriptomic studies of developing embryos are notable examples. Reducing the complexity of such data and producing compact, interpretable representations of it remains a challenging task. Most existing computational methods are based on exploring local neighbourhood relations between data points, a step that can perform poorly for multidimensional and noisy data. Here we present ElPiGraph, a scalable and robust method for approximating datasets with complex structures that does not require computing the complete data distance matrix or the data point neighbourhood graph. The method withstands high levels of noise and can approximate complex topologies via principal graph ensembles, which can be combined into a consensus principal graph. ElPiGraph deals efficiently with large and complex datasets in fields ranging from biology, where it can be used to infer gene dynamics from single-cell RNA-Seq, to astronomy, where it can be used to explore complex structures in the distribution of galaxies. Comment: 32 pages, 14 figures
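    ElPiGraph approximates a point cloud with a graph whose node positions trade off fidelity to the data against an elastic penalty on the graph. The sketch below illustrates that idea in its simplest form and is not the ElPiGraph implementation: it assumes a fixed chain of nodes in 2-D and uses only a stretching penalty `lam` (no bending term, no graph growth, no ensembles), alternating nearest-node assignment with an exact linear solve for the node positions. `fit_elastic_chain`, `solve_linear`, the initialization, and the default parameters are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_elastic_chain(points, n_nodes=5, lam=0.01, n_iter=20):
    """Hypothetical sketch: fit a chain of n_nodes to 2-D points by minimizing
    (1/N) * sum_i ||x_i - y_k(i)||^2 + lam * sum_j ||y_j - y_{j+1}||^2."""
    n = len(points)
    # Initialize nodes evenly between the lexicographically smallest and largest points (an assumption).
    lo, hi = min(points), max(points)
    nodes = [(lo[0] + (hi[0] - lo[0]) * t / (n_nodes - 1),
              lo[1] + (hi[1] - lo[1]) * t / (n_nodes - 1)) for t in range(n_nodes)]
    for _ in range(n_iter):
        # Step 1: assign every point to its nearest node.
        groups = [[] for _ in range(n_nodes)]
        for p in points:
            k = min(range(n_nodes),
                    key=lambda j: (p[0] - nodes[j][0]) ** 2 + (p[1] - nodes[j][1]) ** 2)
            groups[k].append(p)
        # Step 2: with the partition fixed the energy is quadratic; a zero gradient gives
        # (diag(n_j / N) + lam * L) Y = (1/N) * (sum of points assigned to each node),
        # where L is the Laplacian of the chain.
        a = [[0.0] * n_nodes for _ in range(n_nodes)]
        for j in range(n_nodes):
            a[j][j] += len(groups[j]) / n
        for j in range(n_nodes - 1):
            a[j][j] += lam
            a[j + 1][j + 1] += lam
            a[j][j + 1] -= lam
            a[j + 1][j] -= lam
        xs = solve_linear(a, [sum(p[0] for p in g) / n for g in groups])
        ys = solve_linear(a, [sum(p[1] for p in g) / n for g in groups])
        nodes = list(zip(xs, ys))
    return nodes
```

    For points on the diagonal, `fit_elastic_chain([(t / 2, t / 2) for t in range(21)])` returns five nodes spread along the diagonal. Alternating a discrete assignment with a linear solve keeps each step cheap, and only point-to-node distances are computed, never a full distance matrix between data points.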

    On improving the performance of optimistic distributed simulations

    This report investigates ways of improving the performance of optimistic distributed simulations without affecting simulation accuracy. We argue that existing clustering algorithms are not adequate for distributed simulation, and outline some characteristics of an ideal algorithm for this field. The report is structured as follows. We start by introducing the area of distributed simulation. Following a comparison of the dominant protocols used in distributed simulation, we elaborate on current approaches to improving simulation performance: computationally efficient techniques, exploitation of the processors' hardware configuration, optimizations derived from the simulation scenario, and others. We introduce the core characteristics of clustering approaches and argue that these cannot be applied to real-life distributed simulation problems. We present a typical distributed simulation setting and explain why existing clustering approaches are not expected to improve the performance of a distributed simulation. We introduce a prototype distributed simulation platform developed in the scope of this research, focusing on emergency response and specifically building evacuation. We then outline our current work on this issue and, finally, the next steps that could be taken in this field.
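    In optimistic protocols, a logical process executes events as soon as they arrive and undoes its work when a message with an earlier timestamp (a straggler) shows up. The sketch below illustrates that rollback mechanism for a single logical process with a toy integer state. It is a simplified, hypothetical example: `OptimisticLP` and its methods are illustrative names, and the anti-messages and global virtual time computation that a full optimistic (Time Warp) kernel needs are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class OptimisticLP:
    """Hypothetical optimistic logical process with a toy state: a running sum of event values.
    Simplifications: no anti-messages, no GVT, no fossil collection of old snapshots."""
    state: int = 0
    lvt: float = 0.0                                # local virtual time
    processed: list = field(default_factory=list)   # (timestamp, value) in execution order
    snapshots: list = field(default_factory=list)   # state saved before each processed event
    rollbacks: int = 0

    def _execute(self, ts, value):
        self.snapshots.append(self.state)  # checkpoint so this event can be undone later
        self.processed.append((ts, value))
        self.state += value
        self.lvt = ts

    def receive(self, ts, value):
        if ts >= self.lvt:
            # Optimistic case: the event is not in the past, so execute it immediately.
            self._execute(ts, value)
            return
        # Straggler: undo every processed event with a later timestamp.
        self.rollbacks += 1
        undone = []
        while self.processed and self.processed[-1][0] > ts:
            undone.append(self.processed.pop())
            self.state = self.snapshots.pop()
        self.lvt = self.processed[-1][0] if self.processed else 0.0
        # Execute the straggler, then replay the undone events in timestamp order.
        self._execute(ts, value)
        for event in sorted(undone):
            self._execute(*event)
```

    Receiving events stamped 1, 5 and then 3 triggers one rollback: the event at time 5 is undone and re-executed after the straggler, so the final state equals that of sequential execution in timestamp order.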

    Visual Analysis of High-Dimensional Point Clouds using Topological Abstraction

    This thesis is about visualizing a kind of data that is trivial for computers to process but difficult for humans to imagine, because nature gives us no intuition for this type of information: high-dimensional data. Such data often result from representing observations of objects under various aspects or with different properties. In many applications, a typical, laborious task is to find related objects or to group those that are similar to each other. One classic solution is to treat the data as vectors in a Euclidean space with the object variables as dimensions. Using Euclidean distance as a measure of similarity, objects with similar properties and values form groups, so-called clusters, which are exposed by cluster analysis of the high-dimensional point cloud. Because similar vectors can be thought of as objects with similar attributes, the point cloud's structure and individual cluster properties, such as size or compactness, summarize data categories and their relative importance. The contribution of this thesis is a novel approach for visually exploring high-dimensional point clouds without suffering from structural occlusion. The work is based on two key concepts. The first idea is to discard those geometric properties that cannot be preserved and thus lead to the typical artifacts. Topological concepts are used instead to shift the focus away from a point-centered view of the data to a more structure-centered perspective. The advantage is that topology-driven clustering information can be extracted in the data's original domain and preserved without loss in low dimensions. The second idea is to split the analysis into a topology-based global overview and a subsequent geometric local refinement.
The occlusion-free overview enables the analyst to identify features and to link them to other visualizations that permit analysis of properties not captured by the topological abstraction, e.g. cluster shape or value distributions in particular dimensions or subspaces. The advantage of separating structure analysis from data point analysis is that restricting local analysis to data subsets significantly reduces artifacts and the visual complexity of standard techniques. In other words, the additional topological layer enables the analyst to identify previously hidden structure and to focus on particular features by suppressing irrelevant points during local feature analysis. This thesis addresses the topology-based visual analysis of high-dimensional point clouds for both the time-invariant and the time-varying case. Time-invariant means that the points do not change in number or position, so the analyst explores the clustering of a fixed set of points. The extension to the time-varying case implies analysing a changing clustering, in which clusters appear, merge, split, or vanish. Especially for high-dimensional data, both tracking (relating features over time) and visualizing changing structure are difficult problems.
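    One common way to obtain topology-driven clustering information is a join tree of a density estimate: sweeping the points from dense to sparse, every density maximum starts a cluster, and every point that connects two or more existing clusters records a merge. The sketch below computes both event types with a union-find structure. It assumes a neighbour-count density and a fixed-radius neighbourhood graph, both hypothetical choices made for brevity, and is not the thesis's implementation; `density_join_tree` is an illustrative name.

```python
import math


def density_join_tree(points, radius):
    """Sweep points by decreasing density and record where superlevel-set components
    are born (density maxima) and where they merge (join-tree saddles).
    Hypothetical density: number of neighbours within `radius`."""
    n = len(points)
    nbrs = [[j for j in range(n) if j != i and math.dist(points[i], points[j]) <= radius]
            for i in range(n)]
    density = [len(nb) for nb in nbrs]
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    seen = [False] * n
    maxima, merges = [], []
    for i in sorted(range(n), key=lambda k: -density[k]):
        roots = {find(j) for j in nbrs[i] if seen[j]}
        if not roots:
            maxima.append(i)                  # a new component is born at a local maximum
        elif len(roots) > 1:
            merges.append((i, density[i]))    # point i joins two or more components
        for r in roots:
            parent[r] = i                     # i is still its own root, so it becomes the new root
        seen[i] = True
    return maxima, merges
```

    Because only distances are used, the same sweep works unchanged for points of any dimension, and the resulting maxima and merges form a tree that can be drawn in two dimensions regardless of the data's dimensionality.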

    K-means based clustering and context quantization


    Some new techniques for pattern recognition research and lung sound signal analysis

    This thesis describes the results of a collaborative research programme between the Department of Electronics & Electrical Engineering, University of Glasgow, and the Centre for Respiratory Investigation, Glasgow Royal Infirmary. The research was initially aimed at studying lung sound using signal processing and pattern recognition techniques. The use of pattern recognition techniques was largely confined to exploratory data analysis, which led to an interest in the methods themselves. A study was carried out to apply recent research in computational geometry to clustering. Two geometric structures, the Gabriel graph and the relative neighbourhood graph, are both defined by a region of influence. A generalization of these graphs is used to find the conditions under which graphs defined by a region of influence are connected and planar. The Gabriel graph may be considered to be just planar and the relative neighbourhood graph to be just connected. From this, two variable regions of influence were defined that were aimed at producing disconnected graphs and hence a partitioning of the data set. A hierarchical clustering based on relative distance may be generated by varying the size of the region of influence. The value of the clustering method is examined in terms of admissibility criteria and by a case study. An interactive display to complement the graph-theoretical clustering was also developed. This display allows a partition in the clustering to be examined. The relationship between clusters in the partition may be studied by using the partition to define a contracted graph, which is then displayed. Subgraphs of the original graph may be used to provide displays of individual clusters. This display should provide additional information about a partition and hence allow the user to understand the data better. The remainder of the work in this thesis concerns the application of pattern recognition techniques to the analysis of lung sound signals.
Breath sound was analysed using frequency-domain methods, since it is essentially a continuous signal. Initially, a rather ad hoc feature-extraction method was used, based on a piecewise constant approximation to the amplitude spectrum. While this method provided a useful set of features, more systematic methods are clearly required. These methods were used to study lung sound in four groups of patients: (1) normal patients, (2) patients with asbestosis, (3) patients with cryptogenic fibrosing alveolitis (CFA), and (4) patients with interstitial pulmonary oedema. The data sets were analysed using principal components analysis and the new graph-theoretical clustering method (these data served as the case study for the clustering method). Three groups of patients could be identified from the data: (a) normal subjects, (b) patients with fibrosis of the lungs (asbestosis and CFA), and (c) patients with pulmonary oedema. These results suggest that lung sound may make a useful contribution to non-invasive diagnosis. However, more extensive studies are required before the real value of lung sound in diagnosis is established.
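    Both graphs can be built directly from their regions of influence: an edge between points p and q survives only if no third point falls inside the region. In the sketch below, the Gabriel region is the open disc with diameter pq, and the relative-neighbourhood region is the open lune of points closer to both p and q than they are to each other. The brute-force O(n^3) construction and the function names are illustrative choices; the variable regions of influence defined in the thesis are not reproduced here.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance (avoids square roots in the comparisons)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def region_of_influence_graphs(points):
    """Brute-force construction of the Gabriel graph and the relative neighbourhood
    graph (RNG). Returns two sets of index pairs (i, j) with i < j."""
    gabriel, rng = set(), set()
    n = len(points)
    for i, j in combinations(range(n), 2):
        dij = sq_dist(points[i], points[j])
        in_disc = in_lune = False
        for k in range(n):
            if k in (i, j):
                continue
            dik = sq_dist(points[i], points[k])
            djk = sq_dist(points[j], points[k])
            # Gabriel region: k lies inside the open disc with diameter p_i p_j
            # (equivalently, the angle at k is obtuse).
            if dik + djk < dij:
                in_disc = True
            # RNG region: k is closer to both endpoints than they are to each other.
            if max(dik, djk) < dij:
                in_lune = True
        if not in_disc:
            gabriel.add((i, j))
        if not in_lune:
            rng.add((i, j))
    return gabriel, rng
```

    Because the disc lies inside the lune, every RNG edge is also a Gabriel edge. For the triangle `[(0, 0), (2, 0), (1, 1.2)]` the Gabriel graph keeps all three edges, while the RNG drops the long edge between the first two points.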

    Organising a photograph collection based on human appearance

    This thesis describes a complete framework for organising digital photographs in an unsupervised manner, based on the appearance of the people captured in them. Organising a photograph collection manually, especially supplying the identities of the people in the photographs, is time consuming. Unsupervised grouping of images containing similar persons makes annotation with names easier (a group of images can be named at once) and enables quick search based on query by example. The full process of unsupervised clustering is discussed in this thesis. Methods for locating facial components are discussed, and a technique based on colour image segmentation is proposed and tested. Additionally, a method based on a Principal Component Analysis template is tested. These provide the eye locations required for acquiring a normalised facial image. This image is then preprocessed by histogram equalisation and feathering, and the features of the MPEG-7 face recognition descriptor are extracted. A distance measure proposed in the MPEG-7 standard is used as the similarity measure. Three grouping approaches that use only face recognition features for clustering are analysed: modified k-means, single-link, and a method based on a nearest neighbour classifier. The nearest-neighbour-based technique is chosen for further experiments with fusing information from several sources. These sources are context-based, such as events (party, trip, holidays) and the ownership of photographs, and content-based, such as the colour and texture of the bodies of people appearing in photographs. Two techniques are proposed for fusing event and ownership (user) information with the face recognition features: a Transferable Belief Model (TBM) and three-level clustering. The three-level clustering is carried out at the “event”, “user”, and “collection” levels. The latter technique proves to be the most efficient.
For combining body information with the face recognition features, three probabilistic fusion methods are tested: the average sum, the generalised product, and the maximum rule. Combinations are tested within events and within user collections. The work concludes with a brief discussion of extracting key images to represent each cluster.
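    The three fusion rules combine per-source scores for the same pair of photographs into a single score. A minimal sketch, assuming scores already normalised to [0, 1] (for example a face similarity and a body similarity) and reading the generalised product as a weighted geometric mean; that reading, the function name `fuse_scores`, and the `weights` argument are assumptions, not details given in the abstract.

```python
import math


def fuse_scores(scores, rule, weights=None):
    """Combine per-source scores in [0, 1] for one pair of photographs (hypothetical API).
    scores: e.g. {"face": 0.8, "body": 0.2}; weights: optional per-source weights."""
    weights = weights or {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    if rule == "sum":
        # Average sum: weighted arithmetic mean of the scores.
        return sum(weights[name] * s for name, s in scores.items()) / total
    if rule == "product":
        # Assumed reading of the generalised product: weighted geometric mean.
        return math.prod(s ** weights[name] for name, s in scores.items()) ** (1.0 / total)
    if rule == "max":
        # Maximum rule: trust the most confident source.
        return max(scores.values())
    raise ValueError(f"unknown rule: {rule}")
```

    The product rule penalises disagreement between sources more strongly than the sum rule: for face and body scores of 0.8 and 0.2, the sum rule gives 0.5 and the product rule 0.4.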

    Methods for Learning Structured Prediction in Semantic Segmentation of Natural Images

    Automatic segmentation and recognition of semantic classes in natural images is an important open problem in computer vision. In this work, we investigate three different approaches to recognition: without supervision, with supervision at the level of images, and with supervision at the level of pixels. The thesis comprises three parts. The first part introduces a clustering algorithm that optimizes a novel information-theoretic objective function. We show that the proposed algorithm has clear advantages over standard algorithms from the literature on a wide array of datasets. Clustering algorithms are an important building block for higher-level computer vision applications, in particular for semantic segmentation. The second part proposes an algorithm for automatic segmentation and recognition of object classes in natural images that learns a segmentation model solely from annotations of the presence and absence of object classes in images. The third and main part investigates one of the most popular approaches to object class segmentation and semantic segmentation, based on conditional random fields and structured prediction. We investigate several learning algorithms, in particular in combination with approximate inference procedures. We show how structured models for image segmentation can be learned exactly in practical settings, even in the presence of many loops in the underlying neighborhood graphs. The introduced methods provide results that advance the state of the art on two complex benchmark datasets for semantic segmentation: the MSRC-21 Dataset of RGB images and the NYU V2 Dataset of RGB-D images of indoor scenes. Finally, we introduce a software library that allows us to perform extensive empirical comparisons of state-of-the-art structured learning approaches.
This allows us to characterize their practical properties in a range of applications, in particular for semantic segmentation and object class segmentation.
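    A conditional random field scores a complete labelling as a sum of per-position (unary) costs and pairwise costs between neighbours. On a loop-free chain, the minimum-energy labelling can be computed exactly by dynamic programming, as sketched below with a Potts pairwise term; `chain_map` and its cost conventions are hypothetical. This is a deliberately simplified stand-in: the image models discussed here live on neighbourhood graphs with many loops, where this recursion does not apply directly.

```python
def chain_map(unary, smooth):
    """Exact minimum-energy labelling of a chain CRF (Viterbi-style dynamic programming).
    unary[t][l]: cost of label l at position t (lower is better).
    smooth: Potts cost paid whenever two neighbouring labels differ."""
    n_labels = len(unary[0])
    cost = list(unary[0])   # best total cost of a labelling of positions 0..t ending in label l
    back = []               # back[t-1][l]: best previous label when position t has label l
    for t in range(1, len(unary)):
        best_prev = min(range(n_labels), key=lambda l: cost[l])
        new_cost, ptr = [], []
        for l in range(n_labels):
            # Under a Potts term, either keep label l (no penalty) or switch from the
            # cheapest previous label (pay `smooth`); no other predecessor can be better.
            stay = cost[l]
            switch = cost[best_prev] + smooth
            if stay <= switch:
                new_cost.append(stay + unary[t][l])
                ptr.append(l)
            else:
                new_cost.append(switch + unary[t][l])
                ptr.append(best_prev)
        cost = new_cost
        back.append(ptr)
    # Trace the optimal labelling backwards from the cheapest final label.
    label = min(range(n_labels), key=lambda l: cost[l])
    labels = [label]
    for ptr in reversed(back):
        label = ptr[label]
        labels.append(label)
    return labels[::-1]
```

    With `smooth=0` each position takes its cheapest label; with a larger `smooth`, an isolated label is smoothed away whenever switching into it and back costs more than it saves.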

    Application of Image Analysis Techniques to Satellite Cloud Motion Tracking

    Cloud motion wind (CMW) determination requires tracking individual cloud targets. This is achieved by first clustering and then tracking each cloud cluster. Ideally, different cloud clusters correspond to different pressure levels. Two new clustering techniques have been developed for identifying cloud types in multispectral satellite imagery. The first technique is the Global-Local clustering algorithm, a cascade of a histogram clustering algorithm and a dynamic clustering algorithm. The histogram clustering algorithm divides the multispectral histogram into non-overlapping regions, and these regions are used to initialise the dynamic clustering algorithm. The dynamic clustering algorithm assumes that clusters have Gaussian probability density functions with different population sizes and variances. The second technique uses graph theory to exploit the spatial information that is often ignored in per-pixel clustering. The algorithm has two stages: spatial clustering and spectral clustering. The first stage extracts homogeneous objects in the image using a family of algorithms based on stepwise optimization, which can be further divided into top-down and bottom-up approaches. The second stage groups similar segments into clusters using a statistical hypothesis test on their similarity. The resulting clusters are less noisy along class boundaries and are hierarchically ordered. A criterion based on mutual information is derived to monitor the spatial clustering process and to suggest an optimal number of segments. An automated cloud motion tracking program has been developed. Three images (each separated by 30 minutes) are used to track cloud motion, and the middle image is clustered using Global-Local clustering prior to tracking.
Compared with traditional methods based on raw images, separating cloud types before tracking was found to reduce the ambiguity caused by multiple cloud layers moving at different speeds and in different directions. Three matching techniques are used and their reliability compared. Target sizes ranging from 4 × 4 to 32 × 32 are tested and their errors compared. The optimum target size for first-generation METEOSAT images has also been found.
    Meteorological Office, Bracknell
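    Tracking a target between consecutive images amounts to finding the displacement at which a small window from one image best matches the next. The abstract does not name its three matching techniques; as an assumption, the sketch below uses exhaustive block matching with a sum-of-squared-differences (SSD) criterion, one standard choice. `match_target` and its parameters are hypothetical; `size` plays the role of the target sizes (4 × 4 to 32 × 32) compared in the study.

```python
def match_target(prev, curr, top, left, size, search):
    """Hypothetical block matcher: find the displacement (dy, dx) of the size x size
    target whose top-left corner is (top, left) in `prev`, by minimising the SSD
    against `curr` over a +/- `search` pixel window. Images are lists of rows."""
    target = [row[left:left + size] for row in prev[top:top + size]]
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > len(curr) or c + size > len(curr[0]):
                continue  # candidate window would fall outside the image
            ssd = sum((target[i][j] - curr[r + i][c + j]) ** 2
                      for i in range(size) for j in range(size))
            if best is None or ssd < best[0]:
                best = (ssd, dy, dx)
    if best is None:
        raise ValueError("no candidate window fits inside the image")
    return best[1], best[2]
```

    Dividing the returned displacement by the 30-minute interval between images gives a motion estimate in pixels per unit time; converting it to a wind speed additionally requires the image's ground resolution.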
