Robust And Scalable Learning Of Complex Dataset Topologies Via Elpigraph
Large datasets represented by multidimensional data point clouds often
possess non-trivial distributions with branching trajectories and excluded
regions, with the recent single-cell transcriptomic studies of developing
embryo being notable examples. Reducing the complexity and producing compact
and interpretable representations of such data remains a challenging task. Most
of the existing computational methods are based on exploring the local data
point neighbourhood relations, a step that can perform poorly in the case of
multidimensional and noisy data. Here we present ElPiGraph, a scalable and
robust method for approximation of datasets with complex structures which does
not require computing the complete data distance matrix or the data point
neighbourhood graph. This method is able to withstand high levels of noise and
is capable of approximating complex topologies via principal graph ensembles
that can be combined into a consensus principal graph. ElPiGraph deals
efficiently with large and complex datasets in various fields from biology,
where it can be used to infer gene dynamics from single-cell RNA-Seq, to
astronomy, where it can be used to explore complex structures in the
distribution of galaxies. Comment: 32 pages, 14 figures
On improving the performance of optimistic distributed simulations
This report investigates means of improving the performance of optimistic distributed simulations
without affecting the simulation accuracy. We argue that existing clustering algorithms
are not adequate for application in distributed simulations, and outline some characteristics
of an ideal algorithm that could be applied in this field. This report is structured as follows.
We start by introducing the area of distributed simulation. Following a comparison of the
dominant protocols used in distributed simulation, we elaborate on the current approaches
of improving the simulation performance, using computation efficient techniques, exploiting
the hardware configuration of processors, optimizations that can be derived from the
simulation scenario, etc. We introduce the core characteristics of clustering approaches and
argue that these cannot be applied in real-life distributed simulation problems. We present
a typical distributed simulation setting and elaborate on the reasons that existing clustering
approaches are not expected to improve the performance of a distributed simulation. We
introduce a prototype distributed simulation platform that has been developed in the scope
of this research, focusing on the area of emergency response and specifically building evacuation.
We continue by outlining our current work on this issue, and finally, we end this
report by outlining next actions which could be taken in this field
Visual Analysis of High-Dimensional Point Clouds using Topological Abstraction
This thesis is about visualizing a kind of data that is trivial to process by computers but difficult to imagine by humans because nature does not allow for intuition with this type of information: high-dimensional data. Such data often result from representing observations of objects under various aspects or with different properties. In many applications, a typical, laborious task is to find related objects or to group those that are similar to each other. One classic solution for this task is to imagine the data as vectors in a Euclidean space with object variables as dimensions. Utilizing Euclidean distance as a measure of similarity, objects with similar properties and values accumulate to groups, so-called clusters, that are exposed by cluster analysis on the high-dimensional point cloud. Because similar vectors can be thought of as objects that are alike in terms of their attributes, the point cloud's structure and individual cluster properties, like their size or compactness, summarize data categories and their relative importance. The contribution of this thesis is a novel analysis approach for visual exploration of high-dimensional point clouds without suffering from structural occlusion. The work is based on implementing two key concepts: The first idea is to discard those geometric properties that cannot be preserved and, thus, lead to the typical artifacts. Topological concepts are used instead to shift away the focus from a point-centered view on the data to a more structure-centered perspective. The advantage is that topology-driven clustering information can be extracted in the data's original domain and be preserved without loss in low dimensions. The second idea is to split the analysis into a topology-based global overview and a subsequent geometric local refinement.
The occlusion-free overview enables the analyst to identify features and to link them to other visualizations that permit analysis of those properties not captured by the topological abstraction, e.g. cluster shape or value distributions in particular dimensions or subspaces. The advantage of separating structure from data point analysis is that restricting local analysis only to data subsets significantly reduces artifacts and the visual complexity of standard techniques. That is, the additional topological layer enables the analyst to identify structure that was hidden before and to focus on particular features by suppressing irrelevant points during local feature analysis. This thesis addresses the topology-based visual analysis of high-dimensional point clouds for both the time-invariant and the time-varying case. Time-invariant means that the points do not change in their number or positions. That is, the analyst explores the clustering of a fixed and constant set of points. The extension to the time-varying case implies the analysis of a varying clustering, where clusters appear as new, merge or split, or vanish. Especially for high-dimensional data, both tracking---which means to relate features over time---but also visualizing changing structure are difficult problems to solve
Some new techniques for pattern recognition research and lung sound signal analysis
This thesis describes the results of a collaborative research programme between the Department of Electronics & Electrical Engineering, University of Glasgow, and the Centre for Respiratory Investigation, Glasgow Royal Infirmary. The research was initially aimed at studying lung sound using signal processing and pattern recognition techniques. The use of pattern recognition techniques was largely confined to exploratory data analysis which led to an interest in the methods themselves. A study was carried out to apply recent research in computational geometry to clustering. Two geometric structures, the Gabriel graph and the relative neighbourhood graph, are both defined by a region of influence. A generalization of these graphs is used to find the conditions under which graphs defined by a region of influence are connected and planar. The Gabriel graph may be considered to be just planar and the relative neighbourhood graph to be just connected. From this, two variable regions of influence were defined that were aimed at producing disconnected graphs and hence a partitioning of the data set. A hierarchic clustering based on relative distance may be generated by varying the size of the region of influence. The value of the clustering method is examined in terms of admissibility criteria and by a case study. An interactive display to complement the graph theoretical clustering was also developed. This display allows a partition in the clustering to be examined. The relationship between clusters in the partition may be studied by using the partition to define a contracted graph which is then displayed. Subgraphs of the original graph may be used to provide displays of individual clusters. This display should provide additional information about a partition and hence allow the user to understand the data better. The remainder of the work in this thesis concerns the application of pattern recognition techniques to the analysis of lung sound signals.
Breath sound was analysed using frequency domain methods since it is basically a continuous signal. Initially, a rather ad hoc method was used for feature extraction which was based on a piecewise constant approximation to the amplitude spectrum. While this method provided a useful set of features, it is clear that more systematic methods are required. These methods were used to study lung sound in four groups of patients: (1) normal patients, (2) patients with asbestosis, (3) patients with cryptogenic fibrosing alveolitis (CFA) and (4) patients with interstitial pulmonary oedema. The data sets were analysed using principal components analysis and the new graph theoretical clustering method (this data was used as a case study for the clustering method). Three groups of patients could be identified from the data: (a) normal subjects, (b) patients with fibrosis of the lungs (asbestosis & CFA) and (c) patients with pulmonary oedema. These results suggest that lung sound may be able to make a useful contribution to non-invasive diagnosis. However, more extensive studies are required before the real value of lung sound in diagnosis is established
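As an illustration of the two region-of-influence graphs named in this abstract, the following is a minimal sketch of their standard textbook definitions, not the thesis's generalized variable regions. The function name `region_graphs` and the sample points are hypothetical. An edge joins p and q in the Gabriel graph when no other point lies inside the circle with diameter pq, and in the relative neighbourhood graph when no other point is closer to both p and q than they are to each other.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def region_graphs(points):
    """Return (gabriel_edges, rng_edges) as sets of index pairs (i, j), i < j."""
    gabriel, rng = set(), set()
    for i, j in combinations(range(len(points)), 2):
        p, q = points[i], points[j]
        d_pq = sq_dist(p, q)
        others = [r for k, r in enumerate(points) if k not in (i, j)]
        # Gabriel graph: the circle with diameter pq contains no other point.
        if all(sq_dist(p, r) + sq_dist(q, r) >= d_pq for r in others):
            gabriel.add((i, j))
        # Relative neighbourhood graph: the lune of p and q contains no other point.
        if all(max(sq_dist(p, r), sq_dist(q, r)) >= d_pq for r in others):
            rng.add((i, j))
    return gabriel, rng


# Hypothetical sample data, chosen only to show the two graphs differing.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.5), (5.0, 0.5)]
gabriel_edges, rng_edges = region_graphs(points)
print(sorted(gabriel_edges))
print(sorted(rng_edges))
```

Because the lune is a larger region than the circle, every relative neighbourhood edge is also a Gabriel edge; enlarging the region further is what eventually disconnects the graph and yields a partition.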
Organising a photograph collection based on human appearance
This thesis describes a complete framework for organising digital photographs in an unsupervised manner, based on the appearance of people captured in the photographs. Organising a collection of photographs manually, especially providing the identities of people captured in photographs, is a time consuming task. Unsupervised grouping of images containing similar persons makes annotating names easier (as a group of images can be named at once) and enables quick search based on query by example.
The full process of unsupervised clustering is discussed in this thesis. Methods for locating facial components are discussed and a technique based on colour
image segmentation is proposed and tested. Additionally a method based on the Principal Component Analysis template is tested, too. These provide eye locations required for acquiring a normalised facial image. This image is then preprocessed by a histogram equalisation and feathering, and the features of MPEG-7 face recognition descriptor are extracted. A distance measure proposed in the MPEG-7 standard is used as a similarity measure.
Three approaches to grouping that use only face recognition features for clustering are analysed. These are modified k-means, single-link and a method based on a nearest neighbour classifier. The nearest neighbour-based technique is chosen for further experiments with fusing information from several sources. These sources are context-based such as events (party, trip, holidays), the ownership of photographs, and content-based such as information about the colour and texture of the bodies of humans appearing in photographs. Two techniques are proposed for fusing event and ownership (user) information with the face recognition features: a Transferable Belief Model (TBM) and three level clustering. The three level clustering is carried out at "event" level, "user" level and "collection" level. The latter technique proves to be most efficient.
For combining body information with the face recognition features, three probabilistic fusion methods are tested. These are the average sum, the generalised product and the maximum rule. Combinations are tested within events and within user collections. This work concludes with a brief discussion on extraction of key images for a representation of each cluster
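The three fusion rules mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the function `fuse`, the rule names and the example scores are hypothetical, and the plain product rule stands in for the thesis's generalised product, whose exact form the abstract does not give.

```python
from math import prod


def fuse(scores, rule):
    """Combine per-source class probabilities into one normalised vector.

    scores[s][c] is the probability source s assigns to class c.
    rule is one of "sum", "product" or "max".
    """
    n_classes = len(scores[0])
    per_class = [[source[c] for source in scores] for c in range(n_classes)]
    if rule == "sum":
        fused = [sum(values) / len(values) for values in per_class]
    elif rule == "product":
        fused = [prod(values) for values in per_class]
    elif rule == "max":
        fused = [max(values) for values in per_class]
    else:
        raise ValueError(f"unknown rule: {rule}")
    total = sum(fused)
    return [value / total for value in fused]


# Hypothetical scores for two candidate identities from two sources.
face_scores = [0.6, 0.4]
body_scores = [0.3, 0.7]
for rule in ("sum", "product", "max"):
    print(rule, fuse([face_scores, body_scores], rule))
```

In this example all three rules favour the second identity, but they can disagree when one source is confident and the other is nearly uniform, which is why such rules are usually compared empirically.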
Methods for Learning Structured Prediction in Semantic Segmentation of Natural Images
Automatic segmentation and recognition of semantic classes in natural images is an important open problem in computer vision. In this work, we investigate three different approaches to recognition: without supervision, with supervision on the level of images, and with supervision on the level of pixels. The thesis comprises three parts. The first part introduces a clustering algorithm that optimizes a novel information-theoretic objective function. We show that the proposed algorithm has clear advantages over standard algorithms from the literature on a wide array of datasets. Clustering algorithms are an important building block for higher-level computer vision applications, in particular for semantic segmentation. The second part of this work proposes an algorithm for automatic segmentation and recognition of object classes in natural images that learns a segmentation model solely from annotation in the form of presence and absence of object classes in images. The third and main part of this work investigates one of the most popular approaches to the task of object class segmentation and semantic segmentation, based on conditional random fields and structured prediction. We investigate several learning algorithms, in particular in combination with approximate inference procedures. We show how structured models for image segmentation can be learned exactly in practical settings, even in the presence of many loops in the underlying neighborhood graphs. The introduced methods provide results advancing the state-of-the-art on two complex benchmark datasets for semantic segmentation, the MSRC-21 Dataset of RGB images and the NYU V2 Dataset of RGB-D images of indoor scenes. Finally, we introduce a software library that allows us to perform extensive empirical comparisons of state-of-the-art structured learning approaches.
This allows us to characterize their practical properties in a range of applications, in particular for semantic segmentation and object class segmentation
APPLICATION OF IMAGE ANALYSIS TECHNIQUES TO SATELLITE CLOUD MOTION TRACKING
Cloud motion wind (CMW) determination requires tracking of individual cloud targets.
This is achieved by first clustering and then tracking each cloud cluster. Ideally, different
cloud clusters correspond to different pressure levels. Two new clustering techniques
have been developed for the identification of cloud types in multi-spectral satellite imagery.
The first technique is the Global-Local clustering algorithm. It is a cascade of a
histogram clustering algorithm and a dynamic clustering algorithm. The histogram
clustering algorithm divides the multi-spectral histogram into non-overlapping regions,
and these regions are used to initialise the dynamic clustering algorithm. The dynamic
clustering algorithm assumes clusters have a Gaussian distributed probability density
function with different population size and variance.
The second technique uses graph theory to exploit the spatial information which is
often ignored in per-pixel clustering. The algorithm is in two stages: spatial clustering
and spectral clustering. The first stage extracts homogeneous objects in the image
using a family of algorithms based on stepwise optimization. This family of algorithms
can be further divided into two approaches: Top-down and Bottom-up. The second
stage groups similar segments into clusters using a statistical hypothesis test on their
similarities. The clusters generated are less noisy along class boundaries and are in
hierarchical order. A criterion based on mutual information is derived to monitor the
spatial clustering process and to suggest an optimal number of segments.
An automated cloud motion tracking program has been developed. Three images
(each separated by 30 minutes) are used to track cloud motion and the middle image
is clustered using Global-Local clustering prior to tracking. Compared with traditional
methods based on raw images, it is found that separation of cloud types before cloud
tracking can reduce the ambiguity due to multi-layers of cloud moving at different
speeds and direction. Three matching techniques are used and their reliability compared.
Target sizes ranging from 4 x 4 to 32 x 32 are tested and their errors compared. The
optimum target size for first generation METEOSAT images has also been found. Meteorological Office, Bracknell
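The idea of tracking a square cloud target between two images can be sketched with simple block matching. The abstract does not name its three matching techniques, so the sum of squared differences used here is an assumption, as are the function names, the 8 x 8 frames and the search radius.

```python
def track_target(prev, curr, top, left, size, search):
    """Return the (dy, dx) shift that best matches a size x size target.

    The target is cut from prev at (top, left); candidates in curr are
    scored by the sum of squared differences (lower is better).
    """
    rows, cols = len(curr), len(curr[0])
    best_score, best_shift = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            # Skip candidate windows that fall outside the image.
            if t < 0 or l < 0 or t + size > rows or l + size > cols:
                continue
            score = sum(
                (prev[top + r][left + c] - curr[t + r][l + c]) ** 2
                for r in range(size)
                for c in range(size)
            )
            if best_score is None or score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift


def make_frame(patch_top, patch_left, n=8):
    """Build an n x n frame of zeros with one bright 2 x 2 'cloud' patch."""
    frame = [[0] * n for _ in range(n)]
    for r in range(2):
        for c in range(2):
            frame[patch_top + r][patch_left + c] = 9
    return frame


# Hypothetical frames: the patch moves one row down and two columns right.
prev_frame = make_frame(2, 2)
curr_frame = make_frame(3, 4)
shift = track_target(prev_frame, curr_frame, top=1, left=1, size=4, search=2)
print(shift)
```

The `size` parameter plays the role of the target size studied in the abstract: small targets match ambiguously, while large ones may span several cloud layers moving differently.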