1,070 research outputs found

    A survey of outlier detection methodologies

    Get PDF
    Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and as such to purify the data for processing. The original outlier detection methods were arbitrary but now, principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review

    Computing Vertex-Vertex Dissimilarities Using Random Trees: Application to Clustering in Graphs

    Get PDF
    A current challenge in graph clustering is to tackle the issue of complex networks, i.e, graphs with attributed vertices and/or edges. In this paper, we present GraphTrees, a novel method that relies on random decision trees to compute pairwise dissimilarities between vertices in a graph. We show that using different types of trees, it is possible to extend this framework to graphs where the vertices have attributes. While many existing methods that tackle the problem of clustering vertices in an attributed graph are limited to categorical attributes, GraphTrees can handle heterogeneous types of vertex attributes. Moreover, unlike other approaches, the attributes do not need to be preprocessed. We also show that our approach is competitive with well-known methods in the case of non-attributed graphs in terms of quality of clustering, and provides promising results in the case of vertex-attributed graphs. By extending the use of an already well established approach-the random trees-to graphs, our proposed approach opens new research directions, by lever-aging decades of research on this topic

    Clustering graphs using random trees

    Get PDF
    In this work-in-progress paper, we present GraphTrees, a novel method that relies on random decision trees to compute pairwise distances between vertices in a graph. We show that our approach is competitive with the state of the art methods in the case of non-attributed graphs in terms of quality of clustering. By extending the use of an already ubiquitous approach-the random trees-to graphs, our proposed approach opens new research directions, by leveraging decades of research on this topic

    Computing Vertex-Vertex Dissimilarities Using Random Trees: Application to Clustering in Graphs

    Get PDF
    International audienceA current challenge in graph clustering is to tackle the issue of complex networks, i.e, graphs with attributed vertices and/or edges. In this paper, we present GraphTrees, a novel method that relies on random decision trees to compute pairwise dissimilarities between vertices in a graph. We show that using different types of trees, it is possible to extend this framework to graphs where the vertices have attributes. While many existing methods that tackle the problem of clustering vertices in an attributed graph are limited to categorical attributes, GraphTrees can handle heterogeneous types of vertex attributes. Moreover, unlike other approaches, the attributes do not need to be preprocessed. We also show that our approach is competitive with well-known methods in the case of non-attributed graphs in terms of quality of clustering, and provides promising results in the case of vertex-attributed graphs. By extending the use of an already well established approach-the random trees-to graphs, our proposed approach opens new research directions, by lever-aging decades of research on this topic

    Patterns and trends in ethnic residential segregation in England, 1991-2001: a quantitative and qualitative investigation

    Get PDF
    This thesis brings together the themes of ethnicity, inequalities, locality and community interactions. Through an exploration of the processes of residential segregation it demonstrates the complexity of narratives in English local authorities. The research addresses the policy concerns of community cohesion and regeneration and the role of neighbourhood within these. Using quantitative and qualitative research methods, it offers a thematic analysis of the factors affecting residential segregation. A quantitative analysis of the factors leading to variation in the residential arrangements of ethnic groups is conducted at the local authority level using multivariate techniques. The is followed by a qualitative exploration of these processes that reveals the complexity of the relationships between housing patterns, deprivation, ethnicity, culture and community relations. This is set in a critical realist discourse and in the context of a critique of New Labour discourse on community cohesion

    A new perceptual dissimilarity measure for image retrieval and clustering

    Get PDF
    Image retrieval and clustering are two important tools for analysing and organising images. Dissimilarity measure is central to both image retrieval and clustering. The performance of image retrieval and clustering algorithms depends on the effectiveness of the dissimilarity measure. ‘Minkowski’ distance, or more specifically, ‘Euclidean’ distance, is the most widely used dissimilarity measure in image retrieval and clustering. Euclidean distance depends only on the geometric position of two data instances in the feature space and completely ignores the data distribution. However, data distribution has an effect on human perception. The argument that two data instances in a dense area are more perceptually dissimilar than the same two instances in a sparser area, is proposed by psychologists. Based on this idea, a dissimilarity measure called, ‘mp’, has been proposed to address Euclidean distance’s limitation of ignoring the data distribution. Here, mp relies on data distribution to calculate the dissimilarity between two instances. As prescribed in mp, higher data mass between two data instances implies higher dissimilarity, and vice versa. mp relies only on data distribution and completely ignores the geometric distance in its calculations. In the aggregation of dissimilarities between two instances over all the dimensions in feature space, both Euclidean distance and mp give same priority to all the dimensions. This may result in a situation that the final dissimilarity between two data instances is determined by a few dimensions of feature vectors with relatively much higher values. As a result, the dissimilarity derived may not align well with human perception. The need to address the limitations of Minkowski distance measures, along with the importance of a dissimilarity measure that considers both geometric distance and the perceptual effect of data distribution in measuring dissimilarity between images motivated this thesis. It studies the performance of mp for image retrieval. It investigates a new dissimilarity measure that combines both Euclidean distance and data distribution. In addition to these, it studies the performance of such a dissimilarity measure for image retrieval and clustering. Our performance study of mp for image retrieval shows that relying only on data distribution to measure the dissimilarity results in some situations, where the mp’s measurement is contrary to human perception. This thesis introduces a new dissimilarity measure called, perceptual dissimilarity measure (PDM). PDM considers the perceptual effect of data distribution in combination with Euclidean distance. PDM has two variants, PDM1 and PDM2. PDM1 focuses on improving mp by weighting it using Euclidean distance in situations where mp may not retrieve accurate results. PDM2 considers the effect of data distribution on the perceived dissimilarity measured by Euclidean distance. PDM2 proposes a weighting system for Euclidean distance using a logarithmic transform of data mass. The proposed PDM variants have been used as alternatives to Euclidean distance and mp to improve the accuracy in image retrieval. Our results show that PDM2 has consistently performed the best, compared to Euclidean distance, mp and PDM1. PDM1’s performance was not consistent, although it has performed better than mp in all the experiments, but it could not outperform Euclidean distance in some cases. Following the promising results of PDM2 in image retrieval, we have studied its performance for image clustering. k-means is the most widely used clustering algorithm in scientific and industrial applications. k-medoids is the closest clustering algorithm to k-means. Unlike k-means which works only with Euclidean distance, k-medoids gives the option to choose the arbitrary dissimilarity measure. We have used Euclidean distance, mp and PDM2 as the dissimilarity measure in k-medoids and compared the results with k-means. Our clustering results show that PDM2 has perfromed overally the best. This confirms our retrieval results and identifies PDM2 as a suitable dissimilarity measure for image retrieval and clustering.Doctor of Philosoph

    Analysis and design of multifunctional agricultural landscapes : a graph theoretic approach

    Get PDF
    This thesis deals with the development of quantitative methodologies for the evaluation of landscape functions and their interactions in multifunctional agricultural landscapes. It focuses on the spatial coherence of hedgerow networks for ecological functions and landscape character for perception of landscape identity, and on their integration in a multifunctional and multiscale trade-off analysis. Graph theory provided the basis for new methodologies that are applied in this research

    Journal of the Community Development Society Vol. 28 No. 01

    Get PDF
    Community Development Journal covers political, economic and social programmes that link the activities of people with institutions and government. Issues covered include, for example, community action, village, town and regional planning, community studies, and rural development
    • …
    corecore