
    Rank-based dimension reduction for many-criteria populations

    Copyright © 2011 ACM. 13th Annual Conference on Genetic and Evolutionary Computation (GECCO '11), Dublin, Ireland, 12-16 July 2011.
    Interpreting individuals described by a set of criteria can be a difficult task when the number of criteria is large. Such individuals can be ranked, for instance in terms of their average rank across criteria as well as by each distinct criterion. We therefore investigate criteria selection methods which aim to preserve the average rank of individuals but with fewer criteria. Our experiments show that these methods perform effectively, identifying and removing redundancies within the data, and that they are best incorporated into a multi-objective algorithm.
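The abstract does not say how rank preservation is measured or which selection procedure is used, so the following is only a minimal sketch under stated assumptions: lower criterion values are better, ties share the mean rank, the error is the sum of absolute differences between the reduced and full average-rank orderings, and selection is a simple greedy backward elimination. All function names are hypothetical.

```python
def rank_column(values):
    """Rank values (1 = smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def average_rank(data, criteria):
    """Average rank of each individual (row) over the chosen criteria (columns)."""
    cols = [rank_column([row[c] for row in data]) for c in criteria]
    return [sum(col[i] for col in cols) / len(cols) for i in range(len(data))]


def rank_error(data, criteria, reference):
    """Assumed error measure: total absolute shift in the average-rank ordering."""
    reduced = rank_column(average_rank(data, criteria))
    return sum(abs(a - b) for a, b in zip(reduced, reference))


def greedy_reduce(data, keep):
    """Backward elimination: repeatedly drop the criterion whose removal hurts least."""
    criteria = list(range(len(data[0])))
    reference = rank_column(average_rank(data, criteria))
    while len(criteria) > keep:
        drop = min(
            criteria,
            key=lambda c: rank_error(data, [x for x in criteria if x != c], reference),
        )
        criteria.remove(drop)
    return criteria


# Toy population: criteria 2 and 3 duplicate criteria 0 and 1.
data = [
    [1, 9, 1, 9],
    [2, 7, 2, 7],
    [3, 8, 3, 8],
    [4, 1, 4, 1],
]
kept = greedy_reduce(data, keep=2)
print(kept)
```

In this toy case the two retained criteria reproduce the full average-rank ordering exactly, because the removed columns were redundant copies. Real data would rarely allow a zero-error reduction, which is one reason to treat the number of criteria and the rank error as competing objectives.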

    Ordering and Visualisation of Many-objective Populations

    In many everyday tasks it is necessary to compare the performance of the individuals in a population described by two or more criteria, for example comparing products in order to decide which is the best to purchase in terms of price and quality. Other examples are the comparison of universities, countries, the infrastructure in a telecommunications network, and the candidate solutions to a multi- or many-objective problem. In all of these cases, visualising the individuals allows a decision maker to better interpret their relative performance. This thesis explores methods for understanding and visualising multi- and many-criterion populations. Since people cannot generally comprehend more than three spatial dimensions, the visualisation of many-criterion populations is a non-trivial task. We address this by generating visualisations based on the dominance relation, which defines a structure in the population, and we introduce two novel visualisation methods. The first method explicitly illustrates the dominance relationships between individuals as a graph in which individuals are sorted into Pareto shells, and is enhanced using many-criterion ranking methods to produce a finer ordering of individuals. We extend the power index, a method for ranking according to a single criterion, into the many-criterion domain by defining individual quality in terms of tournaments. The second visualisation method uses a new dominance-based distance in conjunction with multi-dimensional scaling, and we show that dominance can be used to identify an intuitive low-dimensional mapping of individuals, placing similar individuals close together. We demonstrate that this method can visualise a population comprising a large number of criteria. Heatmaps are another common method for presenting high-dimensional data; however, they are difficult to interpret if dissimilar individuals are placed close to each other.
    We apply spectral seriation to produce an ordering of individuals and criteria by which the heatmap is arranged, placing similar individuals and criteria close together. A basic version, computing similarity with the Euclidean distance, is demonstrated, before rank-based alternatives are investigated. The procedure is extended to seriate both the parameter and objective spaces of a multi-objective population in two stages. Since this process describes a trade-off, favouring the ordering of individuals in one space or the other, we demonstrate methods that enhance the visualisation by using an evolutionary optimiser to tune the orderings. One way of revealing the structure of a population is by highlighting which individuals are extreme. To this end, we provide three definitions of the “edge” of a multi-criterion mutually non-dominating population. All three of the definitions are in terms of dominance, and we show that one of them can be extended to cope with many-criterion populations. Because such populations are difficult to visualise, a decision maker often struggles to comprehend a population described by a large number of criteria. We therefore consider criterion selection methods to reduce the dimensionality with a view to preserving the structure of the population as quantified by its rank order. We investigate the efficacy of greedy, hill-climber and evolutionary algorithms and cast the dimension reduction as a multi-objective problem.
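The sorting of individuals into Pareto shells mentioned in this abstract can be illustrated with a short sketch. It assumes every criterion is to be minimised and uses the standard dominance definition. The function names are hypothetical, and this is the textbook peeling procedure rather than the thesis's own implementation.

```python
def dominates(a, b):
    """True if a is no worse than b on every criterion and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_shells(population):
    """Peel off successive non-dominated sets; returns lists of row indices."""
    remaining = list(range(len(population)))
    shells = []
    while remaining:
        # An individual joins the current shell if nothing still remaining dominates it.
        shell = [
            i for i in remaining
            if not any(dominates(population[j], population[i])
                       for j in remaining if j != i)
        ]
        shells.append(shell)
        remaining = [i for i in remaining if i not in shell]
    return shells


# Five individuals described by two criteria (smaller is better, by assumption).
population = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]
shells = pareto_shells(population)
print(shells)
```

The first shell holds the mutually non-dominating individuals, and each later shell holds those dominated only by members of earlier shells. This simple version compares all pairs at every pass, which is adequate for small populations but not tuned for large ones.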

    Randomized Dimension Reduction on Massive Data

    Scalability of statistical estimators is of increasing importance in modern applications and dimension reduction is often used to extract relevant information from data. A variety of popular dimension reduction approaches can be framed as symmetric generalized eigendecomposition problems. In this paper we outline how taking into account the low rank structure assumption implicit in these dimension reduction approaches provides both computational and statistical advantages. We adapt recent randomized low-rank approximation algorithms to provide efficient solutions to three dimension reduction methods: Principal Component Analysis (PCA), Sliced Inverse Regression (SIR), and Localized Sliced Inverse Regression (LSIR). A key observation in this paper is that randomization serves a dual role, improving both computational and statistical performance. This point is highlighted in our experiments on real and simulated data.
    Comment: 31 pages, 6 figures. Key Words: dimension reduction, generalized eigendecomposition, low-rank, supervised, inverse regression, random projections, randomized algorithms, Krylov subspace method
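To make the idea of randomized low-rank approximation concrete, here is a minimal pure-Python sketch of a generic randomized range finder with power iterations. It is not the paper's algorithm: the function names, the Gaussian test matrix, the number of power iterations, and the Gram-Schmidt orthonormalisation are all assumptions chosen for illustration, and a practical implementation would use a numerical library. For PCA, the columns of the data matrix would be centred first.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthonormal_columns(a, tol=1e-10):
    """Modified Gram-Schmidt on the columns of a; near-zero columns are dropped."""
    basis = []
    for col in transpose(a):
        v = list(col)
        for q in basis:
            dot = sum(x * y for x, y in zip(q, v))
            v = [x - dot * y for x, y in zip(v, q)]
        norm = sum(x * x for x in v) ** 0.5
        if norm > tol:
            basis.append([x / norm for x in v])
    return transpose(basis)


def randomized_range(a, k, power_iters=2, seed=0):
    """Orthonormal basis Q approximately spanning the top-k column space of a."""
    rng = random.Random(seed)
    # Random Gaussian test matrix with one row per column of a.
    omega = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(len(a[0]))]
    q = orthonormal_columns(matmul(a, omega))
    for _ in range(power_iters):
        # Each pass through a a^T sharpens the basis towards the leading directions.
        q = orthonormal_columns(matmul(a, matmul(transpose(a), q)))
    return q


def low_rank_approx(a, k, power_iters=2, seed=0):
    """Rank-k approximation Q Q^T a built from the randomized basis."""
    q = randomized_range(a, k, power_iters, seed)
    return matmul(q, matmul(transpose(q), a))


# A 5 x 4 matrix of exact rank 2, built from two outer products.
u1, v1 = [1, 2, 3, 4, 5], [1, 0, 1, 0]
u2, v2 = [1, -1, 1, -1, 1], [0, 1, 0, 1]
a = [[u1[i] * v1[j] + u2[i] * v2[j] for j in range(4)] for i in range(5)]
approx = low_rank_approx(a, k=2)
max_err = max(abs(x - y) for ra, rb in zip(a, approx) for x, y in zip(ra, rb))
```

Because the example matrix has exact rank 2, a rank-2 randomized basis captures its whole column space and `max_err` is at the level of floating-point round-off. On noisy data the approximation is only near-optimal, and a few extra basis columns (oversampling) are commonly added.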

    The Importance of Being Clustered: Uncluttering the Trends of Statistics from 1970 to 2015

    In this paper we retrace the recent history of statistics by analyzing all the papers published in five prestigious statistical journals since 1970, namely: Annals of Statistics, Biometrika, Journal of the American Statistical Association, Journal of the Royal Statistical Society, series B and Statistical Science. The aim is to construct a kind of "taxonomy" of the statistical papers by organizing and clustering them into main themes. In this sense, being identified in a cluster means being important enough to be uncluttered in the vast and interconnected world of statistical research. Since the main statistical research topics are naturally born, evolve, or die over time, we also develop a dynamic clustering strategy, where a group in one time period is allowed to migrate or to merge into different groups in the following one. Results show that statistics is a very dynamic and evolving science, stimulated by the rise of new research questions and types of data.