Rank-based dimension reduction for many-criteria populations
Copyright © 2011 ACM. 13th annual conference on Genetic and Evolutionary Computation (GECCO '11), Dublin, Ireland, 12-16 July 2011. Interpreting individuals described by a set of criteria can be difficult when the number of criteria is large. Such individuals can be ranked, for instance by their average rank across criteria as well as by each criterion individually. We therefore investigate criteria selection methods that aim to preserve the average rank of individuals while using fewer criteria. Our experiments show that these methods are effective, identifying and removing redundancies within the data, and that they are best incorporated into a multi-objective algorithm.
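As a minimal sketch of the average-rank idea, assuming every criterion is to be minimised and that tied values share their mean rank, the following pure-Python code ranks individuals on each criterion and averages those ranks over a chosen subset of criteria. The function names `ranks` and `average_rank` and the toy population are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, smallest value first (assumes minimisation); ties share their mean rank."""
    first: Dict[float, int] = {}
    last: Dict[float, int] = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)  # first sorted position of this value
        last[v] = pos             # last sorted position of this value
    return [(first[v] + last[v]) / 2 for v in values]


def average_rank(population: Sequence[Sequence[float]], criteria: Sequence[int]) -> List[float]:
    """Mean of each individual's per-criterion ranks over the given criterion indices."""
    totals = [0.0] * len(population)
    for c in criteria:
        for i, r in enumerate(ranks([ind[c] for ind in population])):
            totals[i] += r
    return [t / len(criteria) for t in totals]


# Four hypothetical individuals described by three criteria.
population = [[1, 2, 3], [2, 1, 3], [3, 3, 1], [4, 4, 2]]
print(average_rank(population, [0, 1, 2]))
print(average_rank(population, [0, 1]))  # [1.5, 1.5, 3.0, 4.0]
```

On this toy data, dropping criterion 2 leaves an average-rank order that agrees with the full one (individuals 0 and 1 tied, then 2, then 3), which is the kind of preservation the selection methods aim for.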
Ordering and Visualisation of Many-objective Populations
In many everyday tasks it is necessary to compare the performance of the individuals in a population described by two or more criteria, for example comparing products in order to decide which is the best to purchase in terms of price and quality. Other examples are the comparison of universities, countries, the infrastructure in a telecommunications network, and the candidate solutions to a multi- or many-objective problem. In all of these cases, visualising the individuals better allows a decision maker to interpret their relative performance. This thesis explores methods for understanding and visualising multi- and many-criterion populations.
Since people cannot generally comprehend more than three spatial dimensions, the visualisation of many-criterion populations is a non-trivial task. We address this by generating visualisations based on the dominance relation, which defines a structure in the population, and we introduce two novel visualisation methods. The first method explicitly illustrates the dominance relationships between individuals as a graph in which individuals are sorted into Pareto shells, and it is enhanced using many-criterion ranking methods to produce a finer ordering of individuals. We extend the power index, a method for ranking according to a single criterion, into the many-criterion domain by defining individual quality in terms of tournaments. The second visualisation method uses a new dominance-based distance in conjunction with multi-dimensional scaling, and we show that dominance can be used to identify an intuitive low-dimensional mapping of individuals, placing similar individuals close together. We demonstrate that this method can visualise a population comprising a large number of criteria.
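The sorting of individuals into Pareto shells can be sketched with a straightforward non-dominated sorting loop. This assumes all criteria are minimised; `dominates` and `pareto_shells` are illustrative names, and the simple repeated-scan approach is chosen for clarity rather than speed.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b under minimisation: no worse on every criterion, better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_shells(population: Sequence[Sequence[float]]) -> List[List[int]]:
    """Peel off successive mutually non-dominating sets (shells) of individual indices."""
    remaining = set(range(len(population)))
    shells: List[List[int]] = []
    while remaining:
        # An individual is in the current shell if nothing still remaining dominates it.
        shell = sorted(
            i for i in remaining
            if not any(dominates(population[j], population[i]) for j in remaining if j != i)
        )
        shells.append(shell)
        remaining.difference_update(shell)
    return shells


# Hypothetical two-criterion population.
population = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
print(pareto_shells(population))  # [[0, 1, 2], [3], [4]]
```

Individual 3 is dominated only by individual 1, so it forms the second shell on its own; a finer ordering within each shell would come from a separate many-criterion ranking step, which this sketch does not attempt.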
Heatmaps are another common method for presenting high-dimensional data; however, they can be difficult to interpret if dissimilar individuals are placed close to each other. We apply spectral seriation to produce an ordering of individuals and criteria by which the heatmap is arranged, placing similar individuals and criteria close together. A basic version, computing similarity with the Euclidean distance, is demonstrated before rank-based alternatives are investigated. The procedure is extended to seriate both the parameter and objective spaces of a multi-objective population in two stages. Since this process involves a trade-off, favouring the ordering of individuals in one space or the other, we demonstrate methods that enhance the visualisation by using an evolutionary optimiser to tune the orderings.
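Spectral seriation is commonly implemented by building a similarity matrix, forming its graph Laplacian and sorting items by the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue). The NumPy sketch below follows that general recipe; the Gaussian kernel that turns Euclidean distances into similarities, and its median-distance bandwidth, are assumptions rather than the thesis's exact choices. Because an eigenvector's sign is arbitrary, the recovered ordering may come out reversed.

```python
import numpy as np


def spectral_seriation(X: np.ndarray) -> np.ndarray:
    """Return a row ordering of X (at least 2 rows) that places similar rows next to each other."""
    # Pairwise Euclidean distances between rows (individuals).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumed similarity: Gaussian kernel, bandwidth = median non-zero distance.
    positive = dist[dist > 0]
    sigma = np.median(positive) if positive.size else 1.0
    S = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    # Unnormalised graph Laplacian L = D - S.
    L = np.diag(S.sum(axis=1)) - S
    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector.
    _, eigvecs = np.linalg.eigh(L)
    return np.argsort(eigvecs[:, 1], kind="stable")


# Hypothetical heatmap data: 6 individuals x 1 criterion, rows shuffled.
X = np.array([[0.0], [5.0], [1.0], [4.0], [2.0], [3.0]])
print(spectral_seriation(X))
```

To arrange a full heatmap, the same function can be applied to `X.T` to order the criteria, and the matrix then displayed as `X[np.ix_(row_order, col_order)]`.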
One way of revealing the structure of a population is by highlighting which individuals are extreme. To this end, we provide three definitions of the “edge” of a multi-criterion mutually non-dominating population. All three of the definitions are in terms of dominance, and we show that one of them can be extended to cope with many-criterion populations.
Because populations with a large number of criteria are difficult to visualise, a decision maker may find them hard to comprehend. We therefore consider criterion selection methods that reduce the dimensionality while preserving the structure of the population, as quantified by its rank order. We investigate the efficacy of greedy, hill-climber and evolutionary algorithms and cast the dimension reduction as a multi-objective problem.
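The greedy variant of criterion selection could look like the following forward-selection sketch: starting from no criteria, it repeatedly adds the criterion whose inclusion makes the reduced average rank agree best with the full average rank. Measuring agreement with Spearman's rank correlation, assuming minimisation, and stopping at a fixed number `k` of criteria are all assumptions here; the thesis's own objective and stopping rule may differ, and the hill-climber and evolutionary variants are not shown. All function names are hypothetical.

```python
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, smallest first (assumes minimisation); ties share their mean rank."""
    first: Dict[float, int] = {}
    last: Dict[float, int] = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def average_rank(pop: Sequence[Sequence[float]], criteria: Sequence[int]) -> List[float]:
    """Mean per-criterion rank of each individual over the chosen criteria."""
    totals = [0.0] * len(pop)
    for c in criteria:
        for i, r in enumerate(ranks([ind[c] for ind in pop])):
            totals[i] += r
    return [t / len(criteria) for t in totals]


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the two rank vectors (0.0 if either is constant)."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def greedy_select(pop: Sequence[Sequence[float]], k: int) -> List[int]:
    """Forward greedy selection of k criteria that best preserve the full average-rank order."""
    all_criteria = list(range(len(pop[0])))
    target = average_rank(pop, all_criteria)
    chosen: List[int] = []
    while len(chosen) < k:
        candidates = [c for c in all_criteria if c not in chosen]
        # max() keeps the first maximum, so ties go to the lowest criterion index.
        best = max(candidates, key=lambda c: spearman(target, average_rank(pop, chosen + [c])))
        chosen.append(best)
    return chosen


population = [[1, 2, 3], [2, 1, 3], [3, 3, 1], [4, 4, 2]]
print(greedy_select(population, 2))  # [0, 1]
```

On this toy population the greedy search leaves out criterion 2, whose ranking largely conflicts with the overall average rank.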
Randomized Dimension Reduction on Massive Data
Scalability of statistical estimators is of increasing importance in modern applications, and dimension reduction is often used to extract relevant information from data. A variety of popular dimension reduction approaches can be framed as symmetric generalized eigendecomposition problems. In this paper we outline how taking into account the low-rank structure assumption implicit in these dimension reduction approaches provides both computational and statistical advantages. We adapt recent randomized low-rank approximation algorithms to provide efficient solutions to three dimension reduction methods: Principal Component Analysis (PCA), Sliced Inverse Regression (SIR), and Localized Sliced Inverse Regression (LSIR). A key observation in this paper is that randomization serves a dual role, improving both computational and statistical performance. This point is highlighted in our experiments on real and simulated data.
Comment: 31 pages, 6 figures. Key Words: dimension reduction, generalized eigendecomposition, low-rank, supervised, inverse regression, random projections, randomized algorithms, Krylov subspace method
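For the PCA case, a common randomized low-rank scheme is the randomized range finder with a few power (subspace) iterations: project the centred data onto a small random subspace, orthonormalise, and take an exact SVD of the much smaller projected matrix. The NumPy sketch below illustrates that general idea only; it is not the paper's algorithm (which also covers SIR and LSIR and mentions Krylov subspace methods), and the function name and the parameters `oversample` and `n_iter` are our own.

```python
import numpy as np


def randomized_pca(X: np.ndarray, k: int, oversample: int = 10, n_iter: int = 2, seed: int = 0):
    """Approximate the top-k principal axes and their variances via a randomized range finder."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                      # centre columns, as PCA requires
    n, p = Xc.shape
    l = min(k + oversample, n, p)                # size of the random sketch
    Omega = rng.standard_normal((p, l))          # Gaussian test matrix
    Q, _ = np.linalg.qr(Xc @ Omega)              # orthonormal basis for the sketched range
    for _ in range(n_iter):
        # Power iterations, re-orthonormalised each time for numerical stability.
        Z, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Z)
    B = Q.T @ Xc                                 # small l x p matrix
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    components = Vt[:k]                          # approximate principal axes (rows)
    explained_var = s[:k] ** 2 / (n - 1)         # approximate variances along those axes
    return components, explained_var
```

Oversampling and power iterations trade a little extra computation for accuracy, which matters most when the singular values decay slowly; only the small matrix `B` ever goes through a full SVD.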
Simulating multiple faceted variability in single cell RNA sequencing.
The abundance of new computational methods for processing and interpreting transcriptomes at the single-cell level raises the need for in silico platforms for evaluation and validation. Here, we present SymSim, a simulator that explicitly models the processes that give rise to data observed in single cell RNA-Seq experiments. The components of the SymSim pipeline pertain to the three primary sources of variation in single cell RNA-Seq data: noise intrinsic to the process of transcription, extrinsic variation indicative of different cell states (both discrete and continuous), and technical variation due to low sensitivity, measurement noise and bias. We demonstrate how SymSim can be used for benchmarking methods for clustering, differential expression and trajectory inference, and for examining the effects of various parameters on their performance. We also show how SymSim can be used to evaluate the number of cells required to detect a rare population under various scenarios.
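To make the distinction between intrinsic and technical variation concrete, the following NumPy sketch simulates one gene across many cells in two stages. The intrinsic stage assumes the two-state (on/off) promoter model, whose steady-state counts follow a Poisson-Beta distribution when rates are measured in units of the mRNA degradation rate; the technical stage assumes each molecule is captured independently with a fixed probability, mimicking low sensitivity. The function name, parameters and this simplified pipeline are an illustration only, not SymSim's implementation, and extrinsic variation between cell states is omitted.

```python
import numpy as np


def simulate_gene(n_cells: int, k_on: float, k_off: float, s: float,
                  capture_eff: float, seed: int = 0):
    """Simulate true and observed counts of one gene in n_cells cells (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Intrinsic noise (assumed two-state model): a Beta-distributed promoter
    # activity per cell, then Poisson transcription at rate s scaled by it.
    p_on = rng.beta(k_on, k_off, size=n_cells)
    true_counts = rng.poisson(s * p_on)
    # Technical noise (assumed): each molecule is captured with probability capture_eff.
    observed = rng.binomial(true_counts, capture_eff)
    return true_counts, observed


true_counts, observed = simulate_gene(20000, k_on=0.5, k_off=1.0, s=30.0, capture_eff=0.1)
print(true_counts.mean(), observed.mean())
```

Because every observed count is a binomial thinning of the true count, comparing `true_counts` with `observed` shows how capture inefficiency alone can only raise the fraction of zero counts.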
The Importance of Being Clustered: Uncluttering the Trends of Statistics from 1970 to 2015
In this paper we retrace the recent history of statistics by analyzing all the papers published in five prestigious statistical journals since 1970, namely: Annals of Statistics, Biometrika, Journal of the American Statistical Association, Journal of the Royal Statistical Society, Series B, and Statistical Science. The aim is to construct a kind of "taxonomy" of the statistical papers by organizing and clustering them into main themes. In this sense, being identified in a cluster means being important enough to be uncluttered in the vast and interconnected world of statistical research. Since the main statistical research topics naturally emerge, evolve or die over time, we also develop a dynamic clustering strategy, in which a group in one time period is allowed to migrate or to merge into different groups in the following one. Results show that statistics is a very dynamic and evolving science, stimulated by the rise of new research questions and types of data.