Rank-based dimension reduction for many-criteria populations

Author: Everson Richard M.
Fieldsend Jonathan E.
Walker David J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/07/2013

Copyright © 2011 ACM. 13th annual conference on Genetic and Evolutionary Computation (GECCO '11), Dublin, Ireland, 12-16 July 2011. Interpreting individuals described by a set of criteria can be a difficult task when the number of criteria is large. Such individuals can be ranked, for instance in terms of their average rank across criteria as well as by each distinct criterion. We therefore investigate criteria selection methods which aim to preserve the average rank of individuals but with fewer criteria. Our experiments show that these methods perform effectively, identifying and removing redundancies within the data, and that they are best incorporated into a multi-objective algorithm.
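The idea of preserving the average rank with fewer criteria can be illustrated with a minimal sketch. It is not the authors' method: the function names, the tie-breaking rule, the rank-displacement error and the greedy backward-elimination strategy are all assumptions chosen for illustration. Lower criterion values are assumed to be better.

```python
import numpy as np

def average_rank(X):
    """Average rank of each individual (row) across criteria (columns).

    Assumes lower values are better; rank 0 is best on a criterion.
    Ties are broken by row order, which is a simplification.
    """
    ranks = np.argsort(np.argsort(X, axis=0, kind="stable"), axis=0, kind="stable")
    return ranks.mean(axis=1)

def rank_order(scores):
    """Position of each individual when the population is sorted by score."""
    return np.argsort(np.argsort(scores, kind="stable"), kind="stable")

def greedy_criterion_removal(X, n_keep):
    """Hypothetical greedy backward elimination.

    Repeatedly drops the criterion whose removal least disturbs the rank
    order induced by the average rank over the full criterion set.
    """
    target = rank_order(average_rank(X))
    kept = list(range(X.shape[1]))
    while len(kept) > n_keep:
        best_err, best_j = None, None
        for j in kept:
            trial = [k for k in kept if k != j]
            # Total displacement between the reduced and the full rank order.
            err = np.abs(rank_order(average_rank(X[:, trial])) - target).sum()
            if best_err is None or err < best_err:
                best_err, best_j = err, j
        kept.remove(best_j)
    return kept
```

Calling `greedy_criterion_removal(X, n_keep)` on a population matrix `X` (individuals by criteria) returns the indices of the retained criteria. Other error measures, such as a rank correlation coefficient, could be substituted for the displacement sum.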

Open Research Exeter

Ordering and Visualisation of Many-objective Populations

Author: Walker David J.
Publication venue: 'Division of Chemical Information and Computer Sciences'
Publication date: 18/04/2013

In many everyday tasks it is necessary to compare the performance of the individuals in a population described by two or more criteria, for example comparing products in order to decide which is the best to purchase in terms of price and quality. Other examples are the comparison of universities, countries, the infrastructure in a telecommunications network, and the candidate solutions to a multi- or many-objective problem. In all of these cases, visualising the individuals better allows a decision maker to interpret their relative performance. This thesis explores methods for understanding and visualising multi- and many-criterion populations. Since people cannot generally comprehend more than three spatial dimensions, the visualisation of many-criterion populations is a non-trivial task. We address this by generating visualisations based on the dominance relation, which defines a structure in the population, and we introduce two novel visualisation methods. The first method explicitly illustrates the dominance relationships between individuals as a graph in which individuals are sorted into Pareto shells, and is enhanced using many-criterion ranking methods to produce a finer ordering of individuals. We extend the power index, a method for ranking according to a single criterion, into the many-criterion domain by defining individual quality in terms of tournaments. The second visualisation method uses a new dominance-based distance in conjunction with multi-dimensional scaling, and we show that dominance can be used to identify an intuitive low-dimensional mapping of individuals, placing similar individuals close together. We demonstrate that this method can visualise a population comprising a large number of criteria. Heatmaps are another common method for presenting high-dimensional data; however, they suffer from the drawback of being difficult to interpret if dissimilar individuals are placed close to each other.
We apply spectral seriation to produce an ordering of individuals and criteria by which the heatmap is arranged, placing similar individuals and criteria close together. A basic version, computing similarity with the Euclidean distance, is demonstrated, before rank-based alternatives are investigated. The procedure is extended to seriate both the parameter and objective spaces of a multi-objective population in two stages. Since this process describes a trade-off, favouring the ordering of individuals in one space or the other, we demonstrate methods that enhance the visualisation by using an evolutionary optimiser to tune the orderings. One way of revealing the structure of a population is by highlighting which individuals are extreme. To this end, we provide three definitions of the “edge” of a multi-criterion mutually non-dominating population. All three of the definitions are in terms of dominance, and we show that one of them can be extended to cope with many-criterion populations. Because such populations can be difficult to visualise, it is often hard for a decision maker to comprehend a population described by a large number of criteria. We therefore consider criterion selection methods to reduce the dimensionality with a view to preserving the structure of the population as quantified by its rank order. We investigate the efficacy of greedy, hill-climber and evolutionary algorithms and cast the dimension reduction as a multi-objective problem.
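Sorting individuals into Pareto shells, as mentioned in the abstract above, can be sketched as follows. This is a generic, quadratic-time illustration of shell peeling under the usual dominance relation, not the thesis's implementation; minimisation of every criterion and the function names are assumptions.

```python
import numpy as np

def dominates(a, b):
    """True if a dominates b: no worse on every criterion, strictly better on one.

    Assumes all criteria are to be minimised.
    """
    return bool(np.all(a <= b) and np.any(a < b))

def pareto_shells(X):
    """Peel a population (rows of X) into successive Pareto shells.

    Shell 0 holds the non-dominated individuals; shell 1 holds those that
    are non-dominated once shell 0 is removed, and so on.
    """
    remaining = list(range(len(X)))
    shells = []
    while remaining:
        shell = [i for i in remaining
                 if not any(dominates(X[j], X[i]) for j in remaining if j != i)]
        shells.append(shell)
        remaining = [i for i in remaining if i not in shell]
    return shells
```

For example, for the points (1, 4), (2, 2), (4, 1), (3, 3) and (5, 5), the first three are mutually non-dominating and form the first shell, (3, 3) forms the second, and (5, 5) the third.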

Open Research Exeter

Randomized Dimension Reduction on Massive Data

Author: Georgiev Stoyan
Mukherjee Sayan
Publication date: 05/11/2013

Scalability of statistical estimators is of increasing importance in modern applications and dimension reduction is often used to extract relevant information from data. A variety of popular dimension reduction approaches can be framed as symmetric generalized eigendecomposition problems. In this paper we outline how taking into account the low rank structure assumption implicit in these dimension reduction approaches provides both computational and statistical advantages. We adapt recent randomized low-rank approximation algorithms to provide efficient solutions to three dimension reduction methods: Principal Component Analysis (PCA), Sliced Inverse Regression (SIR), and Localized Sliced Inverse Regression (LSIR). A key observation in this paper is that randomization serves a dual role, improving both computational and statistical performance. This point is highlighted in our experiments on real and simulated data. Comment: 31 pages, 6 figures. Key words: dimension reduction, generalized eigendecomposition, low-rank, supervised, inverse regression, random projections, randomized algorithms, Krylov subspace method
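The general flavour of randomized low-rank approximation applied to PCA can be shown with a short sketch. It follows the widely used random-projection range finder with power iterations and is not the algorithm of this paper; the function name and the `oversample`, `n_power_iter` and `seed` parameters are assumptions for illustration.

```python
import numpy as np

def randomized_pca(X, k, oversample=10, n_power_iter=2, seed=0):
    """Approximate the top-k principal components of X (samples by features)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                  # centre each feature
    n, d = Xc.shape
    ell = min(k + oversample, d)             # sketch width
    # Random projection captures the dominant column space of Xc.
    Q, _ = np.linalg.qr(Xc @ rng.standard_normal((d, ell)))
    # Power iterations sharpen the subspace when the spectrum decays slowly.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Z)
    # Exact SVD of the small projected matrix.
    B = Q.T @ Xc
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    components = Vt[:k]                      # principal directions (rows)
    explained_variance = s[:k] ** 2 / (n - 1)
    return components, explained_variance
```

The expensive factorisation is performed only on the small matrix `B`, which is where the computational saving comes from when `k` is much smaller than the data dimensions.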

arXiv.org e-Print Archive

CiteSeerX

Simulating multiple faceted variability in single cell RNA sequencing.

Author: Xu Chenling
Yosef Nir
Zhang Xiuwei
Publication venue: eScholarship, University of California
Publication date: 01/06/2019

The abundance of new computational methods for processing and interpreting transcriptomes at a single cell level raises the need for in silico platforms for evaluation and validation. Here, we present SymSim, a simulator that explicitly models the processes that give rise to data observed in single cell RNA-Seq experiments. The components of the SymSim pipeline pertain to the three primary sources of variation in single cell RNA-Seq data: noise intrinsic to the process of transcription, extrinsic variation indicative of different cell states (both discrete and continuous), and technical variation due to low sensitivity and measurement noise and bias. We demonstrate how SymSim can be used for benchmarking methods for clustering, differential expression and trajectory inference, and for examining the effects of various parameters on their performance. We also show how SymSim can be used to evaluate the number of cells required to detect a rare population under various scenarios.

eScholarship - University of California

The Importance of Being Clustered: Uncluttering the Trends of Statistics from 1970 to 2015

Author: Anderlucci Laura
Montanari Angela
Viroli Cinzia
Publication date: 01/01/2017

In this paper we retrace the recent history of statistics by analyzing all the papers published in five prestigious statistical journals since 1970, namely: Annals of Statistics, Biometrika, Journal of the American Statistical Association, Journal of the Royal Statistical Society, series B and Statistical Science. The aim is to construct a kind of "taxonomy" of the statistical papers by organizing and clustering them into main themes. In this sense, being identified in a cluster means being important enough to be uncluttered in the vast and interconnected world of statistical research. Since the main statistical research topics are naturally born, evolve or die over time, we will also develop a dynamic clustering strategy, where a group in a time period is allowed to migrate or to merge into different groups in the following one. Results show that statistics is a very dynamic and evolving science, stimulated by the rise of new research questions and types of data.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna