
    Mathematical optimization for the visualization of complex datasets

    This PhD dissertation focuses on developing new Mathematical Optimization models and solution approaches that help to gain insight into the complex data structures arising in Information Visualization. The approaches developed in this thesis merge concepts from Multivariate Data Analysis and Mathematical Optimization, bridging theoretical mathematics with real-life problems. The usefulness of Information Visualization lies in its power to improve interpretability and decision making for the unknown phenomena described by raw data, as fully discussed in Chapter 1. In particular, this thesis studies datasets involving frequency distributions and proximity relations, which may even vary over time. Frameworks to visualize such information, making use of Mixed Integer (Non)linear Programming and Difference of Convex tools, are formally proposed. Algorithmic approaches such as Large Neighborhood Search or the Difference of Convex Algorithm enable us to develop matheuristics to handle these models. More specifically, Chapter 2 addresses the problem of visualizing a frequency distribution and an adjacency relation attached to a set of individuals. This information is represented using a rectangular map, i.e., a subdivision of a rectangle into rectangular portions so that their areas reflect the frequencies and the adjacencies between portions represent the adjacencies between the individuals. The visualization problem is formulated as a Mixed Integer Linear Programming model, and a matheuristic that has this model at its heart is proposed. Chapter 3 generalizes the model presented in the previous chapter by developing a visualization framework which handles simultaneously the representation of a frequency distribution and a dissimilarity relation.
This framework consists of a partition of a given rectangle into piecewise rectangular portions so that the areas of the regions represent the frequencies and the distances between them represent the dissimilarities. This visualization problem is formally stated as a Mixed Integer Nonlinear Programming model, which is solved by means of a matheuristic based on Large Neighborhood Search. In contrast to the previous chapters, in which a partition of the visualization region is sought, Chapter 4 addresses the problem of visualizing a set of individuals, to which a dissimilarity measure and a frequency distribution are attached, without necessarily covering the visualization region. In this visualization problem individuals are depicted as convex bodies whose areas are proportional to the given frequencies. The aim is to determine the location of the convex bodies in the visualization region. In order to solve this problem, which generalizes standard Multidimensional Scaling, Difference of Convex tools are used. In Chapter 5, the model stated in the previous chapter is extended to the dynamic case, namely considering that frequencies and dissimilarities are observed along a set of time periods. The solution approach combines Difference of Convex techniques with Nonconvex Quadratic Binary Optimization. All the approaches presented are tested on real datasets. Finally, Chapter 6 closes this thesis with general conclusions and future lines of research.
Premio Extraordinario de Doctorado U
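Chapter 4 is described as generalizing standard Multidimensional Scaling (MDS). As background, here is a minimal classical MDS sketch via eigendecomposition of the double-centered distance matrix — this is the textbook variant, not the thesis's Difference of Convex formulation; the function name and NumPy-based implementation are illustrative assumptions.

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed n points in R^k from an n x n distance matrix D (classical MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                   # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]              # keep the k largest eigenpairs
    scale = np.sqrt(np.maximum(w[idx], 0.0))   # clip tiny negative eigenvalues
    return V[:, idx] * scale                   # n x k coordinate matrix
```

For three collinear points with distances 1, 1 and 2, a one-dimensional embedding recovers those distances exactly; the thesis's models additionally replace points by convex bodies whose areas encode frequencies.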

    VMap: An Interactive Rectangular Space-filling Visualization for Map-like Vertex-centric Graph Exploration

    We present VMap, a map-like rectangular space-filling visualization, to perform vertex-centric graph exploration. Existing visualizations have limited support for jointly optimizing rectangular aspect ratios, vertex-edge intersection, and data encoding accuracy. To tackle this problem, VMap integrates three novel components: (1) a desired-aspect-ratio (DAR) rectangular partitioning algorithm, (2) a two-stage rectangle adjustment algorithm, and (3) a simulated annealing based heuristic optimizer. First, to generate a rectangular space-filling layout of an input graph, we subdivide the 2D embedding of the graph into rectangles while optimizing the rectangles' aspect ratios toward a desired aspect ratio. Second, to route graph edges between rectangles without vertex-edge occlusion, we devise a two-stage algorithm that adjusts a rectangular layout to insert border space between rectangles. Third, to produce and arrange rectangles under multiple visual criteria, we design a simulated annealing based heuristic optimization that adjusts the vertices' 2D embedding to support trade-offs between aspect-ratio quality and the encoding accuracy of the vertices' weights and adjacency. We evaluated the effectiveness of VMap on both synthetic and application datasets. The resulting rectangular layout has better aspect-ratio quality on synthetic data compared with the existing method for the rectangular partitioning of 2D points. On three real-world datasets, VMap achieved better encoding accuracy and faster generation speed compared with existing methods for generating rectangular graph layouts. We further illustrate the usefulness of VMap for vertex-centric graph exploration through three case studies on visualizing social networks, representing academic communities, and displaying geographic information.
    Comment: Submitted to IEEE Visualization Conference (IEEE VIS) 2019 and 202
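VMap's third component is a simulated annealing based heuristic optimizer. As a hedged illustration of the general technique (not VMap's specific energy function or move set, which the abstract does not detail), here is a generic simulated annealing loop:

```python
import math
import random

def anneal(x0, energy, neighbor, t0=1.0, cooling=0.995, steps=5000, seed=0):
    """Generic simulated annealing: accept uphill moves with prob exp(-dE/T)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best, best_e = x, e
    t = t0
    for _ in range(steps):
        y = neighbor(x, rng)              # propose a nearby candidate
        ey = energy(y)
        if ey <= e or rng.random() < math.exp((e - ey) / t):
            x, e = y, ey                  # accept the move
            if e < best_e:
                best, best_e = x, e       # track the best solution seen
        t *= cooling                      # geometric cooling schedule
    return best, best_e
```

In VMap's setting the state would be the vertices' 2D embedding and the energy a weighted sum of aspect-ratio and encoding-accuracy penalties; the toy usage below just minimizes a quadratic.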

    Visualizing proportions and dissimilarities by space-filling maps: a large neighborhood search approach

    In this paper we address the problem of visualizing a set of individuals to which a statistical value, given as a proportion, and a dissimilarity measure are attached. Each individual is represented as a region within the unit square, in such a way that the areas of the regions represent the proportions and the distances between them represent the dissimilarities. To enhance the interpretability of the representation, the regions are required to satisfy two properties. First, they must form a partition of the unit square, namely, the portions into which it is divided must cover its area without overlapping. Second, the portions must be made of a connected union of rectangles which verify the so-called box-connectivity constraints, yielding a visualization map called a Space-filling Box-connected Map (SBM). The construction of an SBM is formally stated as a mathematical optimization problem, which is solved heuristically by using the Large Neighborhood Search technique. The methodology proposed in this paper is applied to three real-world datasets: the first one concerning financial markets in Europe and Asia, the second one about the letters in the English alphabet, and finally the provinces of The Netherlands as a geographical application.
    Funding: Ministerio de Economía y Competitividad, Junta de Andalucía, European Regional Development Fund
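The SBM construction is solved heuristically with Large Neighborhood Search. As a hedged sketch of the general destroy-and-repair scheme (the paper's actual destroy and repair operators act on box-connected partitions and are not reproduced here), a generic LNS skeleton with a toy sorting demo:

```python
import random

def lns(initial, cost, destroy, repair, iters=200, seed=0):
    """Large Neighborhood Search skeleton: destroy part of the incumbent,
    repair it into a complete solution, and keep it when it improves."""
    rng = random.Random(seed)
    best, best_c = initial, cost(initial)
    for _ in range(iters):
        partial = destroy(best, rng)      # remove a chunk of the solution
        cand = repair(partial, rng)       # rebuild a complete solution
        c = cost(cand)
        if c < best_c:
            best, best_c = cand, c
    return best, best_c

# Toy demo: sort a small permutation by destroy-and-repair moves.
def destroy(sol, rng):
    s = sol[:]
    removed = [s.pop(rng.randrange(len(s))) for _ in range(2)]
    return s, removed

def repair(partial, rng):
    s, removed = partial[0][:], partial[1]
    for v in removed:
        s.insert(rng.randrange(len(s) + 1), v)
    return s

misplaced = lambda s: sum(1 for i, v in enumerate(s) if i != v)
best, best_c = lns([4, 3, 2, 1, 0], misplaced, destroy, repair)
```

In the SBM setting, destroy would free the rectangles of a few individuals and repair would re-place them subject to the box-connectivity constraints.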

    On mathematical optimization for clustering categories in contingency tables

    Many applications in data analysis study whether two categorical variables are independent using a function of the entries of their contingency table. Often, the categories of the variables, associated with the rows and columns of the table, are grouped, yielding a less granular representation of the categorical variables. The purpose of this is to attain reasonable sample sizes in the cells of the table and, more importantly, to incorporate expert knowledge on the allowable groupings. However, it is known that the conclusions on independence depend, in general, on the chosen granularity, as in the Simpson paradox. In this paper we propose a methodology to, for a given contingency table and a fixed granularity, find a clustered table with the highest χ² statistic. Repeating this procedure for different values of the granularity, we can either identify an extreme grouping, namely the largest granularity for which the statistical dependence is still detected, or conclude that it does not exist and that the two variables are dependent regardless of the size of the clustered table. For this problem, we propose an assignment mathematical formulation and a set partitioning one. Our approach is flexible enough to include constraints on the desirable structure of the clusters, such as must-link or cannot-link constraints on the categories that can, or cannot, be merged together, and to ensure reasonable sample sizes in the cells of the clustered table, from which trustworthy statistical conclusions can be derived. We illustrate the usefulness of our methodology using a dataset of a medical study.
    Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research has been financed in part by research projects EC H2020 MSCA RISE NeEDS (Grant agreement ID: 822214), FQM-329, P18-FR-2369 and US-1381178 (Junta de Andalucía, with FEDER Funds), PID2019-110886RB-I00 and PID2019-104901RB-I00 (funded by MCIN/AEI/10.13039/501100011033). This support is gratefully acknowledged.
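The core task above is to merge categories so as to maximize the χ² statistic of the clustered table. For tiny tables this can be illustrated by brute force — a hedged sketch, not the paper's assignment or set-partitioning formulations, with function names chosen here for illustration:

```python
from itertools import product
import numpy as np

def chi2_stat(T):
    """Pearson chi-squared statistic of a contingency table."""
    T = np.asarray(T, dtype=float)
    E = np.outer(T.sum(axis=1), T.sum(axis=0)) / T.sum()   # expected counts
    return float(((T - E) ** 2 / E).sum())

def best_row_clustering(T, k):
    """Brute-force search over assignments of rows to k groups; return the
    merged table with the highest chi-squared statistic."""
    T = np.asarray(T, dtype=float)
    best, best_chi = None, -1.0
    for assign in product(range(k), repeat=T.shape[0]):
        if len(set(assign)) < k:          # every group must be non-empty
            continue
        M = np.zeros((k, T.shape[1]))
        for row, g in zip(T, assign):
            M[g] += row                   # merge rows in the same group
        chi = chi2_stat(M)
        if chi > best_chi:
            best, best_chi = M, chi
    return best, best_chi
```

On a 3×2 table whose first and third rows share the same profile, the best 2-group clustering merges those two rows, yielding a perfectly associated 2×2 table; the paper's optimization models make this search tractable for realistic table sizes and add must-link/cannot-link and sample-size constraints.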

    Proceedings of the GIS Research UK 18th Annual Conference GISRUK 2010

    This volume holds the papers from the 18th annual GIS Research UK conference (GISRUK). This year the conference was hosted at University College London (UCL), from Wednesday 14 to Friday 16 April 2010. The conference covered the areas of core geographic information science research as well as application domains such as crime and health, and technological developments in LBS and the geoweb. UCL's research mission as a global university is based around a series of Grand Challenges that affect us all, and these were accommodated in GISRUK 2010. The overarching theme this year was "Global Challenges", with specific focus on the following themes:
    * Crime and Place
    * Environmental Change
    * Intelligent Transport
    * Public Health and Epidemiology
    * Simulation and Modelling
    * London as a global city
    * The geoweb and neo-geography
    * Open GIS and Volunteered Geographic Information
    * Human-Computer Interaction and GIS
    Traditionally, GISRUK has provided a platform for early career researchers as well as those with a significant track record of achievement in the area. As such, the conference provides a welcome blend of innovative thinking and mature reflection. GISRUK is the premier academic GIS conference in the UK and we are keen to maintain its outstanding record of achievement in developing GIS in the UK and beyond.

    Space-optimized texture atlases

    Texture atlas parameterization provides an effective way to map a variety of colour and data attributes from 2D texture domains onto polygonal surface meshes. Most of the existing literature focuses on how to build seamless texture atlases for continuous photometric detail, but little effort has been devoted to devising efficient techniques for encoding self-repeating, discontinuous signals such as building facades. We present a perception-based scheme for generating space-optimized texture atlases specifically designed for intentionally non-bijective parameterizations. Our scheme combines within-chart tiling support with intelligent packing and perceptual measures for assigning texture space according to the amount of information content in the image and its saliency. We demonstrate our optimization scheme in the context of real-time navigation through a gigatexel urban model of a European city. Our scheme achieves significant compression ratios and speed-up factors with visually indistinguishable results. We developed a technique that generates space-optimized texture atlases for the particular encoding of discontinuous signals projected onto geometry. The scene is partitioned using a texture atlas tree that contains a texture atlas for each node. The leaf nodes of the tree contain scene geometry. The level of detail is controlled by traversing the tree and selecting the appropriate texture atlas for a given viewer position and orientation. In a preprocessing step, the textures associated with each texture atlas node of the tree are packed. Textures are resized according to a given user-defined texel size and the size of the geometry they are projected onto. We also use perceptual measures to assign texture space according to image detail. We also explore different techniques for supporting texture wrapping of discontinuous signals, which involved the development of efficient techniques for compressing texture coordinates via the GPU. Our approach supports texture filtering and DXTC compression without noticeable artifacts. We have implemented a prototype version of our space-optimized texture atlases technique and used it to render the 3D city model of Barcelona, achieving interactive rendering frame rates. The whole model was composed of more than three million triangles and contained more than twenty thousand different textures representing the building facades, with an average original resolution of 512 pixels per texture. Our scheme achieves up to 100:1 compression ratios and speed-up factors of 20 with visually indistinguishable results.

    Holistic interpretation of visual data based on topology: semantic segmentation of architectural facades

    The work presented in this dissertation is a step towards effectively incorporating contextual knowledge in the task of semantic segmentation. To date, the use of context has been confined to the genre of the scene, with a few exceptions in the field. Research has been directed towards enhancing appearance descriptors. While this is unarguably important, recent studies show that computer vision has reached a near-human level of performance in relying on these descriptors when objects have stable distinctive surface properties and in proper imaging conditions. When these conditions are not met, humans exploit their knowledge about the intrinsic geometric layout of the scene to make local decisions. Computer vision lags behind when it comes to this asset. For this reason, we aim to bridge the gap by presenting algorithms for semantic segmentation of building facades that make use of topological aspects of the scene. We provide a classification scheme to carry out segmentation and recognition simultaneously. The algorithm solves a single optimization function and yields a semantic interpretation of facades, relying on the modeling power of probabilistic graphs and efficient discrete combinatorial optimization tools. We also tackle the same problem of semantic facade segmentation with a neural network approach, attaining accuracy figures that are on par with the state of the art in a fully automated pipeline. Starting from pixelwise classifications obtained via Convolutional Neural Networks (CNNs), these are then structurally validated through a cascade of Restricted Boltzmann Machines (RBMs) and a Multi-Layer Perceptron (MLP) that regenerates the most likely layout. In the domain of architectural modeling, we address geometric multi-model fitting. We introduce a novel guided sampling algorithm based on Minimum Spanning Trees (MSTs), which surpasses other propagation techniques in terms of robustness to noise. We make a number of additional contributions, such as a measure of model deviation which captures variations among fitted models.
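The guided sampling contribution builds on Minimum Spanning Trees. As background only — this is the standard Kruskal construction, not the dissertation's sampling algorithm — a minimal MST sketch:

```python
def kruskal_mst(n, edges):
    """Minimum spanning tree via Kruskal's algorithm with union-find.
    edges: iterable of (weight, u, v); returns the list of (u, v) tree edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):           # cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                        # skip edges that would close a cycle
            parent[ru] = rv
            tree.append((u, v))
    return tree
```

In a guided-sampling setting, tree edges over data points can then steer which point subsets are drawn when hypothesizing model instances.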

    Spectral methods for multimodal data analysis

    Spectral methods have proven themselves an important and versatile tool in a wide range of problems in the fields of computer graphics, machine learning, pattern recognition, and computer vision, where many important problems boil down to constructing a Laplacian operator and finding a few of its eigenvalues and eigenfunctions. Classical examples include the computation of diffusion distances on manifolds in computer graphics, Laplacian eigenmaps, and spectral clustering in machine learning. In many cases, one has to deal with multiple data spaces simultaneously. For example, clustering multimedia data in machine learning applications involves various modalities or "views" (e.g., text and images), and finding correspondence between shapes in computer graphics problems is an operation performed between two or more modalities. In this thesis, we develop a generalization of spectral methods to deal with multiple data spaces and apply them to problems from the domains of computer graphics, machine learning, and image processing. Our main construction is based on simultaneous diagonalization of Laplacian operators. We present an efficient numerical technique for computing joint approximate eigenvectors of two or more Laplacians in challenging noisy scenarios, which also appears to be the first general non-smooth manifold optimization method. Finally, we use the relation between joint approximate diagonalizability and approximate commutativity of operators to define a structural similarity measure for images. We use this measure to perform structure-preserving color manipulations of a given image.
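The single-modality building block described above — constructing a Laplacian and taking a few eigenvectors — can be sketched as follows. This is a hedged illustration of the standard Laplacian eigenmaps/spectral clustering recipe, not the thesis's joint-diagonalization technique; the function name is an assumption.

```python
import numpy as np

def laplacian_embedding(W, k=1):
    """Spectral embedding from a symmetric weight matrix W: eigenvectors of
    the graph Laplacian L = D - W for the k smallest nonzero eigenvalues."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    w, V = np.linalg.eigh(L)          # eigenvalues in ascending order
    return V[:, 1:k + 1]              # drop the constant (zero) eigenvector
```

For a graph made of two tightly knit clusters joined by a weak edge, the sign of the first embedding coordinate (the Fiedler vector) separates the clusters; the thesis extends this machinery to several Laplacians diagonalized jointly.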