14 research outputs found

Large-Scale Evaluation of Topic Models and Dimensionality Reduction Methods for 2D Text Spatialization

Author: Atzberger Daniel
Cech Tim
Döllner Jürgen
Richter Rico
Scheibel Willy
Schreck Tobias
Trapp Matthias
Publication date: 17/07/2023

Topic models are a class of unsupervised learning algorithms for detecting the semantic structure within a text corpus. Together with a subsequent dimensionality reduction algorithm, topic models can be used for deriving spatializations for text corpora as two-dimensional scatter plots, reflecting semantic similarity between the documents and supporting corpus analysis. Although the choice of the topic model, the dimensionality reduction, and their underlying hyperparameters significantly impact the resulting layout, it is unknown which particular combinations result in high-quality layouts with respect to accuracy and perception metrics. To investigate the effectiveness of topic models and dimensionality reduction methods for the spatialization of corpora as two-dimensional scatter plots (or basis for landscape-type visualizations), we present a large-scale, benchmark-based computational evaluation. Our evaluation consists of (1) a set of corpora, (2) a set of layout algorithms that are combinations of topic models and dimensionality reductions, and (3) quality metrics for quantifying the resulting layout. The corpora are given as document-term matrices, and each document is assigned to a thematic class. The chosen metrics quantify the preservation of local and global properties and the perceptual effectiveness of the two-dimensional scatter plots. By evaluating the benchmark on a computing cluster, we derived a multivariate dataset with over 45 000 individual layouts and corresponding quality metrics. Based on the results, we propose guidelines for the effective design of text spatializations that are based on topic models and dimensionality reductions. As a main result, we show that interpretable topic models are beneficial for capturing the structure of text corpora. We furthermore recommend the use of t-SNE as a subsequent dimensionality reduction. Comment: To be published at IEEE VIS 2023 conference
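The abstract above mentions metrics that quantify how well a two-dimensional layout preserves local properties, but it does not name them. As a rough illustration only, the sketch below computes a simple k-nearest-neighbor overlap between a high-dimensional document representation and a 2D layout. The function names, the toy topic vectors, and the toy layout are all hypothetical and are not taken from the paper.

```python
import math


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, sorted ascending.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def neighborhood_preservation(high_dim, layout_2d, k):
    """Average fraction of each point's k nearest neighbors shared by both spaces."""
    high = knn_indices(high_dim, k)
    low = knn_indices(layout_2d, k)
    overlaps = [len(h & l) / k for h, l in zip(high, low)]
    return sum(overlaps) / len(overlaps)


# Hypothetical toy data: four documents as 3-topic vectors, plus a 2D layout.
doc_topics = [(0.9, 0.1, 0.0), (0.8, 0.2, 0.0), (0.0, 0.1, 0.9), (0.1, 0.0, 0.9)]
layout = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]

score = neighborhood_preservation(doc_topics, layout, k=1)
print(score)
```

A score near 1 means that documents close together in topic space remain close together in the scatter plot; a score near 0 means the layout has scrambled local neighborhoods.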

arXiv.org e-Print Archive

MapSets: Visualizing embedded and clustered graphs

Author: Efrat A.
Hu Y.
Kobourov S. G.
Pupyrev S.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

We describe MapSets, a method for visualizing embedded and clustered graphs. The proposed method relies on a theoretically sound geometric algorithm, which guarantees the contiguity and disjointness of the regions representing the clusters, and also optimizes the convexity of the regions. A fully functional implementation is available online and is used in a comparison with related earlier methods. © Springer-Verlag Berlin Heidelberg 2014

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

MapSets: Visualizing embedded and clustered graphs

Author: Efrat A.
Hu Y.
Kobourov S. G.
Pupyrev S.
Publication venue: 'Journal of Graph Algorithms and Applications'
Publication date: 01/01/2014

In addition to objects and relationships between them, groups or clusters of objects are an essential part of many real-world datasets: party affiliation in political networks, types of living organisms in the tree of life, movie genres in the internet movie database. In recent visualization methods, such group information is conveyed by explicit regions that enclose related elements. However, when in addition to fixed cluster membership, the input elements also have fixed positions in space (e.g., geo-referenced data), it becomes difficult to produce readable visualizations. In such fixed-clustering and fixed-embedding settings, some methods produce fragmented regions, while others produce contiguous (connected) regions that may contain overlaps even if the input clusters are disjoint. Both fragmented regions and unnecessary overlaps have a detrimental effect on the interpretation of the drawing. With this in mind, we propose MapSets: a visualization technique that combines the advantages of both methods, producing maps with non-fragmented and non-overlapping regions. The proposed method relies on a theoretically sound geometric algorithm which guarantees contiguity and disjointness of the regions, and also optimizes the convexity of the regions. A fully functional implementation is available in an online system and is used in a comparison with related earlier methods. © 2015, Brown University. All rights reserved. National Science Foundation, NSF: 111597

Crossref

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

Supporting Methodology Transfer in Visualization Research with Literature-Based Discovery and Visual Text Analytics

Author: Benito Santos Alejandro
Publication date: 01/01/2020

The growing specialization of science is driving the rapid fragmentation of well-established disciplines into interdisciplinary communities. This decomposition can be observed in a type of visualization research known as problem-driven visualization research. In it, teams of experts in visualization and in a specific domain collaborate in a particular area of knowledge, such as the digital humanities, bioinformatics, computer security, or sports science. This thesis proposes a series of methods inspired by recent advances in automatic text analysis and knowledge representation to promote adequate communication and knowledge transfer between these communities. The resulting methods were combined in a visual text analytics interface oriented toward scientific discovery, GlassViz, which was designed with these goals in mind. The tool was first tested in the digital humanities domain to explore a massive corpus of general-purpose visualization articles. In a later study, GlassViz was adapted to support different data sources representative of these communities, showing evidence that the proposed approach is also a valid alternative for addressing the problem of fragmentation in visualization research.

Gestion del Repositorio Documental de la Universidad de Salamanca

NLP Driven Models for Automatically Generating Survey Articles for Scientific Topics.

Author: Jha Rahul Kumar
Publication date: 01/01/2015

This thesis presents new methods that use natural language processing (NLP) driven models for summarizing research in scientific fields. Given a topic query in the form of a text string, we present methods for finding research articles relevant to the topic as well as summarization algorithms that use lexical and discourse information present in the text of these articles to generate coherent and readable extractive summaries of past research on the topic. In addition to summarizing prior research, good survey articles should also forecast future trends. With this motivation, we present work on forecasting future impact of scientific publications using NLP driven features. PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113407/1/rahuljha_1.pd

Deep Blue Documents at the University of Michigan