
    Overlap Removal of Dimensionality Reduction Scatterplot Layouts

    Dimensionality Reduction (DR) scatterplot layouts have become a ubiquitous visualization tool for analyzing multidimensional data in many different areas. Despite their popularity, scatterplots suffer from occlusion, especially when markers convey information. This makes it difficult for users to estimate the sizes of groups of items and, more importantly, can obscure items that are critical to the analysis at hand. Different strategies have been devised to address this issue: either producing overlap-free layouts, which lack the powerful capabilities of contemporary DR techniques in uncovering interesting data patterns, or eliminating overlaps as a post-processing step. Despite the good results of post-processing techniques, the best methods typically expand or distort the scatterplot area, thus sometimes shrinking markers to unreadable sizes and defeating the purpose of removing overlaps. This paper presents a novel post-processing strategy for removing overlaps in DR layouts that faithfully preserves the original layout's characteristics and the markers' sizes. Through an extensive comparative evaluation using multiple metrics, we show that the proposed strategy surpasses the state of the art in overlap removal while being 2 or 3 orders of magnitude faster on large datasets.
    Comment: 11 pages and 9 figures
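    The abstract does not describe the proposed algorithm, so the following is only a naive baseline for illustration, not the authors' method. It is a pairwise push-apart that keeps every marker's radius fixed (the property the paper emphasizes) and separates overlapping circular markers along the line joining their centers until no pair overlaps. It assumes equal-radius circles. The names `remove_overlaps` and `count_overlaps` are hypothetical. Unlike the strategy described above, this sketch makes no attempt to preserve the original layout's neighborhoods and costs O(n²) per pass.

```python
import math
import random


def count_overlaps(points, radius):
    """Count pairs of equal-radius circular markers that overlap."""
    min_dist = 2.0 * radius
    total = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[j][0] - points[i][0]
            dy = points[j][1] - points[i][1]
            if math.hypot(dx, dy) < min_dist:
                total += 1
    return total


def remove_overlaps(points, radius, max_iter=1000, seed=0):
    """Naive overlap removal for circles of equal, fixed radius (hypothetical baseline).

    Each overlapping pair is pushed apart symmetrically along the line joining
    the centers; marker size never changes. Full passes repeat until a pass
    moves nothing or max_iter passes have run.
    """
    rng = random.Random(seed)  # only used to separate coincident markers
    pts = [[float(x), float(y)] for x, y in points]
    min_dist = 2.0 * radius
    for _ in range(max_iter):
        moved = False
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                dx = pts[j][0] - pts[i][0]
                dy = pts[j][1] - pts[i][1]
                d = math.hypot(dx, dy)
                if d >= min_dist:
                    continue
                if d > 0.0:
                    ux, uy = dx / d, dy / d
                else:
                    # Coincident centers: choose a random separation direction.
                    angle = rng.uniform(0.0, 2.0 * math.pi)
                    ux, uy = math.cos(angle), math.sin(angle)
                # Move each marker half the overlap (plus a tiny margin) apart.
                push = (min_dist - d) / 2.0 + 1e-9
                pts[i][0] -= ux * push
                pts[i][1] -= uy * push
                pts[j][0] += ux * push
                pts[j][1] += uy * push
                moved = True
        if not moved:
            break
    return [(x, y) for x, y in pts]
```

    For example, two markers of radius 1.0 whose centers are 0.5 apart end up at least 2.0 apart, and a layout that is already overlap-free is returned unchanged.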

    OpenEssayist: a supply and demand learning analytics tool for drafting academic essays

    This paper focuses on the use of a natural language analytics engine to provide feedback to students as they prepare an essay for summative assessment. OpenEssayist is a real-time learning analytics tool that combines a linguistic analysis engine, which processes the text of the essay, with a web application that uses the engine's output to generate feedback. We outline the system itself and present an analysis of the patterns of activity observed as a cohort of students engaged with the system for their module assignments. We report a significant positive correlation between the number of drafts submitted to the system and the grades awarded for the first assignment. We can also report that this cohort of students gained significantly higher overall grades than the previous cohort, who had no access to OpenEssayist. As a content-free system, OpenEssayist can be used to support students working in any domain that requires the writing of essays.

    Semantic Search and Visual Exploration of Computational Notebooks

    Code search is an important and frequent activity for developers using computational notebooks (e.g., Jupyter). The flexibility of notebooks brings challenges for effective code search, for which classic search interfaces designed for traditional software code may be limited. In this thesis, we propose NBSearch, a novel system that supports semantic code search in notebook collections and interactive visual exploration of the search results. NBSearch leverages advanced machine learning models to enable natural language search queries, and intuitive visualizations to present complicated intra- and inter-notebook relationships in the returned results. We developed NBSearch through an iterative participatory design process with two experts from a large software company. We evaluated the models with a series of experiments and the whole system with a controlled user study. The results indicate the feasibility of our analytical pipeline and the effectiveness of NBSearch in supporting code search in large notebook collections. As one important future direction, we further improved the search quality of NBSearch by incorporating the impact of Markdown cells in notebooks, and evaluated its performance by comparison with the original implementation.
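    The abstract does not specify NBSearch's models, so the snippet below is only a hedged sketch of the general idea behind embedding-based code search: it scores notebook cells against a natural-language query by cosine similarity between vectors and returns the best matches. A bag-of-words term-count vector stands in for a learned embedding, so this toy version matches shared words only, not meaning. The cell records and the names `embed`, `cosine`, `search`, and `CELLS` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Placeholder 'embedding': lowercase alphanumeric token counts.

    A semantic search system would use a learned text/code encoder here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, cells, top_k=3):
    """Rank (notebook_id, cell_index, source) records by similarity to query."""
    q = embed(query)
    scored = [(cosine(q, embed(src)), nb_id, idx) for nb_id, idx, src in cells]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]


# Hypothetical notebook cells: (notebook id, cell index, cell source).
CELLS = [
    ("nb1", 0, "df = pd.read_csv('data.csv')  # load csv file into dataframe"),
    ("nb1", 1, "plt.plot(x, y)  # plot a line chart"),
    ("nb2", 3, "model.fit(X_train, y_train)  # train the classifier"),
]

if __name__ == "__main__":
    for score, nb_id, idx in search("load a csv file", CELLS):
        print(f"{score:.3f}  {nb_id}[{idx}]")
```

    Swapping `embed` for a learned encoder would leave the ranking and result format unchanged, which is why the vector comparison is kept separate from the vectorization step.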

    Improved Approximation Algorithms for Box Contact Representations

    We study the following geometric representation problem: given a graph whose vertices correspond to axis-aligned rectangles with fixed dimensions, arrange the rectangles without overlaps in the plane such that two rectangles touch if the graph contains an edge between them. This problem is called Contact Representation of Word Networks (Crown), since it formalizes the geometric problem behind drawing word clouds in which semantically related words are close to each other. Crown is known to be NP-hard, and approximation algorithms are known for certain graph classes for the optimization version, Max-Crown, in which realizing each desired adjacency yields a certain profit. We present the first O(1)-approximation algorithm for the general case, when the input is a complete weighted graph, and for the bipartite case. Since the subgraph of realized adjacencies is necessarily planar, we also consider several planar graph classes (namely stars, trees, outerplanar graphs, and planar graphs), improving upon the known results. For some graph classes, we also describe improvements in the unweighted case, where each adjacency yields the same profit. Finally, we show that the problem is APX-complete on bipartite graphs of bounded maximum degree. © 2016, Springer Science+Business Media New York
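    The approximation algorithms themselves are not described in the abstract, but the Max-Crown objective it defines can be stated compactly in code. The sketch below only evaluates a given layout; it does not search for one. It checks that no two rectangles' interiors overlap and sums the profit of every graph edge whose rectangles touch. It assumes that "touch" means sharing a boundary segment of positive length (corner contact does not count), and it compares coordinates exactly, so it expects integer or otherwise exact coordinates. `Box`, `touch`, and `crown_profit` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: float  # lower-left corner
    y: float
    w: float  # fixed width
    h: float  # fixed height


def _overlap_len(a0, a1, b0, b1):
    """Length of the intersection of intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def interiors_overlap(a, b):
    """True if the open interiors of the two boxes intersect."""
    return (_overlap_len(a.x, a.x + a.w, b.x, b.x + b.w) > 0
            and _overlap_len(a.y, a.y + a.h, b.y, b.y + b.h) > 0)


def touch(a, b):
    """True if the boxes share a boundary segment of positive length."""
    # Side-by-side: a right edge coincides with the other box's left edge.
    if a.x + a.w == b.x or b.x + b.w == a.x:
        return _overlap_len(a.y, a.y + a.h, b.y, b.y + b.h) > 0
    # Stacked: a top edge coincides with the other box's bottom edge.
    if a.y + a.h == b.y or b.y + b.h == a.y:
        return _overlap_len(a.x, a.x + a.w, b.x, b.x + b.w) > 0
    return False


def crown_profit(boxes, weights):
    """Max-Crown profit of a layout: total weight of edges whose boxes touch.

    `weights` maps vertex pairs (i, j), i < j, to a profit.
    Raises ValueError if any two boxes overlap.
    """
    n = len(boxes)
    for i in range(n):
        for j in range(i + 1, n):
            if interiors_overlap(boxes[i], boxes[j]):
                raise ValueError(f"boxes {i} and {j} overlap")
    return sum(w for (i, j), w in weights.items() if touch(boxes[i], boxes[j]))
```

    An approximation algorithm would choose the box positions; a checker like this one is the kind of routine one might use to verify that its output is overlap-free and to measure the profit it achieves.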

    Beyond the Third Dimension: Visualizing High-Dimensional Data with Projections

    Multidimensional projections are an increasingly popular technique for visualizing large datasets containing observations with tens or even hundreds of dimensions. Compared to other techniques such as parallel coordinates, tables, and scatterplot matrices, they support tasks such as finding groups of related observations and outliers in simpler and more effective ways. The authors discuss the advantages of multidimensional projections, how to compute them, and recent advances that enhance them with visual explanatory techniques, so as to make them efficient and effective instruments that should be part of the toolkit of any scientist interested in high-dimensional data exploration.
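    As one concrete illustration of how a projection can be computed, the sketch below implements principal component analysis (PCA), one of the simplest linear projection techniques, in plain Python. It centers the data, forms the covariance matrix, finds the top two eigenvectors by power iteration with deflation, and projects each observation onto them. Choosing PCA is our assumption for illustration; the article covers a broader family of projections, and in practice one would use a library routine such as scikit-learn's `PCA`. The function name `pca_2d` is hypothetical.

```python
import math
import random


def pca_2d(data, iters=200, seed=0):
    """Project the rows of `data` (n observations x d dimensions) onto their
    top two principal components, giving one 2D scatterplot point per row."""
    n, d = len(data), len(data[0])
    # 1. Center each dimension on its mean.
    mean = [sum(row[k] for row in data) / n for k in range(d)]
    X = [[row[k] - mean[k] for k in range(d)] for row in data]
    # 2. Sample covariance matrix C = X^T X / (n - 1), size d x d.
    C = [[sum(r[a] * r[b] for r in X) / (n - 1) for b in range(d)]
         for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        # 3. Power iteration: repeatedly apply C to converge on its dominant eigenvector.
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in any direction reachable from v
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v gives the eigenvalue (variance along v).
        lam = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # 4. Deflation: subtract lam * v v^T so the next pass finds the next component.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    # 5. Project each centered observation onto the two components.
    return [[sum(r[k] * c[k] for k in range(d)) for c in components] for r in X]
```

    Each output row is a 2D scatterplot position. Nonlinear techniques such as t-SNE or UMAP replace the eigen-decomposition with an iterative optimization but produce the same kind of output.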