9 research outputs found

    Summarizing text to embed qualitative data into visualizations

    Qualitative data can be conveyed with strings of text. Fitting longer text into visualizations requires a) space to place the text inside the visualization; and b) appropriate text to fit the space available. For quantitative visualizations, space is available in area marks, or within visualization layouts where the marks have an implied space (e.g. bar charts). For qualitative visualizations, space is defined in common text layouts such as prose paragraphs. Fitting text within these layouts is a task for emerging NLP capabilities such as summarization. Comment: 6 pages, 8 figures, accepted at NLVIZ 2022: Exploring Research Opportunities for Natural Language, Text, and Data Visualization.

    Game Studies at Scale: Towards Facilitating Exploration of Game Corpora

    Critically playing a game, and performing a close reading of a specific aspect of a game, are valid game analysis techniques. But these types of analyses don’t scale to the plethora of games available, and they also neglect implementation aspects of the games, which are themselves texts that can be analyzed. We argue that appropriate software tools can support research in game studies, allowing individual games to be read at the level of gameplay as well as at the implementation level. Moreover, these tools permit analysis to scale, much as distant reading does for traditional texts, and to be applied to an entire corpus of games. We illustrate these ideas using a corpus of games created using the Graphic Adventure Creator, a program first released in 1985 for a number of computing platforms. As a proof of concept, we have built a system called GrACIAS – the Graphic Adventure Creator Internal Analysis System – that we have used for both static and dynamic analysis of this corpus of games, effectively allowing them to be internally explored and “read.” Furthermore, our system is able to look for game solutions automatically and has solved over 60 game images to date, making the games accessible not only to researchers but also to people who may not be expert players or even able to understand the language the game uses.

    PerCon: A Personal Digital Library for Heterogeneous Data Management and Analysis

    Systems are needed to support access to and analysis of larger and more heterogeneous scientific datasets. Users need support in the location, organization, analysis, and interpretation of data to support their current activities with appropriate services and tools. We developed PerCon, a data management and analysis environment, to support such use. PerCon processes and integrates data gathered via queries to existing data providers to create a personal or a small group digital library of data. Users may then search, browse, visualize, annotate, and organize the data as they proceed with analysis and interpretation. Analysis and interpretation in PerCon take place in a visual workspace in which multiple data visualizations and annotations are placed into spatial arrangements based on the current task. The system watches for patterns in the user’s data selection, exploration, and organization, then through mixed-initiative interaction assists users by suggesting potentially relevant data from unexplored data sources. In order to identify relevant data, PerCon builds up various precomputed feature tables of data objects including their metadata (e.g. similarities, distances) and a user interest model to infer the user interest or specific information need. In particular, probabilistic networks in PerCon model user interactions (i.e. event features) and predict the data type of greatest interest through network training. In turn, the most relevant data objects of interest in the inferred data type are identified through a weighted feature computation and then recommended to the user. PerCon’s data location and analysis capabilities were evaluated in a controlled study with 24 users. The study participants were asked to locate and analyze heterogeneous weather and river data with and without the visual workspace and mixed-initiative interaction, respectively. Results indicate that the visual workspace facilitated information representation and aided in the identification of relationships between datasets. The system’s suggestions encouraged data exploration, leading participants to identify more evidence of correlation among data streams and more potential interactions among weather and river data.

    Categorization and analysis of text in computer mediated communication archives using visualization


    Visualizing Evaluative Language in Relation to Constructing Identity in English Editorials and Op-Eds

    This thesis is concerned with the problem of managing complexity in Systemic Functional Linguistic (SFL) analyses of language, particularly at the discourse semantics level. To deal with this complexity, the thesis develops AppAnn, a suite of linguistic visualization techniques that are specifically designed to provide both synoptic and dynamic views on discourse semantic patterns in text and corpus. Moreover, AppAnn visualizations are illustrated in a series of explorations of identity in a corpus of editorials and op-eds about the bin Laden killing. The findings suggest that the intricacies of discourse semantic meanings can be successfully discerned and more readily understood through linguistic visualization. The findings also have implications for discourse analysis, contributing to our understanding of a number of underdeveloped concepts of SFL, including coupling, commitment, instantiation, affiliation and individuation.

    Using data mining to repurpose German language corpora. An evaluation of data-driven analysis methods for corpus linguistics

    A growing number of studies report interesting insights gained from existing data resources. Among those, there are analyses on textual data, giving reason to consider such methods for linguistics as well. However, the field of corpus linguistics usually works with purposefully collected, representative language samples that aim to answer only a limited set of research questions. This thesis aims to shed some light on the potential of data-driven analysis based on machine learning and predictive modelling for corpus linguistic studies, investigating the possibility of repurposing existing German language corpora for linguistic inquiry by using methodologies developed for data science and computational linguistics. The study focuses on predictive modelling and machine-learning-based data mining and gives a detailed overview and evaluation of currently popular strategies and methods for analysing corpora with computational methods. The thesis first introduces strategies and methods that have already been used on language data, discusses how they can assist corpus linguistic analysis, and refers to available toolkits and software as well as to state-of-the-art research and further references. The introduced methodological toolset is then applied in two differently shaped corpus studies that utilize readily available corpora for German. The first study explores linguistic correlates of holistic text quality ratings on student essays, while the second deals with age-related language features in computer-mediated communication and interprets age prediction models to answer a set of research questions that are based on previous research in the field. While both studies give linguistic insights that integrate into the current understanding of the investigated phenomena in the German language, they also systematically test the methodological toolset introduced beforehand, allowing a detailed discussion of added values and remaining challenges of machine-learning-based data mining methods in corpus linguistics at the end of the thesis.