
    Graph-level operations: A high-level interface for graph visualization technique specification

    Increasingly, the world is described as graphs (connections between people, places, and ideas), since graphs provide a richer model than understanding each item in isolation. To help analysts understand these graphs, researchers have developed and studied a large number of graph visualization techniques. This variety of techniques addresses a breadth of graph analysis tasks, but it introduces a new issue: complexity, both in comparing techniques objectively and in the engineering effort of implementing so many of them. In this thesis, I present graph-level operations models (GLO models) as an elegant solution to these challenges. A GLO model consists of a model of visual elements and a set of functions (GLOs) that manipulate those elements. I introduce two GLO models: GLOv1, derived from six hand-picked graph visualization techniques, and GLOv2, derived from twenty-nine techniques identified in a review of 430 graph visualization publications. I show how to use GLOs to define graph visualization techniques, including a model's original seed techniques as well as novel ones. I demonstrate the analytical potential of the GLO model by clustering the twenty-nine seed techniques using two different GLO-based schemes. Finally, I demonstrate the model's practical engineering potential through an open-source JavaScript implementation (GLO.js) and two applications built on it, GLO-STIX and GLO-CLI, for exploring a graph and discovering novel techniques using GLOs.
    Ph.D. thesis
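To make the idea of a GLO model concrete, the following Python sketch pairs a tiny model of visual elements (positioned nodes plus an edge style) with a few functions that manipulate it, and defines a technique as an ordered list of those functions. The `GraphView` model and the operation names (`evenly_distribute`, `align`, `position_by_attribute`, `display_edges`) are hypothetical illustrations, not the actual GLOv1/GLOv2 operation sets or the GLO.js API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    """Visual element for one graph node: id, data attributes, normalized position."""
    id: str
    attrs: Dict[str, float] = field(default_factory=dict)
    x: float = 0.5
    y: float = 0.5


@dataclass
class GraphView:
    """Hypothetical model of visual elements: positioned nodes and an edge style."""
    nodes: List[Node]
    edges: List[Tuple[str, str]]
    edge_style: str = "straight"


# In this sketch, a GLO is any function that maps a view to a modified view.
GLO = Callable[[GraphView], GraphView]


def evenly_distribute(axis: str) -> GLO:
    """Spread nodes at equal intervals over [0, 1] along `axis` ("x" or "y")."""
    def op(view: GraphView) -> GraphView:
        n = len(view.nodes)
        for i, node in enumerate(view.nodes):
            setattr(node, axis, i / (n - 1) if n > 1 else 0.5)
        return view
    return op


def align(axis: str, value: float = 0.5) -> GLO:
    """Place every node at the same coordinate along `axis`."""
    def op(view: GraphView) -> GraphView:
        for node in view.nodes:
            setattr(node, axis, value)
        return view
    return op


def position_by_attribute(axis: str, attr: str) -> GLO:
    """Map a numeric node attribute linearly onto [0, 1] along `axis`."""
    def op(view: GraphView) -> GraphView:
        values = [node.attrs[attr] for node in view.nodes]
        lo, hi = min(values), max(values)
        for node in view.nodes:
            t = (node.attrs[attr] - lo) / (hi - lo) if hi > lo else 0.5
            setattr(node, axis, t)
        return view
    return op


def display_edges(style: str) -> GLO:
    """Change how edges are drawn, e.g. "straight" or "arc"."""
    def op(view: GraphView) -> GraphView:
        view.edge_style = style
        return view
    return op


def apply_glos(view: GraphView, glos: List[GLO]) -> GraphView:
    """Apply a technique, i.e. an ordered sequence of GLOs, to a view."""
    for glo in glos:
        view = glo(view)
    return view


# Hypothetical technique definitions composed from the operations above.
ARC_DIAGRAM = [evenly_distribute("x"), align("y", 0.5), display_edges("arc")]
DEGREE_ARCS = [evenly_distribute("x"), position_by_attribute("y", "degree"),
               display_edges("arc")]
```

Because a technique here is just a list of operations, `DEGREE_ARCS` differs from `ARC_DIAGRAM` by one swapped operation, which illustrates how GLO-style definitions make techniques easy to compare and recombine.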

    Iterative Visual Analytics and its Applications in Bioinformatics

    Indiana University-Purdue University Indianapolis (IUPUI). You, Qian. Ph.D., Purdue University, December 2010. Iterative Visual Analytics and its Applications in Bioinformatics. Major Professors: Shiaofen Fang and Luo Si.
    Visual Analytics is a new and developing field that addresses the challenges of knowledge discovery from the massive amounts of available data. It supports human reasoning with interactive visual interfaces for exploratory data analysis tasks, where automatic data mining methods fall short because predefined objective functions are lacking. Analyzing large volumes of data for biological discovery raises similar challenges: the domain knowledge of biologists and bioinformaticians is critical in hypothesis-driven discovery tasks, yet visual analytics frameworks for bioinformatics applications are still in their infancy. In this dissertation, we propose a general visual analytics framework, Iterative Visual Analytics (IVA), to address some of these challenges. The framework consists of three progressive steps for exploring data sets of increasing complexity: (1) Terrain Surface Multi-dimensional Data Visualization, a new multi-dimensional technique that highlights global patterns in the profile of a large-scale network and draws users' attention to characteristic regions where otherwise hidden knowledge can be discovered; (2) Correlative Multi-level Terrain Surface Visualization, a new visual platform that provides an overview of, and boosts the major signals in, the numeric correlations among nodes in interconnected networks of different contexts, enabling users to gain critical insights and perform analytical tasks across multiple correlated networks; and (3) the Iterative Visual Refinement Model, an innovative process that treats users' perceptions as the objective functions and guides users toward the optimal hypothesis by improving the desired visual patterns.
It formalizes how interactive exploration converges to optimal solutions. We also showcase our approach with bio-molecular data sets and demonstrate its effectiveness in several biomarker discovery applications.
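The Iterative Visual Refinement Model can be read as an optimization loop whose objective is supplied by a person rather than a formula. The Python sketch below assumes that a visualization is controlled by numeric parameters and that a callback `rate` stands in for the user's judgment of each rendered candidate (higher is better). The greedy neighbour search and the names `propose_candidates` and `iterative_refinement` are hypothetical choices for illustration, not the dissertation's formal model.

```python
from typing import Callable, Dict, List

Params = Dict[str, float]


def propose_candidates(params: Params, step: float) -> List[Params]:
    """Generate neighbouring settings by nudging one parameter up or down."""
    candidates = []
    for key in params:
        for delta in (step, -step):
            candidate = dict(params)
            # Rounding keeps repeated float steps from drifting (0.1 + 0.2 etc.).
            candidate[key] = round(candidate[key] + delta, 10)
            candidates.append(candidate)
    return candidates


def iterative_refinement(params: Params, rate: Callable[[Params], float],
                         step: float = 0.1, max_rounds: int = 50) -> Params:
    """Greedy refinement: `rate` stands in for the user's perception of the
    visual pattern produced by a parameter setting."""
    best, best_score = dict(params), rate(params)
    for _ in range(max_rounds):
        # In an interactive system, each candidate would be rendered for the user.
        scored = [(rate(c), c) for c in propose_candidates(best, step)]
        if not scored:
            break  # nothing to adjust
        score, candidate = max(scored, key=lambda sc: sc[0])
        if score <= best_score:
            break  # no candidate improves the pattern: treat as converged
        best, best_score = candidate, score
    return best
```

In a real system `rate` would be replaced by user interaction; a simulated rater that prefers parameter values near a fixed target is enough to exercise the loop.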

    Identity Resolution in Email Collections

    Access to historically significant email collections poses challenges that arise less often in personal collections. Most notably, people exploring a large collection of emails that they neither sent nor received may not be familiar with the discussions it contains. They would need not only to understand the topical content of those discussions, but also to learn who the people sending, receiving, or mentioned in them were. This dissertation tackles the problem of resolving personal identity in large email collections. In such collections, a common name (e.g., John) might easily refer to any one of several hundred people; when one of these people is mentioned in an email, the question arises: "Who is that John?" To resolve the identity of people in an email collection, two problems need to be solved: (1) modeling the identity of the participants in that collection, and (2) resolving name mentions that appear in the bodies of messages to these identities. For the first problem, a simple computational model of identity is presented, built by extracting unambiguous references to people (e.g., full names from headers, or nicknames from free-text signatures) from the whole collection. For the second problem, a generative probabilistic approach is presented that leverages this model of identity to resolve mentions. The approach is motivated by intuitions about how people refer to others in email; it expands the context surrounding a mention in four directions: the message in which the mention was observed, the thread that includes that message, topically related messages, and messages sent or received by the original communicating parties. It ranks the potential referents of a mention using less ambiguous references (e.g., email addresses or full names) observed in some context of that mention.
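The two sub-problems can be illustrated with a small Python sketch, assuming that each identity is keyed by an email address, that its name variants come from header names and signature nicknames, and that each context is summarized by the email addresses observed in it. Candidates are scored as P(e | contexts) * P(mention | e), where the context prior is a weighted mixture over the message, thread, topical, and social contexts. The function names, the uniform name model, and the mixture weights are assumptions for illustration; the dissertation's generative model may be structured differently.

```python
from collections import Counter
from email.utils import parseaddr
from typing import Dict, List, Set, Tuple


def build_identity_model(headers: List[str],
                         nicknames: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each email address to the lowercase name variants observed for it.
    Full names come from headers like 'John Smith <jsmith@example.com>';
    nicknames (e.g. taken from signatures) are supplied per address."""
    identities: Dict[str, Set[str]] = {}
    for header in headers:
        name, addr = parseaddr(header)
        if not addr:
            continue
        names = identities.setdefault(addr.lower(), set())
        if name:
            full = name.lower()
            names.add(full)
            names.update(full.split())  # first and last names as variants
    for addr, nicks in nicknames.items():
        identities.setdefault(addr.lower(), set()).update(n.lower() for n in nicks)
    return identities


def resolve_mention(mention: str, identities: Dict[str, Set[str]],
                    contexts: Dict[str, List[str]],
                    weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank candidate identities for a name mention.
    P(e | contexts): weighted mixture of per-context address distributions.
    P(mention | e): uniform over the names known for e (0 if not a known name)."""
    m = mention.lower()
    prior: Counter = Counter()
    total_weight = sum(weights.values()) or 1.0
    for ctx_name, addresses in contexts.items():
        counts = Counter(a.lower() for a in addresses if a.lower() in identities)
        n = sum(counts.values())
        if n == 0:
            continue  # this context holds no unambiguous evidence
        lam = weights.get(ctx_name, 0.0) / total_weight
        for addr, c in counts.items():
            prior[addr] += lam * c / n
    scores = [(addr, prior[addr] / len(names))
              for addr, names in identities.items() if m in names]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Combining several contexts in one mixture means a candidate can be supported by evidence from any of them, which is one simple way to realize the idea that multiple context types together beat any single one.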
To jointly resolve all mentions in the collection, a parallel implementation is presented using the MapReduce distributed-programming framework. The implementation decomposes the resolution process into subcomponents that fit the MapReduce task model well. At the heart of that implementation is a parallel algorithm for efficiently computing pairwise document similarity in large collections, proposed as a general solution that supports scalable context expansion for all mentions as well as other applications. The resolution approach compares favorably with previously reported techniques on the small test collections used to evaluate the task in the literature (sets of mention-queries that were manually resolved beforehand). However, the mention-queries in those collections are relatively few, and all of them refer to people for whom substantial evidence would be expected in the collection, thus omitting the "long tail" of the identity distribution, for which less evidence is available. This motivated the development of a new test collection, which is now the largest and best-balanced test collection available for the task. Building it involved a user study that also provided insight into how difficult and time-consuming the task is for humans, and how reliably they perform it. The study revealed that at least 80% of the 584 annotated mentions were resolvable to people who had sent or received email within the same collection. The new test collection was used to experimentally evaluate the resolution system. The results highlight the importance of the social context (messages sent or received by the original communicating parties) when resolving mentions in email.
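Pairwise similarity over a large collection is often decomposed into two MapReduce jobs: one builds an inverted index, and a second turns every postings list into partial dot products for the document pairs that share that term, which a reducer then sums. The Python sketch below simulates both jobs in a single process. The precomputed term weights (for example tf-idf), the unnormalized dot product, and all function names are assumptions; the dissertation's algorithm may differ in detail.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, Iterable, Tuple


def map_reduce(records: Iterable[Tuple], mapper: Callable, reducer: Callable) -> Dict:
    """Single-process stand-in for a MapReduce job: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return {k: reducer(k, vs) for k, vs in groups.items()}


# Job 1: inverted index, term -> postings list of (doc_id, weight).
def index_mapper(doc_id, term_weights):
    for term, w in term_weights.items():
        yield term, (doc_id, w)


def index_reducer(term, postings):
    return sorted(postings)  # sort by doc_id so pairs come out in a fixed order


# Job 2: every postings list contributes one partial product per document pair.
def pairs_mapper(term, postings):
    for (d1, w1), (d2, w2) in combinations(postings, 2):
        yield (d1, d2), w1 * w2


def sum_reducer(pair, partials):
    return sum(partials)


def pairwise_similarity(docs: Dict[str, Dict[str, float]]) -> Dict[Tuple[str, str], float]:
    """Dot-product similarity for every document pair that shares a term."""
    index = map_reduce(docs.items(), index_mapper, index_reducer)
    return map_reduce(index.items(), pairs_mapper, sum_reducer)
```

Only pairs that share at least one term are ever emitted, which keeps the computation tractable; very frequent terms still produce quadratically many pairs and are often dropped in practice.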
Moreover, the results show that combining evidence from multiple types of context yields better resolution than any individual context achieves. The top-ranked (one-best) candidate is correct 74% of the time on the full set of mention-queries, and 51% of the time on the mention-queries that the annotators labeled "hard". Experiments with an iterative reformulation of the resolution algorithm resulted in modest gains, only for the second iteration of the social context expansion.