3,497 research outputs found

    Concept Tree Based Clustering Visualization with Shaded Similarity Matrices

    One of the problems with existing clustering methods is that the interpretation of clusters may be difficult. Two different approaches have been used to solve this problem: conceptual clustering in machine learning and clustering visualization in statistics and graphics. The purpose of this paper is to investigate the benefits of combining clustering visualization and conceptual clustering to obtain better cluster interpretations. In our research we have combined concept trees for conceptual clustering with shaded similarity matrices for visualization. Experimentation shows that the two interpretation approaches can complement each other to help us understand data better
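    To make the idea of a shaded similarity matrix concrete, here is a minimal sketch. It is not the paper's implementation: the function name, the one-dimensional data, the distance-based similarity and the character "shades" are all assumptions chosen for illustration. Rows and columns are ordered by cluster label so that similar items form dark blocks along the diagonal.

```python
def shaded_similarity_matrix(points, labels, shades=" .:*#"):
    """Return text rows of a similarity matrix ordered by cluster label.

    Assumption: similarity is 1 - |x_i - x_j| / range, a stand-in for
    whatever similarity measure a real analysis would use.
    """
    # Put items with the same cluster label next to each other.
    order = sorted(range(len(points)), key=lambda i: labels[i])
    span = (max(points) - min(points)) or 1.0
    rows = []
    for i in order:
        row = ""
        for j in order:
            sim = 1.0 - abs(points[i] - points[j]) / span
            # Map similarity in [0, 1] to a shade; darker means more similar.
            row += shades[min(int(sim * len(shades)), len(shades) - 1)]
        rows.append(row)
    return rows


# Hypothetical data: two tight groups, interleaved in input order.
matrix = shaded_similarity_matrix([0.0, 10.0, 0.5, 9.5], [0, 1, 0, 1])
print("\n".join(matrix))
```

    With this toy input the two clusters show up as two dark blocks on the diagonal, which is the visual cue such matrices rely on.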

    The State-of-the-Art of Set Visualization

    Sets comprise a generic data model that has been used in a variety of data analysis problems. Such problems involve analysing and visualizing set relations between multiple sets defined over the same collection of elements. However, visualizing sets is a non-trivial problem due to the large number of possible relations between them. We provide a systematic overview of state-of-the-art techniques for visualizing different kinds of set relations. We classify these techniques into six main categories according to the visual representations they use and the tasks they support. We compare the categories to provide guidance for choosing an appropriate technique for a given problem. Finally, we identify challenges in this area that need further research and propose possible directions to address these challenges. Further resources on set visualization are available at http://www.setviz.net

    Revealing Microbial Responses to Environmental Dynamics: Developing Methods for Analysis and Visualization of Complex Sequence Datasets.

    The greatest barrier to understanding how life interacts with its environment is the complexity in which biology operates. In this work, I present experimental designs, analysis methods, and visualization techniques to overcome the challenges of deciphering complex biological datasets. First, I examine an iron limitation transcriptome of Synechocystis sp. PCC 6803 using a new methodology. Until now, iron limitation in experiments of Synechocystis sp. PCC 6803 gene expression has been achieved through media chelation. Notably, chelation also reduces the bioavailability of other metals, whereas naturally occurring low iron settings likely result from a lack of iron influx and not from chelation. The overall metabolic trends of previous studies are well-characterized, but within those trends there is significant variability in single gene expression responses. I compare previous transcriptomics analyses with our protocol that limits the addition of bioavailable iron to growth media to identify consistent gene expression signals resulting from iron limitation. Second, I describe a novel method of improving the reliability of centroid-linkage clustering results. The size and complexity of modern sequencing datasets often prohibit constructing distance matrices, which prevents the use of many common clustering algorithms. Centroid-linkage circumvents the need for a distance matrix, but has the adverse effect of producing input-order dependent results. In this chapter, I describe a method of cluster edge counting across iterated centroid-linkage results and reconstructing aggregate clusters from a ranked edge list without a distance matrix or input-order dependence. Finally, I introduce dendritic heat maps, a new figure type that visualizes heat map responses through expanding and contracting sequence clustering specificities. Heat maps are useful for comparing data across a range of possible states. However, data binning is sensitive to clustering cutoffs, which are often arbitrarily introduced by researchers and can substantially change the heat map response of any single data point. With an understanding of how the architectural elements of dendrograms and heat maps affect data visualization, I have integrated their salient features to create a figure type aimed at viewing multiple levels of clustering cutoffs, allowing researchers to better understand the effects of environment on metabolism or phylogenetic lineages.
    Dissertation/Thesis. Chapter 2 Excel file of transcriptome responses; Chapter 2 Perl scripts; Chapter 3 Cluster Aggregation Perl script; Chapter 4 example of the top-down clustering method used to construct dendritic heat maps; Chapter 4 Perl scripts and dendritic heat map images. Doctoral Dissertation Geological Sciences 201
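    The edge-counting idea in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the dissertation's Perl implementation: the greedy one-pass clustering, the numeric stand-in for sequences, and the names `greedy_centroid_clusters`, `aggregate_clusters`, `cutoff`, `runs` and `support` are all hypothetical. Each run clusters the items in a different random order by comparing only against centroids, every co-clustered pair counts as one edge, and pairs seen in at least a given fraction of runs are merged.

```python
import random
from collections import Counter
from itertools import combinations


def greedy_centroid_clusters(points, order, cutoff):
    """One order-dependent pass: join the nearest centroid within cutoff, else start a cluster."""
    centroids, members = [], []
    for i in order:
        best = None
        for c, centre in enumerate(centroids):
            d = abs(points[i] - centre)
            if d <= cutoff and (best is None or d < abs(points[i] - centroids[best])):
                best = c
        if best is None:
            centroids.append(points[i])
            members.append([i])
        else:
            members[best].append(i)
            # Recompute the centroid as the mean of the cluster's members.
            centroids[best] = sum(points[j] for j in members[best]) / len(members[best])
    return members


def aggregate_clusters(points, cutoff, runs=50, support=0.5, seed=0):
    """Count co-cluster edges over shuffled runs, then merge well-supported edges."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    edge_counts = Counter()
    for _ in range(runs):
        rng.shuffle(order)
        for cluster in greedy_centroid_clusters(points, order, cutoff):
            edge_counts.update(combinations(sorted(cluster), 2))

    parent = list(range(len(points)))  # union-find over item indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Walk the ranked edge list from most to least frequent.
    for (a, b), count in edge_counts.most_common():
        if count / runs < support:
            break
        parent[find(a)] = find(b)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical data: three well-separated groups of values.
print(aggregate_clusters([0.0, 0.1, 0.2, 5.0, 5.1, 9.0], cutoff=1.0))
```

    No pairwise distance matrix is built here. The edge counter still grows with the number of co-clustered pairs, so a large-scale version would need a more compact edge representation.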

    Mining Time-aware Actor-level Evolution Similarity for Link Prediction in Dynamic Network

    Topological evolution over time in a dynamic network triggers both the addition and deletion of actors and the links among them. A dynamic network can be represented as a time series of network snapshots where each snapshot represents the state of the network over an interval of time (for example, a minute, hour or day). The duration of each snapshot denotes the temporal scale/sliding window of the dynamic network and all the links within the duration of the window are aggregated together irrespective of their order in time. The inherent trade-off in selecting the timescale in analysing dynamic networks is that choosing a short temporal window may lead to chaotic changes in network topology and measures (for example, the actors’ centrality measures and the average path length); however, choosing a long window may compromise the study and the investigation of network dynamics. Therefore, to facilitate the analysis and understand different patterns of actor-oriented evolutionary aspects, it is necessary to define an optimal window length (temporal duration) with which to sample a dynamic network. In addition to determining the optimal temporal duration, another key task for understanding the dynamics of evolving networks is being able to predict the likelihood of future links among pairs of actors given the existing states of link structure at present time. This task is known as the link prediction problem in network science. Instead of considering a static state of a network where the associated topology does not change, dynamic link prediction attempts to predict emerging links by considering different types of historical/temporal information, for example the different types of temporal evolutions experienced by the actors in a dynamic network due to the topological evolution over time, known as actor dynamicities. Although there has been some success in developing various methodologies and metrics for the purpose of dynamic link prediction, mining actor-oriented evolutions to address this problem has received little attention from the research community. In addition to this, the existing methodologies were developed without considering the sampling window size of the dynamic network, even though the sampling duration has a large impact on mining the network dynamics of an evolutionary network. Therefore, although the principal focus of this thesis is link prediction in dynamic networks, the optimal sampling window determination was also considered.
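    The snapshot representation described in this abstract can be illustrated with a short sketch. It assumes a hypothetical input format of `(timestamp, u, v)` tuples and a fixed window length; the function name `snapshots` and the toy events are not from the thesis. Links that fall in the same window are aggregated into a set, so their order within the window is discarded.

```python
from collections import defaultdict


def snapshots(events, window):
    """Group timestamped links into snapshots of fixed duration.

    events: iterable of (timestamp, u, v); window: snapshot length in the
    same time unit as the timestamps. Links are treated as undirected.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buckets = defaultdict(set)
    for t, u, v in events:
        index = int(t // window)  # which snapshot this link falls into
        buckets[index].add(frozenset((u, v)))
    return dict(buckets)


# Hypothetical event stream, timestamps in minutes.
events = [(0, "a", "b"), (5, "b", "c"), (12, "a", "c"), (14, "c", "a"), (31, "a", "b")]
short = snapshots(events, 10)   # several sparse snapshots
long_ = snapshots(events, 60)   # one snapshot that hides the ordering
print(len(short), len(long_))
```

    Comparing the two calls shows the trade-off in miniature: the short window yields several small snapshots that change a lot from one to the next, while the long window merges everything into a single snapshot.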

    BeSocratic: An Intelligent Tutoring System for the Recognition, Evaluation, and Analysis of Free-form Student Input

    This dissertation describes a novel intelligent tutoring system, BeSocratic, which aims to help fill the gap between simple multiple-choice systems and free-response systems. BeSocratic focuses on targeting questions that are free-form in nature yet defined to the point which allows for automatic evaluation and analysis. The system includes a set of modules which provide instructors with tools to assess student performance. Beyond text boxes and multiple-choice questions, BeSocratic contains several modules that recognize, evaluate, provide feedback, and analyze student-drawn structures, including Euclidean graphs, chemistry molecules, computer science graphs, and simple drawings. Our system uses a visual, rule-based authoring system which enables the creation of activities for use within science, technology, engineering, and mathematics classrooms. BeSocratic records each action that students make within the system. Using a set of post-analysis tools, teachers have the ability to examine both individual and group performances. We accomplish this using hidden Markov model-based clustering techniques and visualizations. These visualizations can help teachers quickly identify common strategies and errors for large groups of students. Furthermore, analysis results can be used directly to improve activities through advanced detection of student errors and refined feedback. BeSocratic activities have been created and tested at several universities. We report specific results from several activities, and discuss how BeSocratic's analysis tools are being used with data from other systems. We specifically detail two chemistry activities and one computer science activity: (1) an activity focused on improving mechanism use, (2) an activity which assesses student understanding of Gibbs energy, and (3) an activity which teaches students the fundamentals of splay trees. In addition to analyzing data collected from students within BeSocratic, we share our visualizations and results from analyzing data gathered with another educational system, PhET.

    Combining Extended Table Lens and Treemap Techniques for Visualizing Tabular Data
