    Mapping the Space of Genomic Signatures

    We propose a computational method to measure and visualize interrelationships among any number of DNA sequences, allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences used here, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classification and, to a certain extent, evolutionary history. The image distance employed, the Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k = 9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence homology and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. Comment: 14 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1307.375
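The CGR image and the DSSIM comparison described above can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: the corner assignment (A bottom-left, C top-left, G top-right, T bottom-right), scaling each image by its maximum count, computing SSIM over the whole image as a single window (standard SSIM averages over local windows), and taking DSSIM as 1 - SSIM are all choices made here for illustration. The function names `fcgr` and `dssim` are hypothetical.

```python
# Sketch of a k-mer frequency CGR image and a global (single-window) DSSIM.
# Assumptions (not from the abstract): corner layout A=(0,0), C=(0,1),
# G=(1,1), T=(1,0); images scaled to [0, 1] by their maximum; SSIM computed
# over the whole image as one window; DSSIM taken as 1 - SSIM.

CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k):
    """Return a 2^k x 2^k grid counting each k-mer at its CGR cell."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers containing N or any other non-ACGT symbol
        col = row = 0
        # The last base of the k-mer selects the top-level quadrant, so it
        # contributes the most significant bit; the first base the least.
        for i, base in enumerate(kmer):
            cx, cy = CORNERS[base]
            col += cx << i
            row += cy << i
        grid[row][col] += 1
    return grid


def dssim(img_a, img_b, c1=1e-4, c2=9e-4):
    """1 - SSIM over the whole image, after scaling each image to [0, 1]."""
    a = [v for r in img_a for v in r]
    b = [v for r in img_b for v in r]
    ma, mb = max(a) or 1, max(b) or 1  # avoid dividing by zero for empty images
    a = [v / ma for v in a]
    b = [v / mb for v in b]
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    ssim = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )
    return 1 - ssim
```

For example, `fcgr("ACGT", 1)` places one count in each of the four quadrants, and `dssim(img, img)` is 0 for identical images. With k = 9, as in the paper, each image would be 512 x 512.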

    Molecular Distance Maps: An alignment-free computational tool for analyzing and visualizing DNA sequences' interrelationships

    In an attempt to identify and classify species based on genetic evidence, we propose a novel combination of methods to quantify and visualize the interrelationships among thousands of species. We use Chaos Game Representation (CGR) of DNA sequences to compute genomic signatures, which we then compare by computing pairwise distances. In the last step, the original DNA sequences are embedded in a high-dimensional space using Multi-Dimensional Scaling (MDS) and then projected onto a three-dimensional Euclidean space. To start with, we apply this method to a mitochondrial DNA dataset from NCBI containing over 3,000 species. The analysis shows that the oligomer composition of full mtDNA sequences can be a source of taxonomic information, suggesting that this method could be used for unclassified species and taxonomic controversies. Next, we test the hypothesis that a CGR-based genomic signature is preserved along a species' genome by comparing inter- and intra-genomic signatures of nuclear DNA sequences from six different organisms, one from each kingdom of life. We also compare six different distances and assess their performance using statistical measures. Our results support the existence of a genomic signature for a species' genome at the kingdom level. In addition, we test whether CGR-based genomic signatures originating only from nuclear DNA can be used to distinguish between closely related species, and find that they cannot. To overcome this limitation, we propose the concept of "composite signatures", which combine information from different types of DNA, and we show that they can effectively distinguish all closely related species under consideration. We also propose the concept of "assembled signatures", which, among other advantages, do not require a long contiguous DNA sequence but can be built from shorter ones of ~100-300 base pairs.
Finally, we design an interactive web tool, MoDMaps3D, for building three-dimensional Molecular Distance Maps. Users can explore an existing map or build their own using NCBI accession numbers as input. MoDMaps3D is platform-independent, written in JavaScript, and can run in all major modern browsers.
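The embedding step can be illustrated with classical (Torgerson) MDS, one common MDS variant; the abstracts do not say which variant was used, so treat this as an assumption. The sketch double-centres the squared distance matrix and extracts the top three eigenvectors by power iteration with deflation, which avoids external dependencies but is only practical for small matrices. It also assumes the dominant eigenvalues are positive, as they are for Euclidean distances; image distances such as DSSIM need not be Euclidean, in which case a library eigensolver is the safer choice. `classical_mds` is a hypothetical name.

```python
import math
import random


def classical_mds(dist, dims=3, iters=500, seed=0):
    """Embed an n x n distance matrix into `dims` coordinates (classical MDS).

    Eigenvectors are found by power iteration with deflation, which keeps
    the sketch dependency-free but is slow for large n.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J D^2 J, with J the centring matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for axis in range(dims):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        if lam <= 1e-12:
            break  # no positive eigenvalue left; remaining axes stay at 0
        for i in range(n):
            coords[i][axis] = v[i] * math.sqrt(lam)
        # Deflate so the next power iteration finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

Given the pairwise distances of four points at the corners of a 3 x 4 rectangle, the sketch returns coordinates whose pairwise distances match the input, with the unused third axis left at 0.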

    Immersive analytics for oncology patient cohorts

    This thesis proposes a novel interactive immersive analytics tool, and accompanying methods, for interrogating cancer patient cohorts in an immersive virtual environment: Virtual Reality to Observe Oncology data Models (VROOM). The overall objective is to develop an immersive analytics platform that includes a data analytics pipeline from raw gene expression data to immersive visualisation on virtual and augmented reality platforms, built on a game engine; the visualisation is implemented in Unity3D. This work could provide oncologists and clinicians with an interactive visualisation and visual analytics platform that supports their analysis of treatment efficacy and advances the goal of evidence-based personalised medicine. The thesis integrates the latest discoveries and developments in cancer patient prognosis, immersive technologies, machine learning, decision support systems and interactive visualisation to form an immersive analytics platform for complex genomic data. The experimental paradigm followed in this thesis is the study of transcriptomics in cancer samples. Specifically, the thesis investigates gene expression data to determine the biological similarity between patients, as revealed by the transcriptomic profiles of their tumour samples, which show the genes that are active in each patient.
In summary, the thesis contributes: i) a novel immersive analytics platform for interrogating patient cohort data in a similarity space, where the similarity space is based on the patients' biological and genomic similarity; ii) an effective design for optimising the immersive environment, based on a usability study of exocentric and egocentric visualisation and on audio and sound design optimisation; iii) an integration of trusted and familiar 2D biomedical visual analytics methods into the immersive environment; iv) a novel use of game theory as the decision-making engine supporting the analytics process, and an application of optimal transport theory to missing data imputation that preserves the data distribution; and v) case studies that showcase the real-world application of the visualisation and its effectiveness.
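The abstract does not state which measure defines the patient similarity space. As an illustration only, a pairwise distance between tumour transcriptomic profiles can be computed as one minus the Pearson correlation of the patients' gene expression vectors. The sketch assumes expression values are already normalised and aligned gene by gene; `pearson_distance` and `similarity_space` are hypothetical names.

```python
import math


def pearson_distance(x, y):
    """1 - Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0.0:
        raise ValueError("constant expression profile; correlation undefined")
    return 1.0 - num / den


def similarity_space(profiles):
    """Pairwise patient distance matrix from {patient_id: [expression values]}.

    Returns the sorted patient ids and a matrix whose rows and columns follow
    that order; distances range from 0 (perfectly correlated) to 2
    (perfectly anti-correlated).
    """
    ids = sorted(profiles)
    matrix = [[pearson_distance(profiles[a], profiles[b]) for b in ids]
              for a in ids]
    return ids, matrix
```

Profiles that rise and fall together, such as [1, 2, 3, 4] and [2, 4, 6, 8], are at distance 0, and perfectly anti-correlated ones are at distance 2. The resulting matrix could then be embedded in two or three dimensions for display.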

    Feature-driven Volume Visualization of Medical Imaging Data

    Direct volume rendering (DVR) is a volume visualization technique that has proven to be a very powerful tool in many scientific visualization domains. Diagnostic medical imaging is one such domain, in which DVR provides new capabilities for the analysis of complex cases and improves the efficiency of image interpretation workflows. However, the full potential of DVR in the medical domain has not yet been realized. A major obstacle to better integration of DVR in the medical domain is the time-consuming process of optimizing the rendering parameters needed to generate diagnostically relevant visualizations, in which the important features hidden in image volumes are clearly displayed, such as the shape and spatial localization of tumors, their relationships with adjacent structures, and temporal changes in the tumors. In current workflows, clinicians must manually specify the transfer function (TF), viewpoint (camera), clipping planes, and other visual parameters. Another obstacle to the adoption of DVR in the medical domain is the ever-increasing volume of imaging data. Advances in imaging acquisition techniques have led to a rapid expansion in the size of the data, in the form of higher resolutions, temporal imaging acquisition to track treatment responses over time, and an increase in the number of imaging modalities used for a single procedure. Manually specifying the rendering parameters under these circumstances is very challenging. This thesis proposes a set of innovative methods that visualize important features in multi-dimensional and multi-modality medical images by automatically or semi-automatically optimizing the rendering parameters.
Our methods enable the visualizations needed in diagnostic procedures: a 2D slice of interest (SOI) can be augmented with 3D anatomical contextual information to provide accurate spatial localization of 2D features in the SOI; the rendering parameters are automatically computed to guarantee the visibility of 3D features; and changes in 3D features can be tracked in temporal data under the constraint of consistent contextual information. We also present a method for the efficient computation of visibility histograms (VHs) using adaptive binning, which allows our optimal DVR to be automated and visualized in real time. We evaluated our methods by producing visualizations for a variety of clinically relevant scenarios and imaging data sets. We also examined the computational performance of our methods for these scenarios.
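A visibility histogram records how much the samples in each scalar-value range contribute to the rendered image. The sketch below uses one common definition, in which a sample's visibility is its opacity times the transmittance accumulated in front of it along the ray, and it uses uniform bins rather than the adaptive binning proposed in the thesis. The ray representation (front-to-back lists of scalars in [0, 1]), the early-termination threshold of 1e-4 and the name `visibility_histogram` are assumptions for illustration.

```python
def visibility_histogram(rays, opacity, n_bins=8):
    """Accumulate per-sample visibility into uniform bins over values in [0, 1].

    rays:    list of rays, each a front-to-back list of scalar samples in [0, 1]
    opacity: transfer-function opacity, mapping a scalar to alpha in [0, 1]
    """
    hist = [0.0] * n_bins
    for ray in rays:
        transmittance = 1.0  # fraction of light not yet absorbed in front
        for value in ray:
            alpha = opacity(value)
            # This sample's contribution to the final pixel.
            hist[min(int(value * n_bins), n_bins - 1)] += transmittance * alpha
            transmittance *= 1.0 - alpha
            if transmittance < 1e-4:
                break  # early ray termination: later samples are hidden
    return hist
```

With a constant opacity of 0.5, a two-sample ray contributes 0.5 to the first sample's bin and 0.25 to the second's, showing how occlusion reduces the visibility of deeper structures; a transfer-function optimizer can adjust opacities until the bins of interest receive enough visibility.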

    Computational Statistics and Data Visualization

    This book is the third volume of the Handbook of Computational Statistics and covers the field of Data Visualization. In line with the companion volumes, it contains a collection of chapters by experts in the field to present readers with an up-to-date and comprehensive overview of the state of the art. Data Visualization is an active area of application and research, and this is a good time to gather together a summary of current knowledge. Graphic displays are often very effective at communicating information. They are also very often not effective at communicating information. Two important reasons for this state of affairs are that graphics can be produced with a few clicks of the mouse without any thought, and that the design of graphics is not taken seriously in many scientific textbooks. Some people seem to think that preparing good graphics is just a matter of common sense (in which case their common sense cannot be in good shape), and others believe that preparing graphics is a low-level task, not appropriate for scientific attention. This volume of the Handbook of Computational Statistics takes graphics for Data Visualization seriously. Keywords: Data Visualization, Exploratory Graphics.

    Multimodal Biomedical Data Visualization: Enhancing Network, Clinical, and Image Data Depiction

    In this dissertation, we present visual analytics tools for several biomedical applications. Our research spans three types of biomedical data: reaction networks, longitudinal multidimensional clinical data, and biomedical images. For each data type, we present intuitive visual representations and efficient data exploration methods to facilitate visual knowledge discovery. Rule-based simulation has been used to study complex protein interactions. In a rule-based model, the relationships among interacting proteins can be represented as a network. However, current approaches to understanding and validating the intended behaviors of large network models are ineffective and error-prone. We have developed a tool that first shows a network overview with concise visual representations and then shows relevant rule-specific details on demand. This strategy significantly improves the comprehensibility of the visualization and disentangles complex protein-protein relationships by showing them selectively alongside the global context of the network. Next, we present a tool for analyzing longitudinal multidimensional clinical datasets, which we developed to understand Parkinson's disease progression. Detecting patterns involving multiple time-varying variables is especially challenging for clinical data. Conventional computational techniques, such as cluster analysis and dimension reduction, do not always generate interpretable, actionable results. Using our tool, users can select and compare patient subgroups by filtering patients on multiple symptoms simultaneously and interactively. Whereas conventional visualizations rely on local features, many targets in biomedical images are characterized by high-level features. We present our research characterizing such high-level features through multiscale texture segmentation and deep-learning strategies.
First, we present an efficient hierarchical texture segmentation approach that scales well to gigapixel images, which we use to colorize electron microscopy (EM) images. This enhances the visual comprehensibility of gigapixel EM images across a wide range of scales. Second, we use convolutional neural networks (CNNs) to automatically derive high-level features that distinguish cell states in live-cell imagery and voxel types in 3D EM volumes. In addition, we present a CNN-based 3D segmentation method for biomedical volume datasets with limited training samples, which uses factorized convolutions and feature-level augmentations to improve model generalization and avoid overfitting.
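The interactive subgroup selection described above amounts to a conjunctive filter over several symptoms. The sketch below is a hypothetical simplification that looks at a single visit per patient and uses inclusive score ranges; the data layout, the symptom names in the example and the function names `filter_patients` and `subgroup_means` are illustrative assumptions, not the tool's actual interface.

```python
def filter_patients(patients, criteria):
    """Return ids of patients whose scores satisfy every (low, high) range.

    patients: {patient_id: {symptom: score}}  (one visit per patient, assumed)
    criteria: {symptom: (low, high)}; bounds are inclusive
    """
    selected = []
    for pid, scores in patients.items():
        # A patient qualifies only if every selected symptom is present and in range.
        if all(sym in scores and lo <= scores[sym] <= hi
               for sym, (lo, hi) in criteria.items()):
            selected.append(pid)
    return sorted(selected)


def subgroup_means(patients, ids, symptom):
    """Mean score of one symptom within a subgroup (None if nobody has it)."""
    values = [patients[pid][symptom] for pid in ids if symptom in patients[pid]]
    return sum(values) / len(values) if values else None
```

For instance, with `criteria = {"tremor": (3, 5), "rigidity": (0, 2)}` only patients meeting both conditions are kept, and `subgroup_means` can then compare a symptom across the selected subgroups. Extending the sketch to longitudinal data would mean applying the same predicate per visit.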