Towards Data-Driven Large Scale Scientific Visualization and Exploration

Author: Ip Cheuk Yiu
Publication date: 01/01/2013

Technological advances have enabled us to acquire extremely large datasets, but it remains a challenge to store, process, and extract information from them. This dissertation builds upon recent advances in machine learning, visualization, and user interactions to facilitate exploration of large-scale scientific datasets. First, we use data-driven approaches to computationally identify regions of interest in the datasets. Second, we use visual presentation for effective user comprehension. Third, we provide interactions for human users to integrate domain knowledge and semantic information into this exploration process. Our research shows how to extract, visualize, and explore informative regions on very large 2D landscape images, 3D volumetric datasets, high-dimensional volumetric mouse brain datasets with thousands of spatially-mapped gene expression profiles, and geospatial trajectories that evolve over time. The contributions of this dissertation include: (1) We introduce a sliding-window saliency model that discovers regions of user interest in very large images; (2) We develop visual segmentation of intensity-gradient histograms to identify meaningful components from volumetric datasets; (3) We extract boundary surfaces from a wealth of volumetric gene expression mouse brain profiles to personalize the reference brain atlas; (4) We show how to efficiently cluster geospatial trajectories by mapping each sequence of locations to a high-dimensional point with the kernel distance framework. We aim to discover patterns, relationships, and anomalies that would lead to new scientific, engineering, and medical advances. This work represents one of the first steps toward better visual understanding of large-scale scientific data by combining machine learning and human intelligence.

Digital Repository at the University of Maryland

Visualization methods for statistical analysis of microarray clusters

Author: Dirksen Nathaniel C
Hibbs Matthew A
Li Kai
Troyanskaya Olga G
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: The most common method of identifying groups of functionally related genes in microarray data is to apply a clustering algorithm. However, it is impossible to determine which clustering algorithm is most appropriate to apply, and it is difficult to verify the results of any algorithm due to the lack of a gold-standard. Appropriate data visualization tools can aid this analysis process, but existing visualization methods do not specifically address this issue. RESULTS: We present several visualization techniques that incorporate meaningful statistics that are noise-robust for the purpose of analyzing the results of clustering algorithms on microarray data. This includes a rank-based visualization method that is more robust to noise, a difference display method to aid assessments of cluster quality and detection of outliers, and a projection of high dimensional data into a three dimensional space in order to examine relationships between clusters. Our methods are interactive and are dynamically linked together for comprehensive analysis. Further, our approach applies to both protein and gene expression microarrays, and our architecture is scalable for use on both desktop/laptop screens and large-scale display devices. This methodology is implemented in GeneVAnD (Genomic Visual ANalysis of Datasets) and is available at . CONCLUSION: Incorporating relevant statistical information into data visualizations is key for analysis of large biological datasets, particularly because of high levels of noise and the lack of a gold-standard for comparisons. We developed several new visualization techniques and demonstrated their effectiveness for evaluating cluster quality and relationships between clusters

The Jackson Laboratory: The Mouseion at the JAXlibrary

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

Author: B. Hamann
C.C. Fowlkes
C.L. Luengo Hendriks
D.W. Knowles
E.W. Bethel
G.H. Weber
H. Hagen
J. Malik
M.B. Eisen
M.D. Biggin
Min-Yu Huang
O. Rubel
S.V.E. Keranen
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Towards human-computer synergetic analysis of large-scale biological data

Author: A Andreeva
A Butte
A Perer
A Prlić
A Sturn
A Tagmount
AI Saeed
B Dalziel
B Shneiderman
Ben Dalziel
C Orengo
Daniel Asarnow
David Foote
DB Allison
DH Jeong
EH Baehrecke
F Valafar
H Hochheiser
Hui Yang
I Vessey
J Ernst
J Hou
J Seo
JC Pinzon
Jonathan Stillman
K Aas
KS Teranishi
L Holm
M Kanehisa
M Osadchy
MA Hibbs
Matthew Gormley
MB Eisen
P Craig
P Malik
R Jain
R Singh
R Tibshirani
Rahul Singh
RC Gentleman
SG Hart
Susan Fisher
T Hastie
TPS Chan
VD Winn
VG Tusher
W Tong
William Murad
X Wang
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Identification of coherent patterns in gene expression data using an efficient biclustering algorithm and parallel coordinate visualization

Author: Cheng Kin-On
Law Ngai-Fong
Liew Alan Wee-Chung
Siu Wan-Chi
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The DNA microarray technology allows the measurement of expression levels of thousands of genes under tens or hundreds of different conditions. In microarray data, genes with similar functions usually co-express under certain conditions only [1]. Thus, biclustering, which clusters genes and conditions simultaneously, is preferred over traditional clustering techniques for discovering these coherent genes. Various biclustering algorithms have been developed using different bicluster formulations. Unfortunately, many useful formulations result in NP-complete problems. In this article, we investigate an efficient method for identifying a popular type of bicluster called the additive model. Furthermore, parallel coordinate (PC) plots are used for bicluster visualization and analysis. Results: We develop a novel and efficient biclustering algorithm which can be regarded as a greedy version of an existing algorithm known as the pCluster algorithm. By relaxing the homogeneity constraint, the proposed algorithm has polynomial-time complexity in the worst case instead of the exponential-time complexity of the pCluster algorithm. Experiments on artificial datasets verify that our algorithm can identify both additive-related and multiplicative-related biclusters in the presence of overlap and noise. Biologically significant biclusters have been validated on the yeast cell-cycle expression dataset using Gene Ontology annotations. A comparative study shows that the proposed approach outperforms several existing biclustering algorithms. We also provide an interactive exploratory tool based on PC plot visualization for determining the parameters of our biclustering algorithm. Conclusion: We have proposed a novel biclustering algorithm which works with PC plots for an interactive exploratory analysis of gene expression data. Experiments show that the biclustering algorithm is efficient and is capable of detecting co-regulated genes. The interactive analysis enables an optimum parameter determination in the biclustering algorithm so as to achieve the best result. In future, we will modify the proposed algorithm for other bicluster models such as the coherent evolution model.
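
The additive bicluster model mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it only checks whether a given submatrix fits an additive model, in which every entry equals a constant plus a row offset plus a column offset, by testing that all residues vanish. The function names, the tolerance `tol`, and the toy data are assumptions made for illustration.

```python
from statistics import mean


def additive_residues(matrix):
    """Return residues r_ij = a_ij - row_mean_i - col_mean_j + overall_mean.

    All residues are zero exactly when the matrix follows an additive
    model a_ij = mu + alpha_i + beta_j.
    """
    row_means = [mean(row) for row in matrix]
    col_means = [mean(col) for col in zip(*matrix)]
    overall = mean(row_means)  # rows have equal length, so this is the grand mean
    return [
        [a - row_means[i] - col_means[j] + overall for j, a in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


def is_additive_bicluster(matrix, tol=1e-9):
    """True if every residue is within tol of zero (tol is a hypothetical knob)."""
    return all(abs(r) <= tol for row in additive_residues(matrix) for r in row)


def submatrix(data, rows, cols):
    """Select the given gene rows and condition columns from data."""
    return [[data[i][j] for j in cols] for i in rows]


# Toy expression matrix (genes x conditions); the values are made up.
data = [
    [1.0, 2.0, 4.0, 9.0],
    [3.0, 4.0, 6.0, 0.0],
    [0.0, 1.0, 3.0, 5.0],
    [7.0, 2.0, 8.0, 1.0],
]

# Genes 0-2 differ only by a constant shift across conditions 0-2.
candidate = submatrix(data, rows=[0, 1, 2], cols=[0, 1, 2])
print(is_additive_bicluster(candidate))
print(is_additive_bicluster(data))
```

A nonzero `tol` would tolerate noise, loosely in the spirit of the relaxed homogeneity constraint the abstract describes; searching for the rows and columns themselves is the hard part and is not attempted here.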

The Hong Kong Polytechnic University Pao Yue-kong Library

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Orymold: ontology based gene expression data integration and analysis tool applied to rice

Author: Adrados Rosa
Adsuara José-Enrique
Espinosa Antonio
Maes Tamara
Mercadé Jaume
Puigdomènech Pere
Segura Jordi
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Integration and exploration of data obtained from genome-wide monitoring technologies has become a major challenge for many bioinformaticists and biologists due to its heterogeneity and high dimensionality. A widely accepted approach to solve these issues has been the creation and use of controlled vocabularies (ontologies). Ontologies allow for the formalization of domain knowledge, which in turn enables generalization in the creation of querying interfaces as well as in the integration of heterogeneous data, providing both human and machine readable interfaces. Results: We designed and implemented a software tool that allows investigators to create their own semantic model of an organism and to use it to dynamically integrate expression data obtained from DNA microarrays and other probe-based technologies. The software provides tools to use the semantic model to postulate and validate hypotheses on the spatial and temporal expression and function of genes. In order to illustrate the software's use and features, we used it to build a semantic model of rice (Oryza sativa) and integrated experimental data into it. Conclusion: In this paper we describe the development and features of a flexible software application for dynamic gene expression data annotation, integration, and exploration called Orymold. Orymold is freely available for non-commercial users from http://www.oryzon.com/media/orymold.html

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Diposit Digital de Documents de la UAB

DNA Microarray Data Analysis: A New Survey on Biclustering

Author: Ben Saber Haifa
Elloumi Mourad
Publication venue: International Journal for Computational Biology (IJCB)
Publication date: 03/04/2015

Some subsets of genes behave similarly under certain subsets of conditions, in which case we say that they coexpress, but behave independently under other subsets of conditions. Discovering such coexpressions can help uncover genomic knowledge such as gene networks or gene interactions. It is therefore of utmost importance to cluster genes and conditions simultaneously, in order to identify clusters of genes that are coexpressed under clusters of conditions. This type of clustering is called biclustering. Biclustering is an NP-hard problem. Consequently, heuristic algorithms are typically used to approximate this problem by finding suboptimal solutions. In this paper, we present a new survey on biclustering of gene expression data, also called microarray data.

International Journal for Computational Biology (IJCB)