254 research outputs found

    Information visualization for DNA microarray data analysis: A critical review

    Get PDF
    Graphical representation may provide effective means of making sense of the complexity and sheer volume of data produced by DNA microarray experiments that monitor the expression patterns of thousands of genes simultaneously. The ability to use ldquoabstractrdquo graphical representation to draw attention to areas of interest, and more in-depth visualizations to answer focused questions, would enable biologists to move from a large amount of data to particular records they are interested in, and therefore, gain deeper insights in understanding the microarray experiment results. This paper starts by providing some background knowledge of microarray experiments, and then, explains how graphical representation can be applied in general to this problem domain, followed by exploring the role of visualization in gene expression data analysis. Having set the problem scene, the paper then examines various multivariate data visualization techniques that have been applied to microarray data analysis. These techniques are critically reviewed so that the strengths and weaknesses of each technique can be tabulated. Finally, several key problem areas as well as possible solutions to them are discussed as being a source for future work

    Efficient and Accurate Construction of Genetic Linkage Maps from the Minimum Spanning Tree of a Graph

    Get PDF
    Genetic linkage maps are cornerstones of a wide spectrum of biotechnology applications, including map-assisted breeding, association genetics, and map-assisted gene cloning. During the past several years, the adoption of high-throughput genotyping technologies has been paralleled by a substantial increase in the density and diversity of genetic markers. New genetic mapping algorithms are needed in order to efficiently process these large datasets and accurately construct high-density genetic maps. In this paper, we introduce a novel algorithm to order markers on a genetic linkage map. Our method is based on a simple yet fundamental mathematical property that we prove under rather general assumptions. The validity of this property allows one to determine efficiently the correct order of markers by computing the minimum spanning tree of an associated graph. Our empirical studies obtained on genotyping data for three mapping populations of barley (Hordeum vulgare), as well as extensive simulations on synthetic data, show that our algorithm consistently outperforms the best available methods in the literature, particularly when the input data are noisy or incomplete. The software implementing our algorithm is available in the public domain as a web tool under the name MSTmap

    Ensemble Clustering for Biological Datasets

    Get PDF

    Clustering Algorithms: Their Application to Gene Expression Data

    Get PDF
    Gene expression data hide vital information required to understand the biological process that takes place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data proffers a prodigious preference to strengthen the understanding of functional genomics. The complexity of biological networks and the volume of genes present increase the challenges of comprehending and interpretation of the resulting mass of data, which consists of millions of measurements; these data also inhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, which is essential in the data mining process to reveal natural structures and iden-tify interesting patterns in the underlying data. The clustering of gene expression data has been proven to be useful in making known the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. The other benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to the gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and high degree of accuracy in its analysis procedure

    Procrustes Analysis of Truncated Least Squares Multidimensional Scaling

    Get PDF
    Multidimensional Scaling (MDS) is an important class of techniques for embedding sets of patterns in Euclidean space. Most often it is used to visualize in mathbbR3 multidimensional data sets or data sets given by dissimilarity measures that are not distance metrics. Unfortunately, embedding n patterns with MDS involves processing O(n2) pairwise pattern dissimilarities, making MDS computationally demanding for large data sets. Especially in Least Squares MDS (LS-MDS) methods, that proceed by finding a minimum of a multimodal stress function, computational cost is a limiting factor. Several works therefore explored approximate MDS techniques that are less computationally expensive. These approximate methods were evaluated in terms of correlation between Euclidean distances in the embedding and the pattern dissimilarities or value of the stress function. We employ Procrustes Analysis to directly quantify differences between embeddings constructed with an approximate LS-MDS method and embeddings constructed with exact LS-MDS. We then compare our findings to the results of classical analysis, i.e. that based on stress value and correlation between Euclidean distances and pattern dissimilarities. Our results demonstrate that small changes in stress value or correlation coefficient can translate to large differences between embeddings. The differences can be attributed not only to the inevitable variability resulting from the multimodality of the stress function but also to the approximation errors. These results show that approximation may have larger impact on MDS than what was thus far revealed by analyses of stress value and correlation between Euclidean distances and pattern dissimilarities

    Computational approaches for interpreting scRNA-seq data.

    Get PDF
    The recent developments in high-throughput single-cell RNA sequencing technology (scRNA-seq) have enabled the generation of vast amounts of transcriptomic data at cellular resolution. With these advances come new modes of data analysis, building on high-dimensional data mining techniques. Here, we consider biological questions for which scRNA-seq data is used, both at a cell and gene level, and describe tools available for these types of analyses. This is an exciting and rapidly evolving field, where clustering, pseudotime inference, branching inference and gene-level analyses are particularly informative areas of computational analysis

    A distributed topology control technique for low interference and energy efficiency in wireless sensor networks

    Get PDF
    Wireless sensor networks are used in several multi-disciplinary areas covering a wide variety of applications. They provide distributed computing, sensing and communication in a powerful integration of capabilities. They have great long-term economic potential and have the ability to transform our lives. At the same time however, they pose several challenges – mostly as a result of their random deployment and non-renewable energy sources.Among the most important issues in wireless sensor networks are energy efficiency and radio interference. Topology control plays an important role in the design of wireless ad hoc and sensor networks; it is capable of constructing networks that have desirable characteristics such as sparser connectivity, lower transmission power and a smaller node degree.In this research a distributed topology control technique is presented that enhances energy efficiency and reduces radio interference in wireless sensor networks. Each node in the network makes local decisions about its transmission power and the culmination of these local decisions produces a network topology that preserves global connectivity. The topology that is produced consists of a planar graph that is a power spanner, it has lower node degrees and can be constructed using local information. The network lifetime is increased by reducing transmission power and the use of low node degrees reduces traffic interference. The approach to topology control that is presented in this document has an advantage over previously developed approaches in that it focuses not only on reducing either energy consumption or radio interference, but on reducing both of these obstacles. Results are presented of simulations that demonstrate improvements in performance. AFRIKAANS : Draadlose sensor netwerke word gebruik in verskeie multi-dissiplinĂȘre areas wat 'n wye verskeidenheid toepassings dek. Hulle voorsien verspreide berekening, bespeuring en kommunikasie in 'n kragtige integrate van vermoĂ«ns. Hulle het goeie langtermyn ekonomiese potentiaal en die vermoĂ« om ons lewens te herskep. Terselfdertyd lewer dit egter verskeie uitdagings op as gevolg van hul lukrake ontplooiing en nie-hernubare energie bronne. Van die belangrikste kwessies in draadlose sensor netwerke is energie-doeltreffendheid en radiosteuring. Topologie-beheer speel 'n belangrike rol in die ontwerp van draadlose informele netwerke en sensor netwerke en dit is geskik om netwerke aan te bring wat gewenste eienskappe het soos verspreide koppeling, laer transmissiekrag en kleiner nodus graad.In hierdie ondersoek word 'n verspreide topologie beheertegniek voorgelĂȘ wat energie-doeltreffendheid verhoog en radiosteuring verminder in draadlose sensor netwerke. Elke nodus in die netwerk maak lokale besluite oor sy transmissiekrag en die hoogtepunt van hierdie lokale besluite lewer 'n netwerk-topologie op wat globale verbintenis behou.Die topologie wat gelewer word is 'n tweedimensionele grafiek en 'n kragsleutel; dit het laer nodus grade en kan gebou word met lokale inligting. Die netwerk-leeftyd word vermeerder deur transmissiekrag te verminder en verkeer-steuring word verminder deur lae nodus grade. Die benadering tot topologie-beheer wat voorgelĂȘ word in hierdie skrif het 'n voordeel oor benaderings wat vroeĂ«r ontwikkel is omdat dit nie net op die vermindering van net energie verbruik of net radiosteuring fokus nie, maar op albei. Resultate van simulasies word voorgelĂȘ wat die verbetering in werkverrigting demonstreer.Dissertation (MEng)--University of Pretoria, 2010.Electrical, Electronic and Computer Engineeringunrestricte

    CellTree: an R/bioconductor package to infer the hierarchical structure of cell populations from single-cell RNA-seq data

    Get PDF
    GO BP Terms for myoblast data. Full table of enriched GO BP terms for each topic in myoblast data. (PDF 36 kb
    • 

    corecore