254 research outputs found
Information visualization for DNA microarray data analysis: A critical review
Graphical representation may provide effective means of making sense of the complexity and sheer volume of data produced by DNA microarray experiments that monitor the expression patterns of thousands of genes simultaneously. The ability to use "abstract" graphical representation to draw attention to areas of interest, and more in-depth visualizations to answer focused questions, would enable biologists to move from a large amount of data to the particular records they are interested in, and therefore gain deeper insights into the microarray experiment results. This paper starts by providing some background knowledge of microarray experiments, then explains how graphical representation can be applied in general to this problem domain, and then explores the role of visualization in gene expression data analysis. Having set the problem scene, the paper examines various multivariate data visualization techniques that have been applied to microarray data analysis. These techniques are critically reviewed so that the strengths and weaknesses of each technique can be tabulated. Finally, several key problem areas, as well as possible solutions to them, are discussed as a source for future work.
Efficient and Accurate Construction of Genetic Linkage Maps from the Minimum Spanning Tree of a Graph
Genetic linkage maps are cornerstones of a wide spectrum of biotechnology applications, including map-assisted breeding, association genetics, and map-assisted gene cloning. During the past several years, the adoption of high-throughput genotyping technologies has been paralleled by a substantial increase in the density and diversity of genetic markers. New genetic mapping algorithms are needed in order to efficiently process these large datasets and accurately construct high-density genetic maps. In this paper, we introduce a novel algorithm to order markers on a genetic linkage map. Our method is based on a simple yet fundamental mathematical property that we prove under rather general assumptions. The validity of this property allows one to determine efficiently the correct order of markers by computing the minimum spanning tree of an associated graph. Our empirical studies on genotyping data for three mapping populations of barley (Hordeum vulgare), as well as extensive simulations on synthetic data, show that our algorithm consistently outperforms the best available methods in the literature, particularly when the input data are noisy or incomplete. The software implementing our algorithm is available in the public domain as a web tool under the name MSTmap.
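The idea of reading a marker order off a minimum spanning tree can be illustrated with a small sketch. This is not the MSTmap implementation: the abstract does not say how the associated graph or its edge weights are built, so the pairwise distance matrix `dist` and the helper names `mst_edges` and `order_from_path` are assumptions made for illustration only. The sketch relies on the fact that when the distances behave like positions along a line, the minimum spanning tree is a simple path, and walking that path from one end yields the order.

```python
def mst_edges(dist):
    """Prim's algorithm on a complete graph given as a symmetric distance matrix."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


def order_from_path(edges, n):
    """Return the node order if the tree is a simple path, otherwise None."""
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    if any(len(neighbours) > 2 for neighbours in adj.values()):
        return None  # a branching tree does not define a single linear order
    start = min(i for i in range(n) if len(adj[i]) <= 1)
    order, prev = [start], None
    while len(order) < n:
        nxt = [v for v in adj[order[-1]] if v != prev]
        prev = order[-1]
        order.append(nxt[0])
    return order


# Hypothetical marker positions, used only to generate consistent distances.
positions = [0, 12, 5, 30, 21]
dist = [[abs(p - q) for q in positions] for p in positions]
marker_order = order_from_path(mst_edges(dist), len(positions))
```

With noisy or incomplete data the tree may branch, in which case this sketch returns `None`; handling that case is exactly where a real mapping tool has to do more work.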
Clustering Algorithms: Their Application to Gene Expression Data
Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a major opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes involved increase the challenge of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. The use of clustering techniques is therefore a first step toward addressing these challenges, and it is essential in the data mining process for revealing natural structures and identifying interesting patterns in the underlying data. Clustering of gene expression data has proven useful for revealing the natural structure inherent in the data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. A further benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to identify the appropriate clustering technique, one that offers stability and a high degree of accuracy in its analysis procedure.
Procrustes Analysis of Truncated Least Squares Multidimensional Scaling
Multidimensional Scaling (MDS) is an important class of techniques for embedding sets of patterns in Euclidean space. Most often it is used to visualize in $\mathbb{R}^3$ multidimensional data sets or data sets given by dissimilarity measures that are not distance metrics. Unfortunately, embedding $n$ patterns with MDS involves processing $O(n^2)$ pairwise pattern dissimilarities, making MDS computationally demanding for large data sets. Computational cost is a limiting factor especially in Least Squares MDS (LS-MDS) methods, which proceed by finding a minimum of a multimodal stress function. Several works have therefore explored approximate MDS techniques that are less computationally expensive. These approximate methods were evaluated in terms of the correlation between Euclidean distances in the embedding and the pattern dissimilarities, or in terms of the value of the stress function. We employ Procrustes Analysis to directly quantify differences between embeddings constructed with an approximate LS-MDS method and embeddings constructed with exact LS-MDS. We then compare our findings to the results of classical analysis, i.e. analysis based on the stress value and the correlation between Euclidean distances and pattern dissimilarities. Our results demonstrate that small changes in stress value or correlation coefficient can translate to large differences between embeddings. The differences can be attributed not only to the inevitable variability resulting from the multimodality of the stress function but also to the approximation errors. These results show that approximation may have a larger impact on MDS than was thus far revealed by analyses of stress value and correlation between Euclidean distances and pattern dissimilarities.
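To make the comparison of two embeddings concrete, here is a minimal sketch of a Procrustes-style disparity for two-dimensional point sets. It is an illustration under stated assumptions, not the procedure used in the paper: it handles 2D only, it fits translation, uniform scaling and rotation but not reflection, and the function name `procrustes_2d` is hypothetical. Points are treated as complex numbers so that a rotation plus scaling becomes multiplication by a single complex scalar, which has a closed-form least-squares solution.

```python
def procrustes_2d(reference, other):
    """Residual disparity after fitting `other` onto `reference`.

    Both arguments are equal-length lists of (x, y) pairs in corresponding order.
    The result lies in [0, 1]; 0 means the shapes match up to translation,
    uniform scaling and rotation (reflections are NOT fitted in this sketch).
    """
    ref = [complex(x, y) for x, y in reference]
    oth = [complex(x, y) for x, y in other]
    n = len(ref)

    # Remove translation by centring both configurations.
    ref_mean = sum(ref) / n
    oth_mean = sum(oth) / n
    ref = [z - ref_mean for z in ref]
    oth = [z - oth_mean for z in oth]

    # Scale the reference to unit norm so the disparity is comparable across data sets.
    ref_norm = sum(abs(z) ** 2 for z in ref) ** 0.5
    ref = [z / ref_norm for z in ref]

    # Least-squares complex scalar c minimising sum |ref - c * oth|^2.
    denom = sum(abs(z) ** 2 for z in oth)
    c = sum(r * o.conjugate() for r, o in zip(ref, oth)) / denom

    return sum(abs(r - c * o) ** 2 for r, o in zip(ref, oth))
```

Two embeddings with nearly identical stress values can still give a clearly non-zero disparity under a measure like this, which is the kind of discrepancy the abstract describes.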
Computational approaches for interpreting scRNA-seq data.
The recent developments in high-throughput single-cell RNA sequencing technology (scRNA-seq) have enabled the generation of vast amounts of transcriptomic data at cellular resolution. With these advances come new modes of data analysis, building on high-dimensional data mining techniques. Here, we consider biological questions for which scRNA-seq data is used, both at a cell and gene level, and describe tools available for these types of analyses. This is an exciting and rapidly evolving field, where clustering, pseudotime inference, branching inference and gene-level analyses are particularly informative areas of computational analysis
A distributed topology control technique for low interference and energy efficiency in wireless sensor networks
Wireless sensor networks are used in several multi-disciplinary areas covering a wide variety of applications. They provide distributed computing, sensing and communication in a powerful integration of capabilities. They have great long-term economic potential and the ability to transform our lives. At the same time, however, they pose several challenges, mostly as a result of their random deployment and non-renewable energy sources. Among the most important issues in wireless sensor networks are energy efficiency and radio interference. Topology control plays an important role in the design of wireless ad hoc and sensor networks; it is capable of constructing networks that have desirable characteristics such as sparser connectivity, lower transmission power and a smaller node degree. In this research a distributed topology control technique is presented that enhances energy efficiency and reduces radio interference in wireless sensor networks. Each node in the network makes local decisions about its transmission power, and the combination of these local decisions produces a network topology that preserves global connectivity. The resulting topology is a planar graph that is a power spanner; it has lower node degrees and can be constructed using local information. The network lifetime is increased by reducing transmission power, and the use of low node degrees reduces traffic interference. The approach to topology control presented here has an advantage over previously developed approaches in that it focuses not on reducing either energy consumption or radio interference alone, but on reducing both. Simulation results are presented that demonstrate improvements in performance. Dissertation (MEng)--University of Pretoria, 2010. Electrical, Electronic and Computer Engineering.
CellTree: an R/bioconductor package to infer the hierarchical structure of cell populations from single-cell RNA-seq data
GO BP Terms for myoblast data. Full table of enriched GO BP terms for each topic in myoblast data. (PDF 36 kb)
Parallel swarm intelligence strategies for large-scale clustering based on MapReduce with application to epigenetics of aging
Clustering is an important technique for data analysis and knowledge discovery. In the context of big data it becomes a challenging issue, because the huge amount of data now being collected makes conventional clustering algorithms inappropriate. Swarm intelligence algorithms have shown promising results when applied to clustering data of moderate size, owing to their decentralized and self-organized behavior. However, these algorithms exhibit limited capabilities when large data sets are involved. In this paper, we developed a decentralized, distributed big data clustering solution using three swarm intelligence algorithms within the MapReduce framework. The developed framework allows cooperation between the three algorithms, namely particle swarm optimization, ant colony optimization and artificial bee colony, to achieve highly scalable data partitioning through a migration strategy. This strategy takes advantage of the combined exploration and exploitation capabilities of these algorithms to foster diversity. The framework is tested using the Amazon Elastic MapReduce (EMR) service, deploying up to 192 computer nodes and 30 gigabytes of data. Parallel metrics such as speed-up, size-up and scale-up are used to measure the elasticity and scalability of the framework. Our results are compared with those of competing big data clustering approaches and show a significant improvement in terms of time and convergence to a good-quality solution. The developed model has been applied to epigenetics data clustering according to methylation features in CpG islands, gene body, and gene promoter in order to study the impact of epigenetics on aging. Experimental results reveal that DNA methylation changes slightly and not aberrantly with aging, corroborating previous studies.