254 research outputs found

Information visualization for DNA microarray data analysis: A critical review

Author: Kuljis J
Liu X
Zhang L
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

Graphical representation may provide effective means of making sense of the complexity and sheer volume of data produced by DNA microarray experiments that monitor the expression patterns of thousands of genes simultaneously. The ability to use "abstract" graphical representation to draw attention to areas of interest, and more in-depth visualizations to answer focused questions, would enable biologists to move from a large amount of data to particular records they are interested in, and therefore gain deeper insights in understanding the microarray experiment results. This paper starts by providing some background knowledge of microarray experiments, then explains how graphical representation can be applied in general to this problem domain, followed by exploring the role of visualization in gene expression data analysis. Having set the problem scene, the paper then examines various multivariate data visualization techniques that have been applied to microarray data analysis. These techniques are critically reviewed so that the strengths and weaknesses of each technique can be tabulated. Finally, several key problem areas, as well as possible solutions to them, are discussed as a source for future work.

Crossref

Kent Academic Repository

Brunel University Research Archive

Efficient and Accurate Construction of Genetic Linkage Maps from the Minimum Spanning Tree of a Graph

Author: A Ben-Dor
AH Sturtevant
B Liu
B Liu
C Gaspin
CT Falk
D Mester
D Weeks
DA Cartwright
DD Kosambi
DE Goldberg
ES Lander
F Alizadeh
F Alizadeh
F Glover
F Glover
H Iwata
H van Os
HV Os
J Jansen
JBS Haldane
KW Broman
Leonid Kruglyak
P Stam
Prasanna R. Bhat
S de Givry
S Kirkpatrick
S Lin
SE Lincoln
SR Wilson
Stefano Lonardi
T Schiex
Timothy J. Close
W Liu
Yonghui Wu
Z Sun
Publication venue: Public Library of Science
Publication date: 01/10/2008

Genetic linkage maps are cornerstones of a wide spectrum of biotechnology applications, including map-assisted breeding, association genetics, and map-assisted gene cloning. During the past several years, the adoption of high-throughput genotyping technologies has been paralleled by a substantial increase in the density and diversity of genetic markers. New genetic mapping algorithms are needed in order to efficiently process these large datasets and accurately construct high-density genetic maps. In this paper, we introduce a novel algorithm to order markers on a genetic linkage map. Our method is based on a simple yet fundamental mathematical property that we prove under rather general assumptions. The validity of this property allows one to determine efficiently the correct order of markers by computing the minimum spanning tree of an associated graph. Our empirical studies obtained on genotyping data for three mapping populations of barley (Hordeum vulgare), as well as extensive simulations on synthetic data, show that our algorithm consistently outperforms the best available methods in the literature, particularly when the input data are noisy or incomplete. The software implementing our algorithm is available in the public domain as a web tool under the name MSTmap
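The abstract above says that marker order can be determined from the minimum spanning tree of an associated graph. The following is a toy sketch of that idea only, not the MSTmap implementation: it assumes noise-free, additive pairwise distances between markers on a single line, in which case the MST is a simple path whose end-to-end walk gives the order. The function names and the `positions` values are hypothetical, chosen for illustration.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense symmetric distance matrix; returns MST edges."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known edge weight into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


def order_from_path(edges, n):
    """Walk the MST end to end if it is a simple path; otherwise return None."""
    adj = {i: [] for i in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    ends = [i for i in range(n) if len(adj[i]) == 1]
    if len(ends) != 2 or any(len(adj[i]) > 2 for i in range(n)):
        return None  # the tree branches, so it does not define a single order
    order, prev, cur = [], None, min(ends)
    while cur is not None:
        order.append(cur)
        nxt = [v for v in adj[cur] if v != prev]
        prev, cur = cur, (nxt[0] if nxt else None)
    return order


# Hypothetical marker positions (arbitrary units); index = marker id.
positions = [30.0, 0.0, 12.0, 45.0, 20.0]
dist = [[abs(a - b) for b in positions] for a in positions]
marker_order = order_from_path(minimum_spanning_tree(dist), len(positions))
```

With real genotyping data the distances are estimated and noisy, so the tree may branch; the sketch signals that case by returning `None` rather than guessing an order.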

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Ensemble Clustering for Biological Datasets

Author: Harun Pirim
Şadi Evren Şeker
Publication venue: 'IntechOpen'
Publication date: 28/11/2012

IntechOpen

Clustering Algorithms: Their Application to Gene Expression Data

Author: Agrawal R.
Alizadeh A.A.
Bandyopadhyay S.
Bandyopadhyay S.
Bezdek J.C.
Bezdek J.C.
Bezdek† J.C.
Bhargavi M.S.
Blatt M.
Bochkov Y.A.
Brunet J.P.
Bryan K.
Buitinck L.
Bunnik E.M.
Caliński T.
Chandrasekhar T.
Cheng Y.
Costa I.G.
Cover T.M.
D'haeseleer P.
Dave R.N.
Davies D.L.
De Morsier F.
Dempster A.P.
Dharmarajan A.
Dhillon I.S.
Divina F.
Do C.B.
Domany E.
Du Z.
Dunn† J.C.
Edla D.R.
Eisen M.B.
Ferguson T.S.
Frey B.J.
Fu L.
Fukuyama Y.
Galluccio L.
Gath I.
Getz G.
Gordon G.J.
Gu J.
Guha S.
Handhayani T.
Handl J.
Hatamlou A.
Heard N.A.
Heyer L.J.
Hinneburg A.
Hinneburg A.
Hu X.
Hubert L.J.
Jain A.K.
Jiang D.
Jiang H.
Joopudi S.
Kao Y.T.
Karmilasari S.W.
Karypis G.
Kaufman L.
Kerr G.
Kluger Y.
Kohonen T.
Kohonen T.
Krzanowski W.J.
Leone M.
Lu Y.
Lu Y.
Ma'sum M.A.
MacQueen J.
Madeira S.C.
Mann A.K.
Masciari E.
Maulik U.
Milligan G.W.
Mitra S.
Moon T.K.
Moore W.C.
Müllner D.
Nagpal A.
Nasser S.
Neal R.M.
Ng R.T.
Pakhira M.K.
Pal N.R.
Pedregosa F.
Pirim H.
Pitman J.
Prelić A.
Qin Z.S.
Raman S.
Rasmussen C.E.
Rezaee B.
Rezaee M.R.
Ruspini E.H.
Saha S.
Saha S.
Saha S.
Sathishkumar K.
Sheikholeslami G.
Sheng Q.
Sirinukunwattana K.
Sokal R.R.
Sun J.
Talaat A.M.
Tamayo P.
Tanay A.
Tang C.
Thalamuthu A.
Tibshirani R.
Wan M.
Wang L.
Wang W.
Williams G.
Wu J.
Wu K.L.
Wu S.
Xie X.L.
Xu R.
Xu Y.
Yu H.
Zhang D.
Zhang T.
Zhang Y.
Zhang Z.Y.
Zhao L.
Zhong C.
Zitnik M.
Řehůřek R.
Publication venue: 'SAGE Publications'
Publication date: 01/01/2016

Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a great opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes present increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure.

Covenant University Repository

Crossref

Directory of Open Access Journals

PubMed Central

Procrustes Analysis of Truncated Least Squares Multidimensional Scaling

Author: Boryczko Krzysztof
Dzwinel Witold
Kurdziel Marcin
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 30/01/2013

Multidimensional Scaling (MDS) is an important class of techniques for embedding sets of patterns in Euclidean space. Most often it is used to visualize in $\mathbb{R}^3$ multidimensional data sets or data sets given by dissimilarity measures that are not distance metrics. Unfortunately, embedding $n$ patterns with MDS involves processing $O(n^2)$ pairwise pattern dissimilarities, making MDS computationally demanding for large data sets. Especially in Least Squares MDS (LS-MDS) methods, which proceed by finding a minimum of a multimodal stress function, computational cost is a limiting factor. Several works therefore explored approximate MDS techniques that are less computationally expensive. These approximate methods were evaluated in terms of correlation between Euclidean distances in the embedding and the pattern dissimilarities, or the value of the stress function. We employ Procrustes Analysis to directly quantify differences between embeddings constructed with an approximate LS-MDS method and embeddings constructed with exact LS-MDS. We then compare our findings to the results of classical analysis, i.e. that based on stress value and correlation between Euclidean distances and pattern dissimilarities. Our results demonstrate that small changes in stress value or correlation coefficient can translate to large differences between embeddings. The differences can be attributed not only to the inevitable variability resulting from the multimodality of the stress function but also to the approximation errors. These results show that approximation may have a larger impact on MDS than what was thus far revealed by analyses of stress value and correlation between Euclidean distances and pattern dissimilarities.
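As a rough illustration of comparing two embeddings with Procrustes analysis, here is a minimal sketch restricted to 2D and to translation, uniform scaling, and rotation (no reflection). It is not the authors' procedure, and the abstract itself concerns embeddings in three dimensions; the function names and the sample coordinates are hypothetical. Both point sets are centered and scaled to unit norm, the best rotation is found in closed form, and the disparity is the residual sum of squares, which lies between 0 (same shape) and 1.

```python
import math


def center_and_scale(points):
    """Translate points to their centroid and scale to unit total norm.

    Assumes the points are not all identical (norm would be zero).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / norm, y / norm) for x, y in centered]


def procrustes_disparity_2d(reference, other):
    """Residual after optimally rotating and scaling `other` onto `reference`."""
    a = center_and_scale(reference)
    b = center_and_scale(other)
    # Rotating b by angle t gives alignment c*cos(t) + s*sin(t),
    # which is maximized at hypot(c, s).
    c = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    s = sum(ay * bx - ax * by for (ax, ay), (bx, by) in zip(a, b))
    r = math.hypot(c, s)
    # With the optimal scale factor r, the squared residual is 1 - r^2.
    return max(0.0, 1.0 - r * r)


# Hypothetical embeddings: `moved` is `exact` rotated by 90 degrees, scaled
# by 3 and shifted; `perturbed` has two points displaced.
exact = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
moved = [(-3.0 * y + 5.0, 3.0 * x - 2.0) for x, y in exact]
perturbed = [(0.0, 0.0), (1.0, 0.3), (1.2, 1.0), (0.0, 2.0)]

same_shape = procrustes_disparity_2d(exact, moved)
different_shape = procrustes_disparity_2d(exact, perturbed)
```

Because the disparity ignores position, size, and orientation, it isolates genuine differences in configuration, which is what makes it a more direct comparison than stress values alone.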

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Computational approaches for interpreting scRNA-seq data.

Author: Andrews
Angerer
Angermueller
Bendall
Benito
Brennecke
Buenrostro
Buettner
Campbell
Campbell
Campbell
Chen
Cloney
Davis
Deng
Dey
Finak
Grün
Grün
Grün
Guo
Haghverdi
Ilicic
Islam
Johnson
Jones
Kalaitzis
Kar
Kharchenko
Kim
Kim
Kiselev
Klein
Korthauer
Lafon
Langfelder
Leek
Leek
Leng
Love
Love
Lönnberg
Lönnberg
Maaten
Macaulay
Macaulay
Macosko
Magwene
Mahata
Mao
Marco
Marinov
McCarthy
Moignard
Moroz
Navin
Pearson
Pierson
Proserpio
Qiu
Qiu
Raj
Reinius
Risso
Robinson
Sander
Setty
Shalek
Simpson
Smallwood
Stegle
Stegle
Strehl
Svensson
Tang
Trapnell
Trapnell
Vallejos
Vieira Braga
Wang
Ward
Welch
Xu
Xue
Yule
Zeisel
Ziegenhain
Publication venue: FEBS Lett
Publication date: 01/08/2017

The recent developments in high-throughput single-cell RNA sequencing technology (scRNA-seq) have enabled the generation of vast amounts of transcriptomic data at cellular resolution. With these advances come new modes of data analysis, building on high-dimensional data mining techniques. Here, we consider biological questions for which scRNA-seq data is used, both at a cell and gene level, and describe tools available for these types of analyses. This is an exciting and rapidly evolving field, where clustering, pseudotime inference, branching inference and gene-level analyses are particularly informative areas of computational analysis

Crossref

Apollo (Cambridge)

A distributed topology control technique for low interference and energy efficiency in wireless sensor networks

Author: Chiwewe Tapiwa Moses
Publication venue: 'University of Pretoria - Department of Philosophy'
Publication date: 24/02/2011

Wireless sensor networks are used in several multi-disciplinary areas covering a wide variety of applications. They provide distributed computing, sensing and communication in a powerful integration of capabilities. They have great long-term economic potential and have the ability to transform our lives. At the same time, however, they pose several challenges, mostly as a result of their random deployment and non-renewable energy sources. Among the most important issues in wireless sensor networks are energy efficiency and radio interference. Topology control plays an important role in the design of wireless ad hoc and sensor networks; it is capable of constructing networks that have desirable characteristics such as sparser connectivity, lower transmission power and a smaller node degree. In this research a distributed topology control technique is presented that enhances energy efficiency and reduces radio interference in wireless sensor networks. Each node in the network makes local decisions about its transmission power, and the culmination of these local decisions produces a network topology that preserves global connectivity. The topology that is produced consists of a planar graph that is a power spanner; it has lower node degrees and can be constructed using local information. The network lifetime is increased by reducing transmission power, and the use of low node degrees reduces traffic interference. The approach to topology control that is presented in this document has an advantage over previously developed approaches in that it focuses not only on reducing either energy consumption or radio interference, but on reducing both of these obstacles. Results are presented of simulations that demonstrate improvements in performance. Dissertation (MEng)--University of Pretoria, 2010. Electrical, Electronic and Computer Engineering. unrestricte

UPSpace at the University of Pretoria

CellTree: an R/bioconductor package to infer the hierarchical structure of cell populations from single-cell RNA-seq data

Author: David A. duVerle
Hiroyuki Aburatani
Koji Tsuda
Seitaro Nomura
Sohiya Yotsukura
Publication venue: Springer Nature
Publication date: 01/01/2016

GO BP Terms for myoblast data. Full table of enriched GO BP terms for each topic in myoblast data. (PDF 36 kb)

Springer - Publisher Connector

FigShare

Parallel swarm intelligence strategies for large-scale clustering based on MapReduce with application to epigenetics of aging

Author: Batouche M
Benmounah Z
Lio P
Meshoul S
Publication venue: Applied Soft Computing
Publication date: 01/01/2018

Clustering is an important technique for data analysis and knowledge discovery. In the context of big data, it becomes challenging because the huge amount of data recently collected makes conventional clustering algorithms inappropriate. Swarm intelligence algorithms have shown promising results when applied to clustering data of moderate size, owing to their decentralized and self-organized behavior. However, these algorithms exhibit limited capabilities when large data sets are involved. In this paper, we developed a decentralized, distributed big data clustering solution using three swarm intelligence algorithms within the MapReduce framework. The developed framework allows cooperation between the three algorithms, namely particle swarm optimization, ant colony optimization and artificial bee colony, to achieve largely scalable data partitioning through a migration strategy. This strategy takes advantage of the combined exploration and exploitation capabilities of these algorithms to foster diversity. The framework is tested using the Amazon Elastic MapReduce (EMR) service, deploying up to 192 computer nodes and 30 gigabytes of data. Parallel metrics such as speed-up, size-up and scale-up are used to measure the elasticity and scalability of the framework. Our results are compared with those of counterpart big data clustering approaches and show a significant improvement in terms of time and convergence to good-quality solutions. The developed model has been applied to epigenetics data clustering according to methylation features in CpG islands, gene body, and gene promoter in order to study the impact of epigenetics on aging. Experimental results reveal that DNA methylation changes slightly and not aberrantly with aging, corroborating previous studies.
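The parallel metrics named in the abstract can be written down compactly. The sketch below uses the definitions commonly found in the parallel clustering literature, which is an assumption here since the abstract does not spell them out; the function names are hypothetical and the timings in the usage lines are made-up placeholders, not results from the paper.

```python
def speed_up(time_one_node, time_p_nodes):
    """Same data, more nodes: ideal value equals the number of nodes p."""
    return time_one_node / time_p_nodes


def size_up(time_base_data, time_scaled_data):
    """Same nodes, m times more data: ideal value is at most m."""
    return time_scaled_data / time_base_data


def scale_up(time_one_node_base_data, time_p_nodes_p_times_data):
    """Nodes and data grow together by p: ideal value stays close to 1."""
    return time_one_node_base_data / time_p_nodes_p_times_data


# Placeholder timings in seconds, for illustration only.
example_speed_up = speed_up(120.0, 40.0)    # 4 nodes hypothetically
example_size_up = size_up(40.0, 100.0)      # data hypothetically tripled
example_scale_up = scale_up(120.0, 150.0)   # nodes and data both scaled
```

Read together, speed-up shows how well added nodes shorten a fixed job, size-up how gracefully a fixed cluster absorbs more data, and scale-up whether the two can grow in step.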

Apollo (Cambridge)