6,834 research outputs found

DNA meets the SVD

Author: Grindrod Peter
Higham Desmond J.
Kalna Gabriela
Spence Alistair
Stoyanov Zhivko
Vass J. Keith
Publication date: 01/01/2008

This paper introduces an important area of computational cell biology in which complex, publicly available genomic data are examined with linear algebra methods, with the aim of revealing biological and medical insights.

University of Strathclyde Institutional Repository

Edinburgh Research Explorer

Semantic distillation: a method for clustering objects by their contextual specificity

Author: AN Langville
Chris Godsil and Gordon Royle
CJ Rijsbergen van
DM Cvetković
F Fouss
I Yanai
J Mercer
J Shi
JC Bezdek
K Pearson
LA Zadeh
M Belkin
M Campanino
Miklós Rédei
MLD Chiara
MW Berry
N Aronszajn
P Baldi
P Gärdenfors
R Baeza-Yates
R Fan
R Homayouni
RR Coifman
S Vishveshwara
ST Wang
Sándor Dominich
Publication date: 01/01/2007

Techniques for data mining, latent semantic analysis, contextual search of databases, etc. were developed long ago by computer scientists working on information retrieval (IR). Experimental scientists from all disciplines, who have to analyse large collections of raw experimental data (astronomical, physical, biological, etc.), have developed powerful methods for statistical analysis and for clustering, categorising, and classifying objects. Finally, physicists have developed a theory of quantum measurement, unifying the logical, algebraic, and probabilistic aspects of queries into a single formalism. The purpose of this paper is twofold: first, to show that, when formulated at an abstract level, problems from IR, from statistical data analysis, and from physical measurement theories are very similar and hence can profitably be cross-fertilised; and second, to propose a novel method of fuzzy hierarchical clustering, termed semantic distillation and strongly inspired by the theory of quantum measurement, which we developed to analyse raw data coming from various types of experiments on DNA arrays. We illustrate the method by analysing DNA array experiments and clustering the genes of the array according to their specificity. Comment: Accepted for publication in Studies in Computational Intelligence, Springer-Verlag

arXiv.org e-Print Archive

CiteSeerX

Crossref

HAL-Rennes 1

Noise and nonlinearities in high-throughput data

Author: Bagnoli F
Franco Bagnoli
Koukolíková-Nicola Z
Minka T
Nguyen V-A Nicola-Koulikova Z Bagnoli F Lió P Ho Tu Bao Zhou Zhi-Hua
Pietro Lió
Rajan J J
Viet-Anh Nguyen
Zdena Koukolíková-Nicola
Publication venue: 'IOP Publishing'
Publication date: 05/01/2010

High-throughput data analyses are becoming common in biology, communications, economics and sociology. The vast amounts of data are usually represented in the form of matrices and can be considered as knowledge networks. Spectra-based approaches have proved useful in extracting hidden information within such networks and for estimating missing data, but these methods are based essentially on linear assumptions. The physical models of matching, when applicable, often suggest non-linear mechanisms, which may sometimes be identified as noise. The use of non-linear models in data analysis, however, may require the introduction of many parameters, which lowers the statistical weight of the model. Depending on the quality of the data, a simpler linear analysis may be more convenient than more complex approaches. In this paper, we show how a simple non-parametric Bayesian model may be used to explore the role of non-linearities and noise in synthetic and experimental data sets. Comment: 12 pages, 3 figures

arXiv.org e-Print Archive

Crossref

Techniques for clustering gene expression data

Author: Crane Martin
Doolan Padraig
Kerr Gráinne
Ruskin Heather J.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered.

CiteSeerX

Irish Universities

DCU Online Research Access Service

Stochastic Data Clustering

Author: Meyer Carl D.
Wessell Charles D.
Publication date: 01/01/2012

In 1961 Herbert Simon and Albert Ando published the theory behind the long-term behavior of a dynamical system that can be described by a nearly uncoupled matrix. Over the past fifty years this theory has been used in a variety of contexts, including queueing theory, brain organization, and ecology. In all these applications, the structure of the system is known and the point of interest is the various stages the system passes through on its way to some long-term equilibrium. This paper looks at this problem from the other direction. That is, we develop a technique for using the evolution of the system to tell us about its initial structure, and we use this technique to develop a new algorithm for data clustering. Comment: 23 pages
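The abstract does not spell out the algorithm, so the following is only a minimal sketch of the underlying idea, not the authors' method: in a nearly uncoupled stochastic matrix, the state vector equalises quickly inside each block and only slowly between blocks, so a few steps of evolution expose the block structure. The function name `simon_ando_clusters`, the parameters `steps` and `n_clusters`, and the gap-based grouping rule are all assumptions made for illustration.

```python
import numpy as np


def simon_ando_clusters(P, x0, steps=5, n_clusters=2):
    """Group states by the values they reach after a few steps of x <- x P.

    P is assumed to be a (nearly uncoupled) doubly stochastic matrix and
    x0 an arbitrary starting distribution. Hypothetical helper, for
    illustration only.
    """
    x = np.asarray(x0, dtype=float)
    P = np.asarray(P, dtype=float)
    for _ in range(steps):
        x = x @ P  # short-term evolution: entries inside a block equalise

    order = np.argsort(x)        # states sorted by their current value
    gaps = np.diff(x[order])     # differences between neighbouring values
    if n_clusters > 1:
        # Cut the sorted list at the largest gaps.
        cuts = {int(c) for c in np.argsort(gaps)[-(n_clusters - 1):]}
    else:
        cuts = set()

    labels = np.empty(len(x), dtype=int)
    label = 0
    for rank, state in enumerate(order):
        labels[state] = label
        if rank in cuts:
            label += 1
    return labels


# Two tightly coupled pairs of states with weak links between the pairs.
P = np.array([
    [0.500, 0.490, 0.005, 0.005],
    [0.490, 0.500, 0.005, 0.005],
    [0.005, 0.005, 0.500, 0.490],
    [0.005, 0.005, 0.490, 0.500],
])
labels = simon_ando_clusters(P, x0=[0.7, 0.1, 0.15, 0.05])
```

In this toy matrix, states 0 and 1 end up with nearly identical values after a few steps, as do states 2 and 3, so the single largest gap separates the two blocks.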

arXiv.org e-Print Archive

CiteSeerX

Crossref

Gettysburg College

A temporal precedence based clustering method for gene expression microarray data

Author: Buchanan-Wollaston Vicky
Krishna Ritesh V.
Li Chang-Tsun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Background: Time-course microarray experiments can produce useful data which can help in understanding the underlying dynamics of the system. Clustering is an important stage in microarray data analysis where the data are grouped together according to certain characteristics. The majority of clustering techniques are based on distance or visual similarity measures which may not be suitable for clustering of temporal microarray data where the sequential nature of time is important. We present a Granger causality based technique to cluster temporal microarray gene expression data, which measures the interdependence between two time-series by statistically testing whether one time-series can be used for forecasting the other. Results: A gene-association matrix is constructed by testing temporal relationships between pairs of genes using the Granger causality test. The association matrix is further analyzed using a graph-theoretic technique to detect highly connected components representing interesting biological modules. We test our approach on synthesized datasets and real biological datasets obtained for Arabidopsis thaliana. We show the effectiveness of our approach by analyzing the results using the existing biological literature. We also report interesting structural properties of the association network commonly desired in any biological system. Conclusions: Our experiments on synthesized and real microarray datasets show that our approach produces encouraging results. The method is simple in implementation and is statistically traceable at each step. The method can produce sets of functionally related genes which can be further used for reverse-engineering of gene circuits.
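The pipeline described in this abstract (pairwise Granger tests, then a graph analysis of the resulting association matrix) can be sketched roughly as follows. This is an illustrative simplification, not the authors' implementation: the names `granger_f_stat`, `association_matrix`, and `connected_components` are hypothetical, the decision rule is a raw threshold on the F statistic (`f_threshold`, an assumed parameter) rather than a proper p-value, and plain connected components stand in for the paper's "highly connected components".

```python
import numpy as np


def granger_f_stat(x, y, lag=1):
    """F statistic for 'past values of x help forecast y' (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y) - lag
    target = y[lag:]
    # Column k holds the series delayed by k steps, aligned with `target`.
    y_lags = np.column_stack([y[lag - k:len(y) - k] for k in range(1, lag + 1)])
    x_lags = np.column_stack([x[lag - k:len(x) - k] for k in range(1, lag + 1)])
    ones = np.ones((n, 1))
    restricted = np.hstack([ones, y_lags])       # y explained by its own past
    full = np.hstack([ones, y_lags, x_lags])     # ... plus the past of x

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    dof = n - full.shape[1]
    return ((rss_r - rss_f) / lag) / (rss_f / dof)


def association_matrix(series, lag=1, f_threshold=30.0):
    """adj[i, j] is True when series i appears to help forecast series j."""
    g = len(series)
    adj = np.zeros((g, g), dtype=bool)
    for i in range(g):
        for j in range(g):
            if i != j:
                adj[i, j] = granger_f_stat(series[i], series[j], lag) > f_threshold
    return adj


def connected_components(adj):
    """Label connected components of the undirected version of adj."""
    sym = adj | adj.T
    labels = -np.ones(len(sym), dtype=int)
    current = 0
    for start in range(len(sym)):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(sym[node]):
                if labels[nb] < 0:
                    labels[nb] = current
                    stack.append(int(nb))
        current += 1
    return labels
```

Given a list of equal-length expression time-series, `connected_components(association_matrix(series))` returns one integer label per gene. A real analysis would replace the fixed F threshold with a significance test and a multiple-testing correction.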

Deakin Research Online

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

Importance of Similarity Measure in Gene Expression Data-A Survey

Author: Tunga Arundhathi
Publication venue: Auricle Global Society of Education and Research
Publication date: 31/12/2017

The use of data mining techniques in computational biology research, including gene finding, genome assembly, and prediction of gene expression, is very promising because of the large amounts of data involved in these fields. These techniques aim to disclose unknown knowledge and relationships. Among the available data sources, DNA microarray technology enables researchers to investigate and address issues that are otherwise not traceable. DNA microarray experiments generate thousands of gene expression measurements and provide a simple way of collecting huge amounts of data in a short time. Microarray data analysis allows the identification of the genes most relevant to a target disease and of groups of genes with similar patterns under different experimental conditions. Clustering methods are widely used on gene expression data to categorize genes with similar expression profiles. The goal of clustering in microarray technology is to group genes or experiments into clusters according to a similarity measure. In this paper we introduce the concept of microarray technology and clustering on gene expression data, and survey similarity measures. We conclude that the similarity measure plays an important role when clustering, one of the data mining techniques, is applied to gene expression data.
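To make the point about similarity measures concrete, here is a small sketch comparing two commonly used choices on toy expression profiles. The abstract does not name specific measures, so Euclidean distance and Pearson-correlation distance are assumptions picked for illustration, as are the function names and the toy data.

```python
import numpy as np


def euclidean_distance(a, b):
    """Straight-line distance between two expression profiles."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))


def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for identical shape, 2 for opposite shape."""
    return float(1.0 - np.corrcoef(a, b)[0, 1])


# Toy profiles over four conditions (made-up values).
gene_a = [1, 2, 3, 4]       # rising
gene_b = [10, 20, 30, 40]   # same rising pattern, ten times the magnitude
gene_c = [4, 3, 2, 1]       # falling pattern, similar magnitude to gene_a

# Euclidean distance puts gene_a closer to gene_c (similar magnitude) ...
closer_by_euclid = euclidean_distance(gene_a, gene_c) < euclidean_distance(gene_a, gene_b)
# ... while Pearson distance puts gene_a closer to gene_b (same pattern).
closer_by_pearson = pearson_distance(gene_a, gene_b) < pearson_distance(gene_a, gene_c)
```

Both comparisons evaluate to true here, so the two measures would assign `gene_a` to different neighbours; which behaviour is appropriate depends on whether magnitude or pattern matters for the biological question.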

International Journal on Future Revolution in Computer Science & Communication Engineering