5,398 research outputs found

Joint mapping of genes and conditions via multidimensional unfolding analysis

Author: Engelen Kristof
Heiser Willem J
Marchal Kathleen
Van Deun Katrijn
Van Mechelen Iven
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: Microarray compendia profile the expression of genes in a number of experimental conditions. Such data compendia are useful not only to group genes and conditions based on their similarity in overall expression over profiles but also to gain information on more subtle relations between genes and conditions. Getting a clear visual overview of all these patterns in a single easy-to-grasp representation is a useful preliminary analysis step. We propose to use an advanced exploratory method, called multidimensional unfolding, for this purpose. Results: We present a novel algorithm for multidimensional unfolding that overcomes both general problems and problems that are specific to the analysis of gene expression data sets. Applying the algorithm to two publicly available microarray compendia illustrates its power as a tool for exploratory data analysis. The unfolding analysis of the first data set resulted in a two-dimensional representation that clearly reveals temporal regulation patterns for the genes and a meaningful structure for the time points, while the analysis of the second data set showed the algorithm's ability to go beyond a mere identification of those genes that discriminate between different patient or tissue types. Conclusion: Multidimensional unfolding offers a useful tool for preliminary explorations of microarray data. By relying on an easy-to-grasp low-dimensional geometric framework, relations among genes, among conditions, and between genes and conditions are simultaneously represented in an accessible way that may reveal interesting patterns in the data. An additional advantage of the method is that it can be applied to the raw data without requiring the choice of suitable genewise transformations of the data.

Springer - Publisher Connector

Directory of Open Access Journals

Ghent University Academic Bibliography

PubMed Central

DIAL UCLouvain

Temporal patterns of gene expression via nonmetric multidimensional scaling analysis

Author: Cho
Eisen
Iyer
Johansson
Kanaya
Kasturi
Shmulevich
Spellman
Y. Oono
Y.-h. Taguchi
Publication date: 02/08/2003

Motivation: Microarray experiments result in large-scale data sets that require extensive mining and refining to extract useful information. We have been developing an efficient novel algorithm for nonmetric multidimensional scaling (nMDS) analysis for very large data sets as a maximally unsupervised data mining device. We wish to demonstrate its usefulness in the context of bioinformatics. We also aim to demonstrate that intrinsically nonlinear methods are generally advantageous in data mining. Results: The Pearson correlation distance measure is used to indicate the dissimilarity of the gene activities in the transcriptional response of cell cycle-synchronized human fibroblasts to serum [Iyer et al., Science vol. 283, p83 (1999)]. These dissimilarity data have been analyzed with our nMDS algorithm to produce an almost circular arrangement of the genes. The temporal expression patterns of the genes rotate along this circular arrangement. If an appropriate preprocessing procedure is applied to the original data set, linear methods such as principal component analysis (PCA) can achieve reasonable results, but without data preprocessing linear methods such as PCA cannot produce a useful picture. Furthermore, even with appropriate data preprocessing, the outcomes of linear procedures are not as clear-cut as those of nMDS without preprocessing.

Comment: 11 pages, 6 figures + online only 2 color figures, submitted to Bioinformatics
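As a rough illustration of the dissimilarity input described in this abstract, the sketch below computes a Pearson correlation distance between expression profiles. It assumes the common definition d = 1 - r, which the abstract does not spell out; the function names `pearson_distance` and `distance_matrix` and the toy profiles are hypothetical, and the nMDS step itself is not shown.

```python
import math


def pearson_distance(x, y):
    """Return 1 - r, where r is the Pearson correlation of two profiles.

    Assumption: the distance is defined as 1 - r, so identical shapes
    give 0 and perfectly anti-correlated shapes give 2.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / (norm_x * norm_y)


def distance_matrix(profiles):
    """Pairwise distances between all profiles (one row per gene)."""
    return [[pearson_distance(p, q) for q in profiles] for p in profiles]


# Hypothetical toy data: three genes measured at four time points.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
dist = distance_matrix(profiles)
```

Because the distance depends only on the shape of each profile and not on its scale, the first two toy genes are treated as nearly identical even though their magnitudes differ.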

arXiv.org e-Print Archive

Crossref

Random matrix analysis for gene interaction networks in cancer cells

Author: Kikkawa Ayumi
Publication venue: Springer Science and Business Media LLC
Publication date: 13/07/2018

Investigations of topological uniqueness of gene interaction networks in cancer cells are essential for understanding this disease. Based on the random matrix theory, we study the distribution of the nearest neighbor level spacings P(s) of interaction matrices for gene networks in human cancer cells. The interaction matrices are computed using the Cancer Network Galaxy (TCNG) database, which is a repository of gene interactions inferred by a Bayesian network model. 256 NCBI GEO entries regarding gene expressions in human cancer cells have been selected for the Bayesian network calculations in TCNG. We observe the Wigner distribution of P(s) when the gene networks are dense networks that have more than ~38,000 edges. In the opposite case, when the networks have smaller numbers of edges, the distribution P(s) becomes the Poisson distribution. We investigate relevance of P(s) both to the size of the networks and to edge frequencies that manifest reliance of the inferred gene interactions.

Comment: 22 pages, 7 figures
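For reference, the two limiting forms of P(s) named in this abstract are usually written as below. This assumes that the spacings s are rescaled to unit mean and that "Wigner distribution" refers to the Wigner surmise for real symmetric (GOE) matrices; the abstract itself does not state the ensemble.

```latex
P_{\mathrm{Wigner}}(s) = \frac{\pi}{2}\, s \, \exp\!\left(-\frac{\pi s^{2}}{4}\right),
\qquad
P_{\mathrm{Poisson}}(s) = \exp(-s)
```

The Wigner form vanishes as s approaches 0 (level repulsion), whereas the Poisson form is largest at s = 0, which is what makes the two regimes easy to tell apart.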

arXiv.org e-Print Archive

OIST Institutional Repository

Institutional Repositories DataBase (IRDB)

Comparative study of unsupervised dimension reduction techniques for the visualization of microarray gene expression data

Author: A Antoniadis
A Butte
AL Boulesteix
B Nadler
B Schölkopf
B Schölkopf
C Chatfield
CC Chang
CCC Liu
Christian Ruckert
Christoph Bartenhagen
CL Nutt
D Geman
D Singh
DV Nguyen
H Hotelling
Hans-Ulrich Klein
HU Klein
I Del Giudice
IS Lim
IT Jolliffe
J Baek
J Misra
JB Tenenbaum
JI Powell
JJ Dai
K Dawson
KQ Weinberger
KQ Weinberger
KY Yeung
LJP Van der Maaten
LK Saul
M Belkin
M Belkin
M Mramor
M Vlachos
MA Hibbs
Martin Dugas
N Cristianini
N Pochet
O Chapelle
R Verhaak
R Xu
S Chao
S Lafon
SB Cho
ST Roweis
T Li
TF Cox
TJ Umpai
TR Golub
U Alon
VD Silva
X Lin
Xiaoyi Jiang
Y Su
Y Wang
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Visualization of DNA microarray data in two- or three-dimensional spaces is an important exploratory analysis step in order to detect quality issues or to generate new hypotheses. Principal Component Analysis (PCA) is a widely used linear method to define the mapping between the high-dimensional data and its low-dimensional representation. During the last decade, many new nonlinear methods for dimension reduction have been proposed, but it is still unclear how well these methods capture the underlying structure of microarray gene expression data. In this study, we assessed the performance of the PCA approach and of six nonlinear dimension reduction methods, namely Kernel PCA, Locally Linear Embedding, Isomap, Diffusion Maps, Laplacian Eigenmaps and Maximum Variance Unfolding, in terms of visualization of microarray data. Results: A systematic benchmark, consisting of Support Vector Machine classification, cluster validation and noise evaluations, was applied to ten microarray and several simulated datasets. Significant differences between PCA and most of the nonlinear methods were observed in two- and three-dimensional target spaces. With an increasing number of dimensions and an increasing number of differentially expressed genes, all methods showed similar performance. PCA and Diffusion Maps were less sensitive to noise than the other nonlinear methods. Conclusions: Locally Linear Embedding and Isomap showed superior performance on all datasets. In very low-dimensional representations and with few differentially expressed genes, these two methods preserve more of the underlying structure of the data than PCA, and thus are favorable alternatives for the visualization of microarray data.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Complete deconvolution of cellular mixtures based on linearity of transcriptional signatures

Author: Artyomov Maxim N.
Bambouskova Monika
Swain Amanda
Zaitsev Konstantin
Publication venue: Digital Commons@Becker
Publication date: 01/01/2019

Digital Commons@Becker

Processes underlying the nutritional programming of embryonic development by iron deficiency in the rat

Author: Gambling Lorraine
Hayes Helen
Langley-Evans Simon C
McArdle Harry J
McMullen Sarah
Swali Angelina
Publication venue: Public Library of Science (PLoS)
Publication date: 26/10/2012

Peer reviewed; Publisher PDF

Aberdeen University Research

Directory of Open Access Journals

PubMed Central

FigShare

ClgR regulation of chaperone and protease systems is essential for Mycobacterium tuberculosis parasitism of the macrophage

Author: Butler R.E.
Estorninho M.
Harders-Westerveen S.F.
Kierzek A.
Neyrolles O.
Smith H.E.
Stewart G.R.
Thole J.E.R.
Publication date: 01/01/2010

Chaperone and protease systems play essential roles in cellular homeostasis and have vital functions in controlling the abundance of specific cellular proteins involved in processes such as transcription, replication, metabolism and virulence. Bacteria have evolved accurate regulatory systems to control the expression and function of chaperones and potentially destructive proteases. Here, we have used a combination of transcriptomics, proteomics and targeted mutagenesis to reveal that the clp gene regulator (ClgR) of Mycobacterium tuberculosis activates the transcription of at least ten genes, including four that encode protease systems (ClpP1/C, ClpP2/C, PtrB and HtrA-like protease Rv1043c) and three that encode chaperones (Acr2, ClpB and the chaperonin Rv3269). Thus, M. tuberculosis ClgR controls a larger network of protein homeostatic and regulatory systems than ClgR in any other bacterium studied to date. We demonstrate that ClgR-regulated transcriptional activation of these systems is essential for M. tuberculosis to replicate in macrophages. Furthermore, we observe that this defect is manifest early in infection, as M. tuberculosis lacking ClgR is deficient in the ability to control phagosome pH 1 h post-phagocytosis.

Wageningen University & Research Publications

An Overview of the Use of Neural Networks for Data Mining Tasks

Author: Alberts B
Alpaydin E
Ando T
Blake CL
Bramer MA
Castanheira LG
Han J
Lu H
Mitchell M
Ni X
Quinlan RJ
Rumelhart DE
Shafer JC
Shendure J
Simić D
Stahl F
Steinwart I
Surjandari I
Wei JS
Widrow B
Witten IH
Zaslavsky B
Zhang D
Publication venue: Wiley
Publication date: 01/01/2012

In recent years, the area of data mining has experienced considerable demand for technologies that extract knowledge from large and complex data sources. There is substantial commercial interest, as well as research activity, aimed at developing new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular biologically inspired intelligent methodologies whose classification, prediction and pattern recognition capabilities have been utilised successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights, from a data mining perspective, the implementation of NN using supervised and unsupervised learning for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks.

Central Archive at the University of Reading

Crossref

Portsmouth University Research Portal (Pure)

Bournemouth University Research Online