117 research outputs found

Unsupervised reduction of random noise in complex data by a row-specific, sorted principal component-guided method

Author: E Segal
Fumiaki Katagiri
HA David
JC Redman
Joseph W Foley
M Sato
MB Eisen
ME Wall
O Alter
R Development Core Team
SK Kim
The Gene Ontology Consortium
TR Hughes
TZ Berardini
WS Cleveland
Y Benjamini
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Large biological data sets, such as expression profiles, benefit from reduction of random noise. Principal component (PC) analysis has been used for this purpose, but it tends to remove small features as well as random noise. Results: We interpreted the PCs as a mere signal-rich coordinate system and sorted the squared PC-coordinates of each row in descending order. The sorted squared PC-coordinates were compared with the distribution of the ordered squared random noise, and PC-coordinates for insignificant contributions were treated as random noise and nullified. The processed data were transformed back to the initial coordinates as noise-reduced data. To increase the sensitivity of signal capture and reduce the effects of stochastic noise, this procedure was applied to multiple small subsets of rows randomly sampled from a large data set, and the results corresponding to each row of the data set from multiple subsets were averaged. We call this procedure Row-specific, Sorted PRincipal component-guided Noise Reduction (RSPR-NR). Robust performance of RSPR-NR, measured by noise reduction and retention of small features, was demonstrated using simulated data sets. Furthermore, when applied to an actual expression profile data set, RSPR-NR preferentially increased the correlations between genes that share the same Gene Ontology terms, strongly suggesting reduction of random noise in the data set. Conclusion: RSPR-NR is a robust random noise reduction method that retains small features well. It should be useful in improving the quality of large biological data sets.
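The row-wise nullification step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `noise_cutoffs` input (a hypothetical per-rank cutoff standing in for the comparison with the ordered squared random noise), and the rule of stopping at the first insignificant rank are all assumptions made for illustration.

```python
def nullify_noise(pc_coords, noise_cutoffs):
    """Zero the PC-coordinates of one row that look like random noise.

    pc_coords: the row's coordinates in the PC basis.
    noise_cutoffs: hypothetical cutoffs; noise_cutoffs[k] is the value the
        (k+1)-th largest squared coordinate must exceed to count as signal.
    """
    # Rank coordinate indices by squared magnitude, largest first.
    order = sorted(range(len(pc_coords)),
                   key=lambda i: pc_coords[i] ** 2, reverse=True)
    cleaned = [0.0] * len(pc_coords)
    for rank, idx in enumerate(order):
        if pc_coords[idx] ** 2 <= noise_cutoffs[rank]:
            break  # assumption: all lower-ranked coordinates are noise too
        cleaned[idx] = pc_coords[idx]
    return cleaned


def average_rows(estimates):
    """Average several noise-reduced versions of the same row element-wise."""
    n = len(estimates)
    return [sum(values) / n for values in zip(*estimates)]
```

In this sketch, `average_rows` stands in for the step that averages the results for one row across the multiple random subsets; computing the PCs and transforming back to the initial coordinates are left out.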

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Missing value imputation for microarray gene expression data using histone acetylation information

Author: AA Alizadeh
AL Clayton
AP Gasch
C Rich
Caisheng He
CM Perou
D Schubeler
DE Koryakov
DJ Duggan
DK Pokholok
E Segal
GC Yuan
GCLY Yuan
H Kim
H Yoshimoto
HY Yu
I Takemasa
J Tuikkala
JA Orr
Jiang Wang
Jihua Feng
JJ Hu
JL DeRisi
JL Schafer
KJ Kim
KW McCool
L Mariño-Ramírez
L Narlikar
L Verdone
M Ouyang
MB Eisen
MD Meneghini
MPS Brown
MS Kobor
MSB Sehgal
O Alter
O Troyanskaya
OJ Rando
P Johansson
P Spellman
Qian Xiang
RJA Little
S Chatterjee
S Oba
S Raychaudhuri
SA Armstrong
SC Kim
SK Kurdistani
TR Golub
TR O'Connor
TY Roh
X Feng
X Guo
Xianhua Dai
Yangyang Deng
Zhiming Dai
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Accurately estimating missing values in microarray data is an important pre-processing step, because complete datasets are required in numerous expression profile analyses in bioinformatics. Although several methods have been suggested, their performance is not satisfactory for datasets with high missing percentages. Results: The paper explores the feasibility of performing missing value imputation with the help of gene regulatory mechanisms. An imputation framework called the histone acetylation information aided imputation method (HAIimpute method) is presented. It incorporates histone acetylation information into the conventional KNN (k-nearest neighbor) and LLS (local least squares) imputation algorithms for final prediction of the missing values. The experimental results indicated that the use of acetylation information can provide significant improvements in microarray imputation accuracy. The HAIimpute methods consistently improve on widely used methods such as KNN and LLS in terms of normalized root mean squared error (NRMSE). Meanwhile, the genes imputed by HAIimpute methods are more correlated with the original complete genes in terms of Pearson correlation coefficients. Furthermore, the proposed methods also outperform GOimpute, an existing related method that uses functional similarity as the external information. Conclusion: We demonstrated that the use of histone acetylation information could greatly improve the performance of imputation, especially at high missing percentages. This idea can be generalized to various imputation methods to improve their performance. Moreover, as more knowledge accumulates on gene regulatory mechanisms in addition to histone acetylation, the performance of our approach can be further improved and verified.
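For orientation, the baseline that this abstract builds on can be sketched as follows. This is a plain, unweighted k-nearest-neighbor imputation and one common form of NRMSE (RMSE divided by the standard deviation of the true values); it is not the HAIimpute method, and the function names, the `None` encoding of missing values, and the choice of normalization are assumptions made for illustration.

```python
import math


def knn_impute(target, candidates, k=2):
    """Fill None entries of `target` with the mean of its k nearest complete rows."""
    observed = [j for j, v in enumerate(target) if v is not None]

    def distance(row):
        # Euclidean distance over the columns observed in the target row.
        return math.sqrt(sum((row[j] - target[j]) ** 2 for j in observed))

    neighbors = sorted(candidates, key=distance)[:k]
    return [
        v if v is not None else sum(r[j] for r in neighbors) / len(neighbors)
        for j, v in enumerate(target)
    ]


def nrmse(imputed, truth):
    """RMSE of the imputed values divided by the standard deviation of the truth."""
    n = len(truth)
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(imputed, truth)) / n)
    mean = sum(truth) / n
    std = math.sqrt(sum((b - mean) ** 2 for b in truth) / n)
    return rmse / std
```

A lower NRMSE means the imputed entries lie closer to the held-out true values; published KNN imputation variants typically weight neighbors by distance, which this sketch omits.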

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Pre-Whaling Genetic Diversity and Population Ecology in Eastern Pacific Gray Whales: Insights from Ancient DNA and Stable Isotopes

Author: A Rambaut
AJ Drummond
AM Springer
AP Rooney
B Shapiro
BM Henn
CD Phillips
CH Townsend
CNK Anderson
CS Baker
D Aurioles
D Posada
D Rugh
DA Henderson
DL Swofford
DW Rice
DY Yang
E Axelsson
ERM Druffel
GA Watterson
Huelsbeck
I Barnes
II Krupnik
J Roman
J Rozas
JA Jackson
JD Darling
KA Hobson
L Excoffier
L Nunney
M Clement
M de Bruyn
M Hasegawa
M Navascues
M Nei
M Voss
M Yoneda
MA Altabet
MA Beaumont
MA Suchard
Michael Knapp
ML Pinksky
ND Pyenson
P Clapham
P Hedrick
PJ Clapham
PR Wade
R Frankham
R LeDuc
RJ Francey
RK Burton
RR Hudson
RR Reeves
RS Waples
S Hideshima
S. Elizabeth Alter
SD Newsome
SE Alter
SE Moore
Seth D. Newsome
SH Ambrose
SS Kienast
ST Kalinowski
Stephen R. Palumbi
SYW Ho
TE Steeves
TR Frasier
WL Perryman
Y Ramakrishnan
YL Chan
Publication venue: Public Library of Science
Publication date: 09/05/2012

Commercial whaling decimated many whale populations, including the eastern Pacific gray whale, but little is known about how population dynamics or ecology differed prior to these removals. Of particular interest is the possibility of a large population decline prior to whaling, as such a decline could explain the ∼5-fold difference between genetic estimates of prior abundance and estimates based on historical records. We analyzed genetic (mitochondrial control region) and isotopic information from modern and prehistoric gray whales using serial coalescent simulations and Bayesian skyline analyses to test for a pre-whaling decline and to examine prehistoric genetic diversity, population dynamics and ecology. Simulations demonstrate that significant genetic differences observed between ancient and modern samples could be caused by a large, recent population bottleneck, roughly concurrent with commercial whaling. Stable isotopes show minimal differences between modern and ancient gray whale foraging ecology. Using rejection-based Approximate Bayesian Computation, we estimate the size of the population bottleneck at its minimum abundance and the pre-bottleneck abundance. Our results agree with previous genetic studies suggesting the historical size of the eastern gray whale population was roughly three to five times its current size.

Public Library of Science (PLOS)

City University of New York

Crossref

Directory of Open Access Journals

PubMed Central

Classification of heterogeneous microarray data by maximum entropy kernel

Author: AI Su
B Nilsson
B Rosner
B Scholköpf
B Schölkopf
DR Rhodes
GA Torunera
H Liu
H Lodhi
H Saigo
I Yanai
J Okutsu
JE Staunton
JM Boer
K Tsuda
L Liu
LJ van't Veer
ME Wall
N Cristianini
O Alter
R Kondor
RK O'Donnell
RW Tothill
S Ramaswamy
SM Flechnera
T Kato
TR Golub
Tsuyoshi Kato
V Vapnik
Wataru Fujibuchi
Z Liu
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: A large amount of microarray data is accumulating in public databases, providing varied data waiting to be analyzed jointly. Powerful kernel-based methods are commonly used in microarray analyses with support vector machines (SVMs) to approach a wide range of classification problems. However, the standard vectorial data kernel family (linear, RBF, etc.), which takes vectorial data as input, often fails in prediction if the data come from different platforms or laboratories, because of the low gene overlap or consistency between the different datasets. Results: We introduce a new type of kernel, called the maximum entropy (ME) kernel, into the field of SVM classification of microarray data. It has no pre-defined function but is generated by kernel entropy maximization with sample distance matrices as constraints. We assessed the performance of the ME kernel with three different datasets: heterogeneous kidney carcinoma, noise-introduced leukemia, and heterogeneous oral cavity carcinoma metastasis data. The results clearly show that the ME kernel is very robust for heterogeneous data containing missing values and high noise, and gives higher prediction accuracies than the standard kernels, namely linear, polynomial and RBF. Conclusion: The results demonstrate its utility in effectively analyzing promiscuous microarray data of rare specimens, e.g., minor diseases or species, for which it is difficult to compile homogeneous data in a single laboratory.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Diagnostic and prognostic prediction using gene expression profiles in high-dimensional microarray data

Author: A Ben-Dor
A Rosenwald
AA Alizadeh
C Ambroise
CH Ooi
D Faraggi
E Wang
H Zhang
J Khan
JM Deutsch
LJ van't-Veer
M Bittner
M West
MA Shipp
MD Radmacher
O Alter
R Simon
R Tibshirani
S Dudoit
S Kim
S Ramaswamy
TH Bo
TR Golub
Publication venue: Nature Publishing Group

Crossref

PubMed Central

Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets

Author: Adrian Vetta
AP Topchy
C Huttenhower
C Stark
CT Harbison
D Cheng
D Hanisch
E Segal
EA Elion
EH Davidson
EL Hong
EN Smith
Eric E. Schadt
G Chua
G Yvert
H Ge
H Wang
HW Mewes
I Lee
I Tirosh
I Ulitsky
J Ihmels
J Shi
JL DeRisi
JM Stuart
Jun Zhu
Jörg Stelling
K MacIsaac
L Pena-Castillo
M Ashburner
Manikandan Narayanan
MC Oldham
NM Luscombe
O Alter
P Langfelder
R Andersen
R Bonneau
R Guimera
R Kannan
R Sharan
S Arora
S Bergmann
SA Jelinsky
T Ideker
TR Golub
TR Hughes
U Brandes
U de Lichtenberg
Z Hu
Z Kutalik
Publication venue: Public Library of Science
Publication date: 01/04/2010

Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm, JointCluster, that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees, coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks, make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternative methods in recovering clusters implanted in networks with high false positive rates. In a systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Global Considerations in Hierarchical Clustering Reveal Meaningful Patterns in Data

Author: A Torrente
AK Jain
CF Zorumski
D Boley
D Horn
David Horn
G Getz
G Owsianik
H Chipman
J Handl
J Orlowski
JB Kruskal
Ji Zhu
LK Kaczmarek
M Berridge
M Rune
M Steinbach
MB Eisen
Michal Linial
MS Savaresi
N Kaplan
N Slonim
O Alter
O Sasson
P Cimiano
P D'Haeseleer
P Hansen
PJ Planet
Q Ren
R Apweiler
R Cangelosi
R Sharan
R Varshavsky
RO Duda
Roy Varshavsky
S Altschul
TK Landauer
TR Golub
Y Benjamini
Y Zhao
Publication venue: Public Library of Science
Publication date: 21/05/2008

BACKGROUND: A hierarchy, characterized by tree-like relationships, is a natural method of organizing data in various domains. When considering an unsupervised machine learning routine, such as clustering, a bottom-up hierarchical (BU, agglomerative) algorithm is used as a default and is often the only method applied. METHODOLOGY/PRINCIPAL FINDINGS: We show that hierarchical clustering algorithms that involve global considerations, such as top-down (TD, divisive) or glocal (global-local) algorithms, are better suited to reveal meaningful patterns in the data. This is demonstrated by testing the correspondence between the results of several algorithms (TD, glocal and BU) and the correct annotations provided by experts. The correspondence was tested in multiple domains including gene expression experiments, stock trade records and functional protein families. The performance of each of the algorithms is evaluated by statistical criteria that are assigned to clusters (nodes of the hierarchy tree) based on expert-labeled data. Whereas TD algorithms perform better on global patterns, BU algorithms perform well and are advantageous when finer granularity of the data is sought. In addition, a novel TD algorithm that is based on genuine density of the data points is presented and is shown to outperform other divisive and agglomerative methods. Application of the algorithm to more than 500 protein sequences belonging to ion channels illustrates the potential of the method for inferring overlooked functional annotations. ClustTree, a graphical Matlab toolbox for applying various hierarchical clustering algorithms and testing their quality, is made available. CONCLUSIONS: Although currently rarely used, global approaches, in particular TD or glocal algorithms, should be considered in the exploratory process of clustering. In general, applying unsupervised clustering methods can leverage the quality of manually created mappings of protein families. As demonstrated, it can also provide insights into erroneous and missed annotations.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Rules, Practices and Information Technology (IT): A Trifecta of Organizational Regulation

Author: Alter N
Bach K
Bhaskar R
Bourdieu P
Corbin J
Crozier M
Cyert RM
de Certeau M
Dewey J
François-Xavier de Vaujany
Gibson J
Giddens A
Gosden C
Hanseth O
Heidegger M
Hevner AR
Husserl E
Iivari J
Kalle Lyytinen
Lanzara GF
Latour B
Merleau-Ponty M
Merton RK
Mills AJ
Nicolini D
Panofsky E
Reynaud J-D
Reynaud JD
Schatzki TR
Simmel G
Stefan Haefliger
Taylor C
Twining W
Vladislav V. Fomin
von Wright GH
Weber M
Weick KE
Whitehead AN
Wittgenstein L
Yates J
Zuboff S
Publication venue: Institute for Operations Research and the Management Sciences (INFORMS)
Publication date: 01/01/2018

Although information technology (IT) based regulation has become critical and pervasive for contemporary organizing, Information Systems research mostly turns a deaf ear to the topic. Current explanations of IT-based regulation fit into received frameworks such as structuration theory, actor-network theory, or neo-institutional analyses but fail to recognize the unique capacities that IT and related IT-based regulatory practices offer as a powerful regulatory means. Any IT-based regulation system is made up of rules, practices and IT artifacts and their relationships. We propose this trifecta as a promising lens to study IT-based regulation in that it sensitizes scholars to how IT artifacts mediate rules and constitute regulatory processes embracing rules, capacities of IT endowed by the artifact, and organizational practices. We review the concepts of rules and IT-based regulation and identify two gaps in the current research on organizational regulation: 1) the critical role of sense-making as part of IT-based regulation, and 2) the challenge of temporally coupling rules and their enactment during IT-based regulation. To address these gaps we introduce the concept of regulatory episode as a unit of analysis for studying IT-based regulation. We also formulate a tentative research agenda for IT-based regulation that focuses on tensions triggered by the three key elements of the IT-based regulatory processes.

City Research Online

Crossref

Microarray gene expression profiling and analysis in renal cell carcinoma

Author: AF Fergany
Alexandru Almasan
AN Young
Andrew A Novick
CV Denis
DJ Lockhart
DT Ross
EA Castilla
GH Gulob
H Moch
H Murer
H Suzuki
HC King
HS Sonmez
I Takemasa
J DeRisi
J Xu
JD Brenton
JM Boer
John Hissong
Joseph A DiDonato
JPT Higgins
KM Yamada
L Belov
Louis S Liou
M Ashburner
M Schena
M Takahashi
M Zhou
Marek Skacel
ME Wall
MKS Yeung
N Obermuller
NS Holter
O Alter
P Russo
PC Walsh
Provash Sadhukhan
R Santer
RA Heller
RJ Amato
S Li
S Ramaswamy
S Suer
Sandy D Der
T Akashi
T Ebert
T Shi
Ting Shi
TR Golub
VE Reuter
Y Kariya
YL Zhao
Zhong-Hui Duan
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: Renal cell carcinoma (RCC) is the most common cancer in the adult kidney. The accuracy of current diagnosis and prognosis of the disease and the effectiveness of its treatment are limited by the poor understanding of the disease at the molecular level. To better understand the genetics and biology of RCC, we profiled the expression of 7,129 genes in both clear cell RCC tissue and cell lines using oligonucleotide arrays. METHODS: Total RNAs isolated from renal cell tumors, adjacent normal tissue and metastatic RCC cell lines were hybridized to Affymetrix HuFL oligonucleotide arrays. Genes were categorized into different functional groups based on the description of the Gene Ontology Consortium and analyzed based on the gene expression levels. Gene expression profiles of the tissue and cell line samples were visualized and classified by singular value decomposition. Reverse transcription polymerase chain reaction (RT-PCR) was performed to confirm the expression alterations of selected genes in RCC. RESULTS: Selected genes were annotated based on biological processes and clustered into functional groups. The expression levels of genes in each group were also analyzed. Seventy-four commonly differentially expressed genes with more than five-fold changes in RCC tissues were identified. The expression alterations of selected genes from these seventy-four genes were further verified using RT-PCR. Detailed comparison of gene expression patterns in RCC tissue and RCC cell lines shows significant differences between the two types of samples, but many important expression patterns were preserved. CONCLUSIONS: This is one of the initial studies that examine the functional ontology of a large number of genes in RCC. Extensive annotation, clustering and analysis of a large number of genes based on the gene functional ontology revealed many interesting gene expression patterns in RCC. Most notably, genes involved in cell adhesion were dominantly up-regulated whereas genes involved in transport were dominantly down-regulated. This study reveals significant gene expression alterations in key biological pathways and provides potential insights into understanding the molecular mechanism of renal cell carcinogenesis.

Crossref

Boston University Institutional Repository (OpenBU)

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Connecting genes, coexpression modules, and molecular signatures to environmental stress phenotypes in plants

Background: One of the eminent opportunities afforded by modern genomic technologies is the potential to provide a mechanistic understanding of the processes by which genetic change translates to phenotypic variation and the resultant appearance of distinct physiological traits. Indeed much progress has been made in this area, particularly in biomedicine where functional genomic information can be used to determine the physiological state (e.g., diagnosis) and predict phenotypic outcome (e.g., patient survival). Ecology currently lacks an analogous approach where genomic information can be used to diagnose the presence of a given physiological state (e.g., stress response) and then predict likely phenotypic outcomes (e.g., stress duration and tolerance, fitness). Results: Here, we demonstrate that a compendium of genomic signatures can be used to classify the plant abiotic stress phenotype in Arabidopsis according to the architecture of the transcriptome, and then be linked with gene coexpression network analysis to determine the underlying genes governing the phenotypic response. Using this approach, we confirm the existence of known stress responsive pathways and marker genes, report a common abiotic stress responsive transcriptome and relate phenotypic classification to stress duration. Conclusion: Linking genomic signatures to gene coexpression analysis provides a unique method of relating an observed plant phenotype to changes in gene expression that underlie that phenotype. Such information is critical to current and future investigations in plant biology and, in particular, to evolutionary ecology, where a mechanistic understanding of adaptive physiological responses to abiotic stress can provide researchers with a tool of great predictive value in understanding species and population level adaptation to climate change.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central