124 research outputs found

    Statistical analysis and significance testing of serial analysis of gene expression data using a Poisson mixture model

    Background: Serial analysis of gene expression (SAGE) is used to obtain quantitative snapshots of the transcriptome. These profiles are count-based and are assumed to follow a binomial or Poisson distribution. However, tag counts observed across multiple libraries (for example, one or more groups of biological replicates) show additional variance that cannot be accommodated by this assumption alone. Several models have been proposed to account for this effect, all of which use a continuous prior distribution to explain the excess variance. Here, a Poisson mixture model, which assumes that excess variability arises from sampling a mixture of distinct components, is proposed, and the merits of this model are discussed and evaluated. Results: The goodness of fit of the Poisson mixture model on 15 sets of biological SAGE replicates is compared to the previously proposed hierarchical gamma-Poisson (negative binomial) model, and a substantial improvement is seen. Two further observations support the mixture model: 1) more mixture components are needed to fit the expression of tags representing more than one transcript; and 2) components tend to cluster libraries into the same groups. A confidence score is presented that can identify tags that are differentially expressed between groups of SAGE libraries, and several examples where this test outperforms previously proposed tests are highlighted. Conclusion: The Poisson mixture model performs well as a) a method to represent SAGE data from biological replicates, and b) a basis for assigning significance when testing for differential expression between multiple groups of replicates. Code for the R statistical software package is included to assist investigators in applying this model to their own data.
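    The abstract refers to accompanying R code but does not reproduce it here. As a rough, hedged sketch of the underlying idea, the Python fragment below fits a two-component Poisson mixture to a vector of tag counts from replicate libraries by EM and compares its log-likelihood with a moment-based negative binomial fit. The two-component choice, the toy counts, and the function names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' R code): fit a K-component Poisson
# mixture to tag counts observed across replicate SAGE libraries by EM,
# then compare its log-likelihood with a crude negative binomial fit.
import numpy as np
from scipy.stats import poisson, nbinom

def fit_poisson_mixture(counts, k=2, n_iter=200, seed=0):
    """EM for a K-component Poisson mixture on a vector of tag counts."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    lam = rng.uniform(counts.min() + 0.5, counts.max() + 0.5, size=k)  # component means
    pi = np.full(k, 1.0 / k)                                           # mixing weights
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each count
        log_r = np.log(pi) + poisson.logpmf(counts[:, None], lam[None, :])
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights and component means
        nk = r.sum(axis=0)
        pi = nk / len(counts)
        lam = (r * counts[:, None]).sum(axis=0) / nk
    loglik = np.log(np.exp(poisson.logpmf(counts[:, None], lam[None, :])) @ pi).sum()
    return lam, pi, loglik

# Toy data: one tag observed in 15 replicate libraries (counts per library).
tag_counts = np.array([3, 5, 4, 6, 2, 30, 28, 35, 4, 5, 3, 33, 29, 6, 31])

lam, pi, ll_mix = fit_poisson_mixture(tag_counts, k=2)

# Negative binomial fit by the method of moments, for comparison only.
m, v = tag_counts.mean(), tag_counts.var(ddof=1)
p = m / v if v > m else 0.99
n = m * p / (1 - p)
ll_nb = nbinom.logpmf(tag_counts, n, p).sum()

print("mixture means:", lam, "weights:", pi)
print("log-likelihood, mixture vs NB:", ll_mix, ll_nb)
```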

    Bias correction and Bayesian analysis of aggregate counts in SAGE libraries

    Background: Tag-based techniques, such as SAGE, are commonly used to sample the mRNA pool of an organism's transcriptome. Incomplete digestion during the tag formation process may allow multiple tags to be generated from a given mRNA transcript, and the probability of forming a tag varies with its relative location. As a result, the observed tag counts represent a biased sample of the actual transcript pool. In SAGE this bias can be avoided by ignoring all but the 3'-most tag, but doing so discards a large fraction of the observed data. Taking the bias into account should allow more of the available data to be used, leading to increased statistical power. Results: Three new hierarchical models, which directly embed a model for the variation in tag formation probability, are proposed and their associated Bayesian inference algorithms are developed. These models may be applied to libraries at both the tag and the aggregate level. Simulation experiments and analysis of real data are used to contrast the accuracy of the various methods. The consequences of tag formation bias are discussed in the context of testing differential expression, and a description is given of how these algorithms can be applied in that context. Conclusions: Several Bayesian inference algorithms that account for tag formation effects are compared, with the DPB algorithm providing clear evidence of superior performance. The accuracy of inferences when using a particular non-informative prior is found to depend on the expression level of a given gene. The multivariate nature of the approach easily allows both univariate and joint tests of differential expression. Calculations demonstrate the potential for false positive and false negative findings due to variation in tag formation probabilities across samples when testing for differential expression.
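    As a hedged illustration of the bias described above (not the paper's hierarchical models or the DPB algorithm), the simulation below assumes each anchoring-enzyme site is cleaved independently with probability p, so that the i-th site from the 3' end yields a tag with probability roughly p(1-p)^(i-1). The digestion probability, number of sites, and variable names are made up for demonstration.

```python
# Hedged illustration: simulate how incomplete digestion biases observed
# SAGE tag counts toward the 3'-most anchoring-enzyme site.
import numpy as np

rng = np.random.default_rng(1)

p_cut = 0.7              # assumed probability that any given site is cleaved
n_sites = 4              # sites in one transcript, ordered from the 3' end
n_transcripts = 100_000  # copies of the transcript sampled from the mRNA pool

# For each transcript copy, the tag forms at the first cleaved site from the 3' end.
cuts = rng.random((n_transcripts, n_sites)) < p_cut
first_cut = np.where(cuts.any(axis=1), cuts.argmax(axis=1), -1)  # -1: no tag formed

observed = np.bincount(first_cut[first_cut >= 0], minlength=n_sites)
expected = p_cut * (1 - p_cut) ** np.arange(n_sites) * n_transcripts

print("observed tag counts per site (3' -> 5'):", observed)
print("expected under p(1-p)^(i-1):            ", expected.round())
print("fraction of data discarded if only the 3'-most tag is kept:",
      1 - observed[0] / observed.sum())
```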

    Comparing the old and new generation SELDI-TOF MS: implications for serum protein profiling

    Background: Although the PBS-IIc SELDI-TOF MS apparatus has been extensively used in the search for better biomarkers, issues have been raised concerning the semi-quantitative nature of the technique and its reproducibility. To overcome these limitations, a new SELDI-TOF MS instrument has been introduced: the PCS 4000 series. Changes in this apparatus compared to the older one include an increased dynamic range of the detector, an adjusted configuration of the detector sensitivity, a raster scan that ensures more complete desorption coverage, and an improved detector attenuation mechanism. In the current study, we evaluated the performance of the old PBS-IIc and the new PCS 4000 series SELDI-TOF MS apparatus. Methods: To this end, two different sample sets were profiled, after which the same ProteinChip arrays were analysed successively by both instruments. Generated spectra were analysed by the associated software packages. The performance of both instruments was evaluated by assessing the number of peaks detected in the two sample sets, the biomarker potential and reproducibility of the generated peak clusters, and the number of peaks detected following serum fractionation. Results: We could not confirm the claimed improved performance of the new PCS 4000 instrument, as assessed by the number of peaks detected, the biomarker potential and the reproducibility. However, the PCS 4000 instrument did prove superior in peak detection following profiling of serum fractions. Conclusion: As serum fractionation facilitates detection of low-abundance proteins through reduction of the dynamic range of serum proteins, it is now increasingly applied in the search for new potential biomarkers. Hence, although the new PCS 4000 instrument did not differ from the old PBS-IIc apparatus in the analysis of crude serum, its superior performance after serum fractionation does hold promise for improved biomarker detection and identification.

    Biasogram: visualization of confounding technical bias in gene expression data.

    Gene expression profiles of clinical cohorts can be used to identify genes that are correlated with a clinical variable of interest such as patient outcome or response to a particular drug. However, expression measurements are susceptible to technical bias caused by variation in extraneous factors such as RNA quality and array hybridization conditions. If such technical bias is correlated with the clinical variable of interest, the likelihood of identifying false positive genes is increased. Here we describe a method to visualize an expression matrix as a projection of all genes onto a plane defined by a clinical variable and a technical nuisance variable. The resulting plot indicates the extent to which each gene is correlated with the clinical variable or the technical variable. We demonstrate this method by applying it to three clinical trial microarray data sets; in one of these, the analysis identified genes that may have been driven by a confounding technical variable. This approach can be used as a quality control step to identify data sets that are likely to yield false positive results.
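    A minimal sketch of the projection described above, assuming gene-wise Pearson correlations on simulated data; it is not the authors' published implementation. Each gene is plotted by its correlation with the clinical variable and with the technical nuisance variable, so genes far from the vertical axis but close to the horizontal axis are the ones most likely to reflect genuine clinical signal rather than technical bias.

```python
# Minimal sketch of the biasogram idea: project every gene onto the plane
# spanned by its correlation with a clinical variable and a technical
# nuisance variable. Assumed implementation, not the authors' package.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

n_genes, n_samples = 2000, 60
expr = rng.normal(size=(n_genes, n_samples))   # genes x samples expression matrix
clinical = rng.normal(size=n_samples)          # e.g. outcome score
technical = rng.normal(size=n_samples)         # e.g. RNA quality measure

def gene_correlations(expr, variable):
    """Pearson correlation of each gene (row) with a sample-level variable."""
    x = expr - expr.mean(axis=1, keepdims=True)
    y = variable - variable.mean()
    return (x @ y) / (np.sqrt((x ** 2).sum(axis=1)) * np.sqrt((y ** 2).sum()))

r_clin = gene_correlations(expr, clinical)
r_tech = gene_correlations(expr, technical)

plt.scatter(r_clin, r_tech, s=4, alpha=0.4)
plt.axhline(0, lw=0.5); plt.axvline(0, lw=0.5)
plt.xlabel("correlation with clinical variable")
plt.ylabel("correlation with technical variable")
plt.title("Biasogram-style projection (simulated data)")
plt.show()
```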

    Clustering-based approaches to SAGE data mining

    Serial analysis of gene expression (SAGE) is one of the most powerful tools for global gene expression profiling. It has led to several biological discoveries and biomedical applications, such as the prediction of new gene functions and the identification of biomarkers in human cancer research. Clustering techniques have become fundamental approaches in these applications. This paper reviews relevant clustering techniques specifically designed for this type of data. It places an emphasis on current limitations and opportunities in this area for supporting biologically meaningful data mining and visualisation.

    Methodological Deficits in Diagnostic Research Using ‘-Omics’ Technologies: Evaluation of the QUADOMICS Tool and Quality of Recently Published Studies

    Background: QUADOMICS is an adaptation of QUADAS (a quality assessment tool for use in systematic reviews of diagnostic accuracy studies) that takes into account the particular challenges presented by '-omics'-based technologies. Our primary objective was to evaluate the applicability and consistency of QUADOMICS. Subsequently we evaluated and describe the methodological quality of a sample of recently published studies using the tool. Methodology/Principal Findings: 45 '-omics'-based diagnostic studies were identified by systematic search of PubMed using suitable MeSH terms ('Genomics', 'Sensitivity and specificity', 'Diagnosis'). Three investigators independently assessed the quality of the articles using QUADOMICS and met to compare observations and generate a consensus. Consistency and applicability were assessed by comparing each reviewer's original rating with the consensus. Methodological quality was described using the consensus rating. Agreement was above 80% for all three reviewers. Four items presented difficulties with application, mostly due to the lack of a clearly defined gold standard. Methodological quality of our sample was poor; studies met roughly half of the applied criteria (mean ± sd, 54.7 ± 18.4%). Few studies were carried out in a population that mirrored the clinical situation in which the test would be used in practice (6, 13.3%); none described patient recruitment sufficiently; and less than half described clinical and physiological factors that might influence the biomarker profile (20, 44.4%). Conclusions: The QUADOMICS tool can be applied consistently to diagnostic '-omics' studies presently published in biomedical journals. A substantial proportion of reports in this research field fail to address design issues that are fundamental to making inferences relevant for patient care. © 2010 Parker et al. This work was supported by the Spanish Agency for Health Technology Assessment (Exp PI06/90311), Instituto de Salud Carlos III, and CIBER en Epidemiología y Salud Pública (CIBERESP), Spain. Peer reviewed.

    Comparison of normalisation methods for surface-enhanced laser desorption and ionisation (SELDI) time-of-flight (TOF) mass spectrometry data

    Background: Mass spectrometry for biological data analysis is an active field of research, providing an efficient way of high-throughput proteome screening. A popular variant of mass spectrometry is SELDI, which is often used to measure sample populations with the goal of developing (clinical) classifiers. Unfortunately, not only is the data resulting from such measurements quite noisy, but variance between replicate measurements of the same sample can be high as well. Normalisation of spectra can greatly reduce the effect of this technical variance and further improve the quality and interpretability of the data. However, it is unclear which normalisation method yields the most informative result. Results: In this paper, we describe the first systematic comparison of a wide range of normalisation methods, using two objectives that should be met by a good method: minimisation of inter-spectra variance and maximisation of signal with respect to class separation. The former is assessed using an estimation of the coefficient of variation, the latter using the classification performance of three types of classifiers on real-world datasets representing two-class diagnostic problems. To obtain a maximally robust evaluation of a normalisation method, both objectives are evaluated over multiple datasets and multiple configurations of baseline correction and peak detection methods. Results are assessed for statistical significance and visualised to reveal the performance of each normalisation method, in particular with respect to using no normalisation. The normalisation methods described have been implemented in the freely available MASDA R-package. Conclusion: In the general case, normalisation of mass spectra is beneficial to the quality of data. The majority of methods we compared performed significantly better than the case in which no normalisation was used. We have shown that normalisation methods that scale spectra by a factor based on the dispersion (e.g., standard deviation) of the data clearly outperform those where a factor based on the central location (e.g., mean) is used. Additional improvements in performance are obtained when these factors are estimated locally, using a sliding window within spectra, instead of globally, over full spectra. The underperforming category of methods using a globally estimated factor based on the central location of the data includes the method used by the majority of SELDI users.
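    As a hedged sketch of the central-location versus dispersion distinction drawn in the conclusion, the code below scales simulated replicate spectra either by their mean or by their standard deviation and reports the remaining median coefficient of variation. The locally estimated (sliding-window) variants and the MASDA package itself are not reproduced; the simulated data and numbers are illustrative only.

```python
# Hedged sketch: compare central-location vs dispersion-based normalisation
# of replicate spectra by the residual coefficient of variation (CV).
# Mirrors the abstract's comparison in spirit only; not the MASDA code.
import numpy as np

rng = np.random.default_rng(3)

n_spectra, n_points = 20, 5000
base = np.abs(np.sin(np.linspace(0, 20, n_points))) + 0.1   # shared "true" signal
scale = rng.uniform(0.5, 2.0, size=n_spectra)               # per-spectrum technical scaling
spectra = scale[:, None] * base[None, :] + rng.normal(0, 0.02, (n_spectra, n_points))

def normalise(spectra, factor):
    """Divide each spectrum by its per-spectrum scaling factor."""
    return spectra / factor[:, None]

by_mean = normalise(spectra, spectra.mean(axis=1))   # central-location factor
by_std = normalise(spectra, spectra.std(axis=1))     # dispersion factor

def median_cv(spectra):
    """Median across m/z points of the between-spectrum coefficient of variation."""
    return np.median(spectra.std(axis=0) / spectra.mean(axis=0))

print("median CV, raw:        ", round(median_cv(spectra), 4))
print("median CV, mean-scaled:", round(median_cv(by_mean), 4))
print("median CV, std-scaled: ", round(median_cv(by_std), 4))
```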

    Multicentric validation of proteomic biomarkers in urine specific for diabetic nephropathy

    Background: Urine proteome analysis is rapidly emerging as a tool for diagnosis and prognosis in disease states. For diagnosis of diabetic nephropathy (DN), urinary proteome analysis was successfully applied in a pilot study. The validity of the previously established proteomic biomarkers with respect to their diagnostic and prognostic potential was assessed on a separate set of patients recruited at three different European centers. In this case-control study of 148 Caucasian patients with type 2 diabetes mellitus and a diabetes duration of ≥5 years, cases of DN were defined by albuminuria >300 mg/d and diabetic retinopathy (n = 66). Controls were matched for gender and diabetes duration (n = 82). Methodology/Principal Findings: Proteome analysis was performed blinded using high-resolution capillary electrophoresis coupled with mass spectrometry (CE-MS). Data were evaluated employing the previously developed model for DN. Upon unblinding, the model for DN showed 93.8% sensitivity and 91.4% specificity, with an AUC of 0.948 (95% CI 0.898-0.978). Of 65 previously identified peptides, 60 were significantly different between cases and controls of this study. In fewer than 10% of cases and controls, classification by proteome analysis did not fully agree with the clinical classification. Analysis of the patients' subsequent clinical course revealed later progression to DN in some of the control patients classified as false positives. Conclusions: These data provide the first independent confirmation that profiling of the urinary proteome by CE-MS can adequately identify subjects with DN, supporting the generalizability of this approach. The data further establish urinary collagen fragments as biomarkers for diabetes-induced renal damage that may serve as earlier and more specific biomarkers than the currently used urinary albumin.
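    For readers who want to compute the same kind of validation summary (sensitivity, specificity, AUC with a confidence interval) on their own blinded classifier scores, the generic sketch below shows one way to do so with scikit-learn. The simulated scores, the decision threshold, and the bootstrap interval are illustrative assumptions, not the study's CE-MS model or its reported numbers.

```python
# Generic sketch of blinded-validation metrics (sensitivity, specificity,
# AUC with a bootstrap CI); the scores below are simulated, not the
# study's CE-MS classifier output.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(4)

# Simulated blinded validation set: 66 cases (DN) and 82 matched controls.
y_true = np.concatenate([np.ones(66, int), np.zeros(82, int)])
scores = np.concatenate([rng.normal(1.5, 1.0, 66), rng.normal(-1.0, 1.0, 82)])

y_pred = (scores > 0).astype(int)                 # assumed decision threshold
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
auc = roc_auc_score(y_true, scores)

# Simple bootstrap 95% CI for the AUC.
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), len(y_true))
    if len(np.unique(y_true[idx])) == 2:          # need both classes in the resample
        boot.append(roc_auc_score(y_true[idx], scores[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"sensitivity {sensitivity:.3f}, specificity {specificity:.3f}")
print(f"AUC {auc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```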

    An integrative multi-platform analysis for discovering biomarkers of osteosarcoma

    Background: SELDI-TOF-MS (Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry) has become an attractive approach for cancer biomarker discovery due to its ability to resolve low-mass proteins and its high-throughput capability. However, the analytes from mass spectrometry are described only by their mass-to-charge ratio (m/z) values, without further identification and annotation. To discover potential biomarkers for early diagnosis of osteosarcoma, we designed an integrative workflow combining data sets from both SELDI-TOF-MS and gene microarray analysis. Methods: After extracting the information for potential biomarkers from the SELDI data and the microarray analysis, their associations were further inferred by link-test to identify biomarkers that could likely be used for diagnosis. Immunoblot analysis was then performed to examine whether the expression of the putative biomarkers was indeed altered in serum from patients with osteosarcoma. Results: Six differentially expressed protein peaks with strong statistical significance were detected by SELDI-TOF-MS; four of the proteins were up-regulated and two were down-regulated. Microarray analysis showed that, compared with an osteoblastic cell line, the expression of 653 genes was changed more than 2-fold in three osteosarcoma cell lines: expression of 310 genes was increased and expression of the other 343 genes was decreased. The two sets of biomarker candidates were combined by the link-test statistics, indicating that 13 genes were potential biomarkers for early diagnosis of osteosarcoma. Among these genes, cytochrome c1 (CYC-1) was selected for further experimental validation. Conclusion: Link-test on datasets from both SELDI-TOF-MS and microarray high-throughput analysis can accelerate the identification of tumor biomarkers. The result confirmed that CYC-1 may be a promising biomarker for early diagnosis of osteosarcoma.
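    The link-test statistic itself is not described in the abstract; the fragment below is only a naive, assumed illustration of the integrative step, matching SELDI peak masses against the approximate protein masses of differentially expressed genes and requiring a consistent direction of change. The mass tolerance, the hypothetical gene entries, and the peak values are invented for demonstration.

```python
# Naive illustration of the integrative step (not the link-test statistic):
# match SELDI peak masses against approximate protein masses of
# differentially expressed genes and keep overlapping candidates.
# All data below are made up for demonstration.

# SELDI peaks: (m/z value, direction of change in osteosarcoma serum)
seldi_peaks = [(11700.0, "up"), (13800.0, "down"), (27900.0, "up")]

# Microarray candidates: gene -> (approx. protein mass in Da, fold change)
array_candidates = {
    "CYC1": (27800.0, 2.6),     # cytochrome c1, up-regulated (from the abstract)
    "GENE_A": (11650.0, -3.1),  # hypothetical down-regulated gene
    "GENE_B": (45000.0, 2.2),   # hypothetical gene with no matching peak
}

MASS_TOLERANCE = 150.0  # assumed matching window in Da

linked = []
for gene, (mass, fold) in array_candidates.items():
    for mz, direction in seldi_peaks:
        same_direction = (fold > 0) == (direction == "up")
        if abs(mass - mz) <= MASS_TOLERANCE and same_direction:
            linked.append((gene, mz, fold))

print("candidate biomarkers supported by both platforms:", linked)
```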