
    Knowledge-based gene expression classification via matrix factorization

    Motivation: Modern machine learning methods based on matrix decomposition techniques, like independent component analysis (ICA) or non-negative matrix factorization (NMF), provide new and efficient analysis tools which are currently being explored for the analysis of gene expression profiles. These exploratory feature extraction techniques yield expression modes (ICA) or metagenes (NMF). The extracted features are considered indicative of underlying regulatory processes. They can also be applied to the classification of gene expression datasets, grouping samples into different categories for diagnostic purposes or grouping genes into functional categories for further investigation of related metabolic pathways and regulatory networks. Results: In this study we focus on unsupervised matrix factorization techniques and apply ICA and sparse NMF to microarray datasets. The latter monitor the gene expression levels of human peripheral blood cells during differentiation from monocytes to macrophages. We show that these tools are able to identify relevant signatures in the deduced component matrices and extract informative sets of marker genes from these gene expression profiles. The methods rely on the joint discriminative power of a set of marker genes rather than on single marker genes. With these sets of marker genes, corroborated by leave-one-out or random forest cross-validation, the datasets could easily be classified into related diagnostic categories, corresponding either to monocytes versus macrophages or to healthy versus Niemann-Pick C disease patients. Funding: Siemens AG, Munich; DFG (Graduate College 638); DAAD (PPP Luso-Alemã and PPP Hispano-Alemanas).
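    As an illustration of the metagene idea, the sketch below factorizes a simulated non-negative expression matrix with the classic Lee-Seung multiplicative updates. It is a minimal stand-in for the sparse NMF and ICA actually used in the study; all data here are synthetic.

```python
import numpy as np

def nmf(V, k, n_iter=500, eps=1e-9):
    """Lee-Seung multiplicative updates: factor V (genes x samples) into
    non-negative W (genes x k metagenes) and H (k x samples)."""
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update metagene weights
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update gene loadings
    return W, H

# Simulated data: 100 genes, 20 samples, driven by 2 hidden metagenes.
rng = np.random.default_rng(1)
V = np.abs(rng.normal(size=(100, 2))) @ np.abs(rng.normal(size=(2, 20)))
W, H = nmf(V, k=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)  # relative reconstruction error
```

    Samples could then be grouped for classification by their dominant metagene, i.e. the row of H carrying the largest weight for each sample.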

    Evaluating the discriminating capacity of cell death (apoptotic) biomarkers in sepsis.

    Background: Sepsis biomarker panels that provide diagnostic and prognostic discrimination in sepsis patients would be transformative to patient care. We assessed the mortality prediction and diagnostic discriminatory accuracy of two biomarkers reflective of cell death (apoptosis): circulating cell-free DNA (cfDNA) and nucleosomes. Methods: The cfDNA and nucleosome levels were assayed in plasma samples acquired from patients admitted from four emergency departments with suspected sepsis. Subjects with non-infectious systemic inflammatory response syndrome (SIRS) served as controls. Samples were acquired at enrollment (T0) and 24 h later (T24). We assessed diagnostic (differentiating SIRS from sepsis) and prognostic (28-day mortality) predictive power. Models incorporating procalcitonin (diagnostic prediction) and APACHE II scores (mortality prediction) were generated. Results: Two hundred three subjects were included (107 provided procalcitonin measurements). Four subjects exhibited uncomplicated sepsis, 127 severe sepsis, 35 septic shock, and 24 had non-infectious SIRS. There were 190 survivors and 13 non-survivors. Mortality prediction models using cfDNA, nucleosomes, or APACHE II yielded AUC values of 0.61, 0.75, and 0.81, respectively. A model combining nucleosomes with the APACHE II score improved the AUC to 0.84. Diagnostic models distinguishing sepsis from SIRS using procalcitonin, cfDNA(T0), or nucleosomes(T0) yielded AUC values of 0.64, 0.65, and 0.63, respectively. The three-parameter model yielded an AUC of 0.74. Conclusions: To our knowledge, this is the first head-to-head comparison of cfDNA and nucleosomes in diagnosing sepsis and predicting sepsis-related mortality. Both cfDNA and nucleosome concentrations demonstrated a modest ability to distinguish sepsis survivors from non-survivors and provided additive diagnostic predictive accuracy in differentiating sepsis from non-infectious SIRS when integrated into a diagnostic prediction model including PCT and APACHE II. A sepsis biomarker strategy incorporating measures of the apoptotic pathway may serve as an important component of a sepsis diagnostic and mortality prediction tool.
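    The AUC values reported above summarize discrimination as the probability that a randomly chosen positive case (e.g. a non-survivor) receives a higher biomarker score than a randomly chosen negative one. A minimal sketch of that rank-sum computation, on illustrative toy numbers rather than the study's measurements:

```python
import numpy as np

def auc(labels, scores):
    """AUC via the Mann-Whitney identity: the probability that a randomly
    chosen positive outscores a randomly chosen negative (ties count half)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Toy data: label 1 = non-survivor; the marker trends higher in that group.
y = [0, 0, 0, 1, 1]
marker = [1.0, 2.0, 3.0, 2.5, 4.0]
```

    Here `auc(y, marker)` is 5/6: of the six positive-negative pairs, the positive case scores higher in five. An AUC of 0.5 is chance-level discrimination; 1.0 is perfect separation.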

    D-MaPs - DNA-microarray projects: Web-based software for multi-platform microarray analysis

    The web application D-MaPs provides a user-friendly interface to researchers performing studies based on microarrays. The program was developed to manage and process one- or two-color microarray data obtained from several platforms (currently, GeneTAC, ScanArray, CodeLink, NimbleGen and Affymetrix). Although many algorithms and software programs for microarray analysis are available on the internet, they usually require sophisticated knowledge of mathematics, statistics and computation. D-MaPs was developed to remove the need for high-performance computers or programming experience. D-MaPs performs raw data processing, normalization and statistical analysis, giving access to the analyzed data in text or graphical format. An original feature of D-MaPs is its GEO (Gene Expression Omnibus) submission format service. The D-MaPs application has already been used for the analysis of oligonucleotide microarrays and PCR-spotted arrays (one- and two-color, laser and light scanner). In conclusion, D-MaPs is a valuable tool for the microarray research community, especially for groups without a bioinformatics core.

    Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

    Background: The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets are characterized by the presence of organisms with varying GC composition, codon usage biases, etc., and consequently gene identification is challenging. The vast amount of sequence data also requires faster protein family classification tools. Results: We present a computational improvement to a sequence clustering approach that we developed previously to identify and classify protein coding genes in large microbial metagenomic datasets. The clustering approach can be used to identify protein coding genes in prokaryotes, viruses, and intron-less eukaryotes. The computational improvement is based on an incremental clustering method that does not require the expensive all-against-all computation of the original approach, while still preserving its remote homology detection capabilities. We present evaluations of the clustering approach in protein-coding gene identification and classification, and also present the results of updating the protein clusters from our previous work with recent genomic and metagenomic sequences. The clustering results are available via CAMERA (http://camera.calit2.net). Conclusion: The clustering paradigm is shown to be a very useful tool in the analysis of microbial metagenomic data. The incremental clustering method is shown to be much faster than the original approach in identifying genes, grouping sequences into existing protein families, and identifying novel families that have multiple members in a metagenomic dataset. These clusters provide a basis for further studies of protein families.
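    The incremental idea can be sketched as follows: each incoming sequence is compared only against one representative per existing cluster, never against every sequence seen so far. The k-mer Jaccard similarity below is a hypothetical stand-in for the alignment-based scoring a real pipeline would use; the threshold and sequences are illustrative.

```python
def kmers(seq, k=3):
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets (toy stand-in for alignment scores)."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb) if ka | kb else 0.0

def incremental_cluster(seqs, threshold=0.5):
    """Greedy incremental clustering: cost is O(n * #clusters) comparisons
    against representatives, not the O(n^2) all-against-all compute."""
    reps, clusters = [], []
    for s in seqs:
        for i, r in enumerate(reps):
            if similarity(s, r) >= threshold:
                clusters[i].append(s)   # joins an existing family
                break
        else:                           # no representative matched:
            reps.append(s)              # the sequence founds a new cluster
            clusters.append([s])
    return clusters
```

    With `["MKVLAAGI", "MKVLAAGL", "TTTPPPQQ"]` the first two sequences share most k-mers and fall into one cluster, while the third founds its own, yielding two clusters.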

    Functional Annotation and Identification of Candidate Disease Genes by Computational Analysis of Normal Tissue Gene Expression Data

    Background: High-throughput gene expression data can predict gene function through the “guilt by association” principle: coexpressed genes are likely to be functionally associated. Methodology/Principal Findings: We analyzed publicly available expression data on normal human tissues. The analysis is based on the integration of data obtained with two experimental platforms (microarrays and SAGE) and of various measures of dissimilarity between expression profiles. The building blocks of the procedure are the Ranked Coexpression Groups (RCG), small sets of tightly coexpressed genes which are analyzed in terms of functional annotation. Functionally characterized RCGs are selected by means of the majority rule and used to predict new functional annotations. Functionally characterized RCGs are enriched in groups of genes associated with similar phenotypes. We exploit this fact to find new candidate disease genes for many OMIM phenotypes of unknown molecular origin. Conclusions/Significance: We predict new functional annotations for many human genes, showing that the integration of different data sets and coexpression measures significantly improves the scope of the results. Combining gene expression data, functional annotation and known phenotype-gene associations, we provide candidate genes for several genetic diseases.
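    A minimal sketch of the two building blocks, a ranked coexpression group around a seed gene and majority-rule annotation. The Pearson correlation, toy expression matrix, and labels below are illustrative assumptions, not the paper's actual dissimilarity measures or data.

```python
import numpy as np
from collections import Counter

def ranked_coexpression_group(expr, seed, size):
    """Indices of the `size` genes most correlated with gene `seed`
    (expr is genes x tissues; the seed itself is excluded)."""
    r = np.corrcoef(expr)[seed]
    r[seed] = -np.inf                     # never rank the seed in its own group
    return np.argsort(r)[::-1][:size]

def majority_annotation(labels):
    """Majority rule: annotate the group with its most common known label."""
    known = [l for l in labels if l is not None]
    return Counter(known).most_common(1)[0][0] if known else None

# Toy profiles over 8 tissues: gene 1 tracks gene 0, gene 2 is anti-correlated.
t = np.arange(8.0)
expr = np.vstack([t, 2 * t + 1, 7 - t, np.cos(t)])
group = ranked_coexpression_group(expr, seed=0, size=1)
```

    Here the group around gene 0 is just gene 1, the only perfectly correlated profile; an unannotated seed gene then inherits the group's majority label as its predicted function.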

    High-confidence glycosome proteome for procyclic form Trypanosoma brucei by epitope-tag organelle enrichment and SILAC proteomics

    The glycosome of the pathogenic African trypanosome Trypanosoma brucei is a specialized peroxisome that contains most of the enzymes of glycolysis and several other metabolic and catabolic pathways. The contents and transporters of this membrane-bounded organelle are of considerable interest as potential drug targets. Here we use epitope tagging, magnetic bead enrichment, and SILAC quantitative proteomics to determine a high-confidence glycosome proteome for the procyclic life cycle stage of the parasite, using isotope ratios to discriminate glycosomal from mitochondrial and other contaminating proteins. The data confirm the presence of several previously demonstrated and suggested pathways in the organelle and identify previously unanticipated activities, such as protein phosphatases. The implications of the findings are discussed.

    Lone-pair stabilization in transparent amorphous tin oxides: a potential route to p-type conduction pathways

    The electronic and atomic structures of amorphous transparent tin oxides have been investigated by a combination of X-ray spectroscopy and atomistic calculations. Crystalline SnO is a promising p-type transparent oxide semiconductor due to a complex lone-pair hybridization that affords both optical transparency, despite a small electronic band gap, and spherical s-orbital character at the valence band edge. We find that both of these desirable properties (transparency and s-orbital valence band character) are retained upon amorphization, despite the disruption of the layered lone-pair states by structural disorder. We explain the anomalously large band gap widening necessary to maintain transparency in terms of lone-pair stabilization via atomic clustering. Our understanding of this mechanism suggests that continuous hole conduction pathways along extended lone-pair clusters should be possible at certain stoichiometries. Moreover, these findings should be applicable to other lone-pair active semiconductors.

    Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data

    Background: Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods, with a wide range of scientific applications, the SVM does not include automatic feature selection, and therefore a number of feature selection procedures have been developed. Regularisation approaches extend the SVM to a feature selection method in a flexible way, using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of the SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which finds a global optimal solution more rapidly and more precisely than a fixed grid search. Results: Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to changes in model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers, in terms of the median number of features selected, than Elastic Net SVM, and often predicted better than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above to four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in both sparse and non-sparse situations. Conclusions: The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty while avoiding its sparsity limitations for non-sparse data. We were the first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions for the optimization of tuning parameters. The penalized SVM classification algorithms, as well as fixed grid and interval search for finding appropriate tuning parameters, were implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks in high-dimensional data such as microarray data sets.
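    For intuition about penalized SVMs, the sketch below fits a linear SVM with the simpler elastic-net penalty (L1 plus ridge) by plain subgradient descent on toy data. SCAD itself is non-convex and the authors' actual implementation is the R package 'penalizedSVM', so this is only a didactic approximation of the combined-penalty idea; the data and hyperparameters are made up.

```python
import numpy as np

def elastic_svm(X, y, lam1=0.01, lam2=0.01, lr=0.01, epochs=500):
    """Linear SVM with an elastic-net penalty, fitted by subgradient descent:
    minimize mean hinge loss + lam1*||w||_1 + lam2*||w||_2^2, with y in {-1,+1}.
    The L1 term drives uninformative weights toward zero (feature selection);
    the ridge term keeps the solution stable when features are correlated."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1                    # margin violators
        grad_w = -(y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        grad_w += lam1 * np.sign(w) + 2 * lam2 * w      # elastic-net subgradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = np.array([[2.0, 0.1], [3.0, -0.1], [-2.0, 0.1], [-3.0, -0.1]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = elastic_svm(X, y)
```

    On this toy problem the fitted weight on the noise feature stays at zero while the informative feature carries the decision, which is the sparsity behavior the combined penalties aim for.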

    Auditory-inspired morphological processing of speech spectrograms: applications in automatic speech recognition and speech enhancement

    New auditory-inspired speech processing methods are presented in this paper, combining spectral subtraction and two-dimensional non-linear filtering techniques originally conceived for image processing purposes. In particular, mathematical morphology operations, like erosion and dilation, are applied to noisy speech spectrograms using specifically designed structuring elements inspired by the masking properties of the human auditory system. This is effectively complemented by a pre-processing stage comprising the conventional spectral subtraction procedure and auditory filterbanks. These methods were tested in both speech enhancement and automatic speech recognition tasks. For the former, time-frequency anisotropic structuring elements over grey-scale spectrograms were found to provide better perceptual quality than isotropic ones, proving more appropriate, under a number of perceptual quality estimation measures and several signal-to-noise ratios on the Aurora database, for retaining the structure of speech while removing background noise. For the latter, the combination of spectral subtraction and auditory-inspired morphological filtering was found to improve recognition rates on a noise-contaminated version of the Isolet database. This work has been partially supported by the Spanish Ministry of Science and Innovation, CICYT Project No. TEC2008-06382/TEC.
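    The morphological core of the method can be illustrated with a grey-scale opening (erosion followed by dilation) over a toy spectrogram. The rectangular, time-elongated footprint below is a simplified stand-in for the auditory-inspired structuring elements designed in the paper, and the "spectrogram" is synthetic.

```python
import numpy as np
from scipy.ndimage import grey_opening

# Toy spectrogram (freq bins x time frames): one sustained formant-like track.
spec = np.zeros((32, 64))
spec[10:13, :] = 1.0
noisy = spec.copy()
noisy[5, 20] = 1.0           # isolated noise "specks", shorter than any
noisy[25, 50] = 1.0          # plausible speech structure

# Opening = erosion then dilation. An anisotropic footprint elongated along
# time (1 frequency bin x 5 frames) keeps sustained speech energy and deletes
# anything shorter than the window.
cleaned = grey_opening(noisy, size=(1, 5))
```

    The sustained track survives untouched while both isolated specks are erased, which is the sense in which anisotropic elements "retain the structure of speech while removing background noise".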

    Microarray scanner calibration curves: characteristics and implications

    BACKGROUND: Microarray-based measurement of mRNA abundance assumes a linear relationship between the fluorescence intensity and the dye concentration. In reality, however, the calibration curve can be nonlinear. RESULTS: By scanning a microarray scanner calibration slide containing known concentrations of fluorescent dyes under 18 PMT gains, we were able to evaluate the differences in calibration characteristics of Cy5 and Cy3. First, the calibration curve for the same dye under the same PMT gain is nonlinear at both the high and low intensity ends. Second, the degree of nonlinearity of the calibration curve depends on the PMT gain. Third, the two PMTs (for Cy5 and Cy3) behave differently even under the same gain. Fourth, the background intensity for the Cy3 channel is higher than that for the Cy5 channel. The impact of such characteristics on the accuracy and reproducibility of measured mRNA abundance and the calculated ratios was demonstrated. Combined with simulation results, we provided explanations for the existence of ratio underestimation, the intensity-dependence of ratio bias, and the anti-correlation of ratios in dye-swap replicates. We further demonstrated that although Lowess normalization effectively eliminates the intensity-dependence of ratio bias, the systematic deviation from true ratios largely remains. A method of calculating ratios based on concentrations estimated from the calibration curves was proposed for correcting ratio bias. CONCLUSION: It is preferable to scan microarray slides at fixed, optimal gain settings under which the linearity between concentration and intensity is maximized. Although normalization methods improve the reproducibility of microarray measurements, they appear less effective in improving accuracy.
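    The proposed correction can be sketched as follows: invert a measured calibration curve to map intensities back to concentrations before forming ratios. The saturating response function below is a made-up toy, not the scanner's actual characteristic, but it reproduces the ratio-underestimation effect the abstract describes.

```python
import numpy as np

def scanner(conc, sat=1000.0):
    """Made-up saturating response: near-linear at low dye concentration,
    compressed at the high end (mimicking calibration-curve nonlinearity)."""
    return sat * conc / (conc + sat)

# Calibration curve measured from a slide with known dye concentrations.
known_conc = np.linspace(0.0, 5000.0, 200)
known_intensity = scanner(known_conc)

c_red, c_green = 2000.0, 1000.0        # true concentration ratio is 2:1
i_red, i_green = scanner(c_red), scanner(c_green)

naive_ratio = i_red / i_green          # biased toward 1 by saturation
# Correction: map each intensity back to a concentration along the curve,
# then form the ratio in concentration space.
est_red = np.interp(i_red, known_intensity, known_conc)
est_green = np.interp(i_green, known_intensity, known_conc)
corrected_ratio = est_red / est_green
```

    Because the stronger channel is compressed more, the naive intensity ratio comes out near 1.33 instead of 2; interpolating back through the calibration curve recovers the true 2:1 ratio up to grid-interpolation error.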