100 research outputs found

SignS: a parallelized, open-source, freely available, web-based tool for gene selection and molecular signatures for survival and censored data

Author: A Alibés
C Ambroise
C Hughes
D Turek
EJ Kontoghiorghes
F Harrell
H Li
H Li
H Sutter
HMM Bøvelstad
Hothorn
I Foster
J Dongarra
J Gui
J Gui
J Klein
J Waldo
K Asanovic
KF Fogel
KH Pan
L Kaderali
M Reich
M Schumacher
MR Segal
N Sha
P Bühlmann
P Graham
P Pacheco
P Van Roy
PJ Park
R Bair
R Development Core Team
R Diaz-Uriarte
R Díaz-Uriarte
R Díaz-Uriarte
R Simon
Ramon Diaz-Uriarte
RL Somorjai
S Dudoit
S Ma
S Ma
S Ma
S Varma
SM Baxter
SS Dave
T Hothorn
T Hothorn
T Hothorn
WN van Wieringen
Y Pawitan
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Censored data are increasingly common in many microarray studies that attempt to relate gene expression to patient survival. Several new methods have been proposed in the last two years. Most of these methods, however, are not available to biomedical researchers, leading to many re-implementations from scratch of ad hoc, and suboptimal, approaches with survival data. RESULTS: We have developed SignS (Signatures for Survival data), an open-source, freely available, web-based tool and R package for gene selection, building molecular signatures, and prediction with survival data. SignS implements four methods which, according to existing reviews, perform well and, by being of a very different nature, offer complementary approaches. We use parallel computing via MPI, leading to large decreases in user waiting time. Cross-validation is used to assess predictive performance and stability of solutions, the latter an issue of increasing concern given that there are often several solutions with similar predictive performance. Biological interpretation of results is enhanced because genes and signatures in models can be sent to other freely available online tools for examination of PubMed references, GO terms, and KEGG and Reactome pathways of selected genes. CONCLUSION: SignS is the first web-based tool for survival analysis of expression data, and one of the very few with biomedical researchers as target users. SignS is also one of the few bioinformatics web-based applications to extensively use parallelization, including fault tolerance and crash recovery. Because of its combination of methods implemented, usage of parallel computing, code availability, and links to additional databases, SignS is a unique tool, and will be of immediate relevance to biomedical researchers, biostatisticians and bioinformaticians.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Biblos-e Archivo

Conditional variable importance for random forests

Author: A Bureau
Achim Zeileis
Anne-Laure Boulesteix
BJ van Os
C Strobl
C Strobl
C Strobl
Carolin Strobl
E Bauer
JH Silber
K Nicodemus
KJ Archer
KL Lunetta
L Breiman
L Breiman
L Breiman
L Breiman
L Breiman
M Nason
MR Segal
M van der Laan
P Bühlmann
P Good
R Development Core Team
R Diaz-Uriarte
R Diaz-Uriarte
R Feraud
SM Stigler
T Hastie
T Hothorn
TG Dietterich
Thomas Augustin
Thomas Kneib
V Svetnik
W Rodenburg
X Huang
X Xia
Y Lin
Y Qi
Publication venue: BioMed Central
Publication date: 01/01/2008

Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n large p" problems, complex interactions and even highly correlated predictor variables. Their variable importance measures have recently been suggested as screening tools for, e.g., gene expression studies. However, these variable importance measures show a bias towards correlated predictor variables. We identify two mechanisms responsible for this finding: (i) a preference for the selection of correlated predictors in the tree building process and (ii) an additional advantage for correlated predictor variables induced by the unconditional permutation scheme that is employed in the computation of the variable importance measure. Based on these considerations we develop a new, conditional permutation scheme for the computation of the variable importance measure. The resulting conditional variable importance is shown to reflect the true impact of each predictor variable more reliably than the original marginal approach.

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Open Access LMU

Elektronische Publikationen der Wirtschaftsuniversität Wien

The detection and location estimation of disasters using Twitter and the identification of Non-Governmental Organisations using crowdsourcing

Author: B Jongman
CC Aggarwal
CH Lee
F Atefeh
HM Saleem
HP Kriegel
J Capdevila
J Sander
J Weng
JP De Albuquerque
K Sparck Jones
M Kremer
M Sokolova
NV Chawla
O Ozdikis
PM Landwehr
R Díaz-Uriarte
R Rifkin
S Unankard
SM Omohundro
T Cheng
T Sakaki
TBN Hoang
W Sherchan
WH Smith
X Zhou
Publication venue: Springer Science and Business Media LLC
Publication date: 10/07/2020

Lirias

Crossref

Edinburgh Research Explorer

Epigenetic mechanisms and metabolic reprogramming in fibrogenesis: dual targeting of G9a and DNMT1 for the inhibition of liver fibrosis

Author: Alvarez L
Arechederra M
Avila MA
Barcena-Varela M
Berasain C
Claveria A
Colyn L
Fernandez-Barrena MG
French J
Garate M
Iraburu MJ
Latasa MU
Mann J
Milkiewicz M
Milkiewicz P
Oakley F
Oyarzabal J
Paish H
Pardo-Saganta A
Prosper F
Recalde M
Robinson SM
Rombouts K
Sangro B
Santamaria E
Uriarte I
Publication date: 23/04/2020

OBJECTIVE: Hepatic stellate cell (HSC) transdifferentiation into myofibroblasts is central to fibrogenesis. Epigenetic mechanisms, including histone and DNA methylation, play a key role in this process. Concerted action between histone and DNA methyltransferases like G9a and DNMT1 is a common theme in gene expression regulation. We aimed to study the efficacy of CM272, a first-in-class dual and reversible G9a/DNMT1 inhibitor, in halting fibrogenesis. DESIGN: G9a and DNMT1 were analysed in cirrhotic human livers, mouse models of liver fibrosis and cultured mouse HSC. G9a and DNMT1 expression was knocked down or inhibited with CM272 in human HSC (hHSC), and transcriptomic responses to transforming growth factor-β1 (TGFβ1) were examined. Glycolytic metabolism and mitochondrial function were analysed with Seahorse-XF technology. Gene expression regulation was analysed by chromatin immunoprecipitation and methylation-specific PCR. Antifibrogenic activity and safety of CM272 were studied in mouse chronic CCl4 administration and bile duct ligation (BDL), and in human precision-cut liver slices (PCLSs) in a new bioreactor technology. RESULTS: G9a and DNMT1 were detected in stromal cells in areas of active fibrosis in human and mouse livers. G9a and DNMT1 expression was induced during mouse HSC activation, and TGFβ1 triggered their chromatin recruitment in hHSC. G9a/DNMT1 knockdown and CM272 inhibited TGFβ1 fibrogenic responses in hHSC. TGFβ1-mediated profibrogenic metabolic reprogramming was abrogated by CM272, which restored gluconeogenic gene expression and mitochondrial function through on-target epigenetic effects. CM272 inhibited fibrogenesis in mice and PCLSs without toxicity. CONCLUSIONS: Dual G9a/DNMT1 inhibition by compounds like CM272 may be a novel therapeutic strategy for treating liver fibrosis.

UCL Discovery

A random forest approach to the detection of epistatic interactions in case-control studies

Author: A Bureau
A Collins
AG Heidema
AM Glazier
BA McKinney
CT Tsai
E Lander
HC Fung
J Hoh
J Marchini
J Millstein
J Simon-Sanchez
JH Moore
JK Pritchard
L Breiman
L Kruglyak
L Tiret
MD Ritchie
MP Martin
MR Nelson
N Chatterjee
NJ Risch
R Culverhouse
R Diaz-Uriarte
R Jiang
R Jiang
RJ Klein
RO Duda
Rui Jiang
SM Williams
TM Phuong
Wanwan Tang
Wenhui Fu
X Chen
Xuebing Wu
Y Ye
Y Zhang
YM Cho
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: The key roles of epistatic interactions between multiple genetic variants in the pathogenesis of complex diseases notwithstanding, the detection of such interactions remains a great challenge in genome-wide association studies. Although some existing multi-locus approaches have succeeded on small-scale case-control data, the curse of "combination explosion" prohibits their application to genome-wide analysis. It is therefore indispensable to develop new methods that are able to reduce the search space for epistatic interactions from an astronomical number of all possible combinations of genetic variants to a manageable set of candidates. RESULTS: We studied case-control data from the viewpoint of binary classification. More precisely, we treated single nucleotide polymorphism (SNP) markers as categorical features and adopted the random forest to discriminate cases from controls. On the basis of the Gini importance given by the random forest, we designed a sliding window sequential forward feature selection (SWSFS) algorithm to select a small set of candidate SNPs that could minimize the classification error and then statistically tested up to three-way interactions of the candidates. We compared this approach with three existing methods on three simulated disease models and showed that our approach is comparable to, and sometimes more powerful than, the other methods. We applied our approach to a genome-wide case-control dataset for Age-related Macular Degeneration (AMD) and successfully identified two SNPs that were reported to be associated with this disease. CONCLUSION: Besides existing pure statistical approaches, we demonstrated the feasibility of incorporating machine learning methods into genome-wide case-control studies. The Gini importance offers yet another measure for the associations between SNPs and complex diseases, thereby complementing existing statistical measures to facilitate the identification of epistatic interactions and the understanding of epistasis in the pathogenesis of complex diseases.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Growth Strategies of Tropical Tree Species: Disentangling Light and Size Effects

Author: A Gelman
A Gelman
A Gelman
AJ Kerkhoff
B Herault
C Baraloto
CD Canham
CD Canham
CJE Metcalf
CW Welden
D Purves
D Sheil
DA Clark
DA Clark
DA Coomes
DA King
DH Dent
DM Windsor
DM Windsor
EGJ Leigh
EGJ Leigh
Enrico Scalas
G Kunstler
Ghislain Vieilledent
H Akaike
HC Muller-Landau
J Chave
JB Yavitt
JMG Bloor
JS Clark
JS Clark
JS Denslow
JS Denslow
JS Denslow
JS Wright
JW Dalling
JW Lichstein
K Kitajima
KD Coates
KJ Niklas
L Poorter
L Poorter
L Poorter
L Sack
LS Comita
M Lieberman
M Uriarte
M Uriarte
N Rüger
N Rüger
Nadja Rüger
NCA Pitman
ND Brown
OL Phillips
PH Wyckoff
PJ van der Meer
R Condit
R Condit
R Condit
R Condit
R Wirth
RA Montgomery
RB Foster
RC de Gouvenain
Richard Condit
RJW Brienen
RK Kobe
S Iriarte Vivar Balderrama
SA Mangan
SE Johnson
SJ Davies
SJ Wright
SM McMahon
SP Hubbell
Stephen P. Hubbell
SW Pacala
T Kohyama
TB Croat
Uta Berger
Publication venue: Public Library of Science
Publication date: 01/01/2011

An understanding of the drivers of tree growth at the species level is required to predict likely changes of carbon stocks and biodiversity when environmental conditions change. Especially in species-rich tropical forests, it is largely unknown how species differ in their response of growth to resource availability and individual size. We use a hierarchical Bayesian approach to quantify the impact of light availability and tree diameter on growth of 274 woody species in a 50-ha long-term forest census plot in Barro Colorado Island, Panama. Light reaching each individual tree was estimated from yearly vertical censuses of canopy density. The hierarchical Bayesian approach allowed accounting for different sources of error, such as negative growth observations, and including rare species correctly weighted by their abundance. All species grew faster at higher light. Exponents of a power function relating growth to light were mostly between 0 and 1. This indicates that nearly all species exhibit a decelerating increase of growth with light. In contrast, estimated growth rates at standardized conditions (5 cm dbh, 5% light) varied over a 9-fold range and reflect strong growth-strategy differentiation between the species. As a consequence, growth rankings of the species at low (2%) and high light (20%) were highly correlated. Rare species tended to grow faster and showed a greater sensitivity to light than abundant species. Overall, tree size was less important for growth than light and about half the species were predicted to grow faster in diameter when bigger or smaller, respectively. Together, light availability and tree diameter explained on average only 12% of the variation in growth rates. Thus, other factors such as soil characteristics, herbivory, or pathogens may contribute considerably to shaping tree growth in the tropics.

Crossref

Directory of Open Access Journals

PubMed Central

Agritrop

Adipose Gene Expression Prior to Weight Loss Can Differentiate and Weakly Predict Dietary Responders

Author: A Liaw
A Perez-Diez
A Rissanen
A Rosenwald
Arne Astrup
B Heidecker
Claus Holst
Corneliu Henegar
David M. Mutch
Dawn Albertson
DM Mutch
DM Mutch
DM Mutch
Dominique Langin
DT Ross
Florence Combes
J Kaput
J Lapointe
J. Alfredo Martinez
Jean-Daniel Zucker
K Clement
Karine Clément
KK Jain
L Breiman
L Ein-Dor
L Perusse
L van't Veer
M Petersen
M. Ramzi Temanni
MA Shipp
MJ Moreno-Aliaga
N Finer
N Viguerie
N Viguerie
Nathalie Viguerie
R Diaz-Uriarte
R Kohavi
RA Koza
RJ Loos
S Dudoit
S Klaus
SL Pomeroy
SM Lin
T Hastie
T Mary-Huard
Thorkild I. A. Sørensen
TIA Sorensen
TR Golub
TS Furey
Véronique Pelloux
Wim H. M. Saris
Y Lee
YH Tseng
Publication venue: Public Library of Science
Publication date: 01/01/2007

BACKGROUND: The ability to identify obese individuals who will successfully lose weight in response to dietary intervention will revolutionize disease management. Therefore, we asked whether it is possible to identify subjects who will lose weight during dietary intervention using only a single gene expression snapshot. METHODOLOGY/PRINCIPAL FINDINGS: The present study involved 54 female subjects from the Nutrient-Gene Interactions in Human Obesity-Implications for Dietary Guidelines (NUGENOB) trial to determine whether subcutaneous adipose tissue gene expression could be used to predict weight loss prior to the 10-week consumption of a low-fat hypocaloric diet. Several statistical tests revealed that the gene expression profiles of responders (8-12 kg weight loss) could always be differentiated from non-responders (<4 kg weight loss). We also assessed whether this differentiation was sufficient for prediction. Using a bottom-up (i.e. black-box) approach, standard class prediction algorithms were able to predict dietary responders with up to 61.1% ± 8.1% accuracy. Using a top-down approach (i.e. using differentially expressed genes to build a classifier) improved prediction accuracy to 80.9% ± 2.2%. CONCLUSION: Adipose gene expression profiling prior to the consumption of a low-fat diet is able to differentiate responders from non-responders as well as serve as a weak predictor of subjects destined to lose weight. While the degree of prediction accuracy currently achieved with a gene expression snapshot is perhaps insufficient for clinical use, this work reveals that the comprehensive molecular signature of adipose tissue paves the way for the future of personalized nutrition.

Public Library of Science (PLOS)

Maastricht University Research Portal

Crossref

Directory of Open Access Journals

Universidad de Navarra

PubMed Central

Copenhagen University Research Information System

Horizon / Pleins textes

Dadun, University of Navarra

Machine Learning Approach for Prescriptive Plant Breeding

Author: A Liaw
A Singh
AK Singh
AT Mastrodomenico
BH Menze
BL Ma
BS Christenson
C Penone
C Ziyomo
D Pauli
Deniz Akdemir
DJ Stekhoven
DM Lambert
DS Harris
ER Cober
F Gao
FA van Eeuwijk
FH Andrade
G Machado
I Guyon
J Crain
J Jin
J Zhang
James B. Holland
JE Board
JE Specht
JE Vogelmann
Jessica Rutkoski
JH Friedman
JJ Suhre
JL Araus
JL De Bruin
JL De Bruin
JW Singer
K Rincker
Kyle Parmley
L Breiman
M de Felipe
M Garriga
M Kuhn
M Reynolds
MJ Morrison
NR Keep
NV McKinney
OA Montesinos-López
PM Granitto
R Díaz-Uriarte
R Wei
R Wells
RD Cook
RK Teal
RP Koester
S Ghosal
S Thapa
SC Rowntree
SM Hock
SP Conley
WF Schillinger
WJ Ethredge
WR Fehr
X Liu
X Xiao
Publication venue: Iowa State University Digital Repository
Publication date: 20/11/2019

We explored the capability of fusing high-dimensional phenotypic trait (phenomic) data with a machine learning (ML) approach to provide plant breeders the tools to do both in-season seed yield (SY) prediction and prescriptive cultivar development for targeted agro-management practices (e.g., row spacing and seeding density). We phenotyped 32 SoyNAM parent genotypes in two independent studies, each with contrasting agro-management treatments (two row spacings, three seeding densities). Phenotypic trait data (canopy temperature, chlorophyll content, hyperspectral reflectance, leaf area index, and light interception) were generated using an array of sensors at three growth stages during the growing season, and SY was determined by machine harvest. Random forest (RF) was used to train models for SY prediction using phenotypic traits (predictor variables) to identify the optimal temporal combination of variables to maximize accuracy and resource allocation. RF models were trained using data from both experiments and individually for each agro-management treatment. We report the most important traits agnostic of agro-management practices. Several predictor variables showed conditional importance dependent on the agro-management system. We assembled predictive models to enable in-season SY prediction, enabling the development of a framework to integrate phenomics information with powerful ML for prediction-enabled prescriptive plant breeding.

Digital Repository @ Iowa State University (ISU)

Crossref

Individualized markers optimize class prediction of microarray data

Author: A Ben-Dor
A von Heydebreck
A Wong
AA Ferrando
AR Dabney
C Chang
C Motz
C Wu
CA Iacobuzio-Donahue
E Coustan-Smith
E Schleiff
F Martella
G Callagy
G Salomons
I Guyon
I Inza
J Catlett
J Held-Feindt
J Li
J Lyons-Weiler
J Reiss
J Weston
J Zhang
JG Thomas
K Bloch
M Sanchez-Carbayo
M Steinau
M West
ME Lenburg
ME Ross
MI Ryder
MS Felipe
P Baldi
P Ganigi
P Luciani
P Luciani
Panayiota Poirazi
Pavlos Pavlidis
R Bijlani
R Diaz-Uriarte
S Aulwurm
S Ilyin
S Ilyin
S Kumar
S Nambiar
S Steller
S Varma
SA Armstrong
SK Shevade
SL Pomeroy
SM Arfin
T Karakas
TR Golub
TS Tanaka
U Fayyad
V Sriuranpong
W Kolch
X Chen
X Liu
X Liu
X Yan
Y Chen
Y Cheng
Y Li
Y Wang
Y Yanagi
Publication venue: BioMed Central
Publication date: 01/07/2006

BACKGROUND: Identification of molecular markers for the classification of microarray data is a challenging task. Despite the evident dissimilarity in various characteristics of biological samples belonging to the same category, most of the marker-selection and classification methods do not consider this variability. In general, feature selection methods aim at identifying a common set of genes whose combined expression profiles can accurately predict the category of all samples. Here, we argue that this simplified approach is often unable to capture the complexity of a disease phenotype and we propose an alternative method that takes into account the individuality of each patient-sample. RESULTS: Instead of using the same features for the classification of all samples, the proposed technique starts by creating a pool of informative gene-features. For each sample, the method selects a subset of these features whose expression profiles are most likely to accurately predict the sample's category. Different subsets are utilized for different samples and the outcomes are combined in a hierarchical framework for the classification of all samples. Moreover, this approach can innately identify subgroups of samples within a given class which share common feature sets, thus highlighting the effect of individuality on gene expression. CONCLUSION: In addition to high classification accuracy, the proposed method offers a more individualized approach for the identification of biological markers, which may help in better understanding the molecular background of a disease and emphasize the need for more flexible medical interventions.

Crossref

Directory of Open Access Journals

PubMed Central

Transcription Initiation Activity Sets Replication Origin Efficiency in Mammalian Cells

Genomic mapping of DNA replication origins (ORIs) in mammals provides a powerful means for understanding the regulatory complexity of our genome. Here we combine a genome-wide approach to identify preferential sites of DNA replication initiation at 0.4% of the mouse genome with detailed molecular analysis at distinct classes of ORIs according to their location relative to the genes. Our study reveals that 85% of the replication initiation sites in mouse embryonic stem (ES) cells are associated with transcriptional units. Nearly half of the identified ORIs map at promoter regions and, interestingly, ORI density strongly correlates with promoter density, reflecting the coordinated organisation of replication and transcription in the mouse genome. Detailed analysis of ORI activity showed that CpG island promoter-ORIs are the most efficient ORIs in ES cells and both ORI specification and firing efficiency are maintained across cell types. Remarkably, the distribution of replication initiation sites at promoter-ORIs exactly parallels that of transcription start sites (TSS), suggesting a co-evolution of the regulatory regions driving replication and transcription. Moreover, we found that promoter-ORIs are significantly enriched in CAGE tags derived from early embryos relative to all promoters. This association implies that transcription initiation early in development sets the probability of ORI activation, unveiling a new hallmark in ORI efficiency regulation in mammalian cells.

Public Library of Science (PLOS)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central

Estudo Geral

Oxford University Research Archive

Digital.CSIC

Biblos-e Archivo