
782 research outputs found

Hypothesis exploration with visualization of variance.

Author: Bilder Robert M
Congdon Eliza
Parker Douglass Stott
Publication venue: eScholarship, University of California
Publication date: 01/01/2014

Background: The Consortium for Neuropsychiatric Phenomics (CNP) at UCLA was an investigation into the biological bases of traits such as memory and response inhibition phenotypes, exploring whether they are linked to syndromes including ADHD, bipolar disorder, and schizophrenia. One aim of the consortium was to move from traditional categorical approaches for psychiatric syndromes towards more quantitative approaches based on large-scale analysis of the space of human variation. It represented an application of phenomics (the wide-scale, systematic study of phenotypes) to neuropsychiatry research. Results: This paper reports on a system for exploration of hypotheses in data obtained from the LA2K, LA3C, and LA5C studies in CNP. ViVA is a system for exploratory data analysis using novel mathematical models and methods for visualization of variance. An example of these methods is called VISOVA, a combination of visualization and analysis of variance, with the flavor of exploration associated with ANOVA in biomedical hypothesis generation. It permits visual identification of phenotype profiles (patterns of values across phenotypes) that characterize groups. Visualization enables screening and refinement of hypotheses about the variance structure of sets of phenotypes. Conclusions: The ViVA system was designed for exploration of neuropsychiatric hypotheses by interdisciplinary teams. Automated visualization in ViVA supports 'natural selection' on a pool of hypotheses, and permits deeper understanding of the statistical architecture of the data. A large-scale perspective of this kind could lead to better neuropsychiatric diagnostics.

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Detection of changes in gene regulatory patterns, elicited by perturbations of the Hsp90 molecular chaperone complex, by visualizing multiple experiments with an animation

Author: A Barsky
A Bernthaler
A Kralli
A Stanhill
AA Duina
AJ Caplan
AJ Weaver
AK Mandal
AM Erkine
B Wong
BJ Breitkreutz
D Gadelle
D Picard
DA Stavreva
Deo P Pandey
Didier Picard
DP Pandey
E Boy-Marcotte
F Estruch
F Forafonov
Fedor Forafonov
Guillaume Mühlebach
I Grad
I Ulitsky
J Ptacek
JF Louvion
JJ Kovacs
JL Johnson
JM Thevelein
K Richter
M Ashburner
M Geymonat
MM Ali
MS Cline
MT Martinez-Pastor
N Gehlenborg
OA Toogun
P Adler
P Fox
P Zarzov
Pablo C Echeverría
PC Echeverría
PG Besant
SB Ferguson
SH McLaughlin
SJ Felts
SK Wandinger
SP Bohen
SW Ki
T Ideker
WP Sullivan
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: To make sense of gene expression profiles, such analyses must be pushed beyond the mere listing of affected genes. For example, if a group of genes persistently display similar changes in expression levels under particular experimental conditions, and the proteins encoded by these genes interact and function in the same cellular compartments, this could be taken as a very strong indicator of co-regulated protein complexes. One of the key requirements is having appropriate tools to detect such regulatory patterns. Results: We have analyzed the global adaptations in gene expression patterns in the budding yeast when the Hsp90 molecular chaperone complex is perturbed either pharmacologically or genetically. We integrated these results with publicly accessible expression, protein-protein interaction and intracellular localization data. Most importantly, all experimental conditions were simultaneously and dynamically visualized with an animation. This critically facilitated the detection of patterns of gene expression changes that suggested underlying regulatory networks that a standard analysis by pairwise comparison and clustering could not have revealed. Conclusions: The results of the animation-assisted detection of changes in gene regulatory patterns make predictions about the potential roles of Hsp90 and its co-chaperone p23 in regulating whole sets of genes. The simultaneous dynamic visualization of microarray experiments, represented in networks built by integrating one's own experimental data with publicly accessible data, represents a powerful discovery tool that allows the generation of new interpretations and hypotheses.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archive ouverte UNIGE

Clustering-based approaches to SAGE data mining

Author: C Keime
D Porter
F Rioult
Francisco Azuaje
GM Boratyn
H Chen
H Thygesen
H Wang
H Zheng
Haiying Wang
Huiru Zheng
I Mechaly
J Handl
J Lu
J Sander
J Stollberg
JB Vos
JM Ruijter
K Kim
KA Baggerly
L Cai
MA El-Meanawy
MA Gilchrist
MB Eisen
MC Abba
MZ Man
N Bolshakova
P Buckhaults
P Divina
P Tamayo
RT Ng
RZ Vêncio
S Audic
S Blackshaw
S McIntosh
S Saha
SD Zuyderduyn
T Beißbarth
T Chu
T Kohonen
T Lee
VE Velculescu
VR Akmaev
W Chan
W Yasui
WD Patino
X Jin
Publication venue: BioMed Central
Publication date: 01/01/2008

Serial analysis of gene expression (SAGE) is one of the most powerful tools for global gene expression profiling. It has led to several biological discoveries and biomedical applications, such as the prediction of new gene functions and the identification of biomarkers in human cancer research. Clustering techniques have become fundamental approaches in these applications. This paper reviews relevant clustering techniques specifically designed for this type of data. It places an emphasis on current limitations and opportunities in this area for supporting biologically-meaningful data mining and visualisation

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Genetic variants and their interactions in disease risk prediction – machine learning and network perspectives

Author: 1000 Genomes Project
A Ashworth
A Burga
A Califano
A Galvan
A Gyenesei
A Statnikov
A Torkamani
AL Barabási
AL Hopkins
B Lehner
B Maher
B Rakitsch
BA McKinney
BS Srinivasan
C Ambroise
C Kooperberg
C Tian
C Winter
CG Lambert
CS Greene
D Merico
D Urbach
DJ Balding
DM Evans
DW Aha
DW Huang
E Lee
EA Ashley
EE Eichler
EE Schadt
ES Lander
F Barrenäs
G Bebek
G Gibson
G Hannum
G Peng
GK Chen
GM Clarke
H Eleftherohorinou
H Holm
H Zhong
HJ Cordell
HY Chuang
I Feldman
I Guyon
I König
I Surakka
J Corander
J Jakobsdottir
J Kruppa
J Tuikkala
J Yang
JD Iglehart
JH Moore
K Askland
K Wang
KA Pattin
KS Reynolds
L Luo
M Ladouceur
M Michaut
M Mooney
M Smoot
M Vidal
MA Heiskanen
MD Ritchie
MJ Sillanpää
NA Lavender
NF Marko
O Lavi
O Zuk
P Beltrao
P Donnelly
P Kraft
P Sebastiani
P Smialowski
PC Phillips
PJ Castaldi
Q He
R Braun
R Jelier
R Makowsky
R Simon
RO Lindén
S Lee
S Okser
S Ripatti
S Varma
SE Baranzini
Sebastian Okser
SJ Dixon
SW Hartley
T Hu
T Ideker
T Pahikkala
T Peltola
T Schupbach
TA Manolio
Tapio Pahikkala
Tero Aittokallio
TS Deisboeck
TT Wu
U Ober
V Bansal
VK Ramanan
W Huang
Wellcome Trust Case Control Consortium
WG Kaelin Jr
Y Saeys
Z Wang
Z Wei
Publication venue: Springer Science and Business Media LLC
Publication date

Crossref

A Normalized Tree Index for identification of correlated clinical parameters in microarray experiments

Author: A Goldhirsch
A Tauchen
Anika Tauchen
Anke Becker
C Martin
C Sotiriou
Christian W Martin
CM Perou
E Huang
GA Pavlopoulos
H Wang
J Handl
J Quackenbush
J Wang
Kendall
L Sachs
LJ van't Veer
M Halkidi
MB Eisen
MF Ochs
MJ van de Vijver
NA Samaan
NL Johnson
RA Fisher
S Datta
S Loi
S Tsumoto
T Decker
T Sorlie
Tim W Nattkemper
Publication venue: BioMed Central
Publication date: 01/01/2011

Martin C, Tauchen A, Becker A, Nattkemper TW. A Normalized Tree Index for identification of correlated clinical parameters in microarray data. BioData Mining. 2011;4(1): 2. BACKGROUND: Gene-level measurements are widely used to gain new insights into complex diseases, e.g. cancer. A promising approach to understanding basic biological mechanisms is to combine gene expression profiles and classical clinical parameters. However, the computation of a correlation coefficient between high-dimensional data and such parameters is not covered by traditional statistical methods. METHODS: We propose a novel index, the Normalized Tree Index (NTI), to compute a correlation coefficient between the clustering result of high-dimensional microarray data and nominal clinical parameters. The NTI detects correlations between hierarchically clustered microarray data and nominal clinical parameters (labels) and gives a measurement of significance in terms of an empirical p-value of the identified correlations. In a first step, the microarray data are clustered by hierarchical agglomerative clustering using standard settings. In a second step, the computed cluster tree is evaluated. For each label, an NTI is computed measuring the correlation between that label and the clustered microarray data. RESULTS: The NTI successfully identifies correlated clinical parameters at different levels of significance when applied to two real-world microarray breast cancer data sets. Some of the identified highly correlated labels confirm the current state of knowledge, whereas others help to identify new risk factors and provide a good basis for formulating new hypotheses. CONCLUSIONS: The NTI is a valuable tool in the domain of biomedical data analysis. It allows the identification of correlations between high-dimensional data and nominal labels, while at the same time a p-value measures the level of significance of the detected correlations.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University

Representing and querying disease networks using graph databases

Author: Auffray C.
Lysenko A.
Mazein A.
Rawlings C. J.
Roznovat I. A.
Saqi M.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

BACKGROUND: Systems biology experiments generate large volumes of data of multiple modalities and this information presents a challenge for integration due to a mix of complexity together with rich semantics. Here, we describe how graph databases provide a powerful framework for storage, querying and envisioning of biological data. RESULTS: We show how graph databases are well suited for the representation of biological information, which is typically highly connected, semi-structured and unpredictable. We outline an application case that uses the Neo4j graph database for building and querying a prototype network to provide biological context to asthma related genes. CONCLUSIONS: Our study suggests that graph databases provide a flexible solution for the integration of multiple types of biological data and facilitate exploratory data mining to support hypothesis generation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-016-0102-8) contains supplementary material, which is available to authorized users

Springer - Publisher Connector

PubMed Central

Rothamsted Repository

Testing Multiple Hypotheses through IMP weighted FDR Based on a Genetic Functional Network with Application to a New Zebrafish Transcriptome Study

Author: Greene Casey S
Gui Jiang
Kim Carol
Moore Jason H
Sullivan Con
Taylor Walter
Publication venue: Dartmouth Digital Commons
Publication date: 01/01/2015

In genome-wide studies, hundreds of thousands of hypothesis tests are performed simultaneously. Bonferroni correction and False Discovery Rate (FDR) can effectively control type I error but often yield a high false negative rate. We aim to develop a more powerful method to detect differentially expressed genes. We present a Weighted False Discovery Rate (WFDR) method that incorporates biological knowledge from genetic networks. We first identify weights using Integrative Multi-species Prediction (IMP) and then apply the weights in WFDR to identify differentially expressed genes through an IMP-WFDR algorithm. We performed a gene expression experiment to identify zebrafish genes that change expression in the presence of arsenic during a systemic Pseudomonas aeruginosa infection. Zebrafish were exposed to arsenic at 10 parts per billion and/or infected with P. aeruginosa. Appropriate controls were included. We then applied IMP-WFDR during the analysis of differentially expressed genes. We compared the mRNA expression for each group and found over 200 differentially expressed genes and several enriched pathways, including defense response pathways, arsenic response pathways, and the Notch signaling pathway.
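The abstract above does not spell out how weights enter the FDR procedure. As a rough illustration only, the sketch below implements a generic weighted Benjamini-Hochberg procedure (p-values divided by weights rescaled to average 1, then the usual step-up rule). It is not the authors' IMP-WFDR algorithm; the function name `weighted_bh`, the example p-values and the weights are all assumptions made for this example, and no IMP-derived weights are used.

```python
from typing import List, Sequence


def weighted_bh(pvalues: Sequence[float],
                weights: Sequence[float],
                alpha: float = 0.05) -> List[bool]:
    """Generic weighted Benjamini-Hochberg (illustrative, not IMP-WFDR).

    Returns one reject flag per hypothesis, in input order.
    """
    m = len(pvalues)
    if m == 0:
        return []
    if len(weights) != m:
        raise ValueError("pvalues and weights must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")

    # Rescale the weights so that they average 1 across all hypotheses.
    mean_w = sum(weights) / m
    scaled = [w / mean_w for w in weights]

    # Weighted p-values: a larger weight makes a hypothesis easier to reject.
    q = [min(p / w, 1.0) for p, w in zip(pvalues, scaled)]

    # Step-up rule: find the largest rank k with q_(k) <= k * alpha / m.
    order = sorted(range(m), key=lambda i: q[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if q[i] <= rank * alpha / m:
            cutoff = rank

    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


# Hypothetical p-values for five genes; the third gene gets extra prior weight.
pvals = [0.001, 0.015, 0.04, 0.3, 0.8]
print(weighted_bh(pvals, [1, 1, 1, 1, 1]))
print(weighted_bh(pvals, [1, 1, 3, 1, 1]))
```

With equal weights this reduces to the ordinary Benjamini-Hochberg procedure. In the second call the up-weighted third gene is also rejected, which is the kind of power gain that prior biological knowledge is meant to provide.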

Crossref

Springer - Publisher Connector

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

Identifying Gene-Gene Interactions that are Highly Associated with Body Mass Index Using Quantitative Multifactor Dimensionality Reduction (QMDR)

Author: De Rishika
Drenos Fotios
Holzinger Emily R.
Verma Shefali S.
Publication venue: Dartmouth Digital Commons
Publication date: 14/12/2015

Despite heritability estimates of 40–70% for obesity, less than 2% of its variation is explained by the Body Mass Index (BMI) associated loci that have been identified so far. Epistasis, or gene-gene interaction, is a plausible source to explain portions of the missing heritability of BMI. Using genotypic data from 18,686 individuals across five study cohorts (ARIC, CARDIA, FHS, CHS, MESA), we filtered SNPs (Single Nucleotide Polymorphisms) using two parallel approaches. SNPs were filtered either on the strength of their main effects of association with BMI, or on the number of knowledge sources supporting a specific SNP-SNP interaction in the context of BMI. Filtered SNPs were specifically analyzed for interactions that are highly associated with BMI using QMDR (Quantitative Multifactor Dimensionality Reduction). QMDR is a nonparametric, genetic model-free method that detects non-linear interactions associated with a quantitative trait.

Dartmouth Digital Commons (Dartmouth College)

Discovery and replication of SNP-SNP interactions for quantitative lipid traits in over 60,000 individuals

Author: Amuzu A
Asselbergs FW
Baumert J
Brautbar A
Burt A
Carrell DS
Charlotte Onland-Moret N
Crosslin DR
Cruickshanks KJ
Dale C
De R
Drenos F
Dudek S
Farrall M
Furlong CE
Gaunt TR
Gilbert-Diamond D
Hall M
Hingorani AD
Holzinger ER
Hovingh GK
Jarvik GP
Keating BJ
Kim DS
Kivimaki M
Kleber ME
Klein BE
Klein R
Koenig W
Kuivaniemi H
Kullo IJ
Kumari M
Lange LA
Lanktree MB
Larson EB
März W
Moore CB
Moore JH
North KE
Pankratz N
Rasmussen-Torvik LJ
Reiner AP
Riess H
Ritchie MD
Sivapalaratnam S
Talmud PJ
Tragante V
Tromp G
van der Schouw YT
van Iperen EPA
Verma SS
Wilson JG
Publication venue: BioMed Central
Publication date: 01/01/2017

Background: The genetic etiology of human lipid quantitative traits is not fully elucidated, and interactions between variants may play a role. We performed a gene-centric interaction study for four different lipid traits: low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), total cholesterol (TC), and triglycerides (TG). Results: Our analysis consisted of a discovery phase using a merged dataset of five different cohorts (n = 12,853 to n = 16,849 depending on lipid phenotype) and a replication phase with ten independent cohorts totaling up to 36,938 additional samples. Filters are often applied before interaction testing to correct for the burden of testing all pairwise interactions. We used two different filters: (1) a filter that tested only single nucleotide polymorphisms (SNPs) with a main effect of p < 0.001 in a previous association study; (2) a filter that only tested interactions identified by Biofilter 2.0. Pairwise models that reached an interaction significance level of p < 0.001 in the discovery dataset were tested for replication. We identified thirteen SNP-SNP models that were significant in more than one replication cohort after accounting for multiple testing. Conclusions: These results may reveal novel insights into the genetic etiology of lipid levels. Furthermore, we developed a pipeline to perform a computationally efficient interaction analysis with multi-cohort replication.
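To give a sense of why filtering before interaction testing matters, the short sketch below counts the pairwise tests implied by a given number of SNPs and the corresponding Bonferroni-style per-test threshold. The SNP counts (100,000 before filtering, 1,000 after) are hypothetical round numbers chosen for illustration, not figures from the study, and the Bonferroni correction is an assumption used here only to make the burden concrete.

```python
from math import comb


def pairwise_tests(n_snps: int) -> int:
    """Number of distinct SNP-SNP pairs among n_snps variants."""
    return comb(n_snps, 2)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical SNP counts before and after applying a filter.
for n in (100_000, 1_000):
    tests = pairwise_tests(n)
    print(f"{n} SNPs -> {tests} pairwise tests, "
          f"per-test threshold {bonferroni_threshold(tests):.2e}")
```

Because the number of pairs grows quadratically with the number of SNPs, reducing the candidate set by a factor of 100 reduces the number of pairwise tests by roughly a factor of 10,000.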

Directory of Open Access Journals

Carolina Digital Repository

PuSH

Brunel University Research Archive

University of Essex Research Repository

LSHTM Research Online

Heidelberger Dokumentenserver

UCL Discovery

Oxford University Research Archive

Utrecht University Repository

Stellenbosch University SUNScholar Repository

Explore Bristol Research

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California