
    A Toolbox for Functional Analysis and the Systematic Identification of Diagnostic and Prognostic Gene Expression Signatures Combining Meta-Analysis and Machine Learning

    The identification of biomarker signatures is important for cancer diagnosis and prognosis. However, the detection of clinically reliable signatures is hampered by limited data availability, which can restrict statistical power; moreover, methods for integrating large sample cohorts and identifying signatures are limited. We present a step-by-step computational protocol for functional gene expression analysis and for the identification of diagnostic and prognostic signatures that combines meta-analysis with machine learning and survival analysis (a toy illustration follows this abstract). The novelty of the toolbox lies in its all-in-one functionality, generic design, and modularity. It is exemplified for lung cancer, including a comprehensive evaluation using different validation strategies. However, the protocol is not restricted to specific disease types and can therefore be used by a broad community. The accompanying R package vignette runs in ~1 h and describes the workflow in detail for use by researchers with limited bioinformatics training.
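    The general recipe described above can be illustrated with a hedged sketch: combine per-study differential-expression evidence by meta-analysis (here Stouffer's method), take the top-ranked genes as a candidate signature, and evaluate a sparse classifier on pooled data. This is not the toolbox's actual code; the simulated data and all names are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def stouffer_meta(z_scores, weights=None):
        """Combine per-study z-scores (genes x studies) into one meta-z per gene."""
        z = np.asarray(z_scores, dtype=float)
        w = np.ones(z.shape[1]) if weights is None else np.asarray(weights, dtype=float)
        return (z * w).sum(axis=1) / np.sqrt((w ** 2).sum())

    rng = np.random.default_rng(0)
    n_genes, n_studies = 2000, 3
    z = rng.normal(size=(n_genes, n_studies))      # per-study DE z-scores
    z[:20] += 3.0                                  # 20 genes with a true effect
    signature = np.argsort(-np.abs(stouffer_meta(z)))[:20]

    # Evaluate an L1-penalised (sparse) classifier on pooled expression data.
    X = rng.normal(size=(120, n_genes))            # samples x genes
    y = rng.integers(0, 2, size=120)               # case/control labels
    X[y == 1, :20] += 1.0                          # inject signal in the true DE genes
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    auc = cross_val_score(clf, X[:, signature], y, cv=5, scoring="roc_auc")
    print("CV AUC:", auc.mean())
    ```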

    A Qualitative Modeling Approach for Whole Genome Prediction Using High-Throughput Toxicogenomics Data and Pathway-Based Validation

    Efficient high-throughput transcriptomics (HTT) tools promise inexpensive, rapid assessment of the possible biological consequences of human and environmental exposure to the tens of thousands of chemicals in commerce. HTT systems have used relatively small sets of gene expression measurements, coupled with mathematical prediction methods, to estimate genome-wide gene expression, and are often trained and validated using pharmaceutical compounds. It is unclear whether these training sets are suitable for general toxicity-testing applications and the more diverse chemical space represented by commercial chemicals and environmental contaminants. In this work, we built predictive computational models that infer whole-genome transcriptional profiles from a smaller sample of surrogate genes. The models were trained and validated using a large-scale toxicogenomics database with gene expression data from exposures to heterogeneous chemicals from a wide range of classes (the Open TG-GATEs database). The method of predictor selection was designed to allow high-fidelity gene prediction from any pre-existing gene expression dataset, regardless of animal species or measurement platform. Predictive qualitative models were developed from the TG-GATEs human primary hepatocyte data, comprising over 941 samples covering 158 compounds. A sequential forward-search-based greedy algorithm, combining different fitting approaches and machine learning techniques, was used to find an optimal set of surrogate genes that predicts differential expression changes across the remaining genome (a simplified sketch follows this abstract). We then used pathway enrichment of up-regulated and down-regulated genes to assess the ability of a limited gene set to capture relevant patterns of tissue response. In addition, we compared prediction performance using the surrogate genes found by our greedy algorithm (referred to as the SV2000) with the landmark genes provided by existing technologies such as L1000 (Genometry) and S1500 (Tox21), finding better predictive performance for the SV2000. The ability of these algorithms to predict pathway-level responses is a positive step toward incorporating mode-of-action (MOA) analysis into the high-throughput prioritization and testing of the large number of chemicals in need of safety evaluation.
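    For intuition, here is a much-simplified sketch of a sequential forward (greedy) search of the kind described above: at each step it adds the gene that most improves a linear prediction of the full expression matrix. The linear scoring model, the toy data, and the tiny scale are assumptions for illustration, not the authors' SV2000 pipeline.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    def greedy_surrogates(X, n_select=10):
        """X: samples x genes. Greedily pick surrogate genes whose expression
        best predicts the whole matrix (including, trivially, the surrogates
        themselves; a real pipeline would hold them out) under linear regression."""
        selected, remaining = [], list(range(X.shape[1]))
        for _ in range(n_select):
            best_gene, best_score = None, -np.inf
            for g in remaining:
                cols = selected + [g]
                model = LinearRegression().fit(X[:, cols], X)
                score = model.score(X[:, cols], X)   # mean R^2 over all genes
                if score > best_score:
                    best_gene, best_score = g, score
            selected.append(best_gene)
            remaining.remove(best_gene)
        return selected

    rng = np.random.default_rng(1)
    latent = rng.normal(size=(100, 5))               # 5 hidden "pathway" factors
    X = latent @ rng.normal(size=(5, 200)) + 0.1 * rng.normal(size=(100, 200))
    print(greedy_surrogates(X, n_select=5))
    ```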

    Design and implementation of the international genetics and translational research in transplantation network


    Concept and design of a genome-wide association genotyping array tailored for transplantation-specific studies


    Statistical Methods for Networks with Node Covariates

    Network data, which represent relations or interactions between individual entities, arise together with nodal covariate information in many scientific and engineering fields such as biology and social science. This dissertation focuses on developing statistical models and theory that utilize information from both the network structure and node covariates to improve statistical learning tasks, such as community detection and missing value imputation.

    The first project studies the problem of community detection for degree-heterogeneous networks with covariates, where we aim to cluster the nodes into groups that share similar patterns in link connectivity and/or covariate distributions. We incorporate node covariates via a flexible degree-corrected block model by allowing the community memberships to depend on node covariates, while the link probabilities are determined by both node community memberships and degree parameters. We develop two algorithms for estimating the proposed model, one using variational inference and the other based on the pseudo-likelihood. Simulation studies indicate that the proposed model obtains better community detection results than methods that only utilize the network information. Further, we show that under mild conditions, the community memberships and the covariate parameters can be estimated consistently.

    The second project considers the problem of missing value imputation when individuals are linked through a network. We assume the edges in the network are related to the distances between the covariates of the individuals through a latent space network model. We propose an iterative imputation algorithm that is flexible and utilizes both the correlation among node variables and the connectivity between observations given by the network (see the sketch after this abstract). We relate the proposed method to a Bayesian model and discuss the convergence of the imputation distribution when the specified conditional models for imputation are compatible with the true underlying model of the covariates. We also use simulation studies and a data example to illustrate empirically that imputation accuracy can be improved by incorporating network information.

    The final contribution of this dissertation is incorporating covariates under the edge exchangeable framework. Edge exchangeable models have attractive theoretical and practical properties which make them appropriate for modeling many sparse real-world interaction networks constructed through edge sampling mechanisms. However, as far as we know, there is no edge exchangeable network model that allows for node covariates. In the third project, we propose a model that incorporates node covariates under the edge exchangeable framework and show that it enjoys properties such as sparsity and partial exchangeability. We further develop a maximum likelihood estimation method to estimate the model parameters and demonstrate its performance through both simulation studies and a data example.

    PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163165/1/liuyumu_1.pd
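    As a rough illustration of network-assisted imputation (not the dissertation's estimator), the sketch below blends a regression on the other covariates with an average over network neighbours, iterating until the imputed values stabilise. The blending weight alpha, the toy data, and all variable names are assumptions for the example.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    def network_impute(X, A, alpha=0.5, n_iter=10):
        """X: nodes x covariates with NaNs; A: binary adjacency matrix.
        Impute each missing entry as a mix of a regression prediction from
        the other covariates and the average of network-neighbour values."""
        X = X.copy()
        miss = np.isnan(X)
        col_means = np.nanmean(X, axis=0)
        X[miss] = np.take(col_means, np.where(miss)[1])  # column-mean start
        deg = A.sum(axis=1).clip(min=1)
        for _ in range(n_iter):
            for j in range(X.shape[1]):
                rows = miss[:, j]
                if not rows.any():
                    continue
                others = np.delete(X, j, axis=1)
                reg = LinearRegression().fit(others[~rows], X[~rows, j])
                reg_pred = reg.predict(others[rows])
                nbr_pred = (A @ X[:, j] / deg)[rows]     # neighbour average
                X[rows, j] = alpha * reg_pred + (1 - alpha) * nbr_pred
        return X

    rng = np.random.default_rng(2)
    n = 50
    X_true = rng.normal(size=(n, 4))
    A = np.triu((rng.random((n, n)) < 0.1).astype(float), 1)
    A = A + A.T
    X_obs = X_true.copy()
    X_obs[rng.random(X_obs.shape) < 0.2] = np.nan
    print("imputation MSE:", np.nanmean((network_impute(X_obs, A) - X_true) ** 2))
    ```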

    Analysis and Visualization of Local Phylogenetic Structure within Species

    While it is interesting to examine the evolutionary history and phylogenetic relationships between species, for example in a tree of life, there is also a great deal to be learned from examining population structure and relationships within species. A careful description of phylogenetic relationships within species provides insights into the causes of phenotypic variation, including disease susceptibility. The better we understand the patterns of genotypic variation within species, the better these populations may serve as models to identify causative variants and possible therapies, for example through targeted genome-wide association studies (GWAS). My thesis describes a model of local phylogenetic structure, how it can be effectively derived under various circumstances, and useful applications and visualizations of this model to aid genetic studies. I introduce a method for discovering phylogenetic structure among individuals of a population by partitioning the genome into a minimal set of intervals within which there is no evidence of recombination (a simplified sketch of such a partitioning follows this abstract). I describe two extensions of this basic method: the first allows it to be applied to heterozygous, in addition to homozygous, genotypes, and the second makes it more robust to errors in the source genotypes. I demonstrate the predictive power of my local phylogeny model using a novel method for genome-wide genotype imputation. This imputation method achieves very high accuracy, on the order of the accuracy rate of the sequencing technology, by imputing genotypes in regions of shared inheritance based on my local phylogenies. Comparative genomic analysis within species can be greatly aided by appropriate visualization and analysis tools. I developed a framework for web-based visualization and analysis of multiple individuals within a species, with my model of local phylogeny providing the underlying structure. I describe the utility of these tools and the applications for which they have found widespread use.

    Doctor of Philosophy
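    One standard way to operationalise "no evidence of recombination" is the four-gamete test (Hudson and Kaplan, 1985): if two biallelic sites exhibit all four gamete combinations, a recombination (or recurrent mutation) must have occurred between them. The sketch below greedily scans left to right and starts a new interval whenever the test fails. It assumes phased 0/1 haplotypes and is a simplification, not necessarily the thesis's exact procedure.

    ```python
    import numpy as np

    def four_gamete_ok(a, b):
        """True if SNP columns a, b (0/1 across haplotypes) show <= 3 gametes."""
        return len({(x, y) for x, y in zip(a, b)}) < 4

    def partition_intervals(H):
        """H: haplotypes x SNPs 0/1 matrix. Greedy left-to-right scan: open a
        new interval whenever the incoming SNP violates the four-gamete test
        against any SNP already in the current interval."""
        boundaries, start = [0], 0
        for j in range(1, H.shape[1]):
            if any(not four_gamete_ok(H[:, k], H[:, j]) for k in range(start, j)):
                boundaries.append(j)
                start = j
        return boundaries

    rng = np.random.default_rng(3)
    H = rng.integers(0, 2, size=(20, 100))    # 20 haplotypes, 100 SNPs
    print(partition_intervals(H))             # interval start positions
    ```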

    Applications of Graph Segmentation Algorithms for Quantitative Genomic Analyses

    There is growing interest in utilizing graph formulations and graph-based algorithms in different subproblems of genomic analysis. Since graphs provide a natural and efficient representation of sequence data in which structural relationships are observed, we study graph applications in the quantitative analysis of typical RNA-seq and Whole Genome Sequencing (WGS) pipelines.

    Analysis of differential alternative splicing from RNA-seq data is complicated by the fact that many RNA-seq reads map to multiple transcripts; moreover, the annotated transcripts are often a small subset of the possible transcripts of a gene. This work describes Yanagi, a tool for segmenting transcriptomes to create a library of maximal L-disjoint segments from a complete transcriptome annotation. The segment library preserves transcriptome substrings and structural relationships between transcripts while eliminating unnecessary sequence duplication. First, we formalize the concept of transcriptome segmentation and propose an efficient algorithm for generating segment libraries. The resulting segment sequences can be used with pseudo-alignment tools to quantify gene expression and alternative splicing at the segment level, and provide gene-level visualization of the segments for more interpretability. The notion of transcript segmentation as introduced here and implemented in Yanagi opens the door for the application of lightweight, ultra-fast pseudo-alignment algorithms in a wide variety of RNA-seq analyses.

    Furthermore, we show how transcriptome quantification can be performed from segment-level statistics. We present an EM algorithm that uses segment counts as features to estimate transcript relative abundances in a way that maximizes the likelihood of the observed sequencing data (a toy sketch follows this abstract). We then tackle the problem of quantification in an incomplete-annotation setting, proposing an assembly-free correction procedure that reduces bias in the estimated abundances of the annotated transcripts caused by the presence of unannotated transcripts in an RNA-seq sample, while avoiding the need to assemble the missing transcripts first.

    Another use case of our graph segmentation approach is representing the population reference genome graphs used in Whole Genome Sequencing (WGS), which can be crucial for genomic analyses studying highly polymorphic genes such as HLA. Graph-based aligners are usually slow and computationally demanding; using segments empowers any linear aligner with the efficient graph representation of population variation, while avoiding the expensive computational overhead of aligning over graphs.

    Lastly, we explore the use of Generative Adversarial Networks (GANs) for imputing the sparse and noisy expression data obtained in single-cell RNA sequencing (scRNA-seq) experiments. scRNA-seq provides a rich view into the heterogeneity underlying a cell population, which is usually lost when performing bulk RNA-seq. However, these datasets are usually noisy and very sparse, and a number of methods have been proposed to impute zeros in them with the goal of improving downstream analysis. We propose an approach, scGAIN, to impute zero counts of dropout genes in single-cell data by learning an approximation of the data distribution with a GAN; the work adapts GAIN, a GAN model originally developed for imputing missing values in image data, to the domain of single-cell data. Experiments show that scGAIN gives competitive results compared with state-of-the-art imputation approaches, and shows advantages in various aspects on both simulated and real data. Imputation by scGAIN successfully recovers the underlying clustering of cell sub-populations, provides sharp estimates around true mean expression, reduces variability in the data, and increases correspondence with matched bulk RNA-seq experiments.
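    The segment-level EM step mentioned above can be made concrete with a small sketch: given segment read counts and a binary segment-to-transcript compatibility matrix, alternate between fractionally assigning each segment's reads to compatible transcripts (E-step) and renormalising the expected read mass into abundances (M-step). This toy version ignores segment lengths and is an illustrative assumption, not Yanagi's implementation.

    ```python
    import numpy as np

    def em_abundance(counts, C, n_iter=200):
        """counts: reads per segment; C[s, t] = 1 if segment s is compatible
        with transcript t. Returns transcript relative abundances theta."""
        n_seg, n_tx = C.shape
        theta = np.full(n_tx, 1.0 / n_tx)
        for _ in range(n_iter):
            # E-step: split each segment's reads among compatible transcripts
            # proportionally to the current abundances.
            weights = C * theta                                   # (n_seg, n_tx)
            weights /= weights.sum(axis=1, keepdims=True).clip(min=1e-12)
            expected = (counts[:, None] * weights).sum(axis=0)
            # M-step: renormalise expected read mass into new abundances.
            theta = expected / expected.sum()
        return theta

    C = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)  # 3 segments, 2 transcripts
    counts = np.array([30.0, 50.0, 20.0])
    print(em_abundance(counts, C))
    ```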

    Statistical power analysis for single-cell RNA-sequencing

    RNA-sequencing (RNA-seq) is an established method to quantify levels of gene expression genome-wide. The recent development of single-cell RNA sequencing (scRNA-seq) protocols opens up the possibility to systematically characterize cell transcriptomes and their underlying developmental and regulatory mechanisms. Since the first publication on single-cell transcriptomics a decade ago, hundreds of scRNA-seq datasets from a variety of sources have been released, profiling gene expression of sorted cells, tumors, whole dissociated organs and even complete organisms. Currently, it is also the main tool to systematically characterize human cells within the Human Cell Atlas Project. Given its wide applicability and increasing popularity, many experimental protocols and computational analysis approaches exist for scRNA-seq. However, the technology remains experimentally and computationally challenging. Firstly, single cells contain only minute mRNA amounts that need to be reliably captured and amplified for accurate quantification by sequencing. Importantly, the Polymerase Chain Reaction (PCR) commonly used for amplification might introduce biases and increase technical variation. Secondly, once the sequencing results are obtained, finding the best computational processing pipeline can be a struggle. A number of comparison studies have already been conducted, especially for bulk RNA-seq, but usually they deal with only one aspect of the workflow. Furthermore, to what extent the conclusions and recommendations of these studies can be transferred to scRNA-seq is unknown. Related to the processing of RNA-seq data, we investigate the effect of PCR amplification on differential expression analysis. We find that computational removal of duplicates has either a negligible or a negative impact on the specificity and sensitivity of differential expression analysis, and we therefore recommend not removing read duplicates by mapping position. In contrast, if duplicates are identified using unique molecular identifiers (UMIs) tagging RNA molecules, both specificity and sensitivity improve. The first integral step of any scRNA-seq experiment is the preparation of sequencing libraries from the cells. We conducted an independent benchmarking study of popular library preparation protocols in terms of detection sensitivity, accuracy and precision, using the same mouse embryonic stem cells and exogenous mRNA spike-ins. We recapitulate our previous finding that technical variance is markedly decreased when using UMIs to remove duplicates. In order to assign a monetary value to the detected amounts of technical variance, we developed a simulation framework that enabled us to compare the power to detect differentially expressed genes across the scRNA-seq library preparation protocols. Our experiences during this comparison study led to the development of the sequencing data processing in zUMIs and the simulation framework and power analysis in powsimR. zUMIs is a pipeline for processing scRNA-seq data with flexible choices regarding UMI and cell barcode design. In addition, we showed with powsimR simulations that the inclusion of intronic reads for gene expression quantification increases the power to detect DE genes, and added it as a unique feature to zUMIs. In powsimR, we present our simulation framework extending choices concerning data analysis, enabling researchers to assess experimental designs and analysis plans for RNA-seq in terms of statistical power (a toy illustration follows this abstract).
    Lastly, we conducted a systematic evaluation of scRNA-seq experimental and analytical pipelines. We found that choices made concerning normalisation and library preparation protocols have the biggest impact on the validity of scRNA-seq DE analysis. Choosing a good scRNA-seq pipeline can have the same impact on detecting a biological signal as quadrupling the cell sample size. Taken together, we have established and applied a simulation framework that allowed us to benchmark experimental and computational scRNA-seq protocols and hence inform the experimental design and method choices of this important technology.
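    powsimR itself is an R package; the Python stand-in below only illustrates the core idea of simulation-based power analysis for DE: draw negative-binomial counts for two groups of cells, test each gene, control the FDR with Benjamini-Hochberg, and report the fraction of truly DE genes recovered. All parameter values (cell numbers, dispersion, effect size) are assumptions, not powsimR defaults.

    ```python
    import numpy as np
    from scipy.stats import mannwhitneyu

    def bh_reject(p, alpha=0.05):
        """Benjamini-Hochberg step-up: boolean mask of rejected hypotheses."""
        p = np.asarray(p)
        order = np.argsort(p)
        passed = p[order] <= alpha * np.arange(1, p.size + 1) / p.size
        k = passed.nonzero()[0].max() + 1 if passed.any() else 0
        reject = np.zeros(p.size, dtype=bool)
        reject[order[:k]] = True
        return reject

    def de_power(n_cells=50, n_genes=500, frac_de=0.1, lfc=1.0,
                 mean_expr=5.0, dispersion=0.5, alpha=0.05, seed=0):
        rng = np.random.default_rng(seed)
        de = rng.random(n_genes) < frac_de          # which genes are truly DE
        r = 1.0 / dispersion                        # NB shape: var = mu + disp*mu^2

        def draw(mu):
            return rng.negative_binomial(r, r / (r + mu[:, None]),
                                         size=(n_genes, n_cells))

        a = draw(np.full(n_genes, mean_expr))                       # group A
        b = draw(np.where(de, mean_expr * 2.0 ** lfc, mean_expr))   # group B
        pvals = np.array([mannwhitneyu(a[g], b[g]).pvalue
                          for g in range(n_genes)])
        return bh_reject(pvals, alpha)[de].mean()   # fraction of true DE found

    print("estimated power:", de_power())
    ```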