7 research outputs found

Reusable, extensible, and modifiable R scripts and Kepler workflows for comprehensive single set ChIP-seq analysis

Author: Mark Bieda
Nathan Cormier
Tyler Kolisnik
Publication venue: Springer Nature
Publication date: 01/01/2016

BACKGROUND: There has been an enormous expansion in the use of chromatin immunoprecipitation followed by sequencing (ChIP-seq) technologies. Analysis of large-scale ChIP-seq datasets involves a complex series of steps and production of several specialized graphical outputs. A number of systems have emphasized custom development of ChIP-seq pipelines. These systems are primarily based on custom programming of a single, complex pipeline or supply libraries of modules and do not produce the full range of outputs commonly produced for ChIP-seq datasets. It is desirable to have more comprehensive pipelines, in particular ones addressing common metadata tasks, such as pathway analysis, and pipelines producing standard complex graphical outputs. It is advantageous if these are highly modular systems, available as both turnkey pipelines and individual modules, that are easily comprehensible, modifiable and extensible to allow rapid alteration in response to new analysis developments in this growing area. Furthermore, it is advantageous if these pipelines allow data provenance tracking.
RESULTS: We present a set of 20 ChIP-seq analysis software modules implemented in the Kepler workflow system; most (18/20) were also implemented as standalone, fully functional R scripts. The set consists of four full turnkey pipelines and 16 component modules. The turnkey pipelines in Kepler allow data provenance tracking. Implementation emphasized use of common R packages and widely-used external tools (e.g., MACS for peak finding), along with custom programming. This software presents comprehensive solutions and easily repurposed code blocks for ChIP-seq analysis and pipeline creation. Tasks include mapping raw reads, peak finding via MACS, summary statistics, peak location statistics, summary plots centered on the transcription start site (TSS), gene ontology, pathway analysis, and de novo motif finding, among others.
CONCLUSIONS: These pipelines range from those performing a single task to those performing full analyses of ChIP-seq data. The pipelines are supplied as both Kepler workflows, which allow data provenance tracking, and, in the majority of cases, as standalone R scripts. These pipelines are designed for ease of modification and repurposing.

Springer - Publisher Connector

PubMed Central

Scipedia

Evolutionary Signatures amongst Disease Genes Permit Novel Methods for Gene Prioritization and Construction of Informative Gene-Based Networks

Author: Clark NL
Priedigkeit N
Wolfe N
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2015

Genes involved in the same function tend to have similar evolutionary histories, in that their rates of evolution covary over time. This coevolutionary signature, termed Evolutionary Rate Covariation (ERC), is calculated using only gene sequences from a set of closely related species and has demonstrated potential as a computational tool for inferring functional relationships between genes. To further define applications of ERC, we first established that roughly 55% of genetic diseases possess an ERC signature between their contributing genes. At a false discovery rate of 5% we report 40 such diseases including cancers, developmental disorders and mitochondrial diseases. Given these coevolutionary signatures between disease genes, we then assessed ERC's ability to prioritize known disease genes out of a list of unrelated candidates. We found that in the presence of an ERC signature, the true disease gene is effectively prioritized to the top 6% of candidates on average. We then apply this strategy to a melanoma-associated region on chromosome 1 and identify MCL1 as a potential causative gene. Furthermore, to gain global insight into disease mechanisms, we used ERC to predict molecular connections between 310 nominally distinct diseases. The resulting “disease map” network associates several diseases with related pathogenic mechanisms and unveils many novel relationships between clinically distinct diseases, such as between Hirschsprung's disease and melanoma. Taken together, these results demonstrate the utility of molecular evolution as a gene discovery platform and show that evolutionary signatures can be used to build informative gene-based networks.
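
The idea of rates of evolution that "covary over time" can be illustrated with a small sketch. It assumes, purely for illustration, that an ERC-style score can be approximated by the Pearson correlation between two genes' branch-specific relative rates; the abstract does not specify the statistic, and the function name `pearson` and all rate values below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical relative evolutionary rates, one value per branch of a
# shared species tree (illustrative numbers, not data from the paper).
gene_a = [0.8, 1.2, 1.5, 0.6, 1.0]
gene_b = [0.7, 1.3, 1.4, 0.5, 1.1]  # speeds up and slows down with gene_a
gene_c = [1.4, 0.7, 0.6, 1.5, 0.9]  # moves opposite to gene_a

erc_ab = pearson(gene_a, gene_b)  # strongly positive: rates covary
erc_ac = pearson(gene_a, gene_c)  # negative: no shared rate signature
print(f"A-B: {erc_ab:.2f}, A-C: {erc_ac:.2f}")
```

Under this assumption, a high positive score between two genes would be read as a shared coevolutionary signature, which is the kind of evidence the abstract describes using to rank candidate disease genes.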

D-Scholarship@Pitt

Reusable, extensible, and modifiable R scripts and Kepler workflows for comprehensive single set ChIP-seq analysis

Author: A Barski
AF Bardet
AR Quinlan
B Langmead
B Ludäscher
C Zang
E Mercier
F Leisch
H Ji
H Li
H Xing
H Yan
I Barozzi
I Kouskoumvekaki
J Goecks
J Wang
J Wang
JD Phillips
KR Blahnik
L Shen
LJ Zhu
M Bieda
M Yu
Mark Bieda
Nathan Cormier
R Gentleman
RD Peng
S Falcon
S John
S Roy
S Wang
S Yoo
T Bailey
T Liu
T Stropp
T Ye
TL Bailey
Tyler Kolisnik
W Luo
W Luo
W Ma
Y Zhang
Publication venue: 'Springer Science and Business Media LLC'

Crossref

What's that gene (or protein)? Online resources for exploring functions of genes, transcripts, and proteins

Author: Alberts B
Alexander J
Altschul SF
Amberger J
Asplund A
Barrett T
Becker KG
Benson DA
Bento AP
Bhagwat M
Blake JA
Boutros M
Bragin E
Bulusu KC
Capaldi AP
Chatr-Aryamontri A
Croft D
Davis AP
de Beer TA
Deribe YL
Dinkel H
Dinkel H
Dorée M
Doug Kellogg
Eisenhaber B
Fernandez JM
Finn RD
Fitch WM
Fleischmann A
Flicek P
Forbes SA
Forsburg SL
Franceschini A
Gaudet P
Geer LY
Geer LY
Gnad F
Good MC
Griss J
Gutmanas A
Hayles J
Hedegaard J
Herraez A
Hibbs MA
Hopkins AL
Hornbeck PV
Horowitz NH
Huang da W
Huang da W
Hung JH
Hunt T
Hunter S
Huntley RP
Hutchins JRA
James R. A. Hutchins
Kanehisa M
Karolchik D
Kersey PJ
Kersey PJ
Kim DU
Kirschner MW
Kornberg A
Kosuge T
Kouskoumvekaki I
Landrum MJ
Lane DP
Lappalainen I
Law V
Lee Y
Lees JG
Letunic I
Liebel U
Lipinski CA
Lotia S
Lu CT
Lu Z
Lütjohann DS
Madej T
Marchler-Bauer A
Mi H
Mi T
Müller HM
NCBI Resource Coordinators
Neumann B
Niedringhaus TP
Nishimura D
Obenauer JC
Ooi HS
Orchard S
Orchard S
Owens J
Ozsolak F
Pakseresht N
Petryszak R
Pruitt KD
Que S
Reardon S
Rose PW
Rosenbloom KR
Rustici G
Saito R
Schomburg I
Schreiber F
Sigrist CJ
Sillitoe I
Smith RN
Smoot ME
Stelzer G
Suzek BE
Sönnichsen B
UniProt Consortium
Villaveces JM
Walsh CT
Walther TC
Wang Y
Wang Y
Wolfsberg TG
Wood V
Yang W
Young NL
Publication venue: 'American Society for Cell Biology (ASCB)'

Crossref

Microbial ecology of hot desert edaphic systems

Author: Cowan Don A.
Frossard Aline
Gunnigle Eoin
Makhalanyane Thulani P.
Ramond Jean-Baptiste
Valverde Angel
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/03/2015

A significant proportion of the Earth's surface is desert or in the process of desertification. The extreme environmental conditions that characterize these areas result in a surface that is essentially barren, with a limited range of higher plants and animals. Microbial communities are probably the dominant drivers of these systems, mediating key ecosystem processes. In this review, we examine the microbial communities of hot desert terrestrial biotopes (including soils, cryptic and refuge niches and plant-root-associated microbes) and the processes that govern their assembly. We also assess the possible effects of global climate change on hot desert microbial communities and the resulting feedback mechanisms. We conclude by discussing current gaps in our understanding of the microbiology of hot deserts and suggest fruitful avenues for future research.
South African National Research Foundation, the University of Pretoria and the Genomics Research Institute.
http://femsre.oxfordjournals.org
2016-03-31

UPSpace at the University of Pretoria

Facilitating the use of large-scale biological data and tools in the era of translational bioinformatics

Author: Aiden
Altman
Ashburner
Bader
Bellucci
Bernt
Corbett
De Keersmaecker
Diez
Ebert
Eng
Feuerer
Garcia
Gehlenborg
Goecks
Hoshida
Huang
Huang
Hui
I. Kouskoumvekaki
Jensen
Jonquet
Kanehisa
Karasavvas
Kohl
Lank
Larrayoz
Martin
May
Mesirov
Monticelli
N. Shublaq
Nepusz
Pennings
Pico
Ragnedda
Reich
Ringwald
Robinson
Ross-Innes
Rosse
S. Brunak
Saito
Sherman
Skov
Smith
Smith
Soemedi
Stropp
Tan
Torocsik
Turcan
Weir
Wu
Wu
Zhu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/08/2013

Crossref

Copenhagen University Research Information System

Online Research Database In Technology

Evolutionary machine learning of higher-order relationships in genome sequence analysis

Author: 이제근
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2014

Thesis (Ph.D.) -- Seoul National University Graduate School: Interdisciplinary Program in Bioinformatics, February 2014. 장병탁.

One of the basic research goals in life science is to understand the complex relationships between biological factors and phenotypes, and to identify the various factors affecting the phenotype. In particular, genomic sequences play a significant role in determining phenotypes such as gene expression and susceptibility to disease, so studying the fundamental information stored in the genome is essential to understanding biological processes. Previous genomic sequence analyses mainly focused on identifying a single associated factor or pairwise relationships with significant effects. Recent development of high-throughput technologies has made it possible to identify causal factors by carrying out genome-wide analysis. However, discovering higher-order interactions of multiple factors remains a challenge because it involves huge search spaces and computational costs. In this dissertation, we develop effective methods for identifying the higher-order relationships of sequence elements affecting the phenotype, by combining statistical learning with evolutionary computation. The methods are applied to finding the associated combinatorial factors and dysfunctional modules in various genome-wide sequence analysis problems. First, we present statistical learning-based methods to detect co-regulatory sequence motifs and to investigate the combinatorial effects of DNA methylation on downstream gene expression. Next, to examine sequence datasets with a huge number of attributes on the human genome, we apply evolutionary computation approaches. Our methods search the problem feature space with machine learning techniques that use training datasets within the evolutionary computation process, and they can find good candidate solutions to computationally expensive optimization problems.
The experimental results show that the approaches are useful for finding higher-order relationships associated with disease using genomic and epigenomic datasets. In conclusion, our studies provide practical methods for analyzing complex interactions among sequence elements in genomic and epigenomic studies.

Table of contents:
Abstract
1 Introduction
  1.1 Motivation
  1.2 Approaches
  1.3 Organization of the dissertation
2 Genome biology and computational analysis
  2.1 Fundamentals of genome biology
    2.1.1 DNA, gene, chromosomes and cell biology
    2.1.2 Gene expression and regulation
    2.1.3 Genomics
    2.1.4 Epigenomics
  2.2 Evolutionary machine learning
    2.2.1 Machine learning and evolutionary computation
    2.2.2 Evolutionary computation in biology
3 Identifying co-regulatory sequence motifs
  3.1 Background
  3.2 Methods
    3.2.1 Investigation of the relationship between regulatory sequence motifs and expression profiles
    3.2.2 Preparation of the gene expression datasets
    3.2.3 Preparation of the gene sequence datasets
    3.2.4 Measurement of the effect of motif combinations
  3.3 Results
    3.3.1 Identification of the relationship between gene expression and known motifs
    3.3.2 Identification of cell cycle-related motifs
    3.3.3 Combinational effects of regulatory motifs
  3.4 Discussion
4 Investigation of combinatorial effects of DNA methylation
  4.1 Background
  4.2 Materials and methods
    4.2.1 Data
    4.2.2 Profiling of DNA methylation patterns
    4.2.3 Identifying differentially methylated/expressed genes by information theoretic analysis
    4.2.4 Identifying downregulated genes in each subtype for integrative analysis
    4.2.5 Correlation between DNA methylation and gene expression
    4.2.6 Combinatorial effects of DNA methylation in various genomic regions
    4.2.7 Analysis of transcription factor binding regions possibly blocked by DNA methylation
  4.3 Results
    4.3.1 DNA methylation in 30 ICBP cell lines
    4.3.2 Information theoretic analysis of phenotype-differentially methylated and expressed genes
    4.3.3 Integrated analysis of DNA methylation and gene expression
    4.3.4 Investigation of the combinatorial effects of DNA methylation in various regions on downstream gene expression levels
    4.3.5 Integrative analysis of transcription factors, DNA methylation and gene expression
  4.4 Discussion
5 Detecting multiple SNP interaction via evolutionary learning
  5.1 Background
  5.2 Materials and methods
    5.2.1 Identifying higher-order interaction of SNPs
    5.2.2 Algorithm Description
    5.2.3 Dataset
  5.3 Results
    5.3.1 Identifying interaction between features in simulation data
    5.3.2 Identifying higher-order SNP interactions in Korean population
  5.4 Discussion
6 Identifying DNA methylation modules by probabilistic evolutionary learning
  6.1 Background
  6.2 Methods
    6.2.1 Evolutionary learning procedure to identify a set of DNA methylation sites associated to disease
    6.2.2 Learning dependency graph
    6.2.3 Fitness evaluation in population
    6.2.4 Dataset
  6.3 Results
    6.3.1 DNA methylation modules associated to breast cancer
    6.3.2 Modules associated to colorectal cancer using high-throughput sequencing data
  6.4 Discussion
7 Conclusion
Bibliography
Abstract (in Korean)
Docto

SNU Open Repository and Archive