Segmentation and genome annotation algorithms
Segmentation and genome annotation (SAGA) algorithms are widely used to
understand genome activity and gene regulation. These algorithms take as input
epigenomic datasets, such as chromatin immunoprecipitation-sequencing
(ChIP-seq) measurements of histone modifications or transcription factor
binding. They partition the genome and assign a label to each segment such that
positions with the same label exhibit similar patterns of input data. SAGA
algorithms discover categories of activity such as promoters, enhancers, or
parts of genes without prior knowledge of known genomic elements. In this
sense, they generally act in an unsupervised fashion like clustering
algorithms, but with the additional simultaneous function of segmenting the
genome. Here, we review the common methodological framework that underlies
these methods, review variants of and improvements upon this basic framework,
catalogue existing large-scale reference annotations, and discuss the outlook
for future work.
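To make the partition-and-label idea concrete, the following is a minimal sketch rather than the method of any particular SAGA tool. It assumes a single input track, Gaussian emissions with hand-picked means, and a fixed probability of staying in the same label, all of which are illustrative choices; real SAGA methods learn such parameters from the data. The function name `viterbi_segment` and its parameters are hypothetical.

```python
import math


def viterbi_segment(signal, means, stdev=1.0, stay_prob=0.9):
    """Assign a label to every position, then merge equal neighbours into segments.

    Assumptions (illustrative only): one signal track, Gaussian emissions with
    the given means and a shared stdev, and a fixed self-transition probability.
    Returns a list of (start, end, label) tuples with half-open coordinates.
    """
    k = len(means)
    if not signal:
        return []
    # Log transition scores: stay in the same label, or switch to any other one.
    stay = math.log(stay_prob) if k > 1 else 0.0
    switch = math.log((1.0 - stay_prob) / (k - 1)) if k > 1 else float("-inf")

    def log_emit(x, mu):
        # Gaussian log density up to an additive constant shared by all labels.
        return -0.5 * ((x - mu) / stdev) ** 2

    # Viterbi recursion with a uniform initial distribution over labels.
    score = [log_emit(signal[0], mu) for mu in means]
    back = []
    for x in signal[1:]:
        new_score, pointers = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: score[i] + (stay if i == j else switch))
            trans = stay if best_i == j else switch
            pointers.append(best_i)
            new_score.append(score[best_i] + trans + log_emit(x, means[j]))
        score = new_score
        back.append(pointers)

    # Trace back the most likely label sequence.
    state = max(range(k), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()

    # Collapse runs of identical labels into segments.
    segments, start = [], 0
    for pos in range(1, len(path) + 1):
        if pos == len(path) or path[pos] != path[start]:
            segments.append((start, pos, path[start]))
            start = pos
    return segments
```

Positions that share a label end up with similar signal values, and segment boundaries fall where the signal changes, which is the behaviour the abstract describes. Replacing the fixed means with parameters estimated by expectation-maximization would turn this into an unsupervised procedure.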
Additional file 2 of Choosing panels of genomics assays using submodular optimization
List of all assays used. File is in gzipped, tab-delimited format. Columns correspond to: (1) assay type, (2) cell type, (3) name of the file on the original server, and (4) URL of the server from which the file was downloaded. (TAB 300 kb)
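A file with this layout can be read with the Python standard library alone. The sketch below assumes the file has no header row, which the description does not state; the dictionary keys, the function names `read_assay_list` and `count_by`, and any file path passed in are hypothetical.

```python
import csv
import gzip
from collections import Counter

# Keys chosen to mirror the four documented columns; the names are assumptions.
COLUMNS = ("assay_type", "cell_type", "file_name", "server_url")


def read_assay_list(path):
    """Read a gzipped, tab-delimited assay list into a list of dicts.

    Assumes no header row; skip the first record if the file turns out to have one.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return [dict(zip(COLUMNS, row)) for row in reader if row]


def count_by(rows, column):
    """Count how many assays share each value of the given column."""
    return Counter(row[column] for row in rows)
```

Calling `count_by(rows, "assay_type")` on the result gives a quick summary of how many files are listed per assay type.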
Additional file 1 of Choosing panels of genomics assays using submodular optimization
Additional figures and text. (PDF 606 kb)
Segway 2.0 Application Note Datasets
<p>Learned parameters and resulting segmentation corresponding to the analyses shown in the Segway 2.0 application note.</p>
<p>Directory structure:</p>
<p><strong>GMM</strong> (datasets corresponding to the mixture of Gaussians analysis)</p>
<ul>
<li>1-component
<ul>
<li>traindir/
<ul>
<li>log/ (training log likelihood progression)</li>
<li>params/ (learned parameters)</li>
</ul>
</li>
<li>identifydir/
<ul>
<li>segway.bed.gz (segmentation)</li>
</ul>
</li>
</ul>
</li>
<li>3-component
<ul>
<li>traindir/
<ul>
<li>log/ (training log likelihood progression)</li>
<li>params/ (learned parameters)</li>
</ul>
</li>
<li>identifydir/
<ul>
<li>segway.bed.gz (segmentation)</li>
</ul>
</li>
</ul>
</li>
</ul>
<p><strong>minibatch-fixed</strong> (datasets corresponding to the minibatch learning analysis)</p>
<ul>
<li>fixed/
<ul>
<li>traindir/
<ul>
<li>log/ (training and validation log likelihood progression)</li>
<li>params/ (learned parameters)</li>
</ul>
</li>
</ul>
</li>
<li>minibatch/
<ul>
<li>traindir/
<ul>
<li>log/ (training and validation log likelihood progression)</li>
<li>params/ (learned parameters)</li>
</ul>
</li>
</ul>
</li>
</ul>
<p><strong>TSS_prediction</strong> (datasets corresponding to the TSS prediction analysis) (where k=component number=1-5, n=random start number=1-10)</p>
<ul>
<li>outputs_[date]_k/
<ul>
<li>traindir/
<ul>
<li>log/ (training and validation log likelihood progression)</li>
<li>params/ (learned parameters)</li>
</ul>
</li>
<li>identifydir_n/
<ul>
<li>segway.bed.gz (segmentation)</li>
</ul>
</li>
</ul>
</li>
</ul>
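As one way to inspect a segmentation file from the directory structure above, the sketch below sums the number of bases assigned to each label in a `segway.bed.gz` file. It assumes standard BED conventions: tab-separated fields, with chromosome, start, end, and label in the first four columns, and optional `track`, `browser`, or comment lines. The function name `label_coverage` is hypothetical, and the layout of any further columns is not assumed.

```python
import gzip
from collections import defaultdict


def label_coverage(bed_path):
    """Return {label: total bases} for a gzipped BED segmentation.

    Assumes the first four tab-separated fields are chrom, start, end, label.
    """
    coverage = defaultdict(int)
    with gzip.open(bed_path, "rt") as handle:
        for line in handle:
            # Skip blank lines and BED header or comment lines.
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue
            fields = line.rstrip("\n").split("\t")
            _chrom, start, end, label = fields[:4]
            # BED intervals are half-open, so the length is end - start.
            coverage[label] += int(end) - int(start)
    return dict(coverage)
```

Comparing the result for the 1-component and 3-component `identifydir/segway.bed.gz` files would show how the share of the genome assigned to each label differs between the two analyses.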