251 research outputs found

Systematic clustering of transcription start site landscapes

Author: Parker Brian J.
Sandelin Albin
Valen Eivind
Zhao Xiaobei
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 11/12/2015

Genome-wide, high-throughput methods for transcription start site (TSS) detection have shown that most promoters have an array of neighboring TSSs where some are used more than others, forming a distribution of initiation propensities. TSS distributions (TSSDs) vary widely between promoters and earlier studies have shown that the TSSDs have biological implications in both regulation and function. However, no systematic study has been made to explore how many types of TSSDs and by extension core promoters exist and to understand which biological features distinguish them. In this study, we developed a new non-parametric dissimilarity measure and clustering approach to explore the similarities and stabilities of clusters of TSSDs. Previous studies have used arbitrary thresholds to arrive at two general classes: broad and sharp. We demonstrated that in addition to the previous broad/sharp dichotomy an additional category of promoters exists. Unlike typical TATA-driven sharp TSSDs where the TSS position can vary a few nucleotides, in this category virtually all TSSs originate from the same genomic position. These promoters lack epigenetic signatures of typical mRNA promoters and a substantial subset of them are mapping upstream of ribosomal protein pseudogenes. We present evidence that these are likely mapping errors, which have confounded earlier analyses, due to the high similarity of ribosomal gene promoters in combination with known G addition bias in the CAGE libraries. Thus, previous two-class separations of promoters based on TSS distributions are motivated, but the ultra-sharp TSS distributions will confound downstream analyses if not removed. This work was supported by a grant from the Novo Nordisk Foundation, http://www.novonordiskfonden.dk/. The European Research Council (http://erc.europa.eu/) has provided financial support to Dr. Sandelin under the EU 7th Framework Programme (FP7/2007-2013)/ERC grant agreement 204135.

The Australian National University

A code for transcription initiation in mammalian genomes

Author: Carninci Piero
Frith Martin C.
Hayashizaki Yoshihide
Krogh Anders
Sandelin Albin
Valen Eivind
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/01/2007

Genome-wide detection of transcription start sites (TSSs) has revealed that RNA Polymerase II transcription initiates at millions of positions in mammalian genomes. Most core promoters do not have a single TSS, but an array of closely located TSSs with different rates of initiation. As a rule, genes have more than one such core promoter; however, defining the boundaries between core promoters is not trivial. These discoveries prompt a re-evaluation of our models for transcription initiation. We describe a new framework for understanding the organization of transcription initiation. We show that initiation events are clustered on the chromosomes at multiple scales (clusters within clusters), indicating multiple regulatory processes. Within the smallest of such clusters, which can be interpreted as core promoters, the local DNA sequence predicts the relative transcription start usage of each nucleotide with a remarkable 91% accuracy, implying the existence of a DNA code that determines TSS selection. Conversely, the total expression strength of such clusters is only partially determined by the local DNA sequence. Thus, the overall control of transcription can be understood as a combination of large- and small-scale effects; the selection of transcription start sites is largely governed by the local DNA sequence, whereas the transcriptional activity of a locus is regulated at a different level; it is affected by distal features or events such as enhancers and chromatin remodeling.

Crossref

Copenhagen University Research Information System

PubMed Central

UQ eSpace (University of Queensland)

A step-by-step guide to analyzing CAGE data using R/Bioconductor

Author: Sandelin Albin
Thodberg Malte
Publication venue: 'F1000 Research Ltd'
Publication date: 18/06/2019

Copenhagen University Research Information System

New histone supply regulates replication fork speed and PCNA unloading

Author: Alabert Constance
Feng Yunpeng
Groth Anja
Jasencakova Zuzana
Lees Michael
Lopes Massimo
Mejlvang Jakob
Neelsen Kai J.
Pasero Philippe
Sandelin Albin
Zhao Xiaobei
Publication venue: 'Rockefeller University Press'
Publication date: 30/12/2013

Crossref

Copenhagen University Research Information System

HAL Descartes

PubMed Central

ZORA

Discovery Research Portal

Portail HAL Um (Université de Montpellier)

HAL: Hyper Article en Ligne

Multivariate Hawkes process models of the occurrence of regulatory elements

Author: Carstensen Lisbeth
Hansen Niels R
Sandelin Albin
Winther Ole
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: A central question in molecular biology is how transcriptional regulatory elements (TREs) act in combination. Recent high-throughput data provide us with the location of multiple regulatory regions for multiple regulators, and thus with the possibility of analyzing the multivariate distribution of the occurrences of these TREs along the genome. RESULTS: We present a model of TRE occurrences known as the Hawkes process. We illustrate the use of this model by analyzing two different publicly available data sets. We are able to model, in detail, how the occurrence of one TRE is affected by the occurrences of others, and we can test a range of natural hypotheses about the dependencies among the TRE occurrences. In contrast to earlier efforts, pre-processing steps such as clustering or binning are not needed, and we thus retain information about the dependencies among the TREs that is otherwise lost. For each of the two data sets we provide two results: first, a qualitative description of the dependencies among the occurrences of the TREs, and second, quantitative results on the favored or avoided distances between the different TREs. CONCLUSIONS: The Hawkes process is a novel way of modeling the joint occurrences of multiple TREs along the genome that is capable of providing new insights into dependencies among elements involved in transcriptional regulation. The method is available as an R package from http://www.math.ku.dk/~richard/ppstat/.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Copenhagen University Research Information System

Online Research Database In Technology

Systematic clustering of transcription start site landscapes

Author: Parker Brian J
Sandelin Albin Gustav
Valen Eivind
Zhao Xiaobei
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2011

Crossref

Directory of Open Access Journals

Copenhagen University Research Information System

PubMed Central

The Francis Crick Institute

Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis

Author: Fan Lin
Garber Manuel
Levin Joshua Z.
Lin Michael F.
Pauli Andrea
Regev Aviv
Rinn John L.
Sandelin Albin
Schier Alexander F.
Valen Eivind
Vastenhouw Nadine L.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/10/2011

Long noncoding RNAs (lncRNAs) comprise a diverse class of transcripts that structurally resemble mRNAs but do not encode proteins. Recent genome-wide studies in humans and the mouse have annotated lncRNAs expressed in cell lines and adult tissues, but a systematic analysis of lncRNAs expressed during vertebrate embryogenesis has been elusive. To identify lncRNAs with potential functions in vertebrate embryogenesis, we performed a time-series of RNA-seq experiments at eight stages during early zebrafish development. We reconstructed 56,535 high-confidence transcripts in 28,912 loci, recovering the vast majority of expressed RefSeq transcripts while identifying thousands of novel isoforms and expressed loci. We defined a stringent set of 1133 noncoding multi-exonic transcripts expressed during embryogenesis. These include long intergenic ncRNAs (lincRNAs), intronic overlapping lncRNAs, exonic antisense overlapping lncRNAs, and precursors for small RNAs (sRNAs). Zebrafish lncRNAs share many of the characteristics of their mammalian counterparts: relatively short length, low exon number, low expression, and conservation levels comparable to that of introns. Subsets of lncRNAs carry chromatin signatures characteristic of genes with developmental functions. The temporal expression profile of lncRNAs revealed two novel properties: lncRNAs are expressed in narrower time windows than are protein-coding genes and are specifically enriched in early-stage embryos. In addition, several lncRNAs show tissue-specific expression and distinct subcellular localization patterns. Integrative computational analyses associated individual lncRNAs with specific pathways and functions, ranging from cell cycle regulation to morphogenesis. Our study provides the first systematic identification of lncRNAs in a vertebrate embryo and forms the foundation for future genetic, genomic, and evolutionary studies. National Human Genome Research Institute (U.S.) (Grant 1RO1HG005111-01

Copenhagen University Research Information System

CAGEfightR: analysis of 5'-end data using R/Bioconductor

Author: Andersson Robin
Sandelin Albin
Thieffry Axel
Thodberg Malte
Vitting-Seerup Kristoffer
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Copenhagen University Research Information System

JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles.

Author: Arenillas David J
Chen Chih-Yu
Denay Grégoire
Fornes Oriol
Lee Jessica
Lenhard Boris
Mathelier Anthony
Parcy François
Sandelin Albin
Shi Wenqiang
Shyr Casper
Tan Ge
Wasserman Wyeth W.
Worsley-Hunt Rebecca
Zhang Allen W
Publication venue: 'Oxford University Press (OUP)'
Publication date: 03/11/2015

JASPAR (http://jaspar.genereg.net) is an open-access database storing curated, non-redundant transcription factor (TF) binding profiles representing transcription factor binding preferences as position frequency matrices for multiple species in six taxonomic groups. For this 2016 release, we expanded the JASPAR CORE collection with 494 new TF binding profiles (315 in vertebrates, 11 in nematodes, 3 in insects, 1 in fungi and 164 in plants) and updated 59 profiles (58 in vertebrates and 1 in fungi). The introduced profiles represent an 83% expansion and 10% update when compared to the previous release. We updated the structural annotation of the TF DNA binding domains (DBDs) following a published hierarchical structural classification. In addition, we introduced 130 transcription factor flexible models trained on ChIP-seq data for vertebrates, which capture dinucleotide dependencies within TF binding sites. This new JASPAR release is accompanied by a new web tool to infer JASPAR TF binding profiles recognized by a given TF protein sequence. Moreover, we provide the users with a Ruby module complementing the JASPAR API to ease programmatic access and use of the JASPAR collection of profiles. Finally, we provide the JASPAR2016 R/Bioconductor data package with the data of this release.

Crossref

Hal - Université Grenoble Alpes

Copenhagen University Research Information System

PubMed Central

Spiral - Imperial College Digital Repository

HAL-CEA

ProdInra

Identification of conserved regulatory elements by comparative genome analysis

Author: Engström Pär
Jareborg Niclas
Lenhard Boris
Mendoza Luis
Sandelin Albin
Wasserman Wyeth W
Publication venue: BioMed Central
Publication date: 01/01/2003

BACKGROUND: For genes that have been successfully delineated within the human genome sequence, most regulatory sequences remain to be elucidated. The annotation and interpretation process requires additional data resources and significant improvements in computational methods for the detection of regulatory regions. One approach of growing popularity is based on the preferential conservation of functional sequences over the course of evolution by selective pressure, termed 'phylogenetic footprinting'. Mutations are more likely to be disruptive if they appear in functional sites, resulting in a measurable difference in evolution rates between functional and non-functional genomic segments. RESULTS: We have devised a flexible suite of methods for the identification and visualization of conserved transcription-factor-binding sites. The system reports those putative transcription-factor-binding sites that are both situated in conserved regions and located as pairs of sites in equivalent positions in alignments between two orthologous sequences. An underlying collection of metazoan transcription-factor-binding profiles was assembled to facilitate the study. This approach results in a significant improvement in the detection of transcription-factor-binding sites because of an increased signal-to-noise ratio, as demonstrated with two sets of promoter sequences. The method is implemented as a graphical web application, ConSite, which is at the disposal of the scientific community. CONCLUSIONS: Phylogenetic footprinting dramatically improves the predictive selectivity of bioinformatic approaches to the analysis of promoter sequences. ConSite delivers unparalleled performance using a novel database of high-quality binding models for metazoan transcription factors. With a dynamic interface, this bioinformatics tool provides broad access to promoter analysis with phylogenetic footprinting.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Swepub