13 research outputs found

Identification of functional elements and regulatory circuits by Drosophila modENCODE

Author: Ahmad Kami
Alekseyenko Artyom
Andrews Justen
Arshinoff Bradley
Artieri Carlo
Ay Ferhat
Berezikov Eugene
Berger Bonnie
Booth Benjamin
Brenner Steven
Brent Michael R
Bristow Christopher
Brooks Angela
Brown Christopher
Candeias Rogerio
Carlson Joseph
Carr Adrian
Celniker Susan
Cherbas Lucy
Cherbas Peter
Dai Qi
Davis Carrie
Di Stefano Luisa
Duff Michael
Eaton Matthew
Elgin Sarah C.R.
Ernst Jason
Feingold Elise
Feng Xin
Gingeras Thomas
Good Peter
Gorchakov Andrey
Graveley Brenton
Grossman Robert
Gu Tingting
Guyer Mark
Henikoff Jorja
Henikoff Steven
Hoskins Roger
Jungreis Irwin
Kapranov Philipp
Karpen Gary
Kaufman Thomas
Kellis Manolis
Kent William
Kharchenko Peter
Kheradpour Pouya
Kuroda Mitzi
Lai Eric
Landolin Jane
Lewis Suzanna
Li Renhua
Lin Michael
Lowdon Rebecca
Ma Lijia
MacAlpine David
MacAlpine Heather
Malone John
Marbach Daniel
Meyer Patrick
Micklem Gos
Minoda Aki
Negre Nicolas
Nordman Jared
Okamura Katsutomo
Oliver Brian
Orr-Weaver Terry
Park Peter
Perrimon Norbert
Perry Marc
Pirrotta Vincenzo
Posakony James
Powell Sara
Ren Bing
Riddle Nicole
Robine Nicolas
Roy Sushmita
Russell Steven
Sakai Akiko
Samsonova Anastasia
Sandler Jeremy
Schwartz Yuri
Sealfon Rachel
Sher Noa
Spokony Rebecca
Stein Lincoln
Sturgill David
Tolstorukov Michael
van Baren Marijke
Wan Kenneth
Washietl Stefan
Washington Nicole
White Kevin
Will Sebastian
Yang Li
Yu Charles
Publication venue: Washington University Open Scholarship
Publication date: 01/12/2010

To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.

Washington University St. Louis: Open Scholarship

ENCODE-like study using PacBio sequencing

Author: David Catcheside (521627)
Elizabeth Tseng (521137)
Jane Landolin (521135)
Jane Yeadon (521624)
Joan Wilson (521626)
Kristi Kim (521625)
Susana Wang (521623)

Integrative functional biology of a fungus: Using PacBio SMRT sequencing to interrogate the genome, epigenome, and transcriptome of Neurospora crassa

FigShare

Sequence features that drive human promoter function and tissue specificity

Author: Aldred Shelly F.
Johnson David S.
Landolin Jane M.
Medina Catherine
Myers Richard M.
Shulha Hennady
Trinklein Nathan D.
Weng Zhiping
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 01/07/2010

Promoters are important regulatory elements that contain the necessary sequence features for cells to initiate transcription. To functionally characterize a large set of human promoters, we measured the transcriptional activities of 4575 putative promoters across eight cell lines using transient transfection reporter assays. In parallel, we measured gene expression in the same cell lines and observed a significant correlation between promoter activity and endogenous gene expression (r = 0.43). As transient transfection assays directly measure the promoting effect of a defined fragment of DNA sequence, decoupled from epigenetic, chromatin, or long-range regulatory effects, we sought to predict whether a promoter was active using sequence features alone. CG dinucleotide content was highly predictive of ubiquitous promoter activity, necessitating the separation of promoters into two groups: high CG promoters, mostly ubiquitously active, and low CG promoters, mostly cell line–specific. Computational models trained on the binding potential of transcriptional factor (TF) binding motifs could predict promoter activities in both high and low CG groups: average area under the receiver operating characteristic curve (AUC) of the models was 91% and exceeded the AUC of CG content by an average of 23%. Known relationships, for example, between HNF4A and hepatocytes, were recapitulated in the corresponding cell lines, in this case the liver-derived cell line HepG2. Half of the associations between tissue-specific TFs and cell line–specific promoters were new. Our study underscores the importance of collecting functional information from complementary assays and conditions to understand biology in a systematic framework.

Crossref

PubMed Central

eScholarship@UMMS

Regulation of alternative splicing in Drosophila

Author: Angela N. Brooks
Benjamin W. Booth
Brenton R. Graveley
Gemma May
Jane Landolin
Jeremy Sandler
Ken Wan
Li Yang
Michael O. Duff
Mohan Bolisetty
Steven E. Brenner
Susan E. Celniker
Publication venue: Cold Spring Harbor Laboratory
Publication date: 01/11/2015

Alternative splicing is regulated by RNA binding proteins (RBPs) that recognize pre-mRNA sequence elements and activate or repress adjacent exons. Here, we used RNA interference and RNA-seq to identify splicing events regulated by 56 Drosophila proteins, some previously unknown to regulate splicing. Nearly all proteins affected alternative first exons, suggesting that RBPs play important roles in first exon choice. Half of the splicing events were regulated by multiple proteins, demonstrating extensive combinatorial regulation. We observed that SR and hnRNP proteins tend to act coordinately with each other, not antagonistically. We also identified a cross-regulatory network where splicing regulators affected the splicing of pre-mRNAs encoding other splicing regulators. This large-scale study substantially enhances our understanding of recent models of splicing regulation and provides a resource of thousands of exons that are regulated by 56 diverse RBPs.

CiteSeerX

Crossref

PubMed Central

eScholarship - University of California

Regulation of alternative splicing in Drosophila by 56 RNA binding proteins.

Author: Bolisetty Mohan
Booth Benjamin W
Brenner Steven E
Brooks Angela N
Celniker Susan E
Duff Michael O
Graveley Brenton R
Landolin Jane
May Gemma
Sandler Jeremy
Wan Ken
Yang Li
Publication venue: eScholarship, University of California
Publication date: 01/11/2015

Alternative splicing is regulated by RNA binding proteins (RBPs) that recognize pre-mRNA sequence elements and activate or repress adjacent exons. Here, we used RNA interference and RNA-seq to identify splicing events regulated by 56 Drosophila proteins, some previously unknown to regulate splicing. Nearly all proteins affected alternative first exons, suggesting that RBPs play important roles in first exon choice. Half of the splicing events were regulated by multiple proteins, demonstrating extensive combinatorial regulation. We observed that SR and hnRNP proteins tend to act coordinately with each other, not antagonistically. We also identified a cross-regulatory network where splicing regulators affected the splicing of pre-mRNAs encoding other splicing regulators. This large-scale study substantially enhances our understanding of recent models of splicing regulation and provides a resource of thousands of exons that are regulated by 56 diverse RBPs.

CiteSeerX

PubMed Central

eScholarship - University of California

Long-read, whole-genome shotgun sequence data for five model organisms

Author: Babayan Primo
Bergman Casey M
Catcheside David E A
Celniker Susan E
Chin Chen-Shan
Fisher William W
Kim Kristi E
Landolin Jane M
Li Joachim
Peluso Paul
Phillippy Adam M
Rank David R
Rapicavoli Nicole A
Yeadon P Jane
Yu Charles
Publication venue: Springer Science and Business Media LLC
Publication date: 23/10/2014

Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality, public datasets of SMRT sequences can spur development of analytic tools that can accommodate unique characteristics of SMRT data (long read lengths, lack of GC or amplification bias, and a random error profile leading to high consensus accuracy). In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4C2 and P5C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely-studied model systems in biological research.

Crossref

PubMed Central

The University of Manchester - Institutional Repository

Highly accurate long-read HiFi sequencing data for five complex genomes.

Author: Hardigan Michael A
Hon Ting
Karalius Joseph W
Knapp Steven J
Kudrna David
Landolin Jane M
Mars Kristin
Maurer Nicholas
Peluso Paul
Rank David R
Shapiro Beth
Steiner Cynthia C
Tsai Yu-Chih
Ware Doreen
Young Greg
Publication venue: eScholarship, University of California
Publication date: 01/11/2020

The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10-25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample data sets to both evaluate the benefits of these long accurate reads as well as for development of bioinformatic tools including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.

eScholarship - University of California

Resolving the complexity of the human genome using single-molecule sequencing.

Author: Antonacci Francesca
Boitano Matthew
Chaisson Mark JP
Dennis Megan Y
Eichler Evan E
Hormozdiari Fereydoun
Huddleston John
Hunkapiller Michael W
Korlach Jonas
Landolin Jane M
Malig Maika
Sandstrom Richard
Stamatoyannopoulos John A
Sudmant Peter H
Surti Urvashi
Publication venue: eScholarship, University of California
Publication date: 10/11/2014

The human genome is arguably the most complete mammalian reference assembly, yet more than 160 euchromatic gaps remain and aspects of its structural variation remain poorly understood ten years after its completion. To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing. We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome--78% of which carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology.

Crossref

PubMed Central

eScholarship - University of California

Genome-wide analysis of promoter architecture in Drosophila melanogaster

Author: Andrews Justen
Bickel Peter J.
Boley Nathan
Booth Benjamin W.
Brown James B.
Carlson Joseph W.
Carninci Piero
Celniker Susan E.
Graveley Brenton R.
Hoskins Roger A.
Kaufman Thomas C.
Landolin Jane M.
Lassmann Timo
Sandler Jeremy E.
Takahashi Hazuki
Wan Kenneth H.
Yang Li
Yu Charles
Zhang Dayu
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 20/10/2010

Core promoters are critical regions for gene regulation in higher eukaryotes. However, the boundaries of promoter regions, the relative rates of initiation at the transcription start sites (TSSs) distributed within them, and the functional significance of promoter architecture remain poorly understood. We produced a high-resolution map of promoters active in the Drosophila melanogaster embryo by integrating data from three independent and complementary methods: 21 million cap analysis of gene expression (CAGE) tags, 1.2 million RNA ligase mediated rapid amplification of cDNA ends (RLM-RACE) reads, and 50,000 cap-trapped expressed sequence tags (ESTs). We defined 12,454 promoters of 8037 genes. Our analysis indicates that, due to non-promoter-associated RNA background signal, previous studies have likely overestimated the number of promoter-associated CAGE clusters by fivefold. We show that TSS distributions form a complex continuum of shapes, and that promoters active in the embryo and adult have highly similar shapes in 95% of cases. This suggests that these distributions are generally determined by static elements such as local DNA sequence and are not modulated by dynamic signals such as histone modifications. Transcription factor binding motifs are differentially enriched as a function of promoter shape, and peaked promoter shape is correlated with both temporal and spatial regulation of gene expression. Our results contribute to the emerging view that core promoters are functionally diverse and control patterning of gene expression in Drosophila and mammals.

Crossref

PubMed Central

UNT Digital Library

The transcriptional diversity of 25 Drosophila cell lines

Author: Andrews Justen
Bell Kim
Brent Michael R.
Carlson Joseph W.
Celniker Susan E.
Cherbas Lucy
Cherbas Peter
Choi Jeong-Hyeon
Davis Carrie A.
Dobin Alexander
Duff Michael O.
Dumais Jacqueline
Eads Brian D.
Ghosh Srinka
Gingeras Thomas R.
Graveley Brenton R.
Hoskins Roger A.
Kapranov Philipp
Kaufman Thomas C.
Landolin Jane M.
Langton Laura
Lin Wei
Perrimon Norbert
Roberts Johnny
Samsonova Anastasia
Tang Haixu
Tenney Aaron E.
van Baren Marijke J.
Willingham Aarron
Yang Li
Zaleski Chris
Zhang Dayu
Zou Yi
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 15/11/2010

Drosophila melanogaster cell lines are important resources for cell biologists. Here, we catalog the expression of exons, genes, and unannotated transcriptional signals for 25 lines. Unannotated transcription is substantial (typically 19% of euchromatic signal). Conservatively, we identify 1405 novel transcribed regions; 684 of these appear to be new exons of neighboring, often distant, genes. Sixty-four percent of genes are expressed detectably in at least one line, but only 21% are detected in all lines. Each cell line expresses, on average, 5885 genes, including a common set of 3109. Expression levels vary over several orders of magnitude. Major signaling pathways are well represented: most differentiation pathways are “off” and survival/growth pathways “on.” Roughly 50% of the genes expressed by each line are not part of the common set, and these show considerable individuality. Thirty-one percent are expressed at a higher level in at least one cell line than in any single developmental stage, suggesting that each line is enriched for genes characteristic of small sets of cells. Most remarkable is that imaginal disc-derived lines can generally be assigned, on the basis of expression, to small territories within developing discs. These mappings reveal unexpected stability of even fine-grained spatial determination. No two cell lines show identical transcription factor expression. We conclude that each line has retained features of an individual founder cell superimposed on a common “cell line” gene expression pattern.

Crossref

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central

UNT Digital Library