225 research outputs found

Using several pair-wise informant sequences for de novo prediction of alternatively spliced transcripts

Author: Brent Michael R
Flicek Paul
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: As part of the ENCODE Genome Annotation Assessment Project (EGASP), we developed the MARS extension to the Twinscan algorithm. MARS is designed to find human alternatively spliced transcripts that are conserved in only one or a limited number of extant species. MARS is able to use an arbitrary number of informant sequences and predicts a number of alternative transcripts at each gene locus. RESULTS: MARS uses the mouse, rat, dog, opossum, chicken, and frog genome sequences as pairwise informant sources for Twinscan and combines the resulting transcript predictions into genes based on coding (CDS) region overlap. Based on the EGASP assessment, MARS is one of the more accurate dual-genome prediction programs. Compared to the GENCODE annotation, we find that predictive sensitivity increases, while specificity decreases, as more informant species are used. MARS correctly predicts alternatively spliced transcripts for 11 of the 236 multi-exon GENCODE genes that are alternatively spliced in the coding region of their transcripts. For these genes a total of 24 correct transcripts are predicted. CONCLUSION: The MARS algorithm is able to predict alternatively spliced transcripts without the use of expressed sequence information, although the number of loci in which multiple predicted transcripts match multiple alternatively spliced transcripts in the GENCODE annotation is relatively small

Springer - Publisher Connector

PubMed Central

Cohesin-based chromatin interactions enable regulated gene expression within pre-existing architectural compartments

Author: Dekker Job
Faure Andre
Fisher Amanda
Flicek Paul
Giorgetti Luca
Heard Edith
Ing-Simmons Elizabeth
Lajoie Bryan R.
Lenhard Boris
McCord Rachel Patton
Merkenschlager Matthias
Seitan Vlad
Zhan Ye
Publication venue: eScholarship@UMassChan
Publication date: 03/09/2013

Chromosome conformation capture approaches have shown that interphase chromatin is partitioned into spatially segregated Mb-sized compartments and sub-Mb-sized topological domains. This compartmentalization is thought to facilitate the matching of genes and regulatory elements, but its precise function and mechanistic basis remain unknown. Cohesin controls chromosome topology to enable DNA repair and chromosome segregation in cycling cells. In addition, cohesin associates with active enhancers and promoters and with CTCF to form long-range interactions important for gene regulation. Although these findings suggest an important role for cohesin in genome organization, this role has not been assessed on a global scale. Unexpectedly, we find that architectural compartments are maintained in non-cycling mouse thymocytes after genetic depletion of cohesin in vivo. Cohesin was however required for specific long-range interactions within compartments where cohesin-regulated genes reside. Cohesin depletion diminished interactions between cohesin-bound sites, while alternative interactions between chromatin features associated with transcriptional activation and repression became more prominent, with corresponding changes in gene expression. Our findings indicate that cohesin-mediated long-range interactions facilitate discrete gene expression states within pre-existing chromosomal compartments

Crossref

PubMed Central

eScholarship@UMMS

King's Research Portal

Gene finding in the chicken genome

Author: Antonarakis Stylianos E
Birney Ewan
Brent Michael R
Bye Jacqueline M
Camara Francisco
Castelo Robert
Eyras Eduardo
Flicek Paul
Guigo Roderic
Huckle Elizabeth J
Parra Genis
Reymond Alexandre
Rogers Jane
Shteynberg David D
Wyss Carine
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Despite the continuous production of genome sequence for a number of organisms, reliable, comprehensive, and cost effective gene prediction remains problematic. This is particularly true for genomes for which there is not a large collection of known gene sequences, such as the recently published chicken genome. We used the chicken sequence to test comparative and homology-based gene-finding methods followed by experimental validation as an effective genome annotation method. RESULTS: We performed experimental evaluation by RT-PCR of three different computational gene finders, Ensembl, SGP2 and TWINSCAN, applied to the chicken genome. A Venn diagram was computed and each component of it was evaluated. The results showed that de novo comparative methods can identify up to about 700 chicken genes with no previous evidence of expression, and can correctly extend about 40% of homology-based predictions at the 5' end. CONCLUSIONS: De novo comparative gene prediction followed by experimental verification is effective at enhancing the annotation of the newly sequenced genomes provided by standard homology-based methods

Springer - Publisher Connector

Serveur académique lausannois

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

UPF Digital Repository

Secretaría de Estado de Cultura

Archive ouverte UNIGE

EGASP: the human ENCODE Genome Annotation Assessment Project

Author: Abril Ferrando Josep Francesc, 1970-
Antonarakis Stylianos E.
Ashburner Michael
Bajic Vladimir B.
Birney Ewan
Castelo Robert
Denoeud France
Eyras Eduardo
Flicek Paul
Gingeras Thomas R.
Guigó Serra Roderic
Harrow Jennifer
Hubbard Tim
Lagarde Julien
Lewis Suzanna E.
Reese Martin G.
Reymond Alexandre
Ucla Catherine
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006

Background: Non-long terminal repeat (non-LTR) retrotransposons have contributed to shaping the structure and function of genomes. In silico and experimental approaches have been used to identify the non-LTR elements of the urochordate Ciona intestinalis. Knowledge of the types and abundance of non-LTR elements in urochordates is a key step in understanding their contribution to the structure and function of vertebrate genomes. Results: Consensus elements phylogenetically related to the I, LINE1, LINE2, LOA and R2 elements of the 14 eukaryotic non-LTR clades are described from C. intestinalis. The ascidian elements showed conservation of both the reverse transcriptase coding sequence and the overall structural organization seen in each clade. The apurinic/apyrimidinic endonuclease and nucleic-acid-binding domains encoded upstream of the reverse transcriptase, and the RNase H and the restriction enzyme-like endonuclease motifs encoded downstream of the reverse transcriptase were identified in the corresponding Ciona families. Conclusions: The genome of C. intestinalis harbors representatives of at least five clades of non-LTR retrotransposons. The copy number per haploid genome of each element is low, less than 100, far below the values reported for vertebrate counterparts but within the range for protostomes. Genomic and sequence analysis shows that the ascidian non-LTR elements are unmethylated and flanked by genomic segments with a gene density lower than average for the genome. The analysis provides valuable data for understanding the evolution of early chordate genomes and enlarges the view on the distribution of the non-LTR retrotransposons in eukaryotes

CiteSeerX

Cold Spring Harbor Laboratory Institutional Repository

Serveur académique lausannois

PubMed Central

UPF Digital Repository

King's Research Portal

Secretaría de Estado de Cultura

Diposit Digital de la Universitat de Barcelona

Archive ouverte UNIGE

Locus Reference Genomic sequences: an improved basis for describing human DNA variants

Author: Astashyn Alex
Birney Ewan
Brookes Anthony J
Béroud Christophe
Chen Yuan
Cunningham Fiona
Dalgleish Raymond
den Dunnen Johan T
Devereau Andrew
Dobson Glen
Flicek Paul
Larsson Pontus
Lehväslaiho Heikki
Maglott Donna R
McLaren William M
Proctor Glenn
Taschner Peter EM
Tully Raymond E
Vaughan Brendan W
Publication venue: BioMed Central
Publication date: 01/01/2010

As our knowledge of the complexity of gene architecture grows, and we increase our understanding of the subtleties of gene expression, the process of accurately describing disease-causing gene variants has become increasingly problematic. In part, this is due to current reference DNA sequence formats that do not fully meet present needs. Here we present the Locus Reference Genomic (LRG) sequence format, which has been designed for the specific purpose of gene variant reporting. The format builds on the successful National Center for Biotechnology Information (NCBI) RefSeqGene project and provides a single-file record containing a uniquely stable reference DNA sequence along with all relevant transcript and protein sequences essential to the description of gene variants. In principle, LRGs can be created for any organism, not just human. In addition, we recognize the need to respect legacy numbering systems for exons and amino acids and the LRG format takes account of these. We hope that widespread adoption of LRGs - which will be created and maintained by the NCBI and the European Bioinformatics Institute (EBI) - along with consistent use of the Human Genome Variation Society (HGVS)-approved variant nomenclature will reduce errors in the reporting of variants in the literature and improve communication about variants affecting human health. Further information can be found on the LRG web site: http://www.lrg-sequence.org

Crossref

Springer - Publisher Connector

HAL-Inserm

PubMed Central

HAL Descartes

Leiden University Scholary Publications

Leicester Research Archive

A practical, bioinformatic workflow system for large data sets generated by next-generation sequencing

Author: Aaron R. Jex
Altschul
Anja Joachim
Ashburner
Bentley
Bethony
Björnberg
Blaxter
Boag
Bronwyn E. Campbell
Caffrey
Campbell
Cantacessi
Cantacessi
Cantacessi
Cantacessi
Chan
Chang
Cinzia Cantacessi
Clifton
Conesa
Cottee
Cottee
Datu
DeRisi
Doyle
Flicek
Freigofas
Gasser
Golden
Greene
Gupta
Hawdon
Hopkins
Hotez
Hu
Huang
Hunter
Iseli
Jackson
Joachim
Joachim
Keil
Krasky
Letunic
Li
Li
Li
Lipinski
Makedonka Mitreva
Margulies
Matthew J. Nolan
McKay
Metzker
Miller
Miller
Mizuarai
Moreno
Morozova
Moser
Mufson
Mulvenna
Nagaraj
Nagaraj
Neil D. Young
Nikolaou
Nisbet
Olson
Parkinson
Paul W. Sternberg
Pong
Portman
Ranganathan
Ren
Robertson
Robin B. Gasser
Robinson
Ross S. Hall
Sahar Abubucker
Sanger
Sanger
Santos
Shoba Ranganathan
Soderlund
Stathopoulos
Stockdale
Tanaka
Vibranovski
Wang
Williamson
Wilson
Wu
Young
Young
Zhan
Zhong
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2010

Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science, from understanding the basic cellular function in model organisms, to the elucidation of the biological events that govern the development and progression of human diseases, and the exploration of the mechanisms of survival, drug-resistance and virulence of pathogens. Next-generation sequencing (NGS) technologies are contributing to a massive expansion of transcriptomics in all fields and are reducing the cost, time and performance barriers presented by conventional approaches. However, bioinformatic tools for the analysis of the sequence data sets produced by these technologies can be daunting to researchers with limited or no expertise in bioinformatics. Here, we constructed a semi-automated, bioinformatic workflow system, and critically evaluated it for the analysis and annotation of large-scale sequence data sets generated by NGS. We demonstrated its utility for the exploration of differences in the transcriptomes among various stages and both sexes of an economically important parasitic worm (Oesophagostomum dentatum) as well as the prediction and prioritization of essential molecules (including GTPases, protein kinases and phosphatases) as novel drug target candidates. This workflow system provides a practical tool for the assembly, annotation and analysis of NGS data sets, also to researchers with a limited bioinformatic expertise. The custom-written Perl, Python and Unix shell computer scripts used can be readily modified or adapted to suit many different applications. This system is now utilized routinely for the analysis of data sets from pathogens of major socio-economic importance and can, in principle, be applied to transcriptomics data sets from any organism

CiteSeerX

ResearchOnline@JCU

Crossref

ResearchOnline at James Cook University

PubMed Central

Digital Commons@Becker

Caltech Authors

UGD Academic Repository

Macquarie University ResearchOnline

University of Melbourne Institutional Repository

Ensembl regulation resources

Author: Anne Parker
Bethan Pritchard
Damian Keefe
Dan Sheppard
Daniel R. Zerbino
Daniel Sobral
Emily Perry
Ewan Birney
Herrero
Ian Dunham
Ikhlak Ahmed
Ilias Lavidas
Michael Nuhn
Nathan Johnson
Paul Flicek
Quentin Raffaillac-Desfosses
Rhoda Kinsella
Ridwan Amode
Simon Brent
Stefan Gräf
Steven P. Wilder
Steven Trevanion
Thomas Juetteman
Publication venue: Oxford University Press (OUP)
Publication date: 24/11/2015

New experimental techniques in epigenomics allow researchers to assay a diversity of highly dynamic features such as histone marks, DNA modifications or chromatin structure. The study of their fluctuations should provide insights into gene expression regulation, cell differentiation and disease. The Ensembl project collects and maintains the Ensembl regulation data resources on epigenetic marks, transcription factor binding and DNA methylation for human and mouse, as well as microarray probe mappings and annotations for a variety of chordate genomes. From this data, we produce a functional annotation of the regulatory elements along the human and mouse genomes with plans to expand to other species as data becomes available. Starting from well-studied cell lines, we will progressively expand our library of measurements to a greater variety of samples. Ensembl's regulation resources provide a central and easy-to-query repository for reference epigenomes. As with all Ensembl data, it is freely available at http://www.ensembl.org, from the Perl and REST APIs and from the public Ensembl MySQL database server at ensembldb.ensembl.org. Database URL: http://www.ensembl.org. Wellcome Trust grant: (WT098051); National Human Genome Research Institute grants: (U41HG007234, 1U01 HG004695); Biotechnology and Biological Sciences Research Council grant: (BB/L024225/1); European Molecular Biology Laboratory; European Union’s Seventh Framework Programme; European Research Council

Access to Research and Communications Annals

Crossref

PubMed Central

Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development.

Author: Al Nadaf Shafagh
Beal Kathryn
Belov Katherine
Carone Dawn M
Chen Lei
Chew Keng Yih
Cocks Benjamin G
Cooper Desmond W
Cree Andrew
Davis John
Deakin Janine E
Delbridge Margaret L
Deng Jixin
Dinh Huyen H
Edson Janette
Fairley Susan
Feng Zhi-Ping
Ferguson-Smith Malcolm
Flicek Paul
Forrest Susan M
Fowler Gerald
Frankenberg Stephen R
Fujiyama Asao
Gibbs Richard A
Graves Jennifer AM
Hall Allison
Hazar-Rethinam Mehlika
Heider Thomas
Herrero Javier
Hickford Danielle
Hore Timothy A
Hsu Arthur
Hu Yanqiu
Jhangiani Shalini N
Jing Chyn
Joshi Vandita
Kondo Shinji
Kovar Christie L
Kuczek Elizabeth
Kuroki Yoko
Lansdell Benjamin
Lara Fremiet
Lefèvre Christophe M
Levchenko Tanya
Lewis Lora R
Lindsay James
Liu Yue
Mandiou Ion
McColl Kaighin A
McGrath Annette
Men Artem
Menzies Brandon R
Mohammadi Amir
Morgan Margaret B
Muzny Donna M
Nazareth Lynne
Nicholas Frank W
Nicholas Kevin R
Nishida Yuichiro
O'Hara William
O'Neill Rachel J
Okwuonu Geoffrey O
Papenfuss Anthony T
Pask Andrew J
Patel Hardip R
Pharo Elizabeth A
Renfree Marilyn B
Rens Willem
Ruiz San Juana
Sakaki Yoshiyuki
Santibanez Jireh
Schneider Nanette Y
Searle Stephen MJ
Shaw Geoff
Shen Joshua Y
Short Kirsty R
Siddle Hannah V
Song Xing-Zhi
Speed Terence P
Stephens Amber
Stringer Jessica M
Sugano Sumio
Sundaravadanam Yogi
Suzuki Shunsuke
Suzuki Yutaka
Tatsumoto Shoji
Thomas Daniel
Thornton Rebecca
Toyoda Atsushi
Troon Carmen
Wakefield Matthew J
Wang Chenwei
Wang Jianghui
Waters Paul D
Weinstock George
Williams Sarah
Wilson Peter
Wong Emily SW
Wood David
Worley Kim C
Wu Chen
Yapa Lankesha
Yu Hongshi
Zenger Kyall R
Publication venue: Genome Biol
Publication date: 01/01/2011

BACKGROUND: We present the genome sequence of the tammar wallaby, Macropus eugenii, which is a member of the kangaroo family and the first representative of the iconic hopping mammals that symbolize Australia to be sequenced. The tammar has many unusual biological characteristics, including the longest period of embryonic diapause of any mammal, extremely synchronized seasonal breeding and prolonged and sophisticated lactation within a well-defined pouch. Like other marsupials, it gives birth to highly altricial young, and has a small number of very large chromosomes, making it a valuable model for genomics, reproduction and development. RESULTS: The genome has been sequenced to 2 × coverage using Sanger sequencing, enhanced with additional next generation sequencing and the integration of extensive physical and linkage maps to build the genome assembly. We also sequenced the tammar transcriptome across many tissues and developmental time points. Our analyses of these data shed light on mammalian reproduction, development and genome evolution: there is innovation in reproductive and lactational genes, rapid evolution of germ cell genes, and incomplete, locus-specific X inactivation. We also observe novel retrotransposons and a highly rearranged major histocompatibility complex, with many class I genes located outside the complex. Novel microRNAs in the tammar HOX clusters uncover new potential mammalian HOX regulatory elements. CONCLUSIONS: Analyses of these resources enhance our understanding of marsupial gene evolution, identify marsupial-specific conserved non-coding elements and critical genes across a range of biological systems, including reproduction, development and immunity, and provide new insight into marsupial and mammalian biology and genome evolution

ResearchOnline at James Cook University

University of Canberra Research Repository

UNSWorks

ResearchOnline@JCU

Crossref

Springer - Publisher Connector

Adelaide Research & Scholarship

PubMed Central

Queensland University of Technology ePrints Archive

UCL Discovery

Apollo (Cambridge)

University of Melbourne Institutional Repository

Evidence for intron length conservation in a set of mammalian genes associated with embryonic development

Author: A Nott
A Oates
A Vinogradov
APO Aulehla
B Paten
C Castillo-Davis
Cathal Seoighe
D Huang
D Huang
D Larson
D Rearick
EAR Zdobnov
I Letunic
I Swinburne
J Mattick
L Fedorova
M Lynch
M Lynch
P Flicek
Paul K Korir
R Elkon
R Waterhouse
S Boireau
S Lu
T Brend
T Hubbard
T Tange
Y Bessho
Y Takashima
Publication venue: BioMed Central
Publication date: 01/01/2011

BACKGROUND: We carried out an analysis of intron length conservation across a diverse group of nineteen mammalian species. Motivated by recent research suggesting a role for time delays associated with intron transcription in gene expression oscillations required for early embryonic patterning, we searched for examples of genes that showed the most extreme conservation of total intron content in mammals. RESULTS: Gene sets annotated as being involved in pattern specification in the early embryo or containing the homeobox DNA-binding domain, were significantly enriched among genes with highly conserved intron content. We used ancestral sequences reconstructed with probabilistic models that account for insertion and deletion mutations to distinguish insertion and deletion events on lineages leading to human and mouse from their last common ancestor. Using a randomization procedure, we show that genes containing the homeobox domain show less change in intron content than expected, given the number of insertion and deletion events within their introns. CONCLUSIONS: Our results suggest selection for gene expression precision or the existence of additional development-associated genes for which transcriptional delay is functionally significant.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Access to Research at National University of Ireland, Galway