1,282 research outputs found

ncRNA orthologies in the vertebrate lineage.

Author: Flicek P
Gordon L
Herrero J
Muffato M
Pignatelli M
Vilella AJ
White S
Publication date: 15/03/2016

Annotation of orthologous and paralogous genes is necessary for many aspects of evolutionary analysis. Methods to infer these homology relationships have traditionally focused on protein-coding genes, and the evolutionary models used by these methods normally assume that the positions in the protein evolve independently. However, as our appreciation for the roles of non-coding RNA genes has increased, consistently annotated sets of orthologous and paralogous ncRNA genes are increasingly needed. At the same time, methods such as PHASE or RAxML have implemented substitution models that consider pairs of sites to enable proper modelling of the loops and other features of RNA secondary structure. Here, we present a comprehensive analysis pipeline for the automatic detection of orthologues and paralogues for ncRNA genes. We focus on gene families represented in Rfam and for which a specific covariance model is provided. For each family, ncRNA genes found in all Ensembl species are aligned using Infernal, and several trees are built using different substitution models. In parallel, a genomic alignment that includes the ncRNA genes and their flanking sequence regions is built with PRANK. This alignment is used to create two additional phylogenetic trees using the neighbour-joining (NJ) and maximum-likelihood (ML) methods. The trees arising from both the ncRNA and genomic alignments are merged using TreeBeST, which reconciles them with the species tree in order to identify speciation and duplication events. The final tree is used to infer the orthologues and paralogues following Fitch's definition. We also determine gene gain and loss events for each family using CAFE. All data are accessible through the Ensembl Comparative Genomics ('Compara') API, on our FTP site and are fully integrated in the Ensembl genome browser, where they can be accessed in a user-friendly manner. Database URL: http://www.ensembl.org
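
The last inference step in this abstract (reading orthologues and paralogues off a reconciled tree following Fitch's definition) can be illustrated with a minimal sketch. It is not the Ensembl Compara implementation: the nested-tuple tree format, the function names and the gene names are all hypothetical, and the event labels are assumed to have already been assigned by the reconciliation step.

```python
# Hypothetical toy format for a reconciled gene tree (an assumption, not
# the Compara data model): a leaf is a gene name (str); an internal node
# is a tuple (event, left, right), event being "speciation" or "duplication".

def leaves(node):
    """Return the gene names found under a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, out=None):
    """Label every gene pair by the event at its last common ancestor.

    Fitch's definition: two genes are orthologues if they diverged at a
    speciation node, and paralogues if they diverged at a duplication node.
    """
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    event, left, right = node
    relation = "orthologue" if event == "speciation" else "paralogue"
    # Genes on opposite sides of this node have this node as their
    # last common ancestor.
    for a in leaves(left):
        for b in leaves(right):
            out[frozenset((a, b))] = relation
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


# Made-up example: a duplication in the human lineage after the
# human/mouse speciation.
tree = ("speciation",
        ("duplication", "human_A1", "human_A2"),
        "mouse_A")
pairs = classify_pairs(tree)
```

In this toy tree both human copies are orthologues of the mouse gene and paralogues of each other.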

UCL Discovery

PubMed Central

Decoupling of evolutionary changes in transcription factor binding and gene expression in mammals

Author: Flicek Paul
Odom Duncan T.
Schmitt Bianca M.
Stefflova Klara
Thybert David
Wong Emily S.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/02/2015

To understand the evolutionary dynamics between transcription factor (TF) binding and gene expression in mammals, we compared transcriptional output and the binding intensities for three tissue-specific TFs in livers from four closely related mouse species. For each transcription factor, TF-dependent genes and the TF binding sites most likely to influence mRNA expression were identified by comparing mRNA expression levels between wild-type and TF knockout mice. Independent evolution was observed genome-wide between the rate of change in TF binding and the rate of change in mRNA expression across taxa, with the exception of a small number of TF-dependent genes. We also found that binding intensities are preferentially conserved near genes whose expression is dependent on the TF, and the conservation is shared among binding peaks in close proximity to each other near the TSS. Expression of TF-dependent genes typically showed an increased sensitivity to changes in binding levels as measured by mRNA abundance. Taken together, these results highlight a significant tolerance to evolutionary changes in TF binding intensity in mammalian transcriptional networks and suggest that some TF-dependent genes may be largely regulated by a single TF across evolution.

University of Queensland eSpace

A database and API for variation, dense genotyping and resequencing data

Author: Birney Ewan
Chen Yuan
Cunningham Fiona
Flicek Paul
McLaren William M
Rios Daniel
Stabenau Arne
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Advances in sequencing and genotyping technologies are leading to the widespread availability of multi-species variation data, dense genotype data and large-scale resequencing projects. The 1000 Genomes Project and similar efforts in other species are challenging the methods previously used for storage and manipulation of such data, necessitating the redesign of existing genome-wide bioinformatics resources. Results: Ensembl has created a database and software library to support data storage, analysis and access to the existing and emerging variation data from large mammalian and vertebrate genomes. These tools scale to thousands of individual genome sequences and are integrated into the Ensembl infrastructure for genome annotation and visualisation. The database and software system is easily expanded to integrate both public and non-public data sources in the context of an Ensembl software installation and is already being used outside of the Ensembl project in a number of database and application environments. Conclusions: Ensembl's powerful, flexible and open source infrastructure for the management of variation, genotyping and resequencing data is freely available at http://www.ensembl.org.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Spatial enhancer clustering and regulation of enhancer-proximal genes by cohesin

Author: Carroll T
Dekker J
Faure AJ
Fisher AG
Flicek P
Ing-Simmons E
Lenhard B
Merkenschlager M
Seitan VC
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/01/2015

In addition to mediating sister chromatid cohesion during the cell cycle, the cohesin complex associates with CTCF and with active gene regulatory elements to form long-range interactions between its binding sites. Genome-wide chromosome conformation capture has shown that cohesin's main role in interphase genome organization is in mediating interactions within architectural chromosome compartments, rather than specifying compartments per se. However, it remains unclear how cohesin-mediated interactions contribute to the regulation of gene expression. We have found that the binding of CTCF and cohesin is highly enriched at enhancers and in particular at enhancer arrays or “super-enhancers” in mouse thymocytes. Using local and global chromosome conformation capture, we demonstrate that enhancer elements associate not just in linear sequence, but also in 3D, and that spatial enhancer clustering is facilitated by cohesin. The conditional deletion of cohesin from noncycling thymocytes preserved enhancer position, H3K27ac, H3K4me1, and enhancer transcription, but weakened interactions between enhancers. Interestingly, ∼50% of deregulated genes reside in the vicinity of enhancer elements, suggesting that cohesin regulates gene expression through spatial clustering of enhancer elements. We propose a model for cohesin-dependent gene regulation in which spatial clustering of enhancer elements acts as a unified mechanism for both enhancer-promoter “connections” and “insulation.”

Crossref

PubMed Central

eScholarship@UMMS

Spiral - Imperial College Digital Repository

King's Research Portal

Variant calling on the GRCh38 assembly with the data from phase three of the 1000 Genomes Project

Author: 1000 Genomes Project C.
Clarke L.
Fairley S.
Flicek P.
Lowy-Gallego E.
Ruffier M.
Timmermann B.
Zheng-Bradley X.
Publication venue: 'F1000 Research Ltd'
Publication date: 11/03/2019

We present biallelic SNVs called from 2,548 samples across 26 populations from the 1000 Genomes Project, called directly on GRCh38. We believe this will be a useful reference resource for those using GRCh38, representing an improvement over the “lift-overs” of the 1000 Genomes Project data that have been available to date and providing a resource necessary for the full adoption of GRCh38 by the community. Here, we describe how the call set was created and provide benchmarking data describing how our call set compares to that produced by the final phase of the 1000 Genomes Project on GRCh37.
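
For readers unfamiliar with the term, a biallelic SNV is a site with exactly one alternate allele, where both the reference and the alternate allele are single nucleotides. The following is a minimal sketch of that definition only. It is not the filtering used to build the call set, and the function name and argument layout are hypothetical.

```python
# Illustrative check only; the name is_biallelic_snv and its arguments
# (a reference allele string and a list of alternate allele strings)
# are assumptions made for this sketch.
BASES = {"A", "C", "G", "T"}


def is_biallelic_snv(ref, alts):
    """Return True if the site has one ALT allele and both alleles are single bases."""
    return (
        len(alts) == 1          # biallelic: reference plus exactly one alternate
        and ref in BASES        # reference allele is a single nucleotide
        and alts[0] in BASES    # alternate allele is a single nucleotide
        and ref != alts[0]      # the two alleles differ
    )
```

Under this definition an A-to-G substitution qualifies, whereas a site with two alternate alleles or an insertion such as A-to-AT does not.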

MPG.PuRe

Functional signatures of evolutionarily young CTCF binding sites

Author: Azazi Dhoyazan
Flicek Paul
Mudge Jonathan M.
Odom Duncan T.
Publication venue: BMC Biology
Publication date: 23/09/2020

Background: The introduction of novel CTCF binding sites in gene regulatory regions in the rodent lineage is partly the effect of transposable element expansion, particularly in the murine lineage. The exact mechanism and functional impact of evolutionarily novel CTCF binding sites are not yet fully understood. We investigated the impact of novel subspecies-specific CTCF binding sites in two Mus genus subspecies, Mus musculus domesticus and Mus musculus castaneus, that diverged 0.5 million years ago. Results: CTCF binding site evolution is influenced by the action of the B2-B4 family of transposable elements independently in both lineages, leading to the proliferation of novel CTCF binding sites. A subset of evolutionarily young sites may harbour transcriptional functionality as evidenced by the stability of their binding across multiple tissues in M. musculus domesticus (BL6), while overall the distance of subspecies-specific CTCF binding to the nearest transcription start sites and/or topologically associated domains (TADs) is largely similar to musculus-common CTCF sites. Remarkably, we discovered a recurrent regulatory architecture consisting of a CTCF binding site and an interferon gene that appears to have been tandemly duplicated to create a 15-gene cluster on chromosome 4, thus forming a novel BL6-specific immune locus in which CTCF may play a regulatory role. Conclusions: Our results demonstrate that thousands of CTCF binding sites show multiple functional signatures rapidly after incorporation into the genome.

Apollo (Cambridge)

Improving duplicated nodes position in vertebrate gene trees

Author: Amélie Peres
Hugues Roest Crollius
M Muffato
P Flicek
Publication venue: 'Springer Science and Business Media LLC'

Crossref

MoKCa database - mutations of kinases in cancer

Author: Alfarano
Altschul
Berman
Braconi Quintaje
Bruford
Burnworth
Chatr-aryamontri
Christopher J. Richardson
Clifford
Costas Mitsopoulous
Daley
Diella
Fernández
Finn
Flicek
Forbes
Frances M. G. Pearl
Gene Ontology Consortium
Greenman
Hanahan
Hulo
Kaminker
Kerrien
Koorstra
Lappalainen
Laurence H. Pearl
Letunic
Manning
Marketa Zvelebil
Mishra
Ng
O'Brien
Ortutay
Pagel
Pearl
Qiong Gao
Sawyers
Sjöblom
Stark
Torkamani
UniProt Consortium
Vastrik
Velankar
Wheeler
Wood
Yeats
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2009

Members of the protein kinase family are amongst the most commonly mutated genes in human cancer, and both mutated and activated protein kinases have proved to be tractable targets for the development of new anticancer therapies. The MoKCa database (Mutations of Kinases in Cancer, http://strubiol.icr.ac.uk/extra/mokca) has been developed to structurally and functionally annotate, and where possible predict, the phenotypic consequences of mutations in protein kinases implicated in cancer. Somatic mutation data from tumours and tumour cell lines have been mapped onto the crystal structures of the affected protein domains. Positions of the mutated amino acids are highlighted on a sequence-based domain pictogram, as well as on a 3D image of the protein structure, and in a molecular graphics package integrated for interactive viewing. The data associated with each mutation are presented in the Web interface, along with expert annotation of the detailed molecular functional implications of the mutation. Proteins are linked to functional annotation resources and are annotated with structural and functional features such as domains and phosphorylation sites. MoKCa aims to provide assessments available from multiple sources and algorithms for each potential cancer-associated mutation, and to present these together in a consistent and coherent fashion to facilitate authoritative annotation by cancer biologists and structural biologists directly involved in the generation and analysis of new mutational data.

Crossref

PubMed Central

Institute of Cancer Research Repository

Sussex Research Online

Ensembl variation resources

Author: Birney Ewan
Brent Simon
Chen Yuan
Cunningham Fiona
Flicek Paul
Kulesha Eugene
Marin-Garcia Pablo
McLaren William M
Pritchard Bethan
Rios Daniel
Smedley Damian
Smith James
Spudich Giulietta M
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

How and why DNA barcodes underestimate the diversity of microbial eukaryotes

Author: Adam Eyre-Walker
AR Boyko
AZ Worden
B Charlesworth
B Palenik
DT Jones
F Not
G Piganeau
Gwenael Piganeau
Hervé Moreau
J Coyne
J Crow
JJ Welch
K Romari
M Viprey
ML Cuvelier
Nigel Grimsley
P Flicek
P Lopez-Garcia
PD Keightley
Purification Lopez-Garcia
S Gourbiere
S Jancek
S Proost
SB Needleman
SJ Williamson
SL Baldauf
SY Moon-van der Staay
Z Yang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/02/2011

Background: Because many picoplanktonic eukaryotic species cannot currently be maintained in culture, direct sequencing of PCR-amplified 18S ribosomal gene DNA fragments from filtered sea-water has been successfully used to investigate the astounding diversity of these organisms. The recognition of many novel planktonic organisms is thus based solely on their 18S rDNA sequence. However, a species delimited by its 18S rDNA sequence might contain many cryptic species, which are highly differentiated in their protein coding sequences. Principal Findings: Here, we investigate the issue of species identification from one gene to the whole genome sequence. Using 52 whole genome DNA sequences, we estimated the global genetic divergence in protein coding genes between organisms from different lineages and compared this to their ribosomal gene sequence divergences. We show that this relationship between proteome divergence and 18S divergence is lineage dependent. Unicellular lineages have especially low 18S divergences relative to their protein sequence divergences, suggesting that 18S ribosomal genes are too conservative to assess planktonic eukaryotic diversity. We provide an explanation for this lineage dependency, which suggests that most species with large effective population sizes will show far less divergence in 18S than protein coding sequences. Conclusions: There is therefore a trade-off between using genes that are easy to amplify in all species, but which by their nature are highly conserved and underestimate the true number of species, and using genes that give a better description of the number of species, but which are more difficult to amplify. We have shown that this trade-off differs between unicellular and multicellular organisms as a likely consequence of differences in effective population sizes. We anticipate that biodiversity of microbial eukaryotic species is underestimated and that numerous "cryptic species" will become discernible with the future acquisition of genomic and metagenomic sequences.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Sussex Research Online