28 research outputs found
FunTree: a resource for exploring the functional evolution of structurally defined enzyme superfamilies.
FunTree is a new resource that brings together sequence, structure, phylogenetic, chemical and mechanistic information for structurally defined enzyme superfamilies. Gathering together this range of data into a single resource allows the investigation of how novel enzyme functions have evolved within a structurally defined superfamily as well as providing a means to analyse trends across many superfamilies. This is done not only within the context of an enzyme's sequence and structure but also the relationships of their reactions. Developed in tandem with the CATH database, it currently comprises 276 superfamilies covering ~1800 (70%) of sequence assigned enzyme reactions. Central to the resource are phylogenetic trees generated from structurally informed multiple sequence alignments using both domain structural alignments supplemented with domain sequences and whole sequence alignments based on commonality of multi-domain architectures. These trees are decorated with functional annotations such as metabolite similarity as well as annotations from manually curated resources such the catalytic site atlas and MACiE for enzyme mechanisms. The resource is freely available through a web interface: www.ebi.ac.uk/thorton-srv/databases/FunTree
New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures.
CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly less than for previous releases, and this observation suggests that we may now know the majority of folds that are easily accessible to structure determination. We have improved the accuracy of our functional family (FunFams) sub-classification method and the CATH sequence domain search facility has been extended to provide FunFam annotations for each domain. The CATH website has been redesigned. We have improved the display of functional data and of conserved sequence features associated with FunFams within each CATH superfamily
The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution
We report the latest release (version 3.0) of the CATH protein domain database (). There has been a 20% increase in the number of structural domains classified in CATH, up to 86 151 domains. Release 3.0 comprises 1110 fold groups and 2147 homologous superfamilies. To cope with the increases in diverse structural homologues being determined by the structural genomics initiatives, more sensitive methods have been developed for identifying boundaries in multi-domain proteins and for recognising homologues. The CATH classification update is now being driven by an integrated pipeline that links these automated procedures with validation steps, that have been made easier by the provision of information rich web pages summarising comparison scores and relevant links to external sites for each domain being classified. An analysis of the population of domains in the CATH hierarchy and several domain characteristics are presented for version 3.0. We also report an update of the CATH Dictionary of homologous structures (CATH-DHS) which now contains multiple structural alignments, consensus information and functional annotations for 1459 well populated superfamilies in CATH. CATH is directly linked to the Gene3D database which is a projection of CATH structural data onto ∼2 million sequences in completed genomes and UniProt
CATH: comprehensive structural and functional annotations for genome sequences.
The latest version of the CATH-Gene3D protein structure classification database (4.0, http://www.cathdb.info) provides annotations for over 235,000 protein domain structures and includes 25 million domain predictions. This article provides an update on the major developments in the 2 years since the last publication in this journal including: significant improvements to the predictive power of our functional families (FunFams); the release of our 'current' putative domain assignments (CATH-B); a new, strictly non-redundant data set of CATH domains suitable for homology benchmarking experiments (CATH-40) and a number of improvements to the web pages
Exploring the Evolution of Novel Enzyme Functions within Structurally Defined Protein Superfamilies
In order to understand the evolution of enzyme reactions and to gain an overview of biological catalysis we have combined sequence and structural data to generate phylogenetic trees in an analysis of 276 structurally defined enzyme superfamilies, and used these to study how enzyme functions have evolved. We describe in detail the analysis of two superfamilies to illustrate different paradigms of enzyme evolution. Gathering together data from all the superfamilies supports and develops the observation that they have all evolved to act on a diverse set of substrates, whilst the evolution of new chemistry is much less common. Despite that, by bringing together so much data, we can provide a comprehensive overview of the most common and rare types of changes in function. Our analysis demonstrates on a larger scale than previously studied, that modifications in overall chemistry still occur, with all possible changes at the primary level of the Enzyme Commission (E.C.) classification observed to a greater or lesser extent. The phylogenetic trees map out the evolutionary route taken within a superfamily, as well as all the possible changes within a superfamily. This has been used to generate a matrix of observed exchanges from one enzyme function to another, revealing the scale and nature of enzyme evolution and that some types of exchanges between and within E.C. classes are more prevalent than others. Surprisingly a large proportion (71%) of all known enzyme functions are performed by this relatively small set of 276 superfamilies. This reinforces the hypothesis that relatively few ancient enzymatic domain superfamilies were progenitors for most of the chemistry required for life
Fine-mapping of the HNF1B multicancer locus identifies candidate variants that mediate endometrial cancer risk.
Common variants in the hepatocyte nuclear factor 1 homeobox B (HNF1B) gene are associated with the risk of Type II diabetes and multiple cancers. Evidence to date indicates that cancer risk may be mediated via genetic or epigenetic effects on HNF1B gene expression. We previously found single-nucleotide polymorphisms (SNPs) at the HNF1B locus to be associated with endometrial cancer, and now report extensive fine-mapping and in silico and laboratory analyses of this locus. Analysis of 1184 genotyped and imputed SNPs in 6608 Caucasian cases and 37 925 controls, and 895 Asian cases and 1968 controls, revealed the best signal of association for SNP rs11263763 (P = 8.4 × 10(-14), odds ratio = 0.86, 95% confidence interval = 0.82-0.89), located within HNF1B intron 1. Haplotype analysis and conditional analyses provide no evidence of further independent endometrial cancer risk variants at this locus. SNP rs11263763 genotype was associated with HNF1B mRNA expression but not with HNF1B methylation in endometrial tumor samples from The Cancer Genome Atlas. Genetic analyses prioritized rs11263763 and four other SNPs in high-to-moderate linkage disequilibrium as the most likely causal SNPs. Three of these SNPs map to the extended HNF1B promoter based on chromatin marks extending from the minimal promoter region. Reporter assays demonstrated that this extended region reduces activity in combination with the minimal HNF1B promoter, and that the minor alleles of rs11263763 or rs8064454 are associated with decreased HNF1B promoter activity. Our findings provide evidence for a single signal associated with endometrial cancer risk at the HNF1B locus, and that risk is likely mediated via altered HNF1B gene expression
Finishing the euchromatic sequence of the human genome
The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human enome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead
Seventh BMC ecology image competition: the winning images
Abstract The seventh BMC Ecology competition attracted entries from talented ecologists from around the world. Together, they showcase the beauty and diversity of life on our planet as well as providing an insight into the biological interactions found in nature. This editorial celebrates the winning images as selected by the Editor of BMC Ecology and senior members of the journal’s editorial board. Enjoy