711 research outputs found

    Predicting protein function with hierarchical phylogenetic profiles: The Gene3D phylo-tuner method applied to eukaryotic Genomes

    "Phylogenetic profiling" is based on the hypothesis that during evolution functionally or physically interacting genes are likely to be inherited or eliminated in a codependent manner. Creating presence-absence profiles of orthologous genes is now a common and powerful way of identifying functionally associated genes. In this approach, correctly determining orthology, as a means of identifying functional equivalence between two genes, is a critical and nontrivial step and largely explains why previous work in this area has mainly focused on using presence-absence profiles in prokaryotic species. Here, we demonstrate that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence-absence information content. This feature makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods. Using CATH structural domain assignments from the Gene3D database for 13 complete eukaryotic genomes, we have developed a novel modification of the phylogenetic profiling method that uses genome copy number of each domain superfamily to predict functional relationships. In our approach, superfamilies are subclustered at ten levels of sequence identity, from 30% to 100%, and phylogenetic profiles are built at each level. All the profiles are compared using normalised Euclidean distances to identify those with correlated changes in their domain copy number. We demonstrate that two protein families will "auto-tune" with strong co-evolutionary signals when their profiles are compared at the similarity levels that capture their functional relationship. Our method finds functional relationships that are not detectable by conventional presence-absence profile comparisons, and it does not require any fixed a priori criteria to define orthologous genes.
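The profile comparison described in this abstract can be illustrated with a minimal sketch. The abstract does not specify how the Euclidean distance is normalised, so the code below assumes each copy-number profile is scaled to unit length before comparison; the function names, the dictionary layout (identity level mapped to a per-genome copy-number list) and the toy numbers are all hypothetical and are not taken from the paper.

```python
import math


def normalise(profile):
    """Scale a copy-number profile to unit length (an assumed normalisation)."""
    norm = math.sqrt(sum(x * x for x in profile))
    return [x / norm for x in profile] if norm else list(profile)


def profile_distance(a, b):
    """Euclidean distance between two unit-normalised copy-number profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same set of genomes")
    na, nb = normalise(a), normalise(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)))


def best_level_pair(profiles_a, profiles_b):
    """Compare every identity level of family A with every level of family B.

    Returns (distance, level_a, level_b) for the closest pair of profiles.
    """
    return min(
        (profile_distance(pa, pb), level_a, level_b)
        for level_a, pa in profiles_a.items()
        for level_b, pb in profiles_b.items()
    )


# Toy data: copy numbers in three genomes at two identity levels (30% and 60%).
family_a = {30: [2, 4, 6], 60: [1, 1, 5]}
family_b = {30: [5, 1, 1], 60: [1, 2, 3]}
best = best_level_pair(family_a, family_b)
```

In this toy case the closest pair is family A at the 30% level and family B at the 60% level, because their copy numbers change proportionally across the genomes. Searching over all level pairs is one plausible reading of the "auto-tune" idea: the comparison picks the similarity levels at which the two families co-vary most strongly.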

    Crystal structures of the human Dysferlin inner DysF domain

    Background: Mutations in dysferlin, the first protein linked with the cell membrane repair mechanism, cause a group of muscular dystrophies called dysferlinopathies. Dysferlin is a type two-anchored membrane protein, with a single C terminal trans-membrane helix, and most of the protein lying in the cytoplasm. Dysferlin contains several C2 domains and two DysF domains which are nested one inside the other. Many pathogenic point mutations fall in the DysF domain region. Results: We describe the crystal structure of the human dysferlin inner DysF domain with a resolution of 1.9 Angstroms. Most of the pathogenic mutations are part of aromatic/arginine stacks that hold the domain in a folded conformation. The high resolution of the structure shows that these interactions are a mixture of parallel ring/guanidinium stacking, perpendicular H bond stacking and aliphatic chain packing. Conclusions: The high resolution structure of the Dysferlin DysF domain gives a template on which to interpret in detail the pathogenic mutations that lead to disease.

    An integrated approach to the interpretation of Single Amino Acid Polymorphisms within the framework of CATH and Gene3D

    Background: The phenotypic effects of sequence variations in protein-coding regions come about primarily via their effects on the resulting structures, for example by disrupting active sites or affecting structural stability. To better understand the mechanisms behind known mutant phenotypes, and to predict the effects of novel variations, biologists need tools to gauge the impacts of DNA mutations in terms of their structural manifestation. Although many mutations occur within domains whose structure has been solved, many more occur within genes whose protein products have not been structurally characterized. Results: Here we present 3DSim (3D Structural Implication of Mutations), a database and web application facilitating the localization and visualization of single amino acid polymorphisms (SAAPs) mapped to protein structures even where the structure of the protein of interest is unknown. The server displays information on 6514 point mutations, 4865 of them known to be associated with disease. These polymorphisms are drawn from SAAPdb, which aggregates data from various sources including dbSNP and several pathogenic mutation databases. While the SAAPdb interface displays mutations on known structures, 3DSim projects mutations onto known sequence domains in Gene3D. This resource contains sequences annotated with domains predicted to belong to structural families in the CATH database. Mappings between domain sequences in Gene3D and known structures in CATH are obtained using a MUSCLE alignment. 1210 three-dimensional structures corresponding to CATH structural domains are currently included in 3DSim; these domains are distributed across 396 CATH superfamilies, and provide a comprehensive overview of the distribution of mutations in structural space. Conclusion: The server is publicly available at http://3DSim.bioinfo.cnio.es/. In addition, the database containing the mapping between SAAPdb, Gene3D and CATH is available on request and most of the functionality is available through programmatic web service access.

    Gene3D: comprehensive structural and functional annotation of genomes

    Gene3D provides comprehensive structural and functional annotation of most available protein sequences, including the UniProt, RefSeq and Integr8 resources. The main structural annotation is generated through scanning these sequences against the CATH structural domain database profile-HMM library. CATH is a database of manually derived PDB-based structural domains, placed within a hierarchy reflecting topology, homology and conservation and is able to infer more ancient and divergent homology relationships than sequence-based approaches. This data is supplemented with Pfam-A, other non-domain structural predictions (i.e. coiled coils) and experimental data from UniProt. In order to enhance the investigations possible with this data, we have also incorporated a variety of protein annotation resources, including protein–protein interaction data, GO functional assignments, KEGG pathways, FUNCAT functional descriptions and links to microarray expression data. All of this data can be accessed through a newly re-designed website that has a focus on flexibility and clarity, with searches that can be restricted to a single genome or across the entire sequence database. Currently Gene3D contains over 3.5 million domain assignments for nearly 5 million proteins including 527 completed genomes. This is available at: http://gene3d.biochem.ucl.ac.uk

    Ecological Assessment of Sagebrush Grasslands in Eastern Wyoming

    An understanding of existing ecosystem conditions is necessary for planning efforts that include formulation of landscape conservation goals and implementation strategies. In support of a landscape planning effort for a 946,000-ac mixed-ownership area in eastern Wyoming, we used remote sensing and field sampling to assess existing ecosystem conditions of terrestrial ecological sites. We used SPOT 5, 33-ft (10-m) multi-spectral satellite imagery combined with NRCS ecological sites to create a geographic information system layer of vegetation cover by ecological site. We then integrated the remote sensing information with field data (571 plots) collected from a stratified random design from 2003 through 2005. The integration of the field data with the satellite mapping provided specific information about each terrestrial ecological site including species composition, productivity, structure, and shrub cover. Western wheatgrass was the most dominant species across all of the terrestrial ecological sites, followed by big sagebrush, needle and thread, blue grama, annual brome species and, to a lesser extent, threadleaf sedge and six weeks fescue. We found species that typically decrease with grazing (for example, green needlegrass, bluebunch wheatgrass, Indian ricegrass) to be lacking or entirely absent from plant communities. Introduced species, especially the annual bromes, were prevalent across all ecological sites. Over 55 percent of the terrestrial ecosystems we sampled had greater than five percent relative cover of introduced plant species. Current ecosystem conditions for many wildlife species of the area, as identified by our assessment, had generally lower habitat quality than desired, and treatments to improve these conditions are planned.

    The Gene3D Web Services: a platform for identifying, annotating and comparing structural domains in protein sequences

    The Gene3D structural domain database provides domain annotations for 7 million proteins, based on the manually curated structural domain superfamilies in CATH. These annotations are integrated with functional, genomic and molecular information from external resources, such as GO, EC, UniProt and the NCBI Taxonomy database. We have constructed a set of web services that provide programmatic access to this integrated database, as well as the Gene3D domain recognition tool (Gene3DScan) and protein sequence annotation pipeline for analysing novel protein sequences. Example queries include retrieving all curated GO terms for a domain superfamily or all the multi-domain architectures for the human genome. The services can be accessed using simple HTTP calls and are able to return results in a range of formats for quick downloading and easy parsing, graphical rendering and data storage. Hence, they provide a simple, but flexible means of integrating domain annotations and associated data sets into locally run pipelines and analysis software. The services can be found at http://gene3d.biochem.ucl.ac.uk/WebServices/

    Finding the "Dark Matter" in Human and Yeast Protein Network Prediction and Modelling

    Accurate modelling of biological systems requires a deeper and more complete knowledge about the molecular components and their functional associations than we currently have. Traditionally, new knowledge on protein associations generated by experiments has played a central role in systems modelling, in contrast to generally less trusted bio-computational predictions. However, we will not achieve realistic modelling of complex molecular systems if the current experimental designs lead to biased screenings of real protein networks and leave large, functionally important areas poorly characterised. To assess the likelihood of this, we have built comprehensive network models of the yeast and human proteomes by using a meta-statistical integration of diverse computationally predicted protein association datasets. We have compared these predicted networks against combined experimental datasets from seven biological resources at different levels of statistical significance. These eukaryotic predicted networks resemble all the topological and noise features of the experimentally inferred networks in both species, and we also show that this observation is not due to random behaviour. In addition, the topology of the predicted networks contains information on true protein associations, beyond the constitutive first order binary predictions. We also observe that most of the reliable predicted protein associations are experimentally uncharacterised in our models, constituting the hidden or "dark matter" of networks by analogy to astronomical systems. Some of this dark matter shows enrichment of particular functions and contains key functional elements of protein networks, such as hubs associated with important functional areas like the regulation of Ras protein signal transduction in human cells. Thus, characterising this large and functionally important dark matter, elusive to established experimental designs, may be crucial for modelling biological systems. In any case, these predictions provide a valuable guide to these experimentally elusive regions.

    Interplay between ferromagnetism, surface states, and quantum corrections in a magnetically doped topological insulator

    The breaking of time-reversal symmetry by ferromagnetism is predicted to yield profound changes to the electronic surface states of a topological insulator. Here, we report on a concerted set of structural, magnetic, electrical and spectroscopic measurements of Mn-doped Bi$_2$Se$_3$ thin films wherein photoemission and x-ray magnetic circular dichroism studies have recently shown surface ferromagnetism in the temperature range $15\,\mathrm{K} \leq T \leq 100\,\mathrm{K}$, accompanied by a suppressed density of surface states at the Dirac point. Secondary ion mass spectroscopy and scanning tunneling microscopy reveal an inhomogeneous distribution of Mn atoms, with a tendency to segregate towards the sample surface. Magnetometry and anisotropic magnetoresistance measurements are insensitive to the high temperature ferromagnetism seen in surface studies, revealing instead a low temperature ferromagnetic phase at $T \lesssim 5\,\mathrm{K}$. The absence of both a magneto-optical Kerr effect and anomalous Hall effect suggests that this low temperature ferromagnetism is unlikely to be a homogeneous bulk phase but likely originates in nanoscale near-surface regions of the bulk where magnetic atoms segregate during sample growth. Although the samples are not ideal, with both bulk and surface contributions to electron transport, we measure a magnetoconductance whose behavior is qualitatively consistent with predictions that the opening of a gap in the Dirac spectrum drives quantum corrections to the conductance in topological insulators from the symplectic to the orthogonal class. Comment: To appear in Phys. Rev.

    Gene3D: Multi-domain annotations for protein sequence and comparative genome analysis

    Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel search algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.
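The abstract mentions resolving overlaps between domain annotations from several sources but does not say how this is done. The sketch below shows one common strategy purely as an illustration: a greedy pass that keeps the highest-scoring hit and discards any later hit that overlaps one already kept. The `Hit` record, the score field, the family labels and the coordinates are all hypothetical, and this is not claimed to be the Gene3D procedure.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    source: str   # annotation source, e.g. "CATH", "Pfam" or "SUPERFAMILY"
    family: str   # placeholder family label, not a real identifier
    start: int    # first residue of the domain (1-based, inclusive)
    end: int      # last residue of the domain (inclusive)
    score: float  # hypothetical confidence score; higher is better


def overlaps(a, b):
    """True if the two residue ranges share at least one position."""
    return a.start <= b.end and b.start <= a.end


def resolve(hits):
    """Greedily keep the best-scoring hits that do not overlap each other."""
    chosen = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, kept) for kept in chosen):
            chosen.append(hit)
    # Report the composite architecture in sequence order.
    return sorted(chosen, key=lambda h: h.start)


hits = [
    Hit("CATH", "family_x", 10, 120, 95.0),
    Hit("Pfam", "family_y", 100, 200, 60.0),
    Hit("SUPERFAMILY", "family_z", 130, 210, 80.0),
]
architecture = [h.family for h in resolve(hits)]
```

With these toy hits the Pfam annotation is dropped because it overlaps both higher-scoring hits, leaving a two-domain architecture of `family_x` followed by `family_z`. A greedy pass is simple and deterministic, but it is only one option; approaches that tolerate small boundary overlaps or optimise total coverage would give different results.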