    Predicting protein function with hierarchical phylogenetic profiles: The Gene3D phylo-tuner method applied to eukaryotic genomes

    "Phylogenetic profiling'' is based on the hypothesis that during evolution functionally or physically interacting genes are likely to be inherited or eliminated in a codependent manner. Creating presence-absence profiles of orthologous genes is now a common and powerful way of identifying functionally associated genes. In this approach, correctly determining orthology, as a means of identifying functional equivalence between two genes, is a critical and nontrivial step and largely explains why previous work in this area has mainly focused on using presence-absence profiles in prokaryotic species. Here, we demonstrate that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence-absence information content. This feature makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods. Using CATH structural domain assignments from the Gene3D database for 13 complete eukaryotic genomes, we have developed a novel modification of the phylogenetic profiling method that uses genome copy number of each domain superfamily to predict functional relationships. In our approach, superfamilies are subclustered at ten levels of sequence identity from 30% to 100% - and phylogenetic profiles built at each level. All the profiles are compared using normalised Euclidean distances to identify those with correlated changes in their domain copy number. We demonstrate that two protein families will "auto-tune'' with strong co-evolutionary signals when their profiles are compared at the similarity levels that capture their functional relationship. Our method finds functional relationships that are not detectable by the conventional presence - absence profile comparisons, and it does not require a priori any fixed criteria to define orthologous genes

    NAFTA's Implications for East Asian exports

    Several studies have quantified the influence of the North American Free Trade Agreement (NAFTA) and the earlier Canada-United States Free Trade Agreement on member countries. Less attention has been paid to their effects on nonmembers. The authors try to quantify NAFTA's third-party effects on East Asia using a partial equilibrium trade model and a gravity flow model. They identify and focus on East Asian export sectors that are especially at risk of trade diversion. Their results suggest that the NAFTA-induced trade diversion losses could range from $380 million to $700 million. The larger figure represents less than 1 percent of East Asia's nonoil exports to the United States. Their analysis also indicates that losses would be concentrated in a few sectors, such as textiles, clothing, and ferrous metals, where high U.S. trade barriers exist. A larger share of Hong Kong and Macau trade would be diverted than trade in other East Asian economies because textiles and clothing represent a larger share of their exports. Economies specializing in such products as machinery and equipment (Singapore) would have relatively little trade diverted. East Asia's trade losses might be reduced by roughly half once the results of the Uruguay Round are implemented because that will lower the preference margins NAFTA members can extend to each other. To put things in perspective: the trade losses East Asian economies might incur because of NAFTA are roughly 1 percent of the gains they will receive from successful implementation of the Uruguay Round results.

    An integrated approach to the interpretation of Single Amino Acid Polymorphisms within the framework of CATH and Gene3D

    Background: The phenotypic effects of sequence variations in protein-coding regions come about primarily via their effects on the resulting structures, for example by disrupting active sites or affecting structural stability. To better understand the mechanisms behind known mutant phenotypes, and to predict the effects of novel variations, biologists need tools to gauge the impacts of DNA mutations in terms of their structural manifestation. Although many mutations occur within domains whose structure has been solved, many more occur within genes whose protein products have not been structurally characterized. Results: Here we present 3DSim (3D Structural Implication of Mutations), a database and web application facilitating the localization and visualization of single amino acid polymorphisms (SAAPs) mapped to protein structures even where the structure of the protein of interest is unknown. The server displays information on 6514 point mutations, 4865 of them known to be associated with disease. These polymorphisms are drawn from SAAPdb, which aggregates data from various sources including dbSNP and several pathogenic mutation databases. While the SAAPdb interface displays mutations on known structures, 3DSim projects mutations onto known sequence domains in Gene3D. This resource contains sequences annotated with domains predicted to belong to structural families in the CATH database. Mappings between domain sequences in Gene3D and known structures in CATH are obtained using a MUSCLE alignment. 1210 three-dimensional structures corresponding to CATH structural domains are currently included in 3DSim; these domains are distributed across 396 CATH superfamilies, and provide a comprehensive overview of the distribution of mutations in structural space. Conclusion: The server is publicly available at http://3DSim.bioinfo.cnio.es/. In addition, the database containing the mapping between SAAPdb, Gene3D and CATH is available on request, and most of the functionality is available through programmatic web service access.
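The idea of projecting a mutation from a sequence onto a homologous structure through an alignment can be sketched as follows. This is an illustrative sketch only, not the 3DSim code: the function name `map_position` and the input convention (two equal-length aligned strings with `-` for gaps, as found in aligned FASTA output) are assumptions.

```python
def map_position(aligned_query, aligned_target, query_pos):
    """Map a 1-based residue index in the query onto the target sequence.

    aligned_query, aligned_target: the two rows of a pairwise alignment,
    equal in length, with '-' marking gaps (assumed input convention).
    Returns the 1-based residue index in the target, or None when the
    query residue is aligned to a gap and so has no structural equivalent.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    q = t = 0  # residues seen so far in query and target
    for cq, ct in zip(aligned_query, aligned_target):
        if cq != "-":
            q += 1
        if ct != "-":
            t += 1
        if cq != "-" and q == query_pos:
            return t if ct != "-" else None
    raise IndexError("query_pos is beyond the end of the query sequence")


if __name__ == "__main__":
    # Hypothetical alignment of a genomic domain sequence to a structure.
    query = "MK-TAYI"
    target = "MKLT-YI"
    for pos in (3, 4, 5):
        print(pos, map_position(query, target, pos))
```

In the toy alignment, query residue 3 maps to target residue 4 because of the insertion in the target, while query residue 4 falls opposite a gap and returns `None`, meaning the mutation cannot be placed on that structure.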

    Coherent Heteroepitaxy of Bi2Se3 on GaAs (111)B

    We report the heteroepitaxy of single crystal thin films of Bi2Se3 on the (111)B surface of GaAs by molecular beam epitaxy. We find that Bi2Se3 grows highly c-axis oriented, with an atomically sharp interface with the GaAs substrate. By optimizing the growth of a very thin GaAs buffer layer before growing the Bi2Se3, we demonstrate the growth of thin films with atomically flat terraces over hundreds of nanometers. Initial time-resolved Kerr rotation measurements herald opportunities for probing coherent spin dynamics at the interface between a candidate topological insulator and a large class of GaAs-based heterostructures. Comment: To appear in Applied Physics Letters

    Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space

    We present an analysis of 203 completed genomes in the Gene3D resource (including 17 eukaryotes), which demonstrates that the number of protein families is continually expanding over time and that singleton sequences appear to be an intrinsic part of the genomes. A significant proportion of the proteomes can be assigned to fewer than 6000 well-characterized domain families, with the remaining domain-like regions belonging to a much larger number of small uncharacterized families that are largely species specific. Our comprehensive domain annotation of 203 genomes enables us to provide more accurate estimates of the number of multi-domain proteins found in the three kingdoms of life than previous calculations. We find that 67% of eukaryotic sequences are multi-domain compared with 56% of sequences in prokaryotes. By measuring the domain coverage of genome sequences, we show that the structural genomics initiatives should aim to provide structures for fewer than a thousand structurally uncharacterized Pfam families to achieve reasonable structural annotation of the genomes. However, in large families, additional structures should be determined, as these would reveal more about the evolution of the family and enable a greater understanding of how function evolves.

    Ecological Assessment of Sagebrush Grasslands in Eastern Wyoming

    An understanding of existing ecosystem conditions is necessary for planning efforts that include formulation of landscape conservation goals and implementation strategies. In support of a landscape planning effort for a 946,000-ac mixed-ownership area in eastern Wyoming, we used remote sensing and field sampling to assess existing ecosystem conditions of terrestrial ecological sites. We used SPOT 5, 33-ft (10-m) multi-spectral satellite imagery combined with NRCS ecological sites to create a geographic information system layer of vegetation cover by ecological site. We then integrated the remote sensing information with field data (571 plots) collected from a stratified random design from 2003 through 2005. The integration of the field data with the satellite mapping provided specific information about each terrestrial ecological site including species composition, productivity, structure, and shrub cover. Western wheatgrass was the most dominant species across all of the terrestrial ecological sites, followed by big sagebrush, needle and thread, blue grama, annual brome species and, to a lesser extent, threadleaf sedge and six weeks fescue. We found species that typically decrease with grazing (for example green needlegrass, bluebunch wheatgrass, Indian ricegrass) to be lacking or entirely absent from plant communities. Introduced species, especially the annual bromes, were prevalent across all ecological sites. Over 55 percent of the terrestrial ecosystems we sampled had greater than five percent relative cover of introduced plant species. Our assessment indicated that current ecosystem conditions provide generally lower habitat quality than desired for many wildlife species of the area, and treatments to improve these conditions are planned.

    Sufficient conditions for labelled 0-1 laws

    Gene3D: comprehensive structural and functional annotation of genomes

    Gene3D provides comprehensive structural and functional annotation of most available protein sequences, including the UniProt, RefSeq and Integr8 resources. The main structural annotation is generated through scanning these sequences against the CATH structural domain database profile-HMM library. CATH is a database of manually derived PDB-based structural domains, placed within a hierarchy reflecting topology, homology and conservation and is able to infer more ancient and divergent homology relationships than sequence-based approaches. This data is supplemented with Pfam-A, other non-domain structural predictions (i.e. coiled coils) and experimental data from UniProt. In order to enhance the investigations possible with this data, we have also incorporated a variety of protein annotation resources, including protein–protein interaction data, GO functional assignments, KEGG pathways, FUNCAT functional descriptions and links to microarray expression data. All of this data can be accessed through a newly re-designed website that has a focus on flexibility and clarity, with searches that can be restricted to a single genome or across the entire sequence database. Currently Gene3D contains over 3.5 million domain assignments for nearly 5 million proteins including 527 completed genomes. This is available at: http://gene3d.biochem.ucl.ac.uk