
    Graph-Embedding Empowered Entity Retrieval

    In this research, we improve upon the current state of the art in entity retrieval by re-ranking the result list using graph embeddings. The paper shows that graph embeddings are useful for entity-oriented search tasks. We demonstrate empirically that encoding information from the knowledge graph into (graph) embeddings yields a larger gain in entity retrieval effectiveness than using plain word embeddings. We analyze the impact of the accuracy of the entity linker on the overall retrieval effectiveness. Our analysis further deploys the cluster hypothesis to explain the observed advantages of graph embeddings over the more widely used word embeddings for user tasks involving ranking entities.

    Adaptive HIV-1 evolutionary trajectories are constrained by protein stability

    Despite the use of combination antiretroviral drugs for the treatment of HIV-1 infection, the emergence of drug resistance remains a problem. Resistance may be conferred either by a single mutation or a concerted set of mutations. The involvement of multiple mutations can arise due to interactions between sites in the amino acid sequence as a consequence of the need to maintain protein structure. To better understand the nature of such epistatic interactions, we reconstructed the ancestral sequences of HIV-1’s Pol protein, and traced the evolutionary trajectories leading to mutations associated with drug resistance. Using contemporary and ancestral sequences we modelled the effects of mutations (i.e. amino acid replacements) on protein structure to understand the functional effects of residue changes. Although the majority of resistance-associated sequences tend to destabilise the protein structure, we find there is a general tendency for protein stability to decrease across HIV-1’s evolutionary history. That a similar pattern is observed in the non-drug resistance lineages indicates that non-resistance mutations, for example those associated with escape from the immune response, also impact protein stability. Maintenance of optimal protein structure therefore represents a major constraining factor to the evolution of HIV-1.

    Modular Biological Function Is Most Effectively Captured by Combining Molecular Interaction Data Types

    Large-scale molecular interaction data sets have the potential to provide a comprehensive, system-wide understanding of biological function. Although individual molecules can be promiscuous in terms of their contribution to function, molecular functions emerge from the specific interactions of molecules giving rise to modular organisation. As functions often derive from a range of mechanisms, we demonstrate that they are best studied using networks derived from different sources. Implementing a graph partitioning algorithm we identify subnetworks in yeast protein-protein interaction (PPI), genetic interaction and gene co-regulation networks. Among these subnetworks we identify cohesive subgraphs that we expect to represent functional modules in the different data types. We demonstrate significant overlap between the subgraphs generated from the different data types and show these overlaps can represent related functions as represented by the Gene Ontology (GO). Next, we investigate the correspondence between our subgraphs and the Gene Ontology. This revealed varying degrees of coverage of the biological process, molecular function and cellular component ontologies, dependent on the data type. For example, subgraphs from the PPI show enrichment for 84%, 58% and 93% of annotated GO terms, respectively. Integrating the interaction data into a combined network increases the coverage of GO. Furthermore, the different annotation types of GO are not predominantly associated with one of the interaction data types. Collectively our results demonstrate that successful capture of functional relationships by network data depends on both the specific biological function being characterised and the type of network data being used. We identify functions that require integrated information to be accurately represented, demonstrating the limitations of individual data types. Combining interaction subnetworks across data types is therefore essential for fully understanding the complex and emergent nature of biological function.
    Funding: JIM was funded by a Biotechnology and Biological Sciences Research Council (BBSRC) CASE studentship with industry partner Pfizer and RMA by a BBSRC studentship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    Gene Duplication and Environmental Adaptation within Yeast Populations

    Population-level differences in the number of copies of genes resulting from gene duplication and loss have recently been recognized as an important source of variation in eukaryotes. However, except for a small number of cases, the phenotypic effects of this variation are unknown. Data from the Saccharomyces Genome Resequencing Project permit the study of duplication in genome sequences from a set of individuals within the same population. These sequences can be correlated with available information on the environments from which these yeast strains were isolated. We find that yeast show an abundance of duplicate genes that are lineage specific, leading to a large degree of variation in gene content between individual strains. There is a detectable bias for specific functions, indicating that selection is acting to preferentially retain certain duplicates. Most strikingly, we find that sets of over- and underrepresented duplicates correlate with the environment from which they were isolated. Together, these observations indicate that gene duplication can give rise to substantial phenotypic differences within populations that in turn can offer a shortcut to evolutionary adaptation.
    Funding: This work was funded by BBSRC grant BB/F007620/1.

    Evolution of the Gene Lineage Encoding the Carbon Dioxide Receptor in Insects

    A heterodimer of the insect chemoreceptors Gr21a and Gr63a has been shown to be the carbon dioxide receptor in Drosophila melanogaster (Meigen) (Diptera: Drosophilidae). Comparison of the genes encoding these two proteins across the 12 available drosophilid fly genomes allows refined definition of their N-termini. These genes are highly conserved, along with a paralog of Gr21a, in the Anopheles gambiae, Aedes aegypti, and Culex pipiens mosquitoes, as well as in the silk moth Bombyx mori and the red flour beetle Tribolium castaneum. In the latter four species we name these three proteins Gr1, Gr2, and Gr3. Intron evolution within this distinctive three gene lineage is considerable, with at least 13 inferred gains and 39 losses. Surprisingly, this entire ancient gene lineage is absent from all other available more basal insect and related arthropod genomes, specifically the honey bee, parasitoid wasp, human louse, pea aphid, waterflea, and blacklegged tick genomes. At least two of these species can detect carbon dioxide, suggesting that they evolved other means to do so.

    The Universal Plausibility Metric (UPM) & Principle (UPP)

    Background: Mere possibility is not an adequate basis for asserting scientific plausibility. A precisely defined universal bound is needed beyond which the assertion of plausibility, particularly in life-origin models, can be considered operationally falsified. But can something so seemingly relative and subjective as plausibility ever be quantified? Amazingly, the answer is, "Yes." A method of objectively measuring the plausibility of any chance hypothesis (The Universal Plausibility Metric [UPM]) is presented. A numerical inequality is also provided whereby any chance hypothesis can be definitively falsified when its UPM metric of ξ is < 1 (The Universal Plausibility Principle [UPP]). Both UPM and UPP pre-exist and are independent of any experimental design and data set.
    Conclusion: No low-probability hypothetical plausibility assertion should survive peer-review without subjection to the UPP inequality standard of formal falsification (ξ < 1).

    The elastic constants of MgSiO3 perovskite at pressures and temperatures of the Earth's mantle

    The temperature anomalies in the Earth's mantle associated with thermal convection can be inferred from seismic tomography, provided that the elastic properties of mantle minerals are known as a function of temperature at mantle pressures. At present, however, such information is difficult to obtain directly through laboratory experiments. We have therefore taken advantage of recent advances in computer technology, and have performed finite-temperature ab initio molecular dynamics simulations of the elastic properties of MgSiO3 perovskite, the major mineral of the lower mantle, at relevant thermodynamic conditions. When combined with the results from tomographic images of the mantle, our results indicate that the lower mantle is either significantly anelastic or compositionally heterogeneous on large scales. We found the temperature contrast between the coldest and hottest regions of the mantle, at a given depth, to be about 800 K at 1000 km, 1500 K at 2000 km, and possibly over 2000 K at the core-mantle boundary.
    Comment: Published in: Nature 411, 934-937 (2001).

    An isolate of human immunodeficiency virus type 1 originally classified as subtype I represents a complex mosaic comprising three different group M subtypes (A, G, and I)

    Full-length reference clones and sequences are currently available for eight human immunodeficiency virus type 1 (HIV-1) group M subtypes (A through H), but none have been reported for subtypes I and J, which have only been identified in a few individuals. Phylogenetic information for subtype I, in particular, is limited since only about 400 bp of env gene sequences have been determined for just two epidemiologically linked viruses infecting a couple who were heterosexual intravenous drug users from Cyprus. To characterize subtype I in greater detail, we employed long-range PCR to clone a full-length provirus (94CY032.3) from an isolate obtained from one of the individuals originally reported to be infected with this subtype. Phylogenetic analysis of C2-V3 env gene sequences confirmed that 94CY032.3 was closely related to sequences previously classified as subtype I. However, analysis of the remainder of its genome revealed various regions in which 94CY032.3 was significantly clustered with either subtype A or subtype G. Only sequences located in vpr and nef, as well as the middle portions of pol and env, formed independent lineages roughly equidistant from all other known subtypes. Since these latter regions most likely have a common origin, we classify them all as subtype I. These results thus indicate that the originally reported prototypic subtype I isolate 94CY032 represents a triple recombinant (A/G/I) with at least 11 points of recombination crossover. We also screened HIV-1 recombinants with regions of uncertain subtype assignment for the presence of subtype I sequences. This analysis revealed that two of the earliest mosaics from Africa, Z321B (A/G/?) and MAL (A/D/?), contain short segments of sequence which clustered closely with the subtype I domains of 94CY032.3. Since Z321 was isolated in 1976, subtype I as well as subtypes A and G must have existed in Central Africa prior to that date... (From the author's abstract)