    Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data

    To make digital resources on the web verifiable, immutable, and permanent, we propose a technique to include cryptographic hash values in URIs. We call them trusty URIs, and we show how they can be used for approaches like nanopublications to make not only specific resources but also their entire reference trees verifiable. Digital artifacts can be identified not only on the byte level but also on more abstract levels such as RDF graphs, which means that resources keep their hash values even when presented in a different format. Our approach sticks to the core principles of the web, namely openness and decentralized architecture, is fully compatible with existing standards and protocols, and can therefore be used right away. Evaluation of our reference implementations shows that these desired properties are indeed accomplished by our approach, and that it remains practical even for very large files. Comment: Small error corrected in the text (table data was correct) on page 13: "All average values are below 0.8s (0.03s for batch mode). Using Java in batch mode even requires only 1ms per file."
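
As a rough illustration of the core idea, the sketch below appends a hash of a file's bytes to a URI and later checks retrieved content against it. It is a simplified, hypothetical example, not the reference implementation: the `FA` prefix, the URL-safe base64 encoding of a SHA-256 digest, and the `.` separator are assumptions modeled on the byte-level case, and the sketch ignores self-references and the RDF-graph normalization needed for format-independent hashes.

```python
import base64
import hashlib


def artifact_code(content: bytes, module: str = "FA") -> str:
    """Hypothetical byte-level artifact code: module prefix + hash of the raw bytes."""
    digest = hashlib.sha256(content).digest()
    # URL-safe base64 without "=" padding, so the code can sit inside a URI.
    encoded = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return module + encoded


def make_trusty_uri(base: str, content: bytes) -> str:
    # Assumption: the code is appended after a "." separator.
    return base + "." + artifact_code(content)


def verify(uri: str, content: bytes) -> bool:
    """Recompute the code from retrieved bytes and compare it with the URI suffix."""
    code = uri.rsplit(".", 1)[-1]
    return code == artifact_code(content, module=code[:2])


uri = make_trusty_uri("http://example.org/doc", b"hello")
print(uri, verify(uri, b"hello"), verify(uri, b"hello!"))
```

Because the code is derived from the content itself, any change to the bytes yields a different code, so anyone holding the URI can detect tampering without trusting the server that delivered the file.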

    AI-KG: an Automatically Generated Knowledge Graph of Artificial Intelligence

    Scientific knowledge has traditionally been disseminated and preserved through research articles published in journals, conference proceedings, and online archives. However, this article-centric paradigm has often been criticized because it does not allow this knowledge to be automatically processed, categorized, and reasoned upon. An alternative vision is to generate a semantically rich and interlinked description of the content of research publications. In this paper, we present the Artificial Intelligence Knowledge Graph (AI-KG), a large-scale automatically generated knowledge graph that describes 820K research entities. AI-KG includes about 14M RDF triples and 1.2M reified statements extracted from 333K research publications in the field of AI, and describes 5 types of entities (tasks, methods, metrics, materials, others) linked by 27 relations. AI-KG has been designed to support a variety of intelligent services for analyzing and making sense of research dynamics, supporting researchers in their daily work, and helping to inform decision-making by funding bodies and research policymakers. AI-KG has been generated by an automatic pipeline that extracts entities and relationships using three tools: DyGIE++, Stanford CoreNLP, and the CSO Classifier. It then integrates and filters the resulting triples using a combination of deep learning and semantic technologies in order to produce a high-quality knowledge graph. This pipeline was evaluated on a manually crafted gold standard, yielding competitive results. AI-KG is available under CC BY 4.0 and can be downloaded as a dump or queried via a SPARQL endpoint.
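
The abstract does not specify how AI-KG encodes its reified statements. As a generic illustration of what reification means in RDF, the sketch below describes a single triple with the standard `rdf:Statement` vocabulary so that further facts (for example, which extraction tool produced it) can be attached to the statement itself. All `example.org` IRIs, including the `usedFor` and `extractedBy` properties, are hypothetical and not taken from the AI-KG schema.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace, not the AI-KG schema


def reify(statement_iri, subject, predicate, obj):
    """Return the four standard RDF reification triples describing one statement."""
    return [
        (statement_iri, RDF + "type", RDF + "Statement"),
        (statement_iri, RDF + "subject", subject),
        (statement_iri, RDF + "predicate", predicate),
        (statement_iri, RDF + "object", obj),
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


stmt = EX + "statement/1"
triples = reify(stmt, EX + "method/m1", EX + "usedFor", EX + "task/t1")
# Provenance attached to the statement, not to the underlying triple.
triples.append((stmt, EX + "extractedBy", EX + "tool/DyGIEpp"))
print(to_ntriples(triples))
```

Reification trades compactness for expressiveness: five triples now describe one fact, but the fact can carry its own metadata.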

    A type 2 diabetes disease module with a high collective influence for Cdk2 and PTPLAD1 is localized in endosomes

    Despite the identification of many susceptibility genes, our knowledge of the underlying mechanisms responsible for complex disease remains limited. Here, we identified a type 2 diabetes disease module in endosomes and validated its functional relevance on selected nodes. Using hepatic Golgi/endosome fractions, we established a proteome of insulin receptor-containing endosomes that allowed the study of physical protein interaction networks on a type 2 diabetes background. The resulting collated network is formed by 313 nodes and 1147 edges, with a topology organized around a few major hubs, among which Cdk2 displays the highest collective influence. Overall, 88% of the nodes are associated with type 2 diabetes genetic risk, including 101 new candidates. The type 2 diabetes module is enriched in cytoskeleton and luminal acidification-dependent processes that are shared with secretion-related mechanisms. We identified new signaling pathways driven by Cdk2 and PTPLAD1 whose expression affects the association of the insulin receptor with TUBA, TUBB, the actin component ACTB, and the endosomal sorting markers Rab5c and Rab11a. Therefore, the interactome of internalized insulin receptors reveals a type 2 diabetes disease module enriched in new layers of feedback loops required for insulin signaling, clearance, and islet biology.
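
The abstract ranks hubs by "collective influence." Assuming this refers to the collective-influence score of Morone and Makse, CI_l(i) = (k_i - 1) * sum over j in dBall(i, l) of (k_j - 1), where k is node degree and dBall(i, l) is the set of nodes at shortest-path distance exactly l from i, a minimal sketch on an undirected adjacency list follows. The toy graph is hypothetical and unrelated to the 313-node network.

```python
from collections import deque


def collective_influence(adj, node, ell):
    """CI_ell(node) = (k_node - 1) * sum of (k_j - 1) over nodes j at distance exactly ell."""
    dist = {node: 0}
    queue = deque([node])
    frontier = []  # nodes on the boundary of the ball of radius ell
    while queue:
        u = queue.popleft()
        if dist[u] == ell:
            frontier.append(u)
            continue  # stop expanding once the radius is reached
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(adj[node]) - 1) * sum(len(adj[j]) - 1 for j in frontier)


# Hypothetical toy network (undirected, each edge stored in both directions).
adj = {
    "A": ["B", "C", "D"],
    "B": ["A", "E", "F"],
    "C": ["A"],
    "D": ["A", "G"],
    "E": ["B"],
    "F": ["B"],
    "G": ["D"],
}
scores = {n: collective_influence(adj, n, 1) for n in adj}
print(scores)
```

Here A scores 6 and B scores 4 at radius 1: both have degree 3, but A's neighbours are themselves better connected. This is the sense in which collective influence rewards hubs whose neighbourhoods also branch out, rather than degree alone.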

    Semantic Web integration of Cheminformatics resources with the SADI framework

    Background: The diversity and the largely independent nature of chemical research efforts over the past half century are, most likely, the major contributors to the current poor state of interoperability among chemical computational resources and databases. While open software for chemical format interconversion and database entry cross-linking has partially addressed database interoperability, computational resource integration is hindered by the great diversity of software interfaces, languages, access methods, and platforms, among other factors. This has, in turn, resulted in limited reproducibility of computational experiments and in the need for human experts to construct application-specific computational workflows and enact them semi-automatically, especially where emerging interdisciplinary fields, such as systems chemistry, are pursued. Fortunately, the advent of the Semantic Web, and the very recent introduction of RESTful Semantic Web Services (SWS), may present an opportunity to integrate all of the existing computational and database resources in chemistry into a machine-understandable, unified system that draws on the entirety of the Semantic Web.
    Results: We have created a prototype set of Semantic Automated Discovery and Integration (SADI) framework SWS that expose the QSAR descriptor functionality of the Chemistry Development Kit. Since each of these services has formal, ontology-defined input and output classes, and each service consumes and produces RDF graphs, clients can automatically reason about the services and the available reference information necessary to complete an overall computational task specified through a simple SPARQL query. We demonstrate this capability by carrying out a QSAR analysis, backed by a simple formal ontology, to determine whether a given molecule is drug-like. Further, we discuss parameter-based control over the execution of SADI SWS. Finally, we demonstrate the value of enveloping computational resources as SADI services through service reuse and the ease of integrating computational functionality into formal ontologies.
    Conclusions: The work we present here may trigger a major paradigm shift in the distribution of computational resources in chemistry. We conclude that envelopment of chemical computational resources as SADI SWS facilitates interdisciplinary research by enabling the definition of computational problems in terms of ontologies and formal logical statements instead of cumbersome, application-specific tasks and workflows.
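
In the SADI setting, descriptor values would come from CDK-backed services and the drug-likeness class would be an ontology definition evaluated by a reasoner. The plain-Python sketch below only mirrors the logic of such a class, assuming Lipinski's rule of five (at most one violation allowed) as the criterion; the abstract does not name the rule the authors used, and the descriptor values in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    molecular_weight: float  # daltons
    logp: float              # octanol-water partition coefficient (log10)
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many rule-of-five thresholds the molecule exceeds."""
    rules = [
        d.molecular_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(rules)


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed class definition: drug-like if at most `max_violations` rules fail."""
    return lipinski_violations(d) <= max_violations


small = Descriptors(molecular_weight=180.2, logp=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Descriptors(molecular_weight=750.0, logp=6.5, h_bond_donors=3, h_bond_acceptors=12)
print(is_drug_like(small), is_drug_like(large))
```

Expressing the same thresholds as an ontology class, as the abstract describes, lets a reasoner decide membership and lets a SADI client discover which descriptor services it must call to supply each value.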

    WENDI: A tool for finding non-obvious relationships between compounds and biological properties, genes, diseases and scholarly publications

    Background: In recent years, there has been a huge increase in the amount of publicly available and proprietary information pertinent to drug discovery. However, there is a distinct lack of data mining tools available to harness this information, in particular for knowledge discovery across multiple information sources. At Indiana University we have an ongoing project with Eli Lilly to develop web-service-based tools for integrative mining of chemical and biological information. In this paper, we report on the first of these tools, called WENDI (Web Engine for Non-obvious Drug Information), which attempts to find non-obvious relationships between a query compound and scholarly publications, biological properties, genes and diseases using multiple information sources.
    Results: We have created an aggregate web service that takes a query compound as input, calls multiple web services for computation and database search, and returns an XML file that aggregates this information. We have also developed a client application that provides an easy-to-use interface to this web service. Both the service and the client are publicly available.
    Conclusions: Initial testing indicates that this tool is useful for identifying non-obvious potential biological applications of compounds, and for identifying corroborating and conflicting information from multiple sources. We encourage feedback on the tool to help us refine it further. We are now developing further tools based on this model.
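
The aggregation pattern described above (one query compound fanned out to several services, results merged into a single XML file) can be sketched as follows. This is a hypothetical illustration, not WENDI's code or schema: the element names, the stub sources, and the use of a SMILES string as the query are all assumptions, and each stub stands in for a remote web-service call.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# A "source" stands in for one remote web service: query compound in, hits out.
Source = Callable[[str], List[str]]


def aggregate(query: str, sources: Dict[str, Source]) -> str:
    """Call every source with the same query and merge the hits into one XML document."""
    root = ET.Element("result", query=query)
    for name, lookup in sources.items():
        section = ET.SubElement(root, "source", name=name)
        for hit in lookup(query):
            ET.SubElement(section, "hit").text = hit
    return ET.tostring(root, encoding="unicode")


# Hypothetical stub services; a real client would make network calls here.
demo_sources: Dict[str, Source] = {
    "genes": lambda q: ["GENE_A"],
    "diseases": lambda q: ["DISEASE_B", "DISEASE_C"],
}
print(aggregate("CCO", demo_sources))
```

Keeping each source's hits in its own labeled section preserves provenance, which is what a client needs in order to spot corroborating or conflicting information across sources.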

    Multilayer biological networks to upscale marine research to global change-smart management and sustainable resource use

    Human activities are having a massive negative impact on biodiversity and ecological processes worldwide. The rate and magnitude of ecological transformations induced by climate change, habitat destruction, overexploitation and pollution are now so substantial that a sixth mass extinction event is currently underway. The biodiversity crisis of the Anthropocene urges scientists to put forward a transformative vision that promotes the conservation of biodiversity and thus, indirectly, the preservation of ecosystem functions. Here, we identify pressing issues in global change biology research and propose an integrative framework based on multilayer biological networks as a tool to support conservation actions and marine risk assessments in multi-stressor scenarios. Multilayer networks can integrate different levels of environmental and biotic complexity, enabling us to combine information on molecular, physiological and behavioural responses, species interactions and biotic communities. The ultimate aim of this framework is to link human-induced environmental changes to species physiology, fitness, biogeography and ecosystem impacts across vast seascapes and time frames, and thereby help guide solutions to biodiversity loss and ecological tipping points. We also characterize our current ability to adopt multilayer networks widely within ecology, evolution and conservation, drawing on example case studies. We also assess which approaches are ready to be transferred and which ones require further development before use. We conclude that multilayer biological networks will be crucial for informing stakeholders (using reliable multi-level integrative indicators) and supporting their decision-making on the sustainable use of resources and marine conservation.

    Self-organizing ontology of biochemically relevant small molecules

    Background: The advent of high-throughput experimentation in biochemistry has led to the generation of vast amounts of chemical data, necessitating the development of novel analysis, characterization, and cataloguing techniques and tools. Recently, a movement to publicly release such data has advanced biochemical structure-activity relationship research while creating new challenges, the biggest being the curation, annotation, and classification of this information to facilitate useful biochemical pattern analysis. Unfortunately, the human resources currently employed by the organizations supporting these efforts (e.g. ChEBI) are expanding linearly, while new useful scientific information is being released at a seemingly exponential rate. Compounding this, existing chemical classification and annotation systems are not amenable to automated classification, formal and transparent axiomatization of chemical class definitions, facile class redefinition, or novel class integration; they thus further limit chemical ontology growth by requiring human involvement in curation. Clearly, there is a need to automate this process, especially for novel chemical entities of biological interest.
    Results: To address this, we present a formal framework based on Semantic Web technologies for the automatic design of chemical ontologies that can be used for automated classification of novel entities. We demonstrate the automatic self-assembly of a structure-based chemical ontology based on 60 MeSH and 40 ChEBI chemical classes. This ontology is then used to classify 200 compounds with an accuracy of 92.7%. We extend these structure-based classes with molecular feature information and demonstrate the utility of our framework for the classification of functionally relevant chemicals. Finally, we discuss an iterative approach that we envision for future biochemical ontology development.
    Conclusions: We conclude that the proposed methodology can ease the burden on chemical data annotators and dramatically increase their productivity. We anticipate that the use of formal logic in our proposed framework will make chemical classification criteria more transparent to humans and machines alike and will thus facilitate the development of predictive and integrative bioactivity models.
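
The abstract does not detail how the classes self-assemble. One way to picture it, assuming each class is defined purely by the structural features it requires, is that subclass relations follow from set inclusion between definitions, and a new compound is classified by checking which definitions it satisfies. The sketch below is a deliberately simplified, hypothetical stand-in for the formal class definitions and reasoner a Semantic Web framework would use; the feature names and class definitions are invented for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical class definitions: each class is the set of structural features it requires.
CLASSES: Dict[str, FrozenSet[str]] = {
    "carboxylic acid": frozenset({"carboxyl"}),
    "aromatic carboxylic acid": frozenset({"carboxyl", "aromatic_ring"}),
    "amino acid": frozenset({"carboxyl", "amino"}),
}


def infer_hierarchy(classes: Dict[str, FrozenSet[str]]) -> List[Tuple[str, str]]:
    """Derive (subclass, superclass) pairs: requiring strictly more features means more specific."""
    pairs = []
    for sup, sup_feats in classes.items():
        for sub, sub_feats in classes.items():
            if sup_feats < sub_feats:  # proper subset
                pairs.append((sub, sup))
    return pairs


def classify(features: Set[str], classes: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return every class whose required features all occur in the compound."""
    return sorted(name for name, required in classes.items() if required <= features)


print(infer_hierarchy(CLASSES))
print(classify({"carboxyl", "aromatic_ring", "hydroxyl"}, CLASSES))
```

Because the hierarchy is computed from the definitions rather than curated by hand, adding or redefining a class automatically repositions it, which is the kind of burden reduction for annotators the abstract argues for. Note that this toy version returns transitive pairs as well as direct ones.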