
    Infectious Disease Ontology

    Technological developments have resulted in tremendous increases in the volume and diversity of the data and information that must be processed in the course of biomedical and clinical research and practice. Researchers are at the same time under ever greater pressure to share data and to take steps to ensure that data resources are interoperable. The use of ontologies to annotate data has proven successful in supporting these goals and in providing new possibilities for the automated processing of data and information. In this chapter, we describe different types of vocabulary resources and emphasize those features of formal ontologies that make them most useful for computational applications. We describe current uses of ontologies and discuss future goals for ontology-based computing, focusing on its use in the field of infectious diseases. We review the largest and most widely used vocabulary resources relevant to the study of infectious diseases and conclude with a description of the Infectious Disease Ontology (IDO) suite of interoperable ontology modules that together cover the entire infectious disease domain.

    Peer Data Management

    Peer Data Management (PDM) deals with the management of structured data in unstructured peer-to-peer (P2P) networks. Each peer can store data locally and define relationships between its data and the data provided by other peers. Queries posed to any of the peers are then answered by also considering the information implied by those mappings. The overall goal of PDM is to provide semantically well-founded integration and exchange of heterogeneous and distributed data sources. Unlike traditional data integration systems, peer data management systems (PDMSs) thereby allow for full autonomy of each member and need no central coordinator. The promise of such systems is to provide flexible data integration and exchange at low setup and maintenance costs. However, building such systems raises many challenges. Besides the obvious scalability problem, choosing an appropriate semantics that can deal with arbitrary, even cyclic topologies, data inconsistencies, or updates while at the same time allowing for tractable reasoning has been an area of active research in the last decade. In this survey we provide an overview of the different approaches suggested in the literature to tackle these problems, focusing on appropriate semantics for query answering and data exchange rather than on implementation-specific problems.
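
    As a minimal sketch of the core idea rather than any specific system from the survey, the snippet below answers a query from a peer's local data and forwards it along mappings to neighbouring peers, with a visited set so that even cyclic topologies terminate; the peer names, relation names, mapping representation, and simple union semantics are all illustrative assumptions.

def answer(query, peer, peers, mappings, visited=None):
    """Collect answers to `query` starting from `peer`.

    peers:    peer name -> {relation name: set of locally stored tuples}
    mappings: peer name -> list of (neighbour, rewrite) pairs, where
              `rewrite` translates a relation name into the neighbour's schema.
    """
    if visited is None:
        visited = set()
    if peer in visited:          # break cycles in the P2P topology
        return set()
    visited.add(peer)
    results = set(peers[peer].get(query, set()))
    for neighbour, rewrite in mappings.get(peer, []):
        results |= answer(rewrite(query), neighbour, peers, mappings, visited)
    return results

# Toy usage: two peers with a cyclic mapping, storing facts about the same
# concept under different relation names.
peers = {
    "p1": {"disease": {("influenza",)}},
    "p2": {"illness": {("measles",)}},
}
mappings = {
    "p1": [("p2", lambda q: "illness" if q == "disease" else q)],
    "p2": [("p1", lambda q: "disease" if q == "illness" else q)],
}
print(answer("disease", "p1", peers, mappings))  # {('influenza',), ('measles',)}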

    Intelligent Information Access to Linked Data - Weaving the Cultural Heritage Web

    The subject of the dissertation is an information alignment experiment involving two cultural heritage information systems (ALAP): the Perseus Digital Library and Arachne. In modern societies, information integration is gaining importance for many tasks such as business decision making or even catastrophe management. It is beyond doubt that the information available in digital form can offer users new ways of interaction. Also, in the humanities and cultural heritage communities, more and more information is being published online. But in many situations the way that information has been made publicly available is disruptive to the research process due to its heterogeneity and distribution. Therefore, integrated information will be a key factor in pursuing successful research, and the need for information alignment is widely recognized. ALAP is an attempt to integrate information from Perseus and Arachne, not only on a schema level, but also to perform entity resolution. To that end, technical peculiarities and philosophical implications of the concepts of identity and co-reference are discussed. Multiple approaches to information integration and entity resolution are discussed and evaluated. The methodology used to implement ALAP is mainly rooted in the fields of information retrieval and knowledge discovery. First, an exploratory analysis was performed on both information systems to get a first impression of the data. After that, (semi-)structured information from both systems was extracted and normalized. Then, a clustering algorithm was used to reduce the number of needed entity comparisons. Finally, a thorough matching was performed on the different clusters. ALAP helped with identifying challenges and highlighted the opportunities that arise during the attempt to align cultural heritage information systems.
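
    As a minimal sketch of the blocking-then-matching pattern described above (not the actual ALAP implementation), the snippet below groups normalized records into clusters via a cheap blocking key and then performs pairwise string matching only within each cluster; the record fields, the blocking key, and the similarity threshold are illustrative assumptions.

from collections import defaultdict
from difflib import SequenceMatcher

def blocking_key(record):
    # Cheap clustering step: only records sharing this key are compared.
    return record["title"].lower()[:4]

def similarity(r1, r2):
    return SequenceMatcher(None, r1["title"].lower(), r2["title"].lower()).ratio()

def match(records_a, records_b, threshold=0.85):
    # Group records from both systems into blocks, then compare pairwise
    # inside each block to avoid the full cross product of comparisons.
    blocks = defaultdict(lambda: ([], []))
    for r in records_a:
        blocks[blocking_key(r)][0].append(r)
    for r in records_b:
        blocks[blocking_key(r)][1].append(r)
    matches = []
    for left, right in blocks.values():
        for r1 in left:
            for r2 in right:
                if similarity(r1, r2) >= threshold:
                    matches.append((r1["id"], r2["id"]))
    return matches

# Toy usage with two hypothetical catalogue entries:
perseus = [{"id": "P1", "title": "Statue of Athena Parthenos"}]
arachne = [{"id": "A1", "title": "Statue of Athena Parthenos (copy)"}]
print(match(perseus, arachne))  # [('P1', 'A1')]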

    Hyperset Approach to Semi-structured Databases and the Experimental Implementation of the Query Language Delta

    This thesis presents practical suggestions towards the implementation of the hyperset approach to semi-structured databases and the associated query language Delta. This work can be characterised as part of a top-down approach to semi-structured databases, from theory to practice. The main original part of this work consisted in the implementation of the hyperset query language Delta for semi-structured databases, including worked example queries. In fact, the goal was to demonstrate the practical details of this approach and language. This required the development of an extended, practical version of the language based on the existing theoretical version, together with the corresponding operational semantics. Here we present a detailed description of the most essential steps of the implementation. Another crucial problem for this approach was to demonstrate how to deal in reality with the concept of the equality relation between (hyper)sets, which is computationally realised by the bisimulation relation. In fact, this expensive procedure, especially in the case of distributed semi-structured data, required some additional theoretical considerations and practical suggestions for efficient implementation. To this end the 'local/global' strategy for computing the bisimulation relation over distributed semi-structured data was developed and its efficiency was experimentally confirmed. Comment: Technical Report (PhD thesis), University of Liverpool, England
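
    A minimal sketch of the equality test mentioned above, assuming hypersets are represented as a graph mapping each node to its member nodes: two nodes denote the same hyperset exactly when they are bisimilar, computed here as a greatest fixed point so that cyclic (non-well-founded) sets are handled. This naive algorithm is illustrative only and is not the 'local/global' strategy developed in the thesis.

def bisimilar(graph, x, y):
    """Decide whether nodes x and y of `graph` denote the same hyperset.

    graph: node -> list of member nodes (the membership edges).
    """
    nodes = list(graph)
    # Start from the coarsest relation (all pairs related) and repeatedly
    # remove pairs violating the bisimulation condition until a fixed point.
    related = {(a, b) for a in nodes for b in nodes}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(related):
            forward = all(any((ca, cb) in related for cb in graph[b]) for ca in graph[a])
            backward = all(any((ca, cb) in related for ca in graph[a]) for cb in graph[b])
            if not (forward and backward):
                related.discard((a, b))
                changed = True
    return (x, y) in related

# Toy usage: two distinct cyclic nodes, each denoting the hyperset Omega = {Omega}.
graph = {"u": ["u"], "v": ["v"], "empty": []}
print(bisimilar(graph, "u", "v"))      # True  -- equal as hypersets
print(bisimilar(graph, "u", "empty"))  # False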

    Literature Based Discovery (LBD): Towards Hypothesis Generation and Knowledge Discovery in Biomedical Text Mining

    Biomedical knowledge is growing at an astounding pace, with the majority of this knowledge represented as scientific publications. Text mining tools and methods represent automatic approaches for extracting hidden patterns and trends from this semi-structured and unstructured data. In biomedical text mining, Literature Based Discovery (LBD) is the process of automatically discovering novel associations between medical terms otherwise mentioned in disjoint literature sets. LBD approaches have proven successful in reducing the discovery time of potential associations that are hidden in the vast amount of scientific literature. The process focuses on creating concept profiles for medical terms such as a disease or symptom and connecting them with a drug or treatment based on the statistical significance of the shared profiles. This knowledge discovery approach, introduced in 1989, still remains a core task in text mining. Currently, two approaches based on the ABC principle, namely open discovery and closed discovery, are the most explored in the LBD process. This review starts with a general introduction to text mining, followed by biomedical text mining, and introduces various literature resources such as MEDLINE, UMLS, MESH, and SemMedDB. This is followed by a brief introduction to the core ABC principle and its two associated approaches, open discovery and closed discovery, in the LBD process. The review also discusses deep learning applications in LBD by reviewing the role of transformer models and neural network-based LBD models and their future prospects. Finally, it reviews the key biomedical discoveries generated through LBD approaches in biomedicine and concludes with the current limitations and future directions of LBD. Comment: 43 Pages, 5 Figures, 4 Tables
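
    As a minimal sketch of closed discovery under the ABC principle (illustrative only; the toy data and term names below are assumptions, not real extraction output), candidate intermediate B terms are those shared by the co-occurrence profiles mined from the two disjoint literature sets around A and C.

from collections import defaultdict

def build_cooccurrence_index(pairs):
    """Map each term to the set of terms it co-occurs with."""
    index = defaultdict(set)
    for a, b in pairs:
        index[a].add(b)
        index[b].add(a)
    return index

def closed_discovery(a_term, c_term, literature_a, literature_c):
    """Return candidate B terms linking A and C across disjoint literature sets."""
    index_a = build_cooccurrence_index(literature_a)
    index_c = build_cooccurrence_index(literature_c)
    return index_a[a_term] & index_c[c_term]

# Toy illustration modelled on the classic fish oil / Raynaud's disease case:
lit_a = [("fish oil", "blood viscosity"), ("fish oil", "platelet aggregation")]
lit_c = [("Raynaud's disease", "blood viscosity"), ("Raynaud's disease", "vasoconstriction")]
print(closed_discovery("fish oil", "Raynaud's disease", lit_a, lit_c))
# -> {'blood viscosity'}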

    A combined approach to data mining of textual and structured data to identify cancer-related targets

    BACKGROUND: We present an effective, rapid, systematic data mining approach for identifying genes or proteins related to a particular interest. A selected combination of programs exploring PubMed abstracts, universal gene/protein databases (UniProt, InterPro, NCBI Entrez), and state-of-the-art pathway knowledge bases (LSGraph and Ingenuity Pathway Analysis) was assembled to distinguish enzymes with hydrolytic activities that are expressed in the extracellular space of cancer cells. Proteins were identified with respect to six types of cancer occurring in the prostate, breast, lung, colon, ovary, and pancreas. RESULTS: The data mining method identified previously undetected targets. Our combined strategy applied to each cancer type identified a minimum of 375 proteins expressed within the extracellular space and/or attached to the plasma membrane. The method led to the recognition of human cancer-related hydrolases (on average, ~35 per cancer type), among which were prostatic acid phosphatase, prostate-specific antigen, and sulfatase 1. CONCLUSION: The combined data mining of several databases overcame many of the limitations of querying a single database and enabled the facile identification of gene products. In the case of cancer-related targets, it produced a list of putative extracellular, hydrolytic enzymes that merit additional study as candidates for cancer radioimaging and radiotherapy. The proposed data mining strategy is of a general nature and can be applied to other biological databases for understanding biological functions and diseases.

    K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources

    The integration of heterogeneous data sources and software systems is a major issue in the biomedical community and several approaches have been explored: linking databases, on-the-fly integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: an integration system called K2, which has primarily been used to provide views over multiple external data sources and software systems; and a data warehouse called GUS, which downloads, cleans, integrates and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear winner. Therefore, users must consider how the data is to be used, what the performance guarantees must be, and how much programmer time and expertise is available to choose the best strategy for a particular application.

    Scientometric mapping as a strategic intelligence tool for the governance of emerging technologies

    How can scientometric mapping function as a tool of 'strategic intelligence' to aid the governance of emerging technologies? The present paper aims to address this question by focusing on a set of recently developed scientometric techniques, namely overlay mapping. We examine the potential these techniques have to inform, in a timely manner, analysts and decision-makers about relevant dynamics of technical emergence. We investigate the capability of overlay mapping to generate informed perspectives about emergence across three spaces: geographical, social, and cognitive. Our analysis relies on three empirical studies of emerging technologies in the biomedical domain: RNA interference (RNAi), Human Papilloma Virus (HPV) testing technologies for cervical cancer, and Thiopurine Methyltransferase (TPMT) genetic testing. The case studies are analysed and mapped longitudinally using publication and patent data. Results show the variety of 'intelligence' inputs that overlay mapping can produce for the governance of emerging technologies. Overlay mapping also confers flexibility and granularity on the investigation of emergence, in terms of adaptability to different sources of data and selection of the level of analysis, respectively. These features make it possible to integrate and compare results from different contexts and cases, thus providing possibilities for a potentially more 'distributed' strategic intelligence. The generated perspectives allow triangulation of findings, which is important given the complexity inherent in technical emergence and the limitations associated with the use of single scientometric approaches.

    Handbook of Research on Urban and Territorial Systems and the Intangible Dimension: Survey and Representation

    Surveying has always been closely linked to the definition of the cognitive framework to which it is connected. Carrying out a survey has always meant not only representing the geometry of the context of interest but also thoroughly investigating its historical dynamics and its tangible, behavioral, and performance-based characteristics. The dimension of comfort, usually associated with the private, domestic environment, now extends to the urban and territorial context too: perhaps going beyond the sense of the threshold referred to by Walter Benjamin when he described the city as a house with its living rooms. A new concept of the habitable city has developed, where we can live, according to Ortega y Gasset, not simply in a place for estar (being) but for bienestar (wellbeing).