706 research outputs found

    An unsupervised data-driven method to discover equivalent relations in large linked datasets

    Get PDF
    This article addresses a number of limitations of state-of-the-art methods of Ontology Alignment: 1) they primarily address concepts and entities while relations are less well-studied; 2) many build on the assumption of the ‘well-formedness’ of ontologies which is unnecessarily true in the domain of Linked Open Data; 3) few have looked at schema heterogeneity from a single source, which is also a common issue particularly in very large Linked Dataset created automatically from heterogeneous resources, or integrated from multiple datasets. We propose a domain- and language-independent and completely unsupervised method to align equivalent relations across schemata based on their shared instances. We introduce a novel similarity measure able to cope with unbalanced population of schema elements, an unsupervised technique to automatically decide similarity threshold to assert equivalence for a pair of relations, and an unsupervised clustering process to discover groups of equivalent relations across different schemata. Although the method is designed for aligning relations within a single dataset, it can also be adapted for cross-dataset alignment where sameAs links between datasets have been established. Using three gold standards created based on DBpedia, we obtain encouraging results from a thorough evaluation involving four baseline similarity measures and over 15 comparative models based on variants of the proposed method. The proposed method makes significant improvement over baseline models in terms of F1 measure (mostly between 7% and 40%), and it always scores the highest precision and is also among the top performers in terms of recall. We also make public the datasets used in this work, which we believe make the largest collection of gold standards for evaluating relation alignment in the LOD context

    Semantic Web meets Web 2.0 (and vice versa): The Value of the Mundane for the Semantic Web

    No full text
    Web 2.0, not the Semantic Web, has become the face of “the next generation Web” among the tech-literate set, and even among many in the various research communities involved in the Web. Perceptions in these communities of what the Semantic Web is (and who is involved in it) are often misinformed if not misguided. In this paper we identify opportunities for Semantic Web activities to connect with the Web 2.0 community; we explore why this connection is of significant benefit to both groups, and identify how these connections open valuable research opportunities “in the real” for the Semantic Web effort

    Personalization in cultural heritage: the road travelled and the one ahead

    Get PDF
    Over the last 20 years, cultural heritage has been a favored domain for personalization research. For years, researchers have experimented with the cutting edge technology of the day; now, with the convergence of internet and wireless technology, and the increasing adoption of the Web as a platform for the publication of information, the visitor is able to exploit cultural heritage material before, during and after the visit, having different goals and requirements in each phase. However, cultural heritage sites have a huge amount of information to present, which must be filtered and personalized in order to enable the individual user to easily access it. Personalization of cultural heritage information requires a system that is able to model the user (e.g., interest, knowledge and other personal characteristics), as well as contextual aspects, select the most appropriate content, and deliver it in the most suitable way. It should be noted that achieving this result is extremely challenging in the case of first-time users, such as tourists who visit a cultural heritage site for the first time (and maybe the only time in their life). In addition, as tourism is a social activity, adapting to the individual is not enough because groups and communities have to be modeled and supported as well, taking into account their mutual interests, previous mutual experience, and requirements. How to model and represent the user(s) and the context of the visit and how to reason with regard to the information that is available are the challenges faced by researchers in personalization of cultural heritage. Notwithstanding the effort invested so far, a definite solution is far from being reached, mainly because new technology and new aspects of personalization are constantly being introduced. This article surveys the research in this area. Starting from the earlier systems, which presented cultural heritage information in kiosks, it summarizes the evolution of personalization techniques in museum web sites, virtual collections and mobile guides, until recent extension of cultural heritage toward the semantic and social web. The paper concludes with current challenges and points out areas where future research is needed

    Granularity in visualisation of 3D BIM models design-science approach

    Get PDF
    Building Information Modeling (BIM) has gradually grown into one of the key information management platforms in the Architecture, Engineering and Construction industry. With the growing amount of information in and outside the digital construction industry, concepts of information retrieval like relevance and granularity have become relevant in this domain. An increased need for interoperability and easy access to information has caused the industry to look towards concepts like the semantic web and linked data. But within all this, the geometrical visualisation, which is an integral part of the BIM process, has lagged behind on the front of granularity and still seems to be done mainly using the conventional ways. We try to explore ways to introduce granularity in the visualisation of 3D BIM models, and connecting it to the semantic information which is already granular, thus creating a mapping between the two at a granular level. A web-based prototype is implemented and analysed as a proof of the presented concept, with the semantics being represented inside a graph-based data structure. We further present a discussion on the potential applications and use cases of the conceptualised framework in the field of construction and building lifecycle management. The work aims to take the first step towards modularising the visualisation process, and has tried to pave the way for detailed analyses and further improvements that may follow in this direction

    A Mobile Query Service for Integrated Access to Large Numbers of Online Semantic Web Data Sources

    Get PDF
    From the Semantic Web’s inception, a number of concurrent initiatives have given rise to multiple segments: large semantic datasets, exposed by query endpoints; online Semantic Web documents, in the form of RDF files; and semantically annotated web content (e.g., using RDFa), semantic sources in their own right. In various mobile application scenarios, online semantic data has proven to be useful. While query endpoints are most commonly exploited, they are mainly useful to expose large semantic datasets. Alternatively, mobile RDF stores are utilized to query local semantic data, but this requires the design-time identification and replication of relevant data. Instead, we present a mobile query service that supports on-the-fly and integrated querying of semantic data, originating from a largely unused portion of the Semantic Web, comprising online RDF files and semantics embedded in annotated webpages. To that end, our solution performs dynamic identification, retrieval and caching of query-relevant semantic data. We explore several data identification and caching alternatives, and investigate the utility of source metadata in optimizing these tasks. Further, we introduce a novel cache replacement strategy, fine- tuned to the described query dataset, and include explicit support for the Open World Assumption. An extensive experimental validation evaluates the query service and its alternative components

    Exploring a Hybrid of Geospatial Semantic Information in Ubiquitous Computing Environments

    Get PDF
    Nowadays, geospatial information plays a critical role to each and every one of us. Searching and obtaining the right geospatial information, however, is a difficult and often very time-consuming task. The Semantic Web promises to facilitate this process by improving the capability to search for information by better expressing the context and meaning of the search query. Combining the two approaches to create a Geospatial Semantic Web is an idea that is gaining acceptance in both areas of Geospatial Information Science and Semantic Web Services. Here, we present a prototype that promises to prove that the meshing of these two fields is a promising field especially in conjunction with information retrieval and ubiquitous computing environments. The aim of this prototype is to exploit a hybrid of geospatial semantic information retrieved from multiple data sources in a mobile environment. Our prototype uses three geospatial data sources: GeoNames, LinkedGeoData, and DBpedia. Experimental results show how the merging of the three geospatial data sources and the use of more than one level of indexing is more effective in terms of recall and precision in comparison to another system

    A benchmark for biomedical knowledge graph based similarity

    Get PDF
    Tese de mestrado em Bioinformática e Biologia Computacional, Universidade de Lisboa, Faculdade de Ciências, 2020Os grafos de conhecimento biomédicos são cruciais para sustentar aplicações em grandes quantidades de dados nas ciências da vida e saúde. Uma das aplicações mais comuns dos grafos de conhecimento nas ciências da vida é o apoio à comparação de entidades no grafo por meio das suas descrições ontológicas. Estas descrições suportam o cálculo da semelhança semântica entre duas entidades, e encontrar as suas semelhanças e diferenças é uma técnica fundamental para diversas aplicações, desde a previsão de interações proteína-proteína até à descoberta de associações entre doenças e genes, a previsão da localização celular de proteínas, entre outros. Na última década, houve um esforço considerável no desenvolvimento de medidas de semelhança semântica para grafos de conhecimento biomédico mas, até agora, a investigação nessa área tem-se concentrado na comparação de conjuntos de entidades relativamente pequenos. Dada a diversa gama de aplicações para medidas de semelhança semântica, é essencial apoiar a avaliação em grande escala destas medidas. No entanto, fazê-lo não é trivial, uma vez que não há um padrão ouro para a semelhança de entidades biológicas. Uma solução possível é comparar estas medidas com outras medidas ou proxies de semelhança. As entidades biológicas podem ser comparadas através de diferentes ângulos, por exemplo, a semelhança de sequência e estrutural de duas proteínas ou as vias metabólicas afetadas por duas doenças. Estas medidas estão relacionadas com as características relevantes das entidades, portanto podem ajudar a compreender como é que as abordagens de semelhança semântica capturam a semelhança das entidades. O objetivo deste trabalho é desenvolver um benchmark, composto por data sets e métodos de avaliação automatizados. Este benchmark deve sustentar a avaliação em grande escala de medidas de semelhança semântica para entidades biológicas, com base na sua correlação com diferentes propriedades das entidades. Para atingir este objetivo, uma metodologia para o desenvolvimento de data sets de referência para semelhança semântica foi desenvolvida e aplicada a dois grafos de conhecimento: proteínas anotadas com a Gene Ontology e genes anotados com a Human Phenotype Ontology. Este benchmark explora proxies de semelhança com base na semelhança de sequência, função molecular e interações de proteínas e semelhança de genes baseada em fenótipos, e fornece cálculos de semelhança semântica com medidas representativas do estado da arte, para uma avaliação comparativa. Isto resultou num benchmark composto por uma coleção de 21 data sets de referência com tamanhos variados, cobrindo quatro espécies e diferentes níveis de anotação das entidades, e técnicas de avaliação ajustadas aos data sets.Biomedical knowledge graphs are crucial to support data intensive applications in the life sciences and healthcare. One of the most common applications of knowledge graphs in the life sciences is to support the comparison of entities in the graph through their ontological descriptions. These descriptions support the calculation of semantic similarity between two entities, and finding their similarities and differences is a cornerstone technique for several applications, ranging from prediction of protein-protein interactions to the discovering of associations between diseases and genes, the prediction of cellular localization of proteins, among others. In the last decade there has been a considerable effort in developing semantic similarity measures for biomedical knowledge graphs, but the research in this area has so far focused on the comparison of relatively small sets of entities. Given the wide range of applications for semantic similarity measures, it is essential to support the large-scale evaluation of these measures. However, this is not trivial since there is no gold standard for biological entity similarity. One possible solution is to compare these measures to other measures or proxies of similarity. Biological entities can be compared through different lenses, for instance the sequence and structural similarity of two proteins or the metabolic pathways affected by two diseases. These measures relate to relevant characteristics of the underlying entities, so they can help to understand how well semantic similarity approaches capture entity similarity. The goal of this work is to develop a benchmark for semantic similarity measures, composed of data sets and automated evaluation methods. This benchmark should support the large-scale evaluation of semantic similarity measures for biomedical entities, based on their correlation to different properties of biological entities. To achieve this goal, a methodology for the development of benchmark data sets for semantic similarity was developed and applied to two knowledge graphs: proteins annotated with the Gene Ontology and genes annotated with the Human Phenotype Ontology. This benchmark explores proxies of similarity calculated based on protein sequence similarity, protein molecular function similarity, protein-protein interactions and phenotype-based gene similarity, and provides semantic similarity computations with state-of-the-art representative measures, for a comparative evaluation of the measures. This resulted in a benchmark made up of a collection of 21 benchmark data sets with varying sizes, covering four different species at different levels of annotation completion and evaluation techniques fitted to the data sets characteristics

    The what, where, how and why of gene ontology—a primer for bioinformaticians

    Get PDF
    With high-throughput technologies providing vast amounts of data, it has become more important to provide systematic, quality annotations. The Gene Ontology (GO) project is the largest resource for cataloguing gene function. Nonetheless, its use is not yet ubiquitous and is still fraught with pitfalls. In this review, we provide a short primer to the GO for bioinformaticians. We summarize important aspects of the structure of the ontology, describe sources and types of functional annotations, survey measures of GO annotation similarity, review typical uses of GO and discuss other important considerations pertaining to the use of GO in bioinformatics applications
    corecore