11 research outputs found

    In Search of Reusable Educational Resources in the Web

    Full text link
    [EN] Teachers increasingly need to find online learning resources that are free from copyright restrictions or openly licensed for use, adaptation, and redistribution in their own courses. This paper surveys the state of the art in supporting teachers in this search process. Repository-based strategies for disseminating educational resources are discussed and critiqued, and the added value of a semantic web approach is shown. The schema.org ontology and its suitability for the semantic annotation of educational resources are introduced. Current approaches to discovering educational resources through such semantic data, together with their weaknesses, are presented. The possibility of using the wisdom of the crowd of learners and teachers to define semantic knowledge about the learning resources they use is also addressed. For demonstration purposes, all sections draw on the course topic ‘Semantic SEO’, covered in the course ‘SEO – Search Engine Optimization’ taught by the author in 2016. Steinberger, C. (2017). In Search of Reusable Educational Resources in the Web. In Proceedings of the 3rd International Conference on Higher Education Advances. Editorial Universitat Politècnica de València, 321-328. https://doi.org/10.4995/HEAD17.2017.5186
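
    To make the kind of annotation the paper discusses concrete, here is a minimal, illustrative sketch of a schema.org description of a learning resource with an explicit open license, so that search tools can filter for reusable material; the resource name, type, and license value are hypothetical examples, not taken from the paper.

        import json

        # Hypothetical schema.org description of an openly licensed learning resource.
        resource = {
            "@context": "https://schema.org",
            "@type": "LearningResource",
            "name": "Semantic SEO - Introduction",  # example title, not from the paper
            "learningResourceType": "presentation",
            "inLanguage": "en",
            "license": "https://creativecommons.org/licenses/by/4.0/",
            "teaches": "Semantic annotation of web pages with schema.org",
            "isPartOf": {"@type": "Course", "name": "SEO - Search Engine Optimization"},
        }

        # Embedded in a <script type="application/ld+json"> tag, this makes licensing
        # and subject information machine-readable for search engines.
        print(json.dumps(resource, indent=2, ensure_ascii=False))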

    Il Wikibase data model per la pubblicazione dei dati bibliografici sul web semantico. Una sperimentazione presso la Biblioteca Nazionale Centrale di Firenze

    Get PDF
    Under the supervision of Giovanni Bergamin, Mauro Guerrini and Chiara Storti, the thesis tests the potential of the Wikibase software and data model for publishing bibliographic data as Linked Open Data, giving it the interoperability that MARC formats do not support. As a case study, we selected a limited set of UNIMARC elements and a sample of records from the OPAC of the Biblioteca Nazionale Centrale di Firenze. We carried out a syntactic restructuring of each UNIMARC element (following Giovanni Bergamin and Cristian Bacchi, New ways of creating and sharing bibliographic information: an experiment of using the Wikibase data model for UNIMARC data, «JLIS.it», vol. 9, n. 3, 2018) and mapped the elements to several other ontologies (RDA, BIBFRAME, Dublin Core and Schema.org). The goal is to evaluate the impact of this transformation of bibliographic data with a view to optimizing its visibility on the semantic web.
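
    As a rough, hypothetical illustration of the approach (not the thesis's actual mapping), a single UNIMARC element such as 200 $a (title proper) can be restructured as a Wikibase-style statement and related to equivalent properties in other vocabularies; the item and property identifiers below are placeholders.

        # Hypothetical sketch: one UNIMARC element expressed as a Wikibase-style statement.
        unimarc_field = {"tag": "200", "subfield": "a", "value": "Le città invisibili"}

        # A Wikibase statement is essentially (item, property, value) plus optional qualifiers.
        statement = {
            "item": "Q-example-record",      # placeholder item for the bibliographic record
            "property": "P-title-proper",    # placeholder property standing in for UNIMARC 200 $a
            "value": unimarc_field["value"],
            "qualifiers": {"source field": "UNIMARC 200 $a"},
        }

        # Illustrative cross-vocabulary equivalences for the same element.
        equivalences = ["RDA title proper", "bf:mainTitle", "dcterms:title", "schema:name"]

        print(statement)
        print(equivalences)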

    The universal ontology: A vision for conceptual modeling and the semantic web

    Get PDF
    This paper puts forward a vision of a universal ontology (UO) aiming at solving, or at least greatly alleviating, the semantic integration problem in the field of conceptual modeling and the understandability problem in the field of the semantic web. So far it has been assumed that the UO is not feasible in practice, but we think it is time to revisit that assumption in light of the current state of the art, and this paper aims to be a step in that direction. We make an initial proposal of a feasible UO. We present the scope of the UO, the kinds of its concepts, and the elements that could comprise the specification of each concept. We propose a modular structure for the UO consisting of four levels. We argue that the UO needs a complete set of concept composition operators, and we sketch three of them. We also tackle a few issues related to the feasibility of the UO, which we believe could be surmountable. Finally, we discuss the desirability of the UO, and we explain why we conjecture that there are already organizations that have the knowledge and resources needed to develop it, and that might have an interest in its development in the near future.

    Conceptual Navigation in Large Knowledge Graphs

    Get PDF
    A growing part of Big Data is made of knowledge graphs. Major knowledge graphs such as Wikidata, DBpedia or the Google Knowledge Graph count millions of entities and billions of semantic links. A major challenge is to enable their exploration and querying by end-users. The SPARQL query language is powerful but provides no support for exploration by end-users. Question answering is user-friendly but limited in expressivity and reliability. Navigation in concept lattices supports exploration but is limited in expressivity and scalability. In this paper, we introduce a new exploration and querying paradigm, Abstract Conceptual Navigation (ACN), that merges querying and navigation in order to reconcile expressivity, usability, and scalability. ACN is founded on Formal Concept Analysis (FCA), defining the navigation space as a concept lattice. We then instantiate the ACN paradigm for knowledge graphs (Graph-ACN) by relying on Graph-FCA, an extension of FCA to knowledge graphs. We continue by detailing how Graph-ACN can be efficiently implemented on top of SPARQL endpoints, and how its expressivity can be increased in a modular way. Finally, we present a concrete implementation available online, Sparklis, and a few application cases on large knowledge graphs.
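
    For orientation only, the sketch below shows the kind of request such a tool issues against a public SPARQL endpoint while a user refines a query step by step; it targets the Wikidata endpoint with a hand-written query and is not an excerpt from Sparklis.

        # Minimal sketch of one navigation/refinement step against a public SPARQL endpoint.
        # Assumes the SPARQLWrapper package is installed (pip install sparqlwrapper).
        from SPARQLWrapper import SPARQLWrapper, JSON

        endpoint = SPARQLWrapper("https://query.wikidata.org/sparql")
        endpoint.setReturnFormat(JSON)

        # Current focus: humans (wd:Q5), refined by adding a birthplace property.
        endpoint.setQuery("""
            SELECT ?person ?personLabel ?placeLabel WHERE {
              ?person wdt:P31 wd:Q5 ;
                      wdt:P19 ?place .
              SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
            }
            LIMIT 5
        """)

        for row in endpoint.query().convert()["results"]["bindings"]:
            print(row["personLabel"]["value"], "-", row["placeLabel"]["value"])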

    Enriching and validating geographic information on the web

    Get PDF
    The continuous growth of available data on the World Wide Web has led to an unprecedented amount of available information. However, the enormous variance in data quality and in the trustworthiness of information sources impairs the great potential of this information. This observation especially applies to geographic information on the Web, i.e., information describing entities that are located on the Earth’s surface. With the advent of mobile devices, the impact of geographic Web information on our everyday life has substantially grown. Mobile devices have also enabled the creation of novel data sources such as OpenStreetMap (OSM), a collaborative crowd-sourced map providing open cartographic information. Today, we use geographic information in many applications, including routing, location recommendation, and geographic question answering. The processing of geographic Web information poses unique challenges. First, the descriptions of geographic entities on the Web are typically not validated. Since not all Web information sources are trustworthy, the correctness of some geographic Web entities is questionable. Second, geographic information sources on the Web are typically isolated from each other. The missing integration of information sources hinders the efficient use of geographic Web information for many applications. Third, the descriptions of geographic entities are typically incomplete. Depending on the application, missing information is a decisive criterion for (not) using a particular data source. Due to the large scale of the Web, manual correction of these problems is usually not feasible, so automated approaches are required. In this thesis, we tackle these challenges from three different angles. (i) Validation of geographic Web information: We validate geographic Web information by detecting vandalism in OpenStreetMap, for instance the replacement of a street name with advertisement. To this end, we present the OVID model for automated vandalism detection in OpenStreetMap. (ii) Enrichment of geographic Web information through integration: We integrate OpenStreetMap with other geographic Web information sources, namely knowledge graphs, by identifying entries corresponding to the same real-world entities in both data sources. We present the OSM2KG model for automated identity link discovery between OSM and knowledge graphs. (iii) Enrichment of missing information in geographic Web information: We consider semantic annotations of geographic entities on Web pages as an additional data source. We exploit existing annotations of categorical properties of Web entities as training data to enrich missing categorical properties in geographic Web information. For all of the proposed models, we conduct extensive evaluations on real-world datasets. Our experimental results confirm that the proposed solutions reliably outperform existing baselines. Furthermore, we demonstrate the utility of geographic Web information in two application scenarios. (i) Corpus of geographic entity embeddings: We introduce the GeoVectors corpus, a linked open dataset of ready-to-use embeddings of geographic entities. With GeoVectors, we substantially lower the burden of using geographic data in machine learning applications. (ii) Application to event impact prediction: We employ several geographic Web information sources to predict the impact of public events on road traffic. To this end, we use cartographic, event, and event venue information from the Web.
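
    To make the identity-linking task concrete, here is a naive baseline sketch (not the OSM2KG model) that scores knowledge-graph candidates for an OSM entry by combining name similarity with geographic distance; the sample entities, coordinates, and weighting are made up.

        from difflib import SequenceMatcher
        from math import asin, cos, radians, sin, sqrt

        def haversine_km(lat1, lon1, lat2, lon2):
            """Great-circle distance between two WGS84 points, in kilometres."""
            dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
            a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
            return 2 * 6371.0 * asin(sqrt(a))

        def link_score(osm, candidate):
            """Toy heuristic: string similarity of names minus a distance penalty."""
            name_sim = SequenceMatcher(None, osm["name"].lower(), candidate["label"].lower()).ratio()
            dist = haversine_km(osm["lat"], osm["lon"], candidate["lat"], candidate["lon"])
            return name_sim - 0.01 * dist  # penalise 0.01 per kilometre

        osm_node = {"name": "Leibniz Universität Hannover", "lat": 52.3827, "lon": 9.7177}
        kg_candidates = [
            {"label": "Leibniz University Hannover", "lat": 52.3828, "lon": 9.7178},
            {"label": "Hannover Hauptbahnhof", "lat": 52.3765, "lon": 9.7410},
        ]
        best = max(kg_candidates, key=lambda c: link_score(osm_node, c))
        print("best match:", best["label"])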

    A proposal for a semantic digital edition of Federico da Montefeltro's biography by Vespasiano da Bisticci

    Get PDF
    Cultural heritage is the expression of the community to which it refers, and digital technologies can be a valid tool for telling the stories related to it, so that cultural heritage is not only studied but also received in its deepest meaning by multiple audiences. Publishing manuscript texts on the web using Linked Data technologies facilitates the use of the text by non-specialist users and the creation of tools for scholars and researchers. The core of the thesis is a proposal for a semantic digital edition of the biography of Federico da Montefeltro written by Vespasiano da Bisticci, using the schema.org, FOAF, and Relationship vocabularies to mark up the text and a Content Management System to publish the data. In this way it is possible to build a website whose graphic design also follows the principles of user experience and information architecture, to enhance the figures of the Duke of Urbino and the Florentine stationer.
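
    As a hedged illustration of the kind of markup proposed (not an excerpt from the thesis), a passage mentioning the two protagonists could be annotated in JSON-LD that combines schema.org, FOAF, and the Relationship vocabulary; the identifiers and the choice of relationship property are assumptions.

        import json

        # Hypothetical JSON-LD annotation mixing schema.org, FOAF and the Relationship vocabulary.
        annotation = {
            "@context": {
                "schema": "https://schema.org/",
                "foaf": "http://xmlns.com/foaf/0.1/",
                "rel": "http://purl.org/vocab/relationship/",
            },
            "@id": "#federico-da-montefeltro",
            "@type": ["schema:Person", "foaf:Person"],
            "schema:name": "Federico da Montefeltro",
            "schema:jobTitle": "Duke of Urbino",
            "rel:acquaintanceOf": {
                "@id": "#vespasiano-da-bisticci",
                "@type": ["schema:Person", "foaf:Person"],
                "schema:name": "Vespasiano da Bisticci",
            },
        }
        print(json.dumps(annotation, indent=2, ensure_ascii=False))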

    Reducing the labeling effort for entity resolution using distant supervision and active learning

    Full text link
    Entity resolution is the task of identifying records in one or more data sources which refer to the same real-world object. It is often treated as a supervised binary classification task in which a labeled set of matching and non-matching record pairs is used for training a machine learning model. Acquiring labeled data for training machine learning models is expensive and time-consuming, as it typically involves one or more human annotators who need to manually inspect and label the data. This is thus considered a major limitation of supervised entity resolution methods. In this thesis, we investigate two approaches, based on distant supervision and active learning, for reducing the labeling effort involved in constructing training sets for entity resolution tasks with different profiling characteristics. Our first approach investigates the utility of semantic annotations found in HTML pages as a source of distant supervision. We profile the adoption growth of semantic annotations over multiple years and focus on product-related schema.org annotations. We develop a pipeline for cleansing and grouping semantically annotated offers describing the same products, thus creating the WDC Product Corpus, the largest publicly available training set for entity resolution. The high predictive performance of entity resolution models trained on offer pairs from the WDC Product Corpus clearly demonstrates the usefulness of semantic annotations as distant supervision for product-related entity resolution tasks. Our second approach focuses on active learning techniques, which have been widely used for reducing the labeling effort for entity resolution in related work. Yet, we identify two research gaps: the inefficient initialization of active learning and the lack of active learning methods tailored to multi-source entity resolution. We address the first gap by developing an unsupervised method for initializing and further assisting the complete active learning workflow. Compared to active learning baselines that use random sampling or transfer learning for initialization, our method guarantees high anytime performance within a limited labeling budget for tasks with different profiling characteristics. We address the second gap by developing ALMSER, the first active learning method which uses signals inherent to multi-source entity resolution tasks for query selection and model training. Our evaluation results indicate that exploiting such signals for query selection alone has a varying effect on model performance across different multi-source entity resolution tasks. We further investigate this finding by analyzing the impact of the profiling characteristics of multi-source entity resolution tasks on the performance of active learning methods that use different signals for query selection.
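
    For context, the sketch below shows the generic pool-based uncertainty-sampling loop on which such active learning work builds; it is not the thesis's initialization method or ALMSER, the record-pair feature vectors and labels are synthetic, and scikit-learn is assumed to be installed.

        # Generic pool-based active learning with uncertainty sampling for pair classification.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        X_pool = rng.random((500, 6))                      # similarity features of record pairs
        y_pool = (X_pool.mean(axis=1) > 0.5).astype(int)   # stand-in for human match labels

        # Seed set containing both classes, mimicking a small initial labeled sample.
        labeled = list(np.where(y_pool == 1)[0][:5]) + list(np.where(y_pool == 0)[0][:5])
        budget = 40

        for _ in range(budget):
            model = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
            proba = model.predict_proba(X_pool)[:, 1]
            uncertainty = np.abs(proba - 0.5)
            uncertainty[labeled] = np.inf                  # never re-query labeled pairs
            query = int(np.argmin(uncertainty))            # most uncertain pair
            labeled.append(query)                          # oracle label simulated by y_pool

        print("labeled pairs used:", len(labeled))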

    Google search and the mediation of digital health information: a case study on unproven stem cell treatments

    Get PDF
    Google Search occupies a unique space within broader discussions of direct-to-consumer marketing of stem cell treatments in digital spaces. For patients, researchers, regulators, and the wider public, the search platform influences the who, what, where, and why of stem cell treatment information online. Ubiquitous and opaque, Google Search mediates which users are presented with what types of content when these stakeholders search for health information online. The platform also sways the activities of content producers and the characteristics of the content they produce. For those seeking and studying information on digital health, this platform influence raises difficult questions around risk, authority, intervention, and oversight. This thesis addresses a critical gap in the digital methodologies used to map and characterise that influence, as part of wider debates around algorithmic accountability within STS and digital health scholarship. By adopting a novel methodological approach to black-box auditing and data collection, I provide a unique evidentiary base for the analysis of ads, organic results, and the platform's mechanisms of influence on queries related to stem cell treatments. I explore the question: how does Google Search mediate the information that people access online about ‘proven’ and ‘unproven’ stem cell treatments? Here I show that, in spite of a general ban on advertisements for stem cell treatments, users continue to be presented with content promoting unproven treatments. The types, frequency, and commercial intent of results related to stem cell treatments shifted across user groups, including by geography and, more troublingly, for users impacted by Parkinson's Disease and Multiple Sclerosis. Additionally, I find evidence that the technological structure of Google Search itself enables primary and secondary commercial activities around the mediation and dissemination of health information online. This suggests that Google Search's algorithmically mediated rendering of search results – including both commercial and non-commercial activities – has critical implications for the present and future of digital health studies.