89 research outputs found

    BIBFRAME Transformation for Enhanced Discovery

    With support from an internal innovation grant of the University of Illinois Library at Urbana-Champaign, researchers transformed and enriched nearly 300,000 e-book records in their library catalog from Machine-Readable Cataloging (MARC) records to Bibliographic Framework (BIBFRAME) linked data resources. Researchers indexed the BIBFRAME resources online and created two search interfaces for the discovery of BIBFRAME linked data. One result of the grant was the incorporation of BIBFRAME resources within an experimental Bento view of the linked library data for e-books. The end goal of this project is to provide enhanced discovery of library data, bringing like sets of content together in contemporary, easy-to-understand views that help users locate sets of associated bibliographic metadata. Funded by the University of Illinois Library Innovation Fund.

    Enacting the Semantic Web: Ontological Orderings, Negotiated Standards, and Human-machine Translations

    Artificial intelligence (AI) that is based upon semantic search has become one of the dominant means for accessing information in recent years. This is particularly the case in mobile contexts, as search based AI are embedded in each of the major mobile operating systems. The implications are such that information is becoming less a matter of choosing between different sets of results, and more of a presentation of a single answer, limiting both the availability of, and exposure to, alternate sources of information. Thus, it is essential to understand how that information comes to be structured and how deterministic systems like search based AI come to understand the indeterminate worlds they are tasked with interrogating. The semantic web, one of the technologies underpinning these systems, creates machine-readable data from the existing web of text and formalizes those machine-readable understandings in ontologies. This study investigates the ways that those semantic assemblages structure, and thus define, the world. In accordance with assemblage theory, it is necessary to study the interactions between the components that make up such data assemblages. As yet, the social sciences have been slow to systematically investigate data assemblages, the semantic web, and the components of these important socio-technical systems. This study investigates one major ontology, Schema.org. It uses netnographic methods to study the construction and use of Schema.org to determine how ontological states are declared and how human-machine translations occur in those development and use processes. This study has two main findings that bear on the relevant literature. First, I find that development and use of the ontology is a product of negotiations with technical standards such that ontologists and users must work around, with, and through the affordances and constraints of standards. 
Second, these groups adopt a pragmatic and generalizable approach to data modeling and semantic markup that determines ontological context in local and global ways. The first finding is significant in that past work has largely focused on how people work around standards’ limitations, whereas this study shows that practitioners also strategically engage with standards to achieve their aims. The second is significant in that the approach these groups use in translating human knowledge to machines differs from the formalized and positivistic approaches described in past work. At a larger level, this study fills a lacuna in the collective understanding of how data assemblages are constructed and operate.

    A structural and quantitative analysis of the web of linked data and its components to perform retrieval data

    This research consists of a quantitative and structural analysis of the Web of Linked Data, with the aim of improving data search across different sources. Statistical techniques will be applied to obtain quantitative metrics of the Web of Linked Data, and Social Network Analysis (SNA) will be used for the structural analysis. To get an overview of the Web of Linked Data suitable for analysis, we rely on the Linking Open Data (LOD) cloud diagram, an online catalog of datasets whose information has been published using Linked Data techniques. The datasets are published in a language called Resource Description Framework (RDF), which creates links between them so that the information can be reused. The goal of obtaining a quantitative and structural analysis of the Web of Linked Data is to improve data search. For that purpose, we make use of the Schema.org markup language and the Linked Open Vocabularies (LOV) project. Schema.org is a set of tags intended to let webmasters mark up their own web pages with microdata. Microdata is used to help search engines and other web tools better understand the information those pages contain. LOV is a catalog that registers the vocabularies used by the datasets of the Web of Linked Data; its goal is to provide easy access to those vocabularies. In this research, we develop a study for obtaining data from the Web of Linked Data using the sources mentioned above together with ontology matching techniques. In our case, we first map Schema.org to LOV, and then LOV to the Web of Linked Data. An SNA of LOV has also been carried out. The aim of that analysis is to obtain a quantitative and qualitative picture of LOV. Knowing this, we can draw conclusions such as which vocabularies are the most used, or whether or not they are specialized in a particular field. These conclusions can be used to filter datasets or reuse information.

    Development of a Framework for Ontology Population Using Web Scraping in Mechatronics

    One of the major challenges in engineering contexts is the efficient collection, management, and sharing of data. To address this problem, semantic technologies and ontologies are potent assets, although some tasks, such as ontology population, usually demand high maintenance effort. This thesis proposes a framework to automate data collection from sparse web resources and insert it into an ontology. First, a product ontology is created based on the combination of several reference vocabularies, namely GoodRelations, the Basic Formal Ontology, the ECLASS standard, and an information model. Then, this study introduces a general procedure for developing a web scraping agent to collect data from the web. Subsequently, an algorithm based on lexical similarity measures is presented to map the collected data to the concepts of the ontology. Lastly, the collected data is inserted into the ontology. To validate the proposed solution, this thesis implements the previous steps to collect information about microcontrollers from three different websites. Finally, the thesis evaluates the use case results, draws conclusions, and suggests promising directions for future research.
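
    The mapping step described in this abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the similarity measure (Python's difflib ratio), the 0.6 threshold, the function names, and the sample labels are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def lexical_similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] for two labels, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def map_to_concepts(scraped_labels, concept_labels, threshold=0.6):
    """Map each scraped label to its most similar concept label.

    Labels whose best score falls below the (assumed) threshold map to None,
    so they can be reviewed by hand instead of being inserted blindly.
    """
    mapping = {}
    for label in scraped_labels:
        best = max(concept_labels, key=lambda c: lexical_similarity(label, c))
        score = lexical_similarity(label, best)
        mapping[label] = best if score >= threshold else None
    return mapping


# Hypothetical ontology concept labels and scraped attribute names.
concepts = ["clock frequency", "flash memory size", "supply voltage"]
scraped = ["Clock Freq.", "Flash Memory", "Package weight"]

result = map_to_concepts(scraped, concepts)
for label, concept in result.items():
    print(f"{label!r} -> {concept!r}")
```

    A threshold keeps unrelated attributes out of the ontology at the cost of some manual review; a real pipeline would likely tune it against labeled examples.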

    A Quantitative Analysis of the Use of Microdata for Semantic Annotations on Educational Resources

    A current trend in the semantic web is the use of embedded markup formats aimed at semantically enriching web content by making it more understandable to search engines and other applications. The deployment of Microdata as a markup format has increased thanks to the widespread adoption of the controlled vocabulary provided by Schema.org. Recently, a set of properties from the Learning Resource Metadata Initiative (LRMI) specification, which describes educational resources, was adopted by Schema.org. These properties, in addition to those related to accessibility and the license of resources included in Schema.org, would enable search engines to provide more relevant results in searching for educational resources for all users, including users with disabilities. In order to obtain a reliable evaluation of the use of Microdata properties related to the LRMI specification, accessibility, and the license of resources, this research conducted a quantitative analysis of the deployment of these properties in large-scale web corpora covering two consecutive years. The corpora contain hundreds of millions of web pages. The results further our understanding of this deployment and highlight the pending issues and challenges concerning the use of such properties.
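
    As a rough illustration of what counting Microdata property usage involves, the sketch below tallies itemprop values in a single HTML page with Python's standard html.parser. It is an assumption-laden toy, not the study's pipeline: the class and function names and the sample page are hypothetical, and a corpus-scale analysis would need far more robust extraction.

```python
from collections import Counter
from html.parser import HTMLParser


class ItempropCounter(HTMLParser):
    """Count how often each Microdata property name appears in a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "itemprop" and value:
                # An itemprop attribute may list several space-separated names.
                self.counts.update(value.split())


def count_itemprops(html: str) -> Counter:
    """Parse one HTML document and return its itemprop usage counts."""
    parser = ItempropCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical page annotated with Schema.org properties.
sample_page = """
<div itemscope itemtype="http://schema.org/CreativeWork">
  <span itemprop="name">Intro to Fractions</span>
  <span itemprop="educationalUse">assignment</span>
  <meta itemprop="accessibilityFeature" content="captions">
  <a itemprop="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY</a>
</div>
"""

counts = count_itemprops(sample_page)
for prop, n in sorted(counts.items()):
    print(prop, n)
```

    Summing such per-page counters across a crawl would give deployment figures per property, which is the kind of quantity the abstract reports on.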

    A Semantics-based User Interface Model for Content Annotation, Authoring and Exploration

    The Semantic Web and Linked Data movements with the aim of creating, publishing and interconnecting machine readable information have gained traction in the last years. However, the majority of information still is contained in and exchanged using unstructured documents, such as Web pages, text documents, images and videos. This can also not be expected to change, since text, images and videos are the natural way in which humans interact with information. Semantic structuring of content on the other hand provides a wide range of advantages compared to unstructured information. Semantically-enriched documents facilitate information search and retrieval, presentation, integration, reusability, interoperability and personalization. Looking at the life-cycle of semantic content on the Web of Data, we see quite some progress on the backend side in storing structured content or for linking data and schemata. Nevertheless, the currently least developed aspect of the semantic content life-cycle is from our point of view the user-friendly manual and semi-automatic creation of rich semantic content. In this thesis, we propose a semantics-based user interface model, which aims to reduce the complexity of underlying technologies for semantic enrichment of content by Web users. By surveying existing tools and approaches for semantic content authoring, we extracted a set of guidelines for designing efficient and effective semantic authoring user interfaces. We applied these guidelines to devise a semantics-based user interface model called WYSIWYM (What You See Is What You Mean) which enables integrated authoring, visualization and exploration of unstructured and (semi-)structured content. To assess the applicability of our proposed WYSIWYM model, we incorporated the model into four real-world use cases comprising two general and two domain-specific applications. 
These use cases address four aspects of the WYSIWYM implementation: 1) Its integration into existing user interfaces, 2) Utilizing it for lightweight text analytics to incentivize users, 3) Dealing with crowdsourcing of semi-structured e-learning content, 4) Incorporating it for authoring of semantic medical prescriptions

    Web-scale profiling of semantic annotations in HTML pages

    The vision of the Semantic Web was coined by Tim Berners-Lee almost two decades ago. The idea describes an extension of the existing Web in which “information is given well-defined meaning, better enabling computers and people to work in cooperation” [Berners-Lee et al., 2001]. Semantic annotations in HTML pages are one realization of this vision which was adopted by large numbers of web sites in the last years. Semantic annotations are integrated into the code of HTML pages using one of the three markup languages Microformats, RDFa, or Microdata. Major consumers of semantic annotations are the search engine companies Bing, Google, Yahoo!, and Yandex. They use semantic annotations from crawled web pages to enrich the presentation of search results and to complement their knowledge bases. However, outside the large search engine companies, little is known about the deployment of semantic annotations: How many web sites deploy semantic annotations? What are the topics covered by semantic annotations? How detailed are the annotations? Do web sites use semantic annotations correctly? Are semantic annotations useful for others than the search engine companies? And how can semantic annotations be gathered from the Web in that case? The thesis answers these questions by profiling the web-wide deployment of semantic annotations. The topic is approached in three consecutive steps: In the first step, two approaches for extracting semantic annotations from the Web are discussed. The thesis evaluates first the technique of focused crawling for harvesting semantic annotations. Afterward, a framework to extract semantic annotations from existing web crawl corpora is described. The two extraction approaches are then compared for the purpose of analyzing the deployment of semantic annotations in the Web. In the second step, the thesis analyzes the overall and markup language-specific adoption of semantic annotations. 
This empirical investigation is based on the largest web corpus that is available to the public. Further, the topics covered by deployed semantic annotations and their evolution over time are analyzed. Subsequent studies examine common errors within semantic annotations. In addition, the thesis analyzes the data overlap of the entities that are described by semantic annotations from the same and across different web sites. The third step narrows the focus of the analysis towards use case-specific issues. Based on the requirements of a marketplace, a news aggregator, and a travel portal the thesis empirically examines the utility of semantic annotations for these use cases. Additional experiments analyze the capability of product-related semantic annotations to be integrated into an existing product categorization schema. Especially, the potential of exploiting the diverse category information given by the web sites providing semantic annotations is evaluated

    Report of the Stanford Linked Data Workshop

    The Stanford University Libraries and Academic Information Resources (SULAIR) with the Council on Library and Information Resources (CLIR) conducted a week-long workshop on the prospects for a large-scale, multi-national, multi-institutional prototype of a Linked Data environment for discovery of and navigation among the rapidly, chaotically expanding array of academic information resources. As preparation for the workshop, CLIR sponsored a survey by Jerry Persons, Chief Information Architect emeritus of SULAIR, that was published originally for workshop participants as background to the workshop and is now publicly available. The original intention of the workshop was to devise a plan for such a prototype. However, such was the diversity of knowledge, experience, and views of the potential of Linked Data approaches that the workshop participants turned to two more fundamental goals: building common understanding and enthusiasm on the one hand, and identifying opportunities and challenges to be confronted in the preparation of the intended prototype and its operation on the other. In pursuit of those objectives, the workshop participants produced: 1. a value statement addressing the question of why a Linked Data approach is worth prototyping; 2. a manifesto for Linked Libraries (and Museums and Archives and …); 3. an outline of the phases in a life cycle of Linked Data approaches; 4. a prioritized list of known issues in generating, harvesting & using Linked Data; 5. a workflow with notes for converting library bibliographic records and other academic metadata to URIs; 6. examples of potential “killer apps” using Linked Data; and 7. a list of next steps and potential projects. This report includes a summary of the workshop agenda, a chart showing the use of Linked Data in cultural heritage venues, and short biographies and statements from each of the participants.