9 research outputs found

    Facilitating Scientometrics in Learning Analytics and Educational Data Mining – the LAK Dataset

    The Learning Analytics and Knowledge (LAK) Dataset represents an unprecedented corpus which exposes a near-complete collection of bibliographic resources for a specific research discipline, namely the connected areas of Learning Analytics and Educational Data Mining. Covering over five years of scientific literature from the most relevant conferences and journals, the dataset provides Linked Data about bibliographic metadata as well as the full text of the paper body. The latter was enabled through special licensing agreements with ACM for publications not yet available through open access. The dataset has been designed following established Linked Data patterns, reusing established vocabularies and providing links to established schemas and entity coreferences in related datasets. Given its temporal and topic coverage as a near-complete corpus of research publications of a particular discipline, the dataset facilitates scientometric investigations, for instance into the evolution of a scientific field over time or correlations with other disciplines, as documented by its usage in a wide range of scientific studies and applications.

    LODNav – An Interactive Visualization of the Linking Open Data Cloud

    The emergence of the Linking Open Data Cloud (LODC) is an example of the adoption of Linked Data principles and the creation of a Web of Data. An increasing amount of information is linked across member datasets of the LODC by means of RDF links, yet there is little support to help a human understand which datasets are connected to one another. This research presents a novel approach for understanding these interconnections with the publicly accessible tool LODNav – Linking Open Data Navigator. LODNav provides a visualization metaphor of the LODC by positioning member datasets of the LODC on a world map based on the geographical location of each dataset. This interactive tool aims to provide a dynamic, up-to-date visualization of the LODC and allows the extraction of information about the datasets as well as their interconnections as RDF data.

    Using Linked Data to Publish and Integrate Academic Data (original title: Uso de datos enlazados para la publicación e integración de datos de índole académico)

    Today there is a large number of bibliographic information sources and open institutional repositories online. These sources are independent, heterogeneous, and distributed; they tend to represent their data in different ways and to provide access through different mechanisms or protocols. There is also the serious problem that the authors of publications are not customarily identified unambiguously. Although ORCID has begun to address this, its use is not yet widespread outside some publishing services, and it is not yet used at all in educational settings. The biggest problem arises when integrating data from scientific publication sources with sources such as personal or institutional web pages, or the content-authoring spaces where teacher-researchers usually work. It is in this teacher-researcher setting that this work studies the Linked Data publication life cycle as a way of solving the problem of integrating scientific publication data. The work presents an analysis of Semantic Web concepts applied to the publication of Linked Data, and a review of existing methodologies, recommendations, and best practices for publishing Linked Data on the Web. These guidelines and recommendations serve as the basis for analyzing two case studies with different characteristics: the textbooks created on the CNX.org platform, and the publication, integration, and analysis of the scientific publications produced by the faculty of the Instituto de Computación of the Facultad de Ingeniería (UdelaR). In the latter case, the list of faculty members published on the institution's website and the bibliographic databases available on the FIng website and in DBLP were published as Linked Data. Link detection and identity resolution processes were designed and run across the three sources, and an analytical study based on the use of the Linked Data is also presented.

    Enabling automatic provenance-based trust assessment of web content

    Exploring semantic relationships in the web of data

    Statistical Extraction of Multilingual Natural Language Patterns for RDF Predicates: Algorithms and Applications

    The Data Web has undergone a period of tremendous growth. It currently consists of more than 3300 publicly available knowledge bases describing millions of resources from various domains, such as life sciences, government or geography, with over 89 billion facts. Likewise, the Document Web has grown to the point where approximately 4.55 billion websites exist, and 300 million photos are uploaded to Facebook and 3.5 billion Google searches are performed on average every day. However, there is a gap between the Document Web and the Data Web: knowledge bases available on the Data Web are most commonly extracted from structured or semi-structured sources, whereas the majority of information available on the Web is contained in unstructured sources such as news articles, blog posts, photos, and forum discussions. As a result, data on the Data Web not only misses a significant fragment of information but also lacks timeliness, since typical extraction methods are time-consuming and can only be carried out periodically. Furthermore, provenance information is rarely taken into consideration and therefore gets lost in the transformation process. In addition, users are accustomed to entering keyword queries to satisfy their information needs. With the availability of machine-readable knowledge bases, lay users could be empowered to ask more specific questions and get more precise answers. In this thesis, we address the problem of Relation Extraction, one of the key challenges in closing the gap between the Document Web and the Data Web, in four ways. First, we present a distant supervision approach that finds multilingual natural language representations of formal relations already contained in the Data Web. We use these natural language representations to find sentences on the Document Web that contain unseen instances of a relation between two entities. Second, we address the problem of data timeliness by presenting a real-time data stream RDF extraction framework and using this framework to extract RDF from RSS news feeds. Third, we present a novel fact validation algorithm, based on natural language representations, that can not only verify or falsify a given triple but also find trustworthy sources for it on the Web and estimate a time scope in which the triple holds true. The features used by this algorithm to determine whether a website is trustworthy serve as provenance information and thereby help to create metadata for facts in the Data Web. Finally, we present a question answering system that uses the natural language representations to map natural language questions to formal SPARQL queries, allowing lay users to make use of the large amounts of data available on the Data Web to satisfy their information needs.
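The distant supervision idea in the first contribution above can be illustrated with a toy sketch. This is not the thesis's algorithm, which the abstract does not specify. It only assumes that the text found between a known subject and object can serve as a natural language pattern for their relation. The function names `extract_patterns` and `find_new_pairs` and all sample data are hypothetical.

```python
from collections import Counter


def extract_patterns(sentences, known_pairs):
    """Count the text found between a known subject and its object."""
    patterns = Counter()
    for sentence in sentences:
        for subject, obj in known_pairs:
            start = sentence.find(subject)
            end = sentence.find(obj)
            # Keep only sentences where the subject precedes the object.
            if start != -1 and end != -1 and start + len(subject) <= end:
                between = sentence[start + len(subject):end].strip()
                if between:
                    patterns[between] += 1
    return patterns


def find_new_pairs(sentences, pattern, entities, known_pairs):
    """Return entity pairs joined by the pattern that are not yet known."""
    found = set()
    for sentence in sentences:
        for subject in entities:
            for obj in entities:
                if subject == obj or (subject, obj) in known_pairs:
                    continue
                if f"{subject} {pattern} {obj}" in sentence:
                    found.add((subject, obj))
    return found


# Hypothetical instances of a "capital of" relation taken from a knowledge base.
known_pairs = {("Berlin", "Germany"), ("Paris", "France")}
training = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
    "Paris is a large city in France.",
]
patterns = extract_patterns(training, known_pairs)
best_pattern, _ = patterns.most_common(1)[0]

# Apply the most frequent pattern to unseen text.
new_pairs = find_new_pairs(
    ["Montevideo is the capital of Uruguay."],
    best_pattern,
    ["Montevideo", "Uruguay"],
    known_pairs,
)
print(best_pattern, new_pairs)
```

A realistic system would need entity recognition, statistical scoring of patterns, and support for several languages; exact string matching is used here only to keep the sketch short.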

    Methods for Matching of Linked Open Social Science Data

    In recent years, the concept of Linked Open Data (LOD) has gained popularity and acceptance across various communities and domains. Science policy makers and organizations claim that semantic technologies and data exposed in this manner may support and enhance research processes and the infrastructures that provide research information and services. In this thesis, we investigate whether these expectations can be met in the domain of the social sciences. In particular, we analyse and develop methods for matching social science data that is published as Linked Data, which we introduce as Linked Open Social Science Data. Based on expert interviews and a prototype application, we investigate the current consumption of LOD in the social sciences and its requirements. Following these insights, we first focus on the complete publication of Linked Open Social Science Data by extending and developing domain-specific ontologies for representing research communities, research data and thesauri. In the second part, we develop methods for matching Linked Open Social Science Data that address particular patterns and characteristics of the data typically used in social research. The results of this work contribute towards enabling a meaningful application of Linked Data in a scientific domain.