319 research outputs found

    Exploring scholarly data with Rexplore.

    Get PDF
    Despite the large number and variety of tools and services available today for exploring scholarly data, current support is still very limited in the context of sensemaking tasks, which go beyond standard search and ranking of authors and publications, and focus instead on i) understanding the dynamics of research areas, ii) relating authors ‘semantically’ (e.g., in terms of common interests or shared academic trajectories), or iii) performing fine-grained academic expert search along multiple dimensions. To address this gap we have developed a novel tool, Rexplore, which integrates statistical analysis, semantic technologies, and visual analytics to provide effective support for exploring and making sense of scholarly data. Here, we describe the main innovative elements of the tool and we present the results from a task-centric empirical evaluation, which shows that Rexplore is highly effective at providing support for the aforementioned sensemaking tasks. In addition, these results are robust both with respect to the background of the users (i.e., expert analysts vs. ‘ordinary’ users) and also with respect to whether the tasks are selected by the evaluators or proposed by the users themselves

    TechMiner: Extracting Technologies from Academic Publications

    Get PDF
    In recent years we have seen the emergence of a variety of scholarly datasets. Typically these capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, citations, and others. However, as the repositories grow and the technology improves, researchers are adding new entities to these repositories to develop a richer model of the scholarly domain. In this paper, we introduce TechMiner, a new approach, which combines NLP, machine learning and semantic technologies, for mining technologies from research publications and generating an OWL ontology describing their relationships with other research entities. The resulting knowledge base can support a number of tasks, such as: richer semantic search, which can exploit the technology dimension to support better retrieval of publications; richer expert search; monitoring the emergence and impact of new technologies, both within and across scientific fields; studying the scholarly dynamics associated with the emergence of new technologies; and others. TechMiner was evaluated on a manually annotated gold standard and the results indicate that it significantly outperforms alternative NLP approaches and that its semantic features improve performance significantly with respect to both recall and precision

    A Survey on Linked Data and the Social Web as facilitators for TEL recommender systems

    Get PDF
    Personalisation, adaptation and recommendation are central features of TEL environments. In this context, information retrieval techniques are applied as part of TEL recommender systems to filter and recommend learning resources or peer learners according to user preferences and requirements. However, the suitability and scope of possible recommendations is fundamentally dependent on the quality and quantity of available data, for instance, metadata about TEL resources as well as users. On the other hand, throughout the last years, the Linked Data (LD) movement has succeeded to provide a vast body of well-interlinked and publicly accessible Web data. This in particular includes Linked Data of explicit or implicit educational nature. The potential of LD to facilitate TEL recommender systems research and practice is discussed in this paper. In particular, an overview of most relevant LD sources and techniques is provided, together with a discussion of their potential for the TEL domain in general and TEL recommender systems in particular. Results from highly related European projects are presented and discussed together with an analysis of prevailing challenges and preliminary solutions.LinkedU

    Deployment of RDFa, Microdata, and Microformats on the Web – A Quantitative Analysis

    Get PDF
    More and more websites embed structured data describing for instance products, reviews, blog posts, people, organizations, events, and cooking recipes into their HTML pages using markup standards such as Microformats, Microdata and RDFa. This development has accelerated in the last two years as major Web companies, such as Google, Facebook, Yahoo!, and Microsoft, have started to use the embedded data within their applications. In this paper, we analyze the adoption of RDFa, Microdata, and Microformats across the Web. Our study is based on a large public Web crawl dating from early 2012 and consisting of 3 billion HTML pages which originate from over 40 million websites. The analysis reveals the deployment of the different markup standards, the main topical areas of the published data as well as the different vocabularies that are used within each topical area to represent data. What distinguishes our work from earlier studies, published by the large Web companies, is that the analyzed crawl as well as the extracted data are publicly available. This allows our findings to be verified and to be used as starting points for further domain-specific investigations as well as for focused information extraction endeavors

    Massive-Scale RDF Processing Using Compressed Bitmap Indexes

    Full text link
    The Resource Description Framework (RDF) is a popular data model for representing linked data sets arising from the web, as well as large scienti#12;c data repositories such as UniProt. RDF data intrinsically represents a labeled and directed multi-graph. SPARQL is a query language for RDF that expresses subgraph pattern-#12;nding queries on this implicit multigraph in a SQL- like syntax. SPARQL queries generate complex intermediate join queries; to compute these joins e#14;ciently, we propose a new strategy based on bitmap indexes. We store the RDF data in column-oriented structures as compressed bitmaps along with two dictionaries. This paper makes three new contributions. (i) We present an e#14;cient parallel strategy for parsing the raw RDF data, building dictionaries of unique entities, and creating compressed bitmap indexes of the data. (ii) We utilize the constructed bitmap indexes to e#14;ciently answer SPARQL queries, simplifying the join evaluations. (iii) To quantify the performance impact of using bitmap indexes, we compare our approach to the state-of-the-art triple-store RDF-3X. We #12;nd that our bitmap index-based approach to answering queries is up to an order of magnitude faster for a variety of SPARQL queries, on gigascale RDF data sets

    SRBench: A streaming RDF/SPARQL benchmark

    Full text link
    We introduce SRBench, a general-purpose benchmark primarily designed for streaming RDF/SPARQL engines, completely based on real-world data sets from the Linked Open Data cloud. With the increasing problem of too much streaming data but not enough tools to gain knowledge from them, researchers have set out for solutions in which Semantic Web technologies are adapted and extended for publishing, sharing, analysing and understanding streaming data. To help researchers and users comparing streaming RDF/SPARQL (strRS) engines in a standardised application scenario, we have designed SRBench, with which one can assess the abilities of a strRS engine to cope with a broad range of use cases typically encountered in real-world scenarios. The data sets used in the benchmark have been carefully chosen, such that they represent a realistic and relevant usage of streaming data. The benchmark defines a concise, yet omprehensive set of queries that cover the major aspects of strRS processing. Finally, our work is complemented with a functional evaluation on three representative strRS engines: SPARQLStream, C-SPARQL and CQELS. The presented results are meant to give a first baseline and illustrate the state-of-the-art

    Key Ingredients for Your Next Semantics Elevator Talk

    Get PDF
    Abstract. 2012 brought a major change to the semantics research com-munity. Discussions on the use and benefits of semantic technologies are shifting away from the why to the how. Surprisingly this more in stake-holder interest is not accompanied by a more detailed understanding of what semantics research is about. Instead of blaming others for their (wrong) expectations, we need to learn how to emphasize the paradigm shift proposed by semantics research while abstracting from technical details and advocate the added value in a way that relates to the im-mediate needs of individual stakeholders without overselling. This paper highlights some of the major ingredients to prepare your next Semantic

    A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web

    Full text link
    Over the past decade, rapid advances in web technologies, coupled with innovative models of spatial data collection and consumption, have generated a robust growth in geo-referenced information, resulting in spatial information overload. Increasing 'geographic intelligence' in traditional text-based information retrieval has become a prominent approach to respond to this issue and to fulfill users' spatial information needs. Numerous efforts in the Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the Linking Open Data initiative have converged in a constellation of open knowledge bases, freely available online. In this article, we survey these open knowledge bases, focusing on their geospatial dimension. Particular attention is devoted to the crucial issue of the quality of geo-knowledge bases, as well as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic Network, is outlined as our contribution to this area. Research directions in information integration and Geographic Information Retrieval (GIR) are then reviewed, with a critical discussion of their current limitations and future prospects

    Documentation FiFoSiM: Integrated Tax Benefit Microsimulation and CGE Model

    Full text link
    ABSTRACT: This paper describes FiFoSiM, the integrated tax benefit microsimulation and computable general equilibrium (CGE) model of the Center of Public Economics at the University of Cologne. FiFoSiM consists of three main parts. The first part is a static tax benefit microsimulation module. The second part adds a behavioural component to the model: an econometrically estimated labour supply model. The third module is a CGE model which allows the user of FiFoSiM to assess the global economic effects of policy measures. Two specific features distinguish FiFoSiM from other tax benefit models: First, the simultaneous use of two databases for the tax benefit module and second, the linkage of the tax benefit model to a CGE model
    corecore