Exploring scholarly data with Rexplore.
Despite the large number and variety of tools and services available today for exploring scholarly data, current support is still very limited in the context of sensemaking tasks, which go beyond standard search and ranking of authors and publications, and focus instead on i) understanding the dynamics of research areas, ii) relating authors ‘semantically’ (e.g., in terms of common interests or shared academic trajectories), or iii) performing fine-grained academic expert search along multiple dimensions. To address this gap we have developed a novel tool, Rexplore, which integrates statistical analysis, semantic technologies, and visual analytics to provide effective support for exploring and making sense of scholarly data. Here, we describe the main innovative elements of the tool and we present the results from a task-centric empirical evaluation, which shows that Rexplore is highly effective at providing support for the aforementioned sensemaking tasks. In addition, these results are robust both with respect to the background of the users (i.e., expert analysts vs. ‘ordinary’ users) and with respect to whether the tasks are selected by the evaluators or proposed by the users themselves.
TechMiner: Extracting Technologies from Academic Publications
In recent years we have seen the emergence of a variety of scholarly datasets. Typically these capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, citations, and others. However, as the repositories grow and the technology improves, researchers are adding new entities to these repositories to develop a richer model of the scholarly domain. In this paper, we introduce TechMiner, a new approach that combines NLP, machine learning and semantic technologies for mining technologies from research publications and generating an OWL ontology describing their relationships with other research entities. The resulting knowledge base can support a number of tasks, such as: richer semantic search, which can exploit the technology dimension to support better retrieval of publications; richer expert search; monitoring the emergence and impact of new technologies, both within and across scientific fields; studying the scholarly dynamics associated with the emergence of new technologies; and others. TechMiner was evaluated on a manually annotated gold standard and the results indicate that it significantly outperforms alternative NLP approaches and that its semantic features improve performance significantly with respect to both recall and precision.
A Survey on Linked Data and the Social Web as facilitators for TEL recommender systems
Personalisation, adaptation and recommendation are central features
of TEL environments. In this context, information retrieval techniques are applied
as part of TEL recommender systems to filter and recommend learning resources
or peer learners according to user preferences and requirements. However,
the suitability and scope of possible recommendations are fundamentally
dependent on the quality and quantity of available data, for instance, metadata
about TEL resources as well as users. On the other hand, throughout the last
years, the Linked Data (LD) movement has succeeded in providing a vast body of
well-interlinked and publicly accessible Web data. This in particular includes
Linked Data of explicit or implicit educational nature. The potential of LD to
facilitate TEL recommender systems research and practice is discussed in this
paper. In particular, an overview of most relevant LD sources and techniques is
provided, together with a discussion of their potential for the TEL domain in
general and TEL recommender systems in particular. Results from highly related
European projects are presented and discussed together with an analysis of
prevailing challenges and preliminary solutions.
Deployment of RDFa, Microdata, and Microformats on the Web – A Quantitative Analysis
More and more websites embed structured data describing for instance
products, reviews, blog posts, people, organizations, events, and cooking recipes
into their HTML pages using markup standards such as Microformats, Microdata
and RDFa. This development has accelerated in the last two years as major Web
companies, such as Google, Facebook, Yahoo!, and Microsoft, have started to
use the embedded data within their applications. In this paper, we analyze the
adoption of RDFa, Microdata, and Microformats across the Web. Our study is
based on a large public Web crawl dating from early 2012 and consisting of 3
billion HTML pages which originate from over 40 million websites. The analysis
reveals the deployment of the different markup standards, the main topical areas
of the published data as well as the different vocabularies that are used within each
topical area to represent data. What distinguishes our work from earlier studies,
published by the large Web companies, is that the analyzed crawl as well as the
extracted data are publicly available. This allows our findings to be verified and to
be used as starting points for further domain-specific investigations as well as for
focused information extraction endeavors.
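The kind of embedded Microdata the crawl analysis measures can be recognized with a few lines of standard-library Python. The scanner below and its schema.org sample page are purely illustrative, assumed for the sake of the example; they are not the study's extraction pipeline, which operated on a 3-billion-page crawl.

```python
from collections import Counter
from html.parser import HTMLParser

class MicrodataScanner(HTMLParser):
    """Counts Microdata items and tallies their itemtype vocabularies."""
    def __init__(self):
        super().__init__()
        self.types = Counter()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            # itemtype names the vocabulary class, e.g. a schema.org type
            self.types[attrs.get("itemtype", "(untyped)")] += 1

# A hypothetical page embedding two schema.org items via Microdata
page = """
<html><body>
  <div itemscope itemtype="http://schema.org/Product">
    <span itemprop="name">Widget</span>
  </div>
  <div itemscope itemtype="http://schema.org/Review"></div>
</body></html>
"""

scanner = MicrodataScanner()
scanner.feed(page)
print(scanner.types.most_common())
```

A study-scale version would additionally parse RDFa attributes (`typeof`, `property`) and Microformats class names, and aggregate per website rather than per page.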
Massive-Scale RDF Processing Using Compressed Bitmap Indexes
The Resource Description Framework (RDF) is a popular data model for representing linked data sets arising from the web, as well as large scientific data repositories such as UniProt. RDF data intrinsically represents a labeled and directed multi-graph. SPARQL is a query language for RDF that expresses subgraph pattern-finding queries on this implicit multigraph in a SQL-like syntax. SPARQL queries generate complex intermediate join queries; to compute these joins efficiently, we propose a new strategy based on bitmap indexes. We store the RDF data in column-oriented structures as compressed bitmaps along with two dictionaries. This paper makes three new contributions. (i) We present an efficient parallel strategy for parsing the raw RDF data, building dictionaries of unique entities, and creating compressed bitmap indexes of the data. (ii) We utilize the constructed bitmap indexes to efficiently answer SPARQL queries, simplifying the join evaluations. (iii) To quantify the performance impact of using bitmap indexes, we compare our approach to the state-of-the-art triple-store RDF-3X. We find that our bitmap index-based approach to answering queries is up to an order of magnitude faster for a variety of SPARQL queries, on gigascale RDF data sets.
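The core idea of answering joins by bitmap intersection can be sketched in a few lines: dictionary-encode the subjects, keep one bitset per (predicate, object) pair, and reduce a star join to bitwise ANDs. The toy triples and the uncompressed Python-int bitsets below are illustrative assumptions; the paper's system adds compression, parallel index construction, and full SPARQL support.

```python
# Illustrative triples; real inputs would be parsed from raw RDF.
triples = [
    ("alice", "type", "Person"),
    ("bob",   "type", "Person"),
    ("alice", "worksAt", "ACME"),
    ("carol", "worksAt", "ACME"),
]

# Dictionary: map each subject to a bit position.
subjects = sorted({s for s, _, _ in triples})
sid = {s: i for i, s in enumerate(subjects)}

# One bitmap (a Python int used as a bitset) per (predicate, object) pair:
# bit i is set iff subject i has that property value.
index = {}
for s, p, o in triples:
    index[(p, o)] = index.get((p, o), 0) | (1 << sid[s])

def match(*patterns):
    """Subjects satisfying all (predicate, object) patterns: one AND per join step."""
    bits = ~0
    for po in patterns:
        bits &= index.get(po, 0)
    return [s for s in subjects if bits >> sid[s] & 1]

# Star-join query: ?x type Person . ?x worksAt ACME
print(match(("type", "Person"), ("worksAt", "ACME")))  # → ['alice']
```

The join never materializes intermediate triple lists: each pattern contributes one bitmap, and the intersection cost is linear in the (compressed) bitmap length rather than in the number of candidate bindings.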
SRBench: A streaming RDF/SPARQL benchmark
We introduce SRBench, a general-purpose benchmark primarily designed for streaming RDF/SPARQL engines, completely based on real-world data sets from the Linked Open Data cloud. With the increasing problem of too much streaming data but not enough tools to gain knowledge from them, researchers have set out for solutions in which Semantic Web technologies are adapted and extended for publishing, sharing, analysing and understanding streaming data. To help researchers and users compare streaming RDF/SPARQL (strRS) engines in a standardised application scenario, we have designed SRBench, with which one can assess the abilities of a strRS engine to cope with a broad range of use cases typically encountered in real-world scenarios. The data sets used in the benchmark have been carefully chosen, such that they represent a realistic and relevant usage of streaming data. The benchmark defines a concise, yet comprehensive set of queries that cover the major aspects of strRS processing. Finally, our work is complemented with a functional evaluation on three representative strRS engines: SPARQLStream, C-SPARQL and CQELS. The presented results are meant to give a first baseline and illustrate the state-of-the-art.
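A central aspect such benchmarks exercise is time-window semantics: engines like C-SPARQL evaluate a query repeatedly over a window of the stream defined by a range (width) and a step (slide). The pure-Python sketch below illustrates only that windowing behaviour; the timestamps are invented, not SRBench data, and a real engine would evaluate full graph patterns per window rather than a count.

```python
def window_counts(stream, range_s, step_s):
    """Count stream elements in each tumbling/sliding window of width
    range_s seconds, advancing the window start by step_s seconds."""
    if not stream:
        return []
    end = max(t for t, _ in stream)
    results = []
    start = 0
    while start <= end:
        in_window = [x for t, x in stream if start <= t < start + range_s]
        results.append((start, len(in_window)))
        start += step_s
    return results

# Hypothetical (timestamp_seconds, triple_id) pairs from a sensor stream
stream = [(0, "t1"), (2, "t2"), (5, "t3"), (9, "t4")]
print(window_counts(stream, range_s=5, step_s=5))  # → [(0, 2), (5, 2)]
```

With `step_s` smaller than `range_s` the windows overlap (a sliding window), and the same element contributes to several query evaluations, which is exactly the behaviour that distinguishes streaming from one-shot SPARQL processing.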
Key Ingredients for Your Next Semantics Elevator Talk
2012 brought a major change to the semantics research community. Discussions on the use and benefits of semantic technologies are shifting away from the why to the how. Surprisingly, this increase in stakeholder interest is not accompanied by a more detailed understanding of what semantics research is about. Instead of blaming others for their (wrong) expectations, we need to learn how to emphasize the paradigm shift proposed by semantics research while abstracting from technical details, and to advocate the added value in a way that relates to the immediate needs of individual stakeholders without overselling. This paper highlights some of the major ingredients to prepare your next Semantics Elevator Talk.
A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web
Over the past decade, rapid advances in web technologies, coupled with
innovative models of spatial data collection and consumption, have generated a
robust growth in geo-referenced information, resulting in spatial information
overload. Increasing 'geographic intelligence' in traditional text-based
information retrieval has become a prominent approach to respond to this issue
and to fulfill users' spatial information needs. Numerous efforts in the
Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the
Linking Open Data initiative have converged in a constellation of open
knowledge bases, freely available online. In this article, we survey these open
knowledge bases, focusing on their geospatial dimension. Particular attention
is devoted to the crucial issue of the quality of geo-knowledge bases, as well
as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic
Network, is outlined as our contribution to this area. Research directions in
information integration and Geographic Information Retrieval (GIR) are then
reviewed, with a critical discussion of their current limitations and future
prospects.
Documentation FiFoSiM: Integrated Tax Benefit Microsimulation and CGE Model
This paper describes FiFoSiM, the integrated tax benefit microsimulation and computable general equilibrium (CGE) model of the Center of Public Economics at the University of Cologne. FiFoSiM consists of three main parts. The first part is a static tax benefit microsimulation module. The second part adds a behavioural component to the model: an econometrically estimated labour supply model. The third module is a CGE model which allows the user of FiFoSiM to assess the global economic effects of policy measures. Two specific features distinguish FiFoSiM from other tax benefit models: first, the simultaneous use of two databases for the tax benefit module and second, the linkage of the tax benefit model to a CGE model.
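The first module's static microsimulation idea can be sketched compactly: apply a tax-benefit rule to individual household records and aggregate the results. The flat-tax schedule, allowance, benefit amount, and incomes below are invented for illustration and bear no relation to FiFoSiM's actual implementation of German tax law.

```python
def net_income(gross, allowance=10_000, rate=0.3, benefit=2_000):
    """Hypothetical rule: flat tax above an allowance, plus a transfer
    for households whose gross income falls below the allowance."""
    tax = max(gross - allowance, 0) * rate
    transfer = benefit if gross < allowance else 0
    return gross - tax + transfer

# Three hypothetical household gross incomes (the 'micro' data)
households = [8_000, 25_000, 60_000]

# Microsimulation pass: per-household outcomes plus aggregate tax revenue
nets = [net_income(g) for g in households]
revenue = sum(max(g - 10_000, 0) * 0.3 for g in households)
print(nets, revenue)
```

The behavioural and CGE modules described in the paper would then feed back into such a pass, e.g. by re-estimating labour supply (and hence gross incomes) under the simulated reform before recomputing the aggregates.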