RDF-TR: Exploiting structural redundancies to boost RDF compression
The number and volume of semantic data have grown impressively over the last decade, promoting compression as an essential tool for RDF preservation, sharing and management. In contrast to universal compressors, RDF compression techniques are able to detect and exploit specific forms of redundancy in RDF data. Thus, state-of-the-art RDF compressors excel at exploiting syntactic and semantic redundancies, i.e., repetitions in the serialization format and information that can be inferred implicitly. However, little attention has been paid to the existence of structural patterns within the RDF dataset; i.e. structural redundancy. In this paper, we analyze structural regularities in real-world datasets, and show three schema-based sources of redundancies that underpin the schema-relaxed nature of RDF. Then, we propose RDF-Tr (RDF Triples Reorganizer), a preprocessing technique that discovers and removes this kind of redundancy before the RDF dataset is effectively compressed. In particular, RDF-Tr groups subjects that are described by the same predicates, and locally re-codes the objects related to these predicates. Finally, we integrate
RDF-Tr with two RDF compressors, HDT and k2-triples. Our experiments show that using RDF-Tr with these compressors improves their original effectiveness by up to 2.3 times, outperforming the most prominent state-of-the-art techniques.
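The grouping step described in this abstract (subjects that are described by the same predicates) can be illustrated with a minimal Python sketch. This is not the RDF-Tr implementation: the function name `group_subjects_by_predicates`, the tuple representation of triples, and the sample data are assumptions made for illustration only, and the local re-coding of objects is not shown.

```python
from collections import defaultdict


def group_subjects_by_predicates(triples):
    """Group subjects by the exact set of predicates that describe them.

    `triples` is an iterable of (subject, predicate, object) tuples.
    Returns a dict mapping a frozenset of predicates to a sorted list
    of the subjects that use exactly that set of predicates.
    (Hypothetical helper, not part of any RDF library.)
    """
    # First pass: collect the predicate set of every subject.
    predicates_of = defaultdict(set)
    for subject, predicate, _obj in triples:
        predicates_of[subject].add(predicate)

    # Second pass: subjects sharing a predicate set fall into one group.
    groups = defaultdict(list)
    for subject, predicates in predicates_of.items():
        groups[frozenset(predicates)].append(subject)
    return {key: sorted(members) for key, members in groups.items()}


# Illustrative data only (made-up identifiers).
triples = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:bob", "foaf:mbox", "mailto:bob@example.org"),
    ("ex:paper1", "dc:title", '"A title"'),
]
groups = group_subjects_by_predicates(triples)
```

Here `ex:alice` and `ex:bob` end up in the same group because both are described by `foaf:name` and `foaf:mbox`, which is the kind of structural regularity the abstract refers to.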
Compressed k2-Triples for Full-In-Memory RDF Engines
The current "data deluge" has flooded the Web of Data with very large RDF
datasets. They are hosted and queried through SPARQL endpoints which act as
nodes of a semantic net built on the principles of the Linked Data project.
Although this is a realistic philosophy for global data publishing, its query
performance is diminished when the RDF engines (behind the endpoints) manage
these huge datasets. Their indexes cannot be fully loaded in main memory, hence
these systems need to perform slow disk accesses to solve SPARQL queries. This
paper addresses this problem by a compact indexed RDF structure (called
k2-triples) applying compact k2-tree structures to the well-known
vertical-partitioning technique. It obtains an ultra-compressed representation
of large RDF graphs and allows SPARQL queries to be resolved fully in memory
without decompression. We show that k2-triples clearly outperforms
state-of-the-art compressibility and traditional vertical-partitioning query
resolution, remaining very competitive with multi-index solutions. Comment: In Proc. of AMCIS'201
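As a rough illustration of the vertical-partitioning idea mentioned in this abstract, the sketch below splits a set of triples into one sparse (subject, object) matrix per predicate. It is only a sketch under stated assumptions: the function name `vertical_partition`, the separate sorted-order integer IDs for subjects and objects, and the sample data are hypothetical, and the k2-tree encoding of each matrix is not shown.

```python
from collections import defaultdict


def vertical_partition(triples):
    """Split triples into one set of (subject_id, object_id) cells per predicate.

    Each per-predicate set can be read as the 1-cells of a sparse binary
    matrix whose rows are subjects and whose columns are objects.
    IDs are assigned in sorted order (an assumption for this sketch).
    """
    triples = list(triples)
    subjects = sorted({s for s, _p, _o in triples})
    objects = sorted({o for _s, _p, o in triples})
    subject_id = {s: i for i, s in enumerate(subjects)}
    object_id = {o: i for i, o in enumerate(objects)}

    cells = defaultdict(set)
    for s, p, o in triples:
        cells[p].add((subject_id[s], object_id[o]))
    return subject_id, object_id, dict(cells)


# Illustrative data only.
triples = [
    ("a", "knows", "b"),
    ("a", "knows", "c"),
    ("b", "likes", "c"),
]
subject_id, object_id, cells = vertical_partition(triples)
```

A triple pattern with a bound predicate, such as (a, knows, ?o), then only needs to inspect row `subject_id["a"]` of the `knows` matrix.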
An Empirical Study of Real-World SPARQL Queries
Understanding how users tailor their SPARQL queries is crucial when designing
query evaluation engines or fine-tuning RDF stores with performance in mind. In
this paper we analyze 3 million real-world SPARQL queries extracted from logs
of the DBPedia and SWDF public endpoints. We aim at finding which are the most
used language elements both from syntactical and structural perspectives,
paying special attention to triple patterns and joins, since they are indeed
some of the most expensive SPARQL operations at evaluation phase. We have
determined that most of the queries are simple and include few triple patterns
and joins, with Subject-Subject, Subject-Object and Object-Object being the most
common join types. The graph patterns are usually star-shaped and, although
triple pattern chains exist, they are generally short. Comment: 1st International Workshop on Usage Analysis and the Web of Data
(USEWOD2011) in the 20th International World Wide Web Conference (WWW2011),
Hyderabad, India, March 28th, 201
The central parsecs of M87: jet emission and an elusive accretion disc
We present the first simultaneous spectral energy distribution (SED) of M87
core at a scale of 0.4 arcsec () across the electromagnetic
spectrum. Two separate, quiescent, and active states are sampled that are
characterized by a similar featureless SED of power-law form, and that are thus
remarkably different from that of a canonical active galactic nucleus (AGN) or a
radiatively inefficient accretion source. We show that the emission from a jet
gives an excellent representation of the core of M87, covering ten orders
of magnitude in frequency for both the active and the quiescent phases. The
inferred total jet power is, however, one to two orders of magnitude lower than
the jet mechanical power reported in the literature. The maximum luminosity of
a thin accretion disc allowed by the data yields an accretion rate of , assuming 10% efficiency. This power
suffices to explain M87's radiative luminosity in the jet frame; it is, however,
two to three orders of magnitude below that required to account for the jet's
kinetic power. The simplest explanation is variability, which requires the core
power of M87 to have been two to three orders of magnitude higher in the last
200 yr. Alternatively, an extra source of power may derive from black hole
spin. Based on the strict upper limit on the accretion rate, such spin power
extraction requires an efficiency an order of magnitude higher than predicted
from magnetohydrodynamic simulations, currently in the few hundred per cent
range. Comment: 18 pages, 6 figures. Accepted for publication in MNRA
HDTourist: exploring urban data on Android
The Web of Data currently comprises ≈ 62 billion triples from more than 2,000 different datasets covering many fields of knowledge. This volume of structured Linked Data can be seen as a particular case of Big Data, referred to as Big Semantic Data [4]. Obviously, powerful computational configurations are traditionally required to deal with the scalability problems arising from Big Semantic Data. It is not surprising that this "data revolution" has run in parallel with the growth of mobile computing. Smartphones and tablets are massively used at the expense of traditional computers but, to date, mobile devices have more limited computation resources. Therefore, one question that we may ask ourselves would be: can (potentially large) semantic datasets be consumed natively on mobile devices? Currently, only a few mobile apps (e.g., [1, 9, 2, 8]) make use of semantic data that they store in the mobile devices, while many others access existing SPARQL endpoints or Linked Data directly. Two main reasons can be considered for this fact. On the one hand, in spite of some initial approaches [6, 3], there are no well-established triplestores for mobile devices. This is an important limitation because any potential app must assume both RDF storage and SPARQL resolution. On the other hand, the particular features of these devices (little storage space, less computational power or more limited bandwidths) limit the adoption of semantic data for different uses and purposes. This paper introduces our HDTourist mobile application prototype. It consumes urban data from DBpedia to help tourists visiting a foreign city. Although it is a simple app, its functionality allows illustrating how semantic data can be stored and queried with limited resources. Our prototype is implemented for Android, but its foundations, explained in Section 2, can be deployed in any other platform.
The app is described in Section 3, and Section 4 summarizes our current achievements and outlines future work.
MapReduce-based Solutions for Scalable SPARQL Querying
The use of RDF to expose semantic data on the Web has seen a dramatic increase over the last few years. Nowadays, RDF datasets are so big and interconnected that classical mono-node solutions present significant scalability problems when trying to manage big semantic data. MapReduce, a standard framework for distributed processing of great quantities of data, is earning a place among the distributed solutions facing RDF scalability issues. In this article, we survey the most important works addressing RDF management and querying through diverse MapReduce approaches, with a focus on their main strategies, optimizations and results.
Response of ice cover on shallow lakes of the North Slope of Alaska to contemporary climate conditions (1950–2011): radar remote-sensing and numerical modeling data analysis
Air temperature and winter precipitation changes over the last five decades
have impacted the timing, duration, and thickness of the ice cover on Arctic
lakes as shown by recent studies. In the case of shallow tundra lakes, many
of which are less than 3 m deep, warmer climate conditions could result in
thinner ice covers and consequently, in a smaller fraction of lakes freezing
to their bed in winter. However, these changes have not yet been
comprehensively documented. The analysis of a 20 yr time series of European
remote sensing satellite ERS-1/2 synthetic aperture radar (SAR) data and a
numerical lake ice model were employed to determine the response of ice cover
(thickness, freezing to the bed, and phenology) on shallow lakes of the North
Slope of Alaska (NSA) to climate conditions over the last six decades. Given
the large area covered by these lakes, changes in the regional climate and
weather are related to regime shifts in the ice cover of the lakes. Analysis
of available SAR data from 1991 to 2011, from a sub-region of the NSA near
Barrow, shows a reduction in the fraction of lakes that freeze to the bed in
late winter. This finding is in good agreement with the decrease in ice
thickness simulated with the Canadian Lake Ice Model (CLIMo), a lower
fraction of lakes frozen to the bed corresponding to a thinner ice cover.
Observed changes of the ice cover show a trend toward increasing floating ice
fractions from 1991 to 2011, with the greatest change occurring in April,
when the grounded ice fraction declined by 22% (α = 0.01). Model
results indicate a trend toward thinner ice covers by 18–22 cm (no-snow and
53% snow depth scenarios, α = 0.01) during the 1991–2011 period
and by 21–38 cm (α = 0.001) from 1950 to 2011. The longer trend
analysis (1950–2011) also shows a decrease in the ice cover duration by
~24 days consequent to later freeze-up dates by 5.9 days (α = 0.1) and
earlier break-up dates by 17.7–18.6 days (α = 0.001).
On the monitoring of surface displacement in connection with volcano reactivation in Tenerife, Canary Islands, using space techniques
Geodetic volcano monitoring in Tenerife has mainly focused on the Las Cañadas Caldera, where a geodetic micronetwork and a levelling profile are located. A sensitivity test of this geodetic network showed that it should be extended to cover the whole island for volcano monitoring purposes. Furthermore, InSAR allowed detecting two unexpected movements that were beyond the scope of the traditional geodetic network. These two facts prompted us to design and observe a GPS network covering the whole of Tenerife that was monitored in August 2000. The results obtained were accurate to one centimetre, and confirm one of the deformations, although they were not definitive enough to confirm the second one. Furthermore, new cases of possible subsidence have been detected in areas where InSAR could not be used to measure deformation due to low coherence. A first modelling attempt has been made using a very simple model and its results seem to indicate that the deformation observed and the groundwater level variation in the island may be related. Future observations will be necessary for further validation and to study the time evolution of the displacements, carry out interpretation work using different types of data (gravity, gases, etc.) and develop models that represent the island more closely. The results obtained are important because they might affect the geodetic volcano monitoring on the island, which will only be really useful if it is capable of distinguishing between displacements that might be linked to volcanic activity and those produced by other causes. One important result in this work is that a new geodetic monitoring system based on two complementary techniques, InSAR and GPS, has been set up on Tenerife island. This is the first time that the whole surface of any of the volcanic Canary Islands has been covered with a single network for this purpose.
This research has shown the need for further similar studies in the Canary Islands, at least on the islands which pose a greater risk of volcanic reactivation, such as Lanzarote and La Palma, where InSAR techniques have been used already.