
    RDF-TR: Exploiting structural redundancies to boost RDF compression

    The number and volume of semantic data have grown impressively over the last decade, promoting compression as an essential tool for RDF preservation, sharing and management. In contrast to universal compressors, RDF compression techniques are able to detect and exploit specific forms of redundancy in RDF data. Thus, state-of-the-art RDF compressors excel at exploiting syntactic and semantic redundancies, i.e., repetitions in the serialization format and information that can be inferred implicitly. However, little attention has been paid to the existence of structural patterns within the RDF dataset, i.e., structural redundancy. In this paper, we analyze structural regularities in real-world datasets, and show three schema-based sources of redundancy that underpin the schema-relaxed nature of RDF. Then, we propose RDF-Tr (RDF Triples Reorganizer), a preprocessing technique that discovers and removes this kind of redundancy before the RDF dataset is effectively compressed. In particular, RDF-Tr groups subjects that are described by the same predicates, and locally re-codes the objects related to these predicates. Finally, we integrate RDF-Tr with two RDF compressors, HDT and k2-triples. Our experiments show that using RDF-Tr with these compressors improves their original effectiveness by up to 2.3 times, outperforming the most prominent state-of-the-art techniques.
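    The grouping-and-recoding idea can be sketched in a few lines. This is a minimal, invented illustration of the principle (subjects grouped by their predicate signature, then per-predicate local object IDs), not the authors' actual RDF-Tr implementation; all names and data are toy examples:

```python
from collections import defaultdict

# Toy triples (subject, predicate, object); data invented for illustration.
triples = [
    ("s1", "type", "Person"), ("s1", "name", "Ana"),
    ("s2", "type", "Person"), ("s2", "name", "Bob"),
    ("s3", "type", "City"),  ("s3", "label", "Leon"),
]

# 1. Collect the predicates (and their objects) describing each subject.
by_subject = defaultdict(dict)
for s, p, o in triples:
    by_subject[s].setdefault(p, []).append(o)

# 2. Group subjects by their exact predicate signature ("family").
families = defaultdict(list)
for s, preds in by_subject.items():
    families[tuple(sorted(preds))].append(s)

# 3. Within each family, re-code the objects of each predicate with small
#    local IDs, which a downstream compressor can encode very cheaply.
encoded = {}
for sig, subjects in families.items():
    local = {p: {} for p in sig}          # per-predicate object dictionary
    rows = []
    for s in subjects:
        row = [local[p].setdefault(o, len(local[p]))
               for p in sig for o in by_subject[s][p]]
        rows.append((s, row))
    encoded[sig] = rows
```

Subjects sharing a signature no longer need their predicates stored per triple, and the locally re-coded object IDs are small integers, which is the structural redundancy the paper targets.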

    Compressed k2-Triples for Full-In-Memory RDF Engines

    The current "data deluge" has flooded the Web of Data with very large RDF datasets. They are hosted and queried through SPARQL endpoints, which act as nodes of a semantic net built on the principles of the Linked Data project. Although this is a realistic philosophy for global data publishing, its query performance is diminished when the RDF engines behind the endpoints manage these huge datasets. Their indexes cannot be fully loaded in main memory, hence these systems need to perform slow disk accesses to solve SPARQL queries. This paper addresses this problem with a compact indexed RDF structure (called k2-triples) that applies compact k2-tree structures to the well-known vertical-partitioning technique. It obtains an ultra-compressed representation of large RDF graphs and allows SPARQL queries to be performed fully in-memory without decompression. We show that k2-triples clearly outperforms the state of the art in compressibility and traditional vertical partitioning in query resolution, remaining very competitive with multi-index solutions.
    Comment: In Proc. of AMCIS'201
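    The underlying k2-tree idea can be sketched didactically: recursively split the adjacency matrix of (subject, object) pairs for one predicate into k² = 4 quadrants, emit one bit per quadrant, and recurse only into non-empty ones. This is a toy sketch of the data structure, assuming a tiny power-of-two matrix; real k2-triples implementations use succinct bit sequences with O(1) rank support rather than Python lists:

```python
def k2_build(matrix, n):
    """Build the level-order bit lists of a k2-tree (k = 2) for an
    n x n boolean matrix, n a power of two. Empty quadrants are pruned."""
    levels, frontier = [], [(0, 0, n)]
    while frontier and frontier[0][2] > 1:
        bits, nxt = [], []
        for r, c, size in frontier:
            half = size // 2
            for dr in (0, half):
                for dc in (0, half):
                    one = any(matrix[r + dr + i][c + dc + j]
                              for i in range(half) for j in range(half))
                    bits.append(1 if one else 0)
                    if one and half > 1:
                        nxt.append((r + dr, c + dc, half))
        levels.append(bits)
        frontier = nxt
    return levels

def k2_get(levels, n, r, c):
    """Check cell (r, c) by walking the tree; a rank over the current
    level locates each node's children (a plain sum here, for clarity)."""
    idx, size = 0, n
    for lvl, bits in enumerate(levels):
        half = size // 2
        quad = (0 if r % size < half else 2) + (0 if c % size < half else 1)
        pos = idx + quad
        if not bits[pos]:
            return False
        if lvl == len(levels) - 1:
            return True
        idx = sum(bits[:pos]) * 4     # rank1(pos) * k^2
        size = half

# Adjacency matrix of a tiny graph: edges (0,1) and (3,2).
m = [[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0]]
tree = k2_build(m, 4)
```

Because empty quadrants produce a single 0 bit and no children, sparse RDF adjacency matrices compress drastically, and cell checks (i.e., triple-pattern lookups) never decompress the structure.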

    An Empirical Study of Real-World SPARQL Queries

    Understanding how users tailor their SPARQL queries is crucial when designing query evaluation engines or fine-tuning RDF stores with performance in mind. In this paper we analyze 3 million real-world SPARQL queries extracted from logs of the DBPedia and SWDF public endpoints. We aim to identify the most used language elements, from both syntactic and structural perspectives, paying special attention to triple patterns and joins, since they are among the most expensive SPARQL operations at evaluation time. We have determined that most of the queries are simple and include few triple patterns and joins, with Subject-Subject, Subject-Object and Object-Object being the most common join types. The graph patterns are usually star-shaped and, although triple pattern chains exist, they are generally short.
    Comment: 1st International Workshop on Usage Analysis and the Web of Data (USEWOD2011) in the 20th International World Wide Web Conference (WWW2011), Hyderabad, India, March 28th, 201
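    The join classification used in such studies is mechanical: two triple patterns join when they share a variable, and the join type is named after the positions the variable occupies. A minimal sketch of that classifier (invented helper, not the paper's analysis code):

```python
def join_types(tp1, tp2):
    """Label the joins between two triple patterns (s, p, o); terms
    starting with '?' are variables. Position labels: S(0), P(1), O(2)."""
    pos = {0: "S", 1: "P", 2: "O"}
    return [f"{pos[i]}-{pos[j]}"
            for i, a in enumerate(tp1) if a.startswith("?")
            for j, b in enumerate(tp2) if a == b]
```

A star-shaped pattern is then simply a set of patterns whose pairwise joins are all S-S on the same variable.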

    The central parsecs of M87: jet emission and an elusive accretion disc

    We present the first simultaneous spectral energy distribution (SED) of the M87 core at a scale of 0.4 arcsec (∼32 pc) across the electromagnetic spectrum. Two separate, quiescent, and active states are sampled; both are characterized by a similar featureless SED of power-law form, and are thus remarkably different from that of a canonical active galactic nucleus (AGN) or a radiatively inefficient accretion source. We show that the emission from a jet gives an excellent representation of the core of M87, covering ten orders of magnitude in frequency for both the active and the quiescent phases. The inferred total jet power is, however, one to two orders of magnitude lower than the jet mechanical power reported in the literature. The maximum luminosity of a thin accretion disc allowed by the data yields an accretion rate of < 6 × 10⁻⁵ M⊙ yr⁻¹, assuming 10% efficiency. This power suffices to explain M87's radiative luminosity in the jet frame; it is, however, two to three orders of magnitude below that required to account for the jet's kinetic power. The simplest explanation is variability, which requires the core power of M87 to have been two to three orders of magnitude higher in the last 200 yr. Alternatively, an extra source of power may derive from black hole spin. Given the strict upper limit on the accretion rate, such spin-power extraction requires an efficiency an order of magnitude higher than predicted by magnetohydrodynamic simulations, i.e., currently in the few hundred per cent range.
    Comment: 18 pages, 6 figures. Accepted for publication in MNRA
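    The quoted accretion-rate limit follows from the standard thin-disc radiative-efficiency relation; as a hedged sketch (η is the radiative efficiency, taken as 0.1 in the abstract, and the disc-luminosity bound comes from the SED fit):

```latex
L_{\mathrm{disc}} = \eta\,\dot{M}c^{2}
\quad\Longrightarrow\quad
\dot{M} < \frac{L_{\mathrm{disc,\,max}}}{\eta\,c^{2}}
\approx 6 \times 10^{-5}\,\mathrm{M_{\odot}\,yr^{-1}}
\qquad (\eta = 0.1).
```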

    HDTourist: exploring urban data on Android

    The Web of Data currently comprises ≈ 62 billion triples from more than 2,000 different datasets covering many fields of knowledge. This volume of structured Linked Data can be seen as a particular case of Big Data, referred to as Big Semantic Data [4]. Obviously, powerful computational configurations are traditionally required to deal with the scalability problems arising from Big Semantic Data. It is not surprising that this "data revolution" has competed in parallel with the growth of mobile computing. Smartphones and tablets are massively used at the expense of traditional computers but, to date, mobile devices have more limited computation resources. Therefore, one question we may ask ourselves is: can (potentially large) semantic datasets be consumed natively on mobile devices? Currently, only a few mobile apps (e.g., [1, 9, 2, 8]) make use of semantic data that they store in the mobile devices, while many others access existing SPARQL endpoints or Linked Data directly. Two main reasons can be considered for this fact. On the one hand, in spite of some initial approaches [6, 3], there are no well-established triplestores for mobile devices. This is an important limitation because any potential app must handle both RDF storage and SPARQL resolution. On the other hand, the particular features of these devices (little storage space, less computational power, more limited bandwidth) limit the adoption of semantic data for different uses and purposes. This paper introduces our HDTourist mobile application prototype. It consumes urban data from DBpedia to help tourists visiting a foreign city. Although it is a simple app, its functionality illustrates how semantic data can be stored and queried with limited resources. Our prototype is implemented for Android, but its foundations, explained in Section 2, can be deployed on any other platform. The app is described in Section 3, and Section 4 concludes with our current achievements and outlines future work.

    MapReduce-based Solutions for Scalable SPARQL Querying

    The use of RDF to expose semantic data on the Web has seen a dramatic increase over the last few years. Nowadays, RDF datasets are so big and interconnected that classical mono-node solutions present significant scalability problems when trying to manage big semantic data. MapReduce, a standard framework for distributed processing of large quantities of data, is earning a place among the distributed solutions facing RDF scalability issues. In this article, we survey the most important works addressing RDF management and querying through diverse MapReduce approaches, with a focus on their main strategies, optimizations and results.
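    The core operation these systems distribute is the join of triple patterns on a shared variable, expressed as map/shuffle/reduce stages. A toy single-process sketch of that pattern, under invented data and with '?x' hard-coded as the join variable (no surveyed system's API is being reproduced here):

```python
from collections import defaultdict

def map_phase(triples, pattern, tag):
    """Emit (join_key, (tag, bindings)) for triples matching a pattern;
    '?'-prefixed terms are variables, and '?x' is the join variable."""
    for triple in triples:
        bindings, ok = {}, True
        for term, val in zip(pattern, triple):
            if term.startswith("?"):
                bindings[term] = val
            elif term != val:
                ok = False
                break
        if ok:
            yield bindings["?x"], (tag, bindings)

def reduce_phase(grouped):
    """Per key, combine every left-side binding with every right-side one."""
    out = []
    for _, items in grouped.items():
        left = [b for t, b in items if t == "L"]
        right = [b for t, b in items if t == "R"]
        out.extend({**bl, **br} for bl in left for br in right)
    return out

# Invented toy data: who knows whom, and ages.
triples = [("ana", "knows", "bob"), ("bob", "age", "30"), ("ana", "age", "25")]

grouped = defaultdict(list)                      # the "shuffle" stage
for k, v in map_phase(triples, ("?x", "knows", "?y"), "L"):
    grouped[k].append(v)
for k, v in map_phase(triples, ("?x", "age", "?a"), "R"):
    grouped[k].append(v)
results = reduce_phase(grouped)
```

In a real cluster the shuffle is performed by the framework, and much of the surveyed work concerns minimizing the number of such MapReduce rounds per SPARQL query.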

    Response of ice cover on shallow lakes of the North Slope of Alaska to contemporary climate conditions (1950–2011): radar remote-sensing and numerical modeling data analysis

    Air temperature and winter precipitation changes over the last five decades have impacted the timing, duration, and thickness of the ice cover on Arctic lakes as shown by recent studies. In the case of shallow tundra lakes, many of which are less than 3 m deep, warmer climate conditions could result in thinner ice covers and consequently, in a smaller fraction of lakes freezing to their bed in winter. However, these changes have not yet been comprehensively documented. The analysis of a 20 yr time series of European remote sensing satellite ERS-1/2 synthetic aperture radar (SAR) data and a numerical lake ice model were employed to determine the response of ice cover (thickness, freezing to the bed, and phenology) on shallow lakes of the North Slope of Alaska (NSA) to climate conditions over the last six decades. Given the large area covered by these lakes, changes in the regional climate and weather are related to regime shifts in the ice cover of the lakes. Analysis of available SAR data from 1991 to 2011, from a sub-region of the NSA near Barrow, shows a reduction in the fraction of lakes that freeze to the bed in late winter. This finding is in good agreement with the decrease in ice thickness simulated with the Canadian Lake Ice Model (CLIMo), a lower fraction of lakes frozen to the bed corresponding to a thinner ice cover. Observed changes of the ice cover show a trend toward increasing floating ice fractions from 1991 to 2011, with the greatest change occurring in April, when the grounded ice fraction declined by 22% (α = 0.01). Model results indicate a trend toward thinner ice covers by 18–22 cm (no-snow and 53% snow depth scenarios, α = 0.01) during the 1991–2011 period and by 21–38 cm (α = 0.001) from 1950 to 2011. The longer trend analysis (1950–2011) also shows a decrease in the ice cover duration by ~24 days, consequent to later freeze-up dates by 5.9 days (α = 0.1) and earlier break-up dates by 17.7–18.6 days (α = 0.001).

    On the monitoring of surface displacement in connection with volcano reactivation in Tenerife, Canary Islands, using space techniques

    Geodetic volcano monitoring in Tenerife has mainly focused on the Las Cañadas Caldera, where a geodetic micronetwork and a levelling profile are located. A sensitivity test of this geodetic network showed that it should be extended to cover the whole island for volcano monitoring purposes. Furthermore, InSAR made it possible to detect two unexpected movements that were beyond the scope of the traditional geodetic network. These two facts prompted us to design and observe a GPS network covering the whole of Tenerife, which was monitored in August 2000. The results obtained were accurate to one centimetre and confirm one of the deformations, although they were not definitive enough to confirm the second one. Furthermore, new cases of possible subsidence have been detected in areas where InSAR could not be used to measure deformation due to low coherence. A first modelling attempt has been made using a very simple model, and its results seem to indicate that the deformation observed and the groundwater level variation in the island may be related. Future observations will be necessary for further validation, to study the time evolution of the displacements, to carry out interpretation work using different types of data (gravity, gases, etc.) and to develop models that represent the island more closely. The results obtained are important because they might affect geodetic volcano monitoring on the island, which will only be really useful if it is capable of distinguishing between displacements that might be linked to volcanic activity and those produced by other causes. One important result of this work is that a new geodetic monitoring system based on two complementary techniques, InSAR and GPS, has been set up on Tenerife. This is the first time that the whole surface of any of the volcanic Canary Islands has been covered with a single network for this purpose. This research has demonstrated the need for further similar studies in the Canary Islands, at least on the islands that pose a greater risk of volcanic reactivation, such as Lanzarote and La Palma, where InSAR techniques have already been used.