Building a P2P RDF Store for Edge Devices
Semantic Web technologies have been used in the Internet of Things (IoT)
to facilitate data interoperability and address data heterogeneity issues. The
Resource Description Framework (RDF) model is employed in the integration of
IoT data, with RDF engines serving as gateways for semantic integration.
However, storing and querying RDF data obtained from distributed sources across
a dynamic network of edge devices presents a challenging task. The distributed
nature of the edge resembles that of Peer-to-Peer (P2P) systems: nodes are
heterogeneous, have limited availability and resources, and primarily
undertake tasks related to data storage and
processing. Therefore, P2P models appear to present an attractive approach
for constructing distributed RDF stores. Based on P-Grid, a data indexing
mechanism for load balancing and range query processing in P2P systems, this
paper proposes a design for storing and sharing RDF data on P2P networks of
low-cost edge devices. Our design aims to integrate P-Grid with an
edge-based RDF storage solution, RDF4Led, to build a P2P RDF engine. This
integration can maintain RDF data access and query processing while scaling
with increasing data and network size. We demonstrated the scaling behavior of
our implementation on a P2P network of up to 16 Raspberry Pi 4
devices.
Comment: Accepted to IoT Conference 202
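The prefix-based indexing idea behind a P-Grid-style store can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Peer` class, `build_network`, `insert`, and `lookup` are hypothetical names, the trie is fully balanced, each triple is indexed under all three of its terms, and a plain SHA-1 hash is used, so unlike P-Grid's order-preserving hashing this sketch does not support range queries.

```python
import hashlib


def key_bits(term: str, width: int) -> str:
    """Map a term to a bit string (assumption: plain SHA-1, width <= 8)."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return format(digest[0], "08b")[:width]


class Peer:
    """A peer responsible for all keys that start with its binary path."""

    def __init__(self, path: str):
        self.path = path
        self.refs = {}      # level -> a peer in the complementary subtree
        self.store = set()  # triples held by this peer

    def route(self, bits: str, hops: int = 0):
        """Forward to the referenced peer at the first mismatching bit."""
        for level, bit in enumerate(self.path):
            if bits[level] != bit:
                return self.refs[level].route(bits, hops + 1)
        return self, hops


def build_network(depth: int) -> dict:
    """Create 2**depth peers, each with one reference per trie level."""
    paths = [format(i, f"0{depth}b") for i in range(2 ** depth)]
    peers = {p: Peer(p) for p in paths}
    for p, peer in peers.items():
        for level in range(depth):
            flipped = "1" if p[level] == "0" else "0"
            peer.refs[level] = peers[p[:level] + flipped + p[level + 1:]]
    return peers


def insert(entry: Peer, triple: tuple) -> None:
    """Index the triple under subject, predicate, and object (assumption)."""
    for term in triple:
        owner, _ = entry.route(key_bits(term, len(entry.path)))
        owner.store.add(triple)


def lookup(entry: Peer, term: str):
    """Return the triples containing the term and the number of hops taken."""
    owner, hops = entry.route(key_bits(term, len(entry.path)))
    return sorted(t for t in owner.store if term in t), hops


peers = build_network(2)
triple = ("ex:sensor1", "ex:hasValue", "21.5")
insert(peers["00"], triple)
found, hops = lookup(peers["11"], "ex:sensor1")
```

Because every forwarding step fixes at least one more bit of the key prefix, a lookup in this sketch needs at most as many hops as the trie is deep, whichever peer it starts from.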
Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure
Big data research has attracted great attention in science, technology,
industry and society. It is developing with the evolving scientific paradigm,
the fourth industrial revolution, and the transformational innovation of
technologies. However, its nature and fundamental challenge have not been
recognized, and its own methodology has not been formed. This paper explores
and answers the following questions: What is big data? What are the basic
methods for representing, managing and analyzing big data? What is the
relationship between big data and knowledge? Can we find a mapping from big
data into knowledge space? What kind of infrastructure is required to support
not only big data management and analysis but also knowledge discovery, sharing
and management? What is the relationship between big data and the science paradigm?
What is the nature and fundamental challenge of big data computing? A
multi-dimensional perspective is presented toward a methodology of big data
computing.
Comment: 59 pages
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies into an optimized
solution to a specific real-world problem, big data systems are no exception.
As far as storage is concerned, the primary facet is the storage
infrastructure, and NoSQL appears to be the technology that fulfills its
requirements. However,
every big data application has different data characteristics, so its
data fits a different data model. This paper presents a
feature and use-case analysis and comparison of the four main data models,
namely document-oriented, key-value, graph, and wide-column. Moreover, a feature
analysis of 80 NoSQL solutions is provided, elaborating on the criteria
and points that a developer must consider when choosing among them.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage, and their challenges and future prospects are
also discussed.
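To make the four data models named in this abstract concrete, the sketch below lays out one hypothetical sensor reading in each of them using plain Python structures. The record, the field names, and the accessor functions are assumptions for illustration only; real NoSQL systems add indexing, persistence, and query languages on top of these shapes.

```python
import json

# Hypothetical record used for every model below.
reading = {"id": "r1", "sensor": "s1", "temp_c": 21.5}

# Document-oriented: one self-contained, possibly nested document per record.
document_store = {"r1": {"sensor": {"id": "s1"}, "temp_c": 21.5}}

# Key-value: an opaque value addressed only by its key.
kv_store = {"reading:r1": json.dumps(reading)}

# Graph: nodes with properties plus labelled edges between them.
graph_nodes = {"s1": {"kind": "sensor"}, "r1": {"kind": "reading", "temp_c": 21.5}}
graph_edges = [("s1", "PRODUCED", "r1")]

# Wide-column: row key -> column family -> column -> value.
wide_column = {"r1": {"meta": {"sensor": "s1"}, "data": {"temp_c": 21.5}}}


def temp_from_document(rid: str) -> float:
    return document_store[rid]["temp_c"]


def temp_from_kv(rid: str) -> float:
    # The store cannot look inside the value; the client must decode it.
    return json.loads(kv_store[f"reading:{rid}"])["temp_c"]


def temps_from_graph(sensor: str) -> list:
    # Follow PRODUCED edges from the sensor node to its readings.
    return [graph_nodes[dst]["temp_c"]
            for src, label, dst in graph_edges
            if src == sensor and label == "PRODUCED"]


def temp_from_wide_column(rid: str) -> float:
    return wide_column[rid]["data"]["temp_c"]
```

The same value is reachable in every layout; what differs is which access path is cheap: lookup by key, traversal along relationships, or reading a single column family.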
The NASA Astrophysics Data System: Architecture
The powerful discovery capabilities available in the ADS bibliographic
services are possible thanks to the design of a flexible search and retrieval
system based on a relational database model. Bibliographic records are stored
as a corpus of structured documents containing fielded data and metadata, while
discipline-specific knowledge is segregated in a set of files independent of
the bibliographic data itself.
The creation and management of links to both internal and external resources
associated with each bibliography in the database is made possible by
representing them as a set of document properties and their attributes.
To improve global access to the ADS data holdings, a number of mirror sites
have been created by cloning the database contents and software on a variety of
hardware and software platforms.
The procedures used to create and manage the database and its mirrors have
been written as a set of scripts that can be run in either an interactive or
unsupervised fashion.
The ADS can be accessed at http://adswww.harvard.edu
Comment: 25 pages, 8 figures, 3 tables
EAGLE—A Scalable Query Processing Engine for Linked Sensor Data
Recently, many approaches have been proposed to manage sensor data using semantic web technologies for effective heterogeneous data integration. However, our empirical observations revealed that these solutions primarily focused on semantic relationships and unfortunately paid less attention to spatio–temporal correlations. Most semantic approaches do not have spatio–temporal support. Some of them have attempted to provide full spatio–temporal support, but have poor performance for complex spatio–temporal aggregate queries. In addition, while the volume of sensor data is rapidly growing, the challenge of querying and managing the massive volumes of data generated by sensing devices still remains unsolved. In this article, we introduce EAGLE, a spatio–temporal query engine for querying sensor data based on the linked data model. The ultimate goal of EAGLE is to provide an elastic and scalable system which allows fast searching and analysis with respect to the relationships of space, time and semantics in sensor data. We also extend SPARQL with a set of new query operators in order to support spatio–temporal computing in the linked sensor data context.
EC/H2020/732679/EU/ACTivating InnoVative IoT smart living environments for AGEing well/ACTIVAGE
EC/H2020/661180/EU/A Scalable and Elastic Platform for Near-Realtime Analytics for The Graph of Everything/SMARTE
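As an illustration of what a spatio–temporal aggregate query over sensor observations involves, the sketch below filters observations by a bounding box and a time window and averages the matching values. It is a plain in-memory Python example under stated assumptions: the `Observation` type, the `average_in_region` function, and the sample data are hypothetical, and it does not reproduce EAGLE's SPARQL operators or its storage design.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    sensor: str
    lat: float
    lon: float
    time: datetime
    value: float


def average_in_region(observations, bbox, start, end):
    """Average the values observed inside bbox during [start, end).

    bbox is (min_lat, min_lon, max_lat, max_lon). Returns None if no
    observation matches.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    values = [
        o.value
        for o in observations
        if min_lat <= o.lat <= max_lat
        and min_lon <= o.lon <= max_lon
        and start <= o.time < end
    ]
    return sum(values) / len(values) if values else None


# Hypothetical sample data: two sensors inside the box, one outside,
# and one reading outside the time window.
observations = [
    Observation("s1", 53.27, -9.05, datetime(2024, 1, 1, 10), 20.0),
    Observation("s2", 53.28, -9.06, datetime(2024, 1, 1, 11), 22.0),
    Observation("s3", 48.85, 2.35, datetime(2024, 1, 1, 10), 30.0),
    Observation("s1", 53.27, -9.05, datetime(2024, 1, 2, 10), 40.0),
]
bbox = (53.0, -9.5, 53.5, -8.5)
day_average = average_in_region(
    observations, bbox, datetime(2024, 1, 1), datetime(2024, 1, 2)
)
```

A linear scan like this is only workable for small data; the abstract's point is that doing the same at scale requires an engine designed around spatial and temporal access paths.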
Clumping towards a UK National catalogue?
This article presents a clumps-oriented perspective on the idea of a UK national catalogue for HE, arguing that a distributed approach based on Z39.50 has a number of attractive features when compared with the alternative physical union catalogue model, but also noting that the many difficulties currently associated with the distributed approach must be resolved before it can itself be regarded as a practical proposition. It is suggested that the distributed model is sufficiently attractive compared to the physical union model to make the expenditure of additional time, effort and resource worthwhile. 'Dynamic clumping' based on collection level description and other appropriate metadata is seen as the key to user navigation in a distributed national catalogue. Large physical union catalogues like COPAC are assumed to have a role, although updating difficulties and the lack of circulation information may limit their scope.