Data Protection: Combining Fragmentation, Encryption, and Dispersion, a final report
Hardening data protection using multiple methods rather than 'just'
encryption is of paramount importance when considering continuous and powerful
attacks in order to observe, steal, alter, or even destroy private and
confidential information. Our purpose is to look at cost-effective data
protection by way of combining fragmentation, encryption, and dispersion over
several physical machines. This involves deriving general schemes to protect
data everywhere throughout a network of machines where they are being
processed, transmitted, and stored during their entire life cycle. This is
being enabled by a number of parallel and distributed architectures using
various sets of cores or machines ranging from General Purpose GPUs to multiple
clouds. In this report, we first present a general and conceptual description
of what should be a fragmentation, encryption, and dispersion system (FEDS)
including a number of high level requirements such systems ought to meet. Then,
we focus on two kinds of fragmentation. First, a selective separation of
information into two fragments: a public one and a private one. We describe a
family of processes and address not only the question of performance but also
the questions of memory occupation, integrity or quality of the restitution of
the information, and of course we conclude with an analysis of the level of
security provided by our algorithms. Then, we analyze work first on general
dispersion systems that operate bitwise without considering data structure, and
second on fragmentation of information where the data is defined along an
object-oriented data structure or along a record structure to be stored in a
relational database.
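As a rough illustration of bitwise dispersion that ignores data structure, the sketch below uses plain XOR splitting. This is a swapped-in textbook technique, not the scheme described in the report: every share is required for recovery, and any incomplete subset looks like random bytes. The function names `disperse` and `reassemble` are hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data: bytes, n: int) -> list[bytes]:
    """Split data into n shares (hypothetical helper, XOR splitting).

    n - 1 shares are random pads; the last share is the data XORed
    with every pad. Each share would be stored on a different machine.
    """
    if n < 2:
        raise ValueError("need at least two shares")
    pads = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    last = reduce(xor_bytes, pads, data)
    return pads + [last]


def reassemble(shares: list[bytes]) -> bytes:
    """Recover the data by XORing all shares together."""
    return reduce(xor_bytes, shares)


shares = disperse(b"confidential record", 3)
recovered = reassemble(shares)
```

Requiring all shares trades availability for confidentiality; a threshold scheme would tolerate lost machines at the cost of more machinery.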
Data fragmentation for parallel transitive closure strategies
Addresses the problem of fragmenting a relation to make the parallel computation of the transitive closure efficient, based on the disconnection set approach. To better understand this design problem, the authors focus on transportation networks. These are characterized by loosely interconnected clusters of nodes with a high internal connectivity rate. Three requirements that have to be fulfilled by a fragmentation are formulated, and three different fragmentation strategies are presented, each emphasizing one of these requirements. Some test results are presented to show the performance of the various fragmentation strategies.
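The general idea of computing a transitive closure over a fragmented relation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the closure of each fragment is computed independently (the step that could run in parallel), and the results are then recombined with one more plain closure. The disconnection set approach itself would restrict the recombination to the border nodes; `border_nodes` only shows which nodes those are. All function names are hypothetical.

```python
def transitive_closure(edges):
    """Naive fixpoint computation of the transitive closure of a set of (a, b) pairs."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        new -= closure
        if not new:
            return closure
        closure |= new


def border_nodes(frag_a, frag_b):
    """Nodes that appear in both fragments (the disconnection set between them)."""
    nodes_a = {n for edge in frag_a for n in edge}
    nodes_b = {n for edge in frag_b for n in edge}
    return nodes_a & nodes_b


def fragmented_closure(fragments):
    """Close each fragment on its own, then close the union of the local results."""
    local = [transitive_closure(f) for f in fragments]  # independent per fragment
    return transitive_closure(set().union(*local))


# Two loosely connected clusters that share only node 3.
cluster_a = {(1, 2), (2, 3)}
cluster_b = {(3, 4), (4, 5)}
reachable = fragmented_closure([cluster_a, cluster_b])
```

Here `border_nodes(cluster_a, cluster_b)` is `{3}`, and `reachable` contains the cross-fragment pair `(1, 5)`.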
Review on Fragment Allocation by using Clustering Technique in Distributed Database System
Considerable progress has been made in the last few years in improving the
performance of distributed database systems. The development of fragment
allocation models in distributed databases is becoming difficult due to the
complexity of the huge number of sites and their communication considerations.
Under such conditions, simulation of clustering and data allocation is an adequate
tool for understanding and evaluating the performance of data allocation in
distributed databases. Clustering sites and fragment allocation are key
challenges in distributed database performance, and are considered to be
efficient methods that have a major role in reducing transferred and accessed
data during the execution of applications. This paper gives a review of fragment
allocation using clustering techniques in distributed database
systems. Comment: 9 pages, 3 figures
Partout: A Distributed Engine for Efficient RDF Processing
The increasing interest in Semantic Web technologies has led not only to a
rapid growth of semantic data on the Web but also to an increasing number of
backend applications with already more than a trillion triples in some cases.
Confronted with such huge amounts of data and the future growth, existing
state-of-the-art systems for storing RDF and processing SPARQL queries are no
longer sufficient. In this paper, we introduce Partout, a distributed engine
for efficient RDF processing in a cluster of machines. We propose an effective
approach for fragmenting RDF data sets based on a query log, allocating the
fragments to nodes in a cluster, and finding the optimal configuration. Partout
can efficiently handle updates and its query optimizer produces efficient query
execution plans for ad-hoc SPARQL queries. Our experiments show the superiority
of our approach to state-of-the-art approaches for partitioning and distributed
SPARQL query processing.
Privacy-preserving Health Data Sharing for Medical Cyber-Physical Systems
The recent spate of cyber security attacks has compromised end users' data
safety and privacy in Medical Cyber-Physical Systems (MCPS). Traditional
standard encryption algorithms for data protection are designed from the
viewpoint of system architecture rather than the viewpoint of end users. Because
such encryption algorithms transfer the protection of the data to the
protection of the keys, data safety and privacy are compromised once the
key is exposed. In this paper, we propose a secure data storage and sharing
method consisting of a selective encryption algorithm combined with
fragmentation and dispersion to protect data safety and privacy even when
both transmission media (e.g. cloud servers) and keys are compromised. This
method is based on a user-centric design that protects the data on a trusted
device such as the end user's smartphone and lets the end user control
access for data sharing. We also evaluate the performance of the algorithm on a
smartphone platform to demonstrate its efficiency.
Fragment Allocation Configuration in Distributed Database Systems
In distributed database (DDB) management systems, fragment allocation is one
of the most important components that can directly affect the performance of
a DDB. In this research work, we show that declarative programming
languages, e.g. logic programming languages, can be used to represent different
data fragment allocation techniques. Results indicate that using a declarative
programming language significantly simplifies the representation of fragment
allocation algorithms, and thus opens the door to further developments and
optimizations. The case study under consideration also shows that our approach
can be extended to different areas of distributed systems.
The design and implementation of an infrastructure for multimedia digital libraries
We develop an infrastructure for managing, indexing and serving multimedia content in digital libraries. This infrastructure follows the model of the Web, and thereby is distributed in nature. We discuss the design of the Librarian, the component that manages meta data about the content. The management of meta data has been separated from the media servers that manage the content itself. Also, the extraction of the meta data is largely independent of the Librarian. We introduce our extensible data model and the daemon paradigm that are the core pieces of this architecture. We evaluate our initial implementation using a relational database. We conclude with a discussion of the lessons we learned in building this system, and proposals for improving the flexibility, reliability, and performance of the system.
Scalable Reliable SD Erlang Design
This technical report presents the design of Scalable Distributed (SD) Erlang: a set of language-level changes that aims to enable Distributed Erlang to scale for server applications on commodity hardware with at most 100,000 cores. We cover a number of aspects, specifically anticipated architecture, anticipated failures, scalable data structures, and scalable computation. Two other components that guided us in the design of SD Erlang are design principles and typical Erlang applications. The design principles summarise the types of modifications we aim to make so that Erlang can scale. Erlang exemplars help us to identify the main Erlang scalability issues and hypothetically validate the SD Erlang design.
LSM-based Storage Techniques: A Survey
Recently, the Log-Structured Merge-tree (LSM-tree) has been widely adopted
for use in the storage layer of modern NoSQL systems. Because of this, there
have been a large number of research efforts, from both the database community
and the operating systems community, that try to improve various aspects of
LSM-trees. In this paper, we provide a survey of recent research efforts on
LSM-trees so that readers can learn the state-of-the-art in LSM-based storage
techniques. We provide a general taxonomy to classify the literature of
LSM-trees, survey the efforts in detail, and discuss their strengths and
trade-offs. We further survey several representative LSM-based open-source
NoSQL systems and discuss some potential future research directions resulting
from the survey. Comment: This is a pre-print of an article published in VLDB Journal. The
final authenticated version is available online at:
https://doi.org/10.1007/s00778-019-00555-
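For readers unfamiliar with the structure being surveyed, the following toy sketch shows the basic LSM-tree write and read path: an in-memory table that is flushed to immutable sorted runs, lookups that check the newest data first, and a compaction step that merges runs. It is a simplified teaching model under stated assumptions (no disk I/O, no levels, no deletes), and the class name `TinyLSM` and its parameters are hypothetical, not taken from the survey or any real system.

```python
import bisect


class TinyLSM:
    """Toy in-memory model of an LSM-tree (hypothetical, for illustration only)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}            # mutable buffer for recent writes
        self.runs = []                # immutable sorted runs, newest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Turn the memtable into a sorted run and start a fresh memtable."""
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Check the memtable, then each run from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            # (key,) sorts just before (key, value), so this finds the entry.
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one, keeping the newest value for each key."""
        merged = {}
        for run in reversed(self.runs):   # oldest first, newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # memtable reaches the limit and is flushed
db.put("a", 3)
db.put("c", 4)   # second flush; "a" now exists in two runs
```

After these writes, `db.get("a")` returns the newer value 3, and `db.compact()` collapses the two runs into one.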
TritanDB: Time-series Rapid Internet of Things Analytics
The efficient management of data is an important prerequisite for realising
the potential of the Internet of Things (IoT). Given the large volume of
structured time-series IoT data, two issues are addressing the difficulties of
data integration between heterogeneous Things and improving ingestion and query
performance across databases on both resource-constrained Things and in the
cloud. In this paper, we examine the structure of public IoT data and discover
that the majority exhibit unique flat, wide and numerical characteristics with
a mix of evenly and unevenly-spaced time-series. We investigate the advances in
time-series databases for telemetry data and combine these findings with
microbenchmarks to determine the best compression techniques and storage data
structures to inform the design of a novel solution optimised for IoT data. A
query translation method with low overhead even on resource-constrained Things
allows us to utilise rich data models like the Resource Description Framework
(RDF) for interoperability and data integration on top of the optimised
storage. Our solution, TritanDB, shows an order of magnitude performance
improvement across both Things and cloud hardware on many state-of-the-art
databases within IoT scenarios. Finally, we describe how TritanDB supports
various analyses of IoT time-series data, such as forecasting.