5,246 research outputs found
Creating a Relational Distributed Object Store
In and of itself, data storage has apparent business utility. But when we can
convert data to information, the utility of stored data increases dramatically.
It is the layering of relation atop the data mass that is the engine for such
conversion. Frank relation amongst discrete objects sporadically ingested is
rare, making the process of synthesizing such relation all the more
challenging, but the challenge must be met if we are ever to see an equivalent
business value for unstructured data as we already have with structured data.
This paper describes a novel construct, referred to as a relational distributed
object store (RDOS), that seeks to solve the twin problems of how to
persistently and reliably store petabytes of unstructured data while
simultaneously creating and persisting relations amongst billions of objects.Comment: 12 pages, 5 figure
PaRiS: Causally Consistent Transactions with Non-blocking Reads and Partial Replication
Geo-replicated data platforms are at the backbone of several large-scale
online services. Transactional Causal Consistency (TCC) is an attractive
consistency level for building such platforms. TCC avoids many anomalies of
eventual consistency, eschews the synchronization costs of strong consistency,
and supports interactive read-write transactions. Partial replication is
another attractive design choice for building geo-replicated platforms, as it
increases the storage capacity and reduces update propagation costs. This paper
presents PaRiS, the first TCC system that supports partial replication and
implements non-blocking parallel read operations, whose latency is paramount
for the performance of read-intensive applications. PaRiS relies on a novel
protocol to track dependencies, called Universal Stable Time (UST). By means of
a lightweight background gossip process, UST identifies a snapshot of the data
that has been installed by every DC in the system. Hence, transactions can
consistently read from such a snapshot on any server in any replication site
without having to block. Moreover, PaRiS requires only one timestamp to track
dependencies and define transactional snapshots, thereby achieving resource
efficiency and scalability. We evaluate PaRiS on a large-scale AWS deployment
composed of up to 10 replication sites. We show that PaRiS scales well with the
number of DCs and partitions, while being able to handle larger data-sets than
existing solutions that assume full replication. We also demonstrate a
performance gain of non-blocking reads vs. a blocking alternative (up to 1.47x
higher throughput with 5.91x lower latency for read-dominated workloads and up
to 1.46x higher throughput with 20.56x lower latency for write-heavy
workloads)
Semantics-based selection of everyday concepts in visual lifelogging
Concept-based indexing, based on identifying various semantic concepts appearing in multimedia, is an attractive option for multimedia retrieval and much research tries to bridge the semantic gap between the mediaâs low-level features and high-level semantics. Research into concept-based multimedia retrieval has generally focused on detecting concepts from high quality media such as broadcast TV or movies, but it is not well addressed in other domains like lifelogging where the original data is captured with poorer quality. We argue that in noisy domains such as lifelogging, the management of data needs to include semantic reasoning in order to deduce a set of concepts to represent lifelog content for applications like searching, browsing or summarisation. Using semantic concepts to manage lifelog data relies on the fusion of automatically-detected concepts to provide a better understanding of the lifelog data. In this paper, we investigate the selection of semantic concepts for lifelogging which includes reasoning on semantic networks using a density-based approach. In a series of experiments we compare different semantic reasoning approaches and the experimental evaluations we report on lifelog data show the efficacy of our approach
Integration challenges of pure operation-based CRDTs in redis
Pure operation-based (op-based) Conflict-free Replicated Data Types (CRDTs) are generic and very efficient as they allow for compact solutions in both sent messages and state size. Although the pure op-based model looks promising, it is still not fully understood in terms of practical implementation. In this paper, we explain the challenges faced in implementing pure op-based CRDTs in a real system: the well-known in-memory cache key-value store Redis. Our purpose of choosing Redis is to implement a multi-master replication feature, which the current system lacks. The experience demonstrates that pure op-based CRDTs can be implemented in existing systems with minor changes in the original API.European Union Seventh Framework Program (FP7/2007-2013) under grant agreement 609551, SyncFree project.
Project âTEC4Growth - Pervasive Intelligence, Enhancers and Proofs of Concept with Industrial Impact/NORTE-01-0145-FEDER-000020âis financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, and through the European Regional Development Fund (ERDF).info:eu-repo/semantics/publishedVersio
Fisheye Consistency: Keeping Data in Synch in a Georeplicated World
Over the last thirty years, numerous consistency conditions for replicated
data have been proposed and implemented. Popular examples of such conditions
include linearizability (or atomicity), sequential consistency, causal
consistency, and eventual consistency. These consistency conditions are usually
defined independently from the computing entities (nodes) that manipulate the
replicated data; i.e., they do not take into account how computing entities
might be linked to one another, or geographically distributed. To address this
lack, as a first contribution, this paper introduces the notion of proximity
graph between computing nodes. If two nodes are connected in this graph, their
operations must satisfy a strong consistency condition, while the operations
invoked by other nodes are allowed to satisfy a weaker condition. The second
contribution is the use of such a graph to provide a generic approach to the
hybridization of data consistency conditions into the same system. We
illustrate this approach on sequential consistency and causal consistency, and
present a model in which all data operations are causally consistent, while
operations by neighboring processes in the proximity graph are sequentially
consistent. The third contribution of the paper is the design and the proof of
a distributed algorithm based on this proximity graph, which combines
sequential consistency and causal consistency (the resulting condition is
called fisheye consistency). In doing so the paper not only extends the domain
of consistency conditions, but provides a generic provably correct solution of
direct relevance to modern georeplicated systems
- âŠ