Hydra -- A Federated Data Repository over NDN
Today's big data science communities manage their data publication and
replication at the application layer. These communities utilize myriad
mechanisms to publish, discover, and retrieve datasets. The result is an
ecosystem of either centralized repositories or collections of ad-hoc data
repositories. Publishing datasets to centralized repositories can be
process-intensive, and those repositories do not accept all datasets. The
ad-hoc repositories are difficult to find and utilize due to differences in
data names, metadata standards, and access methods. To address the problem of
scientific data publication and storage, we have designed Hydra, a secure,
distributed, and decentralized data repository made of a loose federation of
storage servers (nodes) provided by user communities. Hydra runs over Named
Data Networking (NDN) and utilizes the State Vector Sync (SVS) protocol that
lets individual nodes maintain a "global view" of the system. Hydra provides a
scalable and resilient data retrieval service. Data distribution scales via
NDN's built-in data anycast and in-network caching, while resiliency against
individual server failures comes from automated failure detection and
maintenance of a specific degree of replication. Hydra utilizes "Favor", a
locally calculated numerical value, to decide which nodes will
replicate a file. Finally, Hydra utilizes data-centric security for data
publication and node authentication. Hydra uses a Network Operation Center
(NOC) to bootstrap trust in Hydra nodes and data publishers. The NOC
distributes user and node certificates and performs the proof-of-possession
challenges.
This technical report serves as the reference for Hydra. It outlines the
design decisions, the rationale behind them, the functional modules, and the
protocol specifications.
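
To illustrate how a sync protocol in the style of SVS can give each node a "global view", the following is a minimal sketch of merging two state vectors. It assumes a state vector is a mapping from a node name to the latest sequence number that node has published. The function name, the node names, and the in-memory representation are hypothetical and chosen only for illustration; this is not Hydra's implementation or wire format.

```python
from typing import Dict, List, Tuple

# Assumed representation: node name -> latest sequence number seen from it.
StateVector = Dict[str, int]


def merge_state_vectors(
    local: StateVector, remote: StateVector
) -> Tuple[StateVector, List[Tuple[str, int, int]]]:
    """Merge a received state vector into the local one.

    Returns the merged vector and a list of (node, first_missing_seq,
    last_missing_seq) ranges the local node has not yet seen.
    """
    merged = dict(local)
    missing: List[Tuple[str, int, int]] = []
    for node, remote_seq in remote.items():
        # A node absent from the local vector is treated as sequence 0.
        local_seq = merged.get(node, 0)
        if remote_seq > local_seq:
            # The remote side knows of newer publications from this node.
            missing.append((node, local_seq + 1, remote_seq))
            merged[node] = remote_seq
    return merged, missing


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    local = {"/hydra/node-a": 3, "/hydra/node-b": 5}
    remote = {"/hydra/node-a": 4, "/hydra/node-c": 2}
    merged, missing = merge_state_vectors(local, remote)
    print(merged)
    print(missing)
```

Taking the per-node maximum makes the merge order-independent, so nodes that exchange vectors repeatedly converge on the same view regardless of the order in which updates arrive.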