Creating a Relational Distributed Object Store
In and of itself, data storage has apparent business utility. But when we can
convert data to information, the utility of stored data increases dramatically.
It is the layering of relation atop the data mass that is the engine for such
conversion. Frank relation amongst discrete objects sporadically ingested is
rare, making the process of synthesizing such relation all the more
challenging, but the challenge must be met if we are ever to see an equivalent
business value for unstructured data as we already have with structured data.
This paper describes a novel construct, referred to as a relational distributed
object store (RDOS), that seeks to solve the twin problems of how to
persistently and reliably store petabytes of unstructured data while
simultaneously creating and persisting relations amongst billions of objects.
Comment: 12 pages, 5 figures
A File System Abstraction for Sense and Respond Systems
The heterogeneity and resource constraints of sense-and-respond systems pose
significant challenges to system and application development. In this paper, we
present a flexible, intuitive file system abstraction for organizing and
managing sense-and-respond systems based on the Plan 9 design principles. A key
feature of this abstraction is the ability to support multiple views of the
system via filesystem namespaces. Constructed logical views present an
application-specific representation of the network, thus enabling high-level
programming of the network. Concurrently, structural views of the network
enable resource-efficient planning and execution of tasks. We present and
motivate the design using several examples, outline research challenges and our
research plan to address them, and describe the current state of
implementation.
Comment: 6 pages, 3 figures. Workshop on End-to-End, Sense-and-Respond Systems,
Applications, and Services. In conjunction with MobiSys '0
HDFS File Formats: Study and Performance Comparison
The distributed system Hadoop has become very popular for storing and processing large amounts of data (Big Data). Because it is composed of many machines, its file system,
HDFS (Hadoop Distributed File System), is also distributed. But as HDFS is not a traditional
storage system, many new file formats have been developed to take advantage
of its features. In this work we study these new formats to determine their characteristics
and to decide which ones best suit the needs of our data. To
that end, we have built a theoretical framework to compare them and easily recognize
which formats fit our needs. We have also carried out an experimental study of how the
formats behave in specific situations, selecting two very different datasets and a set of
simple queries, resolved with MapReduce jobs written in Java or run using the Hive tool.
The final goal of this work is to identify the strengths and weaknesses
of the file formats.
Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos)
Máster en Investigación en Tecnologías de la Información y las Comunicacione