Building Large XML Stores in the Amazon Cloud
It is by now widely accepted that an increasing part of the world's interesting data is either shared through the Web or directly produced through and for Web platforms using formats like XML (structured documents). We present a scalable store for managing large corpora of XML documents, built on top of off-the-shelf cloud infrastructure. We implement different indexing strategies to evaluate a query workload over the stored documents in the cloud. Each strategy presents a different trade-off between query answering efficiency and index storage cost.
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies into an optimized
solution to a specific real-world problem, big data systems are no exception
to this rule. As far as the storage aspect of any big data system is
concerned, the primary facet in this regard is a storage infrastructure and
NoSQL seems to be the right technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents a
feature and use case analysis and comparison of the four main data models,
namely document-oriented, key-value, graph and wide-column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage, and their challenges and future prospects are
also discussed.
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several follow-up works after its introduction. This article
provides a comprehensive survey of a family of approaches and mechanisms for
large-scale data processing that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.
Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
Android application for searching cooking recipes by ingredients list
https://www.ester.ee/record=b544781
On the Fly Orchestration of Unikernels: Tuning and Performance Evaluation of Virtual Infrastructure Managers
Network operators are facing significant challenges meeting the demand for
more bandwidth, agile infrastructures and innovative services, while keeping costs
low. Network Functions Virtualization (NFV) and Cloud Computing are emerging as
key trends of 5G network architectures, providing flexibility, fast
instantiation times, support of Commercial Off The Shelf hardware and
significant cost savings. NFV leverages Cloud Computing principles to move the
data-plane network functions from expensive, closed and proprietary hardware to
the so-called Virtual Network Functions (VNFs). In this paper we deal with the
management of virtual computing resources (Unikernels) for the execution of
VNFs. This functionality is performed by the Virtual Infrastructure Manager
(VIM) in the NFV MANagement and Orchestration (MANO) reference architecture. We
discuss the instantiation process of virtual resources and propose a generic
reference model, starting from the analysis of three open source VIMs, namely
OpenStack, Nomad and OpenVIM. We improve the aforementioned VIMs by introducing
support for special-purpose Unikernels, aiming to reduce the duration
of the instantiation process. We evaluate some performance aspects of the VIMs,
considering both stock and tuned versions. The VIM extensions and performance
evaluation tools are available under a liberal open source licence.
A Web GIS-based Integration of 3D Digital Models with Linked Open Data for Cultural Heritage Exploration
This PhD project explores how geospatial semantic web concepts, 3D web-based visualisation, digital interactive maps, and cloud computing concepts could be integrated to enhance digital cultural heritage exploration; to offer long-term archiving and dissemination of 3D digital cultural heritage models; and to better interlink heterogeneous and sparse cultural heritage data.
The research findings were disseminated via four peer-reviewed journal articles and a conference article presented at the GISTAM 2020 conference (which received the ‘Best Student Paper Award’).