Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges, given the variety of
application areas and domains that this technology promises to serve.
Fundamental design decisions in big data systems design typically include
choosing appropriate storage and computing infrastructures. In this age of
heterogeneous systems that integrate different technologies into an optimized
solution for a specific real-world problem, big data systems are no exception.
As far as the storage aspect of any big data system is concerned, the primary
facet is the storage infrastructure, and NoSQL appears to be the technology
that best fulfills its requirements. However, every big data application has
different data characteristics, and thus its data fits a different data model.
This paper presents a feature and use-case analysis and comparison of the four
main data models, namely document-oriented, key-value, graph, and wide-column.
Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on
the criteria and points that a developer must consider when making a choice.
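To make the distinction between the four data models concrete, the following sketch expresses one and the same user record in each of them. All keys, field names, and values here are invented for illustration and are not drawn from the paper or from any specific NoSQL product:

```python
# Illustrative sketch: the same user record in the four NoSQL data models.
# All names and values below are invented for illustration.

# Document-oriented (MongoDB-style): one self-contained, nested document
document = {
    "_id": "user:42",
    "name": "Ada",
    "orders": [{"sku": "B-7", "qty": 2}],
}

# Key-value (Redis-style): opaque values behind flat keys
key_value = {
    "user:42:name": "Ada",
    "user:42:orders": '[{"sku": "B-7", "qty": 2}]',  # value is an opaque blob
}

# Wide-column (Cassandra-style): row key -> column family -> columns
wide_column = {
    "user:42": {
        "profile": {"name": "Ada"},
        "orders": {"order:1:sku": "B-7", "order:1:qty": "2"},
    }
}

# Graph (Neo4j-style): nodes plus explicitly typed, directed edges
graph = {
    "nodes": {"user:42": {"name": "Ada"}, "order:1": {"sku": "B-7", "qty": 2}},
    "edges": [("user:42", "PLACED", "order:1")],
}
```

The trade-off the paper analyses shows up even in this toy form: the document model keeps related data together, the key-value model pushes all interpretation to the application, the wide-column model groups columns for sparse, wide rows, and only the graph model makes relationships first-class.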
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the paper compares the
advantages, shortcomings, and possible use cases of the available big data
file formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage, and their challenges and future prospects are
also discussed.
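The core distinction among the Hadoop file formats compared here is row-oriented layout (as in Avro) versus columnar layout (as in Parquet and ORC). A minimal plain-Python sketch, with invented records, shows why columnar layouts favour analytical queries that project a few columns:

```python
# Sketch of row- vs column-oriented storage layouts, the distinction
# underlying Hadoop file formats such as Avro (row-based) and
# Parquet/ORC (columnar). Records below are invented for illustration.

records = [
    {"id": 1, "name": "a", "score": 0.5},
    {"id": 2, "name": "b", "score": 0.9},
]

# Row-oriented (Avro-like): whole records stored one after another
row_layout = [list(r.values()) for r in records]

# Column-oriented (Parquet-like): one contiguous list per field
col_layout = {field: [r[field] for r in records] for field in records[0]}

# Projecting a single column:
scores_from_columns = col_layout["score"]          # read one contiguous list
scores_from_rows = [row[2] for row in row_layout]  # must visit every record
```

In a real columnar file the contiguous per-column storage additionally enables type-specific compression and predicate push-down, which is why these formats dominate analytical Hadoop workloads.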
TWINLATIN: Twinning European and Latin-American river basins for research enabling sustainable water resources management. Combined Report D3.1 Hydrological modelling report and D3.2 Evaluation report
Water use has almost tripled over the past 50 years and in some regions the water demand already
exceeds supply (Vörösmarty et al., 2000). The world is facing a “global water crisis”; in many
countries, current levels of water use are unsustainable, with systems vulnerable to collapse from even
small changes in water availability. The need for a scientifically-based assessment of the potential
impacts on water resources of future changes, as a basis for society to adapt to such changes, is strong
for most parts of the world. Although the focus of such assessments has tended to be climate change,
socio-economic changes can have an equally significant impact on water availability across the four main
use sectors, i.e. domestic, agricultural, industrial (including energy) and environmental. Withdrawal
and consumption of water is expected to continue to grow substantially over the next 20-50 years
(Cosgrove & Rijsberman, 2002), and consequent changes in availability may drastically affect society
and economies.
One of the most needed improvements in Latin American river basin management is a higher level of
detail in hydrological modelling and erosion risk assessment, as a basis for identification and analysis
of mitigation actions, as well as for analysis of global change scenarios. Flow measurements are too
costly to be realised at more than a few locations, which means that modelled data are required for the
rest of the basin. Hence, TWINLATIN Work Package 3 “Hydrological modelling and extremes” was
formulated to provide methods and tools to be used by other WPs, in particular WP6 on “Pollution
pressure and impact analysis” and WP8 on “Change effects and vulnerability assessment”. With an
emphasis on high and low flows and their impacts, WP3 was originally called “Hydrological
modelling, flooding, erosion, water scarcity and water abstraction”. However, at the TWINLATIN
kick-off meeting it was agreed that some of these issues resided more appropriately in WP6 and WP8,
and so WP3 was renamed to focus on hydrological modelling and hydrological extremes.
The specific objectives of WP3, as set out in the Description of Work, are:
Deliverable JRA1.1: Evaluation of current network control and management planes for multi-domain network infrastructure
This deliverable includes a compilation and evaluation of the available control and management architectures and protocols applicable to a multilayer infrastructure in a multi-domain Virtual Network environment. The scope of this deliverable is mainly focused on the virtualisation of the resources within a network and at processing nodes.
The virtualisation of the FEDERICA infrastructure allows the provisioning of its available resources to users by means of FEDERICA slices. A slice is seen by the user as a real physical network under his/her domain; however, it maps to a logical partition (a virtual instance) of the physical FEDERICA resources. A slice is built to exhibit, to the highest degree, all the principles applicable to a physical network (isolation, reproducibility, manageability, ...).
Currently, there are no standard definitions available for network virtualisation or its associated architectures. Therefore, this deliverable proposes the Virtual Network layer architecture and evaluates a set of management and control planes that can be used for the partitioning and virtualisation of the FEDERICA network resources. This evaluation has been performed taking into account an initial set of FEDERICA requirements; a possible extension of the selected tools will be evaluated in future deliverables.
The studies described in this deliverable define the virtual architecture of the FEDERICA infrastructure. During this activity, the need has been recognised to establish a new set of basic definitions (a taxonomy) for the building blocks that compose the so-called slice, i.e. the virtual network instantiation (virtual with regard to the abstracted view of the building blocks of the FEDERICA infrastructure) and its architectural plane representation. These definitions will be established as a common nomenclature for the FEDERICA project. Another important aspect when defining a new architecture is the user requirements.
It is crucial that the resulting architecture fits the demands that users may have. Since this deliverable has been produced at the same time as the contact process with users, carried out by the project activities related to the Use Case definitions, JRA1 has proposed a set of basic Use Cases to be considered as a starting point for its internal studies.
When researchers want to experiment with their developments, they need not only network resources on their slices, but also a slice of the processing resources. These processing slice resources are understood as virtual machine instances that users can make behave as software routers or end nodes, onto which they can download the software protocols or applications they have produced and want to assess in a realistic environment. Hence, this deliverable also studies the APIs of several virtual machine management software products in order to identify which best suits FEDERICA's needs.
On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark
Querying very large RDF data sets efficiently requires a sophisticated
distribution strategy. Several innovative solutions have recently been
proposed for optimizing data distribution with predefined query workloads.
This paper presents an in-depth analysis and experimental comparison of five
representative and complementary distribution approaches. To obtain fair
experimental results, we use Apache Spark as a common parallel computing
framework, rewriting the concerned algorithms using the Spark API. Spark
provides guarantees in terms of fault tolerance, high availability and
scalability which are essential in such systems. Our implementations aim to
highlight the fundamental implementation-independent characteristics of each
approach in terms of data preparation, load balancing, data replication and,
to some extent, query answering cost and performance. The presented measures
are obtained by testing each system on one synthetic and one real-world data
set over query workloads with differing characteristics and different
partitioning constraints.
Comment: 16 pages, 3 figures
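As an illustration of the kind of distribution strategy this paper compares, here is a minimal plain-Python sketch (deliberately free of any Spark dependency) of subject-hash partitioning, one common baseline: all triples sharing a subject land on the same partition, so star-shaped, subject-centric queries need no data shuffling. The function names and example triples are invented for illustration; the paper's actual implementations use the Spark API.

```python
# Minimal sketch of subject-hash partitioning for RDF triples:
# each (s, p, o) triple goes to partition hash(s) % num_partitions,
# so every triple with the same subject is co-located.

def hash_str(s):
    # Deterministic string hash (Python's built-in hash() is salted
    # per process, so it is unsuitable for reproducible placement).
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) % (2 ** 32)
    return h

def partition_by_subject(triples, num_partitions):
    """Distribute (s, p, o) triples across num_partitions by subject hash."""
    partitions = [[] for _ in range(num_partitions)]
    for s, p, o in triples:
        partitions[hash_str(s) % num_partitions].append((s, p, o))
    return partitions

triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:name", '"Bob"'),
]
parts = partition_by_subject(triples, 4)
# All ex:alice triples share one partition.
```

The trade-offs the paper measures follow directly from such placement choices: subject hashing balances load and localises star queries, but queries joining on objects, or skewed subjects with very many triples, motivate the alternative replication and partitioning schemes compared in the evaluation.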
IT-Supported Management of Mass Casualty Incidents: The e-Triage Project
Voice, analogue mobile radio, and paper have been successfully used for decades for the coordination of emergencies and disasters, but although simple and robust, this approach can no longer keep pace with today’s requirements. Emerging and established digital communication standards open the door to new applications and services, but the expected benefit needs to be carefully weighed against robustness, interoperability, and user-friendliness. This paper describes a framework for IT-supported management of mass casualty incidents, which is currently under implementation and study. The four pillars of the concept are handheld devices for use both in daily rescue operations and in disasters, an autonomous satellite-based communication infrastructure, a distributed database concept for maximal availability, and psychological acceptance research.