
    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Big data systems development is full of challenges, given the variety of application areas and domains that this technology promises to serve. Fundamental design decisions in big data systems design typically include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies into an optimized solution for a specific real-world problem, big data systems are no exception. As far as the storage aspect of any big data system is concerned, the primary facet is the storage infrastructure, and NoSQL appears to be the right technology to fulfil its requirements. However, every big data application has its own data characteristics, and thus its data fits a different data model. This paper presents a feature and use-case analysis and comparison of the four main data models, namely document-oriented, key-value, graph, and wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria that a developer must consider while making a choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings and possible use cases of the available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.
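The four data models compared in the abstract above can be illustrated with plain-Python stand-ins (hypothetical data, not a real NoSQL client API) showing how the same fact is represented in each model:

```python
# Document-oriented (e.g. MongoDB): a self-contained, nested record.
document = {"_id": "u42", "name": "Ada", "orders": [{"sku": "b-101", "qty": 2}]}

# Key-value (e.g. Redis): an opaque value behind a single lookup key.
key_value = {"user:u42": '{"name": "Ada"}'}

# Wide-column (e.g. Cassandra): cells keyed by (row key, column name).
wide_column = {("u42", "name"): "Ada", ("u42", "orders:b-101"): 2}

# Graph (e.g. Neo4j): nodes plus explicit, traversable relationships.
nodes = {"u42": {"name": "Ada"}, "b-101": {"type": "book"}}
edges = [("u42", "ORDERED", "b-101", {"qty": 2})]

# The same fact ("Ada ordered two copies of b-101") is recoverable in each
# model, but with very different access paths and query costs.
assert document["orders"][0]["qty"] == 2
assert wide_column[("u42", "orders:b-101")] == 2
```

This is what makes the model choice application-dependent: a star-shaped read favours the document model, a single-key lookup the key-value model, and multi-hop traversals the graph model.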

    TWINLATIN: Twinning European and Latin-American river basins for research enabling sustainable water resources management. Combined Report D3.1 Hydrological modelling report and D3.2 Evaluation report

    Water use has almost tripled over the past 50 years, and in some regions water demand already exceeds supply (Vorosmarty et al., 2000). The world is facing a "global water crisis"; in many countries, current levels of water use are unsustainable, with systems vulnerable to collapse from even small changes in water availability. The need for a scientifically based assessment of the potential impacts of future changes on water resources, as a basis for society to adapt to such changes, is strong for most parts of the world. Although the focus of such assessments has tended to be climate change, socio-economic changes can have an equally significant impact on water availability across the four main use sectors, i.e. domestic, agricultural, industrial (including energy) and environmental. Withdrawal and consumption of water are expected to continue to grow substantially over the next 20-50 years (Cosgrove & Rijsberman, 2002), and the consequent changes in availability may drastically affect society and economies. One of the most needed improvements in Latin American river basin management is a higher level of detail in hydrological modelling and erosion risk assessment, as a basis for the identification and analysis of mitigation actions, as well as for the analysis of global change scenarios. Flow measurements are too costly to be realised at more than a few locations, which means that modelled data are required for the rest of the basin. Hence, TWINLATIN Work Package 3 "Hydrological modelling and extremes" was formulated to provide methods and tools to be used by other WPs, in particular WP6 on "Pollution pressure and impact analysis" and WP8 on "Change effects and vulnerability assessment". With an emphasis on high and low flows and their impacts, WP3 was originally called "Hydrological modelling, flooding, erosion, water scarcity and water abstraction".
However, at the TWINLATIN kick-off meeting it was agreed that some of these issues resided more appropriately in WP6 and WP8, and so WP3 was renamed to focus on hydrological modelling and hydrological extremes. The specific objectives of WP3 are set out in the Description of Work.

    Deliverable JRA1.1: Evaluation of current network control and management planes for multi-domain network infrastructure

    This deliverable includes a compilation and evaluation of available control and management architectures and protocols applicable to a multilayer infrastructure in a multi-domain Virtual Network environment. The scope of this deliverable is mainly the virtualisation of the resources within a network and at processing nodes. The virtualisation of the FEDERICA infrastructure allows its available resources to be provisioned to users by means of FEDERICA slices. A slice is seen by the user as a real physical network under his or her domain, although it maps to a logical partition (a virtual instance) of the physical FEDERICA resources. A slice is built to exhibit, to the highest degree, all the principles applicable to a physical network (isolation, reproducibility, manageability, etc.). Currently, there are no standard definitions available for network virtualisation or its associated architectures. Therefore, this deliverable proposes the Virtual Network layer architecture and evaluates a set of management and control planes that can be used for the partitioning and virtualisation of the FEDERICA network resources. This evaluation has been performed taking into account an initial set of FEDERICA requirements; a possible extension of the selected tools will be evaluated in future deliverables. The studies described in this deliverable define the virtual architecture of the FEDERICA infrastructure. During this activity, the need was recognised to establish a new set of basic definitions (a taxonomy) for the building blocks that compose the so-called slice, i.e. the virtual network instantiation (which is virtual with regard to the abstracted view of the building blocks of the FEDERICA infrastructure) and its architectural plane representation. These definitions will be established as a common nomenclature for the FEDERICA project. Another important aspect when defining a new architecture is the user requirements.
It is crucial that the resulting architecture fits the demands that users may have. Since this deliverable was produced at the same time as the process of contacting users, carried out by the project activities related to the Use Case definitions, JRA1 has proposed a set of basic Use Cases to be considered as a starting point for its internal studies. When researchers want to experiment with their developments, they need not only network resources on their slices, but also a slice of the processing resources. These processing slice resources are understood as virtual machine instances that users can make behave as software routers or end nodes, onto which they can download the software protocols or applications they have produced and want to assess in a realistic environment. Hence, this deliverable also studies the APIs of several virtual machine management software products in order to identify which best suits FEDERICA's needs.
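The core idea above — a slice that looks like a real network to its user but maps to a logical partition of shared physical resources, with isolation between slices — can be sketched in plain Python. All names here are hypothetical illustrations, not FEDERICA's actual API:

```python
# Physical substrate: e.g. virtual-router slots available per physical node.
physical = {"router1": 8, "router2": 8}

class Slice:
    """A user-facing slice: a named set of resource demands per node."""
    def __init__(self, name, demands):
        self.name = name
        self.demands = demands  # node -> slots requested

def allocate(slices, physical):
    """Map each slice onto the substrate; enforce isolation by refusing
    any allocation that would oversubscribe a physical node."""
    used = dict.fromkeys(physical, 0)
    mapping = {}
    for s in slices:
        for node, slots in s.demands.items():
            if used[node] + slots > physical[node]:
                raise ValueError(f"slice {s.name} oversubscribes {node}")
            used[node] += slots
        mapping[s.name] = dict(s.demands)
    return mapping

m = allocate([Slice("exp-a", {"router1": 4}), Slice("exp-b", {"router1": 4})], physical)
```

Each user sees only their own mapping entry, while the allocator's bookkeeping is what actually guarantees the isolation and reproducibility properties the deliverable requires of slices.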

    On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark

    Querying very large RDF data sets efficiently requires a sophisticated distribution strategy. Several innovative solutions have recently been proposed for optimizing data distribution with predefined query workloads. This paper presents an in-depth analysis and experimental comparison of five representative and complementary distribution approaches. To achieve fair experimental results, we use Apache Spark as a common parallel computing framework, rewriting the algorithms concerned using the Spark API. Spark provides guarantees in terms of fault tolerance, high availability and scalability, which are essential in such systems. Our implementations aim to highlight the fundamental, implementation-independent characteristics of each approach in terms of data preparation, load balancing, data replication and, to some extent, query answering cost and performance. The presented measures are obtained by testing each system on one synthetic and one real-world data set over query workloads with differing characteristics and different partitioning constraints. Comment: 16 pages, 3 figures.
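The simplest distribution strategy in this family — hash-partitioning RDF triples by subject, so that all triples about one resource are co-located and subject-centred (star) queries stay local — can be sketched in plain Python without a Spark dependency (function and predicate names here are illustrative only):

```python
import hashlib

def subject_hash_partition(triples, num_workers):
    """Assign each (subject, predicate, object) triple to a worker by
    hashing its subject. hashlib is used instead of the built-in hash()
    so the placement is deterministic across runs and machines."""
    partitions = [[] for _ in range(num_workers)]
    for s, p, o in triples:
        worker = int(hashlib.md5(s.encode()).hexdigest(), 16) % num_workers
        partitions[worker].append((s, p, o))
    return partitions

triples = [
    (":alice", ":knows", ":bob"),
    (":alice", ":age", "34"),
    (":bob", ":knows", ":carol"),
]
parts = subject_hash_partition(triples, 4)

# All :alice triples land on a single partition.
alice_parts = {i for i, p in enumerate(parts) for t in p if t[0] == ":alice"}
assert len(alice_parts) == 1
```

In Spark itself the equivalent step would be expressed on subject-keyed pairs with the RDD `partitionBy` operation and a custom partitioner; more elaborate approaches in the comparison trade extra data replication for locality on non-star query shapes.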

    IT-Supported Management of Mass Casualty Incidents: The e-Triage Project

    Voice, analogue mobile radio, and paper have been used successfully for decades to coordinate emergencies and disasters, but, although simple and robust, this approach can no longer keep pace with today's requirements. Emerging and established digital communication standards open the door to new applications and services, but the expected benefit needs to be carefully weighed against robustness, interoperability, and user-friendliness. This paper describes a framework for IT-supported management of mass casualty incidents, which is currently under implementation and study. The four pillars of the concept are handheld devices for use both in daily rescue operations and in disasters, an autonomous satellite-based communication infrastructure, a distributed database concept for maximal availability, and psychological acceptance research.