Backup and Recovery Mechanisms of Cassandra Database: A Review

Author: Bohora Karina
Bothe Amol
Chopade Rupali
Pachghare V. K.
Sheth Damini
Publication venue: (Print) 1558-7215
Publication date: 16/02/2021

Cassandra is a NoSQL database with a peer-to-peer, ring-type architecture. Cassandra offers fault tolerance and data replication for higher availability, and ensures there is no single point of failure. Given that Cassandra is a NoSQL database, it has received less research attention than the comparatively older and more widely used SQL databases. Cassandra’s growing popularity in recent times gives rise to the need to address any security-related or recovery-related concerns associated with its usage. This review paper discusses the existing deletion mechanism in Cassandra and presents some identified issues related to backup and recovery in the Cassandra database. Further, failure detection and the handling of failures such as node failure or data center failure are explored. In addition, several possible solutions for backup and recovery, including recovery in case of disasters, are reviewed.

Embry-Riddle Aeronautical University

TailX: Scheduling Heterogeneous Multiget Queries to Improve Tail Latencies in Key-Value Stores

Author: A Lakshman
AO Al-Abbasi
BH Bloom
D Balouek
J Dean
M Mitzenmacher
W Jiang
Publication date: 08/06/2020

Users of interactive services such as e-commerce platforms have high expectations for the performance and responsiveness of these services. Tail latency, denoting the worst service times, contributes greatly to user dissatisfaction and should be minimized. Maintaining low tail latency for interactive services is challenging because a request is not complete until all its operations are completed. The challenge is to identify bottleneck operations and schedule them on uncoordinated backend servers with minimal overhead, when the durations of these operations are heterogeneous and unpredictable. In this paper, we focus on improving the latency of multiget operations in cloud data stores. We present TailX, a task-aware multiget scheduling algorithm that improves tail latencies under heterogeneous workloads. TailX schedules operations according to an estimate of the size of the corresponding data, and allows itself to procrastinate on some operations to give way to higher-priority ones. We implement TailX in Cassandra, a widely used key-value store. The result is improved overall performance of cloud data stores for a wide variety of heterogeneous workloads. Specifically, our experiments under heterogeneous YCSB workloads show that TailX outperforms state-of-the-art solutions and reduces tail latencies by up to 70% and median latencies by up to 75%.

Maastricht University Research Portal

Crossref

Hal - Université Grenoble Alpes

HAL

Hal-Diderot

Towards a Novel Cooperative Logistics Information System Framework

Author: Amanton Laurent
Sanlaville Eric
Zaidi Fares
Publication date: 08/07/2018

Supply chains and logistics have a growing importance in the global economy. Supply Chain Information Systems around the world are heterogeneous, and each one can both produce and receive massive amounts of structured and unstructured data in real time, usually generated by information systems, by connected objects, or manually by humans. This heterogeneity arises because Logistics Information Systems components and processes are developed with different modelling methods and run on many platforms; hence, the decision-making process is difficult in such a multi-actor environment. In this paper we identify some current challenges and integration issues between separately designed Logistics Information Systems (LIS), and we propose a Distributed Cooperative Logistics Platform (DCLP) framework based on NoSQL, which facilitates real-time cooperation between stakeholders and improves the decision-making process in a multi-actor environment. We also include a case study of a Hospital Supply Chain (HSC) and a brief discussion of perspectives and future scope of work.

arXiv.org e-Print Archive

HAL - Normandie Université

Contributions to High-Throughput Computing Based on the Peer-to-Peer Paradigm

Author: Pérez Miguel Carlos
Publication date: 18/06/2015

XII, 116 p. This dissertation focuses on High Throughput Computing (HTC) systems and how to build a working HTC system using Peer-to-Peer (P2P) technologies. Traditional HTC systems, designed to process the largest possible number of tasks per unit of time, revolve around a central node that implements a queue used to store and manage submitted tasks. This central node limits the scalability and fault tolerance of the HTC system. A usual solution involves the use of replicas of the master node that can replace it. This solution is, however, limited by the number of replicas used. In this thesis, we propose an alternative solution that follows the P2P philosophy: a completely distributed system in which all worker nodes participate in the scheduling tasks, and with a physically distributed task queue implemented on top of a P2P storage system. The fault tolerance and scalability of this proposal are, therefore, limited only by the number of nodes in the system. The proper operation and scalability of our proposal have been validated through experimentation with a real system. The data availability provided by Cassandra, the P2P data management framework used in our proposal, is analysed by means of several stochastic models. These models can be used to make predictions about the availability of any Cassandra deployment, as well as to select the best possible configuration of any Cassandra system. In order to validate the proposed models, experiments with real Cassandra clusters are carried out, showing that our models are good descriptors of Cassandra's availability. Finally, we propose a set of scheduling policies that try to solve a common problem of HTC systems: re-execution of tasks due to a failure in the node where the task was running, without additional resource misspending. In order to reduce the number of re-executions, our proposals try to find good fits between the reliability of nodes and the estimated length of each task. Extensive simulation-based experimentation shows that our policies are capable of reducing the number of re-executions, improving system performance and utilization of nodes.

Archivo Digital para la Docencia y la Investigación