Blazes: Coordination Analysis for Distributed Programs
Distributed consistency is perhaps the most discussed topic in distributed
systems today. Coordination protocols can ensure consistency, but in practice
they impose significant performance costs unless used judiciously. Scalable
distributed architectures avoid coordination whenever possible, but
under-coordinated systems can exhibit behavioral anomalies under faults that
are often extremely difficult to debug. This raises significant challenges for
distributed system architects and developers. In this paper we present Blazes,
a cross-platform program analysis framework that (a) identifies program
locations that require coordination to ensure consistent executions, and (b)
automatically synthesizes application-specific coordination code that can
significantly outperform general-purpose techniques. We present two case
studies, one using annotated programs in the Twitter Storm system, and another
using the Bloom declarative language.
Comment: Updated to include additional materials from the original technical
report: derivation rules, output stream label
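The kind of analysis the abstract describes can be illustrated with a small sketch: classify dataflow operators by whether they are order-insensitive (safe without coordination) and flag the points that need coordination. The operator names and labels below are illustrative assumptions, not Blazes' actual annotation language.

```python
# Hypothetical sketch of a coordination analysis in the spirit of Blazes:
# monotonic operators tolerate message reordering without coordination,
# while order-sensitive operators are flagged as coordination points.
# The operator vocabulary here is an assumption for illustration.

MONOTONIC_OPS = {"filter", "map", "union", "join"}       # order-insensitive
NON_MONOTONIC_OPS = {"aggregate", "negation", "top_k"}   # order-sensitive

def coordination_points(pipeline):
    """Return the operators in a dataflow pipeline that would require
    coordination to guarantee consistent executions."""
    points = []
    for stage in pipeline:
        if stage in NON_MONOTONIC_OPS:
            points.append(stage)
        elif stage not in MONOTONIC_OPS:
            raise ValueError(f"unknown operator: {stage}")
    return points

pipeline = ["map", "filter", "aggregate", "map", "top_k"]
print(coordination_points(pipeline))  # ['aggregate', 'top_k']
```

A purely monotonic pipeline would yield an empty list, i.e. no coordination needed anywhere.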
Extended Fault Taxonomy of SOA-Based Systems
Service Oriented Architecture (SOA) is considered a standard for enterprise software development. The main characteristics of SOA are the dynamic discovery and composition of software services in a heterogeneous environment. These properties pose new challenges for fault management in SOA-based systems (SBS). A proper understanding of the different faults in an SBS is essential for effective fault handling. A comprehensive three-fold fault taxonomy is presented here that covers distributed, SOA-specific and non-functional faults in a holistic manner. Such a taxonomy is a key starting point for providing techniques and methods for assessing the quality of a given system. In this paper, an attempt has been made to organize SBS faults into a well-structured taxonomy that may assist developers in planning suitable fault-repair strategies. Some commonly emphasized fault recovery strategies are also discussed, along with challenges that may occur during fault handling in SBSs.
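The three-fold structure described above can be sketched as a simple classification table. The concrete fault names below are assumptions chosen as typical examples of each category, not the paper's exact list.

```python
# Illustrative encoding of a three-fold SOA fault taxonomy: distributed,
# SOA-specific, and non-functional faults. The individual fault names are
# hypothetical examples, not taken from the paper.

FAULT_TAXONOMY = {
    "distributed": ["network partition", "message loss", "node crash"],
    "soa_specific": ["service discovery failure", "binding fault",
                     "composition fault"],
    "non_functional": ["SLA violation", "degraded response time",
                       "security breach"],
}

def classify(fault):
    """Map a fault name to its top-level taxonomy category, if known."""
    for category, faults in FAULT_TAXONOMY.items():
        if fault in faults:
            return category
    return "unclassified"

print(classify("binding fault"))  # soa_specific
```

Routing a fault to its category is the starting point for choosing a matching recovery strategy, e.g. retry for transient distributed faults versus re-binding for SOA-specific ones.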
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies into an
optimized solution for a specific real-world problem, big data systems are no
exception to this rule.
concerned, the primary facet in this regard is a storage infrastructure and
NoSQL seems to be the right technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits a different data model. This paper presents a feature
and use-case analysis and comparison of the four main data models, namely
document-oriented, key-value, graph and wide-column. Moreover, a feature
analysis of 80 NoSQL solutions is provided, elaborating on the criteria
and points that a developer must consider while making a choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage; their challenges and future prospects are
also discussed.
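The difference between two of the four data models compared above can be shown with the same entity stored both ways. The field names and keys below are hypothetical.

```python
# The same "user" entity in two NoSQL data models. A key-value store
# treats the value as an opaque blob retrievable only by key; a document
# store keeps it as a structured, field-queryable document.
import json

# Key-value model: lookup only by key, value is opaque to the store.
kv_store = {
    "user:42": '{"name": "Ada", "city": "London"}',
}

# Document model: nested structure the store itself can query and index.
doc_store = {
    "users": [
        {"_id": 42, "name": "Ada", "city": "London",
         "orders": [{"sku": "A1", "qty": 2}]},
    ]
}

# Key-value: the application must fetch by key and parse the value itself.
print(json.loads(kv_store["user:42"])["name"])  # Ada

# Document: filtering on a field inside the value is the store's job.
print([u["_id"] for u in doc_store["users"] if u["city"] == "London"])  # [42]
```

This is the kind of mismatch the paper's feature analysis helps resolve: access purely by key favors the key-value model, while field-level queries or nested data favor the document model.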
Managing Population and Workload Imbalance in Structured Overlays
Every day the amount of data produced by networked devices increases. The
current paradigm is to offload the data produced to data centers to be
processed. However, as more and more devices offload their data to cloud data
centers, accessing that data becomes increasingly challenging. To combat this
problem, systems are bringing data closer to the consumer and distributing
network responsibilities among the end devices. We are witnessing a change in
networking paradigm, where data storage and computation that was once handled
only in the cloud is being processed by Internet of Things (IoT) and mobile
devices, thanks to the ever-increasing technological capabilities of these
devices. One approach organizes devices into a structured overlay network.
Structured overlays are a common approach to the organization and distribution
of data in peer-to-peer distributed systems. By their nature, indexing and
searching for elements of the system become trivial, so structured overlays
are ideal building blocks for resource-location applications.
Such overlays assume that data is distributed evenly over the peers, and that
the popularity of those data items is also evenly balanced. However, in many
systems, due to factors outside the system's domain, popularity may behave
rather unpredictably, overloading the nodes that store the most popular items.
In this work we exploit the properties of cluster-based structured overlays to
address this problem, extending a structured overlay with mechanisms to manage
population and workload imbalance and achieve a more uniform use of resources.
Our approach focuses on implementing a group-based Distributed Hash Table
(DHT) capable of dynamically changing its groups to accommodate churn in the
network.
With the conclusion of our work, we believe we have created a network capable
of withstanding high levels of churn while ensuring fairness to all members of
the network.
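The group-based DHT idea described above can be sketched minimally: peers form groups, each group owns a contiguous slice of the key space, and a group splits when it grows past a threshold. The threshold and split policy below are illustrative assumptions, not the thesis's actual design.

```python
# Minimal sketch of a group-based DHT: the hash ring is partitioned among
# groups of peers, and groups split dynamically as peers join (a simple
# stand-in for the churn-adaptation mechanism described in the abstract).
import hashlib

RING = 2 ** 32

def key_hash(key):
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % RING

class Group:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi    # half-open slice [lo, hi) of the ring
        self.members = []

class GroupDHT:
    MAX_GROUP_SIZE = 4               # split threshold (assumed)

    def __init__(self):
        self.groups = [Group(0, RING)]

    def group_for(self, key):
        """Route a key to the group owning its slice of the ring."""
        h = key_hash(key)
        return next(g for g in self.groups if g.lo <= h < g.hi)

    def join(self, peer_id):
        # Population balancing: new peers go to the smallest group.
        g = min(self.groups, key=lambda grp: len(grp.members))
        g.members.append(peer_id)
        if len(g.members) > self.MAX_GROUP_SIZE:  # adapt to growth: split
            mid = (g.lo + g.hi) // 2
            new = Group(mid, g.hi)
            g.hi = mid
            half = self.MAX_GROUP_SIZE // 2
            new.members = g.members[half:]
            g.members = g.members[:half]
            self.groups.append(new)

dht = GroupDHT()
for p in range(10):
    dht.join(f"peer-{p}")
print(len(dht.groups) > 1)   # True: the ring split as peers joined
```

The inverse operation, merging two under-populated neighboring groups when peers leave, completes the churn-handling picture but is omitted here for brevity.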
Self-managing cloud-native applications: design, implementation and experience
Running applications in the cloud efficiently requires much more than deploying software in virtual machines. Cloud applications have to be continuously managed: (1) to adjust their resources to the incoming load and (2) to tolerate transient failures, replicating and restarting components to provide resiliency on unreliable infrastructure. Continuous management monitors application and infrastructural metrics to provide automated and responsive reactions to failures (health management) and changing environmental conditions (auto-scaling), minimizing human intervention.
In current practice, management functionalities are provided as infrastructural or third-party services. In both cases they are external to the application deployment. We claim that this approach has intrinsic limits: separating management functionalities from the application prevents them from naturally scaling with the application and requires additional management code and human intervention. Moreover, using infrastructure-provider services for management functionalities results in vendor lock-in, effectively preventing cloud applications from adapting and running on the most effective cloud for the job.
In this paper we discuss the main characteristics of cloud-native applications, propose a novel architecture that enables scalable and resilient self-managing applications in the cloud, and report on our experience porting a legacy application to the cloud by applying cloud-native principles.
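The two management duties named above, health management and auto-scaling, reduce to a control loop the application can run over its own metrics. The thresholds and metric shape below are illustrative assumptions, not the paper's architecture.

```python
# Sketch of a self-management control loop: each iteration observes load
# and health, restarts failed replicas (health management) and adjusts
# the replica count (auto-scaling). Thresholds are assumed for the example.

def reconcile(replicas, load_per_replica, healthy):
    """One iteration of the management loop.

    replicas          -- current replica count
    load_per_replica  -- observed load per instance (e.g. req/s)
    healthy           -- number of replicas that passed their health check
    """
    # Health management: replace any replica that failed its check.
    restarts = replicas - healthy

    # Auto-scaling: keep per-replica load inside a target band.
    if load_per_replica > 80:
        replicas += 1
    elif load_per_replica < 20 and replicas > 1:
        replicas -= 1
    return replicas, restarts

# Overloaded (95 req/s per replica) with one unhealthy replica:
print(reconcile(replicas=3, load_per_replica=95, healthy=2))  # (4, 1)
```

Embedding this loop in the application itself, rather than in an external provider service, is exactly what lets it scale with the application and stay portable across clouds.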
Implementation and test of transactional primitives over Cassandra
Master's dissertation in Informatics Engineering
NoSQL databases opt not to offer important abstractions traditionally found in
relational databases in order to achieve high levels of scalability and
availability: transactional guarantees and strong data consistency. These
limitations bring considerable complexity to the development of client
applications and are therefore an obstacle to the broader adoption of the
technology.
In this work we propose a middleware layer over NoSQL databases that
offers transactional guarantees with Snapshot Isolation. The proposed solution
is achieved in a non-intrusive manner, providing to the clients the same
interface as a NoSQL database, simply adding the transactional context. The
transactional context is the focus of our contribution and is modularly based
on a Non Persistent Version Store that holds several versions of elements
and interacts with an external transaction certifier.
In this work, we present an implementation of our system over Apache
Cassandra and, using two representative benchmarks, YCSB and TPC-C, we
measure the cost of adding transactional support with ACID guarantees.
Fundação para a Ciência e a Tecnologia (FCT) - Project Stratus/FCOMP-01-0124-FEDER-015020; within project Pest/FCOMP-01-0124-FEDER-022701. ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness). European Union Seventh Framework Programme (FP7) under grant agreement no 257993 (CumuloNimbo)
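The certifier's role in Snapshot Isolation can be sketched in a few lines: a transaction commits only if no concurrent, already-committed transaction wrote to any of its keys (first-committer-wins). The data structures below are a simplified illustration, not the middleware's actual implementation.

```python
# Simplified Snapshot Isolation certifier: each transaction records the
# commit timestamp it started from (its snapshot) and, at commit time,
# is certified against the write sets of transactions that committed since.

class Certifier:
    def __init__(self):
        self.commit_ts = 0
        self.last_write = {}     # key -> commit timestamp of its last writer

    def begin(self):
        """A transaction's snapshot is the current commit timestamp."""
        return self.commit_ts

    def certify(self, snapshot_ts, write_set):
        """Commit iff no key in write_set was written after our snapshot."""
        for key in write_set:
            if self.last_write.get(key, 0) > snapshot_ts:
                return None                 # write-write conflict: abort
        self.commit_ts += 1
        for key in write_set:
            self.last_write[key] = self.commit_ts
        return self.commit_ts

c = Certifier()
t1 = c.begin(); t2 = c.begin()   # two concurrent transactions
print(c.certify(t1, {"x"}))      # 1    -> t1 commits first
print(c.certify(t2, {"x"}))      # None -> t2 aborts: conflict on "x"
print(c.certify(t2, {"y"}))      # 2    -> a disjoint write set commits
```

The non-persistent version store described above complements this: it serves each transaction reads from the versions visible at its snapshot timestamp, while the certifier arbitrates commits.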
Seer: Empowering Software Defined Networking with Data Analytics
Network complexity is increasing, making network control and orchestration a
challenging task. The proliferation of network information and tools for data
analytics can provide important insight into resource provisioning and
optimisation. The network knowledge incorporated in software-defined
networking can facilitate knowledge-driven control, leveraging network
programmability. We present Seer: a flexible, highly configurable data
analytics platform for network intelligence based on software-defined
networking and big data principles. Seer combines a computational engine with
a distributed messaging system to provide a scalable, fault-tolerant and
real-time platform for knowledge extraction. Our first prototype uses Apache
Spark for streaming analytics and the Open Network Operating System (ONOS)
controller to program the network in real time. The first application we
developed aims to predict the mobility patterns of mobile devices in a smart
city environment.
Comment: 8 pages, 6 figures, Big data, data analytics, data mining, knowledge
centric networking (KCN), software defined networking (SDN), Seer, 2016 15th
International Conference on Ubiquitous Computing and Communications and 2016
International Symposium on Cyberspace and Security (IUCC-CSS 2016)
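The mobility-prediction application mentioned above can be illustrated with a toy stand-in: learn transition frequencies between location cells from a sliding window of events and predict the most likely next cell. A real deployment would run such logic inside the streaming engine (the paper uses Apache Spark); the frequency-count model here is purely an illustrative assumption.

```python
# Toy mobility predictor over a sliding window of (device, cell) events:
# count observed cell-to-cell transitions and predict the most frequent
# successor. Cell names and window size are hypothetical.
from collections import Counter, deque

class MobilityPredictor:
    def __init__(self, window=100):
        self.transitions = {}                  # cell -> Counter of next cells
        self.history = deque(maxlen=window)    # sliding window of events

    def observe(self, device, cell):
        self.history.append((device, cell))
        # Find this device's previous cell within the window, if any.
        prev = next((c for d, c in reversed(list(self.history)[:-1])
                     if d == device), None)
        if prev is not None:
            self.transitions.setdefault(prev, Counter())[cell] += 1

    def predict(self, cell):
        """Most frequently observed next cell after `cell`, or None."""
        nxt = self.transitions.get(cell)
        return nxt.most_common(1)[0][0] if nxt else None

p = MobilityPredictor()
for cell in ["A", "B", "C", "A", "B", "C"]:
    p.observe("phone-1", cell)
print(p.predict("A"))   # B
```

In the Seer architecture the observations would arrive through the distributed messaging system, and predictions would feed back to the ONOS controller to pre-provision resources along the predicted path.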
The Design and Implementation of Low-Latency Prediction Serving Systems
Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and cost-efficient predictions under heavy query load. These applications employ a variety of machine learning frameworks and models, often composing several models within the same application. However, most machine learning frameworks and systems are optimized for model training and not deployment.
In this thesis, I discuss three prediction serving systems designed to meet the needs of modern interactive machine learning applications. The key idea in this work is to utilize a decoupled, layered design that interposes systems on top of training frameworks to build low-latency, scalable serving systems. Velox introduced this decoupled architecture to enable fast online learning and model personalization in response to feedback. Clipper generalized this system architecture to be framework-agnostic and introduced a set of optimizations to reduce and bound prediction latency and improve prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. And InferLine provisions and manages the individual stages of prediction pipelines to minimize cost while meeting end-to-end tail latency constraints.
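The decoupled, layered design described above can be sketched minimally: a serving layer wraps heterogeneous models behind one prediction interface and adds cross-cutting optimizations such as caching, without touching the underlying frameworks. The adapter interface and cache policy below are illustrative assumptions, not Clipper's actual API.

```python
# Sketch of a framework-agnostic serving layer: models from any training
# framework are registered behind a uniform predict interface, and the
# layer adds a prediction cache as one example of a latency optimization.

class ModelAdapter:
    """Wraps any model exposing a predict function, whatever its framework."""
    def __init__(self, name, predict_fn):
        self.name = name
        self.predict_fn = predict_fn

class ServingLayer:
    def __init__(self):
        self.models = {}
        self.cache = {}                # (model, input) -> cached prediction

    def register(self, adapter):
        self.models[adapter.name] = adapter

    def predict(self, name, x):
        key = (name, x)
        if key not in self.cache:      # cut latency on repeated queries
            self.cache[key] = self.models[name].predict_fn(x)
        return self.cache[key]

serve = ServingLayer()
serve.register(ModelAdapter("doubler", lambda x: 2 * x))   # stand-in model
print(serve.predict("doubler", 21))   # 42
```

Because the serving layer only sees opaque predict functions, swapping a TensorFlow model for a scikit-learn one changes nothing above the adapter, which is the point of the framework-agnostic design.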