38 research outputs found
Middleware-based Database Replication: The Gaps between Theory and Practice
The need for high availability and performance in data management systems has
been fueling a long running interest in database replication from both academia
and industry. However, academic groups often attack replication problems in
isolation, overlooking the need for completeness in their solutions, while
commercial teams take a holistic approach that often misses opportunities for
fundamental innovation. This has created over time a gap between academic
research and industrial practice.
This paper aims to characterize the gap along three axes: performance,
availability, and administration. We build on our own experience developing and
deploying replication systems in commercial and academic settings, as well as
on a large body of prior related work. We sift through representative examples
from the last decade of open-source, academic, and commercial database
replication systems and combine this material with case studies from real
systems deployed at Fortune 500 customers. We propose two agendas, one for
academic research and one for industrial R&D, which we believe can bridge the
gap within 5-10 years. This way, we hope to both motivate and help researchers
in making the theory and practice of middleware-based database replication more
relevant to each other.Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on
Management of Data, Vancouver, Canada, June 200
Enabling Distributed Applications Optimization in Cloud Environment
The past few years have seen dramatic growth in the popularity of public clouds, such as Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Container-as-a-Service (CaaS). In both commercial and scientific fields, quick environment setup and application deployment become a mandatory requirement. As a result, more and more organizations choose cloud environments instead of setting up the environment by themselves from scratch. The cloud computing resources such as server engines, orchestration, and the underlying server resources are served to the users as a service from a cloud provider. Most of the applications that run in public clouds are the distributed applications, also called multi-tier applications, which require a set of servers, a service ensemble, that cooperate and communicate to jointly provide a certain service or accomplish a task. Moreover, a few research efforts are conducting in providing an overall solution for distributed applications optimization in the public cloud.
In this dissertation, we present three systems that enable distributed applications optimization: (1) the first part introduces DocMan, a toolset for detecting containerized application’s dependencies in CaaS clouds, (2) the second part introduces a system to deal with hot/cold blocks in distributed applications, (3) the third part introduces a system named FP4S, a novel fragment-based parallel state recovery mechanism that can handle many simultaneous failures for a large number of concurrently running stream applications
Component replication in application servers
Three-tier middleware architecture is commonly used for hosting large-scale distributed applications. Typically the application is decomposed into three layers: front-end, middle tier and back-end. Front-end ("Web server") is responsible for handling user interactions and acts as a client of the middle tier, while back-end provides storage facilities for applications. Middle tier (' Application server') is usually the place where all computations are performed, so this layer provides middleware services for transactions, security and so forth. The benefit of this architecture is that it allows flexible configuration such as partitioning and clustering for improved performance and scalability. On this architecture, availability measures, such as replication, can be introduced in each tier in an application specific manner. Among the three tier described above, the availability of the middle tier and the back-end tier are the most important, as these tiers provide the computation and the data for the applications. This thesis investigates how replication for availability can be incorporated within the middle and back-end tiers. The replication mechanisms must guarantee exactly once execution of user request despite failures of application and database servers. The thesis develops an approach that requires enhancements to the middle tier only for supporting replication of both the tiers. The design, implementation and performance evaluation of such a middle tier based replication scheme for multi-database transactions on a widely deployed open source application server (1Boss) are presented.EThOS - Electronic Theses Online ServiceQUE Project, Department of Informatics, ITB, Bandung, IndonesiaGBUnited Kingdo
Support Efficient, Scalable, and Online Social Spam Detection in System
The broad success of online social networks (OSNs) has created fertile soil for the emergence and fast spread of social spam. Fake news, malicious URL links, fraudulent advertisements, fake reviews, and biased propaganda are bringing serious consequences for both virtual social networks and human life in the real world. Effectively detecting social spam is a hot topic in both academia and industry. However, traditional social spam detection techniques are limited to centralized processing on top of one specific data source but ignore the social spam correlations of distributed data sources. Moreover, a few research efforts are conducting in integrating the stream system (e.g., Storm, Spark) with the large-scale social spam detection, but they typically ignore the specific details in managing and recovering interim states during the social stream data processing. We observed that social spammers who aim to advertise their products or post victim links are more frequently spreading malicious posts during a very short period of time. They are quite smart to adapt themselves to old models that were trained based on historical records. Therefore, these bring a question: how can we uncover and defend against these online spam activities in an online and scalable manner? In this dissertation, we present there systems that support scalable and online social spam detection from streaming social data: (1) the first part introduces
Oases, a scalable system that can support large-scale online social spam detection, (2) the second part introduces a system named SpamHunter, a novel system that supports efficient online scalable spam detection in social networks. The system gives novel insights in guaranteeing the efficiency of the modern stream applications by leveraging the spam correlations at scale, and (3) the third part refers to the state recovery during social spam detection, it introduces a customizable state recovery framework that provides fast and scalable state recovery mechanisms for protecting large distributed states in social spam detection applications
Scalable and Highly Available Database Systems in the Cloud
Cloud computing allows users to tap into a massive pool of shared computing
resources such as servers, storage, and network. These resources are provided as a
service to the users allowing them to “plug into the cloud” similar to a utility grid.
The promise of the cloud is to free users from the tedious and often complex task of
managing and provisioning computing resources to run applications. At the same
time, the cloud brings several additional benefits including: a pay-as-you-go cost
model, easier deployment of applications, elastic scalability, high availability, and
a more robust and secure infrastructure.
One important class of applications that users are increasingly deploying in
the cloud is database management systems. Database management systems differ
from other types of applications in that they manage large amounts of state that
is frequently updated, and that must be kept consistent at all scales and in the
presence of failure. This makes it difficult to provide scalability and high availability
for database systems in the cloud. In this thesis, we show how we can exploit
cloud technologies and relational database systems to provide a highly available
and scalable database service in the cloud.
The first part of the thesis presents RemusDB, a reliable, cost-effective high
availability solution that is implemented as a service provided by the virtualization
platform. RemusDB can make any database system highly available with little or
no code modifications by exploiting the capabilities of virtualization. In the second
part of the thesis, we present two systems that aim to provide elastic scalability
for database systems in the cloud using two very different approaches. The three
systems presented in this thesis bring us closer to the goal of building a scalable
and reliable transactional database service in the cloud
Model-driven Fault-Tolerance Provisioning for Component-based Distributed Real-time Embedded Systems
MACHS: Mitigating the Achilles Heel of the Cloud through High Availability and Performance-aware Solutions
Cloud computing is continuously growing as a business model for hosting information and communication technology applications. However, many concerns arise regarding the quality of service (QoS) offered by the cloud. One major challenge is the high availability (HA) of cloud-based applications. The key to achieving availability requirements is to develop an approach that is immune to cloud failures while minimizing the service level agreement (SLA) violations. To this end, this thesis addresses the HA of cloud-based applications from different perspectives. First, the thesis proposes a component’s HA-ware scheduler (CHASE) to manage the deployments of carrier-grade cloud applications while maximizing their HA and satisfying the QoS requirements. Second, a Stochastic Petri Net (SPN) model is proposed to capture the stochastic characteristics of cloud services and quantify the expected availability offered by an application deployment. The SPN model is then associated with an extensible policy-driven cloud scoring system that integrates other cloud challenges (i.e. green and cost concerns) with HA objectives. The proposed HA-aware solutions are extended to include a live virtual machine migration model that provides a trade-off between the migration time and the downtime while maintaining HA objective. Furthermore, the thesis proposes a generic input template for cloud simulators, GITS, to facilitate the creation of cloud scenarios while ensuring reusability, simplicity, and portability. Finally, an availability-aware CloudSim extension, ACE, is proposed. ACE extends CloudSim simulator with failure injection, computational paths, repair, failover, load balancing, and other availability-based modules