656 research outputs found

    Methods to enhance content distribution for very large scale online communities

    Get PDF
    The Internet has experienced an exponential growth in the last years, and its number of users far from decay keeps on growing. Popular Web 2.0 services such as Facebook, YouTube or Twitter among others sum millions of users and employ vast infrastructures deployed worldwide. The size of these infrastructures is getting huge in order to support such a massive number of users. This increment of the infrastructure size has brought new problems regarding scalability, power consumption, cooling, hardware lifetime, underutilization, investment recovery, etc. Owning this kind of infrastructures is not always affordable nor convenient. This could be a major handicap for starting projects with a humble budget whose success is based on reaching a large audience. However, current technologies might permit to deploy vast infrastructures reducing their cost. We refer to peer-to-peer networks and cloud computing. Peer-to-peer systems permit users to yield their own resources to distributed infrastructures. These systems have demonstrated to be a valuable choice capable of distributing vast amounts of data to large audiences with a minimal starting infrastructure. Nevertheless, aspects such as content availability cannot be controlled in these systems, whereas classic server infrastructures can improve this aspect. In the recent time, the cloud has been revealed as a promising paradigm for hosting horizontally scalable Web systems. The cloud offers elastic capabilities that permit to save costs by adapting the number of resources to the incoming demand. Additionally, the cloud makes accessible a vast amount of resources that may be employed on peak workloads. However, how to determine the amount of resources to use remains a challenge. In this thesis, we describe a hierarchical architecture that combines both: peer-to-peer and elastic server infrastructures in order to enhance content distribution. The peer-topeer infrastructure brings a scalable solution that reduces the workload in the servers, while the server infrastructure assures availability and reduces costs varying its size when necessary. We propose a distributed collaborative caching infrastructure that employs a clusterbased locality-aware self-organizing P2P system. This system, leverages collaborative data classification in order to improve content locality. Our evaluation demonstrates that incrementing data locality permits to improve data search while reducing traffic. We explore the utilization of elastic server infrastructures addressing three issues: system sizing, data grouping and content distribution. We propose novel multi-model techniques for hierarchical workload prediction. These predictions are employed to determine the system size and request distribution policies. Additionally, we propose novel techniques for adaptive control that permit to identify inaccurate models and redefine them. Our evaluation using traces extracted from real systems indicate that the utilization of a hierarchy of multiple models increases prediction accuracy. This hierarchy in conjunction with our adaptive control techniques increments the accuracy during unexpected workload variations. Finally, we demonstrate that locality-aware request distribution policies can take advantage of prediction models to adequate content distribution independently of the system size

    A Middleware framework for self-adaptive large scale distributed services

    Get PDF
    Modern service-oriented applications demand the ability to adapt to changing conditions and unexpected situations while maintaining a required QoS. Existing self-adaptation approaches seem inadequate to address this challenge because many of their assumptions are not met on the large-scale, highly dynamic infrastructures where these applications are generally deployed on. The main motivation of our research is to devise principles that guide the construction of large scale self-adaptive distributed services. We aim to provide sound modeling abstractions based on a clear conceptual background, and their realization as a middleware framework that supports the development of such services. Taking the inspiration from the concepts of decentralized markets in economics, we propose a solution based on three principles: emergent self-organization, utility driven behavior and model-less adaptation. Based on these principles, we designed Collectives, a middleware framework which provides a comprehensive solution for the diverse adaptation concerns that rise in the development of distributed systems. We tested the soundness and comprehensiveness of the Collectives framework by implementing eUDON, a middleware for self-adaptive web services, which we then evaluated extensively by means of a simulation model to analyze its adaptation capabilities in diverse settings. We found that eUDON exhibits the intended properties: it adapts to diverse conditions like peaks in the workload and massive failures, maintaining its QoS and using efficiently the available resources; it is highly scalable and robust; can be implemented on existing services in a non-intrusive way; and do not require any performance model of the services, their workload or the resources they use. We can conclude that our work proposes a solution for the requirements of self-adaptation in demanding usage scenarios without introducing additional complexity. In that sense, we believe we make a significant contribution towards the development of future generation service-oriented applications.Las Aplicaciones Orientadas a Servicios modernas demandan la capacidad de adaptarse a condiciones variables y situaciones inesperadas mientras mantienen un cierto nivel de servio esperado (QoS). Los enfoques de auto-adaptación existentes parecen no ser adacuados debido a sus supuestos no se cumplen en infrastructuras compartidas de gran escala. La principal motivación de nuestra investigación es inerir un conjunto de principios para guiar el desarrollo de servicios auto-adaptativos de gran escala. Nuesto objetivo es proveer abstraciones de modelaje apropiadas, basadas en un marco conceptual claro, y su implemetnacion en un middleware que soporte el desarrollo de estos servicios. Tomando como inspiración conceptos económicos de mercados decentralizados, hemos propuesto una solución basada en tres principios: auto-organización emergente, comportamiento guiado por la utilidad y adaptación sin modelos. Basados en estos principios diseñamos Collectives, un middleware que proveer una solución exhaustiva para los diversos aspectos de adaptación que surgen en el desarrollo de sistemas distribuidos. La adecuación y completitud de Collectives ha sido provada por medio de la implementación de eUDON, un middleware para servicios auto-adaptativos, el ha sido evaluado de manera exhaustiva por medio de un modelo de simulación, analizando sus propiedades de adaptación en diversos escenarios de uso. Hemos encontrado que eUDON exhibe las propiedades esperadas: se adapta a diversas condiciones como picos en la carga de trabajo o fallos masivos, mateniendo su calidad de servicio y haciendo un uso eficiente de los recusos disponibles. Es altamente escalable y robusto; puedeoo ser implementado en servicios existentes de manera no intrusiva; y no requiere la obtención de un modelo de desempeño para los servicios. Podemos concluir que nuestro trabajo nos ha permitido desarrollar una solucion que aborda los requerimientos de auto-adaptacion en escenarios de uso exigentes sin introducir complejidad adicional. En este sentido, consideramos que nuestra propuesta hace una contribución significativa hacia el desarrollo de la futura generación de aplicaciones orientadas a servicios.Postprint (published version

    A decentralized framework for cross administrative domain data sharing

    Get PDF
    Federation of messaging and storage platforms located in remote datacenters is an essential functionality to share data among geographically distributed platforms. When systems are administered by the same owner data replication reduces data access latency bringing data closer to applications and enables fault tolerance to face disaster recovery of an entire location. When storage platforms are administered by different owners data replication across different administrative domains is essential for enterprise application data integration. Contents and services managed by different software platforms need to be integrated to provide richer contents and services. Clients may need to share subsets of data in order to enable collaborative analysis and service integration. Platforms usually include proprietary federation functionalities and specific APIs to let external software and platforms access their internal data. These different techniques may not be applicable to all environments and networks due to security and technological restrictions. Moreover the federation of dispersed nodes under a decentralized administration scheme is still a research issue. This thesis is a contribution along this research direction as it introduces and describes a framework, called \u201cWideGroups\u201d, directed towards the creation and the management of an automatic federation and integration of widely dispersed platform nodes. It is based on groups to exchange messages among distributed applications located in different remote datacenters. Groups are created and managed using client side programmatic configuration without touching servers. WideGroups enables the extension of the software platform services to nodes belonging to different administrative domains in a wide area network environment. It lets different nodes form ad-hoc overlay networks on-the-fly depending on message destinations located in distinct administrative domains. It supports multiple dynamic overlay networks based on message groups, dynamic discovery of nodes and automatic setup of overlay networks among nodes with no server-side configuration. I designed and implemented platform connectors to integrate the framework as the federation module of Message Oriented Middleware and Key Value Store platforms, which are among the most widespread paradigms supporting data sharing in distributed systems

    Single system image servers on top of clusters of PCs

    Get PDF

    A STUDY ON DATA STREAMING IN FOG COMPUTING ENVIRONMENT

    Get PDF
    In lately years, data streaming is become more important day by day, considering technologies employed to servethat manner and share number of terminals within the system either direct or indirect interacting with them.Smart devices now play active role in the data streaming environment as well as fog and cloud compatibility. It is affectingthe data collectivity and appears clearly with the new technologies provided and the increase for the number of theusers of such systems. This is due to the number of the users and resources available system start to employ the computationalpower to the fog for moving the computational power to the network edge. It is adopted to connect system that streamed dataas an object. Those inter-connected objects are expected to be producing more significant data streams, which are produced atunique rates, in some cases for being analyzed nearly in real time. In the presented paper a survey of data streaming systemstechnologies is introduced. It clarified the main notions behind big data stream concepts as well as fog computing. From thepresented study, the industrial and research communities are capable of gaining information about requirements for creatingFog computing environment with a clearer view about managing resources in the Fog.The main objective of this paper is to provide short brief and information about Data Streaming in Fog ComputingEnvironment with explaining the major research field within this meaning

    Combining Mobile Agents and Process-based Coordination to Achieve Software Adaptation

    Get PDF
    We have developed a model and a platform for end-to-end run-time monitoring, behavior and performance analysis, and consequent dynamic adaptation of distributed applications. This paper concentrates on how we coordinate and actuate the potentially multi-part adaptation, operating externally to the target systems, that is, without requiring any a priori built-in adaptation facilities on the part of said target systems. The actual changes are performed on the fly onto the target by communities of mobile software agents, coordinated by a decentralized process engine. These changes can be coarse-grained, such as replacing entire components or rearranging the connections among components, or fine-grained, such as changing the operational parameters, internal state and functioning logic of individual components. We discuss our successful experience using our approach in dynamic adaptation of a large-scale commercial application, which requires both coarse and fine grained modifications

    Scalable and Distributed Resource Management Protocols for Cloud and Big Data Clusters

    Get PDF
    Cloud data centers require an operating system to manage resources and satisfy operational requirements and management objectives. The growth of popularity in cloud services causes the appearance of a new spectrum of services with sophisticated workload and resource management requirements. Also, data centers are growing by addition of various type of hardware to accommodate the ever-increasing requests of users. Nowadays a large percentage of cloud resources are executing data-intensive applications which need continuously changing workload fluctuations and specific resource management. To this end, cluster computing frameworks are shifting towards distributed resource management for better scalability and faster decision making. Such systems benefit from the parallelization of control and are resilient to failures. Throughout this thesis we investigate algorithms, protocols and techniques to address these challenges in large-scale data centers. We introduce a distributed resource management framework which consolidates virtual machine to as few servers as possible to reduce the energy consumption of data center and hence decrease the cost of cloud providers. This framework can characterize the workload of virtual machines and hence handle trade-off energy consumption and Service Level Agreement (SLA) of customers efficiently. The algorithm is highly scalable and requires low maintenance cost with dynamic workloads and it tries to minimize virtual machines migration costs. We also introduce a scalable and distributed probe-based scheduling algorithm for Big data analytics frameworks. This algorithm can efficiently address the problem job heterogeneity in workloads that has appeared after increasing the level of parallelism in jobs. The algorithm is massively scalable and can reduce significantly average job completion times in comparison with the-state of-the-art. Finally, we propose a probabilistic fault-tolerance technique as part of the scheduling algorithm

    A distributed platform for the volunteer execution of workflows on a local area network

    Get PDF
    Thesis submitted in fulfilment of the requirements for the Degree of Master of Science in Computer ScienceAlbatroz Engineering has developed a framework for over-head power lines inspection data acquisition and analysis, which includes hardware and software. The framework’s software components include inspection data analysis and reporting tools, commonly known as PLMI2 application/platform. In PLMI2, the analysis of over-head power line maintenance inspection data consists of a sequence of Automatic Tasks (ATs) interleaved with Manual Tasks (MTs). An AT consists of a set of algorithms that receives as input one or more datasets, processes them and returns new datasets. In turn, an MT enables human supervisors (also known as lines inspection operators) to correct, improve and validate the results of ATs. ATs run faster than MTs and in the overall work cycle, ATs take less than 10% of total processing time, but still take a few minutes. There is data flow dependency among tasks, which can be modelled with a workflow and even if MTs are omitted from this workflow, it is possible to carry the sequence of ATs, postponing MTs. In fact, if the computing cost and waiting time are negligible, it may be advantageous to run ATs earlier in the workflow, prior to validation. To address this opportunity, Albatroz Engineering has invested in a new procedure to stream the data through all ATs fully unattended. Considering these scenarios, it could be useful to have a system capable of detecting available workstations at a given instant and subsequently distribute the ATs to them. In this way, operators could schedule the execution of future ATs for a given inspection data, while they are performing MTs of another. The requirements of the system to implement fall within the field Volunteer Computing Systems and we will address some of the challenges posed by these kinds of systems, namely the hosts volatility and failures. Volunteer Computing is a type of distributed computing which exploits idle CPU cycles from computing resources donated by volunteers and connected through the Internet/Intranet to compute large-scale simulations. This thesis proposes and designs a new distributed task scheduling system in the context of Volunteer Computing Systems, able to schedule the ATs of PLMI2 and exploit idle CPU cycles from workstations within the company’s local area network (LAN) to accelerate the data analysis, being aware of data flow interdependencies. To evaluate the proposed system, a prototype has been implemented, and the simulations results have shown that it is scalable and supports fault-tolerance of tasks execution, by employing the rescheduling mechanism
    corecore