Grid Global Behavior Prediction
Complexity has always been one of the most important issues in distributed computing. From the first clusters to grid and now cloud computing, dealing correctly and efficiently with system complexity is the key to taking technology a step further. In this sense, global behavior modeling is an innovative methodology aimed at understanding grid behavior. The main objective of this methodology is to synthesize the grid's vast, heterogeneous nature into a simple but powerful behavior model, represented as a single, abstract entity with a global state. Global behavior modeling has proved very useful in managing grid complexity effectively, but in many cases deeper knowledge is needed. It generates a descriptive model that could be greatly improved if extended not only to explain behavior but also to predict it. In this paper we present a prediction methodology whose objective is to define the techniques needed to create global behavior prediction models for grid systems. Such global behavior prediction can benefit grid management, especially in areas such as fault tolerance and job scheduling. The paper presents experimental results obtained in real scenarios in order to validate this approach.
Enhanced Failure Detection Mechanism in MapReduce
The popularity of the MapReduce programming model has increased the research community's interest in improving it. Among the possible directions, fault tolerance, and concretely the failure detection issue, appears to be crucial, yet it has not reached a satisfactory level so far. Motivated by this, I devoted my main research during this period to a prototype system architecture of a MapReduce framework with a new failure detection service, comprising both an analytical (theoretical) part and an implementation part. I am confident that this work can lead the way for further contributions to failure detection in NoSQL application frameworks and cloud storage systems in general.
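The abstract leaves the detector's design open; as a point of reference, a minimal heartbeat-based failure detector with an adaptive timeout can be sketched as follows. This is an illustrative baseline, not the prototype described above: the class name, the 0.8/0.2 smoothing and the safety factor are all assumptions.

```python
class HeartbeatDetector:
    """Suspect a worker when its last heartbeat is older than an
    adaptive timeout derived from observed inter-arrival times."""

    def __init__(self, safety_factor=3.0):
        self.last_seen = {}       # worker -> time of last heartbeat
        self.avg_interval = {}    # worker -> smoothed heartbeat interval
        self.safety_factor = safety_factor

    def heartbeat(self, worker, now):
        """Record a heartbeat and update the smoothed interval."""
        if worker in self.last_seen:
            interval = now - self.last_seen[worker]
            prev = self.avg_interval.get(worker, interval)
            # Exponential smoothing of the inter-arrival time.
            self.avg_interval[worker] = 0.8 * prev + 0.2 * interval
        self.last_seen[worker] = now

    def suspected(self, worker, now):
        """True when the silence exceeds safety_factor * avg interval."""
        if worker not in self.avg_interval:
            return False  # not enough observations to judge
        timeout = self.safety_factor * self.avg_interval[worker]
        return now - self.last_seen[worker] > timeout
```

Under these assumed parameters, a worker whose heartbeats arrive roughly every second would be suspected only after a silence of about three seconds.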
Arquitectura multiagente para E/S de alto rendimiento en clusters
I/O is currently one of the main bottlenecks in general-purpose distributed systems, due to the existing imbalance between computation time and I/O time. One of the solutions proposed for this problem has been the use of parallel I/O. In this area, a large number of parallel I/O libraries and parallel file systems have emerged.
These systems suffer from several defects and shortcomings. Many of them are conceived for parallel machines and do not integrate adequately into distributed environments and clusters. The intensive use of clusters of workstations in recent years makes this kind of system unsuitable for the current computing scenario.
Other systems, which do adapt to this kind of environment, do not include dynamic reconfiguration capabilities, and therefore have limited functionality.
Finally, most I/O systems that employ different I/O optimizations do not offer applications the flexibility to make use of them, instead trying to hide these techniques from the user. However, in order to optimize I/O operations, it is important for applications to be able to describe their access patterns by interacting with the I/O system.
In another area, within the field of distributed systems we find the agent paradigm, which endows applications with a set of properties well suited to adapting to complex and dynamic environments. The characteristics of this paradigm make it a priori promising for tackling some of the existing problems in the field of parallel I/O.
This thesis proposes a solution to the current I/O problem along three main lines: (i) the use of agent theory in high-performance I/O systems, (ii) the definition of a formalism enabling the dynamic reconfiguration of storage nodes in a cluster, and (iii) the use of configurable, application-oriented I/O optimization techniques.
Parallel and Distributed Data Management. Introduction
The manipulation and handling of an ever-increasing volume of data by current data-intensive applications require novel techniques for efficient data management. Despite recent advances in every aspect of data management (storage, access, querying, analysis, mining), future applications are expected to scale to even higher degrees, not only in terms of the volumes of data handled but also in terms of users and resources, often making use of multiple pre-existing, autonomous, distributed or heterogeneous resources.
Harmony: Towards automated self-adaptive consistency in cloud storage
In just a few years, cloud computing has become a very popular paradigm and a business success story, with storage being one of its key features. To achieve high data availability, cloud storage services rely on replication. In this context, one major challenge is data consistency. In contrast to traditional approaches that are mostly based on strong consistency, many cloud storage services opt for weaker consistency models in order to achieve better availability and performance. This comes at the cost of a high probability of stale data being read, as the replicas involved in the reads may not always have the most recent write. In this paper, we propose a novel approach, named Harmony, which adaptively tunes the consistency level at run time according to the application requirements. The key idea behind Harmony is an intelligent estimation model of stale reads, allowing it to elastically scale up or down the number of replicas involved in read operations to maintain a low (possibly zero) tolerable fraction of stale reads. As a result, Harmony can meet the desired consistency of the applications while achieving good performance. We have implemented Harmony and performed extensive evaluations with the Cassandra cloud storage system on the Grid'5000 testbed and on Amazon EC2. The results show that Harmony can achieve good performance without exceeding the tolerated number of stale reads. For instance, in contrast to the static eventual consistency used in Cassandra, Harmony reduces the stale data being read by almost 80% while adding only minimal latency. Meanwhile, it improves the throughput of the system by 45% while maintaining the desired consistency requirements of the applications when compared to the strong consistency model in Cassandra.
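The adaptive mechanism can be pictured with a small sketch. The staleness estimator below is a deliberately simplified stand-in (modeling each replica as lagging with probability `write_rate * propagation_delay` is an assumption for illustration), not Harmony's actual estimation model:

```python
def estimate_stale_read_rate(write_rate, propagation_delay, read_replicas):
    """Rough probability that a read sees stale data: a read is stale
    when every replica it contacts has not yet received the latest
    write. Assumed model: each replica independently lags with
    probability min(1, write_rate * propagation_delay)."""
    p_lag = min(1.0, write_rate * propagation_delay)
    return p_lag ** read_replicas

def choose_read_replicas(write_rate, propagation_delay,
                         total_replicas, tolerated_stale_rate):
    """Smallest read quorum keeping estimated staleness within the
    application's tolerance; fall back to reading all replicas."""
    for n in range(1, total_replicas + 1):
        rate = estimate_stale_read_rate(write_rate, propagation_delay, n)
        if rate <= tolerated_stale_rate:
            return n
    return total_replicas
```

The loop structure is the point: whatever estimator replaces `estimate_stale_read_rate`, the tuner picks the smallest read quorum whose predicted stale-read rate stays within the tolerated fraction, scaling consistency up only when the workload requires it.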
Using Global Behavior Modeling to Improve QoS in Cloud Data Storage Services
The cloud computing model aims to make large-scale data-intensive computing affordable even for users with limited financial resources, who cannot invest in the expensive infrastructures necessary to run such workloads. In this context, MapReduce is emerging as a highly scalable programming paradigm that enables high-throughput data-intensive processing as a cloud service. Its performance is highly dependent on the underlying storage service, which is responsible for efficiently supporting massively parallel data accesses by guaranteeing a high throughput under heavy access concurrency. In this context, quality of service plays a crucial role: the storage service needs to sustain a stable throughput for each individual access, in addition to achieving a high aggregated throughput under concurrency. In this paper we propose a technique to address this problem using component monitoring, application-side feedback and behavior pattern analysis to automatically infer useful knowledge about the causes of poor quality of service and provide an easy way to reason about potential improvements. We apply our proposal to BlobSeer, a representative data storage service specifically designed to achieve high aggregated throughputs, and show through extensive experimentation substantial improvements in the stability of individual data read accesses under MapReduce workloads.
GMonE: a complete approach to cloud monitoring
The inherent complexity of modern cloud infrastructures has created the need for innovative monitoring approaches, as state-of-the-art solutions used for other large-scale environments do not address specific cloud features. Although cloud monitoring is nowadays an active research field, a comprehensive study covering all its aspects has not yet been presented. This paper provides a deep insight into cloud monitoring. It proposes a unified cloud monitoring taxonomy, on which it bases the definition of a layered cloud monitoring architecture. To illustrate it, we have implemented GMonE, a general-purpose cloud monitoring tool which covers all aspects of cloud monitoring by specifically addressing the needs of modern cloud infrastructures. Furthermore, we have evaluated the performance, scalability and overhead of GMonE with the Yahoo Cloud Serving Benchmark (YCSB), using the OpenNebula cloud middleware on the Grid'5000 experimental testbed. The results of this evaluation demonstrate the benefits of our approach, surpassing the monitoring performance and capabilities of state-of-the-art alternatives such as those present in Amazon EC2 and OpenNebula.
An autonomic framework for enhancing the quality of data grid services
Data grid services have been used to deal with the increasing needs of applications in terms of data volume and throughput. The large scale, heterogeneity and dynamism of grid environments often make the management and tuning of these data services very complex. Furthermore, current high-performance I/O approaches are characterized by their high complexity and specific features, which usually require specialized administrator skills. Autonomic computing can help manage this complexity. The present paper describes an autonomic subsystem intended to provide self-management features aimed at efficiently mitigating the I/O problem in a grid environment, thereby enhancing the quality of service (QoS) of data access and storage services in the grid. Our proposal takes into account that data produced in an I/O system are not usually required immediately. Therefore, performance improvements are related not only to the current I/O access but also to future ones, as the actual data access usually occurs later on. Nevertheless, the exact time of the next I/O operations is unknown. Thus, our approach proposes a long-term prediction designed to forecast the future workload of grid components. This enables the autonomic subsystem to determine the optimal data placement to improve both current and future I/O operations.
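As an illustration of the long-term prediction idea, the sketch below forecasts a component's future I/O load with simple exponential smoothing and places new data on the least-loaded node. The smoothing model, the `alpha` value and the function names are assumptions made for illustration; the paper's actual predictor is not specified here.

```python
def forecast_workload(history, alpha=0.3):
    """Exponentially smoothed estimate of a grid component's future
    I/O load, computed from its observed per-period request counts."""
    if not history:
        return 0.0
    level = history[0]
    for obs in history[1:]:
        # Recent observations weigh alpha, the running level 1 - alpha.
        level = alpha * obs + (1 - alpha) * level
    return level

def best_placement(node_histories, alpha=0.3):
    """Pick the node with the lowest forecast load for new data.
    node_histories maps node name -> list of observed load samples."""
    return min(node_histories,
               key=lambda n: forecast_workload(node_histories[n], alpha))
```

Because placement decisions target forecast load rather than the instantaneous one, they benefit the future I/O accesses the abstract refers to, not only the current operation.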
Towards efficient localization of dynamic replicas for Geo-Distributed data stores
Large-scale scientific experiments increasingly rely on geo-distributed clouds to serve relevant data to scientists worldwide with minimal latency. State-of-the-art caching systems often require the client to access the data through a caching proxy, or to contact a metadata server to locate the closest available copy of the desired data. Also, such caching systems are inconsistent with the design of distributed hash-table databases such as Dynamo, which focus on allowing clients to locate data independently. We argue there is a gap between existing state-of-the-art solutions and the needs of geographically distributed applications, which require fast access to popular objects while not degrading access latency for the rest of the data. In this paper, we introduce a probabilistic algorithm allowing the user to locate the closest copy of the data efficiently and independently with minimal overhead, allowing low-latency access to non-cached data. Also, we propose a network-efficient technique to identify the most popular data objects in the cluster and trigger their replication close to the clients. Experiments with a real-world data set show that these principles allow clients to locate the closest available copy of data with a small memory footprint and low error rate, thus improving read latency for non-cached data and allowing hot data to be read locally.
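One standard way to estimate object popularity with a small, fixed memory footprint is a count-min sketch. The code below is an illustrative assumption (the class and function names, the sha256-based hashing and the threshold rule are not taken from the paper), showing how hot objects could be flagged for replication:

```python
import hashlib

class CountMinSketch:
    """Fixed-size frequency summary: estimates never undercount,
    and overcount only on hash collisions across all rows."""

    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One counter cell per row, chosen by a row-salted hash.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{key}".encode()).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, key):
        """Record one access to `key`."""
        for row, col in self._cells(key):
            self.table[row][col] += 1

    def estimate(self, key):
        """Upper-bounded frequency estimate: min over the key's cells."""
        return min(self.table[row][col] for row, col in self._cells(key))

def hot_objects(sketch, keys, threshold):
    """Keys whose estimated access count crosses the replication threshold."""
    return [k for k in keys if sketch.estimate(k) >= threshold]
```

Each node could maintain such a sketch locally and exchange only the few keys that cross the threshold, which is one way to keep popularity-detection traffic network-efficient.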
Estrategia de formación ambiental del Centro de Investigaciones y Servicios Ambientales Ecovida. Validación
Driven by the need to improve the environmental training processes at the Center for Environmental Research and Services (Ecovida), an investigation was developed around the identified problem: how can popular environmental education contribute to improving Ecovida's environmental training so as to influence strategic social actors and the scientific processes the institution manages? The objective was to ground a training strategy based on the theoretical-methodological conception of popular environmental education, directed at the institution's groups of strategic social actors and scientific processes. This strategy opted for the integration of knowledge, disciplines and processes through a participatory methodology, which made it possible to complete cycles in synergy with training, research, and the introduction of scientific results, fostering critical thinking and a systemic approach in environmental training. For its development and implementation, empirical methods were used, such as participatory action research, documentary analysis and the Delphi method, together with techniques such as discussion groups, Ishikawa diagrams, the SWOT matrix, the semantic differential and focus groups, which facilitated the analysis and processing of the information produced, as well as the collective construction and validation of the strategy.
The results obtained contributed to satisfying the learning demands of the groups of strategic actors, to integrating the institution's scientific processes, and to strengthening the students' capacities and abilities to develop participatory environmental management practices with clear roles and political commitment.