480 research outputs found
MOON: MapReduce On Opportunistic eNvironments
Abstract—MapReduce offers a flexible programming model for processing and generating large data sets on dedicated resources, where only a small fraction of such resources are every unavailable at any given time. In contrast, when MapReduce is run on volunteer computing systems, which opportunistically harness idle desktop computers via frameworks like Condor, it results in poor performance due to the volatility of the resources, in particular, the high rate of node unavailability. Specifically, the data and task replication scheme adopted by existing MapReduce implementations is woefully inadequate for resources with high unavailability. To address this, we propose MOON, short for MapReduce On Opportunistic eNvironments. MOON extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms in order to offer reliable MapReduce services on a hybrid resource architecture, where volunteer computing systems are supplemented by a small set of dedicated nodes. The adaptive task and data scheduling algorithms in MOON distinguish between (1) different types of MapReduce data and (2) different types of node outages in order to strategically place tasks and data on both volatile and dedicated nodes. Our tests demonstrate that MOON can deliver a 3-fold performance improvement to Hadoop in volatile, volunteer computing environments
Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures
One of the significant shifts of the next-generation computing technologies will certainly be in
the development of Big Data (BD) deployment architectures. Apache Hadoop, the BD
landmark, evolved as a widely deployed BD operating system. Its new features include
federation structure and many associated frameworks, which provide Hadoop 3.x with the
maturity to serve different markets. This dissertation addresses two leading issues involved in
exploiting BD and large-scale data analytics realm using the Hadoop platform. Namely,
(i)Scalability that directly affects the system performance and overall throughput using
portable Docker containers. (ii) Security that spread the adoption of data protection practices
among practitioners using access controls. An Enhanced Mapreduce Environment (EME),
OPportunistic and Elastic Resource Allocation (OPERA) scheduler, BD Federation Access Broker
(BDFAB), and a Secure Intelligent Transportation System (SITS) of multi-tiers architecture for
data streaming to the cloud computing are the main contribution of this thesis study
PER-MARE: Adaptive Deployment of MapReduce over Pervasive Grids
International audienceMapReduce is a parallel programming paradigm successfully used to perform computations on massive amounts of data, being widely deployed on clusters, grid, and cloud infrastructures. Interestingly, while the emergence of cloud in- frastructures has opened new perspectives, several enterprises hesitate to put sensible data on the cloud and prefer to rely on internal resources. In this paper we introduce the PER- MARE initiative, which aims at proposing scalable techniques to support existent MapReduce data-intensive applications in the context of loosely coupled networks such as pervasive and desktop grids. By relying on the MapReduce programming model, PER-MARE proposes to explore the potential advan- tages of using free unused resources available at enterprises as pervasive grids, alone or in a hybrid environment. This paper presents the main lines that orient the PER-MARE approach and some preliminary results
PS-Sim: A Framework for Scalable Simulation of Participatory Sensing Data
Emergence of smartphone and the participatory sensing (PS) paradigm have
paved the way for a new variant of pervasive computing. In PS, human user
performs sensing tasks and generates notifications, typically in lieu of
incentives. These notifications are real-time, large-volume, and multi-modal,
which are eventually fused by the PS platform to generate a summary. One major
limitation with PS is the sparsity of notifications owing to lack of active
participation, thus inhibiting large scale real-life experiments for the
research community. On the flip side, research community always needs ground
truth to validate the efficacy of the proposed models and algorithms. Most of
the PS applications involve human mobility and report generation following
sensing of any event of interest in the adjacent environment. This work is an
attempt to study and empirically model human participation behavior and event
occurrence distributions through development of a location-sensitive data
simulation framework, called PS-Sim. From extensive experiments it has been
observed that the synthetic data generated by PS-Sim replicates real
participation and event occurrence behaviors in PS applications, which may be
considered for validation purpose in absence of the groundtruth. As a
proof-of-concept, we have used real-life dataset from a vehicular traffic
management application to train the models in PS-Sim and cross-validated the
simulated data with other parts of the same dataset.Comment: Published and Appeared in Proceedings of IEEE International
Conference on Smart Computing (SMARTCOMP-2018
MAPREDUCE CHALLENGES ON PERVASIVE GRIDS
International audienceThis study presents the advances on designing and implementing scalable techniques to support the development and execution of MapReduce application in pervasive distributed computing infrastructures, in the context of the PER-MARE project. A pervasive framework for MapReduce applications is very useful in practice, especially in those scientific, enterprises and educational centers which have many unused or underused computing resources, which can be fully exploited to solve relevant problems that demand large computing power, such as scientific computing applications, big data processing, etc. In this study, we pro-pose the study of multiple techniques to support volatility and heterogeneity on MapReduce, by applying two complementary approaches: Improving the Apache Hadoop middleware by including context-awareness and fault-tolerance features; and providing an alternative pervasive grid implementation, fully adapted to dynamic environments. The main design and implementation decisions for both alternatives are described and validated through experiments, demonstrating that our approaches provide high reliability when executing on pervasive environments. The analysis of the experiments also leads to several insights on the requirements and constraints from dynamic and volatile systems, reinforcing the importance of context-aware information and advanced fault-tolerance features to provide efficient and reliable MapReduce services on pervasive grids
Towards Distributed Mobile Computing
In the latest years, we observed an exponential growth of the market of the mobile devices. In this scenario, it assumes a particular relevance the rate at which mobile devices are replaced. According to the International Telecommunicaton Union in fact, smart-phone owners replace their device every 20 months, on average. The side effect of this trend is to deal with the disposal of an increasing amount of electronic devices which, in many cases, arestill working. We believe that it is feasible to recover such an unexploited computational power. Through a change of paradigm in fact, it is possible to achieve a two-fold objective: 1) extend the mobile devices lifetime, 2) enable a new opportunity to speed up mobile applications. In this paper we aim at providing a survey of state-of-art solutions aim at going in the direction of a Distributed Mobile Computing paradigm. We put in evidence the challenges to be addressed in order to implement this paradigm and we propose some possible future improvements
Resource provisioning in Science Clouds: Requirements and challenges
Cloud computing has permeated into the information technology industry in the
last few years, and it is emerging nowadays in scientific environments. Science
user communities are demanding a broad range of computing power to satisfy the
needs of high-performance applications, such as local clusters,
high-performance computing systems, and computing grids. Different workloads
are needed from different computational models, and the cloud is already
considered as a promising paradigm. The scheduling and allocation of resources
is always a challenging matter in any form of computation and clouds are not an
exception. Science applications have unique features that differentiate their
workloads, hence, their requirements have to be taken into consideration to be
fulfilled when building a Science Cloud. This paper will discuss what are the
main scheduling and resource allocation challenges for any Infrastructure as a
Service provider supporting scientific applications
Energy-efficient Transitional Near-* Computing
Studies have shown that communication networks, devices accessing the Internet, and data centers account for 4.6% of the worldwide electricity consumption.
Although data centers, core network equipment, and mobile devices are getting more energy-efficient, the amount of data that is being processed, transferred, and stored is vastly increasing.
Recent computer paradigms, such as fog and edge computing, try to improve this situation by processing data near the user, the network, the devices, and the data itself.
In this thesis, these trends are summarized under the new term near-* or near-everything computing.
Furthermore, a novel paradigm designed to increase the energy efficiency of near-* computing is proposed: transitional computing.
It transfers multi-mechanism transitions, a recently developed paradigm for a highly adaptable future Internet, from the field of communication systems to computing systems.
Moreover, three types of novel transitions are introduced to achieve gains in energy efficiency in near-* environments, spanning from private Infrastructure-as-a-Service (IaaS) clouds, Software-defined Wireless Networks (SDWNs) at the edge of the network, Disruption-Tolerant Information-Centric Networks (DTN-ICNs) involving mobile devices, sensors, edge devices as well as programmable components on a mobile System-on-a-Chip (SoC).
Finally, the novel idea of transitional near-* computing for emergency response applications is presented
to assist rescuers and affected persons during an emergency event or a disaster, although connections to cloud services and social networks might be disturbed by network outages, and network bandwidth and battery power of mobile devices might be limited
- …