JUMMP: Job Uninterrupted Maneuverable MapReduce Platform
In this paper, we present JUMMP, the Job Uninterrupted
Maneuverable MapReduce Platform, an automated
scheduling platform that provides a customized Hadoop environment
within a batch-scheduled cluster environment. JUMMP
enables an interactive pseudo-persistent MapReduce platform
within the existing administrative structure of an academic high
performance computing center by “jumping” between nodes with
minimal administrative effort. Jumping is implemented by synchronizing
the stopping and starting of daemon processes on
different nodes in the cluster. Our experimental evaluation shows
that JUMMP can be as efficient as a persistent Hadoop cluster
on dedicated computing resources, depending on the jump time.
Additionally, we show that the cluster remains stable, with good
performance, in the presence of jumps that occur as frequently
as the average length of reduce tasks of the currently executing
MapReduce job. JUMMP provides an attractive solution to
academic institutions that desire to integrate Hadoop into their
current computing environment within their financial, technical,
and administrative constraints.
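The "jump" described above, which stops a daemon on one node and starts it on another, can be illustrated with a toy, single-process model. This is a minimal sketch under stated assumptions, not JUMMP's implementation. The `Cluster` class, the `jump` function, and the node names are hypothetical. `tasktracker` is used only as an example of a Hadoop 1.x daemon name. The start-before-stop ordering is an assumption chosen so that the daemon's role never goes unhosted. The abstract does not specify the ordering.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # Hypothetical toy model: node name -> names of daemons running there.
    running: dict[str, set[str]] = field(default_factory=dict)

    def start(self, node: str, daemon: str) -> None:
        self.running.setdefault(node, set()).add(daemon)

    def stop(self, node: str, daemon: str) -> None:
        self.running.get(node, set()).discard(daemon)

    def hosts(self, daemon: str) -> list[str]:
        # Sorted list of nodes currently running `daemon`.
        return sorted(n for n, ds in self.running.items() if daemon in ds)


def jump(cluster: Cluster, daemon: str, old: str, new: str) -> None:
    """Relocate `daemon` from node `old` to node `new`.

    Assumption (not taken from the paper): the daemon is started on the new
    node before it is stopped on the old one, so at least one copy is
    running throughout the jump.
    """
    if daemon not in cluster.running.get(old, set()):
        raise ValueError(f"{daemon!r} is not running on {old!r}")
    cluster.start(new, daemon)
    cluster.stop(old, daemon)


c = Cluster()
c.start("node01", "tasktracker")
jump(c, "tasktracker", "node01", "node07")
print(c.hosts("tasktracker"))  # ['node07']
```

In a real batch-scheduled cluster, each step would also involve acquiring the new node from the scheduler and handing off the daemon's state. The sketch models only the ordering of the stop and start operations.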
Scalable Audience Reach Estimation in Real-time Online Advertising
In recent years, online advertising has become one of the most efficient methods
of advertising. Yet, advertisers are concerned
about the efficiency of their online advertising campaigns and consequently,
would like to restrict their ad impressions to certain websites and/or certain
groups of audience. These restrictions, known as targeting criteria, limit the
reachability for better performance. This trade-off between reachability and
performance illustrates a need for a forecasting system that can quickly
predict/estimate (with good accuracy) this trade-off. Designing such a system
is challenging due to (a) the huge amount of data to process and (b) the need
for fast and accurate estimates. In this paper, we propose a distributed, fault-tolerant
system that can generate such estimates quickly and with good accuracy. The
main idea is to keep a small representative sample in memory across multiple
machines and formulate the forecasting problem as queries against the sample.
The key challenge is to find the best strata across the past data and perform
multivariate stratified sampling while ensuring a fuzzy fall-back that covers
small minorities. Our results show a significant improvement over the uniform
and simple stratified sampling strategies, which are currently widely used in
the industry.
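The core idea of answering reach queries against a small stratified sample can be sketched as follows. This is a minimal illustration built on plain stratified sampling with per-stratum weights N_h / n_h. It is not the paper's strata-selection method or its fuzzy fall-back. As a simple stand-in for covering small minorities, strata smaller than the per-stratum budget are kept whole. The function names, the record fields (`site`, `age`), and the data are hypothetical.

```python
import random
from collections import defaultdict
from collections.abc import Callable, Hashable

Record = dict[str, str]


def stratified_sample(
    records: list[Record],
    stratum_of: Callable[[Record], Hashable],
    per_stratum: int,
    seed: int = 0,
) -> list[tuple[Record, float]]:
    """Draw up to `per_stratum` records from each stratum.

    Each sampled record carries weight N_h / n_h, the number of population
    records it stands in for. Strata smaller than `per_stratum` are kept
    whole (weight 1.0), so small minorities are never dropped.
    """
    rng = random.Random(seed)
    strata: dict[Hashable, list[Record]] = defaultdict(list)
    for r in records:
        strata[stratum_of(r)].append(r)

    sample: list[tuple[Record, float]] = []
    for members in strata.values():
        n = min(per_stratum, len(members))
        weight = len(members) / n
        sample.extend((r, weight) for r in rng.sample(members, n))
    return sample


def estimate_reach(
    sample: list[tuple[Record, float]],
    targeting: Callable[[Record], bool],
) -> float:
    """Estimated number of population records matching `targeting`."""
    return sum(w for r, w in sample if targeting(r))


users = [{"site": "news", "age": "18-24" if i % 4 == 0 else "25+"} for i in range(1000)]
users += [{"site": "sports", "age": "18-24"} for _ in range(10)]
sample = stratified_sample(users, lambda r: (r["site"], r["age"]), per_stratum=50)
print(estimate_reach(sample, lambda r: r["site"] == "sports"))  # 10.0
```

Queries that align with stratum boundaries, such as the one above, are answered exactly. Targeting criteria that cut across strata yield estimates whose accuracy depends on the per-stratum sample size. This is why choosing good strata matters.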
Designing, Building, and Modeling Maneuverable Applications within Shared Computing Resources
Extending the military principle of maneuver into the war-fighting domain of cyberspace, academic and military researchers have produced many theoretical and strategic works, though few have focused on actual applications and systems that apply this principle. We present our research in designing, building, and modeling maneuverable applications in order to gain the system advantages of resource provisioning, application optimization, and cybersecurity improvement. We coined the phrase “Maneuverable Applications” to denote distributed and parallel applications that take advantage of the modification, relocation, addition, or removal of computing resources, giving the perception of movement. Our work with maneuverable applications has been within shared computing resources, such as the Clemson University Palmetto cluster, where multiple users share access to, and time on, a collection of inter-networked computers and servers. In this dissertation, we describe our implementation and analytic modeling of environments and systems that maneuver computational nodes, network capabilities, and security enhancements to overcome challenges to a cyberspace platform. Specifically, we describe our work to create a system that provisions a big data computational resource within academic environments. We also present a computing testbed built to allow researchers to study network optimizations of data centers. We discuss our Petri Net model of an adaptable system, which increases its cybersecurity posture in the face of varying levels of threat from malicious actors. Lastly, we present our investigation into integrating these technologies into a prototype resource manager for maneuverable applications and validating our model using this implementation.
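For readers unfamiliar with Petri Nets, the standard place/transition firing rule can be sketched as follows. A transition is enabled when each input place holds enough tokens. Firing it consumes those tokens and produces tokens in the output places. This sketch illustrates only the generic formalism and is not the dissertation's model. The places `normal`, `threat`, and `hardened` and the transitions `detect`, `harden`, and `relax` are hypothetical stand-ins for an adaptable system's security states.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    name: str
    inputs: dict[str, int]   # place -> tokens consumed when firing
    outputs: dict[str, int]  # place -> tokens produced when firing


def enabled(marking: dict[str, int], t: Transition) -> bool:
    # A transition may fire only if every input place has enough tokens.
    return all(marking.get(p, 0) >= k for p, k in t.inputs.items())


def fire(marking: dict[str, int], t: Transition) -> dict[str, int]:
    # Return the new marking; the original marking is left unchanged.
    if not enabled(marking, t):
        raise ValueError(f"transition {t.name!r} is not enabled")
    new = dict(marking)
    for p, k in t.inputs.items():
        new[p] = new.get(p, 0) - k
    for p, k in t.outputs.items():
        new[p] = new.get(p, 0) + k
    return new


# Hypothetical security states of an adaptable system.
detect = Transition("detect", {"normal": 1}, {"threat": 1})
harden = Transition("harden", {"threat": 1}, {"hardened": 1})
relax = Transition("relax", {"hardened": 1}, {"normal": 1})

m = {"normal": 1}
m = fire(m, detect)  # {'normal': 0, 'threat': 1}
m = fire(m, harden)  # {'normal': 0, 'threat': 0, 'hardened': 1}
```

Richer models typically add timing or probabilities to transitions, for example to represent varying threat levels. The untimed form above shows only how the marking evolves.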
MaxHadoop: An Efficient Scalable Emulation Tool to Test SDN Protocols in Emulated Hadoop Environments
This paper presents MaxHadoop, a flexible and scalable emulation tool that allows the efficient and accurate emulation of Hadoop environments over Software Defined Networks (SDNs). Hadoop has been designed to manage endless data streams over networks, making it a well-suited candidate to support the new class of network services belonging to Big Data. The development of Hadoop is contemporary with the evolution of networks towards the new "Software Defined" architectures. To create our emulation environment, tailored to SDNs, we employ MaxiNet, given its capability of emulating large-scale SDNs. We make it possible to emulate realistic Hadoop scenarios on large-scale SDNs using low-cost commodity hardware by resolving a few key limitations of MaxiNet through appropriate configuration settings. We validate the MaxHadoop emulator by executing two benchmarks, namely WordCount and TeraSort, to evaluate a set of Key Performance Indicators. The test outcomes show that MaxHadoop outperforms other existing emulation tools running over commodity hardware. Finally, we show the potential of MaxHadoop by using it to compare SDN-based network protocols.
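The WordCount benchmark mentioned above follows the classic map, shuffle, and reduce pattern. The following single-process Python sketch shows the logic of the three phases. It is not Hadoop's Java API or the benchmark code the authors ran. In a real cluster, the shuffle step moves intermediate pairs between nodes, and that cross-node traffic is what a network emulator exercises. The function names are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every whitespace-separated token.
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group intermediate values by key. In Hadoop, this is the step
    # that transfers map output across the network to the reducers.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: sum the counts for one word.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    pairs = (kv for line in lines for kv in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


print(word_count(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```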
Novel information and data exchange within power systems using enhanced blockchain technologies
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Current energy systems are primarily designed for centralized power generation and supplying bulk electricity to users with stable and predictable usage patterns. However, with the increasing penetration of renewable energy sources (RES), future energy systems will require greater flexibility and wider distribution of both demand and supply. Integrating RES on a large scale poses challenges to the hosting capacity of distribution systems. To address these challenges, the digitalization of energy systems through novel Information and Communication Technologies (ICT) infrastructure is essential. The shift from centralized to highly distributed systems necessitates increased coordination and communication efforts. This is because a distributed system is composed of multiple independent entities that need to communicate and collaborate effectively to accomplish a shared objective. Coordination and communication are necessary to ensure that the system is operating efficiently and effectively.
Traditional centralized cloud-based data exchange schemes depend on a single trusted third party, which may lead to a single point of failure and a lack of data privacy and access control. To overcome these issues, a novel approach is proposed for exchanging data within power systems using blockchain technology. This approach enables users to securely exchange data while maintaining ownership. The experiments conducted demonstrate that the proposed approach can handle more users and enables information and data exchange within power systems.
Secondly, this thesis proposes an Artificial Neural Network (ANN) based prediction model to optimize the performance of the blockchain-enabled data exchange approach. A use case for exchanging data within the power system is implemented on the proposed platform and evaluated using various performance metrics. The results of the proposed approach are compared to two other schemes: a baseline scheme and an optimized scheme. The evaluation results indicate that the proposed approach can enhance network performance when compared to the baseline and optimized schemes.
In summary, this thesis proposes a novel approach to ICT infrastructure for exchanging information and data among power system entities. The performance of the novel approach is evaluated based on its ability to handle multiple users, scalability, reliability, and security.
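The blockchain-based data exchange described above relies on two properties: records that cannot be altered silently, and access that remains under the data owner's control. A toy, single-process hash chain can illustrate both. This is a minimal sketch under loud assumptions, not the thesis's platform. It has no consensus, no peer network, and no smart contracts. It stores only a SHA-256 digest of each payload, and the owner-granted reader list is a hypothetical stand-in for whatever access-control scheme the thesis uses. The class names and the parties `prosumer_A`, `dso`, and `aggregator` are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


@dataclass
class Block:
    index: int
    owner: str
    payload_digest: str   # SHA-256 of the shared data, not the data itself
    readers: list[str]    # parties the owner has granted access to
    prev_hash: str
    block_hash: str = field(init=False)

    def __post_init__(self) -> None:
        self.block_hash = self.compute_hash()

    def compute_hash(self) -> str:
        # Hash every field except the stored hash itself.
        body = json.dumps(
            [self.index, self.owner, self.payload_digest, self.readers, self.prev_hash]
        )
        return hashlib.sha256(body.encode()).hexdigest()


class Ledger:
    def __init__(self) -> None:
        self.chain: list[Block] = []

    def append(self, owner: str, data: bytes, readers: list[str]) -> Block:
        prev = self.chain[-1].block_hash if self.chain else GENESIS_PREV
        digest = hashlib.sha256(data).hexdigest()
        block = Block(len(self.chain), owner, digest, list(readers), prev)
        self.chain.append(block)
        return block

    def can_read(self, index: int, party: str) -> bool:
        # The owner, plus anyone the owner listed, may access the record.
        b = self.chain[index]
        return party == b.owner or party in b.readers

    def is_valid(self) -> bool:
        # Any edit to a block breaks its own hash or the next block's link.
        for i, b in enumerate(self.chain):
            expected_prev = self.chain[i - 1].block_hash if i else GENESIS_PREV
            if b.prev_hash != expected_prev or b.block_hash != b.compute_hash():
                return False
        return True


ledger = Ledger()
ledger.append("prosumer_A", b"meter reading 4.2 kWh", ["dso"])
print(ledger.can_read(0, "dso"), ledger.can_read(0, "aggregator"))  # True False
ledger.chain[0].readers.append("aggregator")  # tamper with the access list
print(ledger.is_valid())  # False
```

Because the reader list is part of the hashed block body, granting access without appending a new block is detectable. A real deployment would replicate the chain across participants so that no single party can rewrite it.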