
    Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures

    One of the significant shifts in next-generation computing technologies will certainly be in the development of Big Data (BD) deployment architectures. Apache Hadoop, the BD landmark, has evolved into a widely deployed BD operating system. Its new features include a federation structure and many associated frameworks, which give Hadoop 3.x the maturity to serve different markets. This dissertation addresses two leading issues involved in exploiting BD and large-scale data analytics on the Hadoop platform: (i) scalability, which directly affects system performance and overall throughput, addressed using portable Docker containers; and (ii) security, which spreads the adoption of data protection practices among practitioners, addressed using access controls. The main contributions of this thesis are an Enhanced MapReduce Environment (EME), an OPportunistic and Elastic Resource Allocation (OPERA) scheduler, a BD Federation Access Broker (BDFAB), and a Secure Intelligent Transportation System (SITS) with a multi-tier architecture for streaming data to the cloud.

    Adaptive Dispatching of Tasks in the Cloud

    The increasingly wide application of Cloud Computing enables the consolidation of tens of thousands of applications in shared infrastructures. Thus, meeting the quality-of-service requirements of so many diverse applications in such shared resource environments has become a real challenge, especially since the characteristics and workload of applications differ widely and may change over time. This paper presents an experimental system that can exploit a variety of online quality-of-service-aware adaptive task allocation schemes, and three such schemes are designed and compared: a measurement-driven algorithm that uses reinforcement learning; a "sensible" allocation algorithm that assigns jobs to the sub-systems observed to provide lower response times; and an algorithm that splits the job arrival stream into sub-streams at rates computed from the hosts' processing capabilities. All of these schemes are compared via measurements, among themselves and against a simple round-robin scheduler, on two experimental test-beds with homogeneous and heterogeneous hosts of different processing capacities.
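
    A minimal sketch, under assumed names, of two of the allocation schemes summarized above: the "sensible" scheme routes each job to the host currently observed to respond fastest, while the stream-splitting scheme dispatches jobs at rates proportional to the hosts' (assumed known) processing capacities; a round-robin baseline is included for comparison. This is illustrative only, not the paper's implementation.

    import itertools
    import random

    class Host:
        def __init__(self, name, capacity):
            self.name = name
            self.capacity = capacity              # relative processing capacity (assumed known)
            self.avg_response = float("inf")      # no response-time measurement yet

        def record(self, response_time, alpha=0.2):
            # Exponential moving average of measured job response times.
            if self.avg_response == float("inf"):
                self.avg_response = response_time
            else:
                self.avg_response = (1 - alpha) * self.avg_response + alpha * response_time

    def sensible_dispatch(hosts):
        # "Sensible" scheme: send the job to the host observed to respond fastest.
        return min(hosts, key=lambda h: h.avg_response)

    def splitting_dispatch(hosts):
        # Stream-splitting scheme: dispatch at rates proportional to host capacities.
        return random.choices(hosts, weights=[h.capacity for h in hosts])[0]

    hosts = [Host("fast", capacity=4), Host("slow", capacity=1)]
    round_robin = itertools.cycle(hosts)          # baseline scheduler used for comparison
    hosts[0].record(0.8)                          # feed in measured response times
    hosts[1].record(2.5)
    print(sensible_dispatch(hosts).name)          # -> "fast"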

    Cloud Services and Application Opportunities

    This paper presents a current vision of cloud computing and identifies various commercially available cloud services that promise to deliver infrastructure on demand. It defines cloud computing and provides architectural detail on different types of clouds, such as Blue Cloud, built on IBM's massive-scale computing initiatives; Google Cloud, where businesses are claimed to be able to start using Google Apps online almost instantly; and the salesforce.com cloud architecture, which offers development as a service, a set of development tools and APIs that enable enterprise developers to easily harness the promise of cloud computing. Cloud computing is changing the way we provision hardware and software for on-demand capacity fulfillment and the way we develop web applications and make business decisions.

    Optimizing the performance of optimization in the cloud environment–An intelligent auto-scaling approach

    The cloud computing paradigm has gained wide acceptance in the scientific community, taking a significant share from fields previously reserved exclusively for High Performance Computing (HPC). On-demand access to a large amount of computing resources provided by the Cloud makes it ideal for executing large-scale optimizations using evolutionary algorithms without the need to own any computing infrastructure. In this regard, we extended WoBinGO, an existing parallel software framework for genetic-algorithm-based optimization, to be used in the Cloud. With these extensions, the framework is capable of elastically and frugally utilizing the underlying cloud computing infrastructure for performing computationally expensive fitness evaluations. We studied two issues that are pertinent when dealing with large-scale optimization in the elastic cloud environment: the computing-instance launching overhead and the price of engaging the Cloud for solving optimization problems, in terms of the instances' cumulative uptime. To explain the usability limits of the WoBinGO framework running in an IaaS environment, we provide a comprehensive analysis of the framework's performance. Reducing both the total optimization time and the total cumulative uptime minimizes the cost of cloud resource utilization. To this end, we propose an intelligent decision support engine based on artificial neural networks and metaheuristics that provides the user with an assessment of the framework's behavior on the underlying infrastructure in terms of optimization duration and the cost of resource consumption. Given this assessment, the user can decide between faster delivery of results and lower infrastructure costs. The proposed software framework has been used to solve a complex real-world optimization problem of subsurface rock mass model calibration. The results obtained from a private OpenStack deployment show that, by using the proposed decision support engine, significant savings can be achieved in both optimization time and optimization cost.
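
    To illustrate the trade-off the decision support engine resolves, the sketch below uses a deliberately simplistic placeholder predictor (the actual engine described above relies on artificial neural networks and metaheuristics) to estimate optimization duration and cumulative instance uptime for candidate worker counts, then picks the cheapest configuration that meets a deadline or the fastest one within a budget. All function names and numbers are illustrative assumptions, not part of WoBinGO.

    def predict(workers, evals=10_000, eval_time=2.0, launch_overhead=60.0):
        # Placeholder predictor: duration shrinks with more workers, uptime grows.
        duration = launch_overhead + (evals * eval_time) / workers
        uptime = workers * duration            # cumulative instance uptime (seconds), a cost proxy
        return duration, uptime

    def choose(candidates, deadline=None, budget=None):
        options = [(w, *predict(w)) for w in candidates]
        if deadline is not None:
            feasible = [o for o in options if o[1] <= deadline] or options
            return min(feasible, key=lambda o: o[2])   # cheapest configuration meeting the deadline
        if budget is not None:
            feasible = [o for o in options if o[2] <= budget] or options
            return min(feasible, key=lambda o: o[1])   # fastest configuration within the budget
        return min(options, key=lambda o: o[2])        # default: cheapest overall

    workers, duration, uptime = choose([4, 8, 16, 32, 64], deadline=1500)
    print(f"{workers} workers -> {duration:.0f} s run, {uptime:.0f} instance-seconds")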

    M-Grid : A distributed framework for multidimensional indexing and querying of location based big data

    The widespread use of mobile devices and the real-time availability of user-location information are facilitating the development of new personalized, location-based applications and services (LBSs). Such applications require multi-attribute query processing, handling of high access scalability, support for millions of users, real-time querying capability and analysis of large volumes of data. Cloud computing gave rise to a new generation of distributed databases commonly known as key-value stores. Key-value stores were designed to extract value from very large volumes of data while being highly available, fault-tolerant and scalable, hence providing much-needed features to support LBSs. However, complex queries on multidimensional data cannot be processed efficiently, as key-value stores do not provide means to access multiple attributes. In this thesis we present M-Grid, a unifying indexing framework which enables key-value stores to support multidimensional queries. We organize a set of nodes in a P-Grid overlay network, which provides fault tolerance and efficient query processing. We use a Hilbert Space Filling Curve based linearization technique, which preserves data locality, to efficiently manage multi-dimensional data in a key-value store, and we propose algorithms to dynamically process range and k nearest neighbor (kNN) queries on the linearized values. This removes the overhead of maintaining a separate index table. Our approach is completely independent of the underlying storage layer and can be implemented on any cloud infrastructure. Experiments on Amazon EC2 show that M-Grid achieves a performance improvement of three orders of magnitude in comparison to MapReduce and a four-fold improvement over the MD-HBase scheme.
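
    A minimal sketch of the Hilbert space-filling-curve linearization idea described above: a 2-D location (x, y) is mapped to a one-dimensional Hilbert index, which then serves as the row key in a key-value store. The dictionary standing in for the store and the grid size are assumptions for illustration, not M-Grid's actual interface.

    def hilbert_index(n, x, y):
        # Map (x, y) on an n x n grid (n a power of two) to its Hilbert-curve index.
        d = 0
        s = n // 2
        while s > 0:
            rx = 1 if (x & s) > 0 else 0
            ry = 1 if (y & s) > 0 else 0
            d += s * s * ((3 * rx) ^ ry)
            if ry == 0:                        # rotate/reflect the quadrant to preserve locality
                if rx == 1:
                    x, y = n - 1 - x, n - 1 - y
                x, y = y, x
            s //= 2
        return d

    # Hypothetical usage: nearby locations map to nearby keys, so a spatial range
    # query can be served by a small number of contiguous key-range scans.
    store = {}                                 # stand-in for a distributed key-value store
    points = {"u1": (3, 4), "u2": (3, 5), "u3": (60, 2)}
    for uid, (x, y) in points.items():
        store[hilbert_index(64, x, y)] = uid   # 64 x 64 grid, for illustration only
    print(sorted(store))                       # u1 and u2 get neighboring keys; u3 lands far away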

    Efficient resource sharing for big data applications in shared clusters

    Modern data centers are shifting to shared clusters in which resources are shared among multiple users and frameworks. A key enabler for such shared clusters is a cluster resource management system that allocates resources among different frameworks. One key problem in these shared clusters is how to share cluster resources between multiple applications and users in an elastic and non-disruptive manner. Current cluster schedulers typically rely on kill-based preemption to coordinate resource sharing, achieve fairness and satisfy SLOs during resource contention, simply killing low-priority jobs and restarting them later when resources become available. This simple preemption policy ensures fast service times for high-priority jobs and prevents a single user or application from occupying too many resources and starving others; however, because it does not save the progress of preempted jobs, it causes significant resource waste and delays the response time of long-running or low-priority jobs. Dynamic resource sharing becomes even more problematic when different types of applications run on the same cluster (e.g., batch processing systems running alongside real-time streaming systems). Different application types often have different quality-of-service metrics (e.g., higher throughput versus lower latency), which can make resource sharing among these applications contentious. In this dissertation, we show the impact of kill-based preemption in modern shared clusters and propose two solutions for sharing resources more efficiently: checkpoint-based preemption and support for elasticity in distributed data stream processing systems.
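
    A toy sketch contrasting the two preemption policies discussed above: kill-based preemption discards all progress of a preempted job, whereas checkpoint-based preemption persists progress so the job resumes where it left off. Task, preempt_checkpoint and resume are illustrative names, not the dissertation's interfaces.

    from dataclasses import dataclass

    @dataclass
    class Task:
        name: str
        total_work: int
        done: int = 0                               # units of work already completed

    def preempt_kill(task):
        # Kill-based preemption: the job is killed, so all completed work is lost.
        task.done = 0

    def preempt_checkpoint(task, checkpoints):
        # Checkpoint-based preemption: persist progress before releasing resources.
        checkpoints[task.name] = task.done

    def resume(task, checkpoints):
        # Restart from the last checkpoint if one exists, otherwise from scratch.
        task.done = checkpoints.get(task.name, 0)

    checkpoints = {}
    low_priority = Task("batch-analytics", total_work=100, done=70)
    preempt_checkpoint(low_priority, checkpoints)   # a high-priority job needs the resources
    restarted = Task("batch-analytics", total_work=100)
    resume(restarted, checkpoints)
    print(restarted.done)                           # 70: only 30 units remain, instead of 100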