
    Big Data Application and System Co-optimization in Cloud and HPC Environment

    The emergence of big data requires powerful computational resources and memory subsystems that can be scaled efficiently to meet its demands. The cloud is a well-established computing paradigm that offers customized computing and memory resources matched to the scalable demands of big data applications. In addition, its flexible pay-as-you-go pricing model makes large-scale resources available at low cost and without infrastructure maintenance burdens. High performance computing (HPC), on the other hand, also provides powerful infrastructure with the potential to support big data applications. In this dissertation, we explore application and system co-optimization opportunities to support big data in both cloud and HPC environments. Specifically, we exploit the unique features of both application and system to uncover overlooked optimization opportunities and to tackle challenges that are difficult to address by looking at the application or system in isolation. To derive optimized deployment and runtime schemes from the characteristics of the workloads and their underlying systems, we divide the workloads into four categories: 1) memory intensive applications; 2) compute intensive applications; 3) both memory and compute intensive applications; 4) I/O intensive applications.

    When deploying memory intensive big data applications to public clouds, one important yet challenging problem is selecting an instance type whose memory capacity is large enough to prevent out-of-memory errors while the cost is minimized without violating performance requirements. We propose two techniques for efficient deployment of big data applications with dynamic and intensive memory footprints in the cloud. The first builds a performance-cost model that accurately predicts how, and by how much, virtual memory size slows down the application and, consequently, impacts the overall monetary cost. The second employs a lightweight memory usage prediction methodology based on dynamic meta-models adjusted to the application's own traits. The key idea is to eliminate periodic checkpointing and migrate the application only when the predicted memory usage exceeds the physical allocation.

    When moving compute intensive applications to the cloud, it is critical to make them scalable so that they can benefit from the massive cloud resources. We first take Kirchhoff's law, one of the most widely used physical laws across engineering disciplines, as an example workload. The key challenge of applying Kirchhoff's law to real-world applications at scale lies in the high, if not prohibitive, computational cost of solving a large number of nonlinear equations. We propose a high-performance deep-learning-based approach for Kirchhoff analysis, named HDK. HDK employs two techniques to improve performance: (i) early pruning of unqualified input candidates, which simplifies the equations and selects a meaningful input data range; and (ii) parallelization of forward labelling, which executes steps of the problem in parallel.

    For applications that are both memory and compute intensive in the cloud, we use a blockchain system as a benchmark. Existing blockchain frameworks present a technical barrier for many users who want to modify them or test out new research ideas. To make it worse, many advantages of blockchain systems can be demonstrated only at large scales, which are not always available to researchers. We therefore develop an accurate and efficient emulation system that replays the execution of large-scale blockchain systems on tens of thousands of nodes in the cloud.

    For I/O intensive applications, we observe one important yet often neglected side effect of lossy scientific data compression. Lossy compression techniques have demonstrated promising results in significantly reducing scientific data size while guaranteeing compression error bounds, but the compressed data size is often highly skewed and thus degrades the performance of parallel I/O. We therefore believe it is critical to pay more attention to the unbalanced parallel I/O caused by lossy scientific data compression.
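    The second approach above, migrating only when predicted memory usage exceeds the physical allocation, can be sketched roughly as follows. This is an illustrative sketch, not the dissertation's actual meta-model: the linear-trend predictor, function names, and the safety margin are all assumptions made for clarity.

```python
def predict_next_usage(samples, horizon=1):
    """Fit a simple linear trend to recent memory samples (MB)
    and extrapolate `horizon` steps ahead."""
    n = len(samples)
    if n < 2:
        return samples[-1]
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return samples[-1] + slope * horizon

def should_migrate(samples, physical_mb, safety_margin=0.9):
    """Trigger migration only when predicted usage exceeds a safe
    fraction of the instance's physical memory -- no periodic
    checkpointing is needed in between."""
    return predict_next_usage(samples) > physical_mb * safety_margin
```

    A runtime monitor would feed this predicate with a sliding window of memory samples and, on a positive result, migrate the application to a larger instance type.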

    Cloud adoption for organisations in the eThekwini area.

    Masters Degree. University of KwaZulu-Natal, Durban. Cloud computing is a computing model that enables developing countries to open new business ventures without having to spend extensive amounts of money on upfront capital investment; “cloud computing is a practical approach to experience direct cost benefits, and it has the potential to transform a data centre from a capital-intensive set up to a variable priced environment. The main character of cloud computing is in the virtualization, distribution and dynamically extendibility” (Chauhan, 2012, p. 1). Of all the models that use the network as a means of delivering computing resources, cloud computing is the best one yet; the cloud is more scalable and allows consumers to add and remove resources as their computational needs change without impacting business processes (Nuseibeh, 2011). Organisations stand to gain further benefits from cloud adoption, but despite all these opportunities, the rate at which organisations are adopting cloud computing in South Africa is increasing more slowly than expected. Statistics released in 2018 by the Business Software Alliance (BSA) in its Global Cloud Computing Scorecard highlighted that South Africa had fallen behind in its efforts to adopt cloud computing, and various reasons were identified as causes of this lag (BSA, 2018). This research study aimed to investigate potential issues that affected organisations' desire to adopt the cloud, resulting in the low adoption rate. The technology-organisation-environment (TOE) framework was used in this study, and four research questions were developed to achieve its objectives. A sample of organisations in the KwaZulu-Natal province was identified using the convenience sampling technique, and an online survey hosted on SurveyMonkey was sent to the selected organisations. The collected data were analysed using SPSS. The analysis found that the biggest challenges organisations faced stemmed from external factors, such as infrastructure readiness, over which organisations had no control. Internal challenges also affected organisations' adoption and usage of the cloud, but when the data were grouped into internal and external factors, external issues were found to affect organisations more than internal ones.

    Parallel programming paradigms and frameworks in big data era

    With cloud computing emerging as a promising approach for ad-hoc parallel data processing, major companies have started to integrate frameworks for parallel data processing into their product portfolios, making it easy for customers to access these services and deploy their programs. We have entered the era of Big Data. The explosion and profusion of available data in a wide range of application domains raise new challenges and opportunities in a plethora of disciplines, ranging from science and engineering to biology and business. One major challenge is how to take advantage of the unprecedented scale of data, typically of heterogeneous nature, in order to acquire further insights and knowledge for improving the quality of the offered services. To exploit this new resource, we need to scale up and scale out both our infrastructures and our standard techniques. Our society is already data-rich, but the question remains whether we have the conceptual tools to handle it. In this paper we discuss and analyze opportunities and challenges for efficient parallel data processing. Big Data is the next frontier for innovation, competition, and productivity, and many solutions continue to appear, partly supported by the considerable enthusiasm around the MapReduce paradigm for large-scale data analysis. We review various parallel and distributed programming paradigms, analyzing how they fit into the Big Data era, and present modern emerging paradigms and frameworks. To better support practitioners interested in this domain, we end with an analysis of ongoing research challenges towards a truly fourth-generation data-intensive science.
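    The MapReduce paradigm mentioned above can be illustrated with a minimal local word count that makes the map, shuffle, and reduce phases explicit. This is a toy single-process sketch; real frameworks such as Hadoop distribute each phase across a cluster.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs parallel processing", "parallel data processing"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
# counts["data"] == 2, counts["parallel"] == 2
```

    Because map and reduce are pure functions over independent keys, the framework is free to run them on many machines in parallel, which is the property that made the paradigm attractive for large-scale data analysis.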

    E-MDAV: A Framework for Developing Data-Intensive Web Applications

    The ever-increasing adoption of innovative technologies, such as big data and cloud computing, provides significant opportunities for organizations operating in the IT domain, but it also introduces considerable challenges. Such innovations call for development processes that better align with stakeholders' needs and expectations. In this respect, this paper introduces a development framework based on the OMG's Model Driven Architecture (MDA) that aims to support the development lifecycle of data-intensive web applications. The proposed framework, named E-MDAV (Extended MDA-VIEW), defines a methodology that exploits a chain of model transformations to effectively cope with both forward- and reverse-engineering aspects. In addition, E-MDAV specifies a reference architecture for driving the implementation of a tool that supports the various professional roles involved in the development and maintenance of data-intensive web applications. To evaluate the effectiveness of the proposed E-MDAV framework, a tool prototype has been developed and applied to two different application scenarios; the obtained results have been compared with historical data from similar development projects in order to measure and discuss the benefits of the proposed approach.
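    The "chain of model transformations" at the heart of MDA can be sketched in miniature: a platform-independent model (PIM) is transformed into a platform-specific model (PSM), which is then transformed into code. The dictionaries and the SQL template below are illustrative assumptions, not E-MDAV's actual metamodels or generators.

```python
def pim_to_psm(pim):
    """First transformation: map abstract entities to relational tables."""
    return {"tables": [{"name": e["name"].lower(), "columns": e["attributes"]}
                       for e in pim["entities"]]}

def psm_to_code(psm):
    """Second transformation: generate SQL DDL from the
    platform-specific model."""
    ddl = []
    for t in psm["tables"]:
        cols = ", ".join(f"{c} TEXT" for c in t["columns"])
        ddl.append(f"CREATE TABLE {t['name']} ({cols});")
    return "\n".join(ddl)

pim = {"entities": [{"name": "Customer", "attributes": ["id", "email"]}]}
sql = psm_to_code(pim_to_psm(pim))
# sql == 'CREATE TABLE customer (id TEXT, email TEXT);'
```

    Reverse engineering, which E-MDAV also addresses, would run such a chain in the opposite direction, recovering models from existing code.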

    A massive simultaneous cloud computing platform for OpenFOAM

    Today the field of numerical simulation faces increasing demands for data-intensive investigations. On the one hand, engineering tasks call for parameter studies, sensitivity analyses, and optimization runs of ever-increasing size and magnitude. In addition, the field of Artificial Intelligence (AI), with its notorious hunger for data, calls for ever more extensive, numerically derived learning, testing, and validation input for training e.g. Artificial Neural Networks (ANNs). On the other hand, the current ‘age of cloud computing’ has set the stage such that nowadays any user of simulation software has access to potentially limitless hardware resources. In the light of these challenges and opportunities, Zurich University of Applied Sciences (ZHAW) and Kaleidosim Technologies AG (Kaleidosim) have developed a publicly available Massive Simultaneous Cloud Computing (MSCC) platform for OpenFOAM. The platform is specifically tailored to yield vast amounts of simulation data in minimal Wall Clock Time (WCT). After approximately nine person-years of development effort, the platform now features:
    • an instructive web-browser-based user interface (Web Interface);
    • an Application Programming Interface (API);
    • a Self-Compile option enabling users to run self-composed OpenFOAM applications directly in the cloud;
    • the Massive Simultaneous Cloud Computing (MSCC) feature, which allows the orchestration of up to 500 cloud-based OpenFOAM simulation runs simultaneously;
    • the option to run ParaView in batch mode so that (semi-)automated cloud-based post-processing can be performed;
    • the Katana File Downloader (KFD), allowing the selective download of specific output data.
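    The orchestration pattern behind the MSCC feature, launching many independent simulation cases at once and collecting results as they finish, can be sketched generically. The `launch_case` callable stands in for a call to the platform's API; the actual Kaleidosim endpoints are not documented here, so everything below is an illustrative assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_simultaneously(cases, launch_case, max_parallel=500):
    """Run up to `max_parallel` independent cases concurrently and
    return a {case: result} mapping as runs complete."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(launch_case, case): case for case in cases}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results

# Example with a stand-in launcher; a real launcher would submit the
# OpenFOAM case setup to the cloud API and poll until it completes.
demo = run_simultaneously(range(5), lambda case_id: f"case-{case_id}: done")
```

    Because each case is an independent simulation, the wall-clock time for a 500-case parameter study approaches that of a single run, which is the platform's stated goal.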

    Fog based intelligent transportation big data analytics in the internet of vehicles environment: motivations, architecture, challenges, and critical issues

    The intelligent transportation system (ITS) concept was introduced to increase road safety, manage traffic efficiently, and preserve our green environment. Nowadays, ITS applications are becoming more data-intensive and their data are described using the '5Vs of Big Data'. Thus, to fully utilize such data, big data analytics needs to be applied. The Internet of Vehicles (IoV) connects ITS devices to cloud computing centres, where data processing is performed. However, transferring huge amounts of data from geographically distributed devices creates network overhead and bottlenecks and consumes network resources. In addition, the centralized approach to processing ITS big data results in high latency, which cannot be tolerated by delay-sensitive ITS applications. Fog computing is considered a promising technology for real-time big data analytics: it complements the role of cloud computing by distributing data processing to the edge of the network, which provides faster responses to ITS application queries and saves network resources. However, implementing fog computing and the lambda architecture for real-time big data processing is challenging in the dynamic IoV environment. In this regard, this paper proposes a novel architecture for real-time ITS big data analytics in the IoV environment. The proposed architecture merges three dimensions: an intelligent computing (i.e. cloud and fog computing) dimension, a real-time big data analytics dimension, and an IoV dimension. Moreover, this paper gives a comprehensive description of the IoV environment, the ITS big data characteristics, the lambda architecture for real-time big data analytics, and several intelligent computing technologies. More importantly, it discusses the opportunities and challenges facing the implementation of fog computing and real-time big data analytics in the IoV environment. Finally, the critical issues and future research directions section discusses issues that should be considered in order to efficiently implement the proposed architecture.
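    The fog/cloud split described above amounts to a routing decision driven by each task's latency budget. The sketch below is a hedged illustration of that decision only; the latency figures, field names, and threshold are invented for the example and are not part of the proposed architecture.

```python
FOG_LATENCY_MS = 10      # assumed round trip to a nearby fog node
CLOUD_LATENCY_MS = 150   # assumed round trip to a central data centre

def dispatch(task):
    """Route a task to the fog layer when the cloud round trip would
    blow its deadline; otherwise forward it to the cloud."""
    if task["deadline_ms"] < CLOUD_LATENCY_MS:
        return "fog"     # e.g. collision warnings, signal control
    return "cloud"       # e.g. long-term traffic pattern mining
```

    In lambda-architecture terms, the fog branch corresponds to the speed layer serving low-latency queries, while the cloud branch feeds the batch layer.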