61,083 research outputs found

    An elastic software architecture for extreme-scale big data analytics

    Get PDF
    This chapter describes a software architecture for processing big-data analytics considering the complete compute continuum, from the edge to the cloud. The new generation of smart systems requires processing a vast amount of diverse information from distributed data sources. The software architecture presented in this chapter addresses two main challenges. On the one hand, a new elasticity concept enables smart systems to satisfy the performance requirements of extreme-scale analytics workloads. By extending the elasticity concept (known at cloud side) across the compute continuum in a fog computing environment, combined with the usage of advanced heterogeneous hardware architectures at the edge side, the capabilities of the extreme-scale analytics can significantly increase, integrating both responsive data-in-motion and latent data-at-rest analytics into a single solution. On the other hand, the software architecture also focuses on the fulfilment of the non-functional properties inherited from smart systems, such as real-time, energy-efficiency, communication quality and security, that are of paramount importance for many application domains such as smart cities, smart mobility and smart manufacturing.The research leading to these results has received funding from the European Union’s Horizon 2020 Programme under the ELASTIC Project (www.elastic-project.eu), grant agreement No 825473.Peer ReviewedPostprint (published version

    Reproducible and Portable Big Data Analytics in the Cloud

    Full text link
    Cloud computing has become a major approach to help reproduce computational experiments because it supports on-demand hardware and software resource provisioning. Yet there are still two main difficulties in reproducing big data applications in the cloud. The first is how to automate end-to-end execution of analytics including environment provisioning, analytics pipeline description, pipeline execution, and resource termination. The second is that an application developed for one cloud is difficult to be reproduced in another cloud, a.k.a. vendor lock-in problem. To tackle these problems, we leverage serverless computing and containerization techniques for automated scalable execution and reproducibility, and utilize the adapter design pattern to enable application portability and reproducibility across different clouds. We propose and develop an open-source toolkit that supports 1) fully automated end-to-end execution and reproduction via a single command, 2) automated data and configuration storage for each execution, 3) flexible client modes based on user preferences, 4) execution history query, and 5) simple reproduction of existing executions in the same environment or a different environment. We did extensive experiments on both AWS and Azure using four big data analytics applications that run on virtual CPU/GPU clusters. The experiments show our toolkit can achieve good execution performance, scalability, and efficient reproducibility for cloud-based big data analytics

    Navigating in Numerous Video Data: User Interface Design for an On-Camera Video Analytics Engine

    Get PDF
    Video analytics powered by artificial intelligence shows high promise in making our society smarter. Harnessing large amounts of video data, however, requires the development of processing systems demonstrating high performance and high efficiency. To this end, this work has contributed to a video analytics system powered by artificial intelligence for object detection and recognition. Rather than streaming all the video frames to the cloud, the system analyzes images on-camera and only returns those of interest to the cloud. This edge analytics research-grade software is available, but it lacks a simple web interface for general use by scientists, engineers, and other experts. To make the system versatile and user-friendly, a web interface was developed for receiving user queries (including parameters such as camera geolocations, video time span, and object class of interest) and presenting the query results. This web interface will make our video analytics system more accessible to domain experts (those in law enforcement, health care, environmental monitoring, etc.)

    E2: a framework for NFV applications

    Get PDF
    By moving network appliance functionality from proprietary hardware to software, Network Function Virtualization promises to bring the advantages of cloud computing to network packet processing. However, the evolution of cloud computing (particularly for data analytics) has greatly bene- fited from application-independent methods for scaling and placement that achieve high efficiency while relieving programmers of these burdens. NFV has no such general management solutions. In this paper, we present a scalable and application-agnostic scheduling framework for packet processing, and compare its performance to current approaches

    Towards a flexible data stream analytics platform based on the GCM autonomous software component technology

    Get PDF
    International audienceBig data stream analytics platforms not only need to support performance-dictated elasticity benefiting for instance from Cloud environments. They should also support analytics that can evolve dynamically from the application viewpoint, given data nature can change so the necessary treatments on them. The benefit is that this can avoid to undeploy the current analytics, modify it off-line, redeploy the new version, and resume the analysis, missing data that arrived in the meantime. We also believe that such evolution should better be driven by autonomic behaviors whenever possible. We argue that a software component based technology, as the one we have developed so far, GCM/ProActive, can be a good fit to these needs. Using it, we present our solution, still under development, named GCM-streaming, which to our knowledge seems to be quite original

    MERRA Analytic Services: Meeting the Big Data Challenges of Climate Science Through Cloud-enabled Climate Analytics-as-a-service

    Get PDF
    Climate science is a Big Data domain that is experiencing unprecedented growth. In our efforts to address the Big Data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS). We focus on analytics, because it is the knowledge gained from our interactions with Big Data that ultimately produce societal benefits. We focus on CAaaS because we believe it provides a useful way of thinking about the problem: a specialization of the concept of business process-as-a-service, which is an evolving extension of IaaS, PaaS, and SaaS enabled by Cloud Computing. Within this framework, Cloud Computing plays an important role; however, we it see it as only one element in a constellation of capabilities that are essential to delivering climate analytics as a service. These elements are essential because in the aggregate they lead to generativity, a capacity for self-assembly that we feel is the key to solving many of the Big Data challenges in this domain. MERRA Analytic Services (MERRAAS) is an example of cloud-enabled CAaaS built on this principle. MERRAAS enables MapReduce analytics over NASAs Modern-Era Retrospective Analysis for Research and Applications (MERRA) data collection. The MERRA reanalysis integrates observational data with numerical models to produce a global temporally and spatially consistent synthesis of 26 key climate variables. It represents a type of data product that is of growing importance to scientists doing climate change research and a wide range of decision support applications. MERRAAS brings together the following generative elements in a full, end-to-end demonstration of CAaaS capabilities: (1) high-performance, data proximal analytics, (2) scalable data management, (3) software appliance virtualization, (4) adaptive analytics, and (5) a domain-harmonized API. The effectiveness of MERRAAS has been demonstrated in several applications. In our experience, Cloud Computing lowers the barriers and risk to organizational change, fosters innovation and experimentation, facilitates technology transfer, and provides the agility required to meet our customers' increasing and changing needs. Cloud Computing is providing a new tier in the data services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and new mobility-driven applications and modes of work. For climate science, Cloud Computing's capacity to engage communities in the construction of new capabilies is perhaps the most important link between Cloud Computing and Big Data
    • …
    corecore