An elastic software architecture for extreme-scale big data analytics
This chapter describes a software architecture for processing big-data analytics across the complete compute continuum, from the edge to the cloud. The new generation of smart systems requires processing a vast amount of diverse information from distributed data sources. The software architecture presented in this chapter addresses two main challenges. On the one hand, a new elasticity concept enables smart systems to satisfy the performance requirements of extreme-scale analytics workloads. By extending the elasticity concept (known at the cloud side) across the compute continuum in a fog computing environment, combined with the use of advanced heterogeneous hardware architectures at the edge side, the capabilities of extreme-scale analytics can be significantly increased, integrating both responsive data-in-motion and latent data-at-rest analytics into a single solution. On the other hand, the software architecture also focuses on fulfilling the non-functional properties inherited from smart systems, such as real-time behaviour, energy efficiency, communication quality, and security, which are of paramount importance for many application domains such as smart cities, smart mobility, and smart manufacturing. The research leading to these results has received funding from the European Union’s Horizon 2020 Programme under the ELASTIC Project (www.elastic-project.eu), grant agreement No 825473.
Reproducible and Portable Big Data Analytics in the Cloud
Cloud computing has become a major approach to reproducing computational experiments because it supports on-demand hardware and software resource provisioning. Yet two main difficulties remain in reproducing big data applications in the cloud. The first is how to automate end-to-end execution of analytics, including environment provisioning, analytics pipeline description, pipeline execution, and resource termination. The second is that an application developed for one cloud is difficult to reproduce in another, a.k.a. the vendor lock-in problem. To tackle these problems, we leverage serverless computing and containerization techniques for automated scalable execution and reproducibility, and utilize the adapter design pattern to enable application portability and reproducibility across different clouds. We propose and develop an open-source toolkit that supports 1) fully automated end-to-end execution and reproduction via a single command, 2) automated data and configuration storage for each execution, 3) flexible client modes based on user preferences, 4) execution history query, and 5) simple reproduction of existing executions in the same or a different environment. We conducted extensive experiments on both AWS and Azure using four big data analytics applications that run on virtual CPU/GPU clusters. The experiments show our toolkit achieves good execution performance, scalability, and efficient reproducibility for cloud-based big data analytics.
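The adapter design pattern mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the toolkit's actual API: the class and method names (`CloudAdapter`, `provision`, `run_pipeline`, `terminate`) are hypothetical, and the provider-specific bodies are stubs where real SDK calls (e.g. boto3 for AWS) would go.

```python
from abc import ABC, abstractmethod

class CloudAdapter(ABC):
    """Uniform interface the analytics pipeline codes against,
    hiding provider-specific provisioning details."""

    @abstractmethod
    def provision(self, nodes: int) -> str: ...

    @abstractmethod
    def run_pipeline(self, cluster_id: str, pipeline: list[str]) -> list[str]: ...

    @abstractmethod
    def terminate(self, cluster_id: str) -> None: ...

class AWSAdapter(CloudAdapter):
    def provision(self, nodes: int) -> str:
        return f"aws-cluster-{nodes}"          # real code would call the AWS SDK

    def run_pipeline(self, cluster_id: str, pipeline: list[str]) -> list[str]:
        return [f"{cluster_id}:{step}:done" for step in pipeline]

    def terminate(self, cluster_id: str) -> None:
        pass                                   # real code would release AWS resources

class AzureAdapter(CloudAdapter):
    def provision(self, nodes: int) -> str:
        return f"azure-cluster-{nodes}"        # real code would call the Azure SDK

    def run_pipeline(self, cluster_id: str, pipeline: list[str]) -> list[str]:
        return [f"{cluster_id}:{step}:done" for step in pipeline]

    def terminate(self, cluster_id: str) -> None:
        pass

def reproduce(adapter: CloudAdapter, pipeline: list[str]) -> list[str]:
    """Provision, execute the pipeline, and always tear down resources."""
    cluster = adapter.provision(nodes=4)
    try:
        return adapter.run_pipeline(cluster, pipeline)
    finally:
        adapter.terminate(cluster)
```

Because `reproduce` depends only on the abstract interface, the same pipeline description can be re-executed on a different cloud by swapping the adapter, which is the essence of the portability claim.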
Navigating in Numerous Video Data: User Interface Design for an On-Camera Video Analytics Engine
Video analytics powered by artificial intelligence shows high promise in making our society smarter. Harnessing large amounts of video data, however, requires processing systems with high performance and high efficiency. To this end, this work has contributed to a video analytics system powered by artificial intelligence for object detection and recognition. Rather than streaming all video frames to the cloud, the system analyzes images on-camera and returns only those of interest to the cloud. This edge-analytics software is available as a research-grade system, but it lacks a simple web interface for general use by scientists, engineers, and other experts. To make the system versatile and user-friendly, a web interface was developed for receiving user queries (including parameters such as camera geolocations, video time span, and object class of interest) and presenting the query results. This web interface will make our video analytics system more accessible to domain experts (those in law enforcement, health care, environmental monitoring, etc.).
E2: a framework for NFV applications
By moving network appliance functionality from proprietary hardware to software, Network Function Virtualization promises to bring the advantages of cloud computing to network packet processing. However, the evolution of cloud computing (particularly for data analytics) has greatly benefited from application-independent methods for scaling and placement that achieve high efficiency while relieving programmers of these burdens. NFV has no such general management solutions. In this paper, we present a scalable and application-agnostic scheduling framework for packet processing, and compare its performance to current approaches.
Towards a flexible data stream analytics platform based on the GCM autonomous software component technology
Big data stream analytics platforms not only need to support performance-dictated elasticity, benefiting for instance from Cloud environments; they should also support analytics that can evolve dynamically from the application viewpoint, since the nature of the data can change and, with it, the treatments it requires. The benefit is avoiding the need to undeploy the current analytics, modify it off-line, redeploy the new version, and resume the analysis, missing the data that arrived in the meantime. We also believe that such evolution should, whenever possible, be driven by autonomic behaviours. We argue that a software component-based technology such as the one we have developed so far, GCM/ProActive, can be a good fit for these needs. Using it, we present our solution, still under development, named GCM-streaming, which to our knowledge is quite original.
MERRA Analytic Services: Meeting the Big Data Challenges of Climate Science Through Cloud-enabled Climate Analytics-as-a-service
Climate science is a Big Data domain that is experiencing unprecedented growth. In our efforts to address the Big Data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS). We focus on analytics because it is the knowledge gained from our interactions with Big Data that ultimately produces societal benefits. We focus on CAaaS because we believe it provides a useful way of thinking about the problem: a specialization of the concept of business process-as-a-service, which is an evolving extension of IaaS, PaaS, and SaaS enabled by Cloud Computing. Within this framework, Cloud Computing plays an important role; however, we see it as only one element in a constellation of capabilities that are essential to delivering climate analytics as a service. These elements are essential because in the aggregate they lead to generativity, a capacity for self-assembly that we feel is the key to solving many of the Big Data challenges in this domain. MERRA Analytic Services (MERRAAS) is an example of cloud-enabled CAaaS built on this principle. MERRAAS enables MapReduce analytics over NASA's Modern-Era Retrospective Analysis for Research and Applications (MERRA) data collection. The MERRA reanalysis integrates observational data with numerical models to produce a global, temporally and spatially consistent synthesis of 26 key climate variables. It represents a type of data product that is of growing importance to scientists doing climate change research and to a wide range of decision support applications. MERRAAS brings together the following generative elements in a full, end-to-end demonstration of CAaaS capabilities: (1) high-performance, data-proximal analytics, (2) scalable data management, (3) software appliance virtualization, (4) adaptive analytics, and (5) a domain-harmonized API. The effectiveness of MERRAAS has been demonstrated in several applications.
In our experience, Cloud Computing lowers the barriers and risk to organizational change, fosters innovation and experimentation, facilitates technology transfer, and provides the agility required to meet our customers' increasing and changing needs. Cloud Computing is providing a new tier in the data services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and new mobility-driven applications and modes of work. For climate science, Cloud Computing's capacity to engage communities in the construction of new capabilities is perhaps the most important link between Cloud Computing and Big Data.
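The MapReduce-style, data-proximal analytics described in the MERRAAS abstract above can be illustrated with a toy example. This is only a sketch of the pattern, not MERRAAS code: the tuple-based records and the variable name `T2M` are illustrative stand-ins, whereas real MERRA data is stored as gridded binary fields.

```python
from collections import defaultdict

def map_phase(records):
    """Map: filter records for one climate variable and emit (year, (value, count))."""
    for year, var, value in records:
        if var == "T2M":                 # e.g. 2-metre air temperature
            yield year, (value, 1)

def reduce_phase(pairs):
    """Reduce: aggregate partial sums and counts per year into yearly means."""
    acc = defaultdict(lambda: (0.0, 0))
    for year, (v, n) in pairs:
        s, c = acc[year]
        acc[year] = (s + v, c + n)
    return {year: s / c for year, (s, c) in acc.items()}

# Hypothetical (year, variable, value) records standing in for reanalysis data.
records = [(1980, "T2M", 287.1), (1980, "T2M", 287.5), (1981, "T2M", 287.9)]
yearly_means = reduce_phase(map_phase(records))
```

In a real deployment the map phase runs on the nodes that hold the data (the "data-proximal" element), and only the small per-year aggregates travel over the network to the reducers.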