ECHO: An Adaptive Orchestration Platform for Hybrid Dataflows across Cloud and Edge
The Internet of Things (IoT) is offering unprecedented observational data
that are used for managing Smart City utilities. Edge and Fog gateway devices
are an integral part of IoT deployments to acquire real-time data and enact
controls. Recently, Edge computing has emerged as a first-class paradigm to complement Cloud-centric analytics. But a key limitation is the lack of a
platform-as-a-service for applications spanning Edge and Cloud. Here, we
propose ECHO, an orchestration platform for dataflows across distributed
resources. ECHO's hybrid dataflow composition can operate on diverse data
models -- streams, micro-batches and files -- and interface with native runtime
engines like TensorFlow and Storm to execute them. It manages the application's
lifecycle, including container-based deployment and a registry for state
management. ECHO can schedule the dataflow on different Edge, Fog and Cloud
resources, and also perform dynamic task migration between resources. We
validate the ECHO platform for executing video analytics and sensor streams for
Smart Traffic and Smart Utility applications on Raspberry Pi, NVidia TX1, ARM64
and Azure Cloud VM resources, and present our results.
Comment: 17 pages, 5 figures, 2 tables, submitted to ICSOC-201
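To make the hybrid dataflow idea concrete, here is a minimal sketch of how such a composition might look in Python. The Task and Dataflow classes, the method names and the placement labels are hypothetical illustrations, not ECHO's actual API.

```python
# Hypothetical sketch of composing a hybrid Edge/Fog/Cloud dataflow; not ECHO's API.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    engine: str        # native runtime that executes the task, e.g. "storm" or "tensorflow"
    placement: str     # preferred tier: "edge", "fog" or "cloud"

@dataclass
class Dataflow:
    tasks: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add(self, task):
        self.tasks.append(task)
        return task

    def connect(self, upstream, downstream):
        self.edges.append((upstream.name, downstream.name))

# A Smart Traffic pipeline: decode camera frames on an edge device, run detection
# on a GPU-class fog node, aggregate counts in the cloud.
df = Dataflow()
decode = df.add(Task("decode_frames", engine="storm", placement="edge"))
detect = df.add(Task("detect_vehicles", engine="tensorflow", placement="fog"))
report = df.add(Task("aggregate_counts", engine="storm", placement="cloud"))
df.connect(decode, detect)
df.connect(detect, report)
```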
Real World Applications of Machine Learning Techniques over Large Mobile Subscriber Datasets
Communication Service Providers (CSPs) are in a unique position to utilize
their vast transactional data assets generated from interactions of subscribers
with network elements as well as with other subscribers. CSPs could leverage
their data assets for a gamut of applications such as service personalization,
predictive offer management, loyalty management, revenue forecasting, network
capacity planning, product bundle optimization and churn management to gain
significant competitive advantage. However, due to the sheer data volume,
variety, velocity and veracity of mobile subscriber datasets, sophisticated
data analytics techniques and frameworks are necessary to derive actionable
insights in a usable timeframe. In this paper, we describe our journey from a
relational database management system (RDBMS) based campaign management
solution which allowed data scientists and marketers to use hand-written rules
for service personalization and targeted promotions to a distributed Big Data
Analytics platform, capable of performing large scale machine learning and data
mining to deliver real time service personalization, predictive modelling and
product optimization. Our work involves a careful blend of technology,
processes and best practices, which facilitate man-machine collaboration and
continuous experimentation to derive measurable economic value from data. Our
platform has a reach of more than 500 million mobile subscribers worldwide,
delivering over 1 billion personalized recommendations annually, processing a
total data volume of 64 Petabytes, corresponding to 8.5 trillion events.
Comment: SE4ML: Software Engineering for Machine Learning (NIPS 2014 Workshop)
https://sites.google.com/site/software4ml/accepted-paper
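As a rough illustration of the shift described above, from hand-written campaign rules to learned models, here is a minimal Python sketch. The feature names, thresholds and toy data are invented for illustration, and scikit-learn stands in for the authors' distributed platform.

```python
# Illustrative only: a hand-written targeting rule versus a learned propensity model.
# Feature names, thresholds and the toy training rows below are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Legacy-style rule a marketer might have written in the RDBMS campaign system:
def rule_based_offer(monthly_spend, days_since_recharge):
    return monthly_spend > 20.0 and days_since_recharge > 25

# Learned replacement: fit an offer-propensity model on historical outcomes.
X = np.array([[32.0, 30], [5.0, 3], [18.0, 28], [40.0, 2]])  # [spend, days since recharge]
y = np.array([1, 0, 1, 0])                                    # 1 = subscriber took the offer
model = LogisticRegression().fit(X, y)

print(rule_based_offer(25.0, 27))                 # rule fires: True
print(model.predict_proba([[25.0, 27]])[0, 1])    # learned propensity score in [0, 1]
```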
Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing
With the breakthroughs in deep learning, recent years have witnessed a boom in artificial intelligence (AI) applications and services, spanning from personal assistants to recommendation systems to video/audio surveillance. More recently, with the proliferation of mobile computing and the Internet-of-Things (IoT), billions of mobile and IoT devices are connected to the Internet, generating zillions of bytes of data at the network edge. Driven by
this trend, there is an urgent need to push the AI frontiers to the network
edge so as to fully unleash the potential of the edge big data. To meet this
demand, edge computing, an emerging paradigm that pushes computing tasks and
services from the network core to the network edge, has been widely recognized
as a promising solution. The resulting new interdiscipline, edge AI or edge intelligence, is beginning to receive a tremendous amount of interest. However, research on edge intelligence is still in its infancy, and a dedicated
venue for exchanging the recent advances of edge intelligence is highly desired
by both the computer system and artificial intelligence communities. To this
end, we conduct a comprehensive survey of the recent research efforts on edge
intelligence. Specifically, we first review the background and motivation for
artificial intelligence running at the network edge. We then provide an
overview of the overarching architectures, frameworks and emerging key
technologies for deep learning model training and inference at the network
edge. Finally, we discuss future research opportunities on edge intelligence.
We believe that this survey will elicit escalating attention, stimulate fruitful discussions and inspire further research ideas on edge intelligence.
Comment: Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, and Junshan Zhang,
"Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge
Computing," Proceedings of the IEE
The Role of Big Data Analytics in Industrial Internet of Things
Big data production in industrial Internet of Things (IIoT) is evident due to
the massive deployment of sensors and Internet of Things (IoT) devices.
However, big data processing is challenging due to limited computational,
networking and storage resources at IoT device-end. Big data analytics (BDA) is
expected to provide operational- and customer-level intelligence in IIoT
systems. Although numerous studies on IIoT and BDA exist, only a few studies
have explored the convergence of the two paradigms. In this study, we
investigate the recent BDA technologies, algorithms and techniques that can
lead to the development of intelligent IIoT systems. We devise a taxonomy by
classifying and categorising the literature on the basis of important
parameters (e.g. data sources, analytics tools, analytics techniques,
requirements, industrial analytics applications and analytics types). We
present the frameworks and case studies of the various enterprises that have
benefited from BDA. We also enumerate the considerable opportunities introduced
by BDA in IIoT. We also identify and discuss the indispensable challenges that remain to be addressed as future research directions.
Role of Apache Software Foundation in Big Data Projects
With the increase in the amount of Big Data being generated each year, the tools and technologies developed and used for storing, processing and analyzing Big Data have also improved. Open-Source software has been an
important factor in the success and innovation in the field of Big Data, while the Apache Software Foundation (ASF) has played a crucial role in this success and
innovation by providing a number of state-of-the-art projects, free and open to
the public. ASF has classified its projects into different categories. In this report, projects listed under the Big Data category are analyzed in depth and discussed with reference to one of the seven sub-categories defined. Our
investigation has shown that many of the Apache Big Data projects are
autonomous, but some are built on top of other Apache projects, and some work in conjunction with other projects to improve and ease development in the Big Data space.
Characterizing Application Scheduling on Edge, Fog and Cloud Computing Resources
Cloud computing has grown to become a popular distributed computing service
offered by commercial providers. More recently, Edge and Fog computing
resources have emerged on the wide-area network as part of Internet of Things
(IoT) deployments. These three resource abstraction layers are complementary,
and provide distinctive benefits. Scheduling applications on clouds has been an
active area of research, with workflow and dataflow models serving as a
flexible abstraction to specify applications for execution. However, the
application programming and scheduling models for edge and fog are still
maturing, and can benefit from learnings on cloud resources. At the same time,
there is also value in using these resources cohesively for application
execution. In this article, we present a taxonomy of concepts essential for
specifying and solving the problem of scheduling applications on edge, fog and
cloud computing resources. We first characterize the resource capabilities and
limitations of these infrastructures, and design a taxonomy of application
models, Quality of Service (QoS) constraints and goals, and scheduling
techniques, based on a literature review. We also tabulate key research
prototypes and papers using this taxonomy. This survey benefits developers and
researchers on these distributed resources in designing and categorizing their
applications, selecting the relevant computing abstraction(s), and developing
or selecting the appropriate scheduling algorithm. It also highlights gaps in
literature where open problems remain.
Comment: Pre-print of journal article: Varshney P, Simmhan Y. Characterizing
application scheduling on edge, fog, and cloud computing resources. Softw:
Pract Exper. 2019; 1--37. https://doi.org/10.1002/spe.269
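To illustrate the kind of placement decision such schedulers make, the sketch below implements a simple greedy heuristic that assigns each task to the lowest-latency tier with spare CPU capacity. The resource figures and the heuristic itself are hypothetical and do not correspond to any specific scheduler from the survey.

```python
# Hypothetical greedy placement across edge/fog/cloud tiers; capacities and
# latencies are illustrative numbers, not measurements from the survey.
RESOURCES = {
    "edge":  {"cpu_capacity": 4,   "latency_ms": 5},
    "fog":   {"cpu_capacity": 16,  "latency_ms": 20},
    "cloud": {"cpu_capacity": 256, "latency_ms": 100},
}

def place(tasks):
    """Assign each (task, cpu_needed) pair to the lowest-latency tier with spare capacity."""
    used = {name: 0 for name in RESOURCES}
    placement = {}
    for task, cpu_needed in tasks:
        for name, spec in sorted(RESOURCES.items(), key=lambda kv: kv[1]["latency_ms"]):
            if used[name] + cpu_needed <= spec["cpu_capacity"]:
                used[name] += cpu_needed
                placement[task] = name
                break
        else:
            placement[task] = None   # no feasible tier: reject or queue the task
    return placement

print(place([("decode", 2), ("detect", 8), ("train", 64)]))
# -> {'decode': 'edge', 'detect': 'fog', 'train': 'cloud'}
```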
MacroBase: Prioritizing Attention in Fast Data
As data volumes continue to rise, manual inspection is becoming increasingly
untenable. In response, we present MacroBase, a data analytics engine that
prioritizes end-user attention in high-volume fast data streams. MacroBase
enables efficient, accurate, and modular analyses that highlight and aggregate
important and unusual behavior, acting as a search engine for fast data.
MacroBase is able to deliver order-of-magnitude speedups over alternatives by
optimizing the combination of explanation and classification tasks and by
leveraging a new reservoir sampler and heavy-hitters sketch specialized for
fast data streams. As a result, MacroBase delivers accurate results at speeds
of up to 2M events per second per query on a single core. The system has
delivered meaningful results in production, including at a telematics company
monitoring hundreds of thousands of vehicles.
Comment: SIGMOD 201
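As a point of reference for the sampling machinery mentioned above, the sketch below implements classic reservoir sampling (Algorithm R). MacroBase's specialized sampler adapts this idea for decaying fast data streams; this minimal version omits those refinements.

```python
# Classic reservoir sampling (Algorithm R): keep a uniform fixed-size sample of a stream.
import random

class Reservoir:
    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.sample = []
        self.seen = 0
        self.rng = random.Random(seed)

    def insert(self, item):
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(item)           # fill the reservoir first
        else:
            j = self.rng.randrange(self.seen)  # keep item with probability capacity/seen
            if j < self.capacity:
                self.sample[j] = item

r = Reservoir(capacity=100, seed=42)
for event in range(1_000_000):
    r.insert(event)
print(len(r.sample))   # 100 items, each retained with equal probability
```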
Scaling Video Analytics Systems to Large Camera Deployments
Driven by advances in computer vision and the falling costs of camera
hardware, organizations are deploying video cameras en masse for the spatial
monitoring of their physical premises. Scaling video analytics to massive
camera deployments, however, presents a new and mounting challenge, as compute
cost grows proportionally to the number of camera feeds. This paper is driven
by a simple question: can we scale video analytics in such a way that cost
grows sublinearly, or even remains constant, as we deploy more cameras, while
inference accuracy remains stable, or even improves? We believe the answer is
yes. Our key observation is that video feeds from wide-area camera deployments
demonstrate significant content correlations (e.g. to other geographically
proximate feeds), both in space and over time. These spatio-temporal
correlations can be harnessed to dramatically reduce the size of the inference
search space, decreasing both workload and false positive rates in multi-camera
video analytics. By discussing use-cases and technical challenges, we propose a
roadmap for scaling video analytics to large camera networks, and outline a
plan for its realization.
Comment: HotMobile 201
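As a toy illustration of exploiting cross-camera correlations, the sketch below prioritizes the expensive detector on a feed only when a spatially correlated neighbour saw activity recently. The correlation map, camera names and timing window are hypothetical and not taken from the paper.

```python
# Toy sketch: use spatial correlation between camera feeds to decide when to run
# the expensive detector. Correlation map and quiet window are hypothetical.
import time

CORRELATED = {"cam_lobby": ["cam_entrance"], "cam_entrance": ["cam_lobby"]}
last_detection = {}   # camera id -> timestamp of the most recent positive detection

def record_detection(camera_id):
    # Called whenever a detector reports activity on a feed.
    last_detection[camera_id] = time.time()

def should_run_detector(camera_id, quiet_window=5.0):
    # Run the full detector if any correlated neighbour saw activity recently;
    # otherwise fall back to sparse sampling of this feed.
    now = time.time()
    for neighbour in CORRELATED.get(camera_id, []):
        seen = last_detection.get(neighbour)
        if seen is not None and now - seen < quiet_window:
            return True
    return False

record_detection("cam_entrance")
print(should_run_detector("cam_lobby"))   # True: entrance activity may reach the lobby
```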
Infrastructure for Usable Machine Learning: The Stanford DAWN Project
Despite incredible recent advances in machine learning, building machine
learning applications remains prohibitively time-consuming and expensive for
all but the best-trained, best-funded engineering organizations. This expense
comes not from a need for new and improved statistical models but instead from
a lack of systems and tools for supporting end-to-end machine learning
application development, from data preparation and labeling to
productionization and monitoring. In this document, we outline opportunities
for infrastructure supporting usable, end-to-end machine learning applications
in the context of the nascent DAWN (Data Analytics for What's Next) project at
Stanford.
Big Data Systems Meet Machine Learning Challenges: Towards Big Data Science as a Service
Recently, we have been witnessing huge advancements in the scale of data we
routinely generate and collect in pretty much everything we do, as well as our
ability to exploit modern technologies to process, analyze and understand this
data. The intersection of these trends is what is nowadays called Big Data
Science. Cloud computing represents a practical and cost-effective solution for
supporting Big Data storage, processing and for sophisticated analytics
applications. We analyze in detail the building blocks of the software stack
for supporting big data science as a commodity service for data scientists. We
provide various insights about the latest ongoing developments and open
challenges in this domain.