ECHO: An Adaptive Orchestration Platform for Hybrid Dataflows across Cloud and Edge
The Internet of Things (IoT) is offering unprecedented observational data
that are used for managing Smart City utilities. Edge and Fog gateway devices
are an integral part of IoT deployments to acquire real-time data and enact
controls. Recently, Edge computing has emerged as a first-class paradigm to complement Cloud-centric analytics. But a key limitation is the lack of a
platform-as-a-service for applications spanning Edge and Cloud. Here, we
propose ECHO, an orchestration platform for dataflows across distributed
resources. ECHO's hybrid dataflow composition can operate on diverse data
models -- streams, micro-batches and files -- and interface with native runtime
engines like TensorFlow and Storm to execute them. It manages the application's
lifecycle, including container-based deployment and a registry for state
management. ECHO can schedule the dataflow on different Edge, Fog and Cloud
resources, and also perform dynamic task migration between resources. We
validate the ECHO platform for executing video analytics and sensor streams for
Smart Traffic and Smart Utility applications on Raspberry Pi, NVidia TX1, ARM64
and Azure Cloud VM resources, and present our results.
Comment: 17 pages, 5 figures, 2 tables, submitted to ICSOC-201
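To make the hybrid dataflow idea concrete, here is a minimal sketch of how such a composition might look in Python. The Task and Dataflow classes, the method names and the placement labels are hypothetical illustrations, not ECHO's actual API.

```python
# Hypothetical sketch of composing a hybrid Edge/Fog/Cloud dataflow; not ECHO's API.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    engine: str        # native runtime that executes the task, e.g. "storm" or "tensorflow"
    placement: str     # preferred tier: "edge", "fog" or "cloud"

@dataclass
class Dataflow:
    tasks: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add(self, task):
        self.tasks.append(task)
        return task

    def connect(self, upstream, downstream):
        self.edges.append((upstream.name, downstream.name))

# A Smart Traffic pipeline: decode camera frames on an edge device, run detection
# on a GPU-class fog node, aggregate counts in the cloud.
df = Dataflow()
decode = df.add(Task("decode_frames", engine="storm", placement="edge"))
detect = df.add(Task("detect_vehicles", engine="tensorflow", placement="fog"))
report = df.add(Task("aggregate_counts", engine="storm", placement="cloud"))
df.connect(decode, detect)
df.connect(detect, report)
```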
Real World Applications of Machine Learning Techniques over Large Mobile Subscriber Datasets
Communication Service Providers (CSPs) are in a unique position to utilize
their vast transactional data assets generated from interactions of subscribers
with network elements as well as with other subscribers. CSPs could leverage
their data assets for a gamut of applications such as service personalization,
predictive offer management, loyalty management, revenue forecasting, network
capacity planning, product bundle optimization and churn management to gain
significant competitive advantage. However, due to the sheer data volume,
variety, velocity and veracity of mobile subscriber datasets, sophisticated
data analytics techniques and frameworks are necessary to derive actionable
insights in a usable timeframe. In this paper, we describe our journey from a
relational database management system (RDBMS) based campaign management
solution which allowed data scientists and marketers to use hand-written rules
for service personalization and targeted promotions to a distributed Big Data
Analytics platform, capable of performing large scale machine learning and data
mining to deliver real time service personalization, predictive modelling and
product optimization. Our work involves a careful blend of technology,
processes and best practices, which facilitate man-machine collaboration and
continuous experimentation to derive measurable economic value from data. Our
platform has a reach of more than 500 million mobile subscribers worldwide,
delivering over 1 billion personalized recommendations annually, processing a
total data volume of 64 Petabytes, corresponding to 8.5 trillion events.
Comment: SE4ML: Software Engineering for Machine Learning (NIPS 2014 Workshop)
https://sites.google.com/site/software4ml/accepted-paper
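As a rough illustration of the shift described above, from hand-written campaign rules to learned models, here is a minimal Python sketch. The feature names, thresholds and toy data are invented for illustration, and scikit-learn stands in for the authors' distributed platform.

```python
# Illustrative only: a hand-written targeting rule versus a learned propensity model.
# Feature names, thresholds and the toy training rows below are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Legacy-style rule a marketer might have written in the RDBMS campaign system:
def rule_based_offer(monthly_spend, days_since_recharge):
    return monthly_spend > 20.0 and days_since_recharge > 25

# Learned replacement: fit an offer-propensity model on historical outcomes.
X = np.array([[32.0, 30], [5.0, 3], [18.0, 28], [40.0, 2]])  # [spend, days since recharge]
y = np.array([1, 0, 1, 0])                                    # 1 = subscriber took the offer
model = LogisticRegression().fit(X, y)

print(rule_based_offer(25.0, 27))                 # rule fires: True
print(model.predict_proba([[25.0, 27]])[0, 1])    # learned propensity score in [0, 1]
```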
Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing
With the breakthroughs in deep learning, recent years have witnessed a boom in artificial intelligence (AI) applications and services, spanning from personal assistants to recommendation systems to video/audio surveillance. More recently, with the proliferation of mobile computing and the Internet-of-Things (IoT), billions of mobile and IoT devices are connected to the Internet, generating zillions of bytes of data at the network edge. Driven by
this trend, there is an urgent need to push the AI frontiers to the network
edge so as to fully unleash the potential of the edge big data. To meet this
demand, edge computing, an emerging paradigm that pushes computing tasks and
services from the network core to the network edge, has been widely recognized
as a promising solution. The resulting new interdiscipline, edge AI or edge intelligence, is beginning to receive a tremendous amount of interest. However, research on edge intelligence is still in its infancy, and a dedicated
venue for exchanging the recent advances of edge intelligence is highly desired
by both the computer system and artificial intelligence communities. To this
end, we conduct a comprehensive survey of the recent research efforts on edge
intelligence. Specifically, we first review the background and motivation for
artificial intelligence running at the network edge. We then provide an
overview of the overarching architectures, frameworks and emerging key
technologies for deep learning model training and inference at the network
edge. Finally, we discuss future research opportunities on edge intelligence.
We believe that this survey will elicit escalating attention, stimulate fruitful discussions and inspire further research ideas on edge intelligence.
Comment: Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, and Junshan Zhang,
"Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge
Computing," Proceedings of the IEE
The Role of Big Data Analytics in Industrial Internet of Things
Big data production in industrial Internet of Things (IIoT) is evident due to
the massive deployment of sensors and Internet of Things (IoT) devices.
However, big data processing is challenging due to limited computational,
networking and storage resources at IoT device-end. Big data analytics (BDA) is
expected to provide operational- and customer-level intelligence in IIoT
systems. Although numerous studies on IIoT and BDA exist, only a few studies
have explored the convergence of the two paradigms. In this study, we
investigate the recent BDA technologies, algorithms and techniques that can
lead to the development of intelligent IIoT systems. We devise a taxonomy by
classifying and categorising the literature on the basis of important
parameters (e.g. data sources, analytics tools, analytics techniques,
requirements, industrial analytics applications and analytics types). We
present the frameworks and case studies of the various enterprises that have
benefited from BDA. We also enumerate the considerable opportunities introduced
by BDA in IIoT. We also identify and discuss the indispensable challenges that remain to be addressed as future research directions.
Role of Apache Software Foundation in Big Data Projects
With the increase in the amount of Big Data being generated each year, the tools and technologies developed and used for storing, processing and analyzing Big Data have also improved. Open-Source software has been an
important factor in the success and innovation in the field of Big Data, while the Apache Software Foundation (ASF) has played a crucial role in this success and
innovation by providing a number of state-of-the-art projects, free and open to
the public. ASF has classified its projects into different categories. In this report, projects listed under the Big Data category are analyzed in depth and discussed with reference to one of the seven sub-categories defined. Our
investigation has shown that many of the Apache Big Data projects are
autonomous, but some are built on top of other Apache projects, and some work in conjunction with other projects to improve and ease development in the Big Data space.
Characterizing Application Scheduling on Edge, Fog and Cloud Computing Resources
Cloud computing has grown to become a popular distributed computing service
offered by commercial providers. More recently, Edge and Fog computing
resources have emerged on the wide-area network as part of Internet of Things
(IoT) deployments. These three resource abstraction layers are complementary,
and provide distinctive benefits. Scheduling applications on clouds has been an
active area of research, with workflow and dataflow models serving as a
flexible abstraction to specify applications for execution. However, the
application programming and scheduling models for edge and fog are still
maturing, and can benefit from learnings on cloud resources. At the same time,
there is also value in using these resources cohesively for application
execution. In this article, we present a taxonomy of concepts essential for
specifying and solving the problem of scheduling applications on edge, fog and
cloud computing resources. We first characterize the resource capabilities and
limitations of these infrastructures, and design a taxonomy of application
models, Quality of Service (QoS) constraints and goals, and scheduling
techniques, based on a literature review. We also tabulate key research
prototypes and papers using this taxonomy. This survey benefits developers and
researchers on these distributed resources in designing and categorizing their
applications, selecting the relevant computing abstraction(s), and developing
or selecting the appropriate scheduling algorithm. It also highlights gaps in
literature where open problems remain.
Comment: Pre-print of journal article: Varshney P, Simmhan Y. Characterizing
application scheduling on edge, fog, and cloud computing resources. Softw:
Pract Exper. 2019; 1--37. https://doi.org/10.1002/spe.269
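To illustrate the kind of placement decision such schedulers make, the sketch below implements a simple greedy heuristic that assigns each task to the lowest-latency tier with spare CPU capacity. The resource figures and the heuristic itself are hypothetical and do not correspond to any specific scheduler from the survey.

```python
# Hypothetical greedy placement across edge/fog/cloud tiers; capacities and
# latencies are illustrative numbers, not measurements from the survey.
RESOURCES = {
    "edge":  {"cpu_capacity": 4,   "latency_ms": 5},
    "fog":   {"cpu_capacity": 16,  "latency_ms": 20},
    "cloud": {"cpu_capacity": 256, "latency_ms": 100},
}

def place(tasks):
    """Assign each (task, cpu_needed) pair to the lowest-latency tier with spare capacity."""
    used = {name: 0 for name in RESOURCES}
    placement = {}
    for task, cpu_needed in tasks:
        for name, spec in sorted(RESOURCES.items(), key=lambda kv: kv[1]["latency_ms"]):
            if used[name] + cpu_needed <= spec["cpu_capacity"]:
                used[name] += cpu_needed
                placement[task] = name
                break
        else:
            placement[task] = None   # no feasible tier: reject or queue the task
    return placement

print(place([("decode", 2), ("detect", 8), ("train", 64)]))
# -> {'decode': 'edge', 'detect': 'fog', 'train': 'cloud'}
```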
MacroBase: Prioritizing Attention in Fast Data
As data volumes continue to rise, manual inspection is becoming increasingly
untenable. In response, we present MacroBase, a data analytics engine that
prioritizes end-user attention in high-volume fast data streams. MacroBase
enables efficient, accurate, and modular analyses that highlight and aggregate
important and unusual behavior, acting as a search engine for fast data.
MacroBase is able to deliver order-of-magnitude speedups over alternatives by
optimizing the combination of explanation and classification tasks and by
leveraging a new reservoir sampler and heavy-hitters sketch specialized for
fast data streams. As a result, MacroBase delivers accurate results at speeds
of up to 2M events per second per query on a single core. The system has
delivered meaningful results in production, including at a telematics company
monitoring hundreds of thousands of vehicles.
Comment: SIGMOD 201
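As a point of reference for the sampling machinery mentioned above, the sketch below implements classic reservoir sampling (Algorithm R). MacroBase's specialized sampler adapts this idea for decaying fast data streams; this minimal version omits those refinements.

```python
# Classic reservoir sampling (Algorithm R): keep a uniform fixed-size sample of a stream.
import random

class Reservoir:
    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.sample = []
        self.seen = 0
        self.rng = random.Random(seed)

    def insert(self, item):
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(item)           # fill the reservoir first
        else:
            j = self.rng.randrange(self.seen)  # keep item with probability capacity/seen
            if j < self.capacity:
                self.sample[j] = item

r = Reservoir(capacity=100, seed=42)
for event in range(1_000_000):
    r.insert(event)
print(len(r.sample))   # 100 items, each retained with equal probability
```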
Scaling Video Analytics Systems to Large Camera Deployments
Driven by advances in computer vision and the falling costs of camera
hardware, organizations are deploying video cameras en masse for the spatial
monitoring of their physical premises. Scaling video analytics to massive
camera deployments, however, presents a new and mounting challenge, as compute
cost grows proportionally to the number of camera feeds. This paper is driven
by a simple question: can we scale video analytics in such a way that cost
grows sublinearly, or even remains constant, as we deploy more cameras, while
inference accuracy remains stable, or even improves? We believe the answer is
yes. Our key observation is that video feeds from wide-area camera deployments
demonstrate significant content correlations (e.g. to other geographically
proximate feeds), both in space and over time. These spatio-temporal
correlations can be harnessed to dramatically reduce the size of the inference
search space, decreasing both workload and false positive rates in multi-camera
video analytics. By discussing use-cases and technical challenges, we propose a
roadmap for scaling video analytics to large camera networks, and outline a
plan for its realization.
Comment: HotMobile 201
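As a toy illustration of exploiting cross-camera correlations, the sketch below prioritizes the expensive detector on a feed only when a spatially correlated neighbour saw activity recently. The correlation map, camera names and timing window are hypothetical and not taken from the paper.

```python
# Toy sketch: use spatial correlation between camera feeds to decide when to run
# the expensive detector. Correlation map and quiet window are hypothetical.
import time

CORRELATED = {"cam_lobby": ["cam_entrance"], "cam_entrance": ["cam_lobby"]}
last_detection = {}   # camera id -> timestamp of the most recent positive detection

def record_detection(camera_id):
    # Called whenever a detector reports activity on a feed.
    last_detection[camera_id] = time.time()

def should_run_detector(camera_id, quiet_window=5.0):
    # Run the full detector if any correlated neighbour saw activity recently;
    # otherwise fall back to sparse sampling of this feed.
    now = time.time()
    for neighbour in CORRELATED.get(camera_id, []):
        seen = last_detection.get(neighbour)
        if seen is not None and now - seen < quiet_window:
            return True
    return False

record_detection("cam_entrance")
print(should_run_detector("cam_lobby"))   # True: entrance activity may reach the lobby
```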
Infrastructure for Usable Machine Learning: The Stanford DAWN Project
Despite incredible recent advances in machine learning, building machine
learning applications remains prohibitively time-consuming and expensive for
all but the best-trained, best-funded engineering organizations. This expense
comes not from a need for new and improved statistical models but instead from
a lack of systems and tools for supporting end-to-end machine learning
application development, from data preparation and labeling to
productionization and monitoring. In this document, we outline opportunities
for infrastructure supporting usable, end-to-end machine learning applications
in the context of the nascent DAWN (Data Analytics for What's Next) project at
Stanford.
Big Data Systems Meet Machine Learning Challenges: Towards Big Data Science as a Service
Recently, we have been witnessing huge advancements in the scale of data we
routinely generate and collect in pretty much everything we do, as well as our
ability to exploit modern technologies to process, analyze and understand this
data. The intersection of these trends is what is nowadays called Big Data
Science. Cloud computing represents a practical and cost-effective solution for
supporting Big Data storage, processing and for sophisticated analytics
applications. We analyze in detail the building blocks of the software stack
for supporting big data science as a commodity service for data scientists. We
provide various insights about the latest ongoing developments and open
challenges in this domain.