Big Data Caching for Networking: Moving from Cloud to Edge
In order to cope with the relentless data tsunami in wireless networks,
current approaches such as acquiring new spectrum, deploying more base stations
(BSs) and increasing nodes in mobile packet core networks are becoming
ineffective in terms of scalability, cost and flexibility. In this regard,
context-aware 5G networks with edge/cloud computing and exploitation of
big data analytics can yield significant gains to mobile operators. In
this article, proactive content caching in 5G wireless networks is
investigated in which a big data-enabled architecture is proposed. In this
practical architecture, a vast amount of data is harnessed for content popularity
estimation and strategic contents are cached at the BSs to achieve higher
users' satisfaction and backhaul offloading. To validate the proposed solution,
we consider a real-world case study where several hours of mobile data traffic
is collected from a major telecom operator in Turkey and a big data-enabled
analysis is carried out leveraging tools from machine learning. Based on the
available information and storage capacity, numerical studies show that several
gains are achieved both in terms of users' satisfaction and backhaul
offloading. For example, depending on the fraction of content ratings available
at the BSs and on the storage size relative to the total library size, proactive
caching yields gains in users' satisfaction and offloads a sizable share of the
backhaul.
Comment: Accepted for publication in IEEE Communications Magazine, Special
Issue on Communications, Caching, and Computing for Content-Centric Mobile
Networks.
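To make the proactive-caching idea concrete, here is a minimal Python sketch under the assumptions the abstract describes: content popularity is estimated from request logs, the cache at a BS is filled greedily up to its storage budget, and backhaul offloading is measured as the cache hit fraction. All names, the toy trace, and the greedy policy are illustrative, not the paper's implementation.

```python
from collections import Counter

def proactive_cache(request_log, sizes, capacity):
    """Rank contents by observed popularity and fill the cache greedily.

    request_log : iterable of content ids, one entry per request
    sizes       : dict mapping content id -> size (illustrative units)
    capacity    : storage budget of the BS cache (same units)
    """
    popularity = Counter(request_log)          # popularity estimate from logs
    cached, used = set(), 0
    for content, _ in popularity.most_common():
        if used + sizes[content] <= capacity:
            cached.add(content)
            used += sizes[content]
    return cached

def backhaul_offload(request_log, cached):
    """Fraction of requests served from the cache instead of the backhaul."""
    return sum(1 for c in request_log if c in cached) / len(request_log)

# Toy usage: a short request trace and a 5-unit cache.
log = ["a", "a", "b", "a", "c", "b", "d"]
sizes = {"a": 2, "b": 3, "c": 5, "d": 1}
cache = proactive_cache(log, sizes, capacity=5)   # caches "a" and "b"
print(backhaul_offload(log, cache))               # 5/7 of requests offloaded
```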
ShenZhen transportation system (SZTS): a novel big data benchmark suite
Data analytics is at the core of the supply chain for both products and services in modern economies and societies. Big data workloads, however, are placing unprecedented demands on computing technologies, calling for a deep understanding and characterization of these emerging workloads. In this paper, we propose ShenZhen Transportation System (SZTS), a novel big data Hadoop benchmark suite comprised of real-life transportation analysis applications with real-life input data sets from Shenzhen, China. SZTS uniquely focuses on a specific and real-life application domain, whereas other existing Hadoop benchmark suites, such as HiBench and CloudRank-D, consist of generic algorithms with synthetic inputs. We perform a cross-layer workload characterization at the microarchitecture level, the operating system (OS) level, and the job level, revealing unique characteristics of SZTS compared to existing Hadoop benchmarks as well as general-purpose multi-core PARSEC benchmarks. We also study the sensitivity of workload behavior with respect to input data size, and we propose a methodology for identifying representative input data sets.
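The abstract does not detail the proposed methodology for identifying representative input data sets, but a common approach in workload characterization is to describe each candidate input by a vector of measured metrics, cluster the normalized vectors, and keep the input closest to each cluster centroid. A sketch of that generic approach follows; all metric names and values are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Rows: candidate input data sets; columns: measured workload metrics
# (e.g., IPC, cache miss rate, I/O wait, job duration). Values are invented.
metrics = np.array([
    [1.2, 0.08, 0.30, 120.0],
    [1.1, 0.09, 0.28, 130.0],
    [0.6, 0.21, 0.55, 400.0],
    [0.7, 0.20, 0.50, 380.0],
    [1.5, 0.05, 0.10,  60.0],
])

X = StandardScaler().fit_transform(metrics)   # put metrics on one scale
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Keep the input nearest each centroid as the representative of its cluster.
representatives = []
for c in range(km.n_clusters):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
    representatives.append(int(members[np.argmin(dists)]))
print(sorted(representatives))   # indices of representative inputs
```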
Deep Learning in the Automotive Industry: Applications and Tools
Deep Learning refers to a set of machine learning techniques that utilize
neural networks with many hidden layers for tasks such as image
classification, speech recognition, and language understanding. Deep learning has
been proven to be very effective in these domains and is pervasively used by
many Internet services. In this paper, we describe different automotive use
cases for deep learning, in particular in the domain of computer vision. We
survey the current state of the art in libraries, tools, and infrastructures
(e.g., GPUs and clouds) for implementing, training, and deploying deep neural
networks. We particularly focus on convolutional neural networks and computer
vision use cases, such as the visual inspection process in manufacturing plants
and the analysis of social media data. To train neural networks, curated and
labeled datasets are essential. In practice, however, both the availability and scope
of such datasets are typically very limited. A main contribution of this paper
is the creation of an automotive dataset that allows us to learn and
automatically recognize different vehicle properties. We describe an end-to-end
deep learning application utilizing a mobile app for data collection and
process support, and an Amazon-based cloud backend for storage and training.
For training we evaluate the use of cloud and on-premises infrastructures
(including multiple GPUs) in conjunction with different neural network
architectures and frameworks. We assess both the training times as well as the
accuracy of the classifier. Finally, we demonstrate the effectiveness of the
trained classifier in a real-world setting during the manufacturing process.
Comment: 10 pages.
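As a rough illustration of the kind of convolutional network the paper trains, the sketch below builds a minimal image classifier in Keras; the input resolution, layer sizes, and five-class output are assumptions for illustration, not the architectures or frameworks evaluated in the paper.

```python
import tensorflow as tf

# Minimal CNN for classifying a vehicle property (e.g., body type) from
# images. Architecture and class count are illustrative only.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 128, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),  # 5 hypothetical classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=10)  # labeled data assumed
```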
CloudScope: diagnosing and managing performance interference in multi-tenant clouds
Virtual machine consolidation is attractive in cloud computing platforms for several reasons, including reduced infrastructure costs, lower energy consumption, and ease of management. However, the interference between co-resident workloads caused by virtualization can violate the service level objectives (SLOs) that the cloud platform guarantees. Existing solutions to minimize interference between virtual machines (VMs) are mostly based on comprehensive micro-benchmarks or online training, which makes them computationally intensive. In this paper, we present CloudScope, a system for diagnosing interference for multi-tenant cloud systems in a lightweight way. CloudScope employs a discrete-time Markov chain model for the online prediction of performance interference of co-resident VMs. It uses the results to optimally (re)assign VMs to physical machines and to optimize the hypervisor configuration, e.g., the CPU share it can use, for different workloads. We have implemented CloudScope on top of the Xen hypervisor and conducted experiments using a set of CPU-, disk-, and network-intensive workloads and a real system (MapReduce). Our results show that CloudScope interference prediction achieves an average error of 9%. The interference-aware scheduler improves VM performance by up to 10% compared to the default scheduler. In addition, the hypervisor reconfiguration can improve network throughput by up to 30%.
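The discrete-time Markov chain at the heart of CloudScope's predictor suggests a simple generic structure: discretize interference into a few levels, estimate a transition matrix from an observed level sequence, and propagate the current state forward. The sketch below illustrates that generic idea only; the states, data, and fitting procedure are assumptions, not CloudScope's actual model.

```python
import numpy as np

def fit_transition_matrix(states, n_states):
    """Estimate a Markov transition matrix from a sequence of integer
    state labels (e.g., 0 = low, 1 = medium, 2 = high interference)."""
    counts = np.zeros((n_states, n_states))
    for s, t in zip(states[:-1], states[1:]):
        counts[s, t] += 1
    rows = counts.sum(axis=1, keepdims=True)
    # Unobserved rows fall back to a uniform distribution.
    return np.where(rows > 0, counts / np.maximum(rows, 1), 1.0 / n_states)

# Toy sequence of interference levels sampled periodically for one VM.
observed = [0, 0, 1, 1, 2, 1, 0, 0, 1, 2, 2, 1]
P = fit_transition_matrix(observed, n_states=3)

# Predict the interference distribution three steps ahead, starting from
# the "high" state; a scheduler could use this to (re)assign the VM.
current = np.array([0.0, 0.0, 1.0])
print(current @ np.linalg.matrix_power(P, 3))
```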
CERN Storage Systems for Large-Scale Wireless
The project aims at evaluating the use of the CERN computing infrastructure for next-generation sensor network data analysis. The proposed system allows the simulation of a large-scale sensor array for traffic analysis, streaming data to CERN storage systems in an efficient way. The data are made available for offline and quasi-online analysis, enabling both long-term planning and fast reaction to the environment.
Controlling Network Latency in Mixed Hadoop Clusters: Do We Need Active Queue Management?
With the advent of big data, data center applications are processing vast amounts of unstructured and semi-structured data, in parallel on large clusters, across hundreds to thousands of nodes. The highest performance for these batch big data workloads is achieved using expensive network equipment with large buffers, which accommodate bursts in network traffic and allocate bandwidth fairly even when the network is congested. Throughput-sensitive big data applications are, however, often executed in the same data center as latency-sensitive workloads. For both workloads to be supported well, the network must provide both maximum throughput and low latency. Progress has been made in this direction, as modern network switches support Active Queue Management (AQM) and Explicit Congestion Notification (ECN), both mechanisms to control the level of queue occupancy, reducing the total network latency. This paper is the first study of the effect of Active Queue Management on both throughput and latency in the context of Hadoop and the MapReduce programming model. We give a quantitative comparison of four different approaches for controlling buffer occupancy and latency: RED and CoDel, both standalone and combined with ECN and the DCTCP network protocol, and identify the AQM configurations that maintain Hadoop execution time gains from larger buffers within 5%, while reducing network packet latency caused by bufferbloat by up to 85%. Finally, we provide recommendations to administrators of Hadoop clusters as to how to improve latency without degrading the throughput of batch big data workloads.
The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007–2013) under grant agreement number 610456 (Euroserver). The research was also supported by the Ministry of Economy and Competitiveness of Spain under the contracts TIN2012-34557 and TIN2015-65316-P, Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), the HiPEAC-3 Network of Excellence (ICT-287759), and the Severo Ochoa Program (SEV-2011-00067) of the Spanish Government.
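To give a feel for the AQM mechanisms compared here, the sketch below implements a simplified CoDel-style drop decision: drop once queue sojourn time has stayed above a target for a full interval, then drop more frequently while the delay persists. It follows the target/interval idea of RFC 8289 but omits most of the real state machine, and the constants are the commonly cited defaults rather than the paper's configurations. On Linux, the real qdisc can be enabled with "tc qdisc add dev eth0 root codel".

```python
TARGET = 0.005     # 5 ms: acceptable standing-queue sojourn time
INTERVAL = 0.100   # 100 ms: how long delay must persist before dropping

class CoDelSketch:
    """Simplified CoDel-style AQM drop decision (illustrative only)."""

    def __init__(self):
        self.first_above = None  # time sojourn first exceeded TARGET
        self.drop_count = 0      # drops in the current congestion episode

    def should_drop(self, now, sojourn):
        """Whether to drop the packet dequeued at time `now` whose queue
        sojourn time was `sojourn` (both in seconds)."""
        if sojourn < TARGET:
            # Queue delay is acceptable again: leave the dropping state.
            self.first_above = None
            self.drop_count = 0
            return False
        if self.first_above is None:
            self.first_above = now
        # Control law: the first drop waits a full INTERVAL; while delay
        # persists, successive drops come sooner (INTERVAL / sqrt(count)).
        if now - self.first_above >= INTERVAL / ((self.drop_count + 1) ** 0.5):
            self.first_above = now
            self.drop_count += 1
            return True
        return False
```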
Big Data Meets Telcos: A Proactive Caching Perspective
Mobile cellular networks are becoming increasingly complex to manage while
classical deployment/optimization techniques and current solutions (e.g., cell
densification or acquiring more spectrum) are cost-ineffective and thus
seen as stopgaps. This calls for the development of novel approaches that
leverage recent advances in storage/memory, context-awareness, and edge/cloud
computing, and fall into the framework of big data. However, big data is itself
yet another complex phenomenon to handle and comes with its notorious 4Vs:
velocity, veracity, volume, and variety. In this work, we address these issues in
optimization of 5G wireless networks via the notion of proactive caching at the
base stations. In particular, we investigate the gains of proactive caching in
terms of backhaul offloading and request satisfaction, while tackling the
large amount of available data for content popularity estimation. In order to
estimate the content popularity, we first collect users' mobile traffic data
from several base stations of a Turkish telecom operator over a time interval
of several hours. Then, an analysis is carried out locally on a big data platform and
the gains of proactive caching at the base stations are investigated via
numerical simulations. It turns out that several gains are possible depending
on the level of available information and storage size. For instance, with 10%
of content ratings and 15.4 Gbyte of storage size (87% of total catalog size),
proactive caching achieves 100% of request satisfaction and offloads 98% of the
backhaul when considering 16 base stations.
Comment: 8 pages, 5 figures.
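One generic way to realize the popularity-estimation step is to treat the sparse user-content ratings as a matrix-completion problem: factor the observed entries into low-rank user and content factors, then rank contents by their predicted average rating. The sketch below illustrates that generic technique on toy data; it is not the authors' estimation method, and all values are invented.

```python
import numpy as np

# Toy sparse user x content rating matrix (0 = unobserved). Only a
# fraction of ratings is available; the rest must be inferred before
# contents can be ranked for proactive caching.
R = np.array([
    [5, 0, 3, 0],
    [4, 0, 0, 1],
    [0, 2, 4, 0],
    [5, 1, 0, 0],
], dtype=float)
mask = R > 0

# Rank-2 factorization fitted by gradient descent on observed entries.
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(R.shape[0], 2))   # user factors
V = rng.normal(scale=0.1, size=(R.shape[1], 2))   # content factors
for _ in range(2000):
    E = mask * (R - U @ V.T)         # error on observed ratings only
    U += 0.01 * (E @ V - 0.02 * U)   # gradient step + L2 regularization
    V += 0.01 * (E.T @ U - 0.02 * V)

estimated = U @ V.T                   # completed rating matrix
popularity = estimated.mean(axis=0)   # predicted popularity per content
print(np.argsort(popularity)[::-1])   # contents ranked for caching
```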