On data skewness, stragglers, and MapReduce progress indicators
We tackle the problem of predicting the performance of MapReduce
applications, designing accurate progress indicators that keep programmers
informed on the percentage of completed computation time during the execution
of a job. Through extensive experiments, we show that state-of-the-art progress
indicators (including the one provided by Hadoop) can be seriously harmed by
data skewness, load unbalancing, and straggling tasks. This is mainly due to
their implicit assumption that the running time depends linearly on the input
size. We thus design a novel profile-guided progress indicator, called
NearestFit, that operates without the linearity assumption and exploits
a careful combination of nearest neighbor regression and statistical curve
fitting techniques. Our theoretical progress model requires fine-grained
profile data that can be very difficult to manage in practice. To overcome
this issue, we resort to computing accurate approximations for some of the
quantities used in our model through space- and time-efficient data streaming
algorithms. We implemented NearestFit on top of Hadoop 2.6.0. An extensive
empirical assessment over the Amazon EC2 platform on a variety of real-world
benchmarks shows that NearestFit is practical w.r.t. space and time overheads
and that its accuracy is generally very good, even in scenarios where
competitors incur non-negligible errors and wide prediction fluctuations.
Overall, NearestFit significantly improves the current state of the art in
progress analysis for MapReduce.
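The core idea of profile-guided, non-linear progress estimation can be illustrated with a small sketch. The function and profile data below are hypothetical, not the paper's implementation: a k-nearest-neighbor regression predicts a task's running time from previously observed (input size, runtime) pairs, instead of assuming runtime grows linearly with input size.

```python
# Hedged sketch of nearest-neighbor regression for task-time prediction,
# in the spirit of NearestFit. Names and profile data are illustrative.

def knn_predict_runtime(profile, input_size, k=3):
    """Predict a task's runtime from (input_size, runtime) profile points,
    averaging the k nearest neighbors by input size rather than fitting
    a single linear model."""
    neighbors = sorted(profile, key=lambda p: abs(p[0] - input_size))[:k]
    return sum(t for _, t in neighbors) / len(neighbors)

# A profile with super-linear growth (e.g. a skewed reduce key):
# a linear fit through the origin would badly underestimate large inputs.
profile = [(10, 1.0), (20, 4.0), (40, 16.0), (80, 64.0)]
predicted = knn_predict_runtime(profile, 60, k=2)
```

A linear extrapolation from the small inputs would predict roughly 6 seconds for an input of size 60, while the neighbor-based estimate stays close to the locally observed growth.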
Low latency via redundancy
Low latency is critical for interactive networked applications. But while we
know how to scale systems to increase capacity, reducing latency --- especially
the tail of the latency distribution --- can be much more difficult. In this
paper, we argue that the use of redundancy is an effective way to convert extra
capacity into reduced latency. By initiating redundant operations across
diverse resources and using the first result which completes, redundancy
improves a system's latency even under exceptional conditions. We study the
tradeoff with added system utilization, characterizing the situations in which
replicating all tasks reduces mean latency. We then demonstrate empirically
that replicating all operations can result in significant mean and tail
latency reductions in real-world systems, including DNS queries, database
servers, and packet forwarding within networks.
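The "initiate redundant operations and take the first result" pattern described above can be sketched in a few lines. The replica latencies here are simulated with sleeps; a real deployment would issue the same request to, say, multiple DNS servers or database replicas.

```python
# Hedged sketch of redundancy-for-latency: send the same request to
# several replicas and return whichever answer completes first.
import concurrent.futures
import time

def query_replica(delay, value):
    time.sleep(delay)          # simulated replica latency
    return value

def redundant_query(replicas):
    """replicas: list of (latency_seconds, response) pairs to simulate."""
    with concurrent.futures.ThreadPoolExecutor(len(replicas)) as pool:
        futures = [pool.submit(query_replica, d, v) for d, v in replicas]
        # Block only until the first future completes; laggards are ignored.
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()

# Two replicas: a slow one (0.2 s) and a fast one (0.01 s).
answer = redundant_query([(0.2, "slow"), (0.01, "fast")])
```

The tradeoff the paper studies is visible even here: the slow replica's work is wasted (extra utilization) in exchange for the caller seeing only the fastest replica's latency.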
Modelling e-commerce customer reactions: Exploring online shopping carnivals in China
This research investigates customer reactions by exploring
satisfaction (SAT), customer complaints (CC), and customer loyalty (CL)
in the context of online shopping carnivals (OSCs) in China. The American
Customer Satisfaction Index (ACSI) model is expanded by including
e-commerce corporate image (ECCI) alongside customer expectations (CE),
perceived quality (PQ), and perceived value (PV) as determinants of SAT,
while CC and CL are estimated based on SAT; for estimating CL, ECCI is
also included. 300 valid questionnaires were collected from Chinese
shoppers with OSC experience. The research hypotheses were tested through
Confirmatory Factor Analysis and Structural Equation Modelling. The
results reveal five key paths influencing SAT and CL. No significant
impact on or of CC was identified. ECCI significantly impacted CC, SAT,
and CL. This study provides a new research perspective on customer
reactions in the context of OSCs, centred on satisfaction, emphasising
the role of image on expectations, satisfaction, and loyalty, and
incorporating customer complaints to quantify negative aspects of the
shopping experience in determining customer loyalty. E-commerce companies
should deliver an unforgettable customer experience by building a
long-lasting image, offering consistent quality, and delivering clearly
delineated value, as antecedents of satisfaction and loyalty. The model
can be further expanded by exploring the consequences of customer loyalty
for potential buying behaviour, focusing on purchasing intention and
recommendations.
Random Hyper-parameter Search-Based Deep Neural Network for Power Consumption Forecasting
In this paper, we introduce a deep learning approach, based on feed-forward
neural networks, for big data time series forecasting with arbitrary
prediction horizons. We first propose a random search to tune the multiple
hyper-parameters involved in the method's performance. This search has a
twofold objective: first, to improve the forecasts and, second, to decrease
the learning time. Next, we propose a procedure based on moving averages to
smooth the predictions obtained by the different models considered for each
value of the prediction horizon. We conduct a comprehensive evaluation using
a real-world dataset composed of electricity consumption in Spain,
evaluating accuracy and comparing the performance of the proposed deep
learning approach with a grid search and a random search without applying
smoothing. Reported results show that a random search produces competitive
accuracy results while generating a smaller number of models, and that the
smoothing process reduces the forecasting error.
Ministerio de Economía y Competitividad TIN2017-88209-C2-1-
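The two ingredients described above, random hyper-parameter search and moving-average smoothing of predictions, can be sketched as follows. The parameter grid and the toy scoring function are illustrative stand-ins for training and validating an actual feed-forward network.

```python
# Hedged sketch of random hyper-parameter search plus moving-average
# smoothing. The grid, score function, and prediction series are toys.
import random

def random_search(param_grid, score, n_iter=10, seed=0):
    """Sample n_iter random configurations from the grid and keep the
    best-scoring one; far fewer models than exhaustive grid search."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_iter):
        cfg = {k: rng.choice(v) for k, v in param_grid.items()}
        s = score(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg

def moving_average(preds, window=3):
    """Smooth a per-horizon prediction series with a trailing moving
    average (shorter windows at the start of the series)."""
    out = []
    for i in range(len(preds)):
        lo = max(0, i - window + 1)
        out.append(sum(preds[lo:i + 1]) / (i + 1 - lo))
    return out

grid = {"layers": [1, 2, 3], "units": [16, 32, 64]}
best = random_search(grid, score=lambda c: -(c["layers"] - 2) ** 2)
smoothed = moving_average([10.0, 14.0, 12.0, 20.0], window=2)
```

With a full grid search the toy grid above would mean 9 trained models; the random search caps the count at `n_iter` draws regardless of grid size, which is the learning-time saving the abstract refers to.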
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
Retrieval-augmented in-context learning has emerged as a powerful approach
for addressing knowledge-intensive tasks using frozen language models (LM) and
retrieval models (RM). Existing work has combined these in simple
"retrieve-then-read" pipelines in which the RM retrieves passages that are
inserted into the LM prompt. To begin to fully realize the potential of frozen
LMs and RMs, we propose Demonstrate-Search-Predict (DSP), a framework that
relies on passing natural language texts in sophisticated pipelines between an
LM and an RM. DSP can express high-level programs that bootstrap pipeline-aware
demonstrations, search for relevant passages, and generate grounded
predictions, systematically breaking down problems into small transformations
that the LM and RM can handle more reliably. We have written novel DSP programs
for answering questions in open-domain, multi-hop, and conversational settings,
establishing in early evaluations new state-of-the-art in-context learning
results and delivering 37-200%, 8-40%, and 80-290% relative gains against
vanilla LMs, a standard retrieve-then-read pipeline, and a contemporaneous
self-ask pipeline, respectively.
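The Demonstrate-Search-Predict control flow can be sketched with stub components. Everything below is a toy stand-in, not the DSP framework's API: the "RM" is a dictionary lookup and the "LM" is a string formatter, but the three-stage structure (demonstrate, search, predict) mirrors the pipeline the abstract describes.

```python
# Hedged sketch of the DSP control flow with stub LM and RM functions.
# A real pipeline would call a frozen language model and a retriever.

CORPUS = {
    "spark": "Spark is a cluster computing engine.",
    "hadoop": "Hadoop implements MapReduce.",
}

def rm_search(query):
    """Stub retrieval model: return passages whose key occurs in the query."""
    return [text for key, text in CORPUS.items() if key in query.lower()]

def lm_predict(question, passages, demos):
    """Stub language model: 'ground' the prediction in the top passage."""
    context = passages[0] if passages else "no passage found"
    return f"Q: {question} | grounded in: {context}"

def dsp(question, demos=()):
    # Demonstrate: pipeline-aware demonstrations (stubbed as a tuple).
    # Search: the RM retrieves passages relevant to the question.
    passages = rm_search(question)
    # Predict: the LM generates an answer grounded in the passages.
    return lm_predict(question, passages, demos)

answer = dsp("What is Spark?")
```

The point of the structure is the one the abstract makes: each stage is a small transformation the LM or RM can handle reliably, rather than a single retrieve-then-read call.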
Hybrid, Optical and Wireless Near-Gigabit Communications System
This paper presents the study and the realization of a hybrid 60 GHz wireless
communications system. As the 60 GHz radio link operates only in a single-room
configuration, an additional Radio over Fibre (RoF) link is used to ensure the
communications in all the rooms of a residential environment. A single carrier
architecture is adopted. The system uses low complexity baseband processing
modules. A byte/frame synchronization technique is designed to provide a
high preamble detection probability and a very low false alarm
probability. A conventional RS (255, 239) encoder and decoder are used to
correct errors in the transmission channel. Results of Bit Error Rate (BER)
measurements are presented for various antenna configurations.
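The tradeoff between detection probability and false alarm probability in preamble-based synchronization can be illustrated with a small sketch. The preamble pattern and threshold below are illustrative, not the paper's actual design: a known bit pattern is slid over the received stream, and a frame start is declared where the match count crosses a threshold.

```python
# Hedged sketch of preamble-based frame synchronization. Raising the
# threshold lowers the false alarm probability at the cost of detection
# probability under bit errors; the values here are toys.

def detect_preamble(bits, preamble, threshold):
    """Return the first offset where at least `threshold` preamble bits
    match the received stream, or -1 if no offset qualifies."""
    for off in range(len(bits) - len(preamble) + 1):
        matches = sum(b == p for b, p in zip(bits[off:], preamble))
        if matches >= threshold:
            return off
    return -1

preamble = [1, 0, 1, 1, 0, 1]
rx = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]   # preamble starts at offset 2
offset = detect_preamble(rx, preamble, threshold=6)
```

Setting the threshold below the full preamble length tolerates channel bit errors (higher detection probability) but makes spurious matches in random data more likely (higher false alarm probability).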
Modeling ring current ion and electron dynamics and plasma instabilities during a high‐speed stream driven storm
Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/94593/1/jgra21840.pd
Discretized streams: A fault-tolerant model for scalable stream processing
Many "big data" applications need to act on data arriving in real time. However, current programming models for distributed stream processing are relatively low-level, often leaving the user to worry about consistency of state across the system and fault recovery. Furthermore, the models that provide fault recovery do so in an expensive manner, requiring either hot replication or long recovery times. We propose a new programming model, discretized streams (D-Streams), that offers a high-level functional API, strong consistency, and efficient fault recovery. D-Streams support a new recovery mechanism that improves efficiency over the traditional replication and upstream backup schemes in streaming databases, namely parallel recovery of lost state, and, unlike previous systems, also mitigate stragglers. We implement D-Streams as an extension to the Spark cluster computing engine that lets users seamlessly intermix streaming, batch, and interactive queries. Our system can process over 60 million records/second at sub-second latency on 100 nodes.
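The discretization idea can be sketched in miniature: chop a timestamped record stream into short intervals (micro-batches) and run a deterministic batch computation on each, so that lost state can be recomputed from its input batch rather than restored from a hot replica. The interval length and the word-count computation below are illustrative, not Spark Streaming's API.

```python
# Hedged sketch of the D-Streams model: micro-batching plus a
# deterministic per-batch computation.

def discretize(records, interval=1.0):
    """Group (timestamp, value) records into per-interval micro-batches,
    ordered by interval."""
    batches = {}
    for ts, value in records:
        batches.setdefault(int(ts // interval), []).append(value)
    return [batches[k] for k in sorted(batches)]

def word_count(batch):
    """Deterministic per-batch computation: count occurrences of each
    value. Determinism is what makes parallel recomputation possible."""
    counts = {}
    for value in batch:
        counts[value] = counts.get(value, 0) + 1
    return counts

stream = [(0.1, "a"), (0.5, "b"), (1.2, "a"), (1.9, "a")]
results = [word_count(b) for b in discretize(stream)]
```

Because each batch result is a pure function of its input interval, a failed or straggling partition can be recomputed in parallel on other nodes, which is the recovery mechanism the abstract contrasts with hot replication and upstream backup.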