
    On data skewness, stragglers, and MapReduce progress indicators

    We tackle the problem of predicting the performance of MapReduce applications, designing accurate progress indicators that keep programmers informed about the percentage of computation time completed during the execution of a job. Through extensive experiments, we show that state-of-the-art progress indicators (including the one provided by Hadoop) can be seriously harmed by data skewness, load unbalancing, and straggling tasks. This is mainly due to their implicit assumption that the running time depends linearly on the input size. We thus design a novel profile-guided progress indicator, called NearestFit, that operates without the linear hypothesis assumption and exploits a careful combination of nearest neighbor regression and statistical curve fitting techniques. Our theoretical progress model requires fine-grained profile data, which can be very difficult to manage in practice. To overcome this issue, we compute accurate approximations of some of the quantities used in our model through space- and time-efficient data streaming algorithms. We implemented NearestFit on top of Hadoop 2.6.0. An extensive empirical assessment over the Amazon EC2 platform on a variety of real-world benchmarks shows that NearestFit is practical w.r.t. space and time overheads and that its accuracy is generally very good, even in scenarios where competitors incur non-negligible errors and wide prediction fluctuations. Overall, NearestFit significantly improves the current state-of-the-art in progress analysis for MapReduce.
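    The core idea of a nearest-neighbor-based progress estimator can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's actual NearestFit algorithm (which also uses statistical curve fitting and streaming approximations); all names are illustrative. It shows why averaging the running times of profile points with similar input sizes avoids the linear-time assumption that misleads simpler indicators.

```python
def knn_predict(profile, size, k=3):
    """Estimate running time for `size` from (input_size, time) profile
    points, averaging the k nearest neighbors by input size."""
    nearest = sorted(profile, key=lambda p: abs(p[0] - size))[:k]
    return sum(t for _, t in nearest) / len(nearest)

# Non-linear profile: time grows quadratically with input size, so a
# linear "percent of input processed" estimate would badly mispredict.
profile = [(1, 1.0), (2, 4.0), (3, 9.0), (4, 16.0), (5, 25.0)]
print(knn_predict(profile, 3.1))  # averages times at sizes 3, 4, 2 -> ~9.67
```

A full progress indicator would apply this per running task and combine the per-task estimates into an overall remaining-time figure.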

    Low latency via redundancy

    Low latency is critical for interactive networked applications. But while we know how to scale systems to increase capacity, reducing latency, especially the tail of the latency distribution, can be much more difficult. In this paper, we argue that the use of redundancy is an effective way to convert extra capacity into reduced latency. By initiating redundant operations across diverse resources and using the first result that completes, redundancy improves a system's latency even under exceptional conditions. We study the tradeoff with added system utilization, characterizing the situations in which replicating all tasks reduces mean latency. We then demonstrate empirically that replicating all operations can result in significant mean and tail latency reductions in real-world systems, including DNS queries, database servers, and packet forwarding within networks.
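    The "use the first result which completes" pattern is straightforward to express with standard concurrency primitives. The sketch below is a toy illustration of the idea, assuming simulated replicas with fixed delays (the function names are made up for the example); a real client would issue the same request to independent servers.

```python
import concurrent.futures as cf
import time

def query(replica_id, delay):
    """Simulated backend call; `delay` stands in for variable latency."""
    time.sleep(delay)
    return replica_id

def redundant_call(delays):
    """Send the same request to every replica and return the first
    result that completes, cancelling the others (best effort)."""
    with cf.ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(query, i, d) for i, d in enumerate(delays)]
        first_done = next(cf.as_completed(futures))
        for f in futures:
            f.cancel()  # already-running threads still finish
        return first_done.result()

# The caller observes the latency of whichever replica is fastest.
print(redundant_call([0.20, 0.01]))  # -> 1
```

The cost, as the abstract notes, is extra utilization: every replica does the work even though only one answer is used.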

    Modelling e-commerce customer reactions: Exploring online shopping carnivals in China

    This research investigates customer reactions by exploring satisfaction (SAT), complaints (CC), and loyalty (CL) in the context of an online shopping carnival (OSC) in China. Expanding the American Customer Satisfaction Index (ACSI) model by including e-commerce corporate image (ECCI) alongside customer expectations (CE), perceived quality (PQ), and perceived value (PV), SAT was determined, while CC and CL were estimated based on SAT. For estimating CL, ECCI was added. 300 valid questionnaires were collected from Chinese shoppers with OSC experience. The research hypotheses were tested through Confirmatory Factor Analysis and Structural Equation Modelling. The results reveal five key paths influencing SAT and CL. No significant impact either on or of CC was identified. ECCI significantly impacted CC, SAT and CL. This study provides a new research perspective on customer reactions in the context of OSCs, centred on satisfaction, emphasising the role of image in expectations, satisfaction and loyalty, and incorporating customer complaints to quantify negative aspects of the shopping experience in determining customer loyalty. E-commerce companies should deliver an unforgettable customer experience by building a long-lasting image, offering consistent quality and delivering clearly delineated value, as antecedents of satisfaction and loyalty. The model can be further expanded by exploring the consequences of customer loyalty for potential buying behaviour, focusing on purchasing intention and recommendations.

    Random Hyper-parameter Search-Based Deep Neural Network for Power Consumption Forecasting

    In this paper, we introduce a deep learning approach, based on feed-forward neural networks, for big data time series forecasting with arbitrary prediction horizons. We firstly propose a random search to tune the multiple hyper-parameters involved in the method performance. There is a twofold objective for this search: firstly, to improve the forecasts and, secondly, to decrease the learning time. Next, we propose a procedure based on moving averages to smooth the predictions obtained by the different models considered for each value of the prediction horizon. We conduct a comprehensive evaluation using a real-world dataset composed of electricity consumption in Spain, evaluating accuracy and comparing the performance of the proposed deep learning with a grid search and a random search without applying smoothing. Reported results show that a random search produces competitive accuracy results generating a smaller number of models, and the smoothing process reduces the forecasting error. Ministerio de Economía y Competitividad TIN2017-88209-C2-1-
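    The two building blocks described above, random hyper-parameter search and moving-average smoothing of predictions, can be sketched with the standard library alone. This is a hedged illustration, not the paper's implementation: the search space, objective, and window size below are made up for the example.

```python
import random

def random_search(space, evaluate, n_trials=20, seed=0):
    """Sample hyper-parameter configurations at random, keep the best.
    Unlike grid search, the number of trained models is n_trials,
    independent of the size of the full grid."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(cfg)  # lower is better (e.g. forecasting error)
        if best is None or score < best[0]:
            best = (score, cfg)
    return best

def moving_average(preds, window=3):
    """Smooth a prediction sequence with a trailing moving average
    (shorter windows are used at the start of the sequence)."""
    out = []
    for i in range(len(preds)):
        lo = max(0, i - window + 1)
        out.append(sum(preds[lo:i + 1]) / (i + 1 - lo))
    return out

# Toy objective: pretend error is minimised at lr=0.01, layers=3.
space = {"lr": [0.1, 0.01, 0.001], "layers": [1, 2, 3]}
err = lambda c: abs(c["lr"] - 0.01) + abs(c["layers"] - 3)
print(random_search(space, err)[1])
print(moving_average([1.0, 3.0, 2.0, 4.0]))  # -> [1.0, 2.0, 2.0, 3.0]
```

In the paper's setting, `evaluate` would train a feed-forward network with the sampled hyper-parameters and return its validation error, and the smoothing would be applied across the per-horizon model predictions.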

    Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP

    Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LMs) and retrieval models (RMs). Existing work has combined these in simple "retrieve-then-read" pipelines in which the RM retrieves passages that are inserted into the LM prompt. To begin to fully realize the potential of frozen LMs and RMs, we propose Demonstrate-Search-Predict (DSP), a framework that relies on passing natural language texts in sophisticated pipelines between an LM and an RM. DSP can express high-level programs that bootstrap pipeline-aware demonstrations, search for relevant passages, and generate grounded predictions, systematically breaking down problems into small transformations that the LM and RM can handle more reliably. We have written novel DSP programs for answering questions in open-domain, multi-hop, and conversational settings, establishing in early evaluations new state-of-the-art in-context learning results and delivering 37-200%, 8-40%, and 80-290% relative gains against vanilla LMs, a standard retrieve-then-read pipeline, and a contemporaneous self-ask pipeline, respectively.
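    The control flow of such a pipeline can be sketched with stubs. The snippet below is only an illustration of the search-then-predict composition, with a toy word-overlap "retrieval model" and a stub "language model"; it is not the DSP framework's actual API, and every function name here is invented for the example.

```python
def rm_search(query, corpus, k=1):
    """Toy retrieval model: rank passages by word overlap with the query."""
    score = lambda p: len(set(query.lower().split()) & set(p.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:k]

def lm_predict(question, passages, demos):
    """Stub LM call: a real pipeline would prompt a frozen LM with the
    demonstrations and the retrieved passages."""
    return f"answer({question!r}, grounded in {len(passages)} passage(s))"

def pipeline(question, corpus, demos):
    passages = rm_search(question, corpus)        # Search
    return lm_predict(question, passages, demos)  # Predict (with demos)

corpus = ["Spark extends MapReduce", "DNS resolves names to addresses"]
print(pipeline("what does DNS resolve", corpus, demos=[]))
```

In the real framework the stubs are replaced by LM and RM calls, and a Demonstrate stage bootstraps the `demos` argument automatically; multi-hop programs chain several Search/Predict rounds, passing intermediate text between the models.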

    Hybrid, Optical and Wireless Near-Gigabit Communications System

    This paper presents the study and realization of a hybrid 60 GHz wireless communications system. As the 60 GHz radio link operates only in a single-room configuration, an additional Radio over Fibre (RoF) link is used to ensure communications in all the rooms of a residential environment. A single-carrier architecture is adopted. The system uses low-complexity baseband processing modules. A byte/frame synchronization technique is designed to provide a high preamble detection probability and a very low false alarm probability. Conventional RS (255, 239) encoder and decoder are used to correct errors introduced by the transmission channel. Results of Bit Error Rate (BER) measurements are presented for various antenna configurations.
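    The preamble-based synchronization idea can be illustrated with a simple correlator. The sketch below is a generic example of preamble detection by Hamming distance, not the paper's specific technique; the preamble pattern and error tolerance are chosen arbitrarily for the demonstration.

```python
def detect_preamble(bits, preamble, max_errors=1):
    """Slide the known preamble over the bit stream and report the first
    offset whose Hamming distance is within `max_errors`. Tolerating a
    few channel errors keeps the detection probability high, while a
    small `max_errors` keeps the false-alarm probability low."""
    n = len(preamble)
    for off in range(len(bits) - n + 1):
        dist = sum(a != b for a, b in zip(bits[off:off + n], preamble))
        if dist <= max_errors:
            return off
    return -1  # no frame start found

preamble = [1, 0, 1, 1, 0, 0, 1, 0]
stream = [0, 1, 1] + preamble + [1, 1, 0, 1]
stream[5] ^= 1  # flip one bit inside the preamble: still detected
print(detect_preamble(stream, preamble))  # -> 3
```

Once the frame boundary is located, the payload bytes can be handed to the RS(255, 239) decoder, which corrects up to 8 byte errors per codeword.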

    Discretized streams: A fault-tolerant model for scalable stream processing

    Many "big data" applications need to act on data arriving in real time. However, current programming models for distributed stream processing are relatively low-level, often leaving the user to worry about consistency of state across the system and fault recovery. Furthermore, the models that provide fault recovery do so in an expensive manner, requiring either hot replication or long recovery times. We propose a new programming model, discretized streams (D-Streams), that offers a high-level functional API, strong consistency, and efficient fault recovery. D-Streams support a new recovery mechanism that improves efficiency over the traditional replication and upstream backup schemes used in streaming databases (parallel recovery of lost state) and, unlike previous systems, also mitigate stragglers. We implement D-Streams as an extension to the Spark cluster computing engine that lets users seamlessly intermix streaming, batch, and interactive queries. Our system can process over 60 million records/second at sub-second latency on 100 nodes.
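    The discretization idea, turning a continuous stream into a series of small deterministic batch jobs, can be sketched minimally. This is an illustration of the concept only (a running word count over micro-batches), not Spark Streaming's API; determinism per interval is what lets lost state be recomputed in parallel from the interval's input instead of being hot-replicated.

```python
def discretize(records, batch_size):
    """Cut an ordered record stream into fixed-size micro-batches."""
    return [records[i:i + batch_size]
            for i in range(0, len(records), batch_size)]

def run_interval(batch, state):
    """Deterministic per-interval job: fold a micro-batch into the
    running word-count state and return the new state."""
    for word in batch:
        state[word] = state.get(word, 0) + 1
    return state

stream = ["a", "b", "a", "c", "a", "b"]
state = {}
for batch in discretize(stream, batch_size=2):
    state = run_interval(batch, state)
print(state)  # -> {'a': 3, 'b': 2, 'c': 1}
```

In the real system each interval's input and state are stored as immutable partitioned datasets, so a failed or straggling partition can be recomputed on many nodes at once.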