933 research outputs found

    Real-time Bidding for Online Advertising: Measurement and Analysis

    Get PDF
    The real-time bidding (RTB), aka programmatic buying, has recently become the fastest growing area in online advertising. Instead of bulking buying and inventory-centric buying, RTB mimics stock exchanges and utilises computer algorithms to automatically buy and sell ads in real-time; It uses per impression context and targets the ads to specific people based on data about them, and hence dramatically increases the effectiveness of display advertising. In this paper, we provide an empirical analysis and measurement of a production ad exchange. Using the data sampled from both demand and supply side, we aim to provide first-hand insights into the emerging new impression selling infrastructure and its bidding behaviours, and help identifying research and design issues in such systems. From our study, we observed that periodic patterns occur in various statistics including impressions, clicks, bids, and conversion rates (both post-view and post-click), which suggest time-dependent models would be appropriate for capturing the repeated patterns in RTB. We also found that despite the claimed second price auction, the first price payment in fact is accounted for 55.4% of total cost due to the arrangement of the soft floor price. As such, we argue that the setting of soft floor price in the current RTB systems puts advertisers in a less favourable position. Furthermore, our analysis on the conversation rates shows that the current bidding strategy is far less optimal, indicating the significant needs for optimisation algorithms incorporating the facts such as the temporal behaviours, the frequency and recency of the ad displays, which have not been well considered in the past.Comment: Accepted by ADKDD '13 worksho

    Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles

    Get PDF
    The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions -ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphical processing units (GPUs)- while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose a NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and a FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed the expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform. Consequently, having shown the applicability of the proposed solutions, this work contributes valuable enablers also for further developments following similar fundamental principles.Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union’s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL Joint Undertaking under grant agreement No. 692455-2

    Model-driven Scheduling for Distributed Stream Processing Systems

    Full text link
    Distributed Stream Processing frameworks are being commonly used with the evolution of Internet of Things(IoT). These frameworks are designed to adapt to the dynamic input message rate by scaling in/out.Apache Storm, originally developed by Twitter is a widely used stream processing engine while others includes Flink, Spark streaming. For running the streaming applications successfully there is need to know the optimal resource requirement, as over-estimation of resources adds extra cost.So we need some strategy to come up with the optimal resource requirement for a given streaming application. In this article, we propose a model-driven approach for scheduling streaming applications that effectively utilizes a priori knowledge of the applications to provide predictable scheduling behavior. Specifically, we use application performance models to offer reliable estimates of the resource allocation required. Further, this intuition also drives resource mapping, and helps narrow the estimated and actual dataflow performance and resource utilization. Together, this model-driven scheduling approach gives a predictable application performance and resource utilization behavior for executing a given DSPS application at a target input stream rate on distributed resources.Comment: 54 page

    BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures

    Full text link
    We introduce BriskStream, an in-memory data stream processing system (DSPSs) specifically designed for modern shared-memory multicore architectures. BriskStream's key contribution is an execution plan optimization paradigm, namely RLAS, which takes relative-location (i.e., NUMA distance) of each pair of producer-consumer operators into consideration. We propose a branch and bound based approach with three heuristics to resolve the resulting nontrivial optimization problem. The experimental evaluations demonstrate that BriskStream yields much higher throughput and better scalability than existing DSPSs on multi-core architectures when processing different types of workloads.Comment: To appear in SIGMOD'1

    Approximate FPGA-based LSTMs under Computation Time Constraints

    Full text link
    Recurrent Neural Networks and in particular Long Short-Term Memory (LSTM) networks have demonstrated state-of-the-art accuracy in several emerging Artificial Intelligence tasks. However, the models are becoming increasingly demanding in terms of computational and memory load. Emerging latency-sensitive applications including mobile robots and autonomous vehicles often operate under stringent computation time constraints. In this paper, we address the challenge of deploying computationally demanding LSTMs at a constrained time budget by introducing an approximate computing scheme that combines iterative low-rank compression and pruning, along with a novel FPGA-based LSTM architecture. Combined in an end-to-end framework, the approximation method's parameters are optimised and the architecture is configured to address the problem of high-performance LSTM execution in time-constrained applications. Quantitative evaluation on a real-life image captioning application indicates that the proposed methods required up to 6.5x less time to achieve the same application-level accuracy compared to a baseline method, while achieving an average of 25x higher accuracy under the same computation time constraints.Comment: Accepted at the 14th International Symposium in Applied Reconfigurable Computing (ARC) 201

    Characterization and Acceleration of High Performance Compute Workloads

    Get PDF

    Characterization and Acceleration of High Performance Compute Workloads

    Get PDF

    High-Level Annotation of Routing Congestion for Xilinx Vivado HLS Designs

    Get PDF
    Ever since transistor cost stopped decreasing, customized programmable platforms, such as field-programmable gate arrays (FPGAs), became a major way to improve software execution performance and energy consumption. While software developers can use high-level synthesis (HLS) to speed up register-transfer level (RTL) code generation from C++ or OpenCL source code, placement and routing issues, such as congestion, can still prevent achieving an FPGA programming bitstream or dramatically reduce the FPGA implementation performance. Congestion reports from physical design tools refer to thousands of RTL signal names instead of developer-accessible identifiers and statements, considerably complicating the developer understanding and resolution of the issues at the source level. We propose a high-level back-annotation flow that summarizes the routing congestion issues at the source level by analyzing the reports from the FPGA physical design tools and the internal debugging files of the HLS tools. Our flow describes congestion using comments back-annotated on the source code and identifies if the congestion causes are the on-chip memories or the DSP units (multipliers/adders), which are the shared resources very often associated with routing problems on FPGAs. We demonstrate on realistic large designs how the information provided by our flow helps to quickly spot congestion causes at the source level and to solve them using appropriate HLS directives
    corecore