932 research outputs found

    Packing: Towards 2x NLP BERT Acceleration

    Full text link
    We find that at sequence length 512, padding tokens represent in excess of 50% of the Wikipedia dataset used for pretraining BERT (Bidirectional Encoder Representations from Transformers). Therefore, by removing all padding we achieve a 2x speed-up in terms of sequences/sec. To exploit this characteristic of the dataset, we develop and contrast two deterministic packing algorithms. Both algorithms rely on the assumption that sequences are interchangeable, so packing can be performed on the histogram of sequence lengths rather than per sample. This transformation of the problem leads to algorithms which are fast and have linear complexity in dataset size. The shortest-pack-first histogram-packing (SPFHP) algorithm determines the packing order for the Wikipedia dataset of over 16M sequences in 0.02 seconds. The non-negative least-squares histogram-packing (NNLSHP) algorithm converges in 28.4 seconds but produces solutions which are more depth efficient, achieving near-optimal packing by combining at most 3 sequences in one sample. Using the dataset with multiple sequences per sample requires additional masking in the attention layer and a modification of the MLM loss function. We demonstrate that both of these changes are straightforward to implement and have relatively little impact on the achievable performance gain on modern hardware. Finally, we pretrain BERT-Large using the packed dataset, demonstrating no loss of convergence and the desired 2x speed-up.
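
    For illustration, the sketch below packs a histogram of sequence lengths into fixed-capacity samples using a simplified, best-fit-style variant of histogram packing. It is not the authors' SPFHP implementation: the max_len of 512, the longest-first processing order, and the absence of a per-pack sequence cap are assumptions made to keep the example short.

```python
from collections import defaultdict

def pack_histogram(histogram, max_len=512):
    """Pack sequences, given only their length histogram, into samples of
    capacity max_len (a simplified sketch, not the paper's exact SPFHP).

    histogram: dict mapping sequence length -> number of sequences of that length.
    Returns a list of packs; each pack is a list of lengths summing to <= max_len.
    """
    open_packs = defaultdict(list)   # remaining space -> packs with that much space left
    finished = []
    for length in sorted(histogram, reverse=True):       # place longest sequences first
        for _ in range(histogram[length]):
            # Prefer the open pack with the least remaining space that still fits.
            fits = [space for space in open_packs if space >= length and open_packs[space]]
            if fits:
                space = min(fits)
                pack = open_packs[space].pop()
            else:
                space, pack = max_len, []                # start a new pack
            pack.append(length)
            remaining = space - length
            (finished if remaining == 0 else open_packs[remaining]).append(pack)
    for packs in open_packs.values():                    # partially filled packs keep some padding
        finished.extend(packs)
    return finished

# Example: three full-length sequences plus shorter ones that can share packs.
print(pack_histogram({512: 3, 300: 2, 200: 2, 100: 4}))
```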

    A study on crate sizing, inventory and packing problem

    Get PDF
    Ph.D. (Doctor of Philosophy)

    Asymptotic optimality of best-fit for stochastic bin packing

    Get PDF
    In the static bin packing problem, items of different sizes must be packed into bins or servers with unit capacity in a way that minimizes the number of bins used, and it is well known to be a hard combinatorial problem. Best-Fit is among the simplest online heuristics for this problem. Motivated by the problem of packing virtual machines in servers in the cloud, we consider the dynamic version of this problem, when jobs arrive randomly over time and leave the system after completion of their service. We analyze the fluid limits of the system under an asymptotic Best-Fit algorithm and show that it asymptotically minimizes the number of servers used in steady state (on the fluid scale). The significance of the result is due to the fact that Best-Fit seems to achieve the best performance in practice.
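
    As a point of reference, below is a minimal sketch of the classical online Best-Fit heuristic for the static problem; the paper studies a dynamic variant in which jobs also depart, which this sketch does not model.

```python
def best_fit(item_sizes, capacity=1.0):
    """Online Best-Fit heuristic (sketch): place each arriving item into the
    open bin with the least residual capacity that still fits it; open a new
    bin only when no existing bin can accommodate the item.
    Returns (bin index per item, number of bins used)."""
    residuals = []          # remaining capacity of each open bin
    assignment = []
    for size in item_sizes:
        best, best_resid = None, None
        for i, resid in enumerate(residuals):
            if size <= resid and (best_resid is None or resid < best_resid):
                best, best_resid = i, resid
        if best is None:                      # no bin fits: open a new one
            residuals.append(capacity - size)
            assignment.append(len(residuals) - 1)
        else:
            residuals[best] -= size
            assignment.append(best)
    return assignment, len(residuals)

# Example: five items of varying size packed into unit-capacity bins.
print(best_fit([0.6, 0.5, 0.4, 0.3, 0.2]))
```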

    Learning Scheduling Algorithms for Data Processing Clusters

    Full text link
    Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly efficient policies automatically. Decima uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective, such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves the average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to 2x improvement during periods of high cluster load.
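
    To make the learned-policy idea concrete, the toy snippet below scores runnable jobs with a linear function and samples one through a softmax, as a policy-gradient scheduler would when acting during training. The feature choices and weights are invented for illustration; Decima's actual model is a graph neural network over the jobs' dependency graphs, which this sketch does not attempt to reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

def pick_next_job(job_features, weights):
    """Toy stand-in for a learned scheduling policy: score each runnable job
    from a small feature vector, then sample one via a softmax distribution."""
    scores = job_features @ weights              # one scalar score per runnable job
    probs = np.exp(scores - scores.max())        # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

# Hypothetical per-job features: [remaining work, number of children, waiting time].
features = np.array([[5.0, 2.0, 1.0],
                     [1.0, 0.0, 3.0],
                     [8.0, 4.0, 0.5]])
weights = np.array([-0.3, -0.1, 0.5])            # learned parameters in a real system
choice, probs = pick_next_job(features, weights)
print(choice, probs)
```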

    Particle size distribution estimation of a powder agglomeration process using acoustic emissions

    Get PDF
    Washing powder needs to undergo quality checks before it is sold, and according to a report by the partner company, these quality checks include an offline procedure in which a reference sieve analysis is used to determine the size distributions of the powder. This method is reportedly slow and cannot be used to measure large agglomerates of powders. A solution to this problem was proposed: the implementation of real-time acoustic emission (AE) monitoring, which would provide sufficient information to assess the nature of the particle sizes. From the literature reviewed for this thesis, it was observed that particle sizes can be monitored online with AE, but there does not appear to be a system capable of monitoring particle sizes for processes where the final powder mixture ratio varies significantly. This has been identified as a knowledge gap in existing literature, and the research carried out for this thesis contributes to closing that gap. To investigate this problem, a benchtop experimental rig was designed. The rig represented limited operating conditions of the mixer but retained the critical factors. The acquired data was analysed with a purpose-designed hybrid signal processing method based on a time-domain analysis of impact peaks using an amplitude-threshold approach. Glass beads, polyethylene and washing powder particles were considered for the experiments, and the results showed that, within the tested conditions, the designed signal processing approach was capable of estimating the particle size distribution (PSD) of various powder mixture combinations comprising particles in the range of 53-1500 microns. It was also noted that the architecture of the designed signal processing method allowed for a quicker online computation time when compared with other notable hybrid signal processing methods for particle sizing in the literature.
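
    The snippet below sketches the amplitude-threshold step described above: it counts impact peaks in a rectified AE trace, treating threshold crossings that fall within a short refractory window as a single impact. The threshold, window length, and synthetic signal are assumptions for illustration, not the thesis's calibrated settings or experimental data.

```python
import numpy as np

def count_impact_peaks(signal, threshold, min_gap):
    """Count impacts in an acoustic-emission trace via an amplitude threshold.
    Samples above the threshold that fall within min_gap samples of the
    previous crossing are treated as part of the same impact."""
    x = np.abs(np.asarray(signal, dtype=float))   # rectify the trace
    peaks = []
    last = -min_gap
    for i in np.flatnonzero(x >= threshold):
        if i - last >= min_gap:
            peaks.append(i)                       # new impact detected
        last = i                                  # extend the refractory window
    return peaks

# Synthetic example: low-level noise with three injected bursts.
t = np.arange(10_000)
sig = 0.02 * np.random.default_rng(1).standard_normal(t.size)
for pos in (1_000, 4_500, 8_200):
    sig[pos:pos + 50] += 0.5 * np.hanning(50)
print(len(count_impact_peaks(sig, threshold=0.2, min_gap=100)))  # expected: 3 bursts detected
```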

    Crowdsourced Quantification and Visualization of Urban Mobility Space Inequality

    Get PDF
    Most cities are car-centric, allocating a privileged amount of urban space to cars at the expense of sustainable mobility like cycling. Simultaneously, privately owned vehicles are vastly underused, wasting valuable opportunities for accommodating more people in a livable urban environment by occupying spacious parking areas. Since a data-driven quantification and visualization of such urban mobility space inequality is lacking, here we explore how crowdsourced data can help to advance its understanding. In particular, we describe how the open-source online platform What the Street!? uses massive user-generated data from OpenStreetMap for the interactive exploration of city-wide mobility spaces. Using polygon packing and graph algorithms, the platform rearranges all parking and mobility spaces of cars, rails, and bicycles of a city to be directly comparable, making mobility space inequality accessible to a broad public. This crowdsourced method confirms a prevalent imbalance between modal share and space allocation in 23 cities worldwide, typically discriminating against bicycles. Analyzing the guesses of the platform's visitors about mobility space distributions, we find that this discrimination is consistently underestimated in the public opinion. Finally, we discuss a visualized scenario in which extensive parking areas are regained through fleets of shared, autonomous vehicles. We outline how such accessible visualization platforms can help urban planners and policy makers reclaim road and parking space to push forward sustainable transport solutions.