Packing: Towards 2x NLP BERT Acceleration
We find that at sequence length 512 padding tokens represent in excess of 50%
of the Wikipedia dataset used for pretraining BERT (Bidirectional Encoder
Representations from Transformers). Therefore, by removing all padding we
achieve a 2x speed-up in terms of sequences/sec. To exploit this characteristic
of the dataset, we develop and contrast two deterministic packing algorithms.
Both algorithms rely on the assumption that sequences are interchangeable and
therefore packing can be performed on the histogram of sequence lengths, rather
than per sample. This transformation of the problem leads to algorithms which
are fast and have linear complexity in dataset size. The shortest-pack-first
histogram-packing (SPFHP) algorithm determines the packing order for the
Wikipedia dataset of over 16M sequences in 0.02 seconds. The non-negative
least-squares histogram-packing (NNLSHP) algorithm converges in 28.4 seconds
but produces more depth-efficient solutions, achieving near-optimal packing
by combining at most 3 sequences in one sample. Using the
dataset with multiple sequences per sample requires additional masking in the
attention layer and a modification of the MLM loss function. We demonstrate
that both of these changes are straightforward to implement and have relatively
little impact on the achievable performance gain on modern hardware. Finally,
we pretrain BERT-Large using the packed dataset, demonstrating no loss of
convergence and the desired 2x speed-up.
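The abstract describes SPFHP only at a high level. The following Python sketch is one plausible reading, assuming a maximum sequence length of 512 and taking "shortest pack" to mean the open pack with the smallest current total length; the function name and tie-breaking rule are our assumptions, not the paper's reference implementation:

```python
from collections import Counter

def spfhp(lengths, max_len=512):
    """Shortest-pack-first histogram packing (illustrative sketch).

    Operates on the histogram of sequence lengths rather than on
    individual samples, exploiting the assumption that sequences
    of equal length are interchangeable.
    """
    hist = Counter(lengths)
    packs = []  # each pack: [current_total_length, [sequence_lengths]]
    # Visit lengths from longest to shortest so large sequences are placed first.
    for length in range(max_len, 0, -1):
        for _ in range(hist[length]):
            # Among open packs that can still take this length,
            # choose the shortest one (smallest current total).
            fitting = [p for p in packs if p[0] + length <= max_len]
            if fitting:
                shortest = min(fitting, key=lambda p: p[0])
                shortest[0] += length
                shortest[1].append(length)
            else:
                packs.append([length, [length]])
    return [p[1] for p in packs]
```

This naive version scans the open packs once per sequence; the 0.02-second runtime reported for 16M sequences implies a more careful histogram-level formulation, which this sketch does not reproduce.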
Asymptotic optimality of Best-Fit for stochastic bin packing
In the static bin packing problem, items of different sizes must be packed into bins or servers with unit capacity so as to minimize the number of bins used; this is a well-known hard combinatorial problem. Best-Fit is among the simplest online heuristics for it. Motivated by the problem of packing virtual machines into servers in the cloud, we consider the dynamic version of the problem, in which jobs arrive randomly over time and leave the system after completing their service. We analyze the fluid limits of the system under an asymptotic Best-Fit algorithm and show that it asymptotically minimizes the number of servers used in steady state (on the fluid scale). The significance of the result is due to the fact that Best-Fit appears to achieve the best performance in practice.
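Best-Fit itself is a standard heuristic. As a concrete reference point for the static setting, a minimal Python sketch (integer sizes and capacity for clarity; the dynamic version analyzed in the paper would additionally free capacity when jobs depart):

```python
def best_fit(items, capacity=10):
    """Online Best-Fit bin packing.

    Place each arriving item into the open bin with the least remaining
    capacity that still fits it; open a new bin only when no open bin
    can take the item.  Returns the bin index per item and the
    remaining capacities.
    """
    bins = []        # remaining capacity of each open bin
    assignment = []  # bin index chosen for each item, in arrival order
    for size in items:
        best, best_rem = None, None
        for i, rem in enumerate(bins):
            if size <= rem and (best is None or rem < best_rem):
                best, best_rem = i, rem
        if best is None:
            bins.append(capacity - size)   # open a new bin
            assignment.append(len(bins) - 1)
        else:
            bins[best] -= size
            assignment.append(best)
    return assignment, bins
```

For example, `best_fit([6, 5, 4, 3, 2], capacity=10)` packs the items into two full bins: the 6 and 4 share one bin, the 5, 3, and 2 the other.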
Learning Scheduling Algorithms for Data Processing Clusters
Efficiently scheduling data processing jobs on distributed compute clusters
requires complex algorithms. Current systems, however, use simple generalized
heuristics and ignore workload characteristics, since developing and tuning a
scheduling policy for each workload is infeasible. In this paper, we show that
modern machine learning techniques can generate highly-efficient policies
automatically. We present Decima, a scheduler that uses reinforcement
learning (RL) and neural networks to learn workload-specific scheduling
algorithms without any human instruction beyond a high-level objective such
as minimizing average job completion time.
Off-the-shelf RL techniques, however, cannot handle the complexity and scale of
the scheduling problem. To build Decima, we had to develop new representations
for jobs' dependency graphs, design scalable RL models, and invent RL training
methods for dealing with continuous stochastic job arrivals. Our prototype
integration with Spark on a 25-node cluster shows that Decima improves the
average job completion time over hand-tuned scheduling heuristics by at least
21%, achieving up to a 2x improvement during periods of high cluster load.
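Decima's actual design (graph embeddings of job DAGs, scalable policy networks, training under continuous job arrivals) is far beyond an abstract-sized sketch, but the core idea — a policy trained by reinforcement learning against a completion-time objective, with no hand-written rule — can be illustrated on a toy single-machine problem. Everything below (the scalar job feature, the softmax policy, the REINFORCE update with a moving-average baseline) is our simplification, not Decima's method:

```python
import math
import random

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def run_episode(theta, jobs):
    """Schedule jobs one at a time on a single machine.

    The policy scores each waiting job as theta * (-size) and samples
    the next job from the softmax over those scores; with theta > 0 it
    behaves like shortest-job-first.  Returns the total completion time
    and a trace for the policy-gradient update.
    """
    waiting, t, total, trace = list(jobs), 0, 0, []
    while waiting:
        probs = softmax([theta * (-s) for s in waiting])
        i = random.choices(range(len(waiting)), probs)[0]
        trace.append((list(waiting), i, probs))
        t += waiting.pop(i)  # chosen job finishes at time t
        total += t
    return total, trace

def reinforce(episodes=500, lr=0.005, seed=0):
    """REINFORCE with a moving-average baseline, minimizing completion time."""
    random.seed(seed)
    theta, baseline = 0.0, None
    for _ in range(episodes):
        jobs = [random.randint(1, 10) for _ in range(5)]
        cost, trace = run_episode(theta, jobs)
        baseline = cost if baseline is None else 0.9 * baseline + 0.1 * cost
        adv = cost - baseline
        for waiting, i, probs in trace:
            feats = [-s for s in waiting]
            # d/d theta of log softmax at the chosen action
            grad = feats[i] - sum(p * f for p, f in zip(probs, feats))
            theta -= lr * adv * grad  # gradient descent on the cost
    return theta
```

Given only the objective (total completion time), the update tends to push `theta` positive, i.e. the policy rediscovers a shortest-job-first preference without being told the rule — a miniature version of "no human instruction beyond a high-level objective".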
Particle size distribution estimation of a powder agglomeration process using acoustic emissions
Washing powder must undergo quality checks before it is sold. According to a report by the partner company, these checks include an offline procedure in which a reference sieve analysis is used to determine the size distributions of the powder. This method is reportedly slow and cannot be used to measure large agglomerates of powder. A solution was proposed in the form of real-time Acoustic Emission (AE) monitoring, which would provide sufficient information to assess the nature of the particle sizes.
From the literature reviewed for this thesis, it was observed that particle sizes can be monitored online with AE, but there does not appear to be a system capable of monitoring particle sizes for processes where the final powder mixture ratio varies significantly. This has been identified as a knowledge gap in the existing literature, and the research carried out for this thesis contributes to closing it.
To investigate this problem, a benchtop experimental rig was designed. The rig represented limited operating conditions of the mixer but retained the critical factors. The acquired data was analysed with a hybrid signal-processing method based on a time-domain analysis of impact peaks using an amplitude-threshold approach.
Glass beads, polyethylene and washing powder particles were considered for the experiments. The results showed that, within the tested conditions, the designed signal-processing approach was capable of estimating the particle size distribution (PSD) of various powder mixture combinations comprising particles in the range of 53-1500 microns. It was also noted that the architecture of the designed method allowed for a quicker online computation time than other notable hybrid signal-processing methods for particle sizing in the literature.
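The abstract does not specify the hybrid method, but the time-domain ingredient it names — detecting impact peaks with an amplitude threshold — can be sketched in Python. The excursion-based counting rule and the idea that peak amplitudes would later be mapped to size classes via calibration are our assumptions:

```python
def count_impact_peaks(signal, threshold):
    """Time-domain amplitude-threshold peak detection (toy sketch).

    Each contiguous excursion of |x| above the threshold is treated as
    one particle impact, and its maximum amplitude is recorded.  In an
    AE particle-sizing setting, those peak amplitudes would then be
    mapped to size classes via a separate calibration step.
    """
    peaks = []
    above, peak = False, 0.0
    for x in signal:
        a = abs(x)
        if a >= threshold:
            above = True
            peak = max(peak, a)   # track the excursion's maximum
        elif above:
            peaks.append(peak)    # excursion ended: record one impact
            above, peak = False, 0.0
    if above:                     # excursion still open at end of signal
        peaks.append(peak)
    return peaks
```

For example, with a threshold of 1.0 the signal `[0, 0.2, 1.5, 2.0, 0.1, 0, 3.0, 0.05]` yields two impacts with peak amplitudes 2.0 and 3.0.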
Crowdsourced Quantification and Visualization of Urban Mobility Space Inequality
Most cities are car-centric, allocating a privileged amount of urban space to cars at the expense of sustainable mobility modes such as cycling. At the same time, privately owned vehicles are vastly underused and occupy spacious parking areas, wasting valuable opportunities for accommodating more people in a livable urban environment. Since a data-driven quantification and visualization of such urban mobility space inequality has been lacking, here we explore how crowdsourced data can help to advance its understanding. In particular, we describe how the open-source online platform What the Street!? uses massive user-generated data from OpenStreetMap for the interactive exploration of city-wide mobility spaces. Using polygon packing and graph algorithms, the platform rearranges all parking and mobility spaces of cars, rails, and bicycles in a city to be directly comparable, making mobility space inequality accessible to a broad public. This crowdsourced method confirms a prevalent imbalance between modal share and space allocation in 23 cities worldwide, typically discriminating against bicycles. Analyzing the guesses of the platform’s visitors about mobility space distributions, we find that this discrimination is consistently underestimated in public opinion. Finally, we discuss a visualized scenario in which extensive parking areas are regained through fleets of shared, autonomous vehicles. We outline how such accessible visualization platforms can help urban planners and policy makers reclaim road and parking space to push forward sustainable transport solutions.