Stream Processing Systems Benchmark: StreamBench
Batch processing technologies (such as MapReduce, Hive, and Pig) have matured and are widely used in industry. These systems have successfully solved the problem of processing large volumes of data. However, a large amount of data must first be collected and stored in a database or file system, which is very time-consuming, and the batch analysis jobs then take further time to finish before any results are available. Yet many use cases need analysis results from an unbounded sequence of data within seconds or sub-seconds. To satisfy the increasing demand for processing such streaming data, several stream processing systems have been implemented and widely adopted, such as Apache Storm, Apache Spark, IBM InfoSphere Streams, and Apache Flink. They all support online stream processing, high scalability, and task monitoring. How to evaluate stream processing systems before choosing one for production development, however, remains an open question.
In this thesis, we introduce StreamBench, a benchmark framework to facilitate performance comparisons of stream processing systems. A common API component and a core set of workloads are defined. We implement the common API and run benchmarks for three widely used open source stream processing systems: Apache Storm, Flink, and Spark Streaming. A key feature of the StreamBench framework is that it is extensible -- it supports easy definition of new workloads, in addition to making it easy to benchmark new stream processing systems.
TLM: Token-Level Masking for Transformers
Structured dropout approaches, such as attention dropout and DropHead, have
been investigated to regularize the multi-head attention mechanism in
Transformers. In this paper, we propose a new regularization scheme that operates at the
token level rather than the structure level to reduce overfitting. Specifically, we
devise a novel Token-Level Masking (TLM) training strategy for Transformers to
regularize the connections of self-attention, which consists of two masking
techniques that are effective and easy to implement. The underlying idea is to
manipulate the connections between tokens in the multi-head attention via
masking, where the networks are forced to exploit partial neighbors'
information to produce a meaningful representation. The generality and
effectiveness of TLM are thoroughly evaluated via extensive experiments on 4
diversified NLP tasks across 18 datasets, including natural language
understanding benchmark GLUE, ChineseGLUE, Chinese Grammatical Error
Correction, and data-to-text generation. The results indicate that TLM can
consistently outperform attention dropout and DropHead, e.g., it improves on DropHead by
0.5 points with BERT-large on GLUE. Moreover, TLM can
establish a new record on the data-to-text benchmark Rotowire (18.93 BLEU). Our
code will be publicly available at https://github.com/Young1993/tlm. Comment: 13 pages. Accepted by EMNLP2023 main conference
The impact of planetary boundary layer parameterisation over the Yangtze River Delta region, China, part 1: meteorological simulation.
The planetary boundary layer (PBL) is the main region for the exchange of matter, momentum, and energy between land and atmosphere. The transport processes in the PBL determine the distribution of temperature, water vapour, wind speed and other physical quantities and are very important for the simulation of the physical characteristics of the meteorology. Based on two non-local (YSU, ACM2) and two local closure PBL schemes (MYJ, MYNN) in the Weather Research and Forecasting (WRF) model, seasonal and daily cycles of meteorological variables over the Yangtze River Delta (YRD) region are investigated. It is shown that all four PBL schemes overestimate 10-m wind speed and 2-m temperature while underestimating relative humidity. Inter-comparisons among the different PBL schemes show that the MYNN scheme results in a closer match of 2-m temperature and 10-m wind speed to surface observations in summer, while the MYJ scheme shows the smallest bias of 2-m temperature and relative humidity in winter. Compared to the observed PBL height obtained from a micro-pulse lidar system, the MYNN scheme exhibits the lowest mean bias while the ACM2 scheme shows the highest correlation. It is also found that the sensitivity of the PBL height varies between winter and summer; a best-performing PBL scheme should be chosen for each season to predict various meteorological conditions over complicated topography like the YRD region.
The impact of planetary boundary layer parameterisation scheme over the Yangtze River Delta region, China, part I: seasonal and diurnal sensitivity studies.
The planetary boundary layer (PBL) is the main region for the exchange of matter, momentum and energy between land and atmosphere. The transport processes in the PBL determine the distribution of temperature, water vapour, wind speed and other physical quantities within the PBL and are very important for the simulation of the physical characteristics of the meteorology. Based on two non-local closure PBL schemes (YSU, ACM2) and two local closure PBL schemes (MYJ, MYNN) in the Weather Research and Forecasting (WRF) model, seasonal and daily cycles of meteorological variables over the Yangtze River Delta (YRD) region are investigated. It is shown that all four PBL schemes overestimate 10-m wind speed and 2-m temperature while underestimating relative humidity. The MYJ scheme produces the largest biases in 10-m wind speed and the smallest biases in humidity, while with the ACM2 scheme the WRF-simulated 2-m temperature and 10-m wind speed are closer to surface meteorological observations in summer. The ACM2 scheme performs well for daytime PBL height; compared with observational data, the MYNN scheme yields the lowest mean bias of 0.04 km and the ACM2 scheme shows the highest correlation coefficient of 0.59. It is found that the PBL schemes show varying degrees of sensitivity in winter and summer, and a best-performing PBL scheme should be chosen to predict various meteorological conditions under different seasons over a complicated region like the YRD.
Responses of the field-aligned currents in the plasma sheet boundary layer to a geomagnetic storm
Geomagnetic storms can result in large magnetic field disturbances and intense currents in the magnetosphere and even on the ground. As an important medium of momentum and energy transport among the solar wind, magnetosphere, and ionosphere, field-aligned currents (FACs) can also be strengthened in storm times. This study shows the responses of FACs in the plasma sheet boundary layer (PSBL) observed by the Magnetospheric Multiscale (MMS) spacecraft in different phases of a large storm that lasted from May 27, 2017, to May 29, 2017. Most of the FACs were carried by electrons, and several FACs in the storm time also contained sufficient ion FACs. The FAC magnitudes were larger in the storm than in the quiet period, and those in the main phase were the strongest. In this case, the direction of the FACs in the main phase showed no preference for tailward or earthward, whereas the direction of the FACs in the recovery phase was mostly tailward. The results suggest that the FACs in the PSBL are closely related to the storm and could be driven by activities in the tail region, where the energy transported from the solar wind to the magnetosphere is stored and released as the storm is evolving. Thus, the FACs are an important medium of energy transport between the tail and the ionosphere, and the PSBL is a significant magnetosphere–ionosphere coupling region in the nightside.
Arctic weather routing: a review of ship performance models and ice routing algorithms
With the accelerated melting of the Arctic sea ice, the Northeast Passage of the Arctic is becoming increasingly accessible. Nevertheless, the constantly changing natural environment of the Arctic and its multiple impacts on vessel navigation performance have resulted in a lack of confidence in the outcomes of polar automated route planning. This paper aims to evaluate the effectiveness of two distinct models by examining the advancements in two essential components of e-navigation, namely ship performance methods and ice routing algorithms. We also seek to provide an outlook on the future directions of model development. Furthermore, through comparative experiments, we have examined the existing research on ice path planning and pointed out promising directions for future Arctic weather routing research.