8 research outputs found
CFDBench: A Comprehensive Benchmark for Machine Learning Methods in Fluid Dynamics
In recent years, applying deep learning to solve physics problems has
attracted much attention. Data-driven deep learning methods produce operators
that can learn solutions to the whole system of partial differential equations.
However, the existing methods are only evaluated on simple flow equations
(e.g., Burgers' equation), and only consider the generalization ability on
different initial conditions. In this paper, we construct CFDBench, a benchmark
with four classic problems in computational fluid dynamics (CFD): lid-driven
cavity flow, laminar boundary layer flow in circular tubes, dam flows through
the steps, and periodic Karman vortex street. Each flow problem includes data
with different boundary conditions, fluid physical properties, and domain
geometry. Compared to existing datasets, the advantages of CFDBench are (1)
comprehensive. It contains common physical parameters such as velocity,
pressure, and cavity fraction. (2) realistic. It is very suitable for deep
learning solutions of fluid mechanics equations. (3) challenging. It has a
certain learning difficulty, prompting the search for models with strong learning
ability. (4) standardized. CFDBench facilitates a comprehensive and fair
comparison of different deep learning methods for CFD. We make appropriate
modifications to popular deep neural networks to apply them to CFDBench and
enable the accommodation of more changing inputs. The evaluation on CFDBench
reveals some new shortcomings of existing works and we propose possible
directions for solving such problems.
Comment: 33 pages, 11 figures, preprint
READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises
For many real-world applications, the user-generated inputs usually contain
various noises due to speech recognition errors caused by linguistic
variations or typographical errors (typos). Thus, it is crucial to test model
performance on data with realistic input noises to ensure robustness and
fairness. However, little study has been done to construct such benchmarks for
Chinese, where various language-specific input noises happen in the real world.
In order to fill this important gap, we construct READIN: a Chinese multi-task
benchmark with REalistic And Diverse Input Noises. READIN contains four diverse
tasks and requests annotators to re-enter the original test data with two
commonly used Chinese input methods: Pinyin input and speech input. We designed
our annotation pipeline to maximize diversity, for example by instructing the
annotators to use diverse input method editors (IMEs) for keyboard noises and
recruiting speakers from diverse dialectal groups for speech noises. We
experiment with a series of strong pretrained language models as well as robust
training methods and find that these models often suffer significant
performance drops on READIN even with robustness methods like data
augmentation. As the first large-scale attempt in creating a benchmark with
noises geared towards user-generated inputs, we believe that READIN serves as
an important complement to existing Chinese NLP benchmarks. The source code and
dataset can be obtained from https://github.com/thunlp/READIN.
Comment: Preprint
Sub-Character Tokenization for Chinese Pretrained Language Models
Tokenization is fundamental to pretrained language models (PLMs). Existing
tokenization methods for Chinese PLMs typically treat each character as an
indivisible token. However, they ignore the unique feature of the Chinese
writing system where additional linguistic information exists below the
character level, i.e., at the sub-character level. To utilize such information,
we propose sub-character (SubChar for short) tokenization. Specifically, we
first encode the input text by converting each Chinese character into a short
sequence based on its glyph or pronunciation, and then construct the vocabulary
based on the encoded text with sub-word tokenization. Experimental results show
that SubChar tokenizers have two main advantages over existing tokenizers: 1)
They can tokenize inputs into much shorter sequences, thus improving the
computational efficiency. 2) Pronunciation-based SubChar tokenizers can encode
Chinese homophones into the same transliteration sequences and produce the same
tokenization output, hence being robust to all homophone typos. At the same
time, models trained with SubChar tokenizers perform competitively on
downstream tasks. We release our code at
https://github.com/thunlp/SubCharTokenization to facilitate future work.
Comment: This draft supersedes the previous version named "SHUOWEN-JIEZI:
Linguistically Informed Tokenizers For Chinese Language Model Pretraining".
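The pronunciation-based encoding step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny pinyin table, the `#` separator, and the function name `encode` are all assumptions made for illustration, and the subsequent sub-word vocabulary construction is omitted.

```python
# Toy pronunciation table (hypothetical; a real system would rely on a
# full character-to-pinyin dictionary, possibly including tones).
TOY_PINYIN = {
    "他": "ta",
    "她": "ta",
    "是": "shi",
    "事": "shi",
    "好": "hao",
}

SEP = "#"  # assumed marker that closes each character's encoding


def encode(text: str) -> str:
    """Replace each known Chinese character with its pinyin plus a separator.

    Characters missing from the toy table are passed through unchanged.
    """
    return "".join(
        TOY_PINYIN[ch] + SEP if ch in TOY_PINYIN else ch for ch in text
    )


if __name__ == "__main__":
    # Two inputs that differ only by homophone characters encode identically,
    # so any tokenizer applied afterwards sees the same string.
    print(encode("他是"))
    print(encode("她事"))
```

Because homophones collapse to one encoded string before the vocabulary is built, a homophone typo cannot change the token sequence in this sketch; the cost is that the encoding alone no longer distinguishes the original characters.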
Comprehensive Monitoring and Benefit Evaluation of Converting Farmlands into Forests and Grasslands in China
Conversion of farmlands to forests and grasslands (CFFG) is one of the major ecological projects in China, and even in the world, with the largest investment, strongest policy support, widest coverage and highest degree of participation. Organizing the monitoring and evaluation of its benefits is of great significance for scientifically evaluating those benefits and their dynamic changes, better serving decision-making, consolidating the achievements and promoting the high-quality development of the project. On the basis of reviewing and summarizing the history of benefit monitoring and evaluation, this study established an indicator system for comprehensive monitoring and evaluation, composed of three components of benefits, 10 categories and 48 indicators, including 23 indicators of ecological benefits, 11 indicators of economic benefits and 14 indicators of social benefits. These monitoring and evaluation methods are applied, for the first time, to the systematic and full-coverage monitoring and evaluation of the national CFFG project. The research is innovative in four respects: First, it is the first time that a comprehensive ecological, economic and social benefit evaluation indicator system has been established. Second, it is the first time that quantitative evaluation methods have been established. Third, it is the first comprehensive quantitative assessment of the CFFG project. Fourth, it is the first full-scale evaluation of the project. The evaluation results show that the total value of the three benefits from the CFFG project is 2405.046 billion Yuan (354.4129 billion US$) per year, of which the ecological benefit is 1416.864 billion Yuan (208.7922 billion US$) per year, the economic benefit is 255.486 billion Yuan (37.649 billion US$) per year and the social benefit is 732.696 billion Yuan (107.9717 billion US$) per year, accounting for 58.92%, 10.62% and 30.46%, respectively, of the total benefits.
Our results provide a detailed evaluation of the achievements and benefits of the CFFG project.
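The relation between the three reported benefit values and the reported total can be checked with a short calculation. This is only an arithmetic sketch using the figures quoted in the abstract; the variable names are illustrative, and the recomputed shares may differ from the reported percentages in the last digit because of rounding.

```python
# Annual benefit values in billion Yuan, as quoted in the abstract.
benefits = {
    "ecological": 1416.864,
    "economic": 255.486,
    "social": 732.696,
}
reported_total = 2405.046  # billion Yuan per year, as quoted

# The three components should add up to the reported total.
total = sum(benefits.values())

# Share of each component in the total, in percent.
shares = {name: 100 * value / total for name, value in benefits.items()}

if __name__ == "__main__":
    print(f"sum of components: {total:.3f} (reported: {reported_total})")
    for name, share in shares.items():
        print(f"{name}: {share:.2f}%")
```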
A Multi-objective Planning Model for ES Charging Stations Considering the Power Quality and Fossil Energy Consumption
Self-Stabilized Precipitation Polymerization and Its Application
An effective, value-added use of the large amounts of olefinic compounds produced in the processing of petroleum, aside from ethylene and propylene, has been a long-standing challenge. Here, we developed a novel heterogeneous polymerization method, beyond emulsion/dispersion/suspension, termed self-stabilized precipitation (2SP) polymerization, which involves the nucleation and growth of nanoparticles (NPs) of a well-defined size without the use of any stabilizers or multifunctional monomers (crosslinkers). This technique leads to two revolutionary advances: (1) the generation of functional copolymer particles from a single olefinic monomer or complex olefinic mixtures (including C4/C5/C9 fractions) in large quantities, which opens a new way to transform huge amounts of unused olefinic compounds in C4/C5/C9 fractions into valuable copolymers, and (2) the resultant polymeric NPs possess a self-limiting size and narrow size distribution, making this one of the simplest, most efficient, and greenest strategies to produce uniform, size-tunable, and functional polymeric nanoparticles. More importantly, the separation of the NPs from the reaction medium is simple and the supernatant liquid can be reused; hence this new synthetic strategy has great potential for industrial production.
Novel DC Bias Suppression Device Based on Adjustable Parallel Resistances
Lacking an appropriate global distribution of dc currents, the conventional dc-bias suppression method, based on capacitive dc blocking devices (BDs), redirects as much current as possible to the ground. This tends to cause excessive neutral current in other transformers in the regional power grid and leads to conflict between the power grid corporation and other public enterprises. Therefore, this paper presents a flexible suppression method for dc bias based on a novel dc-bias suppression device. First, a current balancing device (CBD) based on adjustable parallel resistances is designed. The mathematical model for global optimal switching of CBDs is established by a field-circuit coupling method with the equivalent resistance network of an ac system along with the location of substations and ground electrodes. The optimal switching scheme that minimizes the global maximum dc current is obtained with a gravitational search algorithm. Based on this work, we propose a suppression strategy that considers electro-corrosion of metal pipelines. The effectiveness and superiority of the suppression methods are verified by comparative case studies of the Yichang power grid.