8 research outputs found

    CFDBench: A Comprehensive Benchmark for Machine Learning Methods in Fluid Dynamics

    In recent years, applying deep learning to solve physics problems has attracted much attention. Data-driven deep learning methods produce operators that can learn solutions to the whole system of partial differential equations. However, the existing methods are only evaluated on simple flow equations (e.g., Burgers' equation), and only consider the generalization ability on different initial conditions. In this paper, we construct CFDBench, a benchmark with four classic problems in computational fluid dynamics (CFD): lid-driven cavity flow, laminar boundary layer flow in circular tubes, dam flows through the steps, and periodic Karman vortex street. Each flow problem includes data with different boundary conditions, fluid physical properties, and domain geometry. Compared to existing datasets, the advantages of CFDBench are that it is (1) comprehensive: it contains common physical parameters such as velocity, pressure, and cavity fraction; (2) realistic: it is well suited to deep learning solutions of fluid mechanics equations; (3) challenging: it has a certain learning difficulty, prompting the search for models with strong learning ability; and (4) standardized: it facilitates a comprehensive and fair comparison of different deep learning methods for CFD. We make appropriate modifications to popular deep neural networks to apply them to CFDBench and enable the accommodation of more changing inputs. The evaluation on CFDBench reveals some new shortcomings of existing works, and we propose possible directions for solving such problems. Comment: 33 pages, 11 figures, preprint

    READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises

    For many real-world applications, user-generated inputs usually contain various noises due to speech recognition errors caused by linguistic variations or typographical errors (typos). Thus, it is crucial to test model performance on data with realistic input noises to ensure robustness and fairness. However, little work has been done to construct such benchmarks for Chinese, where various language-specific input noises occur in the real world. To fill this important gap, we construct READIN: a Chinese multi-task benchmark with REalistic And Diverse Input Noises. READIN contains four diverse tasks and asks annotators to re-enter the original test data with two commonly used Chinese input methods: Pinyin input and speech input. We designed our annotation pipeline to maximize diversity, for example by instructing the annotators to use diverse input method editors (IMEs) for keyboard noises and recruiting speakers from diverse dialectical groups for speech noises. We experiment with a series of strong pretrained language models as well as robust training methods, and find that these models often suffer significant performance drops on READIN even with robustness methods like data augmentation. As the first large-scale attempt at creating a benchmark with noises geared towards user-generated inputs, we believe that READIN serves as an important complement to existing Chinese NLP benchmarks. The source code and dataset can be obtained from https://github.com/thunlp/READIN. Comment: Preprint

    Sub-Character Tokenization for Chinese Pretrained Language Models

    Tokenization is fundamental to pretrained language models (PLMs). Existing tokenization methods for Chinese PLMs typically treat each character as an indivisible token. However, they ignore the unique feature of the Chinese writing system where additional linguistic information exists below the character level, i.e., at the sub-character level. To utilize such information, we propose sub-character (SubChar for short) tokenization. Specifically, we first encode the input text by converting each Chinese character into a short sequence based on its glyph or pronunciation, and then construct the vocabulary based on the encoded text with sub-word tokenization. Experimental results show that SubChar tokenizers have two main advantages over existing tokenizers: 1) They can tokenize inputs into much shorter sequences, thus improving the computational efficiency. 2) Pronunciation-based SubChar tokenizers can encode Chinese homophones into the same transliteration sequences and produce the same tokenization output, hence being robust to all homophone typos. At the same time, models trained with SubChar tokenizers perform competitively on downstream tasks. We release our code at https://github.com/thunlp/SubCharTokenization to facilitate future work. Comment: This draft supersedes the previous version named "SHUOWEN-JIEZI: Linguistically Informed Tokenizers For Chinese Language Model Pretraining".
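To make the homophone-robustness claim concrete, here is a minimal sketch of the pronunciation-based encoding step only. The tiny `TOY_PINYIN` table, the `#` separator, and the function name `encode_pronunciation` are illustrative assumptions, not the released SubCharTokenization code; the subsequent sub-word vocabulary construction is not shown.

```python
# Hypothetical toy pronunciation table (pinyin plus tone digit). A real
# system would need a full character-to-pronunciation dictionary.
TOY_PINYIN = {
    "他": "ta1",
    "她": "ta1",
    "它": "ta1",
    "在": "zai4",
    "再": "zai4",
    "见": "jian4",
}

SEP = "#"  # assumed marker for the boundary between encoded characters


def encode_pronunciation(text: str, table: dict = TOY_PINYIN) -> str:
    """Replace each known character with its pronunciation plus a separator.

    Characters missing from the table are passed through unchanged.
    """
    return "".join(table[ch] + SEP if ch in table else ch for ch in text)


correct = encode_pronunciation("再见")
typo = encode_pronunciation("在见")  # homophone typo: 在 typed instead of 再

print(correct)
print(correct == typo)
```

Because both spellings collapse to the same encoded string before any vocabulary is built, any sub-word tokenizer applied afterwards necessarily produces identical output for them, which is the property the abstract describes.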

    Comprehensive Monitoring and Benefit Evaluation of Converting Farmlands into Forests and Grasslands in China

    Conversion of farmlands to forests and grasslands (CFFG) is one of the major ecological projects with the largest investment, strongest policy, widest coverage and highest degree of participation in China, and even in the world. In order to scientifically evaluate the benefits and dynamic changes, better serve decision-making, consolidate the achievements and promote the high-quality development of this project, it is of great significance to organize the monitoring and evaluation of its benefits. On the basis of reviewing and summarizing the history of monitoring and evaluating these benefits, this study established an indicator system for comprehensive monitoring and evaluation, composed of three components of benefits, 10 categories and 48 indicators, including 23 indicators of ecological benefits, 11 indicators of economic benefits and 14 indicators of social benefits. These monitoring and evaluation methods are applied, for the first time, to the systematic and full-coverage monitoring and evaluation of the national CFFG project. The research is innovative in four respects: first, it establishes the first comprehensive ecological, economic and social benefit evaluation indicator system; second, it establishes quantitative evaluation methods for the first time; third, it is the first comprehensive quantitative assessment of the CFFG project; fourth, it is the first full-scale evaluation of the project. The evaluation results show that the total value of the three benefits from the CFFG project is 2405.046 billion yuan (354.4129 billion US dollars) per year, of which the ecological benefit is 1416.864 billion yuan (208.7922 billion US dollars) per year, the economic benefit is 255.486 billion yuan (37.649 billion US dollars) per year and the social benefit is 732.696 billion yuan (107.9717 billion US dollars) per year, accounting for 58.92%, 10.62% and 30.46%, respectively, of the total benefits. Our results provide a detailed evaluation of the achievements and benefits of the CFFG project.
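As a quick arithmetic check on the shares quoted above, the following sketch recomputes each benefit's percentage of the total from the yuan figures in the abstract. The helper name `shares` is an illustrative assumption; the recomputed values land within about 0.01 percentage point of the quoted 58.92%, 10.62% and 30.46%, with the small gap presumably due to rounding in the source.

```python
# Figures quoted in the abstract, in billion yuan per year.
total = 2405.046
benefits = {
    "ecological": 1416.864,
    "economic": 255.486,
    "social": 732.696,
}


def shares(parts: dict, whole: float) -> dict:
    """Return each part as a percentage of the whole, rounded to 2 decimals."""
    return {name: round(100 * value / whole, 2) for name, value in parts.items()}


result = shares(benefits, total)
for name, pct in result.items():
    print(f"{name}: {pct}%")

# The three components should add up to the reported total.
print(abs(sum(benefits.values()) - total) < 1e-6)
```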

    Self-Stabilized Precipitation Polymerization and Its Application

    An effective, value-added use of the large amounts of olefinic compounds produced in the processing of petroleum, aside from ethylene and propylene, has been a long-standing challenge. Here, we developed a novel heterogeneous polymerization method, beyond emulsion/dispersion/suspension, termed self-stabilized precipitation (2SP) polymerization, which involves the nucleation and growth of nanoparticles (NPs) of a well-defined size without the use of any stabilizers or multifunctional monomers (crosslinkers). This technique leads to two revolutionary advances: (1) the generation of functional copolymer particles from a single olefinic monomer or complex olefinic mixtures (including C4/C5/C9 fractions) in large quantities, which opens a new way to transform the large amount of unused olefinic compounds in C4/C5/C9 fractions into valuable copolymers, and (2) the resultant polymeric NPs possess a self-limiting size and narrow size distribution, making this one of the simplest, most efficient, and greenest strategies to produce uniform, size-tunable, and functional polymeric nanoparticles. More importantly, the separation of the NPs from the reaction medium is simple and the supernatant liquid can be reused; hence this new synthetic strategy has great potential for industrial production.

    Novel DC Bias Suppression Device Based on Adjustable Parallel Resistances

    For lack of an appropriate global distribution of dc currents, the conventional dc-bias suppression method, which is based on a capacitive dc blocking device (BD), redirects as much current as possible to the ground. This tends to cause excessive neutral currents in other transformers in the regional power grid and leads to conflict between the power grid corporation and other public enterprises. Therefore, this paper presents a flexible suppression method for dc bias based on a novel dc-bias suppression device. First, a current balancing device (CBD) based on adjustable parallel resistances is designed. The mathematical model for global optimal switching of CBDs is established by a field-circuit coupling method with the equivalent resistance network of an ac system along with the location of substations and ground electrodes. The optimal switching scheme to minimize the global maximum dc current is obtained by a gravitational search algorithm. Based on the aforementioned work, we propose a suppression strategy that considers electro-corrosion of metal pipelines. The effectiveness and superiority of the suppression methods are verified by comparative case studies of the Yichang power grid.