Recent Advances on Sorting Methods of High-Throughput Droplet-Based Microfluidics in Enzyme Directed Evolution
Droplet-based microfluidics has been widely applied in enzyme directed evolution (DE), in either cell-based or cell-free systems, due to its low cost and high throughput. Because isolation relies on labeled or label-free characteristics of the droplets, the sorting method contributes most to the efficiency of the whole system. Fluorescence-activated droplet sorting (FADS) is the most widely applied labeled method but is limited in the scope of target enzymes it can address. Label-free sorting methods show potential to greatly broaden the application range of microfluidics. Here, we review developments in droplet sorting methods through a comprehensive literature survey, including labeled detections [FADS and absorbance-activated droplet sorting (AADS)] and label-free detections [electrochemical-based droplet sorting (ECDS), mass-activated droplet sorting (MADS), Raman-activated droplet sorting (RADS), and nuclear magnetic resonance-based droplet sorting (NMR-DS)]. We highlight cases from the last five years in which novel enzymes or highly efficient variants were generated by microfluidic DE. In addition, the advantages and challenges of the different sorting methods are briefly discussed to provide an outlook for future applications in enzyme DE.
Placement Distance of Exit Advance Guide Sign on an Eight-Lane Expressway Considering Lane Changing Behaviour in China
The reasonable placement of advance guide signs (AGSs) is important for improving driving efficiency and safety when exiting an expressway. By analysing the lane-changing process when approaching an exit on new two-way eight-lane expressways, we modified the traditional AGS lane-change distance formula. To this end, a field experiment was designed to explore the lane-change traversal time under the free-flow condition (LOS 1). Considering the limitations of the experimental equipment, the lane-change distance at the worst levels of service was explored using VISSIM simulation. The results show that the eight-lane lane-changing distance based on the modified theoretical calculation differed only slightly from the VISSIM simulation under the free-flow condition. Furthermore, the placement distance at the worst levels of service is discussed. The placement distances of all-level AGSs are then recommended to be 3 km, 2 km, 1.2 km, and 0.8 km, based on a calculation formula for the attenuation of drivers' short-term memory. Determining the two-way eight-lane AGS placement distance from the perspective of LOS can provide a basis on which to supplement existing standards and references for AGS placement distance after expressway expansion in China.
READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises
For many real-world applications, user-generated inputs usually contain
various noises due to speech recognition errors caused by linguistic
variations or due to typographical errors (typos). Thus, it is crucial to test model
performance on data with realistic input noises to ensure robustness and
fairness. However, little work has been done to construct such benchmarks for
Chinese, where various language-specific input noises occur in the real world.
In order to fill this important gap, we construct READIN: a Chinese multi-task
benchmark with REalistic And Diverse Input Noises. READIN contains four diverse
tasks and requests annotators to re-enter the original test data with two
commonly used Chinese input methods: Pinyin input and speech input. We designed
our annotation pipeline to maximize diversity, for example by instructing the
annotators to use diverse input method editors (IMEs) for keyboard noises and
recruiting speakers from diverse dialectical groups for speech noises. We
experiment with a series of strong pretrained language models as well as robust
training methods, and find that these models often suffer significant
performance drops on READIN even with robustness methods like data
augmentation. As the first large-scale attempt in creating a benchmark with
noises geared towards user-generated inputs, we believe that READIN serves as
an important complement to existing Chinese NLP benchmarks. The source code and
dataset can be obtained from https://github.com/thunlp/READIN. Comment: Preprint.
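As a rough illustration of the robustness gap described above, the following sketch compares a model's accuracy on a clean test set against its accuracy on a noisily re-entered version of the same set. The file layout, field names, and metric are assumptions made for illustration, not the benchmark's actual format.

```python
# Hypothetical sketch: measuring the clean-vs-noisy performance drop discussed above.
# The JSONL layout and the "text"/"label" fields are assumptions for illustration.
import json

def accuracy(predictions, labels):
    """Fraction of examples where the prediction matches the gold label."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

def robustness_gap(model_fn, clean_path, noisy_path):
    """Run the same model on clean and noisy test sets and report the drop."""
    clean = [json.loads(line) for line in open(clean_path, encoding="utf-8")]
    noisy = [json.loads(line) for line in open(noisy_path, encoding="utf-8")]

    clean_acc = accuracy([model_fn(x["text"]) for x in clean],
                         [x["label"] for x in clean])
    noisy_acc = accuracy([model_fn(x["text"]) for x in noisy],
                         [x["label"] for x in noisy])
    return clean_acc, noisy_acc, clean_acc - noisy_acc
```

A large positive gap on the noisy split, relative to the clean split, is the kind of degradation the benchmark is designed to expose.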
Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes
Multi-frame depth estimation generally achieves high accuracy by relying on
multi-view geometric consistency. When applied in dynamic scenes, e.g.,
autonomous driving, this consistency is usually violated in the dynamic areas,
leading to corrupted estimations. Many multi-frame methods handle dynamic areas
by identifying them with explicit masks and compensating the multi-view cues
with monocular cues represented as local monocular depth or features. The
improvements are limited by the uncontrolled quality of the masks and the
underutilized benefits of fusing the two types of cues. In this paper,
we propose a novel method that learns to fuse the multi-view and monocular cues,
encoded as volumes, without needing heuristically crafted masks. As revealed
in our analyses, the multi-view cues capture more accurate geometric
information in static areas, and the monocular cues capture more useful
contexts in dynamic areas. To let the geometric perception learned from
multi-view cues in static areas propagate to the monocular representation in
dynamic areas and let monocular cues enhance the representation of multi-view
cost volume, we propose a cross-cue fusion (CCF) module, which includes the
cross-cue attention (CCA) to encode the spatially non-local relative
intra-relations from each source and thereby enhance the representation of the other.
Experiments on real-world datasets demonstrate the significant effectiveness and
generalization ability of the proposed method. Comment: Accepted by CVPR 2023. Code and models are available at:
https://github.com/ruili3/dynamic-multiframe-dept
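The sketch below illustrates one way such a cross-cue attention block could be realised: one cue's feature volume is enhanced by spatially non-local attention computed against the other cue. The tensor shapes, projections, and residual form are assumptions and may differ from the paper's CCA module.

```python
# Hypothetical sketch of cross-cue attention between a multi-view cost-volume
# feature map and a monocular feature map. Shapes and the exact attention form
# are assumptions; the paper's CCA module may differ.
import torch
import torch.nn as nn

class CrossCueAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.to_q = nn.Conv2d(channels, channels, 1)   # queries from cue A
        self.to_k = nn.Conv2d(channels, channels, 1)   # keys from cue B
        self.to_v = nn.Conv2d(channels, channels, 1)   # values from cue B
        self.scale = channels ** -0.5

    def forward(self, cue_a, cue_b):
        """Enhance cue_a with spatially non-local relations computed from cue_b."""
        b, c, h, w = cue_a.shape
        q = self.to_q(cue_a).flatten(2).transpose(1, 2)       # (B, HW, C)
        k = self.to_k(cue_b).flatten(2)                        # (B, C, HW)
        v = self.to_v(cue_b).flatten(2).transpose(1, 2)        # (B, HW, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1)       # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)   # back to (B, C, H, W)
        return cue_a + out                                     # residual enhancement
```

A symmetric pair of such blocks (multi-view to monocular, and monocular to multi-view) would yield one enhanced volume per cue, in the spirit of the fusion described above.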
Sub-Character Tokenization for Chinese Pretrained Language Models
Tokenization is fundamental to pretrained language models (PLMs). Existing
tokenization methods for Chinese PLMs typically treat each character as an
indivisible token. However, they ignore the unique feature of the Chinese
writing system where additional linguistic information exists below the
character level, i.e., at the sub-character level. To utilize such information,
we propose sub-character (SubChar for short) tokenization. Specifically, we
first encode the input text by converting each Chinese character into a short
sequence based on its glyph or pronunciation, and then construct the vocabulary
based on the encoded text with sub-word tokenization. Experimental results show
that SubChar tokenizers have two main advantages over existing tokenizers: 1)
They can tokenize inputs into much shorter sequences, thus improving the
computational efficiency. 2) Pronunciation-based SubChar tokenizers can encode
Chinese homophones into the same transliteration sequences and produce the same
tokenization output, hence being robust to all homophone typos. At the same
time, models trained with SubChar tokenizers perform competitively on
downstream tasks. We release our code at
https://github.com/thunlp/SubCharTokenization to facilitate future work. Comment: This draft supersedes the previous version named "SHUOWEN-JIEZI: Linguistically Informed Tokenizers For Chinese Language Model Pretraining".
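A minimal sketch of the pronunciation-based encoding step is given below, assuming the third-party pypinyin package for transliteration; the separator token and the example strings are illustrative, and the released tokenizers additionally train a sub-word vocabulary on the encoded text.

```python
# Hypothetical sketch of pronunciation-based sub-character encoding: each Chinese
# character is replaced by its pinyin transliteration before sub-word tokenization.
# pypinyin, the separator token, and the examples are assumptions for illustration.
from pypinyin import lazy_pinyin  # third-party: pip install pypinyin

def encode_pronunciation(text, sep="#"):
    """Map each character to its toneless pinyin, with a separator between characters."""
    return sep.join(lazy_pinyin(text))

# Homophones collapse to the same transliteration, which is why homophone typos
# leave the tokenization output unchanged:
print(encode_pronunciation("我在这里"))  # e.g. "wo#zai#zhe#li"
print(encode_pronunciation("我再这里"))  # same output despite the 在/再 homophone typo
```

A standard sub-word tokenizer (e.g., BPE) trained on such encoded text would then produce the final vocabulary, giving the shorter sequences and homophone robustness described above.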
Emergent Modularity in Pre-trained Transformers
This work examines the presence of modularity in pre-trained Transformers, a
feature commonly found in human brains and thought to be vital for general
intelligence. In analogy to human brains, we consider two main characteristics
of modularity: (1) functional specialization of neurons: we evaluate whether
each neuron is mainly specialized in a certain function, and find that the
answer is yes. (2) function-based neuron grouping: we explore finding a
structure that groups neurons into modules by function, and each module works
for its corresponding function. Given the enormous amount of possible
structures, we focus on Mixture-of-Experts as a promising candidate, which
partitions neurons into experts and usually activates different experts for
different inputs. Experimental results show that there are functional experts,
in which the neurons specialized in a certain function are clustered. Moreover,
perturbing the activations of functional experts significantly affects the
corresponding function. Finally, we study how modularity emerges during
pre-training, and find that the modular structure is stabilized at the early
stage, which is faster than neuron stabilization. This suggests that Transformers
first construct the modular structure and then learn fine-grained neuron
functions. Our code and data are available at
https://github.com/THUNLP/modularity-analysis. Comment: Findings of ACL 2023.
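The following sketch shows how such an expert-perturbation probe might look on a toy Mixture-of-Experts layer: one expert's output is zeroed and the loss increase on a function-specific batch is recorded. The layer, shapes, and knock-out scheme are assumptions, not the paper's actual setup.

```python
# Hypothetical sketch of the perturbation analysis: disable one "functional expert"
# and measure how much a chosen function degrades. The MoE layer is a toy stand-in.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])

    def forward(self, x, perturb_expert=None):
        gates = torch.softmax(self.router(x), dim=-1)             # (tokens, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], -1)   # (tokens, d_model, n_experts)
        if perturb_expert is not None:
            outputs[..., perturb_expert] = 0.0                    # knock out one expert
        return (outputs * gates.unsqueeze(1)).sum(-1)             # (tokens, d_model)

def function_drop(model, x, target, loss_fn, expert_id):
    """Loss increase on a function-specific batch when one expert is disabled."""
    with torch.no_grad():
        base = loss_fn(model(x), target)
        pert = loss_fn(model(x, perturb_expert=expert_id), target)
    return (pert - base).item()
```

A large loss increase for a particular (expert, function) pair is the kind of evidence for functional specialization that the analysis above reports.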