Recent Advances on Sorting Methods of High-Throughput Droplet-Based Microfluidics in Enzyme Directed Evolution
Droplet-based microfluidics has been widely applied in enzyme directed evolution (DE), in either cell-based or cell-free systems, due to its low cost and high throughput. Because isolation relies on labeled or label-free characteristics of the droplets, the sorting method contributes most to the efficiency of the whole system. Fluorescence-activated droplet sorting (FADS) is the most widely applied labeled method but is limited in the scope of target enzymes it can address. Label-free sorting methods show potential to greatly broaden the application range of microfluidics. Here, we review developments in droplet sorting methods through a comprehensive literature survey, including labeled detections [FADS and absorbance-activated droplet sorting (AADS)] and label-free detections [electrochemical-based droplet sorting (ECDS), mass-activated droplet sorting (MADS), Raman-activated droplet sorting (RADS), and nuclear magnetic resonance-based droplet sorting (NMR-DS)]. We highlight cases from the last five years in which novel enzymes or highly efficient variants were generated by microfluidic DE. In addition, the advantages and challenges of the different sorting methods are briefly discussed to provide an outlook for future applications in enzyme DE.
Placement Distance of Exit Advance Guide Sign on an Eight-Lane Expressway Considering Lane Changing Behaviour in China
The reasonable placement of advance guide signs (AGSs) is important for improving driving efficiency and safety when exiting an expressway. By analysing the lane-changing process when approaching an exit on new two-way eight-lane expressways, we modified the traditional AGS lane-change distance formula. To this end, a field experiment was designed to explore the lane-change traversal time under the free-flow condition (LOS 1). Considering the limitations of the experimental equipment, the lane-change distance at the worst levels of service was explored using VISSIM simulation. The results show that the eight-lane lane-changing distance based on the modified theoretical calculation differed only slightly from the VISSIM simulation under the free-flow condition. Furthermore, the placement distance at the worst levels of service is discussed. The placement distances of all-level AGSs are then recommended to be 3 km, 2 km, 1.2 km, and 0.8 km, based on a calculation formula for the attenuation of drivers' short-term memory. Determining the two-way eight-lane AGS placement distance from the perspective of LOS can provide a basis on which to supplement existing standards and references for AGS placement distance after expressway expansion in China.
READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises
For many real-world applications, user-generated inputs usually contain
various noises due to speech recognition errors caused by linguistic
variations or due to typographical errors (typos). Thus, it is crucial to test model
performance on data with realistic input noises to ensure robustness and
fairness. However, little work has been done to construct such benchmarks for
Chinese, where various language-specific input noises occur in the real world.
In order to fill this important gap, we construct READIN: a Chinese multi-task
benchmark with REalistic And Diverse Input Noises. READIN contains four diverse
tasks and requests annotators to re-enter the original test data with two
commonly used Chinese input methods: Pinyin input and speech input. We designed
our annotation pipeline to maximize diversity, for example by instructing the
annotators to use diverse input method editors (IMEs) for keyboard noises and
recruiting speakers from diverse dialectical groups for speech noises. We
experiment with a series of strong pretrained language models as well as robust
training methods, and find that these models often suffer significant
performance drops on READIN even with robustness methods like data
augmentation. As the first large-scale attempt in creating a benchmark with
noises geared towards user-generated inputs, we believe that READIN serves as
an important complement to existing Chinese NLP benchmarks. The source code and
dataset can be obtained from https://github.com/thunlp/READIN. Comment: Preprint.
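As a rough illustration of the robustness gap described above, the following sketch compares a model's accuracy on a clean test set against its accuracy on a noisily re-entered version of the same set. The file layout, field names, and metric are assumptions made for illustration, not the benchmark's actual format.

```python
# Hypothetical sketch: measuring the clean-vs-noisy performance drop discussed above.
# The JSONL layout and the "text"/"label" fields are assumptions for illustration.
import json

def accuracy(predictions, labels):
    """Fraction of examples where the prediction matches the gold label."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

def robustness_gap(model_fn, clean_path, noisy_path):
    """Run the same model on clean and noisy test sets and report the drop."""
    clean = [json.loads(line) for line in open(clean_path, encoding="utf-8")]
    noisy = [json.loads(line) for line in open(noisy_path, encoding="utf-8")]

    clean_acc = accuracy([model_fn(x["text"]) for x in clean],
                         [x["label"] for x in clean])
    noisy_acc = accuracy([model_fn(x["text"]) for x in noisy],
                         [x["label"] for x in noisy])
    return clean_acc, noisy_acc, clean_acc - noisy_acc
```

A large positive gap on the noisy split, relative to the clean split, is the kind of degradation the benchmark is designed to expose.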
Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes
Multi-frame depth estimation generally achieves high accuracy by relying on
multi-view geometric consistency. When applied in dynamic scenes, e.g.,
autonomous driving, this consistency is usually violated in the dynamic areas,
leading to corrupted estimations. Many multi-frame methods handle dynamic areas
by identifying them with explicit masks and compensating the multi-view cues
with monocular cues represented as local monocular depth or features. The
improvements are limited by the uncontrolled quality of the masks and the
underutilized benefits of fusing the two types of cues. In this paper,
we propose a novel method that learns to fuse the multi-view and monocular cues,
encoded as volumes, without needing heuristically crafted masks. As revealed
in our analyses, the multi-view cues capture more accurate geometric
information in static areas, and the monocular cues capture more useful
contexts in dynamic areas. To let the geometric perception learned from
multi-view cues in static areas propagate to the monocular representation in
dynamic areas and let monocular cues enhance the representation of multi-view
cost volume, we propose a cross-cue fusion (CCF) module, which includes the
cross-cue attention (CCA) to encode the spatially non-local relative
intra-relations from each source and thereby enhance the representation of the other.
Experiments on real-world datasets demonstrate the significant effectiveness and
generalization ability of the proposed method. Comment: Accepted by CVPR 2023. Code and models are available at:
https://github.com/ruili3/dynamic-multiframe-dept
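The sketch below illustrates one way such a cross-cue attention block could be realised: one cue's feature volume is enhanced by spatially non-local attention computed against the other cue. The tensor shapes, projections, and residual form are assumptions and may differ from the paper's CCA module.

```python
# Hypothetical sketch of cross-cue attention between a multi-view cost-volume
# feature map and a monocular feature map. Shapes and the exact attention form
# are assumptions; the paper's CCA module may differ.
import torch
import torch.nn as nn

class CrossCueAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.to_q = nn.Conv2d(channels, channels, 1)   # queries from cue A
        self.to_k = nn.Conv2d(channels, channels, 1)   # keys from cue B
        self.to_v = nn.Conv2d(channels, channels, 1)   # values from cue B
        self.scale = channels ** -0.5

    def forward(self, cue_a, cue_b):
        """Enhance cue_a with spatially non-local relations computed from cue_b."""
        b, c, h, w = cue_a.shape
        q = self.to_q(cue_a).flatten(2).transpose(1, 2)       # (B, HW, C)
        k = self.to_k(cue_b).flatten(2)                        # (B, C, HW)
        v = self.to_v(cue_b).flatten(2).transpose(1, 2)        # (B, HW, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1)       # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)   # back to (B, C, H, W)
        return cue_a + out                                     # residual enhancement
```

A symmetric pair of such blocks (multi-view to monocular, and monocular to multi-view) would yield one enhanced volume per cue, in the spirit of the fusion described above.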
Sub-Character Tokenization for Chinese Pretrained Language Models
Tokenization is fundamental to pretrained language models (PLMs). Existing
tokenization methods for Chinese PLMs typically treat each character as an
indivisible token. However, they ignore the unique feature of the Chinese
writing system where additional linguistic information exists below the
character level, i.e., at the sub-character level. To utilize such information,
we propose sub-character (SubChar for short) tokenization. Specifically, we
first encode the input text by converting each Chinese character into a short
sequence based on its glyph or pronunciation, and then construct the vocabulary
based on the encoded text with sub-word tokenization. Experimental results show
that SubChar tokenizers have two main advantages over existing tokenizers: 1)
They can tokenize inputs into much shorter sequences, thus improving the
computational efficiency. 2) Pronunciation-based SubChar tokenizers can encode
Chinese homophones into the same transliteration sequences and produce the same
tokenization output, hence being robust to all homophone typos. At the same
time, models trained with SubChar tokenizers perform competitively on
downstream tasks. We release our code at
https://github.com/thunlp/SubCharTokenization to facilitate future work. Comment: This draft supersedes the previous version named "SHUOWEN-JIEZI: Linguistically Informed Tokenizers For Chinese Language Model Pretraining".
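A minimal sketch of the pronunciation-based encoding step is given below, assuming the third-party pypinyin package for transliteration; the separator token and the example strings are illustrative, and the released tokenizers additionally train a sub-word vocabulary on the encoded text.

```python
# Hypothetical sketch of pronunciation-based sub-character encoding: each Chinese
# character is replaced by its pinyin transliteration before sub-word tokenization.
# pypinyin, the separator token, and the examples are assumptions for illustration.
from pypinyin import lazy_pinyin  # third-party: pip install pypinyin

def encode_pronunciation(text, sep="#"):
    """Map each character to its toneless pinyin, with a separator between characters."""
    return sep.join(lazy_pinyin(text))

# Homophones collapse to the same transliteration, which is why homophone typos
# leave the tokenization output unchanged:
print(encode_pronunciation("我在这里"))  # e.g. "wo#zai#zhe#li"
print(encode_pronunciation("我再这里"))  # same output despite the 在/再 homophone typo
```

A standard sub-word tokenizer (e.g., BPE) trained on such encoded text would then produce the final vocabulary, giving the shorter sequences and homophone robustness described above.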
Emergent Modularity in Pre-trained Transformers
This work examines the presence of modularity in pre-trained Transformers, a
feature commonly found in human brains and thought to be vital for general
intelligence. In analogy to human brains, we consider two main characteristics
of modularity: (1) functional specialization of neurons: we evaluate whether
each neuron is mainly specialized in a certain function, and find that the
answer is yes. (2) function-based neuron grouping: we explore finding a
structure that groups neurons into modules by function, and each module works
for its corresponding function. Given the enormous amount of possible
structures, we focus on Mixture-of-Experts as a promising candidate, which
partitions neurons into experts and usually activates different experts for
different inputs. Experimental results show that there are functional experts,
in which the neurons specialized in a certain function are clustered. Moreover,
perturbing the activations of functional experts significantly affects the
corresponding function. Finally, we study how modularity emerges during
pre-training, and find that the modular structure is stabilized at the early
stage, which is faster than neuron stabilization. This suggests that Transformers
first construct the modular structure and then learn fine-grained neuron
functions. Our code and data are available at
https://github.com/THUNLP/modularity-analysis. Comment: Findings of ACL 2023.
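The following sketch shows how such an expert-perturbation probe might look on a toy Mixture-of-Experts layer: one expert's output is zeroed and the loss increase on a function-specific batch is recorded. The layer, shapes, and knock-out scheme are assumptions, not the paper's actual setup.

```python
# Hypothetical sketch of the perturbation analysis: disable one "functional expert"
# and measure how much a chosen function degrades. The MoE layer is a toy stand-in.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])

    def forward(self, x, perturb_expert=None):
        gates = torch.softmax(self.router(x), dim=-1)             # (tokens, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], -1)   # (tokens, d_model, n_experts)
        if perturb_expert is not None:
            outputs[..., perturb_expert] = 0.0                    # knock out one expert
        return (outputs * gates.unsqueeze(1)).sum(-1)             # (tokens, d_model)

def function_drop(model, x, target, loss_fn, expert_id):
    """Loss increase on a function-specific batch when one expert is disabled."""
    with torch.no_grad():
        base = loss_fn(model(x), target)
        pert = loss_fn(model(x, perturb_expert=expert_id), target)
    return (pert - base).item()
```

A large loss increase for a particular (expert, function) pair is the kind of evidence for functional specialization that the analysis above reports.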