2kenize: Tying Subword Sequences for Chinese Script Conversion
Simplified Chinese to Traditional Chinese character conversion is a common
preprocessing step in Chinese NLP. Despite this, current approaches have poor
performance because they do not take into account that a simplified Chinese
character can correspond to multiple traditional characters. Here, we propose a
model that can disambiguate between mappings and convert between the two
scripts. The model is based on subword segmentation, two language models, as
well as a method for mapping between subword sequences. We further construct
benchmark datasets for topic classification and script conversion. Our proposed
method outperforms previous Chinese character conversion approaches by 6 points
in accuracy. These results are further confirmed in a downstream application,
where 2kenize is used to convert the pretraining dataset for topic classification.
An error analysis reveals that our method's particular strengths are in dealing
with code-mixing and named entities.
Comment: Accepted to ACL 202
Three-dimensional structure of the Upper Scorpius association with the Gaia first data release
Using new proper motion data from recently published catalogs, we revisit the
membership of previously identified members of the Upper Scorpius association.
We confirm 750 of them as cluster members based on the convergent point
method, compute their kinematic parallaxes, and combine them with Gaia
parallaxes to investigate the 3D structure and geometry of the association
using a robust covariance method. We find a mean distance of ~pc
and show that the morphology of the association defined by the brightest (and
most massive) stars yields a prolate ellipsoid with dimensions of
~pc, while the faintest cluster members define a more
elongated structure with dimensions of ~pc. We
suggest that the different properties of the two populations are an imprint of the
star formation history in this region.
Comment: 5 pages, 1 figure, MNRAS letters (in press)
Long-lived non-thermal states realized by atom losses in one-dimensional quasi-condensates
We investigate the cooling produced by a loss process that is non-selective in energy
on a one-dimensional (1D) Bose gas with repulsive contact interactions in the
quasi-condensate regime. By performing nonlinear classical field calculations
for a homogeneous system, we show that the gas reaches a non-thermal state
where different modes have acquired different temperatures. After losses have
been turned off, this state is robust with respect to the nonlinear dynamics,
described by the Gross-Pitaevskii equation. We argue that the integrability of
the Gross-Pitaevskii equation is linked to the existence of such long-lived
non-thermal states, and illustrate this by showing that such states are not
supported within a non-integrable model of two coupled 1D gases of different
masses. We go beyond a classical field analysis, taking into account the
quantum noise introduced by the discreteness of losses, and show that the
non-thermal state is still produced and its non-thermal character is even
enhanced. Finally, we extend the discussion to gases trapped in a harmonic
potential and present experimental observations of a long-lived non-thermal
state within a trapped 1D quasi-condensate following an atom loss process.
Foaming properties of protein/pectin electrostatic complexes and foam structure at the nanoscale
The foaming properties (foaming capacity and foam stability) of soluble
complexes of pectin and a globular protein, napin, have been investigated with
a "Foamscan" apparatus. In addition, we used SANS with a recent method
based on an analogy between SANS by foams and the neutron reflectivity
of films to measure the film thickness of foams in situ. The effects of ionic
strength, protein concentration, and pectin charge density have been
analysed. Whereas the foam stability is improved for samples containing soluble
complexes, no effect has been observed on the foam film thickness, which is
approximately 315 Å regardless of the sample. These results let us specify the
role of each species in the mixture: free proteins contribute to the foaming
capacity, provided the initial free protein content in the bulk is sufficient
to allow foam formation, and soluble complexes slow down drainage through
their presence in the Plateau borders, which ultimately results in the
stabilisation of foams.