
    2kenize: Tying Subword Sequences for Chinese Script Conversion

    Simplified Chinese to Traditional Chinese character conversion is a common preprocessing step in Chinese NLP. Despite this, current approaches perform poorly because they do not take into account that a simplified Chinese character can correspond to multiple traditional characters. Here, we propose a model that can disambiguate between mappings and convert between the two scripts. The model is based on subword segmentation, two language models, and a method for mapping between subword sequences. We further construct benchmark datasets for topic classification and script conversion. Our proposed method outperforms previous Chinese character conversion approaches by 6 points in accuracy. These results are further confirmed in a downstream application, where 2kenize is used to convert a pretraining dataset for topic classification. An error analysis reveals that our method's particular strengths are in dealing with code-mixing and named entities. Comment: Accepted to ACL 202

    Three-dimensional structure of the Upper Scorpius association with the Gaia first data release

    Using new proper motion data from recently published catalogs, we revisit the membership of previously identified members of the Upper Scorpius association. We confirm 750 of them as cluster members based on the convergent point method, compute their kinematic parallaxes, and combine them with Gaia parallaxes to investigate the 3D structure and geometry of the association using a robust covariance method. We find a mean distance of $146\pm 3\pm 6$ pc and show that the morphology of the association defined by the brightest (and most massive) stars yields a prolate ellipsoid with dimensions of $74\times38\times32$ pc$^{3}$, while the faintest cluster members define a more elongated structure with dimensions of $98\times24\times18$ pc$^{3}$. We suggest that the different properties of the two populations are an imprint of the star formation history in this region. Comment: 5 pages, 1 figure, MNRAS letters (in press)

    Long-lived non-thermal states realized by atom losses in one-dimensional quasi-condensates

    We investigate the cooling produced by a loss process that is non-selective in energy on a one-dimensional (1D) Bose gas with repulsive contact interactions in the quasi-condensate regime. By performing nonlinear classical field calculations for a homogeneous system, we show that the gas reaches a non-thermal state where different modes have acquired different temperatures. After losses have been turned off, this state is robust with respect to the nonlinear dynamics, described by the Gross-Pitaevskii equation. We argue that the integrability of the Gross-Pitaevskii equation is linked to the existence of such long-lived non-thermal states, and illustrate this by showing that such states are not supported within a non-integrable model of two coupled 1D gases of different masses. We go beyond a classical field analysis, taking into account the quantum noise introduced by the discreteness of losses, and show that the non-thermal state is still produced and its non-thermal character is even enhanced. Finally, we extend the discussion to gases trapped in a harmonic potential and present experimental observations of a long-lived non-thermal state within a trapped 1D quasi-condensate following an atom loss process.

    Foaming properties of protein/pectin electrostatic complexes and foam structure at the nanoscale

    The foaming properties (foaming capacity and foam stability) of soluble complexes of pectin and a globular protein, napin, have been investigated with a "Foamscan" apparatus. In addition, we used SANS with a recent method, based on an analogy between the SANS of foams and the neutron reflectivity of films, to measure the film thickness of foams in situ. The effects of ionic strength, protein concentration, and pectin charge density have been analysed. Whereas foam stability is improved for samples containing soluble complexes, no effect has been observed on the foam film thickness, which is approximately 315 Å regardless of the sample. These results let us specify the role of each species in the mixture: free proteins contribute to the foaming capacity, provided the initial free protein content in the bulk is sufficient to allow foam formation, and soluble complexes slow down drainage through their presence in the Plateau borders, which ultimately stabilises the foams.