
    Training of Interpreters and Translators in China: Current State, Challenges and Solutions

    Based on the Guideline for the MTI Training Program established by the China National Committee for MTI (Master of Translation and Interpreting) Education, this study surveys the state of master's-level education for legal translators and interpreters from 2014 to 2016, tracing the changes and problems revealed at the five major universities of political science and law in China. By comparing and analyzing the information collected on training targets, curriculum design, teaching staff, platform construction, practical training, and graduate employment at these five universities, the authors sort out the distinctive advantages and features of the five law schools and probe the problems and difficulties they have in common. Drawing on the surveys and interviews they have conducted in recent years, the authors put forward solutions and suggestions for the improvement and future development of Chinese MTI education.

    Improving BERT with Hybrid Pooling Network and Drop Mask

    Transformer-based pre-trained language models such as BERT achieve great success on various natural language understanding tasks. Prior research found that BERT captures a rich hierarchy of linguistic information at different layers. However, vanilla BERT uses the same self-attention mechanism in every layer to model different contextual features. In this paper, we propose HybridBERT, which combines self-attention and pooling networks to encode different contextual features in each layer. Additionally, we propose a simple DropMask method to address the mismatch between pre-training and fine-tuning caused by the excessive use of special mask tokens during Masked Language Modeling pre-training. Experiments show that HybridBERT outperforms BERT in pre-training, with lower loss, faster training speed (8% relative), and lower memory cost (13% relative), and also in transfer learning, with 1.5% relatively higher accuracy on downstream tasks. Additionally, DropMask improves the accuracy of BERT on downstream tasks across various masking rates. Comment: 7 pages, 2 figures
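
    The abstract only describes the architecture at a high level. As a purely illustrative sketch, the snippet below shows one way a Transformer encoder layer could combine a self-attention branch with a cheap local pooling branch; the class and parameter names (HybridLayer, pool_kernel) are invented here and are not the paper's implementation.

```python
# Hypothetical sketch of a "hybrid" encoder layer mixing self-attention with
# local average pooling. This is NOT the paper's HybridBERT code; names such
# as HybridLayer and pool_kernel are invented for illustration.
import torch
import torch.nn as nn

class HybridLayer(nn.Module):
    def __init__(self, hidden=768, heads=12, pool_kernel=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        # Average pooling over a local window along the sequence axis.
        self.pool = nn.AvgPool1d(pool_kernel, stride=1, padding=pool_kernel // 2)
        self.mix = nn.Linear(2 * hidden, hidden)   # fuse the two branches
        self.norm = nn.LayerNorm(hidden)

    def forward(self, x):                          # x: (batch, seq, hidden)
        attn_out, _ = self.attn(x, x, x)           # global contextual features
        pool_out = self.pool(x.transpose(1, 2)).transpose(1, 2)  # local features
        fused = self.mix(torch.cat([attn_out, pool_out], dim=-1))
        return self.norm(x + fused)                # residual + layer norm

x = torch.randn(2, 16, 768)
print(HybridLayer()(x).shape)                      # torch.Size([2, 16, 768])
```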

    Unbiased Delayed Feedback Label Correction for Conversion Rate Prediction

    Conversion rate prediction is critical to many online applications such as digital display advertising. To capture dynamic data distributions, industrial systems often require retraining models on recent data daily or weekly. However, the delay of conversion behavior usually leads to incorrect labeling, which is known as the delayed feedback problem. Existing work may fail to introduce correct information about false negative samples due to data sparsity and dynamic data distributions. To directly introduce correct feedback label information, we propose an Unbiased delayed feedback Label Correction framework (ULC), which uses an auxiliary model to correct labels for observed negative feedback samples. First, we theoretically prove that the label-corrected loss is an unbiased estimate of the oracle loss computed with true labels. Then, as there are no ready-made training data for label correction, counterfactual labeling is used to construct artificial training data. Furthermore, since counterfactual labeling uses only part of the training data, we design an embedding-based alternative training method to enhance performance. Comparative experiments on both public and private datasets and detailed analyses show that the proposed approach effectively alleviates the delayed feedback problem and consistently outperforms previous state-of-the-art methods. Comment: accepted by KDD 2023
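
    As a rough illustration of the idea only, the snippet below sketches how an auxiliary model could turn observed negatives into corrected soft labels for the CVR model, assuming the auxiliary model outputs the probability that a negative is actually a delayed positive. This is not the authors' ULC code, and all names are hypothetical.

```python
# Hypothetical sketch of delayed-feedback label correction: an auxiliary model
# estimates, for each observed negative, the probability that it will convert
# later, and the CVR model is trained against these corrected soft labels.
# This illustrates the idea in the abstract, not the ULC framework itself.
import torch
import torch.nn.functional as F

def corrected_cvr_loss(cvr_logits, observed_labels, correction_model, features):
    """observed_labels: 1 = observed conversion, 0 = no conversion observed yet."""
    with torch.no_grad():
        # Probability that an observed negative is a delayed (true) positive.
        p_delayed = torch.sigmoid(correction_model(features)).squeeze(-1)
    # Positives keep label 1; negatives get a soft label from the auxiliary model.
    soft_labels = torch.where(observed_labels == 1,
                              torch.ones_like(p_delayed), p_delayed)
    return F.binary_cross_entropy_with_logits(cvr_logits, soft_labels)

# Toy usage with a linear auxiliary model over 8 features.
correction_model = torch.nn.Linear(8, 1)
feats = torch.randn(4, 8)
loss = corrected_cvr_loss(torch.randn(4), torch.tensor([1., 0., 0., 1.]),
                          correction_model, feats)
print(loss.item())
```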

    Searching for the earliest use of limestone as a flux in Chinese high-fired ceramic glazes—evidence from Sr isotopic analysis of Chinese northern porcelain

    Samples of northern porcelain wares dating to between the 6th and 13th centuries from the three most important northern Chinese ceramic kiln sites, Gongyi, Xing, and Ding, have been studied in this work. The Sr isotope and chemical compositions of the ceramic glazes of these wares have been determined, and on the basis of these results we suggest the raw materials used to make the glazes. Strontium isotopic analysis shows that the earliest use of limestone as a glaze flux identified so far dates to the period from the Sui to the mid-Tang dynasties (late 6th to early 9th century), when it was used to produce white slip-glazed ware at the Xing kilns, so the technique may have been 'invented' there.

    Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings

    Prior studies diagnose the anisotropy problem in sentence representations from pre-trained language models such as BERT without fine-tuning. Our analysis reveals that sentence embeddings from BERT suffer from a bias towards uninformative words, limiting performance on semantic textual similarity (STS) tasks. To address this bias, we propose a simple and efficient unsupervised approach, Diagonal Attention Pooling (Ditto), which weights words with model-based importance estimates and computes the weighted average of word representations from pre-trained models as sentence embeddings. Ditto can be easily applied to any pre-trained language model as a postprocessing operation. Compared to prior sentence embedding approaches, Ditto neither adds parameters nor requires any learning. Empirical evaluations demonstrate that Ditto alleviates the anisotropy problem and improves various pre-trained models on STS tasks. Comment: 8 pages, accepted as an EMNLP 2023 short paper; the source code can be found at https://github.com/alibaba-damo-academy/SpokenNLP/tree/main/ditt
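
    To make the pooling step concrete, here is a minimal sketch of diagonal-attention-style weighting with Hugging Face transformers: each token's hidden state is weighted by the diagonal of a self-attention map before averaging. The particular layer and head used here are arbitrary assumptions; the linked repository contains the official recipe.

```python
# Minimal sketch of diagonal-attention pooling for sentence embeddings:
# weight each token by how much it attends to itself in a chosen attention
# head, then take the weighted average of hidden states. The layer/head
# choice below is arbitrary; see the paper's repository for the real setup.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

def ditto_style_embedding(text, layer=1, head=10):
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    hidden = out.last_hidden_state[0]                 # (seq, hidden)
    attn = out.attentions[layer][0, head]             # (seq, seq)
    weights = attn.diagonal()                         # self-attention per token
    weights = weights / weights.sum()
    return (weights.unsqueeze(-1) * hidden).sum(dim=0)  # (hidden,)

emb = ditto_style_embedding("Ditto needs no extra training.")
print(emb.shape)   # torch.Size([768])
```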

    Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR

    Recently, unified speech-text models such as SpeechGPT, VioLA, and AudioPaLM have achieved remarkable performance on various speech tasks. These models discretize speech signals into tokens (speech discretization) and use a shared vocabulary for both text and speech tokens. They then train a single decoder-only Transformer on a mixture of speech tasks. However, these models rely on the Loss Masking strategy for the ASR task, which ignores the dependency among speech tokens. In this paper, we propose to model speech tokens in an autoregressive way, similar to text. We find that applying the conventional cross-entropy loss to input speech tokens does not consistently improve ASR performance over the Loss Masking approach. To address this issue, we propose a novel approach, Smoothed Label Distillation (SLD), which applies a KL divergence loss with smoothed labels to the speech tokens. Our experiments show that SLD effectively models speech tokens and outperforms Loss Masking for decoder-only Transformers on ASR tasks with different speech discretization methods. The source code can be found at https://github.com/alibaba-damo-academy/SpokenNLP/tree/main/sld Comment: 5 pages, accepted by ICASSP 2024
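
    The core of SLD as described above is a KL-divergence loss against smoothed target distributions on speech tokens. The snippet below is a generic sketch of such an objective (standard label smoothing expressed as a KL loss); the smoothing factor and the masking of non-speech positions are illustrative assumptions, not values from the paper.

```python
# Generic sketch of a KL-divergence loss with smoothed labels, the kind of
# objective the abstract describes for speech tokens. The smoothing factor
# and the speech-position mask are illustrative assumptions.
import torch
import torch.nn.functional as F

def smoothed_kl_loss(logits, targets, speech_mask, smoothing=0.1):
    """logits: (batch, seq, vocab); targets: (batch, seq); speech_mask: (batch, seq)."""
    vocab = logits.size(-1)
    # Build smoothed one-hot target distributions.
    smooth = torch.full_like(logits, smoothing / (vocab - 1))
    smooth.scatter_(-1, targets.unsqueeze(-1), 1.0 - smoothing)
    log_probs = F.log_softmax(logits, dim=-1)
    kl = F.kl_div(log_probs, smooth, reduction="none").sum(-1)  # (batch, seq)
    # Average the loss over speech-token positions only.
    return (kl * speech_mask).sum() / speech_mask.sum().clamp(min=1)

logits = torch.randn(2, 6, 1000)
targets = torch.randint(0, 1000, (2, 6))
speech_mask = torch.ones(2, 6)
print(smoothed_kl_loss(logits, targets, speech_mask).item())
```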