    Data augmentation for machine translation via dependency subtree swapping

    We present a generic framework for data augmentation via dependency subtree swapping that is applicable to machine translation. We extract corresponding subtrees from the dependency parse trees of the source and target sentences and swap these across bisentences to create augmented samples. We perform thorough filtering based on graph-based similarities of the dependency trees and additional heuristics to ensure that extracted subtrees correspond to the same meaning. We conduct resource-constrained experiments on 4 language pairs in both directions using the IWSLT text translation datasets and the Hunglish2 corpus. The results demonstrate consistent improvements in BLEU score over our baseline models in 3 out of 4 language pairs. Our code is available on GitHub.
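
    The swapping step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the head-index representation, the function names, and the restriction to contiguous subtrees are all assumptions made for illustration, and the similarity-based filtering is omitted. In the bisentence setting, the same swap would be applied to the source side and to the target side of the two sentence pairs.

```python
from typing import Dict, List, Tuple


def subtree_indices(heads: List[int], root: int) -> List[int]:
    """Return the sorted token indices of the subtree rooted at `root`.

    heads[i] is the index of token i's head; -1 marks the sentence root.
    (Hypothetical representation chosen for this sketch.)
    """
    children: Dict[int, List[int]] = {}
    for i, h in enumerate(heads):
        children.setdefault(h, []).append(i)
    stack, found = [root], []
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children.get(node, []))
    return sorted(found)


def is_contiguous(indices: List[int]) -> bool:
    """True if the indices form an unbroken span of the sentence."""
    return indices == list(range(indices[0], indices[-1] + 1))


def swap_subtrees(
    tokens_a: List[str], heads_a: List[int], root_a: int,
    tokens_b: List[str], heads_b: List[int], root_b: int,
) -> Tuple[List[str], List[str]]:
    """Exchange the subtrees rooted at root_a and root_b between two sentences."""
    span_a = subtree_indices(heads_a, root_a)
    span_b = subtree_indices(heads_b, root_b)
    if not (is_contiguous(span_a) and is_contiguous(span_b)):
        # Assumption of this sketch: non-contiguous subtrees are skipped.
        raise ValueError("only contiguous subtrees are swapped in this sketch")
    sub_a = tokens_a[span_a[0]:span_a[-1] + 1]
    sub_b = tokens_b[span_b[0]:span_b[-1] + 1]
    new_a = tokens_a[:span_a[0]] + sub_b + tokens_a[span_a[-1] + 1:]
    new_b = tokens_b[:span_b[0]] + sub_a + tokens_b[span_b[-1] + 1:]
    return new_a, new_b


# Swap the object subtrees of two toy sentences.
sent_a = ["I", "read", "a", "book"]
heads_a = [1, -1, 3, 1]          # "book" (index 3) is the object of "read"
sent_b = ["She", "bought", "the", "red", "car"]
heads_b = [1, -1, 4, 4, 1]       # "car" (index 4) is the object of "bought"
aug_a, aug_b = swap_subtrees(sent_a, heads_a, 3, sent_b, heads_b, 4)
print(" ".join(aug_a))
print(" ".join(aug_b))
```

    Swapping only subtrees with the same dependency relation (here, the object) keeps both augmented sentences grammatical in this toy case; real data would need the filtering the abstract mentions.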

    HunSum-1: an abstractive summarization dataset for Hungarian

    We introduce HunSum-1: a dataset for Hungarian abstractive summarization, consisting of 1.14M news articles. The dataset is built by collecting, cleaning and deduplicating data from 9 major Hungarian news sites through CommonCrawl. Using this dataset, we build abstractive summarizer models based on huBERT and mT5. We demonstrate the value of the created dataset by performing a quantitative and qualitative analysis on the models’ results. The HunSum-1 dataset, all models used in our experiments and our code are available open source.
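
    The abstract does not say how deduplication was done. As one possible reading, a minimal sketch of exact deduplication by hashing normalized article text is shown below; the normalization rules and function names are assumptions for illustration, not the dataset's actual pipeline.

```python
import hashlib
import re
from typing import Iterable, List


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase (assumed normalization)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(articles: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each article, comparing normalized text."""
    seen = set()
    unique = []
    for article in articles:
        digest = hashlib.sha256(normalize(article).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(article)
    return unique


docs = ["Hello  world", "hello world", "Other"]
print(deduplicate(docs))
```

    Hashing keeps memory use proportional to the number of distinct articles rather than their total length; near-duplicate detection would need a different technique.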

    SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages

    This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features. In terms of the task, we enrich UniMorph with new data for 32 languages from 13 language families, with most of them being under-resourced: Kunwinjku, Classical Syriac, Arabic (Modern Standard, Egyptian, Gulf), Hebrew, Amharic, Aymara, Magahi, Braj, Kurdish (Central, Northern, Southern), Polish, Karelian, Livvi, Ludic, Veps, Võro, Evenki, Xibe, Tuvan, Sakha, Turkish, Indonesian, Kodi, Seneca, Asháninka, Yanesha, Chukchi, Itelmen, Eibela. We evaluate six systems on the new data and conduct an extensive error analysis of the systems' predictions. Transformer-based models generally demonstrate superior performance on the majority of languages, achieving >90% accuracy on 65% of them. The languages on which systems yielded low accuracy are mainly under-resourced, with a limited amount of data. Most errors made by the systems are due to allomorphy, honorificity, and form variation. In addition, we observe that systems especially struggle to inflect multiword lemmas. The systems also produce misspelled forms or end up in repetitive loops (e.g., RNN-based models). Finally, we report a large drop in systems' performance on previously unseen lemmas.

    A progrediáló inzulinrezisztencia hatása a glükózanyagcsere csontállapot kapcsolatokra = Relations between bone status and glucose metabolism with progression of insulin resistance

    The prevalence of obesity, metabolic syndrome, type 2 diabetes, and osteoporosis is increasing worldwide; that is, the global diabetes epidemic is “driven” by the same obesity that results in stronger bones in women. In our study, we looked for relationships between bone status and metabolic parameters in the early phase of impaired glucose metabolism. Twenty healthy and 51 glucose-intolerant female patients (49 ± 9 years) took part. We measured parameters of carbohydrate, lipid, and bone metabolism and bone density (at lumbar vertebrae 1–4 and the femoral neck), and performed glucose tolerance tests and hyperinsulinemic-normoglycemic clamp studies. Bone density did not differ between the two groups. In healthy subjects, density was closely related to whole-body glucose utilization (insulin sensitivity) (spine: r = –0.4921, p < 0.05; femur: r = –0.4972, p < 0.05), but this relationship disappeared with worsening glucose tolerance (spine: r = –0.022, ns; femur: r = –0.3136, ns). Among the adipokines, only adiponectin correlated with density; as glucose metabolism worsened, this relationship persisted at the spine (r = –0.5081, p < 0.05; –0.2804, p < 0.05) but disappeared at the femur (r = –0.6742, p < 0.01; –0.1723, ns). The rise in the “resorption quotient”, formed from the formation and resorption markers, indicated decreasing bone resorption as glucose metabolism worsened. Using the “gold standard” method for measuring insulin resistance, our data confirmed a close relationship between glucose metabolism, insulin sensitivity, and bone status in healthy women of menopausal age, a relationship that breaks down as glucose tolerance worsens and insulin resistance develops. Interpreting the negative adiponectin-bone relationship, which is detectable in healthy subjects but weakens as insulin resistance develops, requires further study.
| A paradox is hidden in the growing number of patients with insulin resistance, type 2 diabetes, and osteoporosis: the worldwide diabetes epidemic is driven by the same obesity that protects the bones of obese women. Our aim was to investigate the connection between early glucose intolerance, insulin resistance, and bone density and metabolism. After the metabolic status of 20 healthy and 51 glucose-intolerant matched women (age: 49 ± 9 years) was determined, hyperinsulinemic-euglycemic clamps were performed and adipokine and cytokine levels were measured. Bone mineral density at the lumbar spine and the femoral neck was measured by DEXA. No differences in bone density were observed between the groups at any site measured. Close correlations were found between total body glucose utilization and bone density in the healthy group (lumbar spine: r = –0.4921, p < 0.05; femoral neck: r = –0.4972, p < 0.05), whereas this correlation disappeared with deterioration of glucose metabolism (lumbar spine: r = –0.022, ns; femoral neck: r = –0.3136, ns). Adiponectin was the only adipokine that correlated with lumbar spine density in both groups (r = –0.5081, p < 0.05; –0.2804, p < 0.05); its correlation with femoral density disappeared with glucose intolerance (r = –0.6742, p < 0.01; –0.1723, ns). Relations among bone metabolic markers indicated that bone resorption decreases as insulin resistance worsens. In conclusion, inverse correlations were found between bone density and glucose metabolism, or insulin sensitivity, in healthy perimenopausal women, but this connection disappeared with the deterioration of glucose metabolism and the progression of insulin resistance as measured by the “gold standard” insulin-glucose clamp. Decreasing insulin sensitivity of bones and escape from “metabolic control” may result in the frequently observed hyperdensity in type 2 diabetics.