666 research outputs found

    Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting

    Idioms are common in everyday language, but often pose a challenge to translators because their meanings do not follow from the meanings of their parts. Despite significant advances, machine translation systems still struggle to translate idiomatic expressions. We provide a simple characterization of idiomatic translation and related issues. This allows us to conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations. To expand multilingual resources, we compile a dataset of ~4k natural sentences containing idiomatic expressions in French, Finnish, and Japanese. To improve translation of natural idioms, we introduce two straightforward yet effective techniques: the strategic upweighting of training loss on potentially idiomatic sentences, and using retrieval-augmented models. This not only improves the accuracy of a strong pretrained MT model on idiomatic sentences by up to 13% in absolute accuracy, but also holds potential benefits for non-idiomatic sentences. Comment: EMNLP 202

    Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity

    Recent advances in large language models have prompted researchers to examine their abilities across a variety of linguistic tasks, but little has been done to investigate how models handle the interactions in meaning across words and larger syntactic forms -- i.e. phenomena at the intersection of syntax and semantics. We present the semantic notion of agentivity as a case study for probing such interactions. We created a novel evaluation dataset by utilizing the unique linguistic properties of a subset of optionally transitive English verbs. This dataset was used to prompt varying sizes of three model classes to see if they are sensitive to agentivity at the lexical level, and if they can appropriately employ these word-level priors given a specific syntactic context. Overall, GPT-3 text-davinci-003 performs extremely well across all experiments, outperforming all other models tested by far. In fact, the results are even better correlated with human judgements than both syntactic and semantic corpus statistics. This suggests that LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery than select corpora for certain tasks.

    EUREKA: EUphemism Recognition Enhanced through Knn-based methods and Augmentation

    We introduce EUREKA, an ensemble-based approach for performing automatic euphemism detection. We (1) identify and correct potentially mislabelled rows in the dataset, (2) curate an expanded corpus called EuphAug, (3) leverage model representations of Potentially Euphemistic Terms (PETs), and (4) explore using representations of semantically close sentences to aid in classification. Using our augmented dataset and kNN-based methods, EUREKA was able to achieve state-of-the-art results on the public leaderboard of the Euphemism Detection Shared Task, ranking first with a macro F1 score of 0.881.

    Microstructure and texture evolutions in FeCrAl cladding tube during pilger processing

    The microstructure of FeCrAl cladding tubes depends on the fabricating process history. In this study, the microstructural characteristics of wrought FeCrAl alloys during industrial pilger processing into thin-walled tubes were investigated. The hot extruded tube showed ∼100 μm equiaxed grains with weak α∗-fiber in {h11}<1/h12> texture, while the pilger rolling process changes the microstructure to fragmented and elongated grains along the rolling direction. The pilgered textures could be predicted with the VPSC model. The inter-pass annealing at 800–850 °C for 1 h results in recovery and recrystallization of the ferric matrix and restoration of ductility. The final finished tube shows fine recrystallized grains (∼11 μm) with dominant γ-fiber in three dimensions. Pilger rolling enhanced α-fiber while annealing reduced α-fiber and enhanced γ-fiber. Microstructural evolution in the Laves precipitates followed the sequence of faceted needle-like → spherical → faceted ellipsoidal. Thermomechanical processing resulted in cladding tubes with an area fraction of ∼5% and a number density of 5 × 10−11 m−2 in Laves precipitates, which is half that of the first-pilgered tube. Laves precipitates pin the grain boundaries to control the microstructure and prevent grain coarsening.

    Witness: The Modern Writer as Witness

    Editor's Note [Excerpt] Magic can mean many different things, especially for writers. Magic can be an illusion, a sleight of hand designed to trick onlookers into believing the impossible. Or magic can be a supernatural force in a world of harsh reality, a set of beliefs that sits just outside the realms of organized religion and advanced technology. Wizards and demons, Las Vegas entertainers and houngans -- they all practice a kind of sorcery. For poets and prose writers, though, magic affords an opportunity for us to stretch the limitations of the physical world in search of new themes, settings, and characters. Magic is a door we eagerly walk through to reach new lands. We at Witness have thoroughly enjoyed the process of selecting the themed works we have collected here, mainly because the idea of enchantment is inspiring. There is the possibility of positive charms; there is a chance for dark witchery. And sometimes the spell cast by a character is nebulous, difficult to categorize. It’s arguable that we cherish these incantations the most, since they leave us in a state of wonderment bordering on disorientation. Yes, magic can also leave us bewildered and thankful for the bewilderment.

    Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation

    Many recent advances in natural language generation have been fueled by training large language models on internet-scale data. However, this paradigm can lead to models that generate toxic, inaccurate, and unhelpful content, and automatic evaluation metrics often fail to identify these behaviors. As models become more capable, human feedback is an invaluable signal for evaluating and improving models. This survey aims to provide an overview of the recent research that has leveraged human feedback to improve natural language generation. First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization. Next, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using the feedback or training feedback models. We also discuss existing datasets for human-feedback data collection, and concerns surrounding feedback collection. Finally, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for human intervention. Comment: Work in Progress

    Two-station measurement of Rayleigh-wave phase velocities for the Huatung basin, the westernmost Philippine Sea, with OBS : implications for regional tectonics

    Author Posting. © The Authors, 2009. This article is posted here by permission of John Wiley & Sons for personal use, not for redistribution. The definitive version was published in Geophysical Journal International 179 (2009): 1859-1869, doi:10.1111/j.1365-246X.2009.04391.x. A broad-band ocean-bottom seismometer (OBS) deployed ~180 km east of Taiwan provides a first glimpse into the upper mantle beneath the westernmost section of the Philippine Sea or the Huatung basin (HB). We measured interstation phase velocities of Rayleigh waves between the OBS and stations on the eastern coast of Taiwan. The phase velocities show smooth variations from 3.8 to 3.9 km s−1 for periods of 25–40 s. In this short period range, phase velocities are comparable to those characterizing the 15–30 Ma Parece-Vela basin of the Philippine Sea. Modelling of the finite-frequency effect proves the validity of the measurement for the average HB. The shear-wave velocity models inverted from the 25 to 40 s dispersion show a velocity at lithospheric depths about 0.1 km s−1 lower than that of the west Philippine Sea, which agrees with the age effect derived from the Pacific pure-path model. Inversions incorporating the less reliable data above 40 s yield a shear velocity <4.0 km s−1 below 150 km, an unrealistic value even for a hotspot plume environment. The seismological evidence, together with the correlation in seafloor depth, suggests that the HB and the Parece-Vela basin may have a similar age. This is at odds with the previous geochronological study suggesting an early-Cretaceous age for the HB. Thermal rejuvenation of the lithosphere was examined as a potential solution to reconciling the two age models. The research is supported by the National Science Council, Taiwan, Republic of China, under grant NSC 96–2745-M-001–005.

    Antimicrobial resistance among migrants in Europe: a systematic review and meta-analysis

    BACKGROUND: Rates of antimicrobial resistance (AMR) are rising globally and there is concern that increased migration is contributing to the burden of antibiotic resistance in Europe. However, the effect of migration on the burden of AMR in Europe has not yet been comprehensively examined. Therefore, we did a systematic review and meta-analysis to identify and synthesise data for AMR carriage or infection in migrants to Europe to examine differences in patterns of AMR across migrant groups and in different settings. METHODS: For this systematic review and meta-analysis, we searched MEDLINE, Embase, PubMed, and Scopus with no language restrictions from Jan 1, 2000, to Jan 18, 2017, for primary data from observational studies reporting antibacterial resistance in common bacterial pathogens among migrants to 21 European Union-15 and European Economic Area countries. To be eligible for inclusion, studies had to report data on carriage or infection with laboratory-confirmed antibiotic-resistant organisms in migrant populations. We extracted data from eligible studies and assessed quality using piloted, standardised forms. We did not examine drug resistance in tuberculosis and excluded articles solely reporting on this parameter. We also excluded articles in which migrant status was determined by ethnicity, country of birth of participants' parents, or was not defined, and articles in which data were not disaggregated by migrant status. Outcomes were carriage of or infection with antibiotic-resistant organisms. We used random-effects models to calculate the pooled prevalence of each outcome. The study protocol is registered with PROSPERO, number CRD42016043681. FINDINGS: We identified 2274 articles, of which 23 observational studies reporting on antibiotic resistance in 2319 migrants were included. The pooled prevalence of any AMR carriage or AMR infection in migrants was 25·4% (95% CI 19·1-31·8; I² = 98%), including meticillin-resistant Staphylococcus aureus (7·8%, 4·8-10·7; I² = 92%) and antibiotic-resistant Gram-negative bacteria (27·2%, 17·6-36·8; I² = 94%). The pooled prevalence of any AMR carriage or infection was higher in refugees and asylum seekers (33·0%, 18·3-47·6; I² = 98%) than in other migrant groups (6·6%, 1·8-11·3; I² = 92%). The pooled prevalence of antibiotic-resistant organisms was slightly higher in high-migrant community settings (33·1%, 11·1-55·1; I² = 96%) than in migrants in hospitals (24·3%, 16·1-32·6; I² = 98%). We did not find evidence of high rates of transmission of AMR from migrant to host populations. INTERPRETATION: Migrants are exposed to conditions favouring the emergence of drug resistance during transit and in host countries in Europe. Increased antibiotic resistance among refugees and asylum seekers and in high-migrant community settings (such as refugee camps and detention facilities) highlights the need for improved living conditions, access to health care, and initiatives to facilitate detection of and appropriate high-quality treatment for antibiotic-resistant infections during transit and in host countries. Protocols for the prevention and control of infection and for antibiotic surveillance need to be integrated in all aspects of health care, which should be accessible for all migrant groups, and should target determinants of AMR before, during, and after migration. FUNDING: UK National Institute for Health Research Imperial Biomedical Research Centre, Imperial College Healthcare Charity, the Wellcome Trust, and UK National Institute for Health Research Health Protection Research Unit in Healthcare-associated Infections and Antimicrobial Resistance at Imperial College London.

    DOCK2 is involved in the host genetics and biology of severe COVID-19

    "COVID-19 Task Force" elucidates the mechanism by which the COVID-19 susceptibility gene DOCK2 leads to severe disease -- a therapeutic target for COVID-19 discovered using Asia's largest biorepository --. Kyoto University press release. 2022-08-10. Identifying the host genetic factors underlying severe COVID-19 is an emerging challenge. Here we conducted a genome-wide association study (GWAS) involving 2,393 cases of COVID-19 in a cohort of Japanese individuals collected during the initial waves of the pandemic, with 3,289 unaffected controls. We identified a variant on chromosome 5 at 5q35 (rs60200309-A), close to the dedicator of cytokinesis 2 gene (DOCK2), which was associated with severe COVID-19 in patients less than 65 years of age. This risk allele was prevalent in East Asian individuals but rare in Europeans, highlighting the value of genome-wide association studies in non-European populations. RNA-sequencing analysis of 473 bulk peripheral blood samples identified decreased expression of DOCK2 associated with the risk allele in these younger patients. DOCK2 expression was suppressed in patients with severe cases of COVID-19. Single-cell RNA-sequencing analysis (n = 61 individuals) identified cell-type-specific downregulation of DOCK2 and a COVID-19-specific decreasing effect of the risk allele on DOCK2 expression in non-classical monocytes. Immunohistochemistry of lung specimens from patients with severe COVID-19 pneumonia showed suppressed DOCK2 expression. Moreover, inhibition of DOCK2 function with CPYPP increased the severity of pneumonia in a Syrian hamster model of SARS-CoV-2 infection, characterized by weight loss, lung oedema, enhanced viral loads, impaired macrophage recruitment and dysregulated type I interferon responses. We conclude that DOCK2 has an important role in the host immune response to SARS-CoV-2 infection and the development of severe COVID-19, and could be further explored as a potential biomarker and/or therapeutic target.