1,256 research outputs found

    Ask Language Model to Clean Your Noisy Translation Data

    Transformer models have demonstrated remarkable performance in neural machine translation (NMT). However, their vulnerability to noisy input poses a significant challenge in practical implementation, where generating clean output from noisy input is crucial. The MTNT dataset is widely used as a benchmark for evaluating the robustness of NMT models against noisy input. Nevertheless, its utility is limited due to the presence of noise in both the source and target sentences. To address this limitation, we focus on cleaning the noise from the target sentences in MTNT, making it more suitable as a benchmark for noise evaluation. Leveraging the capabilities of large language models (LLMs), we observe their impressive abilities in noise removal. For example, they can remove emojis while considering their semantic meaning. Additionally, we show that LLMs can effectively rephrase slang, jargon, and profanities. The resulting datasets, called C-MTNT, exhibit significantly less noise in the target sentences while preserving the semantic integrity of the original sentences. Our human and GPT-4 evaluations consistently conclude that LLMs perform well on this task. Lastly, experiments on C-MTNT showcased its effectiveness in evaluating the robustness of NMT models, highlighting the potential of advanced language models for data cleaning and emphasizing C-MTNT as a valuable resource. Comment: EMNLP 2023, Findings

    CMB-S4: Forecasting Constraints on Primordial Gravitational Waves

    CMB-S4, the next-generation ground-based cosmic microwave background (CMB) experiment, is set to significantly advance the sensitivity of CMB measurements and enhance our understanding of the origin and evolution of the Universe, from the highest energies at the dawn of time through the growth of structure to the present day. Among the science cases pursued with CMB-S4, the quest for detecting primordial gravitational waves is a central driver of the experimental design. This work details the development of a forecasting framework that includes a power-spectrum-based semi-analytic projection tool, targeted explicitly towards optimizing constraints on the tensor-to-scalar ratio, r, in the presence of Galactic foregrounds and gravitational lensing of the CMB. This framework is unique in its direct use of information from the achieved performance of current Stage 2-3 CMB experiments to robustly forecast the science reach of upcoming CMB-polarization endeavors. The methodology allows for rapid iteration over experimental configurations and offers a flexible way to optimize the design of future experiments given a desired scientific goal. To form a closed-loop process, we couple this semi-analytic tool with map-based validation studies, which allow for the injection of additional complexity and verification of our forecasts with several independent analysis methods. We document multiple rounds of forecasts for CMB-S4 using this process and the resulting establishment of the current reference design of the primordial gravitational-wave component of the Stage-4 experiment, optimized to achieve our science goals of detecting primordial gravitational waves for r > 0.003 at greater than 5σ, or, in the absence of a detection, of reaching an upper limit of r < 0.001 at 95% CL. Comment: 24 pages, 8 figures, 9 tables, submitted to ApJ. arXiv admin note: text overlap with arXiv:1907.0447

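    Optimasi Portofolio Resiko Menggunakan Model Markowitz MVO Dikaitkan dengan Keterbatasan Manusia dalam Memprediksi Masa Depan dalam Perspektif Al-Qur'an [Risk Portfolio Optimization Using the Markowitz MVO Model, in Relation to Human Limitations in Predicting the Future from the Perspective of the Al-Qur'an]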

    Risk portfolio management in modern finance has become increasingly technical, requiring the use of sophisticated mathematical tools in both research and practice. Since companies cannot insure themselves completely against risk, given the human inability to predict the future precisely, as written in Al-Qur'an surah Luqman verse 34, they have to manage it to yield an optimal portfolio. The objective here is to minimize the variance among all portfolios, or alternatively, to maximize expected return among all portfolios that have at least a certain expected return. Furthermore, this study focuses on optimizing the risk portfolio with the so-called Markowitz MVO (Mean-Variance Optimization) model. The theoretical frameworks used for analysis are the arithmetic mean, geometric mean, variance, covariance, linear programming, and quadratic programming. Moreover, finding a minimum-variance portfolio produces a convex quadratic program, that is, minimizing a quadratic objective function in x subject to a linear inequality (≥) constraint and the equality constraint Ax = b. The outcome of this research is the solution of the optimal risk portfolio for several investments, which can be computed readily using MATLAB R2007b software together with its graphical analysis.

    Genetic correlation between amyotrophic lateral sclerosis and schizophrenia

    A. Palotie is a member of the working group Schizophrenia Working Grp Psychiat. We have previously shown higher-than-expected rates of schizophrenia in relatives of patients with amyotrophic lateral sclerosis (ALS), suggesting an aetiological relationship between the diseases. Here, we investigate the genetic relationship between ALS and schizophrenia using genome-wide association study data from over 100,000 unique individuals. Using linkage disequilibrium score regression, we estimate the genetic correlation between ALS and schizophrenia to be 14.3% (7.05-21.6; P = 1 × 10⁻⁴) with schizophrenia polygenic risk scores explaining up to 0.12% of the variance in ALS (P = 8.4 × 10⁻⁷). A modest increase in comorbidity of ALS and schizophrenia is expected given these findings (odds ratio 1.08-1.26) but this would require very large studies to observe epidemiologically. We identify five potential novel ALS-associated loci using conditional false discovery rate analysis. It is likely that shared neurobiological mechanisms between these two disorders will engender novel hypotheses in future preclinical and clinical studies. Peer reviewed

    TRY plant trait database – enhanced coverage and open access

    Plant traits - the morphological, anatomical, physiological, biochemical and phenological characteristics of plants - determine how plants respond to environmental factors, affect other trophic levels, and influence ecosystem properties and their benefits and detriments to people. Plant trait data thus represent the basis for a vast area of research spanning from evolutionary biology, community and functional ecology, to biodiversity conservation, ecosystem and landscape management, restoration, biogeography and earth system modelling. Since its foundation in 2007, the TRY database of plant traits has grown continuously. It now provides unprecedented data coverage under an open access data policy and is the main plant trait database used by the research community worldwide. Increasingly, the TRY database also supports new frontiers of trait‐based plant research, including the identification of data gaps and the subsequent mobilization or measurement of new data. To support this development, in this article we evaluate the extent of the trait data compiled in TRY and analyse emerging patterns of data coverage and representativeness. Best species coverage is achieved for categorical traits - almost complete coverage for ‘plant growth form’. However, most traits relevant for ecology and vegetation modelling are characterized by continuous intraspecific variation and trait–environmental relationships. These traits have to be measured on individual plants in their respective environment. Despite unprecedented data coverage, we observe a humbling lack of completeness and representativeness of these continuous traits in many aspects. We, therefore, conclude that reducing data gaps and biases in the TRY database remains a key challenge and requires a coordinated approach to data mobilization and trait measurements. This can only be achieved in collaboration with other initiatives.

    An embedding technique to determine ττ backgrounds in proton-proton collision data

    An embedding technique is presented to estimate standard model ττ backgrounds from data with minimal simulation input. In the data, the muons are removed from reconstructed μμ events and replaced with simulated tau leptons with the same kinematic properties. In this way, a set of hybrid events is obtained that does not rely on simulation except for the decay of the tau leptons. The challenges in describing the underlying event or the production of associated jets in the simulation are avoided. The technique described in this paper was developed for CMS. Its validation and the inherent uncertainties are also discussed. The demonstration of the performance of the technique is based on a sample of proton-proton collisions collected by CMS in 2017 at √s = 13 TeV corresponding to an integrated luminosity of 41.5 fb⁻¹. Peer reviewed

    Measurement of the Splitting Function in pp and Pb-Pb Collisions at √sNN = 5.02 TeV

    Data from heavy ion collisions suggest that the evolution of a parton shower is modified by interactions with the color charges in the dense partonic medium created in these collisions, but it is not known where in the shower evolution the modifications occur. The momentum ratio of the two leading partons, resolved as subjets, provides information about the parton shower evolution. This substructure observable, known as the splitting function, reflects the process of a parton splitting into two other partons and has been measured for jets with transverse momentum between 140 and 500 GeV, in pp and PbPb collisions at a center-of-mass energy of 5.02 TeV per nucleon pair. In central PbPb collisions, the splitting function indicates a more unbalanced momentum ratio, compared to peripheral PbPb and pp collisions. The measurements are compared to various predictions from event generators and analytical calculations. Peer reviewed

    Measurement of tt̄ normalised multi-differential cross sections in pp collisions at √s = 13 TeV, and simultaneous determination of the strong coupling strength, top quark pole mass, and parton distribution functions

    Peer reviewed

    Measurement of nuclear modification factors of ϒ(1S), ϒ(2S), and ϒ(3S) mesons in PbPb collisions at √sNN = 5.02 TeV

    The cross sections for ϒ(1S), ϒ(2S), and ϒ(3S) production in lead-lead (PbPb) and proton-proton (pp) collisions at √sNN = 5.02 TeV have been measured using the CMS detector at the LHC. The nuclear modification factors, RAA, derived from the PbPb-to-pp ratio of yields for each state, are studied as functions of meson rapidity and transverse momentum, as well as PbPb collision centrality. The yields of all three states are found to be significantly suppressed, and compatible with a sequential ordering of the suppression, RAA(ϒ(1S)) > RAA(ϒ(2S)) > RAA(ϒ(3S)). The suppression of ϒ(1S) is larger than that seen at √sNN = 2.76 TeV, although the two are compatible within uncertainties. The upper limit on the RAA of ϒ(3S) integrated over pT, rapidity and centrality is 0.096 at 95% confidence level, which is the strongest suppression observed for a quarkonium state in heavy ion collisions to date. © 2019 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). Funded by SCOAP3. Peer reviewed

    Electroweak production of two jets in association with a Z boson in proton-proton collisions at √s = 13 TeV

    A measurement of the electroweak (EW) production of two jets in association with a Z boson in proton-proton collisions at √s = 13 TeV is presented, based on data recorded in 2016 by the CMS experiment at the LHC corresponding to an integrated luminosity of 35.9 fb⁻¹. The measurement is performed in the lljj final state with l including electrons and muons, and the jets j corresponding to the quarks produced in the hard interaction. The measured cross section in a kinematic region defined by invariant masses m(ll) > 50 GeV, m(jj) > 120 GeV, and transverse momenta pTj > 25 GeV is σEW(lljj) = 534 ± 20 (stat) fb (syst) fb, in agreement with leading-order standard model predictions. The final state is also used to perform a search for anomalous trilinear gauge couplings. No evidence is found and limits on anomalous trilinear gauge couplings associated with dimension-six operators are given in the framework of an effective field theory. The corresponding 95% confidence level intervals are -2.6 < cWWW/Λ² < 2.6 TeV⁻² and -8.4 < cW/Λ² < 10.1 TeV⁻². The additional jet activity of events in a signal-enriched region is also studied, and the measurements are in agreement with predictions. Peer reviewed