
    Annotated Job Ads with Named Entity Recognition

    We have trained a named entity recognition (NER) model that screens Swedish job ads for different kinds of useful information (e.g. skills required of a job seeker). It was obtained by fine-tuning KB-BERT. The biggest challenge we faced was creating a labelled dataset, which required manual annotation. This paper gives an overview of the methods we employed to make the annotation process more efficient and to ensure high-quality data. We also report on the performance of the resulting model.
    Comment: SLTC 202
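
    NER models of this kind typically emit one tag per token, which must then be grouped into entity spans. The abstract does not state the tagging scheme, so the sketch below assumes the common BIO scheme; the label name SKILL and the example sentence are hypothetical, not taken from the paper's annotation guidelines.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans (assumed BIO scheme)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts; an I- tag that does not continue the open entity is treated as B-.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O": outside any entity, so close the open entity if there is one.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical Swedish job-ad fragment with hypothetical SKILL labels.
tokens = ["Erfarenhet", "av", "agil", "utveckling", "och", "SQL", "krävs"]
tags = ["O", "O", "B-SKILL", "I-SKILL", "O", "B-SKILL", "O"]
print(bio_to_spans(tokens, tags))  # [('SKILL', 'agil utveckling'), ('SKILL', 'SQL')]
```

    Treating a stray I- tag as the start of a new entity is one common convention for repairing invalid tag sequences produced by a model; a stricter evaluation might instead discard such spans.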

    Text Annotation Handbook: A Practical Guide for Machine Learning Projects

    This handbook is a hands-on guide to approaching text annotation tasks. It provides a gentle introduction to the topic, an overview of theoretical concepts, and practical advice. The topics covered are mostly technical, but business, ethical and regulatory issues are also touched upon. The focus lies on readability and conciseness rather than completeness and scientific rigor. Experience with annotation and knowledge of machine learning are useful but not required. The document may serve as a primer or reference book for a wide range of professions, such as team leaders, project managers, IT architects, software developers and machine learning engineers.
    Comment: 30 pages, white paper
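
    Inter-annotator agreement is a standard quality check in annotation projects. As an illustration (not taken from the handbook), the sketch below computes Cohen's kappa for two annotators who labelled the same items, correcting raw agreement for the agreement expected by chance; the label names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


annotator_1 = ["SKILL", "SKILL", "O", "O", "O", "SKILL"]
annotator_2 = ["SKILL", "O", "O", "O", "SKILL", "SKILL"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.333
```

    Here the annotators agree on 4 of 6 items (0.67), but with balanced label frequencies chance alone yields 0.5, so kappa is only (0.67 - 0.5) / (1 - 0.5) = 1/3.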

    Retardation of arsenic transport through a Pleistocene aquifer

    Groundwater drawn daily from shallow alluvial sands by millions of wells over large areas of south and southeast Asia exposes an estimated population of over a hundred million people to toxic levels of arsenic [1]. Holocene aquifers are the source of widespread arsenic poisoning across the region [2, 3]. In contrast, Pleistocene sands deposited in this region more than 12,000 years ago mostly do not host groundwater with high levels of arsenic. Pleistocene aquifers are increasingly used as a safe source of drinking water [4], and it is therefore important to understand under what conditions low levels of arsenic can be maintained. Here we reconstruct the initial phase of contamination of a Pleistocene aquifer near Hanoi, Vietnam. We demonstrate that changes in groundwater flow conditions and in the redox state of the aquifer sands, induced by groundwater pumping, caused arsenic contamination to intrude laterally more than 120 metres from a Holocene aquifer into a previously uncontaminated Pleistocene aquifer. We also find that arsenic adsorbs onto the aquifer sands and that the extent of the contamination is retarded 16–20-fold relative to the reconstructed lateral movement of groundwater over the same period. Our findings suggest that the arsenic contamination of Pleistocene aquifers in south and southeast Asia caused by increasing groundwater pumping may have been delayed by the retardation of arsenic transport.

    Funding: National Science Foundation (U.S.) (NSF grant EAR09-11557); Swiss Agency for Development and Cooperation (Grant NAFOSTED 105-09-59-09 to CETASD, the Centre for Environmental Technology and Sustainable Development (Vietnam)); National Institute of Environmental Health Sciences (NIEHS grant P42 ES010349); National Institute of Environmental Health Sciences (NIEHS grant P42 ES016454).
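
    The reported retardation can be read as a ratio of distances: over the same period, groundwater moved roughly R times farther than the arsenic front. The sketch below applies that ratio to the 120 m intrusion (a lower bound, since the abstract says "more than 120 metres") and, as an extra illustration, inverts the textbook linear-sorption relation R = 1 + (rho_b / n) * K_d for the distribution coefficient K_d. The linear-isotherm model and the bulk density (1.8 kg/L) and porosity (0.3) values are assumptions for illustration, not the paper's method or measured values.

```python
def groundwater_travel_distance(contaminant_front_m: float, retardation: float) -> float:
    """Distance groundwater travels while a retarded contaminant front advances contaminant_front_m."""
    return contaminant_front_m * retardation


def distribution_coefficient(retardation: float, bulk_density_kg_per_l: float, porosity: float) -> float:
    """Invert R = 1 + (rho_b / n) * K_d for K_d in L/kg, assuming linear equilibrium sorption."""
    return (retardation - 1.0) * porosity / bulk_density_kg_per_l


front_m = 120.0  # lower bound on the lateral arsenic intrusion reported in the abstract
for r in (16.0, 20.0):  # reported range of the retardation factor
    dist = groundwater_travel_distance(front_m, r)
    # Hypothetical sediment properties: bulk density 1.8 kg/L, porosity 0.3.
    kd = distribution_coefficient(r, bulk_density_kg_per_l=1.8, porosity=0.3)
    print(f"R = {r:.0f}: groundwater moved ~{dist:.0f} m; K_d ~ {kd:.2f} L/kg")
```

    With these inputs the groundwater would have moved on the order of 1.9 to 2.4 km while the arsenic front advanced 120 m; the K_d values scale directly with the assumed porosity and bulk density.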

    Determination of csw in Nf = 3 + 1 Lattice QCD with massive Wilson fermions

    In order to obtain meaningful results from lattice QCD that can be compared with experiment, extrapolation to the continuum is crucial. The well-established Symanzik improvement program systematically reduces the order of cutoff effects, allowing better control of these errors as well as the use of larger, and thus more affordable, lattice spacings. Applied to the Wilson fermion action, it requires only the addition of the Sheikholeslami–Wohlert term with the O(a) improvement coefficient csw. In this work, a strategy is developed for the non-perturbative determination of csw in the theory with Nf = 3+1 massive sea quarks. It is embedded in a general, mass-dependent renormalization and improvement scheme, whose foundations we lay out. The improvement condition, formulated by means of the PCAC relation in the Schrödinger functional, is imposed along a line of constant physics on which the charm quark has approximately its physical mass. The aim of this rather elaborate approach is to avoid large, mass-dependent O(a^2) effects in future large-volume simulations with four dynamical quark species. The numerical results are obtained using the tree-level improved Lüscher–Weisz gauge action. Since the gradient flow coupling is employed in the definition of the line of constant physics, its interplay with the topological charge, in particular with regard to critical slowing down and topology freezing, is investigated in a supplemental study.
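
    For reference, the Sheikholeslami–Wohlert term and the PCAC mass used in Schrödinger functional improvement conditions are commonly written as below. These are the standard O(a) improvement conventions, not formulas quoted from this thesis; normalizations of the field strength differ between papers. Here F̂_{μν} is the clover-leaf lattice field strength (c_sw = 1 at tree level), f_A and f_P are the Schrödinger functional correlators of the axial current and pseudoscalar density, and c_A is the improvement coefficient of the axial current. In the usual approach, c_sw is tuned so that the PCAC mass is independent of the choice of boundary conditions up to O(a^2) effects.

```latex
S_{\mathrm{SW}} = a^{5} \sum_{x} c_{\mathrm{sw}}\, \bar{\psi}(x)\, \frac{i}{4}\, \sigma_{\mu\nu}\, \hat{F}_{\mu\nu}(x)\, \psi(x),
\qquad
\sigma_{\mu\nu} = \frac{i}{2}\,[\gamma_\mu, \gamma_\nu],

m(x_0) = \frac{\tilde{\partial}_0 f_{\mathrm{A}}(x_0) + c_{\mathrm{A}}\, a\, \partial_0^{*} \partial_0 f_{\mathrm{P}}(x_0)}{2 f_{\mathrm{P}}(x_0)},
\qquad
\tilde{\partial}_0 = \tfrac{1}{2}\left(\partial_0 + \partial_0^{*}\right).
```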

    Critical slowing down and the gradient flow coupling in the Schrödinger functional

    We study the sensitivity of the gradient flow coupling to sectors of different topological charge and its implications in practical situations. Furthermore, we investigate an alternative definition of the running coupling that is expected to be less sensitive to the HMC algorithm's difficulty in efficiently sampling all topological sectors.
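
    Critical slowing down is usually quantified by the integrated autocorrelation time of an observable, such as the topological charge, along the Monte Carlo history. The sketch below is a generic estimator with Madras–Sokal automatic windowing (window parameter c = 5 is an assumed, commonly used choice), not the analysis used in this work; an AR(1) chain with known exact autocorrelation time stands in for a real HMC history.

```python
import numpy as np


def integrated_autocorr_time(series, c: float = 5.0) -> float:
    """Estimate tau_int of a Monte Carlo history with Madras-Sokal automatic windowing."""
    x = np.asarray(series, dtype=float)
    n = x.size
    x = x - x.mean()
    var = np.dot(x, x) / n
    if var == 0.0:
        return 0.5  # constant history: no fluctuations, treat as uncorrelated
    tau = 0.5
    for t in range(1, n // 2):
        rho_t = np.dot(x[:-t], x[t:]) / ((n - t) * var)  # normalized autocorrelation at lag t
        tau += rho_t
        if t >= c * tau:  # stop once the window W = t satisfies W >= c * tau_int(W)
            break
    return float(tau)


# Toy stand-in for an HMC history: an AR(1) chain with coefficient phi,
# whose exact integrated autocorrelation time is (1 + phi) / (2 * (1 - phi)).
rng = np.random.default_rng(0)
phi = 0.9
noise = rng.standard_normal(100_000)
chain = np.empty_like(noise)
chain[0] = noise[0]
for i in range(1, noise.size):
    chain[i] = phi * chain[i - 1] + noise[i]

exact = (1 + phi) / (2 * (1 - phi))
print(f"estimated tau_int = {integrated_autocorr_time(chain):.2f}, exact = {exact:.2f}")
```

    When topology freezes, the autocorrelation time of the topological charge grows rapidly as the lattice spacing decreases, which is why a coupling that is insensitive to poorly sampled topological sectors is attractive.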