4,220 research outputs found

    Entity matching with transformer architectures - a step forward in data integration

    Transformer architectures have proven to be very effective and provide state-of-the-art results in many natural language tasks. The attention-based architecture, in combination with pre-training on large amounts of text, led to the recent breakthrough and to a variety of slightly different implementations. In this paper we analyze how well four of the most recent attention-based transformer architectures (BERT, XLNet, RoBERTa and DistilBERT) perform on the task of entity matching, a crucial part of data integration. Entity matching (EM) is the task of finding data instances that refer to the same real-world entity. It is a challenging task if the data instances consist of long textual data or if the data instances are "dirty" due to misplaced values. To evaluate the capability of transformer architectures and transfer learning on the task of EM, we empirically compare the four approaches on inherently difficult data sets. We show that transformer architectures outperform classical deep learning methods in EM by an average margin of 27.5%.

    A data Grid prototype for distributed data production in CMS

    The CMS experiment at CERN is setting up a Grid infrastructure required to fulfill the needs imposed by terabyte-scale productions for the next few years. The goal is to automate the production and at the same time allow the users to interact with the system, if required, to make decisions which would optimize performance. We present the architecture, design and functionality of our first working Objectivity file replication prototype. The middleware of choice is the Globus toolkit, which provides promising functionality. Our results demonstrate that the Globus toolkit can be used as an underlying technology for a world-wide Data Grid. The required data management functionality includes high-speed file transfers, secure access to remote files, selection and synchronization of replicas, and management of the meta information. The whole system is expected to be flexible enough to incorporate site-specific policies. The data management granularity is the file rather than the object level. The first prototype is currently in use for the High Level Trigger (HLT) production (autumn 2000). Owing to these efforts, CMS is one of the first experiments to use Data Grid functionality in a running production system. The project can be viewed as an evaluator of different strategies, a test of the capabilities of middleware tools and a provider of basic Grid functionalities.

    Data Grid tutorials with hands-on experience

    Grid technologies are increasingly used in scientific as well as industrial environments, but documentation is often insufficient and correct usage is often not well understood. Comprehensive training with hands-on experience helps people first to understand the technology and then to use it correctly and efficiently. We have organised and run several training sessions in different locations all over the world and report on our experience. The major factors of success are a solid base of theoretical lectures and, more importantly, a facility that allows for practical Grid exercises during and possibly after tutorial sessions.

    Dramaturgie der Zerstreuung : Schiller und das romantische Drama

    From the perspective of a history of the culture of dispute, the literary-political situation of the late 1790s is determined by a fundamental opposition. Accordingly, this situation can be described as an epochal simultaneity, as the coexistence of 'Weimar Classicism' and 'Jena (Early) Romanticism'. The end of a productive collaboration between Schiller and, above all, Friedrich Schlegel after 1796 constitutes parties that refuse any agreement (for example, each side extensively 'laughs' at the other) and whose mainly polemical self-understanding and programme are ultimately canonized by a literary historiography that relies on categorical simplifications.