Multilingual Transformers for Product Matching -- Experiments and a New Benchmark in Polish

Authors: Możdżonek Michał; Tkachuk Sergiy; Wróblewska Anna; Łukasik Szymon
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/06/2022

Product matching is the task of identifying identical products across different data sources. It typically relies on the available product features, which, apart from being multimodal (i.e., composed of various data types), may be non-homogeneous and incomplete. The paper shows that pre-trained multilingual Transformer models, after fine-tuning, are suitable for solving the product matching problem using textual features in both English and Polish. We tested the multilingual mBERT and XLM-RoBERTa models in English on the Web Data Commons Training Dataset and Gold Standard for Large-Scale Product Matching. The results show that these models perform comparably to the latest solutions evaluated on this dataset and, in some cases, outperform them. Additionally, for research purposes, we prepared a new dataset entirely in Polish, based on offers in selected categories collected from several online stores. It is the first open product matching dataset in Polish, and it makes it possible to compare the effectiveness of pre-trained models. We therefore also report baseline results obtained by the fine-tuned mBERT and XLM-RoBERTa models on the Polish datasets.
Comment: 11 pages, 5 figures

arXiv.org e-Print Archive

On Preventing and Detecting Cyber Attacks in Industrial Control System Networks

Authors: Klimaszewski Konrad; Kopka Przemysław; Kozioł Sylwester; Kuźmicki Krzysztof; Możdżonek Rafał; Padée Adam; Wiślicki Wojciech; Wójcik Michał; Włodarski Tomasz; Ćwiek Arkadiusz
Publication venue: National Institute of Telecommunications
Publication date: 01/01/2019

This paper outlines the problem of cybersecurity in OT (operational technology) networks. It describes the most common components of these systems, summarizes the threats they face, and compares them with those present in the IT domain. A considerable part of the paper summarizes research conducted over the past decade on how widespread the problem is and in which countries it is most prevalent. The article presents the techniques most commonly used to protect these systems, with many examples drawn from the nuclear industry.

Biblioteka Nauki (article repository)