22 research outputs found
Using topology preservation measures for high-dimensional data analysis in a reduced feature space
This paper addresses high-dimensional data analysis by supplementing standard feature extraction procedures with topology preservation measures. The approach rests on the observation that not all elements of an initial dataset are equally well preserved in its low-dimensional embedding. The contribution first reviews existing topology preservation measures, then discusses their inclusion in classical methods of exploratory data analysis. Finally, it gives illustrative examples of the presented approach in cluster analysis and classification tasks.
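As an illustration of the observation that individual points are preserved unequally, the sketch below scores each point by the fraction of its k nearest neighbours that survive the embedding. This is only one simple, commonly used style of neighbourhood-preservation score, chosen here as an assumption; the abstract does not say which measures the paper uses, and the function names are hypothetical.

```python
import math


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbours (Euclidean)."""
    neighbours = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbours.append(set(others[:k]))
    return neighbours


def neighbourhood_preservation(high, low, k):
    """Per-point fraction of k nearest neighbours shared by the original and embedded spaces.

    A score of 1.0 means the point keeps all of its neighbours in the embedding;
    0.0 means it keeps none of them.
    """
    if len(high) != len(low):
        raise ValueError("both representations must contain the same number of points")
    nn_high = knn_indices(high, k)
    nn_low = knn_indices(low, k)
    return [len(a & b) / k for a, b in zip(nn_high, nn_low)]


# Hypothetical toy data: a 2-D dataset and a 1-D embedding that scrambles it.
original = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
scrambled = [(0.0,), (10.0,), (1.0,), (11.0,)]
scores = neighbourhood_preservation(original, scrambled, k=1)
```

Points with low scores are the ones whose embedded positions are least trustworthy, so a downstream clustering or classification step could down-weight or flag them.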
Multilingual Transformers for Product Matching -- Experiments and a New Benchmark in Polish
Product matching corresponds to the task of matching identical products
across different data sources. It typically employs available product features
which, apart from being multimodal, i.e., comprised of various data types,
might be non-homogeneous and incomplete. The paper shows that pre-trained,
multilingual Transformer models, after fine-tuning, are suitable for solving
the product matching problem using textual features both in English and Polish
languages. We tested multilingual mBERT and XLM-RoBERTa models in English on
Web Data Commons - training dataset and gold standard for large-scale product
matching. The obtained results show that these models perform similarly to the
latest solutions tested on this set, and in some cases, the results were even
better.
Additionally, we prepared a new dataset entirely in Polish and based on
offers in selected categories obtained from several online stores for the
research purpose. It is the first open dataset for product matching tasks in
Polish, which allows comparing the effectiveness of the pre-trained models.
Thus, we also showed the baseline results obtained by the fine-tuned mBERT and
XLM-RoBERTa models on the Polish datasets.
Fuzzy modeling with the particle swarm optimization algorithm
The main goal of this paper is to describe a clustering algorithm based on particle swarm optimization, which is inspired by the social behavior of animals, and its application in fuzzy modeling. The paper presents the idea of the heuristic swarm-based algorithm, including a few modifications. It also reports the results of an experimental evaluation of both the selected optimization technique and its synthesis with a fuzzy modeling method, with reference to the k-means algorithm and the fuzzy control process.
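For readers unfamiliar with the technique, here is a minimal sketch of the textbook (global-best) particle swarm optimization loop. It is not the modified variant from the paper, whose details the abstract does not give; the parameter values, the function name `pso_minimize`, and the sphere test function are all illustrative assumptions.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimize `objective` over a box using basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position; the swarm shares one global best.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (g_pos[d] - pos[i][d]))
                # Move and clamp the coordinate to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def sphere(x):
    """Simple convex test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


solution, value = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

For clustering, the same loop could be reused by letting each particle encode a set of candidate cluster centers and by using the within-cluster sum of squared distances as the objective; that mapping is a common choice and an assumption here, not a description of the paper's method.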
Analyzing the Influence of Language Model-Generated Responses in Mitigating Hate Speech on Social Media Directed at Ukrainian Refugees in Poland
In the context of escalating hate speech and polarization on social media,
this study investigates the potential of employing responses generated by Large
Language Models (LLM), complemented with pertinent verified knowledge links, to
counteract such trends. Through extensive A/B testing involving the posting of
753 automatically generated responses, the goal was to minimize the propagation
of hate speech directed at Ukrainian refugees in Poland.
The results indicate that deploying LLM-generated responses as replies to
harmful tweets effectively diminishes user engagement, as measured by
likes/impressions. When we respond to an original tweet, i.e., one that is not a
reply, we reduce the engagement of users by over 20% without increasing the
number of impressions. On the other hand, our responses increase the ratio of
the number of replies to a harmful tweet to impressions, especially if the
harmful tweet is not original. Additionally, the study examines how generated
responses influence the overall sentiment of tweets in the discussion,
revealing that our intervention does not significantly alter the mean
sentiment.
This paper suggests the implementation of an automatic moderation system to
combat hate speech on social media and provides an in-depth analysis of the A/B
experiment, covering methodology, data collection, and statistical outcomes.
Ethical considerations and challenges are also discussed, offering guidance for
the development of discourse moderation systems leveraging the capabilities of
generative AI.
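The engagement metric mentioned above (likes divided by impressions) and a relative change between a control group and a treated group can be sketched as follows. The function names and all numbers are made up for illustration; they are not data from the study, and the study's actual statistical procedure is not described in the abstract.

```python
def engagement_rate(likes, impressions):
    """Engagement as the ratio of likes to impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return likes / impressions


def relative_change(control_rate, treated_rate):
    """Relative change of the treated rate with respect to the control rate."""
    if control_rate == 0:
        raise ValueError("control rate must be non-zero")
    return (treated_rate - control_rate) / control_rate


# Hypothetical counts, NOT taken from the paper.
control = engagement_rate(likes=50, impressions=1000)   # tweets left without a reply
treated = engagement_rate(likes=40, impressions=1000)   # tweets that received a reply
change = relative_change(control, treated)              # negative means reduced engagement
```

In an A/B setting, a difference like this would still need a significance test before being reported as an effect.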
Using simulation to calibrate real data acquisition in veterinary medicine
This paper explores the innovative use of simulation environments to enhance
data acquisition and diagnostics in veterinary medicine, focusing specifically
on gait analysis in dogs. The study harnesses the power of Blender and the
Blenderproc library to generate synthetic datasets that reflect diverse
anatomical, environmental, and behavioral conditions. The generated data,
represented in graph form and standardized for optimal analysis, is utilized to
train machine learning algorithms for identifying normal and abnormal gaits.
Two distinct datasets with varying degrees of camera angle granularity are
created to further investigate the influence of camera perspective on model
accuracy. Preliminary results suggest that this simulation-based approach holds
promise for advancing veterinary diagnostics by enabling more precise data
acquisition and more effective machine learning models. By integrating
synthetic and real-world patient data, the study lays a robust foundation for
improving overall effectiveness and efficiency in veterinary medicine.
Bio-inspired algorithm optimization of neural network for the prediction of Dubai crude oil price
Previous studies proposed several bio-inspired algorithms for the optimization
of Neural Networks (NN) to avoid local minima and to improve accuracy
and convergence speed. To advance the performance of NN, a new bio-inspired algorithm
called the Flower Pollination Algorithm (FPA) is used to optimize the weights and
biases of the NN, owing to its ability to explore a very large search space and its frequent
selection of similar solutions. The FPA-optimized NN (FPNN) was applied to build a
model for the prediction of the Dubai crude oil price, unlike previous studies that mainly
focus on the West Texas Intermediate and Brent crude oil price benchmarks. Result
Selected aspects of crisis management with the use of unmanned aerial vehicles (UAVs) on the example of a communication disaster
Purpose: The theoretical aim of this study was to present the impact of modern technologies on the effectiveness of procedural activities (documentation) at the site of a communication disaster. The utilitarian aim was to present an improved way of organizing the documentation of a mass incident using drones and photogrammetry tools.
Design and methods: During an exercise simulating a communication disaster, the activities were documented using unmanned aerial vehicles (UAVs) interacting with an IT system (the Pix4D application). Representative drone models that can be used to monitor a disaster site are presented. The research approach describes how the drone flights were performed and the extent to which the photogrammetric method of processing digital images obtained from drones was used. Field measurements (control points, control lines), whose purpose was to determine the accuracy of the mapping and of the fit to the coordinate system, are also discussed.
Results: Images captured with UAVs and IT systems were collated and compared with measurements from the visual inspection of the disaster site performed in the traditional manner by representatives of the procedural entity. Comparing the results recorded through traditional procedural forms with those obtained with modern technologies (a drone with the Pix4D Cloud application) leads to the following conclusions. For short measuring sections (up to 15 m), the measurement accuracy of the two methods differs by about 1.5%. For longer measuring sections (up to 100 m), the measurement error is approximately 2.3%.
Conclusions: For the UAV method with the Pix4D Cloud application, the sources of measurement error lie in the accuracy with which the details of the model are rendered (the quality of imaging) and in the operator's skill in using the application. For the police method, in which a measuring trolley is the measuring tool, the sources of error lie in uneven terrain, obstacles in the terrain, and the measurement error of the tool itself (the trolley). Using UAVs is particularly valuable in terrain with limited accessibility, i.e., in hilly and mountainous terrain and at road intersections or forks.