9 research outputs found
SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization
This paper introduces the SAMSum Corpus, a new dataset with abstractive
dialogue summaries. We investigate the challenges it poses for automated
summarization by testing several models and comparing their results with those
obtained on a corpus of news articles. We show that model-generated summaries
of dialogues achieve higher ROUGE scores than the model-generated summaries of
news -- in contrast with human evaluators' judgement. This suggests that the
challenging task of abstractive dialogue summarization requires dedicated
models and non-standard quality measures. To our knowledge, our study is the
first attempt to introduce a high-quality chat-dialogue corpus, manually
annotated with abstractive summaries, which can be used by the research
community for further studies.
Comment: Attachment contains the described dataset archived in 7z format.
Please see the attached readme and licence. Update of the previous version:
changed formats of train/val/test files in corpus.7
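Since the paper's headline comparison rests on ROUGE, a minimal sketch of ROUGE-1 F1 (unigram overlap between a generated and a reference summary) shows what the metric rewards. This is a toy illustration with invented example strings; real evaluations use the full ROUGE suite (ROUGE-2, ROUGE-L) with proper tokenization and stemming.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection: each shared word counts at most min(count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("john met mary for coffee",
                      "john and mary met for coffee"), 2))  # → 0.91
```

Because ROUGE counts only surface overlap, a dialogue summary can score well while still misattributing who did what, which is consistent with the paper's finding that ROUGE disagrees with human judgement.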
Detecting Deceptive Utterances Using Deep Pre-Trained Neural Networks
Lying is an integral part of everyday communication in both written and oral forms. Detecting lies is therefore essential and has many possible applications. Our study investigates the performance of automated lie-detection methods, namely the most recent breed of pre-trained transformer neural networks capable of processing the Polish language. We used a dataset of nearly 1500 true and false statements, half of which were transcripts and the other half written statements, originating from possibly the largest study of deception in the Polish language. Technically, the problem was posed as text classification. We found that models perform better on typed than on spoken utterances. The best-performing model achieved an accuracy of 0.69, much higher than the human average of 0.56. For transcribed utterances, human performance was 0.58 and the models reached 0.62. We also explored model interpretability based on integrated gradients to shed light on classifier decisions. Our observations highlight the role of first words and phrases in model decisions, but more work is needed to systematically explore the observed patterns.
Czy komputer rozpozna hejtera? Wykorzystanie uczenia maszynowego (ML) w jakościowej analizie danych
The aim of this article is to present the process of automating the coding of texts from social media. Implementing this process allows a quantitative treatment of qualitative content-analysis methods: analyses can be run on corpora of hundreds of thousands of texts, coded according to their meaning. This is made possible by machine learning (ML) algorithms. We present the coding method using the example of a project annotating “hate speech” in texts from Polish online forums. The key issue is the precise conceptualization and operationalization of this category, which allows a detailed coding manual to be prepared and the coding team to be trained; the result is a higher inter-coder agreement rate. The annotated texts will be used as training data for automatic categorization methods based on ML algorithms. We then describe the automatic coding methods applied. The article also identifies problems associated with the automatic coding of hate speech and proposes solutions. We conclude by pointing out the factors that are crucial for a research process that uses machine learning.
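The inter-coder agreement rate discussed above is commonly reported as Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal sketch follows; the two-label rater lists are invented examples, not the project's data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label distribution.
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum(ca[label] * cb[label] for label in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)

coder_1 = ["hate", "ok", "ok", "hate", "ok", "ok"]
coder_2 = ["hate", "ok", "hate", "hate", "ok", "ok"]
print(round(cohens_kappa(coder_1, coder_2), 2))  # raw agreement 5/6, kappa → 0.67
```

Kappa near 1 indicates agreement well above chance, which is why a detailed coding manual and coder training, as described in the abstract, matter before annotations are used as ML training data.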
Social language in autism spectrum disorder: A computational analysis of sentiment and linguistic abstraction.
Individuals with autism spectrum disorder (ASD) demonstrate impairments in pragmatic (social) language, including narrative skills and conversational abilities. We aimed to quantitatively characterize narrative performance in ASD using natural language processing techniques: sentiment analysis and language abstraction analysis based on the Linguistic Category Model. Individuals with ASD and peers with typical development, matched for age, gender, ethnicity, and verbal and nonverbal intelligence quotients, produced language samples during two standardized tasks from the Autism Diagnostic Observation Schedule, Second Edition assessment: Telling a Story from a Book and Description of a Picture. Only the narratives produced during the Book Task differed between the ASD and control groups in emotional polarity and language abstraction. Participants with typical development used words with positive sentiment more often than individuals with ASD. For words with negative sentiment, the differences were marginally significant (participants with typical development used words with negative sentiment more often). The Book Task narratives of individuals with ASD were also characterized by a lower level of language abstraction than narratives of peers with typical development. Linguistic abstraction was strongly positively correlated with a higher number of words with emotional polarity. Neither linguistic abstraction nor emotional polarity correlated with participants' age or verbal and nonverbal IQ. The results support the promise of sentiment and language abstraction analyses as a useful tool for the quantitative, fully automated assessment of narrative abilities among individuals with ASD.
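The sentiment side of such an analysis can be sketched as a lexicon lookup over narrative tokens. The word lists below are tiny invented placeholders; the study itself would rely on a validated sentiment lexicon, and the language abstraction measure would additionally require Linguistic Category Model coding of verb and adjective classes.

```python
# Toy polarity lexicons; placeholders, not the study's actual word lists.
POSITIVE = {"happy", "fun", "nice", "love", "good"}
NEGATIVE = {"sad", "angry", "bad", "scared", "hate"}

def polarity_counts(narrative: str) -> tuple[int, int]:
    """Count positive- and negative-polarity words in a language sample."""
    tokens = narrative.lower().split()
    return (sum(t in POSITIVE for t in tokens),
            sum(t in NEGATIVE for t in tokens))

print(polarity_counts("the boy was happy and the dog was fun but then sad"))
# → (2, 1)
```

Normalizing such counts by sample length would allow comparison across participants who produce narratives of different lengths.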
Can a Computer Recognize Hate Speech? Machine Learning (ML) in Qualitative Data Analysis
Spiral of hatred: social effects in Internet auctions. Between informativity and emotion
“Pi of the Sky” Detector
The “Pi of the Sky” experiment has been designed for continuous observations of
a large part of the sky, in search of astrophysical phenomena characterized
by short timescales, especially prompt optical counterparts of Gamma-Ray
Bursts (GRBs). Other scientific goals include searching for novae and
supernovae and monitoring blazar and AGN activity. “Pi of the Sky” is a fully
autonomous, robotic detector which can operate for long periods of time
without human supervision. A crucial element of the detector is advanced
software for real-time data analysis and identification of short optical
transients. The most important result so far has been an independent detection
and observation of the prompt optical emission of the “naked-eye” GRB 080319B.