
3 research outputs found

Identifying False Content and Hate Speech in Sinhala YouTube Videos by Analyzing the Audio

Authors: Abeywardhana Lakmini; Athukorala A. Sahashra Udani; Karunasena A.; Subasinghe Sameeri Sathsara; Wickramaarachchi W. A. K. M.; Wijerathna K. K. Rashani Tharushika
Publication date: 30/01/2024

YouTube faces a global crisis in the dissemination of false information and hate speech. To counter these issues, YouTube has implemented strict rules against uploading content that includes false information or promotes hate speech. While numerous studies have addressed offensive English-language content, there is a significant lack of research on Sinhala content. This study aims to address that gap by proposing a solution to minimize the spread of violence and misinformation in Sinhala YouTube videos. The approach involves developing a rating system that assesses whether a video contains false information, by comparing the title and description with the audio content, and whether the video includes hate speech. The methodology comprises several steps: audio extraction using the Pytube library, audio transcription via the fine-tuned Whisper model, hate speech detection employing the distilroberta-base model and a text classification LSTM model, and text summarization through the fine-tuned BART-Large-XSUM model. Notably, the Whisper model achieved a 48.99% word error rate, while the distilroberta-base model demonstrated an F1 score of 0.856 and a recall of 0.861 in comparison to the LSTM model, which exhibited signs of overfitting.
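For readers unfamiliar with the word error rate quoted in the abstract, the following is a minimal sketch of how the metric is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a model transcript, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not say how the authors tokenized or normalized Sinhala text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: tokenization is plain whitespace splitting, which is
    an assumption and not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a rate of 48.99% means that roughly one edit is needed for every two reference words. Because insertions count as errors, the value can exceed 100% when the hypothesis is much longer than the reference.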

arXiv.org e-Print Archive

Nearly Linear-Time, Parallelizable Algorithms for Non-Monotone Submodular Maximization

Author: Kuhnle Alan
Publication date: 03/09/2020

We study parallelizable algorithms for maximization of a submodular function, not necessarily monotone, with respect to a cardinality constraint $k$. We improve the best approximation factor achieved by an algorithm that has optimal adaptivity and query complexity, up to logarithmic factors in the size $n$ of the ground set, from $0.039 - \epsilon$ to $0.193 - \epsilon$. We provide two algorithms; the first has approximation ratio $1/6 - \epsilon$, adaptivity $O(\log n)$, and query complexity $O(n \log k)$, while the second has approximation ratio $0.193 - \epsilon$, adaptivity $O(\log^2 n)$, and query complexity $O(n \log k)$. Heuristic versions of our algorithms are empirically validated to use a low number of adaptive rounds and total queries while obtaining solutions with high objective value in comparison with highly adaptive approximation algorithms.

Comment: 24 pages, 2 figures
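To make the problem setting concrete, here is a minimal sketch of a non-monotone submodular objective (the cut function of an undirected graph) together with a plain sequential greedy baseline under a cardinality constraint. This is not either of the algorithms described in the abstract: the greedy loop below needs up to k sequential rounds and up to n·k function queries, which is the kind of high adaptivity the low-adaptivity algorithms aim to avoid, and no approximation guarantee is claimed for it here. The names `cut_value` and `greedy_cardinality` and the example graph are assumptions made for illustration.

```python
from typing import Callable, List, Set, Tuple


def cut_value(edges: List[Tuple[int, int]], subset: Set[int]) -> int:
    """Number of edges with exactly one endpoint in `subset`.

    The cut function is submodular but not monotone: adding vertices can
    lower the value (the full vertex set has cut value 0).
    """
    return sum(1 for u, v in edges if (u in subset) != (v in subset))


def greedy_cardinality(
    ground: List[int], f: Callable[[Set[int]], int], k: int
) -> Set[int]:
    """Sequential greedy baseline (illustrative, not the paper's method).

    Each round adds the element with the largest strictly positive marginal
    gain and stops early if no element improves the objective.
    """
    chosen: Set[int] = set()
    current = f(chosen)
    for _ in range(k):  # up to k adaptive rounds
        best_gain, best_item = 0, None
        for x in ground:  # up to n queries per round
            if x in chosen:
                continue
            gain = f(chosen | {x}) - current
            if gain > best_gain:
                best_gain, best_item = gain, x
        if best_item is None:
            break
        chosen.add(best_item)
        current += best_gain
    return chosen


# Hypothetical example: a path graph 0 - 1 - 2 - 3.
path_edges = [(0, 1), (1, 2), (2, 3)]
solution = greedy_cardinality(
    [0, 1, 2, 3], lambda s: cut_value(path_edges, s), k=2
)
```

On this path graph the baseline selects vertices 1 and 3, which cut all three edges, while selecting every vertex would cut none, illustrating why monotonicity cannot be assumed.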

arXiv.org e-Print Archive