Identifying False Content and Hate Speech in Sinhala YouTube Videos by Analyzing the Audio
YouTube faces a global crisis with the dissemination of false information and
hate speech. To counter these issues, YouTube has implemented strict rules
against uploading content that includes false information or promotes hate
speech. While numerous studies have been conducted to reduce offensive
English-language content, there's a significant lack of research on Sinhala
content. This study aims to address the aforementioned gap by proposing a
solution to minimize the spread of violence and misinformation in Sinhala
YouTube videos. The approach involves developing a rating system that assesses
whether a video contains false information by comparing the title and
description with the audio content and evaluating whether the video includes
hate speech. The methodology encompasses several steps, including audio
extraction using the Pytube library, audio transcription via the fine-tuned
Whisper model, hate speech detection employing the distilroberta-base model and
a text classification LSTM model, and text summarization through the fine-tuned
BART-Large-XSUM model. Notably, the Whisper model achieved a 48.99% word
error rate, while the distilroberta-base model demonstrated an F1 score of
0.856 and a recall value of 0.861, whereas the LSTM model
exhibited signs of overfitting.
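For readers unfamiliar with the word error rate quoted for the transcription step, the following is a minimal sketch of how the metric is commonly computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the sample strings are illustrative assumptions, not taken from the study, which may have used a different tool or text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); not the study's evaluation code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution and one deletion against four reference words.
example = word_error_rate("a b c d", "a x c")
```

Because insertions count as errors, the value can exceed 1.0 when the output is much longer than the reference; a rate of 48.99% means roughly one word-level edit for every two reference words.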
Nearly Linear-Time, Parallelizable Algorithms for Non-Monotone Submodular Maximization
We study parallelizable algorithms for maximization of a submodular function,
not necessarily monotone, with respect to a cardinality constraint . We
improve the best approximation factor achieved by an algorithm that has optimal
adaptivity and query complexity, up to logarithmic factors in the size of
the ground set, from to . We provide two
algorithms; the first has approximation ratio , adaptivity , and query complexity , while the second has
approximation ratio , adaptivity , and query
complexity . Heuristic versions of our algorithms are empirically
validated to use a low number of adaptive rounds and total queries while
obtaining solutions with high objective value in comparison with highly
adaptive approximation algorithms.
Comment: 24 pages, 2 figures
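To make the terms in the abstract above concrete, here is a small sketch of a non-monotone submodular function (the cut function of a graph) maximized under a cardinality constraint. It uses the plain sequential greedy heuristic, which is not either of the paper's algorithms and carries no approximation guarantee for non-monotone objectives; it is included only to show what "adaptive rounds" and "queries" count. The toy graph and all names (`EDGES`, `cut_value`, `is_submodular`, `greedy`) are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy graph, used only for illustration.
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]
GROUND_SET = range(5)


def cut_value(selected):
    """Number of edges with exactly one endpoint in `selected`."""
    s = set(selected)
    return sum(1 for u, v in EDGES if (u in s) != (v in s))


def is_submodular(f, ground):
    """Brute-force check of diminishing returns on a small ground set:
    for all A subset of B and x outside B, f(A+x) - f(A) >= f(B+x) - f(B)."""
    items = list(ground)
    subsets = [frozenset(c) for r in range(len(items) + 1)
               for c in combinations(items, r)]
    for a in subsets:
        for b in subsets:
            if not a <= b:
                continue
            for x in items:
                if x in b:
                    continue
                if f(a | {x}) - f(a) < f(b | {x}) - f(b):
                    return False
    return True


def greedy(f, ground, k):
    """Plain greedy baseline (not the paper's method).

    Returns (solution, adaptive_rounds, queries). Queries within one round do
    not depend on each other, so they could be issued in parallel; each round
    depends on the previous round's choice.
    """
    chosen = set()
    rounds = queries = 0
    for _ in range(k):
        rounds += 1
        base = f(chosen)
        queries += 1
        best, best_gain = None, 0
        for x in ground:
            if x in chosen:
                continue
            gain = f(chosen | {x}) - base
            queries += 1
            if gain > best_gain:
                best, best_gain = x, gain
        if best is None:  # no element improves the objective
            break
        chosen.add(best)
    return chosen, rounds, queries


solution, rounds, queries = greedy(cut_value, GROUND_SET, k=2)
```

The cut function is non-monotone because adding elements can lower the value: selecting every vertex cuts no edges at all. Greedy needs one adaptive round per selected element, which is the kind of sequential dependence that low-adaptivity algorithms aim to reduce.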