11 research outputs found

ToxiSpanSE: An Explainable Toxicity Detection in Code Review Comments

Author: Bosu Amiangshu
Sarker Jaydeb
Sultana Sayma
Wilson Steven R.
Publication date: 07/07/2023

Background: The existence of toxic conversations in open-source platforms can degrade relationships among software developers and may negatively impact software product quality. To help mitigate this, some initial work has been done to detect toxic comments in the Software Engineering (SE) domain. Aims: Since automatically classifying an entire text as toxic or non-toxic does not help human moderators to understand the specific reason(s) for toxicity, we worked to develop an explainable toxicity detector for the SE domain. Method: Our explainable toxicity detector can detect specific spans of toxic content from SE texts, which can help human moderators by automatically highlighting those spans. This toxic span detection model, ToxiSpanSE, is trained with the 19,651 code review (CR) comments with labeled toxic spans. Our annotators labeled the toxic spans within 3,757 toxic CR samples. We explored several types of models, including one lexicon-based approach and five different transformer-based encoders. Results: After an extensive evaluation of all models, we found that our fine-tuned RoBERTa model achieved the best score with 0.88 F1, 0.87 precision, and 0.93 recall for toxic class tokens, providing an explainable toxicity classifier for the SE domain. Conclusion: Since ToxiSpanSE is the first tool to detect toxic spans in the SE domain, this tool will pave a path to combat toxicity in the SE community.

arXiv.org e-Print Archive

Automated Identification of Sexual Orientation and Gender Identity Discriminatory Texts from Issue Comments

Author: Bosu Amiangshu
Israt Farzana
Paul Rajshakhar
Sarker Jaydeb
Sultana Sayma
Publication date: 14/11/2023

In an industry dominated by straight men, many developers representing other gender identities and sexual orientations often encounter hateful or discriminatory messages. Such communications pose barriers to participation for women and LGBTQ+ persons. Due to sheer volume, manual inspection of all communications for discriminatory content is infeasible for a large-scale Free Open-Source Software (FLOSS) community. To address this challenge, this study aims to develop an automated mechanism to identify Sexual orientation and Gender identity Discriminatory (SGID) texts from software developers' communications. Toward this goal, we trained and evaluated SGID4SE (Sexual orientation and Gender Identity Discriminatory text identification for (4) Software Engineering texts) as a supervised learning-based SGID detection tool. SGID4SE incorporates six preprocessing steps and ten state-of-the-art algorithms. SGID4SE implements six different strategies to improve the performance of the minority class. We empirically evaluated each strategy and identified an optimum configuration for each algorithm. In our ten-fold cross-validation-based evaluations, a BERT-based model achieves the best performance with 85.9% precision, 80.0% recall, and 82.9% F1-Score for the SGID class. This model achieves 95.7% accuracy and 80.4% Matthews Correlation Coefficient. Our dataset and tool establish a foundation for further research in this direction.

arXiv.org e-Print Archive

BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset

Author: Ahmed Intesur
Ansary Md. Nazmuddoha
Chowdhury Sayma Sultana
Dhruvo Shahriar Elahi
Dip Souhardya Saha
Emon Mahfuzur Rahman
Haque Md. Rezwanul
Hasan Md. Rakibul
Hossen Syed Mobassir
Humayun Ahmed Imtiaz
Meghla Marsia Haque
Pavel Akib Hasan
Rakib Fazle Rabbi
Reasat Tahsin
Sadeque Farig
Shihab Md. Istiak Hossain
Sushmit Asif Shahriyar
Publication date: 10/03/2023

While strides have been made in deep learning based Bengali Optical Character Recognition (OCR) in the past decade, the absence of large Document Layout Analysis (DLA) datasets has hindered the application of OCR in document transcription, e.g., transcribing historical documents and newspapers. Moreover, rule-based DLA systems that are currently being employed in practice are not robust to domain variations and out-of-distribution layouts. To this end, we present the first multidomain large Bengali Document Layout Analysis Dataset: BaDLAD. This dataset contains 33,695 human annotated document samples from six domains - i) books and magazines, ii) public domain govt. documents, iii) liberation war documents, iv) newspapers, v) historical newspapers, and vi) property deeds, with 710K polygon annotations for four unit types: text-box, paragraph, image, and table. Through preliminary experiments benchmarking the performance of existing state-of-the-art deep learning architectures for English DLA, we demonstrate the efficacy of our dataset in training deep learning based Bengali document digitization models.

arXiv.org e-Print Archive

Exposure-Based Screening for Nipah Virus Encephalitis, Bangladesh

Author: Emily S. Gurley
Hossain M.S. Sazzad
Mahmudur Rahman
Peter Daszak
Sayma Afroj
Sharmin Sultana
Stephen P. Luby
Ute Ströher
Publication venue: 'Centers for Disease Control and Prevention (CDC)'
Publication date: 01/02/2015

We measured the performance of exposure screening questions to identify Nipah virus encephalitis in hospitalized encephalitis patients during the 2012–13 Nipah virus season in Bangladesh. The sensitivity (93%), specificity (82%), positive predictive value (37%), and negative predictive value (99%) results suggested that screening questions could more quickly identify persons with Nipah virus encephalitis.
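The four reported figures are linked by Bayes' rule: once sensitivity and specificity are fixed, the predictive values depend on how common the disease is among the screened patients. The following is a minimal sketch of that relationship, not the authors' analysis. The function name `predictive_values` is hypothetical, and the 10% prevalence is an assumption chosen for illustration, since the abstract does not state it.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a binary screening test via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity are taken from the abstract.
# The prevalence (0.10) is an illustrative assumption, not a reported value.
ppv, npv = predictive_values(sensitivity=0.93, specificity=0.82, prevalence=0.10)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under this assumed prevalence the sketch gives predictive values close to the reported 37% and 99%, which illustrates why a screen with high sensitivity can still have a low positive predictive value when the condition is uncommon.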

Directory of Open Access Journals

PubMed Central

Exposure-Based Screening for Nipah Virus Encephalitis, Bangladesh

Author: Emily S. Gurley
Hossain M.S. Sazzad
Mahmudur Rahman
Peter Daszak
Sayma Afroj
Sharmin Sultana
Stephen P. Luby
Ute Ströher
Publication venue: 'Centers for Disease Control and Prevention (CDC)'

Crossref

OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking

Author: Alam Samiul
Ansary Md. Nazmuddoha
Chowdhury Sayma Sultana
Dip Souhardya Saha
Hossen Syed Mobassir
Humayun Ahmed Imtiaz
Mamun Mamunur
Meghla Marsia Haque
Rakib Fazle Rabbi
Reasat Tahsin
Sadeque Farig
Shihab Md. Istiak Hossain
Sushmit Asif
Tasnim Nazia
Publication date: 15/05/2023

We present OOD-Speech, the first out-of-distribution (OOD) benchmarking dataset for Bengali automatic speech recognition (ASR). Being one of the most spoken languages globally, Bengali portrays large diversity in dialects and prosodic features, which demands ASR frameworks to be robust towards distribution shifts. For example, Islamic religious sermons in Bengali are delivered with a tonality that is significantly different from regular speech. Our training dataset is collected via massively online crowdsourcing campaigns which resulted in 1177.94 hours collected and curated from 22,645 native Bengali speakers from South Asia. Our test dataset comprises 23.03 hours of speech collected and manually annotated from 17 different sources, e.g., Bengali TV drama, Audiobook, Talk show, Online class, and Islamic sermons to name a few. OOD-Speech is jointly the largest publicly available speech dataset, as well as the first out-of-distribution ASR benchmarking dataset for Bengali.

arXiv.org e-Print Archive

Transmission of Nipah Virus - 14 Years of Investigations in Bangladesh

Author: Afroj Sayma
Cauchemez Simon
Daszak Peter
Gurley Emily
Hossain M. Jahangir
Khan A.K.M. Dawlat
Kilpatrick A. Marm
Klena John
Luby Stephen
Nichol Stuart
Nikolay Birgit
Pulliam Juliet
Rahman Mahmudur
Salje Henrik
Sazzad Hossain
Ströher Ute
Sultana Sharmin
Publication venue: 'Massachusetts Medical Society'
Publication date: 09/05/2019

Background: Nipah virus is a highly virulent zoonotic pathogen that can be transmitted between humans. Understanding the dynamics of person-to-person transmission is key to designing effective interventions. Methods: We used data from all Nipah virus cases identified during outbreak investigations in Bangladesh from April 2001 through April 2014 to investigate case-patient characteristics associated with onward transmission and factors associated with the risk of infection among patient contacts. Results: Of 248 Nipah virus cases identified, 82 were caused by person-to-person transmission, corresponding to a reproduction number (i.e., the average number of secondary cases per case patient) of 0.33 (95% confidence interval [CI], 0.19 to 0.59). The predicted reproduction number increased with the case patient’s age and was highest among patients 45 years of age or older who had difficulty breathing (1.1; 95% CI, 0.4 to 3.2). Case patients who did not have difficulty breathing infected 0.05 times as many contacts (95% CI, 0.01 to 0.3) as other case patients did. Serologic testing of 1863 asymptomatic contacts revealed no infections. Spouses of case patients were more often infected (8 of 56 [14%]) than other close family members (7 of 547 [1.3%]) or other contacts (18 of 1996 [0.9%]). The risk of infection increased with increased duration of exposure of the contacts (adjusted odds ratio for exposure of >48 hours vs. ≤1 hour, 13; 95% CI, 2.6 to 62) and with exposure to body fluids (adjusted odds ratio, 4.3; 95% CI, 1.6 to 11). Conclusions: Increasing age and respiratory symptoms were indicators of infectivity of Nipah virus. Interventions to control person-to-person transmission should aim to reduce exposure to body fluids. (Funded by the National Institutes of Health and others.)
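The point estimates in the Results can be checked with simple ratios of the counts given in the abstract. The sketch below does only that arithmetic. It does not reproduce the confidence intervals, the predicted reproduction numbers, or the adjusted odds ratios, which come from statistical models not described here. The helper name `proportion` and the variable names are hypothetical.

```python
def proportion(numerator: int, denominator: int) -> float:
    """Plain ratio of two counts, guarding against an empty denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# Counts as reported in the abstract.
total_cases = 248
person_to_person_cases = 82

# Average number of secondary cases per case patient.
reproduction_number = proportion(person_to_person_cases, total_cases)

# Share of contacts infected, by relationship to the case patient.
attack_rates = {
    "spouses": proportion(8, 56),
    "other close family members": proportion(7, 547),
    "other contacts": proportion(18, 1996),
}

print(f"Reproduction number: {reproduction_number:.2f}")
for group, rate in attack_rates.items():
    print(f"{group}: {rate:.1%}")
```

Rounded to the precision used in the abstract, these ratios agree with the reported 0.33 and with the 14%, 1.3%, and 0.9% infection proportions.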

HAL Descartes

HAL-Pasteur

Changing Contact Patterns Over Disease Progression: Nipah Virus as a Case Study

Author: Afroj Sayma
Cauchemez Simon
Daszak Peter
Gurley Emily
Hossain M Jahangir
Khan A
Kilpatrick A Marm
Klena John
Lee Kyu Han
Luby Stephen
Nichol Stuart
Nikolay Birgit
Pulliam Juliet
Rahman Mahmudur
Salje Henrik
Satter Syed Moinuddin
Sazzad Hossain
Sultana Sharmin
Publication venue: 'Oxford University Press (OUP)'
Publication date: 02/03/2020

Contact patterns play a key role in disease transmission, and variation in contacts during the course of illness can influence transmission, particularly when accompanied by changes in host infectiousness. We used surveys among 1642 contacts of 94 Nipah virus case patients in Bangladesh to determine how contact patterns (physical and with bodily fluids) changed as disease progressed in severity. The number of contacts increased with severity and, for case patients who died, peaked on the day of death. Given transmission has only been observed among fatal cases of Nipah virus infection, our findings suggest that changes in contact patterns during illness contribute to risk of infection.

PubMed Central

HAL-Pasteur

Transmission of Nipah virus — 14 years of investigations in Bangladesh

Author: Afroj Sayma
Cauchemez Simon
Daszak Peter
Gurley Emily S.
Hossain Jahangir
Khan Dawlat
Kilpatrick Marm
Klena John D.
Luby Stephen P.
Nichol Stuart T.
Nikolay Birgit
Pulliam Juliet R. C.
Rahman Mahmudur
Salje Henrik
Sazzad Hossain M. S.
Stroher Ute
Sultana Sharmin
Publication venue: 'Massachusetts Medical Society'
Publication date: 09/05/2019

CITATION: Nikolay, B. et al. 2019. Transmission of Nipah Virus — 14 Years of Investigations in Bangladesh. New England Journal of Medicine, 380(19):1804-1814. doi:10.1056/NEJMoa1805376

The original publication is available at https://www.nejm.org/

BACKGROUND: Nipah virus is a highly virulent zoonotic pathogen that can be transmitted between humans. Understanding the dynamics of person-to-person transmission is key to designing effective interventions. METHODS: We used data from all Nipah virus cases identified during outbreak investigations in Bangladesh from April 2001 through April 2014 to investigate case-patient characteristics associated with onward transmission and factors associated with the risk of infection among patient contacts. RESULTS: Of 248 Nipah virus cases identified, 82 were caused by person-to-person transmission, corresponding to a reproduction number (i.e., the average number of secondary cases per case patient) of 0.33 (95% confidence interval [CI], 0.19 to 0.59). The predicted reproduction number increased with the case patient’s age and was highest among patients 45 years of age or older who had difficulty breathing (1.1; 95% CI, 0.4 to 3.2). Case patients who did not have difficulty breathing infected 0.05 times as many contacts (95% CI, 0.01 to 0.3) as other case patients did. Serologic testing of 1863 asymptomatic contacts revealed no infections. Spouses of case patients were more often infected (8 of 56 [14%]) than other close family members (7 of 547 [1.3%]) or other contacts (18 of 1996 [0.9%]). The risk of infection increased with increased duration of exposure of the contacts (adjusted odds ratio for exposure of >48 hours vs. ≤1 hour, 13; 95% CI, 2.6 to 62) and with exposure to body fluids (adjusted odds ratio, 4.3; 95% CI, 1.6 to 11). CONCLUSIONS: Increasing age and respiratory symptoms were indicators of infectivity of Nipah virus. Interventions to control person-to-person transmission should aim to reduce exposure to body fluids. (Funded by the National Institutes of Health and others.)

National Institutes of Health
https://www.nejm.org/doi/full/10.1056/NEJMoa1805376
Publisher’s versio

Stellenbosch University SUNScholar Repository