ToxiSpanSE: An Explainable Toxicity Detection in Code Review Comments
Background: The existence of toxic conversations in open-source platforms can
degrade relationships among software developers and may negatively impact
software product quality. To help mitigate this, some initial work has been
done to detect toxic comments in the Software Engineering (SE) domain. Aims:
Since automatically classifying an entire text as toxic or non-toxic does not
help human moderators to understand the specific reason(s) for toxicity, we
worked to develop an explainable toxicity detector for the SE domain. Method:
Our explainable toxicity detector can detect specific spans of toxic content
from SE texts, which can help human moderators by automatically highlighting
those spans. This toxic span detection model, ToxiSpanSE, is trained on
19,651 code review (CR) comments with labeled toxic spans. Our annotators
labeled the toxic spans within 3,757 toxic CR samples. We explored several
types of models, including one lexicon-based approach and five different
transformer-based encoders. Results: After an extensive evaluation of all
models, we found that our fine-tuned RoBERTa model achieved the best score with
0.88 F1, 0.87 precision, and 0.93 recall for toxic class tokens, providing an
explainable toxicity classifier for the SE domain. Conclusion: Since ToxiSpanSE
is the first tool to detect toxic spans in the SE domain, this tool will pave a
path to combat toxicity in the SE community.
Automated Identification of Sexual Orientation and Gender Identity Discriminatory Texts from Issue Comments
In an industry dominated by straight men, many developers representing other
gender identities and sexual orientations often encounter hateful or
discriminatory messages. Such communications pose barriers to participation for
women and LGBTQ+ persons. Due to sheer volume, manual inspection of all
communications for discriminatory content is infeasible for a large-scale
Free/Libre Open-Source Software (FLOSS) community. To address this challenge, this
study aims to develop an automated mechanism to identify Sexual orientation and
Gender identity Discriminatory (SGID) texts from software developers'
communications. Toward this goal, we trained and evaluated SGID4SE (Sexual
orientation and Gender Identity Discriminatory text identification for (4)
Software Engineering texts) as a supervised learning-based SGID detection tool.
SGID4SE incorporates six preprocessing steps and ten state-of-the-art
algorithms. SGID4SE implements six different strategies to improve the
performance of the minority class. We empirically evaluated each strategy and
identified an optimum configuration for each algorithm. In our ten-fold
cross-validation-based evaluations, a BERT-based model achieves the best
performance with 85.9% precision, 80.0% recall, and 82.9% F1-Score for the SGID
class. This model achieves 95.7% accuracy and 80.4% Matthews Correlation
Coefficient. Our dataset and tool establish a foundation for further research
in this direction.
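As a quick check on how the SGID-class metrics relate, the F1-score is the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_score` is hypothetical and is not part of SGID4SE. Plugging in the reported precision and recall gives roughly 82.8%, close to the reported 82.9%; the small gap may come from averaging per-fold scores, which is an assumption rather than something the abstract states.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# SGID-class precision and recall as reported in the abstract above.
f1 = f1_score(0.859, 0.800)
print(f"F1 = {f1:.3f}")
```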
BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset
While strides have been made in deep learning based Bengali Optical Character
Recognition (OCR) in the past decade, the absence of large Document Layout
Analysis (DLA) datasets has hindered the application of OCR in document
transcription, e.g., transcribing historical documents and newspapers.
Moreover, rule-based DLA systems that are currently being employed in practice
are not robust to domain variations and out-of-distribution layouts. To this
end, we present the first multidomain large Bengali Document Layout Analysis
Dataset: BaDLAD. This dataset contains 33,695 human annotated document samples
from six domains - i) books and magazines, ii) public domain govt. documents,
iii) liberation war documents, iv) newspapers, v) historical newspapers, and
vi) property deeds, with 710K polygon annotations for four unit types:
text-box, paragraph, image, and table. Through preliminary experiments
benchmarking the performance of existing state-of-the-art deep learning
architectures for English DLA, we demonstrate the efficacy of our dataset in
training deep learning based Bengali document digitization models.
Exposure-Based Screening for Nipah Virus Encephalitis, Bangladesh
We measured the performance of exposure screening questions to identify Nipah virus encephalitis in hospitalized encephalitis patients during the 2012–13 Nipah virus season in Bangladesh. The sensitivity (93%), specificity (82%), positive predictive value (37%), and negative predictive value (99%) results suggested that screening questions could more quickly identify persons with Nipah virus encephalitis.
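The four screening metrics are linked through disease prevalence by Bayes' rule, which is why a test with 93% sensitivity can still have a positive predictive value of only 37%. The sketch below back-calculates the prevalence implied by the reported figures and then recomputes the negative predictive value. The function names are hypothetical, and the implied prevalence (about 10%) is an inference from the rounded percentages, not a number given in the abstract.

```python
def implied_prevalence(sens: float, spec: float, ppv: float) -> float:
    """Solve PPV = sens*p / (sens*p + (1 - spec)*(1 - p)) for prevalence p."""
    false_pos_rate = 1 - spec
    return ppv * false_pos_rate / (sens * (1 - ppv) + ppv * false_pos_rate)


def negative_predictive_value(sens: float, spec: float, prev: float) -> float:
    """NPV = true negatives / (true negatives + false negatives)."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


# Rounded values reported in the abstract above.
sens, spec, ppv = 0.93, 0.82, 0.37

prev = implied_prevalence(sens, spec, ppv)
npv = negative_predictive_value(sens, spec, prev)
print(f"implied prevalence = {prev:.3f}, NPV = {npv:.3f}")
```

Under these assumptions the recomputed negative predictive value rounds to 99%, matching the reported figure.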
OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking
We present OOD-Speech, the first out-of-distribution (OOD) benchmarking
dataset for Bengali automatic speech recognition (ASR). Being one of the most
spoken languages globally, Bengali portrays large diversity in dialects and
prosodic features, which demands ASR frameworks to be robust towards
distribution shifts. For example, Islamic religious sermons in Bengali are
delivered with a tonality that is significantly different from regular speech.
Our training dataset is collected via massively online crowdsourcing campaigns
which resulted in 1177.94 hours collected and curated from native
Bengali speakers from South Asia. Our test dataset comprises 23.03 hours of
speech collected and manually annotated from 17 different sources, e.g.,
Bengali TV drama, Audiobook, Talk show, Online class, and Islamic sermons to
name a few. OOD-Speech is jointly the largest publicly available speech
dataset, as well as the first out-of-distribution ASR benchmarking dataset for
Bengali.
Transmission of Nipah Virus - 14 Years of Investigations in Bangladesh
Background: Nipah virus is a highly virulent zoonotic pathogen that can be transmitted between humans. Understanding the dynamics of person-to-person transmission is key to designing effective interventions. Methods: We used data from all Nipah virus cases identified during outbreak investigations in Bangladesh from April 2001 through April 2014 to investigate case-patient characteristics associated with onward transmission and factors associated with the risk of infection among patient contacts. Results: Of 248 Nipah virus cases identified, 82 were caused by person-to-person transmission, corresponding to a reproduction number (i.e., the average number of secondary cases per case patient) of 0.33 (95% confidence interval [CI], 0.19 to 0.59). The predicted reproduction number increased with the case patient’s age and was highest among patients 45 years of age or older who had difficulty breathing (1.1; 95% CI, 0.4 to 3.2). Case patients who did not have difficulty breathing infected 0.05 times as many contacts (95% CI, 0.01 to 0.3) as other case patients did. Serologic testing of 1863 asymptomatic contacts revealed no infections. Spouses of case patients were more often infected (8 of 56 [14%]) than other close family members (7 of 547 [1.3%]) or other contacts (18 of 1996 [0.9%]). The risk of infection increased with increased duration of exposure of the contacts (adjusted odds ratio for exposure of >48 hours vs. ≤1 hour, 13; 95% CI, 2.6 to 62) and with exposure to body fluids (adjusted odds ratio, 4.3; 95% CI, 1.6 to 11). Conclusions: Increasing age and respiratory symptoms were indicators of infectivity of Nipah virus. Interventions to control person-to-person transmission should aim to reduce exposure to body fluids. (Funded by the National Institutes of Health and others.)
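The headline figures in this abstract can be reproduced with simple ratios. The sketch below recomputes the reproduction number as secondary cases divided by all cases, and the proportion infected in each contact group. Variable names are illustrative, and the confidence intervals are not reproduced because the abstract does not describe how they were estimated.

```python
# Counts taken from the abstract above.
cases_total = 248
cases_person_to_person = 82

# Average number of secondary cases per case patient.
reproduction_number = cases_person_to_person / cases_total
print(f"reproduction number = {reproduction_number:.2f}")

# (infected, total) per contact group, as reported.
contact_groups = {
    "spouses": (8, 56),
    "other close family members": (7, 547),
    "other contacts": (18, 1996),
}

attack_rates = {
    group: infected / total
    for group, (infected, total) in contact_groups.items()
}
for group, rate in attack_rates.items():
    print(f"{group}: {rate:.1%}")
```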
Changing Contact Patterns Over Disease Progression: Nipah Virus as a Case Study
Contact patterns play a key role in disease transmission, and variation in contacts during the course of illness can influence transmission, particularly when accompanied by changes in host infectiousness. We used surveys among 1642 contacts of 94 Nipah virus case patients in Bangladesh to determine how contact patterns (physical and with bodily fluids) changed as disease progressed in severity. The number of contacts increased with severity and, for case patients who died, peaked on the day of death. Given transmission has only been observed among fatal cases of Nipah virus infection, our findings suggest that changes in contact patterns during illness contribute to risk of infection.
Transmission of Nipah virus — 14 years of investigations in Bangladesh
CITATION: Nikolay, B. et al. 2019. Transmission of Nipah Virus — 14 Years of Investigations in Bangladesh. New England Journal of Medicine, 380(19):1804-1814. doi:10.1056/NEJMoa1805376
The original publication is available at https://www.nejm.org/
BACKGROUND: Nipah virus is a highly virulent zoonotic pathogen that can be transmitted between humans. Understanding the dynamics of person-to-person transmission is key to designing effective interventions.
METHODS: We used data from all Nipah virus cases identified during outbreak investigations in Bangladesh from April 2001 through April 2014 to investigate case-patient characteristics associated with onward transmission and factors associated with the risk of infection among patient contacts.
RESULTS: Of 248 Nipah virus cases identified, 82 were caused by person-to-person transmission, corresponding to a reproduction number (i.e., the average number of secondary cases per case patient) of 0.33 (95% confidence interval [CI], 0.19 to 0.59). The predicted reproduction number increased with the case patient’s age and was highest among patients 45 years of age or older who had difficulty breathing (1.1; 95% CI, 0.4 to 3.2). Case patients who did not have difficulty breathing infected 0.05 times as many contacts (95% CI, 0.01 to 0.3) as other case patients did. Serologic testing of 1863 asymptomatic contacts revealed no infections. Spouses of case patients were more often infected (8 of 56 [14%]) than other close family members (7 of 547 [1.3%]) or other contacts (18 of 1996 [0.9%]). The risk of infection increased with increased duration of exposure of the contacts (adjusted odds ratio for exposure of >48 hours vs. ≤1 hour, 13; 95% CI, 2.6 to 62) and with exposure to body fluids (adjusted odds ratio, 4.3; 95% CI, 1.6 to 11).
CONCLUSIONS: Increasing age and respiratory symptoms were indicators of infectivity of Nipah virus. Interventions to control person-to-person transmission should aim to reduce exposure to body fluids. (Funded by the National Institutes of Health and others.)
https://www.nejm.org/doi/full/10.1056/NEJMoa1805376