10 research outputs found

UR-FUNNY: A Multimodal Language Dataset for Understanding Humor

Author: Hasan Md Kamrul
Hoque Mohammed
Morency Louis-Philippe
Rahman Wasifur
Tanveer Md Iftekhar
Zadeh Amir
Zhong Jianyuan
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019

Humor is a unique and creative communicative behavior displayed during social interactions. It is produced in a multimodal manner, through the use of words (text), gestures (vision) and prosodic cues (acoustic). Understanding humor from these three modalities falls within the boundaries of multimodal language, a recent research trend in natural language processing that models natural language as it happens in face-to-face communication. Although humor detection is an established research area in NLP, it remains understudied in a multimodal context. This paper presents a diverse multimodal dataset, called UR-FUNNY, to open the door to understanding multimodal language used in expressing humor. The dataset and accompanying studies present a framework for multimodal humor detection for the natural language processing community. UR-FUNNY is publicly available for research.

arXiv.org e-Print Archive

Crossref

TextMI: Textualize Multimodal Information for Integrating Non-verbal Cues in Pre-trained Language Models

Author: Hasan Md Kamrul
Hoque Ehsan
Islam Md Saiful
Khan Mohammed Ibrahim
Lee Sangwu
Naim Iftekhar
Rahman Wasifur
Publication date: 29/03/2023

Pre-trained large language models have recently achieved ground-breaking performance in a wide variety of language understanding tasks. However, the same model cannot be applied to multimodal behavior understanding tasks (e.g., video sentiment/humor detection) unless non-verbal features (e.g., acoustic and visual) can be integrated with language. Jointly modeling multiple modalities significantly increases the model complexity and makes the training process data-hungry. While an enormous amount of text data is available via the web, collecting large-scale multimodal behavioral video datasets is extremely expensive, both in terms of time and money. In this paper, we investigate whether large language models alone can successfully incorporate non-verbal information when it is presented in textual form. We present a way to convert the acoustic and visual information into corresponding textual descriptions and concatenate them with the spoken text. We feed this augmented input to a pre-trained BERT model and fine-tune it on three downstream multimodal tasks: sentiment, humor, and sarcasm detection. Our approach, TextMI, significantly reduces model complexity, adds interpretability to the model's decision, and can be applied to a diverse set of tasks while achieving superior (multimodal sarcasm detection) or near-SOTA (multimodal sentiment analysis and multimodal humor detection) performance. We propose TextMI as a general, competitive baseline for multimodal behavioral analysis tasks, particularly in a low-resource setting.
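
The conversion step described in this abstract (turning acoustic and visual information into text and concatenating it with the spoken words) might look roughly like the sketch below. This is only an illustration under stated assumptions: the feature names, the thresholds, the three-level binning, and the "[SEP]" separator are all hypothetical and are not taken from the paper.

```python
# Hypothetical illustration of textualizing non-verbal cues. The feature
# names, thresholds, and separator are assumptions, not the paper's method.


def describe(name: str, value: float, low: float, high: float) -> str:
    """Map a numeric feature to a coarse textual description."""
    if value < low:
        level = "low"
    elif value > high:
        level = "high"
    else:
        level = "medium"
    return f"{level} {name}"


def textualize(spoken_text: str, acoustic: dict, visual: dict) -> str:
    """Concatenate spoken text with textual descriptions of the cues.

    Each feature maps to a tuple (value, low_threshold, high_threshold).
    """
    acoustic_desc = ", ".join(describe(n, *v) for n, v in acoustic.items())
    visual_desc = ", ".join(describe(n, *v) for n, v in visual.items())
    return (
        f"{spoken_text} [SEP] acoustic: {acoustic_desc} "
        f"[SEP] visual: {visual_desc}"
    )


augmented = textualize(
    "that went really well",
    acoustic={"pitch": (0.9, 0.3, 0.7)},
    visual={"smile intensity": (0.1, 0.3, 0.7)},
)
print(augmented)
```

The resulting string is ordinary text, so in principle it could be passed to any text-only classifier; the abstract states that the authors fine-tune a pre-trained BERT model on such augmented input.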

arXiv.org e-Print Archive

Using AI to Measure Parkinson's Disease Severity at Home

Author: Abdelkader Abdelrahman
Adams Jamie L.
Dorsey E. Ray
Hoque Ehsan
Islam Md Saiful
Lee Sangwu
Rahman Wasifur
Schneider Ruth B.
Yang Phillip T.
Publication date: 17/08/2023

We present an artificial intelligence system to remotely assess the motor performance of individuals with Parkinson's disease (PD). Participants performed a motor task (i.e., tapping fingers) in front of a webcam, and data from 250 global participants were rated by three expert neurologists following the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS). The neurologists' ratings were highly reliable, with an intra-class correlation coefficient (ICC) of 0.88. We developed computer algorithms to obtain objective measurements that align with the MDS-UPDRS guideline and are strongly correlated with the neurologists' ratings. Our machine learning model trained on these measures outperformed an MDS-UPDRS certified rater, with a mean absolute error (MAE) of 0.59 compared to the rater's MAE of 0.79. However, the model performed slightly worse than the expert neurologists (0.53 MAE). The methodology can be replicated for similar motor tasks, providing the possibility of evaluating individuals with PD and other movement disorders remotely, objectively, and in areas with limited access to neurological care.
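
For readers unfamiliar with the metric, the mean absolute error (MAE) quoted in this abstract is the average absolute difference between predicted and reference severity scores. The sketch below shows the computation on made-up ratings; the numbers are illustrative assumptions and are not data from the study.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute difference between two equal-length score lists."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)


# Made-up severity scores on a 0-4 scale (not from the study).
reference_scores = [0, 1, 2, 3, 4]
model_scores = [0, 1, 3, 3, 3]

mae = mean_absolute_error(model_scores, reference_scores)
print(mae)
```

A lower MAE means the scores sit closer to the reference, which is how the abstract ranks the model (0.59) between the certified rater (0.79) and the expert neurologists (0.53).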

arXiv.org e-Print Archive

AI and Machine Learning

Author: Rahman Shah Mohammed Wasifur
Publication venue: 'Academy of Traumatology'
Publication date: 01/01/2020

A primer on AI and Machine Learning that also touches upon "good" and "bad" AI and its relationship with governments and corporations.

Coventry University Pure Portal

CERN Document Server

Blind Men and the Elephant: Demystifying the Global IT Services Industry

Author: Kurien Priya
Rahman Shah Mohammed Wasifur
Publication venue: 'Academy of Traumatology'
Publication date: 01/09/2007

Coventry University Pure Portal

The case for re‐examining IT effectiveness

Author: Kurien Priya
Purushottam V.S
Rahman Shah Mohammed Wasifur
Publication venue: 'Emerald'
Publication date: 01/04/2004

Crossref

Coventry University Pure Portal

Humor Knowledge Enriched Transformer for Understanding Multimodal Humor

Author: Hasan Md Kamrul
Hoque Ehsan
Lee Sangwu
Mihalcea Rada
Morency Louis-Philippe
Rahman Wasifur
Zadeh Amir
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 18/05/2021

Recognizing humor from a video utterance requires understanding the verbal and non-verbal components as well as incorporating the appropriate context and external knowledge. In this paper, we propose Humor Knowledge enriched Transformer (HKT) that can capture the gist of a multimodal humorous expression by integrating the preceding context and external knowledge. We incorporate humor-centric external knowledge into the model by capturing the ambiguity and sentiment present in the language. We encode all the language, acoustic, vision, and humor-centric features separately using Transformer-based encoders, followed by a cross-attention layer to exchange information among them. Our model achieves 77.36% and 79.41% accuracy in humorous punchline detection on the UR-FUNNY and MUStaRD datasets, setting a new state of the art on both by margins of 4.93% and 2.94%, respectively. Furthermore, we demonstrate that our model can capture interpretable, humor-inducing patterns from all modalities.

Association for the Advancement of Artificial Intelligence: AAAI Publications

Detecting Parkinson Disease Using a Web-Based Speech Task: Observational Study

Author: Aayush Sarkar
Abdullah Al Mamun
Christopher Tarolli
E Ray Dorsey
Ehsan Hoque
Ellen Wagner
Emma Waddell
Harshil Ratnu
Jamie Adams
Julia Soto
Karlo Lizarraga
Madeleine Coffey
Max A Little
Md Saiful Islam
Meghan Pawlik
Mohammad Rafayet Ali
Ruth Schneider
Sangwu Lee
Stella Jensen-Roberts
Taylor Myers
Victor Nikhil Antony
Wasifur Rahman
Publication venue: JMIR Publications
Publication date: 01/10/2021

Background: Access to neurological care for Parkinson disease (PD) is a rare privilege for millions of people worldwide, especially in resource-limited countries. In 2013, there were just 1200 neurologists in India for a population of 1.3 billion people; in Africa, the average population per neurologist exceeds 3.3 million people. In contrast, 60,000 people receive a diagnosis of PD every year in the United States alone, and similar patterns of rising PD cases, fueled mostly by environmental pollution and an aging population, can be seen worldwide. The current projection of more than 12 million patients with PD worldwide by 2040 is only part of the picture given that more than 20% of patients with PD remain undiagnosed. Timely diagnosis and frequent assessment are key to ensure timely and appropriate medical intervention, thus improving the quality of life of patients with PD.
Objective: In this paper, we propose a web-based framework that can help anyone anywhere around the world record a short speech task and analyze the recorded data to screen for PD.
Methods: We collected data from 726 unique participants (PD: 262/726, 36.1% were women; non-PD: 464/726, 63.9% were women; average age 61 years) from all over the United States and beyond. A small portion of the data (approximately 54/726, 7.4%) was collected in a laboratory setting to compare the performance of the models trained with noisy home environment data against high-quality laboratory-environment data. The participants were instructed to utter a popular pangram containing all the letters in the English alphabet, “the quick brown fox jumps over the lazy dog.” We extracted both standard acoustic features (mel-frequency cepstral coefficients and jitter and shimmer variants) and deep learning–based embedding features from the speech data. Using these features, we trained several machine learning algorithms. We also applied model interpretation techniques such as Shapley additive explanations to ascertain the importance of each feature in determining the model’s output.
Results: We achieved an area under the curve of 0.753 for determining the presence of self-reported PD by modeling the standard acoustic features with XGBoost, a gradient-boosted decision tree model. Further analysis revealed that the widely used mel-frequency cepstral coefficient features and a subset of previously validated dysphonia features designed for detecting PD from a verbal phonation task (pronouncing “ahh”) influence the model’s decision the most.
Conclusions: Our model performed equally well on data collected in a controlled laboratory environment and in the wild across different gender and age groups. Using this tool, we can collect data from almost anyone anywhere with an audio-enabled device and help the participants screen for PD remotely, contributing to equity and access in neurological care.
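
The area under the ROC curve reported in this abstract can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition, counting ties as half. The labels and scores are made-up assumptions, not study data, and this pairwise approach is shown for clarity rather than as the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = self-reported PD, 0 = non-PD (not from the study).
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.6, 0.2]

auc = roc_auc(example_labels, example_scores)
print(auc)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the 0.753 figure in the abstract.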

Directory of Open Access Journals

PubMed Central