Deep Learning for Rheumatoid Arthritis: Joint Detection and Damage Scoring in X-rays
Recent advances in computer vision promise to automate medical image analysis. Rheumatoid arthritis is an autoimmune disease that would benefit from computer-assisted diagnosis, as no direct markers are known and clinicians must rely on manual inspection of X-ray images. In this work, we present a multi-task deep learning model that simultaneously learns to localize joints on X-ray images and to diagnose two kinds of joint damage: narrowing and erosion. Additionally, we propose a modification of label smoothing that combines classification and regression cues into a single loss and achieves a 5% relative error reduction compared to standard loss functions. Our final model obtained 4th place in joint space narrowing and 5th place in joint erosion in the global RA2 DREAM challenge.
Comment: Presented at the Workshop on AI for Public Health at ICLR 2021
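The abstract does not spell out the modified label-smoothing loss, so the following is only a minimal PyTorch sketch of one way classification and regression cues can be combined over ordinal damage scores: Gaussian-smoothed soft labels plus an expected-score regression term. The function names, the smoothing kernel, and the weighting are illustrative assumptions, not the authors' formulation.

```python
import torch
import torch.nn.functional as F

def soft_ordinal_labels(scores, num_classes, sigma=1.0):
    """Spread each integer damage score over neighbouring classes with a
    Gaussian kernel (an ordinal-aware variant of label smoothing)."""
    classes = torch.arange(num_classes, dtype=torch.float32)           # (C,)
    dist = (classes.unsqueeze(0) - scores.unsqueeze(1).float()) ** 2   # (B, C)
    soft = torch.exp(-dist / (2 * sigma ** 2))
    return soft / soft.sum(dim=1, keepdim=True)

def combined_loss(logits, scores, num_classes, sigma=1.0, alpha=0.5):
    """Cross-entropy against the smoothed targets (classification cue)
    plus an L1 penalty on the expected score (regression cue)."""
    targets = soft_ordinal_labels(scores, num_classes, sigma)
    log_probs = F.log_softmax(logits, dim=1)
    ce = -(targets * log_probs).sum(dim=1).mean()
    expected = (log_probs.exp()
                * torch.arange(num_classes, dtype=torch.float32)).sum(dim=1)
    reg = F.l1_loss(expected, scores.float())
    return alpha * ce + (1 - alpha) * reg
```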
Mouth and facial informativeness norms for 2276 English words
Mouth and facial movements are part and parcel of face-to-face communication. The primary way of assessing their role in speech perception has been by manipulating their presence (e.g., by blurring the area of a speaker's lips) or by looking at how informative different mouth patterns are for the corresponding phonemes (or visemes; e.g., /b/ is visually more salient than /g/). However, moving beyond the informativeness of single phonemes is challenging due to coarticulation and language variations (to name just a few factors). Here, we present mouth and facial informativeness (MaFI) for words, i.e., how visually informative words are based on their corresponding mouth and facial movements. MaFI was quantified for 2276 English words, varying in length, frequency, and age of acquisition, using the phonological distance between a word and participants' speechreading guesses. The results showed that the MaFI norms capture well the dynamic nature of mouth and facial movements per word, with words containing phonemes with roundness and frontness features, as well as visemes characterized by lower lip tuck, lip rounding, and lip closure, being visually more informative. We also showed that the more of these features a word contains, the more informative it is based on mouth and facial movements. Finally, we demonstrated that the MaFI norms generalize across different varieties of English. The norms are freely accessible via the Open Science Framework (https://osf.io/mna8j/) and can benefit any language researcher using audiovisual stimuli (e.g., to control for the effect of speech-linked mouth and facial movements).
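To illustrate the distance-based quantification, here is a minimal Python sketch assuming phonemic transcriptions are already available: it averages a length-normalised Levenshtein distance between the target word's phonemes and each speechreading guess, and negates the mean so that higher values mean more visually informative words. The normalisation and sign convention are assumptions; the published norms may use a different scheme.

```python
def levenshtein(a, b):
    """Edit distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def mafi_score(target_phonemes, guesses_phonemes):
    """Mean length-normalised phonological distance between the target word
    and speechreading guesses, negated so higher = more informative."""
    distances = []
    for guess in guesses_phonemes:
        norm = max(len(target_phonemes), len(guess)) or 1
        distances.append(levenshtein(target_phonemes, guess) / norm)
    return -sum(distances) / len(distances)

# Example: target /b ɪ g/ guessed as /p ɪ g/ and /b ɪ n/
print(mafi_score(["b", "ɪ", "g"], [["p", "ɪ", "g"], ["b", "ɪ", "n"]]))
```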
Benefit of visual speech information for word comprehension in post-stroke aphasia
Aphasia is a language disorder that often involves speech comprehension impairments affecting communication. In face-to-face settings, speech is accompanied by mouth and facial movements, but little is known about the extent to which they benefit aphasic comprehension. This study investigated the benefit of visual information accompanying speech for word comprehension in people with aphasia (PWA) and the neuroanatomic substrates of any benefit. Thirty-six PWA and 13 matched neurotypical control participants performed a picture-word verification task in which they indicated whether a picture of an animate/inanimate object matched a subsequent word produced by an actress in a video. Stimuli were either audiovisual (with visible mouth and facial movements) or auditory-only (still picture of a silhouette), with the audio being clear (unedited) or degraded (6-band noise-vocoding). We found that visual speech information was more beneficial for neurotypical participants than for PWA, and more beneficial for both groups when speech was degraded. A multivariate lesion-symptom mapping analysis for the degraded speech condition showed that lesions to the superior temporal gyrus, underlying insula, primary and secondary somatosensory cortices, and inferior frontal gyrus were associated with a reduced benefit of audiovisual compared to auditory-only speech, suggesting that the integrity of these fronto-temporo-parietal regions may facilitate cross-modal mapping. These findings provide initial insights into the impact of audiovisual information on comprehension in aphasia and the brain regions mediating any benefit.
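The degraded condition uses 6-band noise-vocoding. The study's exact vocoder settings are not given in the abstract, but the general technique can be sketched as below; the band edges, filter order, and Hilbert-envelope extraction are assumptions, and a sampling rate of at least 16 kHz is assumed so the top band edge sits below Nyquist.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, n_bands=6, f_lo=80.0, f_hi=8000.0):
    """Degrade speech by noise vocoding: split into logarithmically spaced
    bands, extract each band's amplitude envelope, and use it to modulate
    band-limited noise. Spectral detail is lost; the temporal envelope remains."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    rng = np.random.default_rng(0)
    out = np.zeros(len(signal), dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        envelope = np.abs(hilbert(band))                      # amplitude envelope
        noise = sosfiltfilt(sos, rng.standard_normal(len(signal)))
        out += envelope * noise
    return out / (np.max(np.abs(out)) + 1e-12)                # avoid clipping
```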
Efficacy of spoken word comprehension therapy in patients with chronic aphasia: a cross-over randomised controlled trial with structural imaging
Objective: The efficacy of spoken language comprehension therapies for persons with aphasia remains equivocal. We investigated the efficacy of a self-led therapy app, ‘Listen-In’, and examined the relation between brain structure and therapy response. Methods: A cross-over randomised repeated-measures trial with five testing time points (12-week intervals), conducted at the university or participants' homes, captured baseline (T1), therapy (T2-T4) and maintenance (T5) effects. Participants with chronic poststroke aphasia and spoken language comprehension impairments completed consecutive Listen-In and standard care blocks (both 12 weeks, with order randomised). Repeated-measures analyses of variance compared change in spoken language comprehension on two co-primary outcomes over therapy versus standard care. Three structural MRI scans (T2-T4) for each participant (subgroup, n=25) were analysed using cross-sectional and longitudinal voxel-based morphometry. Results: Thirty-five participants completed, on average, 85 hours (IQR=70–100) of Listen-In (therapy first, n=18). The first co-primary outcome, the study-specific Auditory Comprehension Test (ACT), showed large and significant improvements for trained spoken words over therapy versus standard care (11%, Cohen’s d=1.12). Gains were largely maintained at 12 and 24 weeks. There were no therapy effects on the second, standardised co-primary outcome (Comprehensive Aphasia Test: Spoken Words and Sentences). Change on ACT trained words was associated with the volume of pre-therapy right-hemisphere white matter and with post-therapy grey matter tissue density changes in bilateral temporal lobes. Conclusions: Individuals with chronic aphasia can improve their spoken word comprehension many years after stroke. Results contribute to hemispheric debates implicating the right hemisphere in therapy-driven language recovery. Listen-In will soon be available on Google Play. Trial registration number: NCT02540889.
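For readers unfamiliar with the analysis, the following is a minimal sketch of the within-subject comparison of change over therapy versus standard care using statsmodels' AnovaRM. The data frame, column names, and values are hypothetical; the trial's actual analysis (two co-primary outcomes, five time points, randomised block order) is richer than this toy example.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one comprehension change score per
# participant per block type (illustrative values only).
df = pd.DataFrame({
    "participant": ["p01", "p01", "p02", "p02", "p03", "p03"],
    "block":       ["therapy", "standard_care"] * 3,
    "act_change":  [12.0, 1.5, 9.0, -0.5, 15.0, 2.0],
})

# Within-subject comparison of change over therapy vs standard care,
# mirroring the cross-over repeated-measures design described above.
res = AnovaRM(df, depvar="act_change", subject="participant",
              within=["block"]).fit()
print(res)
```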
The role of visual cues in speech comprehension: evidence from brain and behaviour
Face-to-face communication is multimodal. It comprises a plethora of linguistic and non-linguistic cues, such as gestures, face and body movements, eye gaze, and prosody, that modulate language processing. However, these cues have been primarily studied in isolation, and it remains unclear whether they interact and, if so, how they affect speech comprehension. This thesis aims to assess the role and the relationship of two multimodal cues: co-speech gestures and mouth movements. The claims presented in this thesis are supported by findings from studies on neurotypical adults and people with post-stroke aphasia. First, I developed a novel quantification of how informative mouth and facial movements are and provided informativeness norms for more than 2,000 English words. This quantification makes it possible to study the effects of speech and gestures while controlling for the ever-present mouth movements. Second, I investigated the impact of iconic gestures (i.e., gestures referring to features and properties of objects or actions in an imagistic way) and mouth informativeness on word comprehension under clear and challenging listening conditions. I found that both cues contribute to speech comprehension and that listeners dynamically process visual cues depending on their informativeness and the listening conditions. Next, I showed that gestures and mouth movements also benefit discourse comprehension, but they do so differently: listeners always take gestures into account, whereas mouth movements are only beneficial in challenging listening conditions. Finally, I investigated the impact of mouth movements on speech comprehension in aphasia. I showed that people with aphasia benefit less from mouth movements than neurotypical adults and that this effect is contingent on lesion location. This thesis brings together research on gestures and mouth movements and demonstrates that visual cues benefit speech comprehension interactively and dynamically, depending on cue informativeness, listening conditions, and the neuroanatomical profiles of people with aphasia.