Increase Apparent Public Speaking Fluency By Speech Augmentation
Fluent and confident speech is desirable to every speaker. But professional
speech delivery requires a great deal of experience and practice. In this
paper, we propose a speech stream manipulation system which can help
non-professional speakers to produce fluent, professional-like speech content,
in turn contributing towards better listener engagement and comprehension. We
propose to achieve this task by manipulating the disfluencies in human speech,
like the sounds 'uh' and 'um', the filler words and awkward long silences.
Given any unrehearsed speech, we segment and silence the filled pauses and
adjust the duration of the imposed silence, as well as of other long
('disfluent') pauses, using a predictive model learned from a professional
speech dataset. Finally, we output an audio stream in which the speaker sounds
more fluent, confident and practiced compared to the original speech he/she
recorded. According to our quantitative evaluation, we significantly increase
the fluency of speech by reducing the rate of pauses and fillers.
Improving the Applicability of AI for Psychiatric Applications through Human-in-the-loop Methodologies
Objectives: Machine learning (ML) and natural language
processing have great potential to improve efficiency and
accuracy in diagnosis, treatment recommendations, predictive interventions, and scarce resource allocation within psychiatry. Researchers often conceptualize such an approach
as operating in isolation without much need for human
involvement, yet it remains crucial to harness human-in-the-loop practices when developing and implementing such
techniques as their absence may be catastrophic. We advocate for building ML-based technologies that collaborate
with experts within psychiatry in all stages of implementation and use to increase model performance while simultaneously increasing the practicality, robustness, and
reliability of the process.
Methods: We showcase pitfalls of the traditional ML framework and explain how it can be improved with human-in-the-loop techniques. Specifically, we applied active learning
strategies to the automatic scoring of a story recall task
and compared the results to a traditional approach.
Results: Human-in-the-loop methodologies supplied a
greater understanding of where the model was least confident or had knowledge gaps during training. As compared
to the traditional framework, less than half of the training
data were needed to reach a given accuracy.
Conclusions: Human-in-the-loop ML is an approach to
data collection and model creation that harnesses active learning to select the most critical data needed to
increase a model’s accuracy and generalizability more
efficiently than classic random sampling would otherwise allow. Such techniques may additionally operate
as safeguards from spurious predictions and can aid in
decreasing disparities that artificial intelligence systems
otherwise propagate.
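The abstract does not say which active learning strategy was used. As a rough illustration only, the sketch below shows least-confidence sampling, one common strategy that matches the idea of asking a human to label the items where the model is least confident. The function name, the toy probabilities, and the batch size are all hypothetical.

```python
def least_confident(probabilities, k):
    """Return the indices of the k items whose top class probability is lowest.

    `probabilities` is a list of per-item class probability lists, as a
    classifier might produce for a pool of unlabeled items (assumed input).
    """
    # Confidence of each item = probability of its most likely class.
    confidence = [max(p) for p in probabilities]
    # Rank item indices from least to most confident.
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:k]


# Toy pool of four unlabeled items with two-class predictions (made-up values).
pool = [[0.95, 0.05], [0.55, 0.45], [0.70, 0.30], [0.51, 0.49]]

# Items to send to a human expert for labeling in the next round.
queries = least_confident(pool, 2)
print(queries)
```

In a full loop, the newly labeled items would be added to the training set and the model retrained before the next selection round; random sampling would instead pick items without looking at the model's confidence.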
Distinguishing between True and False Stories using various Linguistic Features
This paper analyzes which linguistic features differentiate true and false stories written in Hebrew. To do so, we defined four feature sets containing 145 features: POS-tags, quantitative, repetition, and special expressions. The examined corpus contains stories composed by 48 native Hebrew speakers who were asked to tell both false and true stories. Classification experiments were run on all possible combinations of these four feature sets using five supervised machine learning methods. The Part of Speech (POS) set was superior to all others and was found to be a key component. The best accuracy (89.6%) was achieved by a combination of sixteen POS-tags and one quantitative feature.
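To make "all possible combinations of these four feature sets" concrete, the sketch below enumerates the non-empty combinations of the four set names given in the abstract. It only counts the combinations; the classifiers and the features themselves are not reproduced, and the function name is hypothetical.

```python
from itertools import combinations

# Feature-set names as listed in the abstract.
FEATURE_SETS = ["POS-tags", "quantitative", "repetition", "special expressions"]


def all_feature_set_combinations(names):
    """Return every non-empty combination of the given feature-set names."""
    combos = []
    for size in range(1, len(names) + 1):
        combos.extend(combinations(names, size))
    return combos


combos = all_feature_set_combinations(FEATURE_SETS)
print(len(combos))  # 2**4 - 1 = 15 non-empty combinations
```

Assuming each combination is paired with each of the five learning methods, this would amount to 15 x 5 = 75 classification runs; the abstract does not state the count explicitly.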
Modeling Incoherent Discourse in Non-Affective Psychosis
Background: Computational linguistic methodology allows quantification of speech abnormalities in non-affective psychosis. For this patient group, incoherent speech has long been described as a symptom of formal thought disorder. Our study is an interdisciplinary attempt at developing a model of incoherence in non-affective psychosis, informed by computational linguistic methodology as well as psychiatric research, which both conceptualize incoherence as associative loosening. The primary aim of this pilot study was methodological: to validate the model against clinical data and reduce bias in automated coherence analysis.
Methods: Speech samples were obtained from patients with a diagnosis of schizophrenia or schizoaffective disorder, who were divided into two groups of n = 20 subjects each, based on different clinical ratings of positive formal thought disorder, and n = 20 healthy control subjects.
Results: Coherence metrics that were automatically derived from interview transcripts significantly predicted clinical ratings of thought disorder. Significant results from multinomial regression analysis revealed that group membership (controls vs. patients with vs. without formal thought disorder) could be predicted based on automated coherence analysis when bias was considered. Further improvement of the regression model was reached by including variables that psychiatric research has shown to inform clinical diagnostics of positive formal thought disorder.
Conclusions: Automated coherence analysis may capture different features of incoherent speech than clinical ratings of formal thought disorder. Models of incoherence in non-affective psychosis should include automatically derived coherence metrics as well as lexical and syntactic features that influence the comprehensibility of speech.
Computational Approaches to Modeling Speaker State in the Medical Domain
Recently, researchers in computer science and engineering have begun to explore the possibility of finding speech-based correlates of various medical conditions using automatic, computational methods. If such language cues can be identified and quantified automatically, this information can be used to support diagnosis and treatment of medical conditions in clinical settings and to further fundamental research in understanding cognition. This chapter reviews computational approaches that explore communicative patterns of patients who suffer from medical conditions such as depression, autism spectrum disorders, schizophrenia, and cancer. Two main approaches are discussed: research that explores features extracted from the acoustic signal and research that focuses on lexical and semantic features. We also present some applied research that uses computational methods to develop assistive technologies. In the final sections we discuss open issues in, and the future of, this emerging field of research.
Language production impairments in patients with a first episode of psychosis
Language production has often been described as impaired in psychiatric diseases such as psychosis. Nevertheless, little is known about the characteristics of linguistic difficulties and their relation with other cognitive domains in patients with a first episode of psychosis (FEP), either affective or non-affective. To deepen our understanding of the linguistic profile in FEP, 133 patients with FEP (95 non-affective, FEP-NA; 38 affective, FEP-A) and 133 healthy controls (HC) were assessed with a narrative discourse task. Speech samples were systematically analyzed with a well-established multilevel procedure investigating both micro- (lexicon, morphology, syntax) and macro-linguistic (discourse coherence, pragmatics) levels of linguistic processing. Executive functioning and IQ were also evaluated. Both linguistic and neuropsychological measures were secondarily fed into a machine learning approach in order to explore their predictive accuracy in classifying participants as FEP or HC. Compared to HC, FEP patients showed language production difficulty at both micro- and macro-linguistic levels. At the micro-linguistic level, FEP patients produced shorter and simpler sentences and fewer words per minute, along with a reduced number of lexical fillers, compared to HC. At the macro-linguistic level, FEP performance was impaired in local coherence, which was paired with a higher percentage of utterances with semantic errors. Linguistic measures were not correlated with any neuropsychological variables. No significant differences emerged between FEP-NA and FEP-A (p≥0.02, after Bonferroni correction). Machine learning analysis showed an accuracy of group prediction of 76.36% using language features only, with semantic variables being the most impactful. This percentage increased when language features were paired with clinical and neuropsychological variables.
Results confirm the presence of language production deficits already at the first episode of the illness, and indicate that this impairment is not related to other cognitive domains. The high accuracy obtained by the linguistic set of features in classifying groups supports the use of machine learning methods in neuroscience investigations.