217 research outputs found

    Utilising Hidden Markov Modelling for the Assessment of Accommodation in Conversational Speech

    The work presented here suggests a method for assessing speech accommodation in a holistic acoustic manner by utilising Hidden Markov Models (HMMs). The rationale for implementation of this method is presented along with an explanation of how HMMs work. Here, a heavily simplified HMM is used (single state; mixture of Gaussians) in order to assess the applicability of more sophisticated HMMs. Results are presented from a small-scale study of six pairs of female Scottish-English speakers, showing measurement of significant trends and changes in holistic acoustic features of speakers during conversational interaction. Our findings suggest that methods integrating HMMs with current holistic acoustic measures of speech may be a useful tool in accounting for acoustic change due to speaker interaction

    Comparing the Effectiveness of Support Vector Machines and Convolutional Neural Networks for Determining User Intent in Conversational Agents

    Over the last fifty years, conversational agent systems have evolved in their ability to understand natural language input. In recent years Natural Language Processing (NLP) and Machine Learning (ML) have allowed computer systems to make great strides in the area of natural language understanding. However, little research has been carried out in these areas within the context of conversational systems. This paper identifies Convolutional Neural Network (CNN) and Support Vector Machine (SVM) as the two ML algorithms with the best record of performance in existing NLP literature, with CNN indicated as generating the better results of the two. A comprehensive experiment is defined where the results of SVM models utilising several kernels are compared to the results of a selection of CNN models. To contextualise the experiment to conversational agents a dataset based on conversational interactions is used. A state of the art NLP pipeline is also created to work with both algorithms in the context of the agent dataset. By conducting a detailed statistical analysis of the results, this paper proposes to provide an extensive indicator as to which algorithm offers better performance for agent-based systems. Ultimately the experimental results indicate that CNN models do not necessarily generate better results than SVM models. In fact, the SVM model utilising a Radial Basis Function kernel generates statistically better results than all other models considered under these experimental conditions

    Brains in dialogue: investigating accommodation in live conversational speech for both speech and EEG data.

    One of the phenomena to emerge from the study of human spoken interaction is accommodation or the tendency of an individual’s speech patterning to shift relative to their interlocutor. Whilst the experimental approach to the detection of accommodation has a solid background in the literature, it tends to treat the process of accommodation as a black box. The general approach for the detection of accommodation in speech has been to record the speech of a given speaker prior to interaction and then again after an interaction. These two measures are then compared to the speech of the interlocutor to test for similarity. If the speech sample following interaction is more similar then we can say that accommodation has taken place. Part of the goal of this thesis is to evaluate whether it is possible to look into the black box of speech accommodation and measure it ‘in situ’. Given that speech accommodation appears to take place as a result of interaction, it would be reasonable to assume that a similar effect might be observable in other areas contributing to a communicative interaction. The notion of an interacting dyad developing an increased degree of alignment over the course of an interaction has been proposed by psychologists. Theories have posited that alignment occurs at multiple levels of engagement, from broad levels of syntactic alignment down to phonetic levels of alignment. The use of speech accommodation as an anchor with which to track the evolution of change in the brain signal may prove to be one approach to investigating the claims made by these theories. The second part of this thesis aims to evaluate whether the phenomenon of accommodation is also observable in the form of electrical signals generated by the brain, measured using Electroencephalography (EEG). However, evaluating the change in the EEG signal over a continuous stretch of time is a hurdle that will need to be tackled.
Traditionally, EEG methodologies involve averaging the signal over many repetitions of the same task. This is not a viable option when investigating communicative interaction. Clearly the evaluation of accommodation in both speech and brain activity, especially for continuously unfolding phenomena such as accommodation, is a non-trivial task. In order to tackle this, an approach from speech recognition and computer science has been employed. The implementation of Hidden Markov Models (HMM) has been used to develop speech recognition systems and has also been used to detect fraudulent attempts to imitate the voice of others. Given that HMMs have successfully been employed to detect the imitation of another person’s speech they are a good candidate for being able to detect the movement towards or away from an interlocutor during the course of an interaction. In addition, the use of HMMs is non-domain specific: they can be used to evaluate any time-variant signal. This adaptability of the approach allows for it to also be applied to EEG signals in conjunction with the speech signal. Two experiments are presented here. The behavioural experiment aims to evaluate the ability of an HMM based approach to detect accommodation by engaging pairs of female, Glaswegian speakers in the collaborative DiapixUK task. The results of their interactions are then evaluated from both a traditional phonetic standpoint, by assessing changes in Voice Onset Time (VOT) of stop consonants, formant values of vowels and speech rate over the course of an interaction, and using the HMM based approach. The neural experiment looks to evaluate the ability of an HMM based approach to detect accommodation in both the speech signal and in brain activity. The same experiment that was performed in Experiment 1 was repeated, with the addition of EEG caps to both participants. The data was then evaluated using the HMM based approach.
This thesis presents findings that suggest a function for speech accommodation that has not been explored in the past. This is done through the use of a novel, HMM based, holistic acoustic-phonetic measurement tool which produced consistent measures across both experiments. Further to this, the measurement tool is shown to have possible extended uses for EEG data. The use of the presented HMM based, holistic-acoustic measurement tool presents a novel contribution to the field for the measurement and evaluation of accommodation

    Impacts of new and emerging assistive technologies for ageing and disabled housing

    This research looks at how smart home assistive technologies (AT) may be best used in both the aged care and disability sectors to reduce the need for support services. It includes an assessment of ease of use, quality-of-life and cost benefit analysis, and contributes to the development of policy options that could facilitate effective adoption of smart home AT in Australia

    The Mechanical Psychologist: How Computational Techniques Can Aid Social Researchers in the Analysis of High-Stakes Conversation

    Qualitative coding is an essential observational tool for describing behaviour in the social sciences. However, it traditionally relies on manual, time-consuming, and error-prone methods performed by humans. To overcome these issues, cross-disciplinary researchers are increasingly exploring computational methods such as Natural Language Processing (NLP) and Machine Learning (ML) to annotate behaviour automatically. Automated methods offer scalability, error reduction, and the discovery of increasingly subtle patterns in data compared to human effort alone (N. C. Chen et al., 2018). Despite promising advancements, concerns regarding generalisability, mistrust of automation, and value alignment between humans and machines persist (Friedberg et al., 2012; Grimmer et al., 2021; Jiang et al., 2021; R. Levitan & Hirschberg, 2011; Mills, 2019; Nenkova et al., 2008; Rahimi et al., 2017; Yarkoni et al., 2021). This thesis investigates the potential of computational techniques, such as social signal processing, text mining, and machine learning, to streamline qualitative coding in the social sciences, focusing on two high-stakes conversational case studies. The first case study analyses political interviewing using a corpus of 691 interview transcripts from US news networks. Psychological behaviours associated with effective interviewing are measured and used to predict conversational quality through supervised machine learning. Feature engineering employs a Social Signal Processing (SSP) approach to extract latent behaviours from low-level social signals (Vinciarelli, Salamin, et al., 2009). Conversational quality, calculated from desired characteristics of interviewee speech, is validated by a human-rater study. The findings support the potential of computational approaches in qualitative coding while acknowledging challenges in interpreting low-level social signals. 
The second case study investigates the ability of machines to learn expert-defined behaviours from human annotation, specifically in detecting predatory behaviour in known cases of online child grooming. In this section, the author utilises 623 chat logs obtained from a US-based online watchdog, with expert annotators labelling a subset of these chat logs to train a large language model. The goal was to investigate the machine’s ability to detect eleven predatory behaviours based on expert annotations. The results show that the machine could detect several behaviours with as few as fifty labelled instances, but rare behaviours were frequently over-predicted. The author next implemented a collaborative human-AI approach to investigate the trade-off between human accuracy and machine efficiency. The results suggested that a human-in-the-loop approach could improve human efficiency and machine accuracy, achieving near-human performance on several behaviours approximately fifteen times faster than human effort alone. The conclusion emphasises the value of increased automation in social sciences while recognising the importance of social scientific expertise in cross-disciplinary research, especially when addressing real-world problems. It advocates for technology that augments and enhances human effort and expertise without replacing it entirely. This thesis acknowledges the challenges in interpreting computational signals and the importance of preserving human insight in qualitative coding. The thesis also highlights potential avenues for future research, such as refining computational methods for qualitative coding and exploring collaborative human-AI approaches to address the limitations of automated methods

    Within-formant spectral feature analysis for forensic speaker discrimination casework: A study of 45 Marwari monolinguals from Bikaner, India

    This PhD project investigates the significance of within-formant measurements for the vowels [i:], [ɪ], [e], [ə], [a:], [o], [u:], and [ʊ], for forensic speaker comparison. It contains six traditional PhD thesis chapters providing background information, as well as three research articles presenting analyses. Data was sourced from the Marwari language, spoken in Rajasthan, India, as a testbed, but its applicability may extend to other languages. Speech was recorded from forty-five female Marwari monolingual speakers representing three caste dialects (fifteen per variety). Three speech elicitation techniques were used: reading from a wordlist, telling stories around picture stimuli, and engaging in conversation. Articles 1–3 investigate the impact of including within-formant spectral moments (i.e., centre of gravity, standard deviation, kurtosis, skewness) and spectral measures (i.e., formant amplitude, relative amplitude, spectral bandwidth, LPC bandwidth, and spectral peaks), with and without centre formant frequencies, on speaker discrimination models. The investigations encompass various combination-based systems tested against three separate variables - vowels, variety, and speech style - using linear mixed model ANOVA and linear discriminant analysis. The research contributes to existing manual systems by providing a semi-supervised feature-based system that may supplement existing ‘manual’ and semi-supervised tools. For legal systems that currently do not accept ASR analysis, it provides a more interpretable and reproducible approach

    Recent Advances in Social Data and Artificial Intelligence 2019

    The importance and usefulness of subjects and topics involving social data and artificial intelligence are becoming widely recognized. This book contains invited review, expository, and original research articles dealing with, and presenting state-of-the-art accounts of, the recent advances in the subjects of social data and artificial intelligence, and potentially their links to Cyberspace

    Automatic Person Verification Using Speech and Face Information

    Interest in biometric based identification and verification systems has increased considerably over the last decade. As an example, the shortcomings of security systems based on passwords can be addressed through the supplemental use of biometric systems based on speech signals, face images or fingerprints. Biometric recognition can also be applied to other areas, such as passport control (immigration checkpoints), forensic work (to determine whether a biometric sample belongs to a suspect) and law enforcement applications (e.g. surveillance). While biometric systems based on face images and/or speech signals can be useful, their performance can degrade in the presence of challenging conditions. In face based systems this can be in the form of a change in the illumination direction and/or face pose variations. Multi-modal systems use more than one biometric at the same time. This is done for two main reasons -- to achieve better robustness and to increase discrimination power. This thesis reviews relevant backgrounds in speech and face processing, as well as information fusion. It reports research aimed at increasing the robustness of single- and multi-modal biometric identity verification systems. In particular, it addresses the illumination and pose variation problems in face recognition, as well as the challenge of effectively fusing information from multiple modalities under non-ideal conditions