
    Analysing Changes in the Acoustic Features of the Human Voice to Detect Depression amongst Biological Females in Higher Education

    Get PDF
    Depression significantly affects a large percentage of the population, with young adult females being one of the most at-risk demographics. Concurrently, demand on healthcare is growing, and with sufficient resources often unavailable to diagnose depression, new diagnostic methods are needed that are both cost-effective and accurate. Depression has been found to produce subtle changes in certain acoustic features of the human voice, changes that lie beyond the perception of the human auditory system. With advances in speech processing, these subtle changes can be observed by machines: by measuring them, the voice can be analysed to identify acoustic features that correlate with depression. Voice-based diagnosis would both reduce the burden on healthcare and ensure that those with depression are diagnosed in a timely fashion, allowing them quicker access to treatment. This research project presents an analysis of voice data from 17 biological females aged 20-26 in higher education as a means to detect depression. Eight participants were considered healthy with no history of depression, whilst the other nine currently had depression. Participants performed two vocal tasks: sustaining sounds for a period of time and reading back a passage of speech. Six acoustic features were then measured from the voice data to determine whether they can be used as diagnostic indicators of depression. The main finding of this study is that one of the measured acoustic features shows significant differences between depressed and healthy individuals.
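
    A minimal sketch of how such a group comparison might look in practice, assuming the recordings are WAV files and using librosa for fundamental-frequency extraction and a Mann-Whitney U test for the group difference (the file names and the choice of feature are hypothetical; the study's six features are not specified here):

```python
import librosa
import numpy as np
from scipy.stats import mannwhitneyu

def mean_f0(path, fmin=65.0, fmax=400.0):
    """Estimate the mean fundamental frequency (Hz) of a recording."""
    y, sr = librosa.load(path, sr=None)
    f0, _, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    return float(np.nanmean(f0))  # unvoiced frames are NaN and are ignored

# Hypothetical file lists for the two groups.
healthy_files = ["healthy_01.wav", "healthy_02.wav"]
depressed_files = ["depressed_01.wav", "depressed_02.wav"]

healthy = [mean_f0(p) for p in healthy_files]
depressed = [mean_f0(p) for p in depressed_files]

# Non-parametric test for a group difference in the feature.
stat, p = mannwhitneyu(healthy, depressed, alternative="two-sided")
print(f"U={stat:.1f}, p={p:.3f}")
```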

    Gait sonification for rehabilitation: adjusting gait patterns by acoustic transformation of kinematic data

    Get PDF
    To enhance motor learning in both sport and rehabilitation, auditory feedback has emerged as an effective tool. Since it requires less attention than visual feedback and hardly affects the visually dominated orientation in space, it can be used safely and effectively in natural locomotion such as walking. One method for generating acoustic movement feedback is the direct mapping of kinematic data to sound (movement sonification). Using this method in orthopedic gait rehabilitation could make an important contribution to the prevention of falls and secondary diseases. This would not only reduce the individual suffering of the patients, but also medical treatment costs. To determine the possible applications of movement sonification in gait rehabilitation, a new gait sonification method based on inertial sensor technology was developed in the context of this work. Against the background of current scientific findings on sensorimotor function, feedback methods, and gait analysis, three studies published in scientific journals are presented in this thesis. The first study shows the applicability and acceptance of the feedback method in patients undergoing inpatient rehabilitation after unilateral total hip arthroplasty; in addition, it reveals the direct effect of gait sonification on the patients' gait pattern during ten gait training sessions. The second study examines the immediate follow-up effect of gait sonification on the kinematics of the same patient group at four measurement points after gait training. In this context, a significant influence of sonification on the gait pattern of the patients was shown, which, however, did not match the expected effects. In view of this finding, the effect of a specific sound parameter of gait sonification, loudness, on the gait of healthy persons was analyzed in a third study, which detected an impact of asymmetric loudness of gait sonification on ground contact time. Taking this cause-effect relationship into account can be one component in improving gait sonification in rehabilitation. Overall, the feasibility and effectiveness of movement sonification in the gait rehabilitation of patients after unilateral hip arthroplasty become evident. The findings thus illustrate the potential of the method to efficiently support orthopedic gait rehabilitation in the future. On the basis of the results presented, this potential can be exploited in particular by an adequate mapping of movement to sound, a systematic modification of selected sound parameters, and a target-group-specific selection of the gait sonification mode. In addition to a detailed investigation of the three factors mentioned above, an optimization and refinement of gait analysis in patients after arthroplasty using inertial sensor technology will be beneficial in the future.
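
    A minimal sketch of the kinematics-to-sound mapping idea behind movement sonification, assuming a sampled gait signal (for example a joint-angle trace from an inertial sensor) modulates the loudness of a carrier tone; the signal, sample rates, and mapping are illustrative and not the method used in the studies above:

```python
import numpy as np
from scipy.io import wavfile

audio_sr = 44100          # audio sample rate (Hz)
sensor_sr = 100           # assumed inertial-sensor rate (Hz)
duration = 4.0            # seconds

# Hypothetical kinematic signal: a rectified joint-angle trace at ~1 Hz gait cadence.
t_sensor = np.arange(0, duration, 1 / sensor_sr)
gait_signal = np.abs(np.sin(2 * np.pi * 1.0 * t_sensor))

# Upsample the kinematic envelope to audio rate by linear interpolation.
t_audio = np.arange(0, duration, 1 / audio_sr)
envelope = np.interp(t_audio, t_sensor, gait_signal)

# Map movement amplitude to the loudness of a 440 Hz carrier (the sonification step).
carrier = np.sin(2 * np.pi * 440.0 * t_audio)
audio = (0.5 * envelope * carrier).astype(np.float32)

wavfile.write("gait_sonification.wav", audio_sr, audio)
```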

    Speech-based automatic depression detection via biomarkers identification and artificial intelligence approaches

    Get PDF
    Depression has become one of the most prevalent mental health issues, affecting more than 300 million people all over the world. However, due to factors such as limited medical resources and access to healthcare, a large number of patients remain undiagnosed. In addition, traditional approaches to depression diagnosis have limitations: they are usually time-consuming and depend on clinical experience that varies across clinicians. From this perspective, automatic depression detection can make the diagnostic process much faster and more accessible. In this thesis, we present the possibility of using speech for automatic depression detection. This is based on findings in neuroscience that depressed patients have abnormal cognitive mechanisms, which cause their speech to differ from that of healthy people. We therefore pursue two complementary routes: identifying speech markers of depression and constructing novel deep learning models to improve detection accuracy. The identification of speech markers aims to capture measurable traces that depression leaves in speech. To this end, markers such as speech duration, pauses, and correlation matrices are proposed. Speech duration and pauses account for speech fluency, while correlation matrices represent the relationships between acoustic features and aim to capture psychomotor retardation in depressed patients. Experimental results demonstrate that the proposed markers are effective at improving performance in recognizing depressed speakers. In addition, these markers show statistically significant differences between depressed patients and non-depressed individuals, which supports their use for depression detection and further confirms that depression leaves detectable traces in speech. Beyond the markers, we propose an attention mechanism, Multi-local Attention (MLA), to emphasize depression-relevant information locally, and analyse its effect on performance and efficiency. According to the experimental results, such a model can significantly improve performance and confidence in detection while reducing the time required for recognition. Furthermore, we propose Cross-Data Multilevel Attention (CDMA) to emphasize different types of depression-relevant information, i.e., information specific to each type of speech and information common to both, by using multiple attention mechanisms. Experimental results demonstrate that the proposed model effectively integrates the different types of depression-relevant information in speech, significantly improving depression detection performance.
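
    As an illustration of the correlation-matrix idea, a sketch that computes MFCCs for one utterance and uses the correlations between coefficient trajectories as a fixed-length feature vector (librosa-based; the file name and coefficient count are assumptions, not the thesis configuration):

```python
import librosa
import numpy as np

def correlation_marker(path, n_mfcc=13):
    """Correlation matrix between MFCC trajectories, flattened to a vector."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, frames)
    corr = np.corrcoef(mfcc)                                 # shape (n_mfcc, n_mfcc)
    iu = np.triu_indices(n_mfcc, k=1)                        # keep the upper triangle
    return corr[iu]                                          # fixed-length vector

features = correlation_marker("speech_sample.wav")
print(features.shape)  # (78,) for 13 MFCCs
```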

    Multidisciplinary perspectives on Artificial Intelligence and the law

    Get PDF
    This open access book presents an interdisciplinary, multi-authored, edited collection of chapters on Artificial Intelligence (‘AI’) and the Law. AI technology has come to play a central role in the modern data economy. Through a combination of increased computing power, the growing availability of data and the advancement of algorithms, AI has now become an umbrella term for some of the most transformational technological breakthroughs of this age. The importance of AI stems from both the opportunities that it offers and the challenges that it entails. While AI applications hold the promise of economic growth and efficiency gains, they also create significant risks and uncertainty. The potential and perils of AI have thus come to dominate modern discussions of technology and ethics – and although AI was initially allowed to largely develop without guidelines or rules, few would deny that the law is set to play a fundamental role in shaping the future of AI. As the debate over AI is far from over, the need for rigorous analysis has never been greater. This book thus brings together contributors from different fields and backgrounds to explore how the law might provide answers to some of the most pressing questions raised by AI. An outcome of the Católica Research Centre for the Future of Law and its interdisciplinary working group on Law and Artificial Intelligence, it includes contributions by leading scholars in the fields of technology, ethics and the law.

    Deep Learning Based Multimodal with Two-phase Training Strategy for Daily Life Video Classification

    Full text link
    In this paper, we present a deep learning based multimodal system for classifying daily life videos. To train the system, we propose a two-phase training strategy. In the first training phase (Phase I), we extract the audio and visual (image) data from the original video and train independent deep learning models on each modality. After training, we obtain audio embeddings and visual embeddings by extracting feature maps from the pre-trained models. In the second training phase (Phase II), we train a fusion layer to combine the audio and visual embeddings and a dense layer to classify the combined embedding into the target daily scenes. Our extensive experiments, conducted on the benchmark dataset of DCASE (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) 2021 Task 1B Development, achieved best classification accuracies of 80.5%, 91.8%, and 95.3% with only audio data, only visual data, and both audio and visual data, respectively. The highest classification accuracy of 95.3% represents an improvement of 17.9% over the DCASE baseline and is very competitive with state-of-the-art systems.
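
    A minimal PyTorch sketch of the Phase II idea, assuming pre-computed audio and visual embeddings from Phase I; the fusion here is simple concatenation followed by a dense classifier, and the dimensions and class count are chosen for illustration only:

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Fuse frozen audio/visual embeddings and classify daily-life scenes."""
    def __init__(self, audio_dim=128, visual_dim=512, n_classes=10):
        super().__init__()
        self.fusion = nn.Linear(audio_dim + visual_dim, 256)  # fusion layer
        self.classifier = nn.Linear(256, n_classes)           # dense classification layer

    def forward(self, audio_emb, visual_emb):
        x = torch.cat([audio_emb, visual_emb], dim=-1)
        x = torch.relu(self.fusion(x))
        return self.classifier(x)

# Hypothetical batch of Phase I embeddings (8 clips).
audio_emb = torch.randn(8, 128)
visual_emb = torch.randn(8, 512)
logits = LateFusionClassifier()(audio_emb, visual_emb)
print(logits.shape)  # torch.Size([8, 10])
```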

    A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks

    Full text link
    The transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data. Unlike conventional neural networks or improved variants of Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM) networks, transformer models excel at handling long-range dependencies between input sequence elements and enable parallel processing. As a result, transformer-based models have attracted substantial interest among researchers in the field of artificial intelligence. This can be attributed to their immense potential and remarkable achievements, not only in Natural Language Processing (NLP) tasks but also in a wide range of domains, including computer vision, audio and speech processing, healthcare, and the Internet of Things (IoT). Although several survey papers have been published highlighting the transformer's contributions in specific fields, architectural differences, or performance evaluations, there is still a significant absence of a comprehensive survey covering its major applications across domains. We therefore undertook the task of filling this gap by conducting an extensive survey of proposed transformer models from 2017 to 2022. Our survey identifies the top five application domains for transformer-based models, namely NLP, Computer Vision, Multi-Modality, Audio and Speech Processing, and Signal Processing. We analyze the impact of highly influential transformer-based models in these domains and classify them by task using a proposed taxonomy. Our aim is to shed light on the existing potential and future possibilities of transformers for enthusiastic researchers, thus contributing to the broader understanding of this groundbreaking technology.
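
    For reference, a compact sketch of the scaled dot-product self-attention at the core of the transformer (single head, NumPy; the shapes and random projections are illustrative):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v               # queries, keys, values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                   # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ v                                # attention-weighted sum of values

# Toy sequence of 5 tokens with model dimension 16.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
w_q, w_k, w_v = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (5, 16)
```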

    Evaluation of Pre-Trained CNN Models for Cardiovascular Disease Classification: A Benchmark Study

    Get PDF
    In this paper, we present an up-to-date benchmark of the most commonly used pre-trained CNN models, using a merged set of three publicly available datasets to obtain a sufficiently large sample range. From the 18th century up to the present day, cardiovascular diseases, which are considered among the most significant health risks globally, have been diagnosed by auscultation of heart sounds using a stethoscope. This method is difficult to master and requires a highly experienced physician. Artificial intelligence, and machine learning in particular, is being applied to equip modern medicine with powerful tools to improve medical diagnosis. Image- and audio-pre-trained convolutional neural network (CNN) models have been used for classifying normal and abnormal heartbeats from phonocardiogram signals. We objectively benchmark more than two dozen image-pre-trained CNN models in addition to two of the most popular audio-based pre-trained CNN models, VGGish and YAMNet, which were developed specifically for audio classification. The experimental results show that the audio-based models are among the best-performing models. In particular, the VGGish model achieved the highest average validation accuracy and average true positive rate, at 87% and 85%, respectively.
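
    A sketch of the adaptation step this kind of benchmark relies on: turning a phonocardiogram recording into a log-mel spectrogram "image" that an image-pre-trained CNN can consume (librosa/torchvision; the file name, the use of ResNet-18, and the two-class head are assumptions for illustration, not the benchmarked model set):

```python
import librosa
import torch
from torchvision import models

# Load a heart-sound recording and compute a log-mel spectrogram.
y, sr = librosa.load("heartbeat.wav", sr=None)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel)

# Replicate the single channel to three so an ImageNet-pre-trained CNN accepts it.
x = torch.tensor(log_mel, dtype=torch.float32)
x = x.unsqueeze(0).repeat(3, 1, 1).unsqueeze(0)  # shape (1, 3, n_mels, frames)

# Image-pre-trained backbone with a new 2-class head (normal vs. abnormal heartbeat).
cnn = models.resnet18(weights="IMAGENET1K_V1")
cnn.fc = torch.nn.Linear(cnn.fc.in_features, 2)
print(cnn(x).shape)  # torch.Size([1, 2])
```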

    Detecting somatisation disorder via speech: introducing the Shenzhen Somatisation Speech Corpus

    Get PDF
    Objective: Speech recognition technology is widely used as a mature technical approach in many fields. In the study of depression recognition, speech signals are commonly used because of their convenience and ease of acquisition. However, although speech-based recognition is popular in depression research, it has been little studied for somatisation disorder, mainly owing to the lack of a publicly accessible database of relevant speech and of benchmark studies. To this end, we introduce our somatisation disorder speech database and give benchmark results. Methods: By collecting speech samples of somatisation disorder patients in cooperation with the Shenzhen University General Hospital, we introduce our somatisation disorder speech database, the Shenzhen Somatisation Speech Corpus (SSSC). Moreover, we propose a benchmark for SSSC using classic acoustic features and a machine learning model. Results: To obtain a more scientific benchmark, we compared and analysed the performance of different acoustic features, i.e., the full ComParE feature set, or only MFCCs, fundamental frequency (F0), and the frequency and bandwidth of the formants (F1-F3). The best result of our benchmark is a 76.0% unweighted average recall, achieved by a support vector machine with formants F1-F3. Conclusion: The proposal of SSSC bridges a research gap in somatisation disorder, providing researchers with a publicly accessible speech database. In addition, the benchmark results show the scientific validity and feasibility of computer audition for recognising somatisation disorder from speech.
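
    A sketch of the kind of benchmark pipeline described, assuming per-utterance MFCC statistics as features and a linear support vector machine evaluated by unweighted average recall (librosa/scikit-learn; the file lists and labels are placeholders, not the SSSC protocol or its feature sets):

```python
import librosa
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import recall_score

def utterance_features(path, n_mfcc=13):
    """Mean and standard deviation of MFCCs as a fixed-length utterance descriptor."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Placeholder file lists and labels (0 = control, 1 = somatisation disorder).
train_paths, y_train = ["control_01.wav", "patient_01.wav"], np.array([0, 1])
test_paths, y_test = ["control_02.wav", "patient_02.wav"], np.array([0, 1])

X_train = np.array([utterance_features(p) for p in train_paths])
X_test = np.array([utterance_features(p) for p in test_paths])

clf = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
pred = clf.predict(X_test)

# Unweighted average recall = macro-averaged recall over the classes.
uar = recall_score(y_test, pred, average="macro")
print(f"UAR: {uar:.3f}")
```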

    Audio-Visual Deception Detection: DOLOS Dataset and Parameter-Efficient Crossmodal Learning

    Full text link
    Deception detection in conversations is a challenging yet important task, with pivotal applications in many fields such as credibility assessment in business, multimedia anti-fraud, and customs security. Despite this, deception detection research is hindered by the lack of high-quality deception datasets, as well as by the difficulty of learning multimodal features effectively. To address this, we introduce DOLOS (the name comes from Greek mythology), the largest gameshow deception detection dataset with rich deceptive conversations. DOLOS includes 1,675 video clips featuring 213 subjects, and it has been labeled with audio-visual feature annotations. We provide train-test, duration, and gender protocols to investigate the impact of different factors, and we benchmark the dataset on previously proposed deception detection approaches. To further improve performance while fine-tuning fewer parameters, we propose Parameter-Efficient Crossmodal Learning (PECL), in which a Uniform Temporal Adapter (UT-Adapter) explores temporal attention in transformer-based architectures and a crossmodal fusion module, Plug-in Audio-Visual Fusion (PAVF), combines crossmodal information from audio-visual features. Based on the rich fine-grained audio-visual annotations in DOLOS, we also exploit multi-task learning to enhance performance by concurrently predicting deception and audio-visual features. Experimental results demonstrate the quality of the DOLOS dataset and the effectiveness of PECL. The DOLOS dataset and the source code are available at https://github.com/NMS05/Audio-Visual-Deception-Detection-DOLOS-Dataset-and-Parameter-Efficient-Crossmodal-Learning/tree/main
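
    As a rough illustration of crossmodal fusion between audio and visual token sequences (not the paper's PAVF module), a PyTorch sketch in which visual tokens attend to audio tokens through multi-head cross-attention:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Visual tokens query audio tokens; the fused output keeps the visual sequence length."""
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_tokens, audio_tokens):
        fused, _ = self.attn(query=visual_tokens, key=audio_tokens, value=audio_tokens)
        return self.norm(visual_tokens + fused)  # residual connection then layer norm

# Toy batch: 8 clips, 16 visual tokens and 32 audio tokens of width 256.
visual = torch.randn(8, 16, 256)
audio = torch.randn(8, 32, 256)
print(CrossModalFusion()(visual, audio).shape)  # torch.Size([8, 16, 256])
```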