492 research outputs found

    Automatic Recognition of Non-Verbal Acoustic Communication Events With Neural Networks

    Get PDF
    Non-verbal acoustic communication is of high importance to humans and animals: Infants use the voice as a primary communication tool. Animals of all kinds employ acoustic communication, such as chimpanzees, which use pant-hoot vocalizations for long-distance communication. Many applications require the assessment of such communication for a variety of analysis goals. Computational systems can support these areas through automatization of the assessment process. This is of particular importance in monitoring scenarios over large spatial and time scales, which are infeasible to perform manually. Algorithms for sound recognition have traditionally been based on conventional machine learning approaches. In recent years, so-called representation learning approaches have gained increasing popularity. This particularly includes deep learning approaches that feed raw data to deep neural networks. However, there remain open challenges in applying these approaches to automatic recognition of non-verbal acoustic communication events, such as compensating for small data set sizes. The leading question of this thesis is: How can we apply deep learning more effectively to automatic recognition of non-verbal acoustic communication events? The target communication types were specifically (1) infant vocalizations and (2) chimpanzee long-distance calls. This thesis comprises four studies that investigated aspects of this question: Study (A) investigated the assessment of infant vocalizations by laypersons. The central goal was to derive an infant vocalization classification scheme based on the laypersons' perception. The study method was based on the Nijmegen Protocol, where participants rated vocalization recordings through various items, such as affective ratings and class labels. Results showed a strong association between valence ratings and class labels, which was used to derive a classification scheme. Study (B) was a comparative study on various neural network types for the automatic classification of infant vocalizations. The goal was to determine the best performing network type among the currently most prevailing ones, while considering the influence of their architectural configuration. Results showed that convolutional neural networks outperformed recurrent neural networks and that the choice of the frequency and time aggregation layer inside the network is the most important architectural choice. Study (C) was a detailed investigation on computer vision-like convolutional neural networks for infant vocalization classification. The goal was to determine the most important architectural properties for increasing classification performance. Results confirmed the importance of the aggregation layer and additionally identified the input size of the fully-connected layers and the accumulated receptive field to be of major importance. Study (D) was an investigation on compensating class imbalance for chimpanzee call detection in naturalistic long-term recordings. The goal was to determine which compensation method among a selected group improved performance the most for a deep learning system. Results showed that spectrogram denoising was most effective, while methods for compensating relative imbalance either retained or decreased performance.:1. Introduction 2. Foundations in Automatic Recognition of Acoustic Communication 3. State of Research 4. Study (A): Investigation of the Assessment of Infant Vocalizations by Laypersons 5. Study (B): Comparison of Neural Network Types for Automatic Classification of Infant Vocalizations 6. Study (C): Detailed Investigation of CNNs for Automatic Classification of Infant Vocalizations 7. Study (D): Compensating Class Imbalance for Acoustic Chimpanzee Detection With Convolutional Recurrent Neural Networks 8. Conclusion and Collected Discussion 9. AppendixNonverbale akustische Kommunikation ist für Menschen und Tiere von großer Bedeutung: Säuglinge nutzen die Stimme als primäres Kommunikationsmittel. Schimpanse verwenden sogenannte 'Pant-hoots' und Trommeln zur Kommunikation über weite Entfernungen. Viele Anwendungen erfordern die Beurteilung solcher Kommunikation für verschiedenste Analyseziele. Algorithmen können solche Bereiche durch die Automatisierung der Beurteilung unterstützen. Dies ist besonders wichtig beim Monitoring langer Zeitspannen oder großer Gebiete, welche manuell nicht durchführbar sind. Algorithmen zur Geräuscherkennung verwendeten bisher größtenteils konventionelle Ansätzen des maschinellen Lernens. In den letzten Jahren hat eine alternative Herangehensweise Popularität gewonnen, das sogenannte Representation Learning. Dazu gehört insbesondere Deep Learning, bei dem Rohdaten in tiefe neuronale Netze eingespeist werden. Jedoch gibt es bei der Anwendung dieser Ansätze auf die automatische Erkennung von nonverbaler akustischer Kommunikation ungelöste Herausforderungen, wie z.B. die Kompensation der relativ kleinen Datenmengen. Die Leitfrage dieser Arbeit ist: Wie können wir Deep Learning effektiver zur automatischen Erkennung nonverbaler akustischer Kommunikation verwenden? Diese Arbeit konzentriert sich speziell auf zwei Kommunikationsarten: (1) vokale Laute von Säuglingen (2) Langstreckenrufe von Schimpansen. Diese Arbeit umfasst vier Studien, welche Aspekte dieser Frage untersuchen: Studie (A) untersuchte die Beurteilung von Säuglingslauten durch Laien. Zentrales Ziel war die Ableitung eines Klassifikationsschemas für Säuglingslaute auf der Grundlage der Wahrnehmung von Laien. Die Untersuchungsmethode basierte auf dem sogenannten Nijmegen-Protokoll. Hier beurteilten die Teilnehmenden Lautaufnahmen von Säuglingen anhand verschiedener Variablen, wie z.B. affektive Bewertungen und Klassenbezeichnungen. Die Ergebnisse zeigten eine starke Assoziation zwischen Valenzbewertungen und Klassenbezeichnungen, die zur Ableitung eines Klassifikationsschemas verwendet wurde. Studie (B) war eine vergleichende Studie verschiedener Typen neuronaler Netzwerke für die automatische Klassifizierung von Säuglingslauten. Ziel war es, den leistungsfähigsten Netzwerktyp unter den momentan verbreitetsten Typen zu ermitteln. Hierbei wurde der Einfluss verschiedener architektonischer Konfigurationen innerhalb der Typen berücksichtigt. Die Ergebnisse zeigten, dass Convolutional Neural Networks eine höhere Performance als Recurrent Neural Networks erreichten. Außerdem wurde gezeigt, dass die Wahl der Frequenz- und Zeitaggregationsschicht die wichtigste architektonische Entscheidung ist. Studie (C) war eine detaillierte Untersuchung von Computer Vision-ähnlichen Convolutional Neural Networks für die Klassifizierung von Säuglingslauten. Ziel war es, die wichtigsten architektonischen Eigenschaften zur Steigerung der Erkennungsperformance zu bestimmen. Die Ergebnisse bestätigten die Bedeutung der Aggregationsschicht. Zusätzlich Eigenschaften, die als wichtig identifiziert wurden, waren die Eingangsgröße der vollständig verbundenen Schichten und das akkumulierte rezeptive Feld. Studie (D) war eine Untersuchung zur Kompensation der Klassenimbalance zur Erkennung von Schimpansenrufen in Langzeitaufnahmen. Ziel war es, herauszufinden, welche Kompensationsmethode aus einer Menge ausgewählter Methoden die Performance eines Deep Learning Systems am meisten verbessert. Die Ergebnisse zeigten, dass Spektrogrammentrauschen am effektivsten war, während Methoden zur Kompensation des relativen Ungleichgewichts die Performance entweder gleichhielten oder verringerten.:1. Introduction 2. Foundations in Automatic Recognition of Acoustic Communication 3. State of Research 4. Study (A): Investigation of the Assessment of Infant Vocalizations by Laypersons 5. Study (B): Comparison of Neural Network Types for Automatic Classification of Infant Vocalizations 6. Study (C): Detailed Investigation of CNNs for Automatic Classification of Infant Vocalizations 7. Study (D): Compensating Class Imbalance for Acoustic Chimpanzee Detection With Convolutional Recurrent Neural Networks 8. Conclusion and Collected Discussion 9. Appendi

    A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models

    Full text link
    Prompt engineering is a technique that involves augmenting a large pre-trained model with task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be created manually as natural language instructions or generated automatically as either natural language instructions or vector representations. Prompt engineering enables the ability to perform predictions based solely on prompts without updating model parameters, and the easier application of large pre-trained models in real-world tasks. In past years, Prompt engineering has been well-studied in natural language processing. Recently, it has also been intensively studied in vision-language modeling. However, there is currently a lack of a systematic overview of prompt engineering on pre-trained vision-language models. This paper aims to provide a comprehensive survey of cutting-edge research in prompt engineering on three types of vision-language models: multimodal-to-text generation models (e.g. Flamingo), image-text matching models (e.g. CLIP), and text-to-image generation models (e.g. Stable Diffusion). For each type of model, a brief model summary, prompting methods, prompting-based applications, and the corresponding responsibility and integrity issues are summarized and discussed. Furthermore, the commonalities and differences between prompting on vision-language models, language models, and vision models are also discussed. The challenges, future directions, and research opportunities are summarized to foster future research on this topic

    A Systematic Review of Robustness in Deep Learning for Computer Vision: Mind the gap?

    Full text link
    Deep neural networks for computer vision are deployed in increasingly safety-critical and socially-impactful applications, motivating the need to close the gap in model performance under varied, naturally occurring imaging conditions. Robustness, ambiguously used in multiple contexts including adversarial machine learning, refers here to preserving model performance under naturally-induced image corruptions or alterations. We perform a systematic review to identify, analyze, and summarize current definitions and progress towards non-adversarial robustness in deep learning for computer vision. We find this area of research has received disproportionately less attention relative to adversarial machine learning, yet a significant robustness gap exists that manifests in performance degradation similar in magnitude to adversarial conditions. Toward developing a more transparent definition of robustness, we provide a conceptual framework based on a structural causal model of the data generating process and interpret non-adversarial robustness as pertaining to a model's behavior on corrupted images corresponding to low-probability samples from the unaltered data distribution. We identify key architecture-, data augmentation-, and optimization tactics for improving neural network robustness. This robustness perspective reveals that common practices in the literature correspond to causal concepts. We offer perspectives on how future research may mind this evident and significant non-adversarial robustness gap

    Comparative Analysis of MFO, GWO and GSO for Classification of Covid-19 Chest X-Ray Images

    Get PDF
    تلعب الصور الطبية دورًا حاسمًا في تصنيف الأمراض والحالات المختلفة. إحدى طرق التصوير هي الأشعة السينية التي توفر معلومات بصرية قيمة تساعد في تحديد وتوصيف مختلف الحالات الطبية. لطالما استخدمت الصور الشعاعية للصدر (CXR) لفحص ومراقبة العديد من اضطرابات الرئة، مثل السل والالتهاب الرئوي وانخماص الرئة والفتق. يمكن الكشف عن COVID-19 باستخدام صور CXR أيضًا. تم اكتشاف COVID-19، وهو فيروس يسبب التهابات في الرئتين والممرات الهوائية في الجهاز التنفسي العلوي، لأول مرة في عام 2019 في مقاطعة ووهان بالصين، ومنذ ذلك الحين يُعتقد أنه يتسبب في تلف كبير في مجرى الهواء، مما يؤثر بشدة على رئة الأشخاص المصابين. انتشر الفيروس بسرعة في جميع أنحاء العالم، وتم تسجيل الكثير من الوفيات والحالات المتزايدة بشكل يومي. يمكن استخدام CXR لمراقبة آثار COVID-19 على أنسجة الرئة. تبحث هذه الدراسة في تحليل مقارنة لأقرب جيران k (KNN)، و Extreme Gradient Boosting (XGboost)، و Support-Vector Machine (SVM)، وهي بعض مناهج التصنيف لاختيار الميزات في هذا المجال باستخدام خوارزمية Moth-Flame Optimization (MFO)، وخوارزمية Gray Wolf Optimizer (GWO)، وخوارزمية Glowworm Swarm Optimization (GSO). في هذه الدراسة، استخدم الباحثون مجموعة بيانات تتكون من مجموعتين على النحو التالي: 9544 صورة بالأشعة السينية ثنائية الأبعاد، والتي تم تصنيفها إلى مجموعتين باستخدام اختبارات التحقق من صحتها: 5500 صورة لرئتين سليمتين و4044 صورة للرئتين مع COVID-19. تتضمن المجموعة الثانية 800 صورة و400 صورة لرئتين سليمتين و400 رئة مصابة بـ COVID-19. تم تغيير حجم كل صورة إلى 200 × 200 بكسل. كانت الدقة والاستدعاء ودرجة F1 من بين معايير التقييم الكمي المستخدمة في هذه الدراسة.Medical images play a crucial role in the classification of various diseases and conditions. One of the imaging modalities is X-rays which provide valuable visual information that helps in the identification and characterization of various medical conditions. Chest radiograph (CXR) images have long been used to examine and monitor numerous lung disorders, such as tuberculosis, pneumonia, atelectasis, and hernia. COVID-19 detection can be accomplished using CXR images as well. COVID-19, a virus that causes infections in the lungs and the airways of the upper respiratory tract, was first discovered in 2019 in Wuhan Province, China, and has since been thought to cause substantial airway damage, badly impacting the lungs of affected persons. The virus was swiftly gone viral around the world and a lot of fatalities and cases growing were recorded on a daily basis. CXR can be used to monitor the effects of COVID-19 on lung tissue. This study examines a comparison analysis of k-nearest neighbors (KNN), Extreme Gradient Boosting (XGboost), and Support-Vector Machine (SVM) are some classification approaches for feature selection in this domain using The Moth-Flame Optimization algorithm (MFO), The Grey Wolf Optimizer algorithm (GWO), and The Glowworm Swarm Optimization algorithm (GSO). For this study, researchers employed a data set consisting of two sets as follows: 9,544 2D X-ray images, which were classified into two sets utilizing validated tests: 5,500 images of healthy lungs and 4,044 images of lungs with COVID-19. The second set includes 800 images, 400 of healthy lungs and 400 of lungs affected with COVID-19. Each image has been resized to 200x200 pixels. Precision, recall, and the F1-score were among the quantitative evaluation criteria used in this study

    Deep representation learning: Fundamentals, Perspectives, Applications, and Open Challenges

    Full text link
    Machine Learning algorithms have had a profound impact on the field of computer science over the past few decades. These algorithms performance is greatly influenced by the representations that are derived from the data in the learning process. The representations learned in a successful learning process should be concise, discrete, meaningful, and able to be applied across a variety of tasks. A recent effort has been directed toward developing Deep Learning models, which have proven to be particularly effective at capturing high-dimensional, non-linear, and multi-modal characteristics. In this work, we discuss the principles and developments that have been made in the process of learning representations, and converting them into desirable applications. In addition, for each framework or model, the key issues and open challenges, as well as the advantages, are examined

    Intelligent Biosignal Processing in Wearable and Implantable Sensors

    Get PDF
    This reprint provides a collection of papers illustrating the state-of-the-art of smart processing of data coming from wearable, implantable or portable sensors. Each paper presents the design, databases used, methodological background, obtained results, and their interpretation for biomedical applications. Revealing examples are brain–machine interfaces for medical rehabilitation, the evaluation of sympathetic nerve activity, a novel automated diagnostic tool based on ECG data to diagnose COVID-19, machine learning-based hypertension risk assessment by means of photoplethysmography and electrocardiography signals, Parkinsonian gait assessment using machine learning tools, thorough analysis of compressive sensing of ECG signals, development of a nanotechnology application for decoding vagus-nerve activity, detection of liver dysfunction using a wearable electronic nose system, prosthetic hand control using surface electromyography, epileptic seizure detection using a CNN, and premature ventricular contraction detection using deep metric learning. Thus, this reprint presents significant clinical applications as well as valuable new research issues, providing current illustrations of this new field of research by addressing the promises, challenges, and hurdles associated with the synergy of biosignal processing and AI through 16 different pertinent studies. Covering a wide range of research and application areas, this book is an excellent resource for researchers, physicians, academics, and PhD or master students working on (bio)signal and image processing, AI, biomaterials, biomechanics, and biotechnology with applications in medicine
    corecore