
    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of CHORUS and establishing the existing landscape of multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions, namely technological issues, user-centred issues and use-cases, and socio-economic and legal aspects. These were assessed through two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, descriptions of representative use-cases with a related discussion of the technological challenges they entail. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as coordinators of national initiatives. Based on the feedback obtained, we identified two types of gaps: core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on the progress of innovation. New socio-economic trends are presented, as well as emerging legal challenges.

    Multimodal Based Audio-Visual Speech Recognition for Hard-of-Hearing: State of the Art Techniques and Challenges

    Multimodal Integration (MI) is the study of merging the knowledge acquired by the nervous system through sensory modalities such as speech, vision, touch, and gesture. Applications of MI span Audio-Visual Speech Recognition (AVSR), Sign Language Recognition (SLR), Emotion Recognition (ER), Biometric Applications (BMA), Affect Recognition (AR), Multimedia Retrieval (MR), and more. Fusions of modality pairs such as hand gesture and face, or lips and hand position, are the sensory combinations most commonly used in the development of multimodal systems for the hearing impaired. This paper gives an overview of the multimodal systems available in the literature for hearing-impaired studies, and also discusses studies related to hearing-impaired acoustic analysis. We observe that far fewer algorithms have been developed for hearing-impaired AVSR than for normal hearing. The study of audio-visual speech recognition systems for the hearing impaired is therefore in high demand, particularly for people trying to communicate in natively spoken languages. This paper also highlights state-of-the-art techniques in AVSR and the challenges researchers face in developing AVSR systems.
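    As a concrete illustration of the modality fusion this survey discusses, the following is a minimal sketch of feature-level (early) fusion for AVSR. All feature names, dimensions, and the classifier choice are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of feature-level (early) fusion for audio-visual speech
# recognition. Features, shapes, and the classifier are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fuse_features(audio_feats: np.ndarray, visual_feats: np.ndarray) -> np.ndarray:
    """Concatenate per-utterance audio features (e.g. averaged MFCCs)
    with visual lip-region features into one fused vector."""
    return np.concatenate([audio_feats, visual_feats], axis=-1)

# Hypothetical pre-extracted features: 100 utterances,
# 39-dim audio vectors and 64-dim lip-embedding vectors.
rng = np.random.default_rng(0)
X_audio = rng.normal(size=(100, 39))
X_visual = rng.normal(size=(100, 64))
y = rng.integers(0, 10, size=100)           # 10 hypothetical word classes

X_fused = fuse_features(X_audio, X_visual)  # shape (100, 103)
clf = LogisticRegression(max_iter=1000).fit(X_fused, y)
print(clf.score(X_fused, y))
```

    Late fusion (training one model per modality and combining their predictions) is the usual alternative; the survey covers both families of approaches.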

    The direction of technical change in AI and the trajectory effects of government funding

    Government funding of innovation can have a significant impact not only on the rate of technical change, but also on its direction. In this paper, we examine the role that government grants and government departments played in the development of artificial intelligence (AI), an emergent general-purpose technology with the potential to revolutionize many aspects of the economy and society. We analyze all AI patents filed at the US Patent and Trademark Office and develop network measures that capture each patent's influence on all possible sequences of follow-on innovation. By identifying the effect of patents on technological trajectories, we are able to account for the long-term cumulative impact of new knowledge that is not captured by standard patent citation measures. We show that patents funded by government grants, but above all patents filed by federal agencies and state departments, profoundly influenced the development of AI. These long-term effects were especially significant in early phases and weakened over time as private incentives took over. These results are robust to alternative specifications and to controlling for endogeneity.
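    To make the idea of a trajectory measure concrete, here is a hedged sketch of one plausible path-based influence score on a patent citation network; the paper's actual network measures may be defined differently.

```python
# Hedged sketch of a path-based influence measure on a patent citation
# network, in the spirit of the trajectory measures the paper describes.
# Edges point from a patent to the later patents that cite it, so a path
# starting at a patent is one possible sequence of follow-on innovation.
import networkx as nx

def path_influence(G: nx.DiGraph, node) -> int:
    """Count all distinct citation paths starting at `node`."""
    memo = {}
    def count(v):
        if v in memo:
            return memo[v]
        # each outgoing edge contributes the edge itself plus
        # every path continuing from the successor
        memo[v] = sum(1 + count(w) for w in G.successors(v))
        return memo[v]
    return count(node)

# Toy citation DAG: A is cited by B and C; B is cited by D.
G = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "D")])
print(path_influence(G, "A"))   # 3 paths: A-B, A-B-D, A-C
```

    Unlike a raw citation count (which would give patent A a score of 2 here), a path-based score credits A for the entire downstream trajectory, which is the long-term cumulative impact the paper emphasizes.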

    Special Libraries, December 1954

    Volume 45, Issue 10
    https://scholarworks.sjsu.edu/sla_sl_1954/1009/thumbnail.jp

    Challenging Social Media Threats using Collective Well-being Aware Recommendation Algorithms and an Educational Virtual Companion

    Social media (SM) have become an integral part of our lives, expanding our inter-linking capabilities to new levels. There is plenty to be said about their positive effects. However, some serious negative implications of SM have repeatedly been highlighted in recent years, pointing at various SM threats for society, and for its teenagers in particular: from common issues (e.g. digital addiction and polarization) and the manipulative influence of algorithms to teenager-specific issues (e.g. body stereotyping). The full impact of current SM platform design, both at the individual and the societal level, calls for a comprehensive evaluation and conceptual improvement. We extend measures of Collective Well-Being (CWB) to SM communities. As users' relationships and interactions are a central component of CWB, education is crucial to improving CWB. We thus propose a framework based on an adaptive "social media virtual companion" for educating and supporting the entire student community in interacting with SM. The virtual companion is powered by a Recommender System (CWB-RS) that optimizes a CWB metric instead of the engagement or platform profit that largely drives current recommender systems while disregarding societal collateral effects. CWB-RS optimizes CWB both in the short term, by balancing the level of SM threat students are exposed to, and in the long term, by adopting the role of an Intelligent Tutoring System and enabling adaptive, personalized sequencing of playful learning activities. This framework offers an initial step toward understanding how to design SM systems and embedded educational interventions that favor a healthier and more positive society.
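    As a rough illustration of how a CWB-RS might differ from an engagement-driven recommender, the sketch below reranks candidate content by a weighted well-being objective. The scoring fields, the weight alpha, and the item structure are illustrative assumptions, not the paper's specification.

```python
# Minimal sketch of the core idea behind a CWB-aware recommender: rank
# candidate content by a collective well-being objective rather than by
# engagement alone. All fields and weights are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Item:
    id: str
    engagement: float   # predicted engagement (what platforms optimize today)
    cwb_impact: float   # assumed estimate of effect on CWB, in [-1, 1]

def cwb_rank(items: list[Item], alpha: float = 0.8) -> list[Item]:
    """Rerank so that well-being impact dominates engagement.
    alpha trades off the two objectives (alpha=1: pure CWB)."""
    score = lambda it: alpha * it.cwb_impact + (1 - alpha) * it.engagement
    return sorted(items, key=score, reverse=True)

feed = [Item("outrage_clip", engagement=0.9, cwb_impact=-0.7),
        Item("peer_tutorial", engagement=0.5, cwb_impact=0.6)]
print([it.id for it in cwb_rank(feed)])  # ['peer_tutorial', 'outrage_clip']
```

    The hard part the paper addresses is, of course, defining and estimating the CWB signal itself; this sketch only shows where such a signal would replace the engagement objective.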

    Automatic Recognition of Non-Verbal Acoustic Communication Events With Neural Networks

    Non-verbal acoustic communication is of high importance to humans and animals: infants use the voice as a primary communication tool, and animals of all kinds employ acoustic communication, such as chimpanzees, which use pant-hoot vocalizations for long-distance communication. Many applications require the assessment of such communication for a variety of analysis goals. Computational systems can support these areas by automating the assessment process. This is of particular importance in monitoring scenarios over large spatial and time scales, which are infeasible to handle manually. Algorithms for sound recognition have traditionally been based on conventional machine learning approaches. In recent years, so-called representation learning approaches have gained increasing popularity, in particular deep learning approaches that feed raw data to deep neural networks. However, open challenges remain in applying these approaches to the automatic recognition of non-verbal acoustic communication events, such as compensating for small data set sizes. The leading question of this thesis is: how can we apply deep learning more effectively to the automatic recognition of non-verbal acoustic communication events? The target communication types were (1) infant vocalizations and (2) chimpanzee long-distance calls. The thesis comprises four studies that investigate aspects of this question. Study (A) investigated the assessment of infant vocalizations by laypersons. The central goal was to derive an infant vocalization classification scheme based on the laypersons' perception. The method was based on the Nijmegen Protocol, in which participants rated vocalization recordings on various items, such as affective ratings and class labels. Results showed a strong association between valence ratings and class labels, which was used to derive a classification scheme. Study (B) was a comparative study of various neural network types for the automatic classification of infant vocalizations. The goal was to determine the best-performing network type among the currently most prevalent ones, while considering the influence of their architectural configuration. Results showed that convolutional neural networks outperformed recurrent neural networks, and that the choice of the frequency and time aggregation layer inside the network is the most important architectural decision. Study (C) was a detailed investigation of computer-vision-like convolutional neural networks for infant vocalization classification. The goal was to determine the architectural properties that matter most for increasing classification performance. Results confirmed the importance of the aggregation layer and additionally identified the input size of the fully-connected layers and the accumulated receptive field as being of major importance. Study (D) investigated compensating class imbalance for chimpanzee call detection in naturalistic long-term recordings. The goal was to determine which compensation method, among a selected group, improved the performance of a deep learning system the most. Results showed that spectrogram denoising was most effective, while methods for compensating relative imbalance either retained or decreased performance.

    Contents: 1. Introduction; 2. Foundations in Automatic Recognition of Acoustic Communication; 3. State of Research; 4. Study (A): Investigation of the Assessment of Infant Vocalizations by Laypersons; 5. Study (B): Comparison of Neural Network Types for Automatic Classification of Infant Vocalizations; 6. Study (C): Detailed Investigation of CNNs for Automatic Classification of Infant Vocalizations; 7. Study (D): Compensating Class Imbalance for Acoustic Chimpanzee Detection With Convolutional Recurrent Neural Networks; 8. Conclusion and Collected Discussion; 9. Appendix
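    As an illustration of the architectural pattern that studies (B) and (C) identify as decisive, here is a hedged PyTorch sketch of a spectrogram CNN with an explicit frequency-and-time aggregation layer. Global average pooling is assumed as the aggregation; layer sizes and the class count are likewise assumptions, not the thesis's actual configuration.

```python
# Hedged sketch: convolutional front end over spectrograms, followed by
# an explicit frequency-and-time aggregation layer, then a classifier.
import torch
import torch.nn as nn

class VocalizationCNN(nn.Module):
    def __init__(self, n_classes: int = 5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        # the aggregation layer the thesis found to matter most: collapse
        # the frequency and time axes of the feature map into one vector
        self.aggregate = nn.AdaptiveAvgPool2d(1)
        self.classify = nn.Linear(64, n_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, 1, n_mels, n_frames) log-mel spectrogram
        x = self.conv(spec)
        x = self.aggregate(x).flatten(1)   # (batch, 64)
        return self.classify(x)

model = VocalizationCNN()
dummy = torch.randn(4, 1, 64, 128)   # 4 spectrograms, 64 mel bins, 128 frames
print(model(dummy).shape)            # torch.Size([4, 5])
```

    Alternatives at the aggregation step (max pooling, attention pooling, or pooling frequency and time separately) are exactly the kind of architectural choice study (B) compares.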