CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state of the art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we identified and analyzed gaps within the European research effort during our second year.
In this period we focused on three directions: technological issues, user-centred issues and use cases, and
socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional
breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related
discussion of the requirements for technological challenges. Both studies were carried out in cooperation and
consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several
meetings with our Think-Tank, presentations at international conferences, and surveys addressed to coordinators of EU
projects and of national initiatives. Based on the feedback obtained, we identified two types of gaps: core
technological gaps that involve research challenges, and "enablers", which are not necessarily technical research
challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal
challenges.
Multimodal Based Audio-Visual Speech Recognition for Hard-of-Hearing: State of the Art Techniques and Challenges
Multimodal Integration (MI) is the study of merging the knowledge acquired by the nervous system using sensory modalities such as speech, vision, touch, and gesture. The applications of MI span Audio-Visual Speech Recognition (AVSR), Sign Language Recognition (SLR), Emotion Recognition (ER), Biometrics Applications (BMA), Affect Recognition (AR), Multimedia Retrieval (MR), etc. Fusions of modalities such as hand gesture and face, or lip and hand position, are the sensory modalities mainly used in developing multimodal systems for the hearing impaired. This paper gives an overview of the multimodal systems available in the literature for hearing-impaired studies. It also discusses some of the studies related to hearing-impaired acoustic analysis. Far fewer algorithms have been developed for hearing-impaired AVSR than for normal hearing. The study of audio-visual speech recognition systems for the hearing impaired is therefore in high demand for people who are trying to communicate in natively spoken languages. This paper also highlights the state-of-the-art techniques in AVSR and the challenges researchers face in developing AVSR systems.
The direction of technical change in AI and the trajectory effects of government funding
Government funding of innovation can have a significant impact not only on the rate of technical change, but also on its direction. In this paper, we examine the role that government grants and government departments played in the development of artificial intelligence (AI), an emergent general purpose technology with the potential to revolutionize many aspects of the economy and society. We analyze all AI patents filed at the US Patent and Trademark Office and develop network measures that capture each patent's influence on all possible sequences of follow-on innovation. By identifying the effect of patents on technological trajectories, we are able to account for the long-term cumulative impact of new knowledge that is not captured by standard patent citation measures. We show that patents funded by government grants, but above all patents filed by federal agencies and state departments, profoundly influenced the development of AI. These long-term effects were especially significant in early phases, and weakened over time as private incentives took over. These results are robust to alternative specifications and controlling for endogeneity.
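To illustrate how a measure over "all possible sequences of follow-on innovation" can differ from a plain citation count, the sketch below counts every downstream citation path that starts at a patent in a small citation graph. The abstract does not define the authors' measure, so the path-count definition, the function name `count_followon_paths`, and the toy graph are assumptions made purely for illustration.

```python
from functools import lru_cache


def count_followon_paths(cited_by):
    """Count all downstream citation paths per patent.

    cited_by maps a patent id to the list of patents that cite it.
    The graph is assumed to be acyclic (a patent cannot cite a later one).
    This is a hypothetical stand-in, not the measure used in the paper.
    """

    @lru_cache(maxsize=None)
    def paths_from(patent):
        # Each citing patent contributes the one-step path to it,
        # plus every path that continues onward from it.
        return sum(1 + paths_from(c) for c in cited_by.get(patent, ()))

    return {patent: paths_from(patent) for patent in cited_by}


# Toy graph (illustrative data only): B and C cite A, D cites both B and C.
toy = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
scores = count_followon_paths(toy)
print(scores)
```

In this toy graph patent A has two direct citations but four follow-on paths (A-B, A-B-D, A-C, A-C-D), which is the kind of cumulative downstream effect a direct citation count misses.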
Special Libraries, December 1954
Volume 45, Issue 10
Challenging Social Media Threats using Collective Well-being Aware Recommendation Algorithms and an Educational Virtual Companion
Social media (SM) have become an integral part of our lives, expanding our
inter-linking capabilities to new levels. There is plenty to be said about
their positive effects. On the other hand, however, some serious negative
implications of SM have repeatedly been highlighted in recent years, pointing
at various SM threats for society, and its teenagers in particular: from common
issues (e.g. digital addiction and polarization) and manipulative influences of
algorithms to teenager-specific issues (e.g. body stereotyping). The full
impact of current SM platform design -- both at an individual and societal
level -- calls for a comprehensive evaluation and conceptual improvement. We
extend measures of Collective Well-Being (CWB) to SM communities. As users'
relationships and interactions are a central component of CWB, education is
crucial to improve CWB. We thus propose a framework based on an adaptive
"social media virtual companion" for educating and supporting the entire
community of students in interacting with SM. The virtual companion will be powered
by a Recommender System (CWB-RS) that will optimize a CWB metric instead of
engagement or platform profit, which currently largely drive recommender
systems, thereby disregarding any societal collateral effect. CWB-RS will
optimize CWB both in the short term, by balancing the level of SM threat the
students are exposed to, and in the long term, by adopting an
Intelligent Tutor System role and enabling adaptive and personalized sequencing
of playful learning activities. This framework offers an initial step toward
understanding how to design SM systems and embedded educational interventions
that favor a healthier and more positive society.
Automatic Recognition of Non-Verbal Acoustic Communication Events With Neural Networks
Non-verbal acoustic communication is of high importance to humans and animals: Infants use the voice as a primary communication tool. Animals of all kinds employ acoustic communication, such as chimpanzees, which use pant-hoot vocalizations for long-distance communication.
Many applications require the assessment of such communication for a variety of analysis goals. Computational systems can support these areas by automating the assessment process. This is of particular importance in monitoring scenarios over large spatial and time scales, where manual assessment is infeasible.
Algorithms for sound recognition have traditionally been based on conventional machine learning approaches. In recent years, so-called representation learning approaches have gained increasing popularity. This particularly includes deep learning approaches that feed raw data to deep neural networks. However, there remain open challenges in applying these approaches to automatic recognition of non-verbal acoustic communication events, such as compensating for small data set sizes.
The leading question of this thesis is: How can we apply deep learning more effectively to automatic recognition of non-verbal acoustic communication events? The target communication types were specifically (1) infant vocalizations and (2) chimpanzee long-distance calls.
This thesis comprises four studies that investigated aspects of this question:
Study (A) investigated the assessment of infant vocalizations by laypersons. The central goal was to derive an infant vocalization classification scheme based on the laypersons' perception. The study method was based on the Nijmegen Protocol, where participants rated vocalization recordings through various items, such as affective ratings and class labels. Results showed a strong association between valence ratings and class labels, which was used to derive a classification scheme.
Study (B) was a comparative study of various neural network types for the automatic classification of infant vocalizations. The goal was to determine the best-performing network type among those most prevalent at present, while considering the influence of their architectural configuration. Results showed that convolutional neural networks outperformed recurrent neural networks and that the choice of the frequency and time aggregation layer inside the network is the most important architectural choice.
Study (C) was a detailed investigation of computer vision-like convolutional neural networks for infant vocalization classification. The goal was to determine the most important architectural properties for increasing classification performance. Results confirmed the importance of the aggregation layer and additionally identified the input size of the fully-connected layers and the accumulated receptive field to be of major importance.
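For readers unfamiliar with the term, the accumulated receptive field of a stack of convolution and pooling layers can be computed with the standard recurrence shown in the sketch below. The abstract does not give the thesis's exact definition, so this uses the common textbook formula along a single axis, ignores dilation and padding, and the function name `receptive_field` and the example layer stack are assumptions.

```python
def receptive_field(layers):
    """Accumulated receptive field along one axis (time or frequency).

    layers: list of (kernel_size, stride) pairs in input-to-output order.
    Assumes no dilation; padding does not change the field size.
    """
    rf = 1    # a single input sample sees only itself
    jump = 1  # distance, in input samples, between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) jumps
        jump *= stride             # strides multiply the spacing of later layers
    return rf


# Hypothetical stack: conv 3 -> pool 2 (stride 2) -> conv 3 -> pool 2 (stride 2)
stack = [(3, 1), (2, 2), (3, 1), (2, 2)]
print(receptive_field(stack))
```

For the example stack the field grows 3, 4, 8, 10 layer by layer, showing how pooling strides let later convolutions cover much more of the input spectrogram.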
Study (D) was an investigation of compensating class imbalance for chimpanzee call detection in naturalistic long-term recordings. The goal was to determine which compensation method among a selected group improved performance the most for a deep learning system. Results showed that spectrogram denoising was most effective, while methods for compensating relative imbalance either retained or decreased performance.
1. Introduction
2. Foundations in Automatic Recognition of Acoustic Communication
3. State of Research
4. Study (A): Investigation of the Assessment of Infant Vocalizations by Laypersons
5. Study (B): Comparison of Neural Network Types for Automatic Classification of Infant Vocalizations
6. Study (C): Detailed Investigation of CNNs for Automatic Classification of Infant Vocalizations
7. Study (D): Compensating Class Imbalance for Acoustic Chimpanzee Detection With Convolutional Recurrent Neural Networks
8. Conclusion and Collected Discussion
9. Appendix