
76 research outputs found

Embodied Processing at Six Linguistic Granularity Levels: A Consensus Paper

Author: Anita Körner
Arthur M. Glenberg
Fritz Günther
James P. Trujillo
Linda Drijvers
Luca Rinaldi
Marco Marelli
Martin H. Fischer
Mauricio Castillo
Oksana Tsaregorodtseva
Olesia Platonova
Samuel Shaki
Publication venue: Ubiquity Press
Publication date: 01/10/2023

Language processing is influenced by sensorimotor experiences. Here, we review behavioral evidence for embodied and grounded influences in language processing across six linguistic levels of granularity. We examine (a) sub-word features, discussing grounded influences on iconicity (systematic associations between word form and meaning); (b) words, discussing boundary conditions and generalizations for the simulation of color, sensory modality, and spatial position; (c) sentences, discussing boundary conditions and applications of action direction simulation; (d) texts, discussing how the teaching of simulation can improve comprehension in beginning readers; (e) conversations, discussing how multi-modal cues improve turn taking and alignment; and (f) text corpora, discussing how distributional semantic models can reveal how grounded and embodied knowledge is encoded in texts. These approaches are converging on a convincing account of the psychology of language, but at the same time, there are important criticisms of the embodied approach and of specific experimental paradigms. The surest way forward requires the adoption of a wide array of scientific methods. By providing complementary evidence, a combination of multiple methods on various levels of granularity can help us gain a more complete understanding of the role of embodiment and grounding in language processing.

Directory of Open Access Journals

Speech Recognition

Publication venue: IntechOpen
Publication date: 20/04/2021

Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes.

Directory of Open Access Books (DOAB)

Modelling multimodal language processing

Author: Smith A.
Publication venue: Radboud University Nijmegen
Publication date: 01/01/2015

MPG.PuRe

Attention Mechanism for Adaptive Feature Modelling

Author: DUAN HAORAN
Publication date: 01/01/2024

This thesis presents groundbreaking contributions in machine learning by exploring and advancing attention mechanisms within deep learning frameworks. We introduce innovative models and techniques that significantly enhance feature recognition and analysis in two key application areas: computer vision recognition and time series modeling. Our primary contributions include the development of a dual attention mechanism for crowd counting and the integration of supervised and unsupervised learning techniques for semi-supervised learning. Furthermore, we propose a novel Dynamic Unary Convolution in Transformer (DUCT) model for generalized visual recognition tasks, and investigate the efficacy of attention mechanisms in human activity recognition using time series data from wearable sensors based on the semi-supervised setting. The capacity of humans to selectively focus on specific elements within complex scenes has long inspired machine learning research. Attention mechanisms, which dynamically modify weights to emphasize different input elements, are central to replicating this human perceptual ability in deep learning. These mechanisms have proven crucial in achieving significant advancements across various tasks. In this thesis, we first provide a comprehensive review of the existing literature on attention mechanisms. We then introduce a dual attention mechanism for crowd counting, which employs both second-order and first-order attention to enhance spatial information processing and feature distinction. Additionally, we explore the convergence of supervised and unsupervised learning, focusing on a novel semi-supervised method that synergizes labeled and unlabeled data through an attention-driven recurrent unit and dual loss functions. This method aims to refine crowd counting in practical transportation scenarios. Moreover, our research extends to a hybrid attention model for broader visual recognition challenges. 
By merging convolutional and transformer layers, this model adeptly handles multi-level features, where the DUCT modules play a pivotal role. We rigorously evaluate DUCT's performance across critical computer vision tasks. Finally, recognizing the significance of time series data in domains like health surveillance, we apply our proposed attention mechanism to human activity recognition, analyzing correlations between various daily activities to enhance the adaptability of deep learning frameworks to temporal dynamics.

Durham e-Theses

Proceedings of the Sixteenth Australasian International Conference on Speech Science and Technology

Publication venue: ASSTA
Publication date: 31/12/2016

UCL Discovery

Computer analysis of children's non-native English speech for language learning and assessment

Author: Qian Mengjie
Publication date: 08/12/2021

Children's ASR appears to be more challenging than adults' and it's even more difficult when it comes to non-native children's speech. This research investigates different techniques to compensate for the effects of non-native and children's speech on the performance of ASR systems. The study mainly utilises hybrid DNN-HMM systems with conventional DNNs, LSTMs and more advanced TDNN models. This work uses the CALL-ST corpus and TLT-school corpus to study children's non-native English speech. Initially, data augmentation was explored on the CALL-ST corpus to address the lack of data problem using the AMI corpus and PF-STAR German corpus. Feature selection, acoustic model adaptation and selection were also investigated on CALL-ST. More aspects of the ASR system, including pronunciation modelling, acoustic modelling, language modelling and system fusion, were explored on the TLT-school corpus as this corpus has a bigger amount of data. Then, the relationships between the CALL-ST and TLT-school corpora were studied and utilised to improve ASR performance. The other part of the present work is text processing for non-native children's English speech. We focused on providing accept/reject feedback to learners based on the text generated by the ASR system from learners' spoken responses. A rule-based and a machine learning-based system were proposed for making the judgement, and several aspects of the systems were evaluated. The influence of the ASR system on the text processing system was explored.

University of Birmingham Research Archive, E-theses Repository

Integrating Gestures

Publication venue: John Benjamins Publishing Company
Publication date: 01/04/2020

Gestures convey information about culture, discourse, thought, intentionality, emotion, intersubjectivity, cognition, and first and second language acquisition. Additionally, they are used by non-human primates to communicate with their peers and with humans. Consequently, the modern field of gesture studies has attracted researchers from a number of different disciplines such as anthropology, cognitive science, communication, neuroscience, psycholinguistics, primatology, psychology, robotics, sociology and semiotics. This volume presents an overview of the depth and breadth of current research in gesture. Its focus is on the interdisciplinary nature of gesture. The chapters included in the volume are divided into six themes: the nature and functions of gesture, first language development and gesture, second language effects on gesture, gesture in the classroom and in problem solving, gesture aspects of discourse and interaction, and gestural analysis of music and dance.

Directory of Open Access Books (DOAB)

Semantic radical consistency and character transparency effects in Chinese: an ERP study

Author: Su IF
Weekes BS
Publication venue: United States Sports Academy
Publication date: 01/01/2009

BACKGROUND: This event-related potential (ERP) study aims to investigate the representation and temporal dynamics of Chinese orthography-to-semantics mappings by simultaneously manipulating character transparency and semantic radical consistency. Character components, referred to as radicals, make up the building blocks used dur...postprin

HKU Scholars Hub

Multi-modal surrogates for retrieving and making sense of videos: is synchronization between the multiple modalities optimal?

Author: Song Yaxiao
Publication venue: University of North Carolina at Chapel Hill
Publication date: 01/12/2010

Video surrogates can help people quickly make sense of the content of a video before downloading or seeking more detailed information. Visual and audio features of a video are primary information carriers and might become important components of video retrieval and video sense-making. In the past decades, most research and development efforts on video surrogates have focused on visual features of the video, and comparatively little work has been done on audio surrogates and examining their pros and cons in aiding users' retrieval and sense-making of digital videos. Even less work has been done on multi-modal surrogates, where more than one modality is employed for consuming the surrogates, for example, the audio and visual modalities. This research examined the effectiveness of a number of multi-modal surrogates, and investigated whether synchronization between the audio and visual channels is optimal. A user study was conducted to evaluate six different surrogates on a set of six recognition and inference tasks to answer two main research questions: (1) How do automatically-generated multi-modal surrogates compare to manually-generated ones in video retrieval and video sense-making? and (2) Does synchronization between multiple surrogate channels enhance or inhibit video retrieval and video sense-making? Forty-eight participants participated in the study, in which the surrogates were measured on the time participants spent on experiencing the surrogates, the time participants spent on doing the tasks, participants' performance accuracy on the tasks, participants' confidence in their task responses, and participants' subjective ratings on the surrogates. On average, the uncoordinated surrogates were more helpful than the coordinated ones, but the manually-generated surrogates were only more helpful than the automatically-generated ones in terms of task completion time.
Participants' subjective ratings were more favorable for the coordinated surrogate C2 (Magic A + V) and the uncoordinated surrogate U1 (Magic A + Storyboard V) with respect to usefulness, usability, enjoyment, and engagement. The post-session questionnaire comments demonstrated participants' preference for the coordinated surrogates, but the comments also revealed the value of having uncoordinated sensory channels.

Carolina Digital Repository