CNN Based Touch Interaction Detection for Infant Speech Development
In this paper, we investigate the detection of interaction in videos between two people, namely, a caregiver and an infant. We are interested in a particular type of human interaction known as touch, as touch is a key social and emotional signal used by caregivers when interacting with their children. We propose an automatic touch event recognition method to determine the potential time interval when the caregiver touches the infant. In addition to labeling the touch events, we also classify them into six touch types based on which body part of the infant has been touched. CNN-based human pose estimation and person segmentation are used to analyze the spatial relationship between the caregiver's hands and the infant. We demonstrate promising results for touch detection and show great potential for reducing the human effort in manually generating precise touch annotations.
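As a rough illustration of the spatial-relationship test described above, the sketch below flags a touch when a hand keypoint from pose estimation falls inside the infant's segmentation mask and reports which body-part region was hit; the function name, inputs, and per-pixel body-part map are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def detect_touch(hand_keypoints, infant_mask, region_labels):
    """Flag a touch when a caregiver hand keypoint lies inside the
    infant's segmentation mask; return the touched body-part IDs.

    hand_keypoints : list of (x, y) wrist/hand coordinates from pose estimation
    infant_mask    : (H, W) boolean array, True where the infant is
    region_labels  : (H, W) integer array mapping pixels to body-part IDs
    """
    events = []
    h, w = infant_mask.shape
    for x, y in hand_keypoints:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < w and 0 <= yi < h and infant_mask[yi, xi]:
            events.append(int(region_labels[yi, xi]))  # touched body part
    return events

# Tiny usage example with a hypothetical label map (2 = "arm").
mask = np.zeros((4, 4), dtype=bool); mask[1:3, 1:3] = True
regions = np.where(mask, 2, 0)
print(detect_touch([(1.2, 1.8), (3.7, 0.1)], mask, regions))  # -> [2]
```

A per-frame test like this would then be smoothed over time to turn isolated detections into the touch intervals the paper targets.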
Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition
This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the system's robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their faces. Furthermore, the method does not rely on external annotations, in keeping with cognitive development: instead, it uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method on a large multi-person face-to-face interaction dataset. The results show good performance in a speaker-dependent setting; in a speaker-independent setting, however, the method yields significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions.

Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
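The cross-modal learning scheme lends itself to a simple sketch: audio-derived voice-activity decisions serve as pseudo-labels for a visual classifier over face crops, so no external annotation is needed. The architecture, shapes, and training step below are assumptions for illustration, not the paper's model.

```python
import torch
import torch.nn as nn

# Minimal visual active-speaker classifier trained on pseudo-labels
# derived from the audio channel (e.g., per-speaker voice activity).
# Architecture and tensor shapes are illustrative placeholders.
class FaceSpeakingNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)  # logit: is this face speaking?

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = FaceSpeakingNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

# One illustrative step: face crops plus audio-derived pseudo-labels.
faces = torch.randn(8, 3, 64, 64)                # batch of face crops
audio_vad = torch.randint(0, 2, (8, 1)).float()  # 1 = voice active
loss = loss_fn(model(faces), audio_vad)
opt.zero_grad(); loss.backward(); opt.step()
```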
An end-to-end review of gaze estimation and its interactive applications on handheld mobile devices
In recent years we have witnessed an increasing number of interactive systems on handheld mobile devices which utilise gaze as a single or complementary interaction modality. This trend is driven by the enhanced computational power of these devices, the higher resolution and capacity of their cameras, and improved gaze estimation accuracy obtained from advanced machine learning techniques, especially deep learning. As the literature progresses rapidly, there is a pressing need to review the state of the art, delineate the boundary, and identify the key research challenges and opportunities in gaze estimation and interaction. This paper aims to serve this purpose by presenting an end-to-end holistic view of this area, from gaze capturing sensors, to gaze estimation workflows, to deep learning techniques, to gaze interactive applications.
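To make the estimation workflow concrete, here is a minimal appearance-based sketch in which an eye-region crop is regressed to a normalised 2D on-screen gaze point; real mobile pipelines add face and eye detection, head-pose normalisation, and per-user calibration. All sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative appearance-based gaze regressor: an eye-region crop goes
# through a small CNN that predicts a 2D point of gaze on the screen,
# normalised to [0, 1] x [0, 1].
class GazeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.regressor = nn.Linear(32, 2)  # (x, y) on screen

    def forward(self, eye_crop):
        feats = self.backbone(eye_crop).flatten(1)
        return torch.sigmoid(self.regressor(feats))

gaze = GazeNet()(torch.randn(1, 1, 36, 60))  # grayscale eye patch
print(gaze)  # predicted normalised screen coordinates
```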
Multimodal Based Audio-Visual Speech Recognition for Hard-of-Hearing: State of the Art Techniques and Challenges
Multimodal Integration (MI) is the study of merging the knowledge acquired by the nervous system through sensory modalities such as speech, vision, touch, and gesture. Applications of MI span Audio-Visual Speech Recognition (AVSR), Sign Language Recognition (SLR), Emotion Recognition (ER), Biometric Applications (BMA), Affect Recognition (AR), Multimedia Retrieval (MR), etc. Fusions of modalities such as hand gesture with facial expression, or lip movement with hand position, are the sensory combinations most often used in developing multimodal systems for the hearing impaired. This paper encapsulates an overview of the multimodal systems available in the literature for hearing-impaired studies, and also discusses some studies on hearing-impaired acoustic analysis. It is observed that far fewer algorithms have been developed for hearing-impaired AVSR than for normal hearing; audio-visual speech recognition systems for the hearing impaired are therefore in high demand among people trying to communicate in natively spoken languages. This paper also highlights the state-of-the-art techniques in AVSR and the challenges researchers face in developing AVSR systems.
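A common fusion pattern in AVSR is late fusion, sketched below: separate audio (e.g., MFCC) and visual (lip-region) encoders produce embeddings that are concatenated before classification. The encoders, feature sizes, and vocabulary here are placeholder assumptions, not a specific system from the surveyed literature.

```python
import torch
import torch.nn as nn

# Illustrative late-fusion AVSR head: independent audio and visual
# sequence encoders, with their final hidden states concatenated
# before classification over a word/phoneme vocabulary.
class LateFusionAVSR(nn.Module):
    def __init__(self, n_classes=40):
        super().__init__()
        self.audio_enc = nn.GRU(input_size=13, hidden_size=64, batch_first=True)
        self.visual_enc = nn.GRU(input_size=128, hidden_size=64, batch_first=True)
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, mfcc, lip_feats):
        _, a = self.audio_enc(mfcc)        # last hidden state, (1, B, 64)
        _, v = self.visual_enc(lip_feats)  # last hidden state, (1, B, 64)
        return self.classifier(torch.cat([a[-1], v[-1]], dim=1))

# 100 audio frames of 13 MFCCs, 25 video frames of 128-d lip features.
logits = LateFusionAVSR()(torch.randn(2, 100, 13), torch.randn(2, 25, 128))
```

Late fusion keeps each modality's encoder independent, which is one reason it is popular when the audio channel degrades (e.g., noise) and the visual stream must carry more of the burden.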
ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors
Understanding the behavior of non-human primates is crucial for improving
animal welfare, modeling social behavior, and gaining insights into
distinctively human and phylogenetically shared behaviors. However, the lack of
datasets on non-human primate behavior hinders in-depth exploration of primate
social interactions, posing challenges to research on our closest living
relatives. To address these limitations, we present ChimpACT, a comprehensive
dataset for quantifying the longitudinal behavior and social relations of
chimpanzees within a social group. Spanning from 2015 to 2018, ChimpACT
features videos of a group of over 20 chimpanzees residing at the Leipzig Zoo,
Germany, with a particular focus on documenting the developmental trajectory of
one young male, Azibo. ChimpACT is both comprehensive and challenging,
consisting of 163 videos with a cumulative 160,500 frames, each richly
annotated with detection, identification, pose estimation, and fine-grained
spatiotemporal behavior labels. We benchmark representative methods of three
tracks on ChimpACT: (i) tracking and identification, (ii) pose estimation, and
(iii) spatiotemporal action detection of the chimpanzees. Our experiments
reveal that ChimpACT offers ample opportunities for both devising new methods
and adapting existing ones to solve fundamental computer vision tasks applied
to chimpanzee groups, such as detection, pose estimation, and behavior
analysis, ultimately deepening our comprehension of communication and sociality
in non-human primates.

Comment: NeurIPS 2023
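For datasets of this kind, per-frame annotations are often distributed in COCO-style JSON; the hypothetical loader below groups bounding boxes into identity tracks under that assumption. The file name and field names are illustrative, not ChimpACT's documented schema.

```python
import json
from collections import defaultdict

# Hypothetical loader for COCO-style video annotations (a common format
# for detection/pose benchmarks); fields here are assumptions.
def load_tracks(path="annotations.json"):
    with open(path) as f:
        coco = json.load(f)
    tracks = defaultdict(list)  # track_id -> per-frame boxes
    for ann in coco["annotations"]:
        tracks[ann["track_id"]].append(
            {"image_id": ann["image_id"], "bbox": ann["bbox"]}
        )
    return tracks
```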
Exploring the Creation and Humanization of Digital Life: Consciousness Simulation and Human-Machine Interaction
Digital life, a form of life generated by computer programs or artificial intelligence systems, possesses self-awareness, thinking abilities, emotions, and subjective consciousness. Achieving it involves complex neural networks, multi-modal sensory integration [1, 2], feedback mechanisms, and self-referential processing [3]. Injecting prior knowledge into digital life structures is a critical step: it guides digital entities' understanding of the world, their decision-making, and their interactions. Digital life can be customized and personalized, including adjusting intelligence levels, character settings, personality traits, and behavioral characteristics. Virtual environments facilitate efficient and controlled development, allowing user interaction, observation, and active participation in digital life's growth, while researchers benefit from controlled experiments that drive technological advancements. The fusion of digital life into the real world offers exciting possibilities for human-digital entity collaboration and coexistence.

Comment: 10 pages
A Survey on Deep Learning in Medical Image Analysis
Deep learning algorithms, in particular convolutional networks, have rapidly
become a methodology of choice for analyzing medical images. This paper reviews
the major deep learning concepts pertinent to medical image analysis and
summarizes over 300 contributions to the field, most of which appeared in the
last year. We survey the use of deep learning for image classification, object
detection, segmentation, registration, and other tasks and provide concise
overviews of studies per application area. Open challenges and directions for
future research are discussed.

Comment: Revised survey includes expanded discussion section and reworked introductory section on common deep architectures. Added missed papers from before Feb 1st 2017
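As a minimal example of the image-classification task the survey covers, the sketch below defines a small convolutional classifier for grayscale patches (e.g., disease vs. no disease); the depth and input size are placeholders rather than a recommended architecture.

```python
import torch
import torch.nn as nn

# Minimal convolutional classifier of the kind surveyed for medical
# image classification; sizes are illustrative placeholders.
class PatchClassifier(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, n_classes),  # for 64x64 inputs
        )

    def forward(self, x):  # x: (B, 1, 64, 64) grayscale patches
        return self.net(x)

logits = PatchClassifier()(torch.randn(4, 1, 64, 64))
```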