24,879 research outputs found

    Rehabilitation Exercise Repetition Segmentation and Counting using Skeletal Body Joints

    Full text link
    Physical exercise is an essential component of rehabilitation programs that improve quality of life and reduce mortality and re-hospitalization rates. In AI-driven virtual rehabilitation programs, patients complete their exercises independently at home while AI algorithms analyze the exercise data to provide feedback to patients and report their progress to clinicians. The first step in analyzing exercise data is to segment it into consecutive repetitions. A significant amount of research has been performed on segmenting and counting the repetitive activities of healthy individuals using raw video data, which raises privacy concerns and is computationally intensive. Previous research on segmenting patients' rehabilitation exercises relied on data collected by multiple wearable sensors, which are difficult for rehabilitation patients to use at home. Compared to healthy individuals, segmenting and counting exercise repetitions in patients is more challenging because of irregular repetition durations and the variation between repetitions. This paper presents a novel approach for segmenting and counting the repetitions of rehabilitation exercises performed by patients, based on their skeletal body joints. Skeletal body joints can be acquired through depth cameras or computer vision techniques applied to RGB videos of patients. Various sequential neural networks are designed to analyze the sequences of skeletal body joints and perform repetition segmentation and counting. Extensive experiments on three publicly available rehabilitation exercise datasets, KIMORE, UI-PRMD, and IntelliRehabDS, demonstrate the superiority of the proposed method over previous methods. The proposed method enables accurate exercise analysis while preserving privacy, facilitating the effective delivery of virtual rehabilitation programs. Comment: 8 pages, 1 figure, 2 tables
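
    No code accompanies this record; as a rough sketch of the kind of sequential network the abstract describes, the snippet below labels each frame of a skeletal-joint sequence as inside or outside a repetition with a small bidirectional GRU and counts rising edges. The joint count, layer sizes, and binary per-frame labeling scheme are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch (not the authors' code): per-frame repetition segmentation
# of skeletal joint sequences with a GRU. Joint count, hidden size, and the
# binary inside/outside-repetition labeling are illustrative assumptions.
import torch
import torch.nn as nn

class RepSegmenter(nn.Module):
    def __init__(self, n_joints=25, coords=3, hidden=128):
        super().__init__()
        self.gru = nn.GRU(n_joints * coords, hidden, batch_first=True,
                          bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)  # per frame: inside/outside a repetition

    def forward(self, x):                     # x: (batch, frames, joints, 3)
        b, t = x.shape[:2]
        h, _ = self.gru(x.reshape(b, t, -1))  # flatten joints into one feature vector
        return self.head(h)                   # (batch, frames, 2) logits

def count_repetitions(frame_labels):
    """Count repetitions as 0 -> 1 transitions in the per-frame labels."""
    reps, prev = 0, 0
    for lab in frame_labels:
        if lab == 1 and prev == 0:
            reps += 1
        prev = lab
    return reps

model = RepSegmenter()
seq = torch.randn(1, 300, 25, 3)              # 300 frames of 25 3-D joints
labels = model(seq).argmax(-1).squeeze(0).tolist()
print(count_repetitions(labels))
```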

    Security and Privacy Problems in Voice Assistant Applications: A Survey

    Full text link
    Voice assistant applications have become ubiquitous nowadays. The two model types that provide the most important functions in real-life applications (e.g., Google Home, Amazon Alexa, Siri) are Automatic Speech Recognition (ASR) models and Speaker Identification (SI) models. According to recent studies, security and privacy threats have also emerged with the rapid development of the Internet of Things (IoT). The security issues studied include attack techniques against the machine learning models and other hardware components widely used in voice assistant applications. The privacy issues include technical information stealing and policy-level privacy breaches. Voice assistant applications take a steadily growing market share every year, yet their privacy and security issues continue to cause substantial economic losses and endanger users' sensitive personal information. Thus, a comprehensive survey is needed to categorize the current research on the security and privacy problems of voice assistant applications. This paper categorizes and assesses five kinds of security attacks and three types of privacy threats reported in papers published at top-tier conferences in the cyber security and voice domains. Comment: 5 figures

    The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions

    Full text link
    The Metaverse offers a second world beyond reality, where boundaries are non-existent and possibilities are endless, through engagement and immersive experiences using virtual reality (VR) technology. Many disciplines can benefit from the advancement of the Metaverse when accurately developed, including technology, gaming, education, art, and culture. Nevertheless, developing the Metaverse environment to its full potential is an ambiguous task that needs proper guidance and direction. Existing surveys on the Metaverse focus only on a specific aspect or discipline of the Metaverse and lack a holistic view of the entire process. A more holistic, multi-disciplinary, in-depth, academic and industry-oriented review is therefore required to provide a thorough study of the Metaverse development pipeline. To address these issues, we present in this survey a novel multi-layered pipeline ecosystem composed of (1) the Metaverse computing, networking, communications, and hardware infrastructure, (2) environment digitization, and (3) user interactions. For every layer, we discuss the components that detail the steps of its development. For each of these components, we also examine the impact of a set of enabling technologies and empowering domains (e.g., Artificial Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on its advancement. In addition, we explain the importance of these technologies in supporting decentralization, interoperability, user experiences, interactions, and monetization. Our study highlights the existing challenges for each component, followed by research directions and potential solutions. To the best of our knowledge, this survey is the most comprehensive to date, allowing users, scholars, and entrepreneurs to gain an in-depth understanding of the Metaverse ecosystem and find their opportunities and potential for contribution.

    Multi-modal Facial Affective Analysis based on Masked Autoencoder

    Full text link
    Human affective behavior analysis focuses on analyzing human expressions or other behaviors to enhance the understanding of human psychology. The CVPR 2023 Competition on Affective Behavior Analysis in-the-wild (ABAW) is dedicated to providing the high-quality, large-scale Aff-wild2 dataset for recognizing commonly used emotion representations, such as Action Units (AU), basic expression categories (EXPR), and Valence-Arousal (VA). The competition is committed to making significant strides in improving the accuracy and practicality of affective analysis research in real-world scenarios. In this paper, we introduce our submission to CVPR 2023: ABAW5. Our approach involves several key components. First, we utilize the visual information from a Masked Autoencoder (MAE) model pre-trained on a large-scale face image dataset in a self-supervised manner. Next, we fine-tune the MAE encoder on the image frames from Aff-wild2 for the AU, EXPR, and VA tasks, which can be regarded as static, uni-modal training. Additionally, we leverage the multi-modal and temporal information from the videos and implement a transformer-based framework to fuse the multi-modal features. Our approach achieves impressive results in the ABAW5 competition, with average F1 scores of 55.49% and 41.21% in the AU and EXPR tracks, respectively, and an average CCC of 0.6372 in the VA track. Our approach ranks first in the EXPR and AU tracks and second in the VA track. Extensive quantitative experiments and ablation studies demonstrate the effectiveness of our proposed method.
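
    As a hedged illustration of the transformer-based multi-modal fusion step described above (not the authors' actual architecture), the sketch below projects visual features, such as those from a pre-trained MAE encoder, and audio features to a shared width, concatenates them into one token sequence, and regresses valence-arousal; all dimensions are assumed.

```python
# Illustrative sketch with assumed dimensions: fuse per-frame visual features
# (e.g., from a pre-trained MAE encoder) with audio features via a transformer
# encoder, then predict valence-arousal for the clip.
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, d_vis=768, d_aud=128, d_model=256):
        super().__init__()
        self.proj_v = nn.Linear(d_vis, d_model)   # map both streams to a shared width
        self.proj_a = nn.Linear(d_aud, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.va_head = nn.Linear(d_model, 2)      # valence, arousal

    def forward(self, vis, aud):                  # (B, T, d_vis), (B, T, d_aud)
        tokens = torch.cat([self.proj_v(vis), self.proj_a(aud)], dim=1)
        fused = self.encoder(tokens)              # joint attention over both streams
        return torch.tanh(self.va_head(fused.mean(dim=1)))  # VA in [-1, 1]

model = FusionHead()
out = model(torch.randn(2, 32, 768), torch.randn(2, 32, 128))
print(out.shape)  # torch.Size([2, 2])
```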

    Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention

    Full text link
    Driver Monitoring Systems (DMSs) are crucial for safe hand-over actions in Level-2+ self-driving vehicles. State-of-the-art DMSs leverage multiple sensors mounted at different locations to monitor the driver and the vehicle's interior scene, and employ decision-level fusion to integrate these heterogeneous data. However, this fusion method may not fully utilize the complementarity of different data sources and may overlook their relative importance. To address these limitations, we propose a novel multiview multimodal driver monitoring system based on feature-level fusion through multi-head self-attention (MHSA). We demonstrate its effectiveness by comparing it against four alternative fusion strategies (Sum, Conv, SE, and AFF). We also present SuMoCo, a novel GPU-friendly supervised contrastive learning framework, to learn better representations. Furthermore, we provide fine-grained annotations for the test split of the DAD dataset to enable multi-class recognition of drivers' activities. Experiments on this enhanced database demonstrate that 1) the proposed MHSA-based fusion method (AUC-ROC: 97.0%) outperforms all baselines and previous approaches, and 2) training MHSA with patch masking improves its robustness against modality/view collapses. The code and annotations are publicly available. Comment: 9 pages (1 page for references); accepted by the 6th Multimodal Learning and Applications Workshop (MULA) at CVPR 2023
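
    The abstract does not detail the fusion architecture; the sketch below illustrates the general idea of feature-level fusion through multi-head self-attention, with one feature token per camera view or modality. It substitutes random whole-stream masking for the paper's patch masking, purely to illustrate training for robustness against view/modality collapse; the dimensions and masking rate are assumptions.

```python
# Sketch of feature-level fusion via multi-head self-attention over one token
# per camera view/modality, with random stream masking during training as a
# stand-in for the paper's patch masking. Dimensions and rates are assumed.
import torch
import torch.nn as nn

class MHSAFusion(nn.Module):
    def __init__(self, d=256, n_classes=10, p_mask=0.3):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.cls = nn.Linear(d, n_classes)
        self.p_mask = p_mask

    def forward(self, feats):                  # feats: (B, n_streams, d)
        if self.training:                      # randomly zero out whole streams
            keep = (torch.rand(feats.shape[:2], device=feats.device)
                    > self.p_mask).float().unsqueeze(-1)
            feats = feats * keep
        fused, _ = self.attn(feats, feats, feats)  # self-attention across streams
        return self.cls(fused.mean(dim=1))

model = MHSAFusion()
logits = model(torch.randn(8, 4, 256))         # 4 views/modalities per sample
print(logits.shape)                            # torch.Size([8, 10])
```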

    One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era

    Full text link
    OpenAI has recently released GPT-4 (a.k.a. ChatGPT Plus), which is demonstrated to be one small step for generative AI (GAI), but one giant leap for artificial general intelligence (AGI). Since its official release in November 2022, ChatGPT has quickly attracted numerous users, with extensive media coverage. Such unprecedented attention has also motivated numerous researchers to investigate ChatGPT from various aspects. According to Google Scholar, there are more than 500 articles with ChatGPT in their titles or mentioning it in their abstracts. Considering this, a review is urgently needed, and our work fills this gap. Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges. Moreover, we present an outlook on how ChatGPT might evolve to realize general-purpose AIGC (a.k.a. AI-generated content), which will be a significant milestone for the development of AGI. Comment: A Survey on ChatGPT and GPT-4, 29 pages. Feedback is appreciated ([email protected])

    In-situ crack and keyhole pore detection in laser directed energy deposition through acoustic signal and deep learning

    Full text link
    Cracks and keyhole pores are detrimental defects in alloys produced by laser directed energy deposition (LDED). Laser-material interaction sound may hold information about underlying complex physical events such as crack propagation and pore formation. However, due to the noisy environment and intricate signal content, acoustic-based monitoring in LDED has received little attention. This paper proposes a novel acoustic-based in-situ defect detection strategy for LDED. The key contribution of this study is an in-situ acoustic signal denoising, feature extraction, and sound classification pipeline that incorporates convolutional neural networks (CNNs) for online defect prediction. Microscope images are used to identify the locations of cracks and keyhole pores within a part, and the defect locations are spatiotemporally registered with the acoustic signal. Various acoustic features corresponding to defect-free regions, cracks, and keyhole pores are extracted and analyzed in time-domain, frequency-domain, and time-frequency representations. The CNN model is trained to predict defect occurrences using the Mel-Frequency Cepstral Coefficients (MFCCs) of the laser-material interaction sound, and is compared to various classic machine learning models trained on the denoised and raw acoustic datasets. The validation results show that the CNN model trained on the denoised dataset outperforms the others, with the highest overall accuracy (89%), keyhole pore prediction accuracy (93%), and AUC-ROC score (98%). Furthermore, the trained CNN model can be deployed into an in-house developed software platform for online quality monitoring. This is the first study to use acoustic signals with deep learning for in-situ defect detection in the LDED process. Comment: 36 pages, 16 figures, accepted at the journal Additive Manufacturing
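
    As a minimal, hedged sketch of the MFCC-plus-CNN idea (the sampling rate, window length, and network are illustrative, not those of the paper), the following extracts MFCCs from a short acoustic window and scores it against three classes: defect-free, crack, and keyhole pore.

```python
# Minimal sketch of the MFCC-plus-CNN pipeline; parameters and the synthetic
# audio stand-in are illustrative, not the paper's settings.
import librosa
import numpy as np
import torch
import torch.nn as nn

sr = 44100
y = np.random.randn(sr).astype(np.float32)       # stand-in for 1 s of process audio
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)  # (20, frames)
x = torch.tensor(mfcc, dtype=torch.float32)[None, None]  # (1, 1, 20, frames)

cnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 3),            # defect-free / crack / keyhole pore
)
print(cnn(x).softmax(-1))        # class probabilities for this window
```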

    Audio-Visual Automatic Speech Recognition Towards Education for Disabilities

    Get PDF
    Education is a fundamental right that enriches everyone’s life. However, physically challenged people are often excluded from general and advanced education systems. Audio-Visual Automatic Speech Recognition (AV-ASR) based systems are useful for improving the education of physically challenged people by providing hands-free computing: they can communicate with the learning system through AV-ASR. However, tracing the lips correctly for the visual modality is challenging. This paper therefore addresses appearance-based visual features along with a co-occurrence statistical measure for visual speech recognition. Local Binary Pattern-Three Orthogonal Planes (LBP-TOP) and the Grey-Level Co-occurrence Matrix (GLCM) are proposed for extracting visual speech information. The experimental results show that the proposed system achieves 76.60% accuracy for visual speech recognition and 96.00% accuracy for audio speech recognition.
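
    The following sketch illustrates the two hand-crafted descriptors named in the abstract in simplified form: a plain LBP histogram on a single XY plane plus GLCM contrast and homogeneity statistics. Full LBP-TOP would repeat the LBP step on the XT and YT planes of the mouth-region video volume; the image size and feature settings here are assumptions.

```python
# Simplified sketch of the hand-crafted features named in the abstract:
# plain LBP on a single (XY) plane and GLCM contrast/homogeneity statistics.
# Full LBP-TOP would also compute LBP on the XT and YT planes of the video
# volume; that extension is omitted here.
import numpy as np
from skimage.feature import local_binary_pattern, graycomatrix, graycoprops

frame = (np.random.rand(64, 64) * 255).astype(np.uint8)   # stand-in lip ROI

# LBP histogram (uniform patterns, 8 neighbours, radius 1 -> values 0..9)
lbp = local_binary_pattern(frame, P=8, R=1, method="uniform")
lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

# Co-occurrence statistics at distance 1, horizontal offset
glcm = graycomatrix(frame, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)
features = np.concatenate([lbp_hist,
                           graycoprops(glcm, "contrast").ravel(),
                           graycoprops(glcm, "homogeneity").ravel()])
print(features.shape)   # combined visual speech feature vector
```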

    Subtitling Workshop with Professional Tools: How and Why to Teach Subtitling with WinCAPS Qu4ntum

    Get PDF
    This hands-on workshop is aimed at translation lecturers who teach some form of audiovisual translation (particularly subtitling) or media accessibility. The workshop combines an introduction to the commercial subtitling program WinCAPS Q4 with practical exercises so that attendees can familiarize themselves with the software. It covers the program's main functions, as well as the steps required to create a subtitle template and subsequently translate or revise it. Emphasis is placed on the main obstacles that the use of this program presents, and on what the teacher needs to know to resolve them during class. From the interface to the main keyboard shortcuts, this hands-on workshop aims to address both the most general and the most specific aspects of the program. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Application of Multichannel Active Vibration Control in a Multistage Gear Transmission System

    Get PDF
    Gears are among the most important parts of rotating machinery and power transmission devices. When gears mesh, vibration occurs due to factors such as gear machining errors, meshing stiffness, and meshing impact. The traditional FxLMS algorithm, a common active vibration control algorithm, has been widely studied and applied to active vibration control of gear transmission systems in recent years. However, it struggles to achieve good convergence speed and convergence precision at the same time. This paper proposes a variable-step-size multichannel FxLMS algorithm based on a sampling function, which accelerates convergence in the initial stage of iteration, improves convergence accuracy in the steady-state adaptive stage, and makes the modified algorithm more robust. Simulations verify the effectiveness of the algorithm. An experimental platform for active vibration control of the secondary gear transmission system is built: a piezoelectric actuator is installed on an additional gear shaft to form an active structure, equipped with a signal acquisition system and a control system, and the proposed variable-step-size multichannel FxLMS algorithm is experimentally verified. The experimental results show that the proposed algorithm converges more accurately than the traditional FxLMS algorithm, improving convergence accuracy by up to 123%.
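
    The abstract does not give the sampling-function step-size rule, so the sketch below is a generic single-channel FxLMS loop in which the step size simply decays from a large initial value to a small steady-state value, mimicking the fast-early/precise-late behavior described. The secondary path, signals, and decay schedule are all assumed.

```python
# Generic single-channel FxLMS sketch (not the paper's multichannel,
# sampling-function algorithm): the exponential step-size decay below merely
# mimics "fast early / precise late" adaptation. The secondary path is
# assumed known; in practice an estimate of it filters the reference.
import numpy as np

rng = np.random.default_rng(0)
N, L = 20000, 32                              # samples, adaptive filter length
s = np.array([0.0, 0.6, 0.3, 0.1])            # assumed secondary-path impulse response S(z)
x = rng.standard_normal(N)                    # reference signal (e.g., gear-mesh related)
d = np.convolve(x, rng.standard_normal(16), mode="full")[:N]  # primary vibration at sensor

w = np.zeros(L)                               # adaptive controller W(z)
y = np.zeros(N)                               # controller output history
xf = np.convolve(x, s, mode="full")[:N]       # filtered-x: reference through S(z)
e = np.zeros(N)

for n in range(L, N):
    y[n] = w @ x[n - L + 1:n + 1][::-1]       # control signal from current weights
    anti = s @ y[n:n - len(s):-1]             # control signal after the secondary path
    e[n] = d[n] - anti                        # residual vibration at the sensor
    mu = 0.05 * np.exp(-n / 4000) + 0.002     # step size: large early, small in steady state
    w += mu * e[n] * xf[n - L + 1:n + 1][::-1]  # FxLMS weight update

print("residual variance early:", e[L:2000].var(), "late:", e[-2000:].var())
```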