
    Analysis of Vision based Techniques for the Translation of Indian Sign Language

    Sign language is a medium of communication within the deaf and mute community, but it is not easily understood by the general public. Considerable research has been done to bridge this gap by developing Sign Language Recognition (SLR) methodologies; studies estimate that 1 in every 5 deaf people is Indian. This paper presents a thorough review of these methodologies, comparing and contrasting their various aspects. It includes an overview of preprocessing methods such as segmentation, morphological image processing, and cropping, and of feature extraction techniques such as Fourier descriptors, image moments, eigenvalues, and MediaPipe. The study also covers classification models ranging from distance metrics to kernel-based approaches and feedforward neural networks, along with deep learning methods such as CNNs, LSTMs, GANs, and Transformers.
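    As a rough illustration of the MediaPipe-based feature extraction this survey discusses, the sketch below turns a single frame into a flat hand-landmark vector that could feed any of the classifiers mentioned (distance metrics, kernel methods, feedforward networks). The input image name and the downstream classifier are placeholders, not taken from any of the reviewed papers.

```python
import cv2
import mediapipe as mp
import numpy as np

# Extract a 21-point hand-landmark vector from one frame with MediaPipe Hands.
mp_hands = mp.solutions.hands

def landmark_features(image_bgr):
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        result = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
        if not result.multi_hand_landmarks:
            return None  # no hand detected in this frame
        lm = result.multi_hand_landmarks[0].landmark
        # Flatten normalized (x, y, z) coordinates into a single 63-dim vector.
        return np.array([[p.x, p.y, p.z] for p in lm]).flatten()

# frame = cv2.imread("sign_frame.jpg")   # placeholder input image
# features = landmark_features(frame)    # feature vector for a downstream classifier
```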

    On-device Real-time Custom Hand Gesture Recognition

    Most existing hand gesture recognition (HGR) systems are limited to a predefined set of gestures. However, users and developers often want to recognize new, unseen gestures. This is challenging due to the vast diversity of plausible hand shapes; it is impossible for developers to include every hand gesture in a predefined list. In this paper, we present a user-friendly framework that lets users easily customize and deploy their own gesture recognition pipeline. Our framework provides a pre-trained single-hand embedding model that can be fine-tuned for custom gesture recognition. Users can perform gestures in front of a webcam to collect a small number of images per gesture. We also offer a low-code solution to train and deploy the custom gesture recognition model, making the framework usable for those with limited ML expertise, and a no-code web front-end for users without any ML expertise, making it even easier to build and test the end-to-end pipeline. The resulting custom HGR model is then ready to run on-device for real-time scenarios by calling a simple function in our open-sourced model inference API, MediaPipe Tasks. The entire process takes only a few minutes.
    Comment: 5 pages, 6 figures; accepted to the ICCV Workshop on Computer Vision for Metaverse, Paris, France, 2023
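    The on-device inference step runs through MediaPipe Tasks; a minimal sketch of invoking the gesture recognizer from Python, assuming a custom model bundle exported by the framework (the model path and image file are placeholders), might look like this:

```python
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Load a (custom) gesture recognizer model exported as a .task bundle.
base_options = python.BaseOptions(model_asset_path="gesture_recognizer.task")
options = vision.GestureRecognizerOptions(base_options=base_options)
recognizer = vision.GestureRecognizer.create_from_options(options)

# Run recognition on a single image; video and live-stream modes also exist.
image = mp.Image.create_from_file("hand.jpg")
result = recognizer.recognize(image)
if result.gestures:
    top = result.gestures[0][0]
    print(top.category_name, top.score)
```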

    FINGER GESTURE RECOGNITION FOR CONTROLLING COMPUTER VOLUME USING THE OPENCV AND MEDIAPIPE LIBRARIES

    Gesture recognition is a branch of artificial intelligence in the field of computer vision: with gesture recognition, a computer can understand movements captured by a camera or webcam. Its benefits are many; the one investigated here is hand-tracking recognition of right-hand finger gestures to adjust the volume control on a computer or laptop. Against this background, this research applies machine learning built on the OpenCV and MediaPipe libraries to train and test finger gestures as commands for controlling one of the functions in Windows, namely volume control. OpenCV and MediaPipe are used because they support multiprocessing on real-time data, making gesture identification faster and more accurate. When the camera/webcam captures a frame of the user's right-hand finger movement, the frame is augmented and keypoint-localization landmarks are assigned to each finger joint. In this study, only the thumb-tip and index-fingertip landmarks are used. The system computes the distance between the tip of the thumb and the tip of the index finger, which determines the change in sound volume. From nine trials with different finger poses, an accuracy of 88.89% was obtained. One trial failed to read the finger gesture because the index fingertip landmark was occluded by the other fingers.
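    The core of the method, measuring the distance between the thumb tip and index fingertip landmarks and mapping it to a volume level, can be sketched as follows. The distance range and the volume-setting call are illustrative assumptions; on Windows the actual volume change would typically go through a separate library such as pycaw.

```python
import math

THUMB_TIP, INDEX_TIP = 4, 8  # MediaPipe hand-landmark indices

def pinch_to_volume(hand_landmarks, min_d=0.03, max_d=0.25):
    """Map the thumb-to-index fingertip distance to a 0-100 volume level."""
    t = hand_landmarks.landmark[THUMB_TIP]
    i = hand_landmarks.landmark[INDEX_TIP]
    d = math.hypot(t.x - i.x, t.y - i.y)   # distance in normalized image coordinates
    d = min(max(d, min_d), max_d)          # clamp to a usable range (assumed values)
    return round((d - min_d) / (max_d - min_d) * 100)

# volume = pinch_to_volume(results.multi_hand_landmarks[0])
# set_system_volume(volume)   # placeholder: e.g. via pycaw on Windows
```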

    REMOTE CONTROL OF A FAN THROUGH COMPUTER VISION-BASED FINGER GESTURE RECOGNITION

    Artificial intelligence is developing very rapidly, and computer vision is the branch of artificial intelligence that processes visual data. Microcontrollers are likewise developing rapidly to ease human tasks. This research controls a fan remotely by detecting hand finger gestures using MediaPipe together with a WeMos module based on the MQTT protocol. The experimental results show a detection accuracy of 100% at 28.6 FPS for the turn-off gesture (0, a clenched fist) at a distance of 100 cm, 94% at 29.9 FPS for the gesture that turns the fan on at speed level 1, 99% at 28.6 FPS for gesture 2, which sets the fan to speed level 2, and 99% at 31.4 FPS for the gesture that sets the fan to speed level 3. Testing at 150 cm showed 100% accuracy for gestures 0, 1, 2, and 3 across five people. At 200 cm, gestures 0 and 1 reached 80% accuracy, while gestures 2 and 3 maintained 100% accuracy. Beyond 200 cm, gestures 0 and 1 reached 80% accuracy, gesture 2 reached 100%, and gesture 3 reached 90%.
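    A minimal sketch of the control side, counting raised fingers from MediaPipe hand landmarks and publishing the resulting fan level over MQTT to the WeMos module, is shown below. The broker address, topic, and finger-counting heuristic are illustrative assumptions rather than details from the paper.

```python
import paho.mqtt.client as mqtt

FINGER_TIPS = [8, 12, 16, 20]   # index, middle, ring, pinky tip landmarks
FINGER_PIPS = [6, 10, 14, 18]   # corresponding PIP joints

def count_raised_fingers(hand_landmarks):
    """Crude finger count: a finger is 'up' if its tip is above its PIP joint."""
    lm = hand_landmarks.landmark
    return sum(lm[tip].y < lm[pip].y for tip, pip in zip(FINGER_TIPS, FINGER_PIPS))

client = mqtt.Client()                 # paho-mqtt 1.x style; 2.x needs a CallbackAPIVersion
client.connect("192.168.1.10", 1883)   # placeholder broker address
# fan_level = count_raised_fingers(results.multi_hand_landmarks[0])
# client.publish("home/fan/speed", str(fan_level))  # placeholder topic read by the WeMos
```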

    Towards the extraction of robust sign embeddings for low resource sign language recognition

    Isolated Sign Language Recognition (SLR) has mostly been applied to datasets containing signs executed slowly and clearly by a limited group of signers. In real-world scenarios, however, we are met with challenging visual conditions, coarticulated signing, small datasets, and the need for signer-independent models. To tackle this difficult problem, we require a robust feature extractor to process sign language videos. One could expect human pose estimators to be ideal candidates. However, due to a domain mismatch with their training sets and the challenging poses in sign language, they lack robustness on sign language data, and image-based models often still outperform keypoint-based models. Furthermore, whereas the common practice of transfer learning with image-based models yields even higher accuracy, keypoint-based models are typically trained from scratch on every SLR dataset. These factors limit their usefulness for SLR. From the existing literature, it is also not clear which, if any, pose estimator performs best for SLR. We compare the three most popular pose estimators for SLR: OpenPose, MMPose and MediaPipe. We show that through keypoint normalization, missing keypoint imputation, and learning a pose embedding, we can obtain significantly better results and enable transfer learning. We show that keypoint-based embeddings contain cross-lingual features: they can transfer between sign languages and achieve competitive performance even when fine-tuning only the classifier layer of an SLR model on a target sign language. We furthermore achieve better performance with fine-tuned transferred embeddings than with models trained only on the target sign language. The embeddings can also be learned in a multilingual fashion. The application of these embeddings could prove particularly useful for low-resource sign languages in the future.
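    The keypoint normalization and missing-keypoint imputation steps can be illustrated roughly as follows: centering keypoints on the shoulder midpoint, scaling by shoulder distance, and linearly interpolating missing detections over time. The exact scheme in the paper may differ; this is only a sketch with assumed MediaPipe shoulder indices.

```python
import numpy as np

def normalize_keypoints(frames, left_sh=11, right_sh=12):
    """frames: (T, K, 2) array of keypoints; NaN marks missing detections."""
    mid = (frames[:, left_sh] + frames[:, right_sh]) / 2                 # shoulder midpoint
    scale = np.linalg.norm(frames[:, left_sh] - frames[:, right_sh], axis=-1)
    scale = np.where(scale > 0, scale, 1.0)                              # avoid division by zero
    return (frames - mid[:, None, :]) / scale[:, None, None]

def impute_missing(frames):
    """Linearly interpolate missing keypoints along the time axis."""
    out = frames.copy()
    t = np.arange(out.shape[0])
    for k in range(out.shape[1]):
        for c in range(out.shape[2]):
            v = out[:, k, c]
            bad = np.isnan(v)
            if bad.any() and (~bad).any():
                v[bad] = np.interp(t[bad], t[~bad], v[~bad])
    return out
```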

    Design and Build a Prayer Rak’ah Reminder Device for Elderly People with Pose Detection Using MediaPipe Based on Raspberry Pi

    Establishing the five obligatory prayers is a necessity that Muslims must undertake. Difficulties often arise for people with memory impairments, such as the elderly; common obstacles include forgetting the rak'ah count and struggling to remember the next pose to be performed. New technologies, including digital imaging, can help with these problems through pose detection using the MediaPipe library. MediaPipe is used to determine the visibility of body parts and the joint angles captured by the webcam in order to detect the pose being performed. Once the pose is detected, the rak'ah count and the current pose are shown on an LED matrix display. The results of this study show that the success rate in identifying ruku' is 93.73%, i'tidal 94.12%, sujud 92.55%, first tahiyah 89.17%, and final tahiyah 82%, with the highest rate of 98.04% for the standing pose. The pose detection success rates by distance between the performer and the webcam are 91.88% at 150 cm, 92.42% at 200 cm, and 93.75% at 250 cm, so the highest success rate is at 250 cm. The system's average delay for detecting a pose is 1.028 seconds.
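    Joint angles of the kind the system relies on can be computed from three MediaPipe Pose landmarks. The sketch below shows the angle computation; the specific joints and threshold used to recognize each prayer pose are hypothetical examples, not the paper's actual rules.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at landmark b (degrees) formed by landmarks a-b-c.
    Each argument is a MediaPipe landmark with .x and .y attributes."""
    p1 = np.array([a.x - b.x, a.y - b.y])
    p2 = np.array([c.x - b.x, c.y - b.y])
    cos = np.dot(p1, p2) / (np.linalg.norm(p1) * np.linalg.norm(p2) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Hypothetical rule: ruku' if the hip angle (shoulder-hip-knee) is near 90 degrees.
# angle = joint_angle(lm[12], lm[24], lm[26])   # right shoulder, hip, knee
# is_ruku = 70 <= angle <= 110
```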

    EgoBlur: Responsible Innovation in Aria

    Project Aria pushes the frontiers of egocentric AI with large-scale real-world data collection using purpose-designed glasses and a privacy-first approach. To protect the privacy of bystanders recorded by the glasses, our research protocols ensure that recorded video is processed by an AI anonymization model that removes bystander faces and vehicle license plates. Detected face and license plate regions are processed with a Gaussian blur so that these personally identifiable information (PII) regions are obscured. This process helps ensure that anonymized versions of the video are retained for research purposes. In Project Aria, we have developed a state-of-the-art anonymization system, EgoBlur. In this paper, we present an extensive analysis of EgoBlur on challenging datasets, comparing its performance with other state-of-the-art systems from industry and academia, including an extensive Responsible AI analysis on the recently released Casual Conversations V2 dataset.
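    The anonymization step itself, applying a Gaussian blur to detected face and license-plate regions, can be sketched with OpenCV as below; the detector call and the box format are placeholders, since EgoBlur's own detection models are not described here.

```python
import cv2

def blur_regions(frame, boxes, kernel=(51, 51)):
    """Apply a Gaussian blur to each detected PII region.
    boxes: iterable of (x, y, w, h) pixel rectangles from a face/plate detector."""
    out = frame.copy()
    for x, y, w, h in boxes:
        roi = out[y:y + h, x:x + w]
        out[y:y + h, x:x + w] = cv2.GaussianBlur(roi, kernel, 0)
    return out

# boxes = pii_detector(frame)            # placeholder: face / license-plate detector
# anonymized = blur_regions(frame, boxes)
```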

    Pilates Pose Classification Using MediaPipe and Convolutional Neural Networks with Transfer Learning

    A sedentary lifestyle can lead to heart disease, cancer, and type 2 diabetes. Pilates, an anaerobic exercise, can address these problems. Although Pilates training provides health benefits, the heavy load of Pilates poses may cause severe muscle injury if they are not performed properly, and surveys have found that many teenagers are unaware of the movements in Pilates poses. Therefore, a system is needed to help users classify Pilates poses accurately. MediaPipe is a system that accurately extracts the human body skeleton in real time, and a Convolutional Neural Network (CNN) with transfer learning is an accurate method for image classification. Several studies have investigated Pilates pose classification, but none has yet applied MediaPipe as a skeleton feature extractor together with a CNN with transfer learning, and previous research has not implemented Pilates pose classification in real time. Based on this problem, this study builds a system that uses MediaPipe as a feature extractor and a CNN with transfer learning as a real-time Pilates pose classifier. The system runs on a mobile device and takes its input from a camera sensor. The output from MediaPipe is then classified by pre-trained CNN architectures with transfer learning: MobileNetV2, Xception, and ResNet50. The best model was MobileNetV2, with an F1 score of 98%. Ten people with little knowledge of Pilates also tested the system; they all agreed that the app could accurately identify Pilates poses, make people more interested in Pilates, and help them learn more about it.
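    A typical transfer-learning setup of the kind described, MobileNetV2 pretrained on ImageNet with a frozen backbone and a new classification head, could look roughly like the following. The input shape, number of Pilates classes, and training details are assumptions for illustration, not the paper's exact configuration.

```python
import tensorflow as tf

NUM_CLASSES = 5  # placeholder: number of Pilates poses

# Pretrained backbone without its ImageNet classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pretrained weights

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```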

    Design and development of a telerehabilitation app

    Background: Neuromusculoskeletal injuries are common, and after medication and surgery most patients still suffer from physical deficits and even psychological disorders because of the lack of scientific and effective rehabilitation. The high cost of rehabilitation makes it impossible for many patients to receive a complete, scientifically grounded post-operative programme, so self-training at home can give most patients the opportunity to receive good treatment. Objective: To enable patients to rehabilitate themselves at home, a new telemedicine solution is proposed in which patients play a simple rehabilitation serious game with just an ordinary computer and a camera. Apart from the development phase, no further involvement of occupational therapists is required, significantly reducing the cost and complexity of rehabilitation. Methods: This study develops a simple serious game for playing the piano based on OpenCV, MediaPipe and Unity. The game is designed with the user in mind, recording game data to facilitate the analysis and processing of user data and tracking each user's progress, which makes the game more beneficial and applicable. Results: The system saves all results on the web, allows simple rehabilitation using only a computer, records patient progress, and stores observed behavioural information and expressions; positive feedback was received during user testing, suggesting that this is indeed a viable solution. Conclusion: The combined results suggest that a serious game is a viable rehabilitation treatment, not only as an excellent option for those who are financially disadvantaged but also as good complementary training for those who can afford expensive rehabilitation. Overall, serious games can benefit rehabilitation by providing measurable data on each session and quantifying the patient's training.
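    One plausible way the camera-driven piano interaction could work is to map a MediaPipe fingertip position onto an on-screen key and forward key events to the Unity game; the sketch below is only an illustrative assumption of that idea, not the paper's implementation.

```python
# Map a fingertip's normalized x position onto one of N piano keys.
NUM_KEYS = 8  # placeholder: number of on-screen keys

def key_under_fingertip(fingertip, num_keys=NUM_KEYS):
    """fingertip: MediaPipe landmark with .x in [0, 1]. Returns a key index."""
    x = min(max(fingertip.x, 0.0), 1.0)
    return min(int(x * num_keys), num_keys - 1)

# tip = hand_landmarks.landmark[8]   # index fingertip
# key = key_under_fingertip(tip)
# send_to_unity(key)                 # placeholder: e.g. over a local socket to the game
```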

    Using Cost Simulation and Computer Vision to Inform Probabilistic Cost Estimates

    Cost estimating is a critical task in the construction process. Building cost estimates from historical data on previously performed projects has long been recognized as one of the better ways to generate precise construction bids. However, cost and productivity data are typically gathered at the summary level for cost-control purposes, so the possible ranges of production rates and costs associated with construction activities lack accuracy and comprehensiveness, and the robustness of the resulting cost estimates is minimal. This study therefore explores a range of cost and productivity data to better inform potential cost-estimate outcomes, using probabilistic cost simulation and computer vision techniques for activity production rate analysis. Chapter two employed the Monte Carlo simulation approach to compute a range of cost outcomes and find the optimal construction methods for large-scale concrete construction; the probabilistic cost simulation helps decision-makers better understand the probable cost consequences of different construction methods and make more informed decisions based on project characteristics. Chapter three experimented with a computer vision-based skeletal pose estimation model and a recurrent neural network to recognize human activities; the activity recognition algorithm was used to translate construction activities into productivity information for automated labor productivity tracking. Chapter four implemented computer vision-based object detection and object tracking algorithms to automatically collect construction productivity data, which was then used to inform the probabilistic cost estimates, with Monte Carlo simulation adopted to explore potential cost outcomes and sensitive cost factors in the overall construction project. The study demonstrates how computer vision techniques and probabilistic cost simulation improve the reliability of cost estimates to support construction decision-making. Advisor: Philip Baruth
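    The probabilistic cost simulation amounts to sampling activity production rates and unit costs many times and inspecting the resulting cost distribution. A minimal Monte Carlo sketch with assumed triangular distributions (the quantities and parameters below are placeholders, not project data) follows.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 10_000  # number of Monte Carlo iterations

# Placeholder activity: concrete quantity with triangular production rate and crew cost.
quantity = 500.0                                        # m^3 to place
prod_rate = rng.triangular(8, 10, 14, size=N)           # m^3 per crew-day (low, mode, high)
crew_cost = rng.triangular(1800, 2000, 2600, size=N)    # $ per crew-day

total_cost = quantity / prod_rate * crew_cost           # one cost outcome per iteration
print(f"mean cost:  ${total_cost.mean():,.0f}")
print(f"P10 / P90:  ${np.percentile(total_cost, 10):,.0f} / "
      f"${np.percentile(total_cost, 90):,.0f}")
```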