This study presents the development of a speech-to-text (STT) keyword spotting (KWS) system for the Papiamento language, designed for use in healthcare settings in Aruba. Although widely spoken across the former Leeward Netherlands Antilles, Papiamento remains absent from most mainstream voice recognition technologies, such as Google Assistant, Siri, and Alexa. This gap reflects a broader issue in AI technologies: low-resource languages such as Papiamento face barriers to digital inclusion due to limited data availability, a lack of localization tools, and minimal investment in tailored solutions.
This research adopts a machine learning-based approach inspired by the Speech Commands dataset (Warden, 2018). Medical professionals from the Instituto Medico San Nicolas Hospital (ImSan) provided essential Papiamento healthcare keywords, which local participants then recorded using a custom-built web-based recording tool, resulting in a dataset of 16,800 samples. A convolutional neural network (CNN) was then trained to classify these keywords and later converted to a TensorFlow Lite (TFLite) model for deployment on a Raspberry Pi smart speaker prototype. The implementation applies core software engineering practices (stakeholder interviews for requirements elicitation, iterative refinement of system goals, and use-case modeling based on real-world Aruban healthcare scenarios) to ensure both technical robustness and practical relevance.
This study also contributes to technology and engineering by demonstrating a deployable, edge-optimized speech system for Papiamento, and it offers a blueprint for similar efforts in other underrepresented communities. The model achieved an accuracy of 96.7%, suggesting that it is feasible to implement Papiamento-compatible STT systems in real-world healthcare settings. The study also acknowledges limitations, such as dataset size, pronunciation variability, and audio quality issues, that need to be addressed in future work. Although this topic is still in development, this research lays a foundation for AI-driven language inclusivity and provides a promising starting point for future studies to expand STT support for underrepresented languages.
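As a rough illustration of the classification step described above, the sketch below shows how the raw output scores of a keyword classifier might be turned into a spotting decision. It is not the study's implementation: the label set, the `_unknown_` class, the function names, and the 0.8 confidence threshold are all assumptions made for the example, and the scores are passed in by hand rather than produced by a trained CNN or TFLite interpreter.

```python
import math

# Hypothetical label set for illustration only; the study's actual
# keyword list is not reproduced here.
LABELS = ["awa", "dolor", "yudansa", "_unknown_"]


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def spot_keyword(logits, labels=LABELS, threshold=0.8):
    """Return (keyword, confidence), or (None, confidence) when rejected.

    A prediction is rejected when the top probability falls below the
    (assumed) threshold or when the top class is the '_unknown_' label.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold or labels[best] == "_unknown_":
        return None, probs[best]
    return labels[best], probs[best]


if __name__ == "__main__":
    # One clearly dominant score, then a near-uniform set of scores.
    print(spot_keyword([0.1, 5.0, 0.2, 0.0]))
    print(spot_keyword([1.0, 1.1, 0.9, 1.0]))
```

Rejecting low-confidence predictions is one common way to limit false activations on a device such as a smart speaker; the appropriate threshold would need to be tuned on held-out recordings.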