    Arabic Phoneme Learning Challenges for Madurese Students and the Solutions

    This article discusses the challenges that students at INSTIKA Madura face in pronouncing Arabic phonemes. Phoneme pronunciation is a fundamental principle in Arabic: without correct phoneme pronunciation, the language cannot be understood. The pronunciation problems were investigated and solutions were formulated based on an analysis of their underlying factors. A qualitative descriptive research design was used with a case-study approach. Data were collected through interviews with lecturers and students, direct observation of in-class learning, and documentation of lecturer notes. Data analysis followed the interactive model of Miles, Huberman, and Saldana. Validity was ensured through persistent observation, triangulation, and expert review. The findings revealed problems with Arabic phonemes categorized as Akhtha’ al-Harakat (vowel errors), Akhtha’ al-Ibdal (sound substitution), Akhtha’ al-Hadzf (sound deletion), and Akhtha’ al-Tahrif (sound distortion). Contributing factors include linguistic problems (characteristics of the first and second languages) and non-linguistic problems (student characteristics, lecturer competence, learning strategies, lesson materials, and learning facilities). The proposed solutions include error analysis and contrastive analysis for the linguistic problems, and motivation, diagnosis, cooperative learning, detailed examples, pronunciation exercises, and adequate facilities for the non-linguistic problems. This research provides a comprehensive study of the challenges of pronouncing Arabic phonemes at INSTIKA Madura: specific error types and the underlying factors that affect pronunciation were identified, and practical solutions addressing both linguistic and non-linguistic aspects were proposed to improve students' pronunciation skills. These findings offer valuable insights for educators, curriculum developers, and language instructors, enabling targeted interventions and effective teaching strategies for students struggling with Arabic phonetics.

    Real-time object detection method based on improved YOLOv4-tiny

    The "You only look once v4"(YOLOv4) is one type of object detection methods in deep learning. YOLOv4-tiny is proposed based on YOLOv4 to simple the network structure and reduce parameters, which makes it be suitable for developing on the mobile and embedded devices. To improve the real-time of object detection, a fast object detection method is proposed based on YOLOv4-tiny. It firstly uses two ResBlock-D modules in ResNet-D network instead of two CSPBlock modules in Yolov4-tiny, which reduces the computation complexity. Secondly, it designs an auxiliary residual network block to extract more feature information of object to reduce detection error. In the design of auxiliary network, two consecutive 3x3 convolutions are used to obtain 5x5 receptive fields to extract global features, and channel attention and spatial attention are also used to extract more effective information. In the end, it merges the auxiliary network and backbone network to construct the whole network structure of improved YOLOv4-tiny. Simulation results show that the proposed method has faster object detection than YOLOv4-tiny and YOLOv3-tiny, and almost the same mean value of average precision as the YOLOv4-tiny. It is more suitable for real-time object detection.Comment: 14pages,7figures,2table

    Deep Learning for Audio Segmentation and Intelligent Remixing

    Audio segmentation divides an audio signal into homogeneous sections such as music and speech. It is useful as a preprocessing step to index, store, and modify audio recordings, radio broadcasts, and TV programmes. Machine learning models for audio segmentation are generally trained on copyrighted material, which cannot be shared across research groups, and annotating these datasets is a time-consuming and expensive task. In this thesis, we present a novel approach that artificially synthesises data resembling radio signals. We replicate the workflow of a radio DJ in mixing audio and investigate parameters such as fade curves and audio ducking. Using this approach, we obtained state-of-the-art performance for music-speech detection on in-house and public datasets. After demonstrating the efficacy of training set synthesis, we investigated how audio ducking of background music affects the precision and recall of the machine learning algorithm. Interestingly, the minimum level of audio ducking preferred by the algorithm was similar to that preferred by human listeners. Furthermore, our proposed synthesis technique outperforms real-world data in some cases and serves as a promising alternative.

    This project also proposes a novel deep learning system called You Only Hear Once (YOHO), inspired by the YOLO algorithm popularly adopted in computer vision. We convert the detection of acoustic boundaries into a regression problem instead of frame-based classification. The relative improvement in F-measure of YOHO, compared to the state-of-the-art convolutional recurrent neural network, ranged from 1% to 6% across multiple datasets. As YOHO predicts acoustic boundaries directly, inference and post-processing are 6 times faster than frame-based classification. Furthermore, we investigate domain generalisation methods such as transfer learning and adversarial training, and demonstrate that these methods help our algorithm perform better in unseen domains.

    In addition to audio segmentation, another objective of this project is to explore real-time radio remixing, a step towards building a customised radio station integrated with the listener's schedule. The system would remix music from the user's personal playlist and play snippets of diary reminders at appropriate transition points. The intelligent remixing is governed by the underlying audio segmentation and other deep learning methods. We also explore how individuals can communicate with intelligent mixing systems through non-technical language, and demonstrate that word embeddings help in understanding representations of semantic descriptors.
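
    As a rough illustration of the regression formulation, here is a minimal PyTorch sketch of a YOHO-style output head: instead of classifying every frame, each coarse time bin regresses a presence score and normalised start/stop boundaries per audio class. The bin count, class list, feature size, and loss weighting are assumptions for illustration, not the thesis implementation.

    import torch
    import torch.nn as nn

    N_CLASSES = 2  # e.g. music and speech (assumed)
    N_BINS = 26    # coarse time bins per input window (assumed)

    class YohoHead(nn.Module):
        """Predict (presence, start, stop) per class for each time bin."""
        def __init__(self, feat_dim=256):
            super().__init__()
            self.fc = nn.Linear(feat_dim, N_CLASSES * 3)

        def forward(self, feats):  # feats: (batch, N_BINS, feat_dim)
            out = self.fc(feats).view(-1, N_BINS, N_CLASSES, 3)
            presence = torch.sigmoid(out[..., 0])  # is the class active here?
            bounds = torch.sigmoid(out[..., 1:])   # start/stop within the bin
            return presence, bounds

    def yoho_loss(presence, bounds, tgt_presence, tgt_bounds):
        """BCE on presence; regress boundaries only where a class is active.
        Targets are float tensors with the same shapes as the predictions."""
        cls = nn.functional.binary_cross_entropy(presence, tgt_presence)
        mask = tgt_presence.unsqueeze(-1)  # no boundary gradient for absent classes
        reg = ((bounds - tgt_bounds) ** 2 * mask).sum() / mask.sum().clamp(min=1.0)
        return cls + reg

    Because each bin emits boundaries directly, a segment becomes a handful of predictions rather than hundreds of per-frame labels, which is where the reported speed-up in inference and post-processing comes from.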