765 research outputs found

Program latihan industri di Kolej Universiti Teknologi Tun Hussein Onn : kajian terhadap perlaksanaan sistem penilaian

Author: Mukhtar Affidah Mardziah
Publication date: 01/09/2004

This study is titled "Program Latihan Industri di Kolej Universiti Teknologi Tun Hussein Onn: Kajian Terhadap Perlaksanaan Sistem Penilaian" (Industrial Training Programme at Kolej Universiti Teknologi Tun Hussein Onn: A Study of the Implementation of the Assessment System). The sample consisted of 6 experts and 63 students involved in industrial training. Information was gathered using both qualitative and quantitative methods. The data were analysed to review the assessment methods in use and then to determine which parts of the assessment system need improvement. Overall, most respondents were of the opinion that the existing assessment system needs to be improved and systematised in line with ISO 9000 : 2001. Based on the results obtained and on guidance from experts at the KUiTTHO Industrial Training Unit, a "Buku Panduan Penilaian Latihan Industri" (Industrial Training Assessment Guidebook) was produced, containing concise guidance and appended forms that have been improved and modified. It is hoped that this product can be used in the future

Models and analysis of vocal emissions for biomedical applications

Publication venue: 'Firenze University Press'
Publication date: 31/05/2022

This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies

SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias

Author: Bian Yanyao
Li Sipan
Li Xiang
Liu Songxiang
Meng Helen
Weng Chao
Wu Zhiyong
Zhang Luwen
Publication date: 14/09/2023

Generative adversarial network (GAN)-based neural vocoders have been widely used in audio synthesis tasks due to their high generation quality, efficient inference, and small computation footprint. However, it is still challenging to train a universal vocoder which can generalize well to out-of-domain (OOD) scenarios, such as unseen speaking styles, non-speech vocalization, singing, and musical pieces. In this work, we propose SnakeGAN, a GAN-based universal vocoder, which can synthesize high-fidelity audio in various OOD scenarios. SnakeGAN takes a coarse-grained signal generated by a differentiable digital signal processing (DDSP) model as prior knowledge, aiming at recovering high-fidelity waveform from a Mel-spectrogram. We introduce periodic nonlinearities through the Snake activation function and anti-aliased representation into the generator, which further brings the desired inductive bias for audio synthesis and significantly improves the extrapolation capacity for universal vocoding in unseen scenarios. To validate the effectiveness of our proposed method, we train SnakeGAN with only speech data and evaluate its performance for various OOD distributions with both subjective and objective metrics. Experimental results show that SnakeGAN significantly outperforms the compared approaches and can generate high-fidelity audio samples including unseen speakers with unseen styles, singing voices, instrumental pieces, and nonverbal vocalization. Comment: Accepted by ICME 202
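
For readers unfamiliar with the Snake activation named in this abstract, the following is a minimal sketch of its commonly cited form, snake(x) = x + (1/alpha) * sin^2(alpha * x). It is not taken from the SnakeGAN paper: the function name `snake`, the scalar interface, and the default `alpha = 1.0` are assumptions made for illustration, and in a real generator alpha would typically be a learned per-channel parameter.

```python
import math


def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation: x + (1/alpha) * sin^2(alpha * x).

    Hypothetical scalar helper for illustration only; alpha controls the
    frequency of the periodic component and must be positive here.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return x + (math.sin(alpha * x) ** 2) / alpha


if __name__ == "__main__":
    # The identity term keeps the function monotonic (its derivative is
    # 1 + sin(2 * alpha * x), which is never negative), while the sin^2
    # term adds a periodic ripple on top of it.
    for value in (-1.0, 0.0, 0.5, math.pi / 2):
        print(value, snake(value))
```

The periodic term is what the abstract refers to as a periodic inductive bias: unlike ReLU-style activations, the output keeps oscillating outside the range seen in training.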

arXiv.org e-Print Archive

Pushing the boundaries of photoconductive sampling in solids

Author: Altwaijry Najd Abdulaziz S.
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 05/06/2023

The advent of laser-based optical tools featuring few-cycle pulses with durations of less than a hundred femtoseconds in the late 1980s enabled scientists to initiate and observe the evolution of chemical reactions. This powerful approach combined the interactions of light and matter and unleashed an unprecedented metrology concept that tracks the interactions of atoms and molecules in their natural timescales. Electron wavepacket dynamics take place in the attosecond range, a thousand times faster than molecules. In optical terms, such durations typically last less than the half-cycle duration of optical fields. Consequently, the investigation of such electronic processes necessitates measurement techniques capable of resolving the oscillations of the electric field of light. The primary objective of this thesis is to develop and advance novel field characterisation techniques based on photoconductive sampling. The first portion of this thesis addresses broadband field characterisation based on nonlinear photoconductive sampling. A theoretical analysis of current formation and localisation in solids is presented, prompting the fabrication of a heterostructured sample with the aim of enhancing the magnitude of the signal obtained from the measurement technique. A thorough proof-of-principle experiment is performed, whereby a significant enhancement in signal magnitude is established. As a consequence of signal improvement, the heterostructured sample reaches the desired stability regime earlier than its traditional bulk counterparts. Moreover, the performance of the heterostructured sample for field characterisation is compared to fused silica and benchmarked against the well-established technique of electro-optic sampling. These results pave the way towards field sampling in low pulse energy systems. The following section details broadband field characterisation based on linear photoconductive sampling by employing tailored pulses from a waveform synthesiser. Visible-ultraviolet pulses are utilised to inject carriers in a common semiconductive material (gallium phosphide), enabling the complete characterisation of a mid-infrared test field. Furthermore, the technique is validated against electro-optic sampling. When compared to electro-optic sampling, the response function of linear photoconductive sampling is concerned with the intensity envelope of the gating field, relaxing the strict requisites on the temporal phase of the gate. The demonstrated results represent a significant achievement in extending field sampling techniques beyond 100 THz and towards the visible range. Finally, a machine learning-based algorithm for denoising waveforms obtained from a laboratory setting is developed and implemented. The algorithm is based on a one-dimensional convolutional neural network, ideal for processing data presented on an evenly spaced grid. The model is compared with well-established methodologies, namely denoising via the fast Fourier transform and wavelet analysis, and exhibits excellent performance, extending the repertoire of tools typically used for combating noise. The field characterisation methodologies presented in this thesis pave the way towards accessible and cost-effective field sampling techniques, enabling researchers to study field-induced electron dynamics in matter and usher in ultrafast optoelectronic signal processing towards the PHz range. In general, the field characterisation techniques presented occupy a small footprint, and the measurements take place in ambient air conditions, facilitating their integration in existing experimental infrastructures. With the aid of AI-accelerator chips, the machine learning tool developed in this thesis can be implemented during laboratory measurements as a concurrent denoising technique

Digitale Hochschulschriften der LMU

DeepVOX: Discovering Features from Raw Audio for Speaker Recognition in Degraded Audio Signals

Author: Chowdhury Anurag
Ross Arun
Publication date: 26/08/2020

Automatic speaker recognition algorithms typically use pre-defined filterbanks, such as Mel-Frequency and Gammatone filterbanks, for characterizing speech audio. The design of these filterbanks is based on domain-knowledge and limited empirical observations. The resultant features, therefore, may not generalize well to different types of audio degradation. In this work, we propose a deep learning-based technique to induce the filterbank design from vast amounts of speech audio. The purpose of such a filterbank is to extract features robust to degradations in the input audio. To this effect, a 1D convolutional neural network is designed to learn a time-domain filterbank called DeepVOX directly from raw speech audio. Secondly, an adaptive triplet mining technique is developed to efficiently mine the data samples best suited to train the filterbank. Thirdly, a detailed ablation study of the DeepVOX filterbanks reveals the presence of both vocal source and vocal tract characteristics in the extracted features. Experimental results on VOXCeleb2, NIST SRE 2008 and 2010, and Fisher speech datasets demonstrate the efficacy of the DeepVOX features across a variety of audio degradations, multi-lingual speech data, and varying-duration speech audio. The DeepVOX features also improve the performance of existing speaker recognition algorithms, such as the xVector-PLDA and the iVector-PLDA
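
The abstract mentions triplet mining without giving details, so the sketch below shows only the generic idea it builds on: a triplet margin loss combined with picking the hardest negative for an anchor. This is not the adaptive mining technique of the DeepVOX paper. The function names (`euclidean`, `triplet_loss`, `hardest_negative`), the Euclidean distance, and the margin of 1.0 are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Generic triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


def hardest_negative(anchor: Sequence[float],
                     negatives: List[Sequence[float]]) -> Sequence[float]:
    """Return the negative closest to the anchor (hypothetical mining rule)."""
    return min(negatives, key=lambda n: euclidean(anchor, n))


if __name__ == "__main__":
    anchor, positive = [0.0, 0.0], [1.0, 0.0]
    negatives = [[3.0, 0.0], [1.5, 0.0]]
    hard = hardest_negative(anchor, negatives)
    print(hard, triplet_loss(anchor, positive, hard))
```

Negatives that already sit far from the anchor contribute zero loss, which is why mining strategies focus training on the harder examples.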

arXiv.org e-Print Archive