226 research outputs found

    Word And Speaker Recognition System

    This report describes a system that combines user-dependent word recognition with text-dependent speaker recognition. Word recognition is the process of converting an audio signal, captured by a microphone, into a word. Speaker identification is the ability to recognize a person's identity based on the specific word he or she uttered. A person's voice contains various parameters that convey information such as gender, emotion, health, attitude and identity. Speaker recognition identifies the speaker based on the unique voiceprint in the speech data. The word recognition part of the project is implemented in MATLAB using Voice Activity Detection (VAD), Spectral Subtraction (SS), Mel-Frequency Cepstral Coefficients (MFCC), Vector Quantization (VQ), Dynamic Time Warping (DTW) and k-Nearest Neighbour (k-NN). The speaker recognition part uses Vector Quantization (VQ). The implemented system achieves a recognition rate of 84.44% for word recognition and 54.44% for speaker recognition.
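
    As a rough illustration of how DTW and nearest-neighbour matching can be combined for isolated word recognition, the sketch below aligns two sequences of feature vectors and picks the label of the closest template. It is not the report's MATLAB implementation: the function names `dtw_distance` and `classify`, the Euclidean local cost, the 1-NN rule, and the toy one-dimensional "features" are all assumptions made for the example.

```python
import math

def dtw_distance(a, b):
    """Dynamic Time Warping cost between two sequences of feature vectors.

    a, b: lists of equal-dimension tuples (e.g. one MFCC vector per frame).
    The local cost is the Euclidean distance (an assumption for this sketch).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b frame repeats
                                 cost[i][j - 1],      # b advances, a frame repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def classify(query, templates):
    """Return the label of the template nearest to `query` under DTW (1-NN).

    templates: dict mapping a word label to a list of template sequences.
    """
    scored = ((dtw_distance(query, seq), label)
              for label, seqs in templates.items()
              for seq in seqs)
    return min(scored)[1]

# Toy data: hypothetical one-dimensional features standing in for MFCC frames.
templates = {
    "yes": [[(0.0,), (1.0,), (2.0,)]],
    "no": [[(5.0,), (5.0,)]],
}
label = classify([(0.0,), (1.1,), (2.0,)], templates)
```

    Because DTW lets frames repeat, a slower utterance of the same word (for example, one with a duplicated middle frame) still aligns at low cost, which is why it suits template matching on words spoken at varying speeds.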

    Otimização de soluções de fotónica integrada para sistemas óticos de nova geração

    Next-generation optical systems can benefit greatly from optimized photonic integrated solutions. Photonic integrated circuits (PICs) are a promising technology given the current demand for flexibility and reconfigurability in optical systems and telecommunications networks. PIC-based optical systems offer an efficient and cost-effective response to growing data transmission demands. To contribute to the development of integrated photonic technology, this thesis investigates optimized PIC solutions addressing different steps of the PIC development chain, mainly design, testing, and packaging. Optical signal data compression techniques are progressing to sustain the fast processing and storage of large amounts of bandwidth-demanding data, and photonic integrated solutions are advantageous for implementing optical transforms such as the Haar transform (HT). This demand motivated research into an optimized PIC design in a silicon nitride (Si3N4) based platform comprising a two-level HT network for compression and a switching network that supplies all logical inputs of the HT network for testing and characterization. Optimized design models are proposed for the multimode interference structure, the key building block of the PIC design. Additionally, a first test and characterization was carried out of PIC solutions implementing the HT for compression applications in an indium phosphide (InP) based platform and in a new organic-inorganic hybrid material. Using a tunable lattice filter dispersion compensator in a Si3N4-based integrated platform, real-time extended-reach PAM-4 transmission over 40 km was demonstrated, with application in data center interconnects. To meet the need for accurate performance measurement of photonic integrated high-Q resonators, a technique based on an RF-calibrated Mach-Zehnder interferometer was implemented, together with Brillouin gain measurements through Lorentzian fitting analysis. Finally, because the technical and functional requirements of PICs demand thorough characterization and testing to predict their performance accurately, and current testing platforms can be expensive and inflexible, a proof of concept is proposed for a new flexible soft-packaging platform for photonic integrated processors and spatial division multiplexing systems, based on the spatial light modulation operating principle. Doctoral Programme in Telecommunications.
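
    For readers unfamiliar with the transform itself, the following is a purely numerical sketch of a two-level Haar transform. The thesis realizes the HT optically in a photonic circuit; this code only illustrates the underlying arithmetic. The function names `haar_step` and `haar_two_level`, the orthonormal 1/sqrt(2) scaling, and the output ordering are assumptions for the example and are not taken from the thesis.

```python
import math

def haar_step(x):
    """One Haar level: pairwise scaled sums (approximation) and differences (detail)."""
    s = 1.0 / math.sqrt(2.0)  # orthonormal scaling (an assumption of this sketch)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

def haar_two_level(x):
    """Two-level Haar transform of a sequence whose length is a multiple of 4.

    Returns level-2 approximation, then level-2 detail, then level-1 detail.
    """
    if len(x) == 0 or len(x) % 4 != 0:
        raise ValueError("input length must be a positive multiple of 4")
    a1, d1 = haar_step(x)    # first level acts on the raw samples
    a2, d2 = haar_step(a1)   # second level acts on the first-level approximation
    return a2 + d2 + d1

coeffs = haar_two_level([4.0, 4.0, 4.0, 4.0])
```

    A constant input concentrates all of its energy in the single level-2 approximation coefficient while every detail coefficient is zero, which is the property that makes the transform useful for compression: smooth signals need few non-zero coefficients.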

    A Speech recognition-based telephone auto-attendant

    This dissertation details the implementation of a real-time, speaker-independent telephone auto-attendant from first principles on limited-quality speech data. An auto-attendant is a computerized agent that answers the phone and, using speech recognition technology, conducts a limited dialogue to determine the caller's wishes before switching the caller through to the desired person's extension. The platform is a computer with a telephone interface card. The speech recognition engine uses whole-word hidden Markov modelling, with a limited vocabulary and a constrained (finite-state) grammar. The feature set is based on Mel-frequency-spaced cepstral coefficients. The Viterbi search is used together with the level-building algorithm to recognise speech within the utterances. Word-spotting techniques, including a "garbage" model, are used. Various techniques that compensate for noise and a varying channel transfer function are employed to improve the recognition rate. An Afrikaans conversational interface prompts the caller for information. Detailed experiments illustrate the sensitivity of the system to its parameters and show the influence of several techniques aimed at improving the recognition rate. Dissertation (MEng (Computer Engineering))--University of Pretoria, 2006. Electrical, Electronic and Computer Engineering.
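
    The Viterbi search mentioned above can be illustrated with a toy example. The sketch below decodes the most likely state sequence of a small discrete HMM in the log domain. It is a simplification and not the dissertation's engine, which uses whole-word models over cepstral features together with level building. The function name `viterbi`, the two-state model, and all probabilities are hypothetical.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely state path for a discrete HMM, computed with log-probabilities."""
    # best[t][s] = best log-probability of any path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t - 1][s] = best predecessor of state s at time t
    for o in obs[1:]:
        prev = best[-1]
        col, ptr = {}, {}
        for s in states:
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            col[s] = prev[p] + log_trans[p][s] + log_emit[s][o]
            ptr[s] = p
        best.append(col)
        back.append(ptr)
    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical two-state model: state "A" mostly emits "x", state "B" mostly emits "y".
states = ["A", "B"]
log_start = {s: math.log(0.5) for s in states}
log_trans = {r: {s: math.log(0.5) for s in states} for r in states}
log_emit = {
    "A": {"x": math.log(0.9), "y": math.log(0.1)},
    "B": {"x": math.log(0.1), "y": math.log(0.9)},
}
path, score = viterbi(["x", "x", "y", "y"], states, log_start, log_trans, log_emit)
```

    Working with log-probabilities turns products into sums, which avoids numerical underflow on long utterances.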

    Fusion of Audio and Visual Information for Implementing Improved Speech Recognition System

    Speech recognition is a useful technology because it enables applications suited to a wide range of user needs. This research attempts to enhance the performance of a speech recognition system by combining visual features (lip movement) with audio features. The results were calculated using utterances of numerals collected from both male and female participants. Discrete Cosine Transform (DCT) coefficients were used to compute the visual features, and Mel Frequency Cepstral Coefficients (MFCC) were used to compute the audio features. Classification was then carried out using a Support Vector Machine (SVM). The results obtained from the combined (fused) system were compared with the recognition rates of two standalone systems (audio only and visual only).

    Advanced statistical methods for detecting the Epoch of Reionisation

    This thesis investigates the effect of widefield foreground contamination on the expected 21cm power spectrum by modelling these sources with multi-component 2D Gaussians. Additionally, we calculate the skew spectrum for a set of seven cosmological 21cm simulations, characterising how this higher-order statistic changes with respect to the X-ray heating and ionisation topologies. We then develop an analytical framework for estimating the interferometric skew spectrum, applying it to realistic foreground models and to a fiducial 21cm simulation.

    Advances in image acquisition and filtering for MRI neuroimaging at 7 tesla

    Performing magnetic resonance imaging at high magnetic field strength promises many improvements over low fields that are of direct benefit in functional neuroimaging. These include the possibility of improved signal-to-noise levels, increased BOLD functional contrast, and increased spatial specificity. However, human MRI at 7T and above suffers from unique engineering challenges that limit the achievable gains. In this thesis, three technological developments are introduced, each of which addresses a separate issue associated with functional magnetic resonance neuroimaging at very high magnetic field strengths. First, the image homogeneity problem is addressed by investigating methods of RF shimming, that is, modifying the excitation portion of the MRI experiment for use with multi-channel RF coils. It is demonstrated that in 2D MRI experiments, shimming on a slice-by-slice basis allows utilization of an extra degree of freedom available from the slice dimension, resulting in significant gains in image homogeneity and reduced RF power requirements. Once acceptable images are available, we move on to complications of high-field imaging that manifest in the fMRI time series. In the second paper, the increased physiological noise present in BOLD time series at high field is addressed with a unique data-driven noise regressor scheme based on information in the phase component of the MRI signal. It is demonstrated that this method identifies and removes a significant portion of physiological signals, and performs as well as or better than other popular data-driven methods that use only the magnitude signal information. Lastly, the BOLD phase signal is again leveraged to address the confounding role of veins in resting-state BOLD fMRI experiments. The phase regressor technique (previously developed by Dr. Menon) is modified and applied to resting-state fMRI to remove macrovascular contributions in the datasets, leading to changes in the spatial extent and connectivity of common resting-state networks in single subjects and at the group level.

    Speech recognition on DSP: algorithm optimization and performance analysis.

    Yuan Meng. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 85-91). Abstracts in English and Chinese.
    Chapter 1: Introduction
        1.1 History of ASR development
        1.2 Fundamentals of automatic speech recognition
            1.2.1 Classification of ASR systems
            1.2.2 Automatic speech recognition process
        1.3 Performance measurements of ASR
            1.3.1 Recognition accuracy
            1.3.2 Complexity
            1.3.3 Robustness
        1.4 Motivation and goal of this work
        1.5 Thesis outline
    Chapter 2: Signal processing techniques for front-end
        2.1 Basic feature extraction principles
            2.1.1 Pre-emphasis
            2.1.2 Frame blocking and windowing
            2.1.3 Discrete Fourier Transform (DFT) computation
            2.1.4 Spectral magnitudes
            2.1.5 Mel-frequency filterbank
            2.1.6 Logarithm of filter energies
            2.1.7 Discrete Cosine Transformation (DCT)
            2.1.8 Cepstral Weighting
            2.1.9 Dynamic featuring
        2.2 Practical issues
            2.2.1 Review of practical problems and solutions in ASR applications
            2.2.2 Model of environment
            2.2.3 End-point detection (EPD)
            2.2.4 Spectral subtraction (SS)
    Chapter 3: HMM-based Acoustic Modeling
        3.1 HMMs for ASR
        3.2 Output probabilities
        3.3 Viterbi search engine
        3.4 Isolated word recognition (IWR) & Connected word recognition (CWR)
            3.4.1 Isolated word recognition
            3.4.2 Connected word recognition (CWR)
    Chapter 4: DSP for embedded applications
        4.1 Classification of embedded systems (DSP, ASIC, FPGA, etc.)
        4.2 Description of hardware platform
        4.3 I/O operation for real-time processing
        4.4 Fixed point algorithm on DSP
    Chapter 5: ASR algorithm optimization
        5.1 Methodology
        5.2 Floating-point to fixed-point conversion
        5.3 Computational complexity consideration
            5.3.1 Feature extraction techniques
            5.3.2 Viterbi search module
        5.4 Memory requirements consideration
    Chapter 6: Experimental results and performance analysis
        6.1 Cantonese isolated word recognition (IWR)
            6.1.1 Execution time
            6.1.2 Memory requirements
            6.1.3 Recognition performance
        6.2 Connected word recognition (CWR)
            6.2.1 Execution time consideration
            6.2.2 Recognition performance
        6.3 Summary & discussion
    Chapter 7: Implementation of practical techniques
        7.1 End-point detection (EPD)
        7.2 Spectral subtraction (SS)
        7.3 Experimental results
            7.3.1 Isolated word recognition (IWR)
            7.3.2 Connected word recognition (CWR)
        7.4 Results
    Chapter 8: Conclusions and future work
        8.1 Summary and Conclusions
        8.2 Suggestions for future research
    Appendices
        A Interpolation of data entries without floating point, divides or conditional branches
        B Vocabulary for Cantonese isolated word recognition task
    Bibliography

    The SARAO MeerKAT 1.3 GHz Galactic Plane Survey

    We present the SARAO MeerKAT Galactic Plane Survey (SMGPS), a 1.3 GHz continuum survey of almost half of the Galactic Plane (251° ≤ l ≤ 358° and 2° ≤ l ≤ 61° at |b| ≤ 1.5°). SMGPS is the largest, most sensitive and highest angular resolution 1 GHz survey of the Plane yet carried out, with an angular resolution of 8" and a broadband RMS sensitivity of ~10-20 μJy/beam. Here we describe the first publicly available data release from SMGPS, which comprises data cubes of frequency-resolved images over 908-1656 MHz, power law fits to the images, and broadband zeroth moment integrated intensity images. A thorough assessment of the data quality and guidance for future usage of the data products are given. Finally, we discuss the tremendous potential of SMGPS by showcasing highlights of the Galactic and extragalactic science that it permits. These highlights include the discovery of a new population of non-thermal radio filaments; identification of new candidate supernova remnants, pulsar wind nebulae and planetary nebulae; improved radio/mid-IR classification of rare Luminous Blue Variables and discovery of associated extended radio nebulae; new radio stars identified by Bayesian cross-matching techniques; the realisation that many of the largest radio-quiet WISE HII region candidates are not true HII regions; and a large sample of previously undiscovered background HI galaxies in the Zone of Avoidance.