6,314 research outputs found

    Extensão artificial de largura de banda para sinais de fala em telefonia usando classificação fonética [Artificial bandwidth extension for telephone speech signals using phonetic classification]

    Dissertation (master's), Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia Elétrica, Florianópolis, 2016.
    This research work presents a new strategy for implementing artificial bandwidth extension (ABWE) systems for speech signals in the public switched telephone network (PSTN). Specifically, it proposes a strategy based on phonetic classification that aims to satisfactorily represent speech segments whose energy is concentrated at high frequencies, outperforming other approaches reported in the literature. To this end, feature selection techniques applied to band-limited speech signals are investigated, improving broad group phonetic (BGP) classification with an emphasis on discriminating phonemes that belong to the fricative group. In addition, the integration of the proposed ABWE system into automatic speech recognition (ASR) systems for Brazilian Portuguese applied to the PSTN is discussed. In particular, to improve ASR over the PSTN, the feature extraction and acoustic model (AM) construction stages are built on synthetic wideband (WB) signals estimated by enhancing narrowband (NB) signals with ABWE. The results show an improvement in the subjective quality of the reconstructed speech signals and a gain in ASR performance, confirming the effectiveness of the proposed strategies.

    Glottal-synchronous speech processing

    Glottal-synchronous speech processing is a field of speech science in which the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech, which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the electroglottograph signal, with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and prosodic manipulation, where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.
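
To illustrate what a glottal-synchronous frame might look like in practice, the following is a minimal sketch, not the thesis's method. It assumes the GCIs are already available as sample indices (the detection step, e.g. by SIGMA or YAGA, is not reproduced here) and cuts one windowed frame per interior GCI, spanning from the previous GCI to the next. The function name `glottal_synchronous_frames`, the Hann window, and the synthetic 100 Hz test signal with evenly spaced GCIs are all illustrative assumptions.

```python
import numpy as np

def glottal_synchronous_frames(signal, gcis):
    """Return (centre_gci, frame) pairs, one per interior GCI.

    Each frame spans two pitch periods: from the previous GCI to the next one.
    The GCI positions are assumed to be given; nothing is detected here.
    """
    frames = []
    for prev, cur, nxt in zip(gcis[:-2], gcis[1:-1], gcis[2:]):
        # Frame length follows the local pitch period instead of a fixed size.
        frame = signal[prev:nxt] * np.hanning(nxt - prev)
        frames.append((int(cur), frame))
    return frames

# Hypothetical test signal: 100 ms of a 100 Hz tone sampled at 8 kHz,
# with assumed GCIs every 80 samples (one per pitch period).
fs = 8000
t = np.arange(fs // 10) / fs
signal = np.sin(2 * np.pi * 100 * t)
gcis = np.arange(0, len(signal), 80)

frames = glottal_synchronous_frames(signal, gcis)
```

With real speech the GCIs are unevenly spaced, so the frames have different lengths; that is the point of the approach, in contrast to fixed-length framing.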

    Artificial Bandwidth Extension of Speech Signals using Neural Networks

    Although mobile wideband telephony has been standardized for over 15 years, many countries still do not have a nationwide network with good coverage. As a result, many cellphone calls are still downgraded to narrowband telephony. The resulting loss of quality can be reduced by artificial bandwidth extension, a field in which neural networks have brought great progress in recent years. The topic of this thesis is the enhancement of artificial bandwidth extension using neural networks. A special focus is given to hands-free calls in a car, where the risk of losing the wideband connection is high because of the fast movement. Narrowband transmission limits the bandwidth not only at high frequencies above 3.5 kHz but also at low frequencies below 300 Hz. Existing methods already estimate the low-frequency components quite well, so they are not covered in this thesis. In most bandwidth extension algorithms, the narrowband signal is first separated into a spectral envelope and an excitation signal. Both parts are then extended separately and finally recombined. While the excitation can be extended with simple methods without reducing the speech quality compared to wideband speech, the estimation of the spectral envelope for frequencies above 3.5 kHz is not yet satisfactorily solved. In most evaluations, current bandwidth extension algorithms reduce the quality loss due to narrowband transmission by at most 50%. This work proposes a modification of an existing excitation extension method that achieves slight improvements without adding computational complexity. To enhance wideband envelope estimation with neural networks, two modifications of the training process are proposed. First, the loss function is extended with a discriminative part to address the different characteristics of phoneme classes. Second, a GAN (generative adversarial network) is used in the training phase, so that a second network is added temporarily to evaluate the quality of the estimation. The trained neural networks are compared in subjective and objective evaluations. A final listening test addressed the scenario of a hands-free call in a car, which was simulated acoustically. With the proposed approach, the quality loss caused by the missing high-frequency components could be reduced by 60%.
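
The separation of a narrowband frame into a spectral envelope and an excitation signal, as described in the abstract above, can be sketched with linear prediction. This is an assumption made for illustration: the abstract does not say which decomposition the thesis uses, and the function names, the prediction order of 10, the 8 kHz sampling rate, and the synthetic test frame are all hypothetical. The sketch only shows the split and the recombination; the bandwidth extension of each part is not attempted.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC; returns A(z) coefficients with a[0] == 1."""
    windowed = frame * np.hamming(len(frame))
    n = len(frame)
    # Autocorrelation lags r[0] .. r[order].
    r = np.correlate(windowed, windowed, mode="full")[n - 1:n + order]
    # Toeplitz normal equations R a = r[1:], lightly regularized.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    predictor = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -predictor))

def split_envelope_excitation(frame, order=10):
    """Envelope is represented by A(z); excitation is the inverse-filtered frame."""
    a = lpc_coefficients(frame, order)
    excitation = np.convolve(frame, a)[:len(frame)]  # e[n] = sum_k a[k] x[n-k]
    return a, excitation

def recombine(a, excitation):
    """Pass the excitation through the all-pole filter 1/A(z)."""
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * out[n - k]
        out[n] = acc
    return out

# Hypothetical 30 ms narrowband frame at 8 kHz: two tones plus weak noise.
fs = 8000
rng = np.random.default_rng(0)
t = np.arange(240) / fs
frame = (np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
         + 0.01 * rng.standard_normal(len(t)))

a, excitation = split_envelope_excitation(frame)
rebuilt = recombine(a, excitation)
```

In a bandwidth extension system of the kind the abstract describes, the envelope and the excitation would each be extended to the wideband range before the recombination step; here they are recombined unchanged, which simply recovers the input frame.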