8 research outputs found

    Bilateral Waveform Similarity Overlap-and-Add Based Packet Loss Concealment for Voice over IP

    This paper investigates a bilateral waveform similarity overlap-and-add algorithm for concealing voice packet loss. Because packet loss can cause semantic misunderstanding, it has become one of the most important problems in speech communication. The investigation is based on a waveform similarity measure using the overlap-and-add algorithm and uses bilateral information to enhance speech signal reconstruction. The waveform similarity overlap-and-add (WSOLA) technique has been shown to be an effective algorithm for packet loss concealment (PLC) in real-time communication, and it is widely applied to both length adaptation and packet loss concealment of speech signals. Time-scale modification of audio signals is one of the most important research topics in data communication, especially in voice over IP (VoIP). The proposed bilateral WSOLA (BWSOLA) is derived from WSOLA. Instead of exploiting speech data in only one direction, the proposed method reconstructs the lost voice data from both the preceding and the following data. Related algorithms have been developed to achieve the optimal reconstruction estimate. Experimental results show that the quality of the speech signal reconstructed by bilateral WSOLA is much better than that of standard WSOLA and GWSOLA across different packet loss rates and lengths, as measured by PESQ and MOS. The significant improvement comes from the bilateral information and the proposed method. BWSOLA outperforms the traditional approaches, especially for long-duration data loss.

    Improving Time-Scale Modification of Music Signals Using Harmonic-Percussive Separation

    A major problem in time-scale modification (TSM) of music signals is that percussive transients are often perceptually degraded. To prevent this degradation, some TSM approaches try to explicitly identify transients in the input signal and to handle them in a special way. However, such approaches are problematic for two reasons. First, errors in the transient detection have an immediate influence on the final TSM result and, second, a perceptually transparent preservation of transients is far from a trivial task. In this paper we present a TSM approach that handles transients implicitly by first separating the signal into a harmonic component and a percussive component, which typically contains the transients. While the harmonic component is modified with a phase vocoder approach using a large frame size, the noise-like percussive component is modified with a simple time-domain overlap-add technique using a short frame size, which preserves the transients to a high degree without any explicit transient detection.
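
As a rough illustration of the "simple time-domain overlap-add technique" mentioned in this abstract, the following Python sketch stretches a signal by copying windowed frames from evenly spaced input positions to more widely (or more narrowly) spaced output positions. It is not the authors' implementation: the function name `ola_stretch`, the Hann window, the 50% synthesis overlap, and the default frame length are all assumptions chosen for illustration.

```python
import math

def ola_stretch(x, alpha, frame_len=256):
    """Time-stretch the sample list x by factor alpha with plain overlap-add.

    alpha > 1 lengthens the signal, alpha < 1 shortens it.
    All parameter names and defaults here are illustrative assumptions.
    """
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    syn_hop = frame_len // 2          # spacing of frames in the output
    ana_hop = syn_hop / alpha         # spacing of frames in the input
    # Periodic Hann window; with 50% overlap its shifted copies sum to 1.
    win = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / frame_len)
           for i in range(frame_len)]
    n_frames = int((len(x) - frame_len) / ana_hop) + 1
    out_len = (n_frames - 1) * syn_hop + frame_len
    y = [0.0] * out_len
    norm = [0.0] * out_len            # accumulated window weight per sample
    for m in range(n_frames):
        a = int(round(m * ana_hop))   # frame start in the input
        s = m * syn_hop               # frame start in the output
        for i in range(frame_len):
            y[s + i] += win[i] * x[a + i]
            norm[s + i] += win[i]
    # Divide out the window sum so edge samples are not attenuated.
    return [v / n if n > 1e-8 else 0.0 for v, n in zip(y, norm)]
```

A short frame keeps each transient inside a single copied frame, which is one plausible reading of why the abstract pairs this technique with the percussive component; on harmonic material the same procedure tends to produce phase discontinuities between frames.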

    A transient-preserving audio time-stretching algorithm and a real-time realization for a commercial music product

    The core of this work is a sub-band transient detection/preservation scheme based on complex-domain transient detection and inspired by Robel’s work. The proposed technique can be integrated into a real-time phase vocoder analysis/synthesis scheme without introducing latency and at relatively low computational cost.

    Modifikasi Skala Waktu pada Rekaman Suara Menggunakan Waveform Similarity Overlap and Add (WSOLA)

    People differ in their ability to hear and to pronounce speech. Some speak quickly and others slowly. Likewise, some listeners hear normally while others have reduced hearing caused by heredity, age, disease, and other factors. To make fast, noisy speech clearly audible, listeners generally fall back on conventional means, using an application or tape recorder to slow down the recording. This research applies time stretching, that is, changing the time density of a sound signal without changing its fundamental frequency, using the WSOLA method. It determines the maximum time shift at which the frequency stays fixed and the sound remains clearly audible, and compares the WSOLA and PSOLA methods to see which produces the better sound. The research is intended to help listeners hear speech more clearly even when its speed is changed. The results show that omitting the tolerance value in the WSOLA method makes the maximum frequency of the sound differ from that of the original. Tests of the time density shift on the saron five slendro sound signal show that WSOLA can modify the time scale of saron sound and human voice signals while preserving their original frequency, with an average error of 0.847% for the saron and 5.094% for the human voice.
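
The tolerance value mentioned in this abstract presumably corresponds to the search range that WSOLA formulations allow around each nominal frame position. The Python sketch below illustrates that idea under stated assumptions: the function name `wsola_stretch`, the unnormalized cross-correlation used as the similarity measure, the Hann window, and the defaults for `frame_len` and `tol` are illustrative choices, not the settings used in the study.

```python
import math

def wsola_stretch(x, alpha, frame_len=128, tol=32):
    """Time-stretch the sample list x by factor alpha, WSOLA style.

    For each output frame, the input frame is picked within +/- tol samples
    of its nominal position so that it best matches the natural continuation
    of the previously copied frame. Names and defaults are assumptions.
    """
    last = len(x) - frame_len         # last valid frame start in the input
    if last < 0:
        raise ValueError("signal is shorter than one frame")
    syn_hop = frame_len // 2
    ana_hop = syn_hop / alpha
    win = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / frame_len)
           for i in range(frame_len)]
    n_frames = int(last / ana_hop) + 1
    y = [0.0] * ((n_frames - 1) * syn_hop + frame_len)
    norm = [0.0] * len(y)
    prev = 0                          # start of the previously copied frame
    for m in range(n_frames):
        nominal = int(round(m * ana_hop))
        start = nominal
        if m > 0:
            # What the input would have played next after the previous frame.
            target = min(prev + syn_hop, last)
            best = None
            for c in range(max(0, nominal - tol), min(last, nominal + tol) + 1):
                # Unnormalized cross-correlation as the similarity score.
                score = sum(x[c + i] * x[target + i] for i in range(frame_len))
                if best is None or score > best:
                    best, start = score, c
        for i in range(frame_len):
            y[m * syn_hop + i] += win[i] * x[start + i]
            norm[m * syn_hop + i] += win[i]
        prev = start
    return [v / n if n > 1e-8 else 0.0 for v, n in zip(y, norm)]
```

With `tol=0` the search collapses to the nominal position and the procedure reduces to plain overlap-add, which gives one way to picture the abstract's finding that omitting the tolerance changes the frequency content of the output.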

    Closing the gap: human factors in cross-device media synchronization

    The continuing growth in the mobile phone arena, particularly in terms of device capabilities and ownership, is having a transformational impact on media consumption. It is now possible to consider orchestrated multi-stream experiences delivered across many devices, rather than the playback of content from a single device. However, there are significant challenges in realising such a vision, particularly around the management of synchronicity between associated media streams. This is compounded by the heterogeneous nature of user devices, the networks upon which they operate, and the perceptions of users. This paper describes IMSync, an open inter-stream synchronisation framework that is QoE-aware. IMSync adopts efficient monitoring and control mechanisms, alongside a QoE perception model that has been derived from a series of subjective user experiments. Based on an observation of lag, IMSync is able to use this model of impact to determine an appropriate strategy to catch up with playback whilst minimising the potential detrimental impacts on a user's QoE. The impact model adopts a balanced approach: trading off the potential impact on QoE of initiating a re-synchronisation process compared with retaining the current levels of non-synchronicity, in order to maintain high levels of QoE. A series of experiments demonstrates the potential of the framework as a basis for enabling new, immersive media experiences.

    Models and analysis of vocal emissions for biomedical applications

    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies

    Violin Augmentation Techniques for Learning Assistance

    PhD. Learning violin is a challenging task requiring execution of pitch tasks with the left hand using a strong aural feedback loop for correctly adjusting pitch, concurrent with the right hand moving a bow precisely with correct pressure across strings. Real-time technological assistance can help a student gain feedback and understanding helpful for learning and maintaining motivation. This thesis presents real-time low-cost low-latency violin augmentations that can be used to assist learning the violin along with other real-time performance tasks. To capture bow performance, we demonstrate a new means of bow tracking by measuring bow hair deflection from the bow hair being pressed against the string. Using near-field optical sensors placed along the bow we are able to estimate bow position and pressure through linear regression from training samples. For left hand pitch tracking, we introduce low cost means for tracking finger position and illustrate the combination of sensed results with audio processing to achieve high accuracy low-latency pitch tracking. We subsequently verify our new tracking methods' effectiveness and usefulness demonstrating low-latency note onset detection and control of real-time performance visuals. To help tackle the challenge of intonation, we used our pitch estimation to develop low latency pitch correction. Using expert performers, we verified that fully correcting pitch is not only disconcerting but breaks a violinist's learned pitch feedback loop resulting in worse as-played performance. However, partial pitch correction, though also linked to worse as-played performance, did not lead to a significantly negative experience, confirming its potential for use to temporarily reduce barriers to success. Subsequently, in a study with beginners, we verified that when the pitch feedback loop is underdeveloped, automatic pitch correction did not significantly hinder performance, but offered an enjoyable low-pitch error experience and that providing an automatic target guide pitch was helpful in correcting performed pitch error.