1,641 research outputs found

    Adaptive algorithms and structures with potential application in reverberation time estimation in occupied rooms

    Realistic and accurate room reverberation time (RT) extraction is very important in room acoustics. Extracting the RT of an occupied room is even more attractive, but it is technically challenging, since the presence of the audience changes the room acoustics. Recently, some methods have been proposed to solve the occupied-room RT extraction problem by utilizing passively received speech signals, such as the maximum likelihood estimation (MLE) technique and the artificial neural network (ANN) scheme. Although these methods can extract reasonable RT estimates, noise may affect their accuracy, especially in occupied rooms, where noise is inevitable due to the presence of the audience. To improve the accuracy of RT estimates in high-noise occupied rooms, this thesis uses adaptive techniques as a preprocessing stage for RT estimation. As a demonstration, this preprocessing, together with the MLE method, is applied to extract the RT of a room with significant noise from passively received speech signals. The same preprocessing could also aid the extraction of other acoustic parameters, such as the early decay time (EDT) and the speech transmission index (STI). The proposed approach uses adaptive techniques, namely blind source separation (BSS) and adaptive noise cancellation (ANC), based upon the least mean square (LMS) algorithm, to reduce the noise level in the received speech signal, so that the RT extracted from the preprocessed output can be more accurate. Further research is also performed on some fundamental topics related to adaptive techniques. The first topic is variable step-size LMS (VSSLMS) algorithms, which are designed to enhance the convergence rate of the LMS algorithm.
The concept of gradient-based VSSLMS algorithms is described, and new gradient-based VSSLMS algorithms are proposed for applications where the input signal is statistically stationary and the signal-to-noise ratio (SNR) is zero decibels or less. The second topic is variable tap-length LMS (VTLMS) algorithms, which are designed for applications where the tap-length of the adaptive filter coefficient vector is unknown. Their target is to establish a good steady-state tap-length for the LMS algorithm. A steady-state performance analysis is therefore provided for one VTLMS algorithm, the fractional tap-length (FT) algorithm. To improve the performance of the FT algorithm in high-noise conditions, a convex combination approach for the FT algorithm is proposed. Furthermore, a new practical VTLMS algorithm is designed for applications in which the optimal filter has an exponentially decaying impulse response, which is commonplace in enclosed acoustic environments. These original research outputs provide a deeper understanding of VTLMS algorithms. Finally, the idea of variable tap-length is introduced for the first time into BSS. As with the FT algorithm, the tap-length of the natural gradient (NG) algorithm, one of the most important sequential BSS algorithms, is made variable rather than fixed. A new variable tap-length NG algorithm is proposed to search for a steady-state tap-length for the adaptive filter vector, thereby providing a good compromise between steady-state performance and computational complexity. The research recorded in this thesis takes a first step in introducing adaptive techniques into acoustic parameter extraction. Because of the current performance limitations of these adaptive techniques, the proposed approach is evaluated only through simulation studies and comparisons.
With further development of the associated adaptive techniques, practical applications of the proposed approach may become possible in the future.
    EThOS - Electronic Theses Online Service, GB, United Kingdom
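The ANC preprocessing described above rests on the LMS update w ← w + μ·e[n]·x_n, where the reference input x carries noise correlated with the noise in the primary (speech) microphone and the error e is the cleaned output. The sketch below is a minimal, generic LMS noise canceller, not the thesis's implementation; the filter length, step size μ, the three-tap noise path and the sinusoidal stand-in for speech are all hypothetical choices for illustration.

```python
import math
import random

def lms_anc(primary, reference, num_taps, mu):
    """Generic LMS adaptive noise canceller (illustrative sketch).

    primary:   d[n] = s[n] + v[n], the wanted signal plus noise
    reference: x[n], noise-only input correlated with v[n] but not with s[n]
    Returns (e, w): e[n] = d[n] - w^T x_n is the noise-reduced output,
    w is the final weight vector.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # x_n = [x[n], x[n-1], ..., x[n-L+1]]
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))  # estimate of the noise v[n]
        e = d - y                                   # error doubles as the cleaned signal
        w = [wi + mu * e * xi for wi, xi in zip(w, buf)]  # LMS weight update
        out.append(e)
    return out, w

# Hypothetical test scenario
rng = random.Random(1)
N = 5000
noise_path = [0.8, -0.4, 0.2]  # assumed path from noise source to primary mic
x = [rng.gauss(0.0, 1.0) for _ in range(N)]
s = [0.5 * math.sin(2 * math.pi * 0.01 * n) for n in range(N)]  # stand-in for speech
v = [sum(h * x[n - k] for k, h in enumerate(noise_path) if n - k >= 0) for n in range(N)]
d = [sn + vn for sn, vn in zip(s, v)]
e, w = lms_anc(d, x, num_taps=4, mu=0.01)
```

After convergence the weights approach the assumed noise path (with a zero in the unused fourth tap), and e approximates s. A fixed μ trades convergence speed against steady-state misadjustment, which is the trade-off that variable step-size LMS variants aim to relax.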

    A multimodal approach to blind source separation of moving sources

    A novel multimodal approach is proposed to solve the problem of blind source separation (BSS) of moving sources. The challenge of BSS for moving sources is that the mixing filters are time varying; the unmixing filters must therefore also be time varying, which makes them difficult to calculate in real time. In the proposed approach, the visual modality is used to facilitate separation of both stationary and moving sources. Source movement is detected by a 3-D tracker based on video cameras. Positions and velocities of the sources are obtained from the 3-D tracker, which is based on a Markov chain Monte Carlo particle filter (MCMC-PF) and achieves high sampling efficiency. The full BSS solution integrates a frequency-domain BSS algorithm with beamforming: if the sources are identified as stationary for a certain minimum period, a frequency-domain BSS algorithm is run with an initialization derived from the source positions. Once the sources are moving, a beamforming algorithm that requires no prior statistical knowledge is used to perform real-time speech enhancement and separate the sources. Experimental results confirm that, by using the visual modality, the proposed algorithm not only improves the performance of the BSS algorithm and mitigates the permutation problem for stationary sources, but also provides good BSS performance for moving sources in a low-reverberation environment.
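For moving sources, the abstract pairs the 3-D tracker with a beamformer that needs no prior statistics. One simple beamformer of that kind is time-domain delay-and-sum steered by the tracked source position; the sketch assumes it (the paper's actual beamformer may differ), along with free-field propagation, a speed of sound of 343 m/s, integer-sample delays and hypothetical microphone and source coordinates.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed free-field value

def steering_delays(source_pos, mic_positions, fs):
    """Integer-sample arrival delays at each mic, relative to the closest mic."""
    dists = [math.dist(source_pos, m) for m in mic_positions]
    d_min = min(dists)
    return [round((d - d_min) / SPEED_OF_SOUND * fs) for d in dists]

def delay_and_sum(mic_signals, delays):
    """Time-align every channel on the steered source, then average the channels."""
    max_d = max(delays)
    length = min(len(x) for x in mic_signals) - max_d
    m = len(mic_signals)
    return [sum(x[n + d] for x, d in zip(mic_signals, delays)) / m
            for n in range(length)]

# Hypothetical setup: 3-mic linear array, source position as a tracker might report it
fs = 16000
mics = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.4, 0.0, 0.0)]
source = (2.0, 1.0, 1.5)
rng = random.Random(2)
s = [rng.gauss(0.0, 1.0) for _ in range(1000)]
delays = steering_delays(source, mics, fs)
# Simulated mic signals: the source arrives d samples later at each mic (no reverb, no noise)
mic_signals = [[0.0] * d + s[:len(s) - d] for d in delays]
y = delay_and_sum(mic_signals, delays)
```

Signals from the steered position add coherently, while sources at other positions are misaligned and partially averaged out. For a moving source, the delays would be recomputed for each short block from the tracker's latest position estimate.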

    Blind estimation of room acoustic parameters from speech and music signals

    The acoustic character of a space is often quantified using objective room acoustic parameters. These parameters are difficult to measure in occupied conditions, so measurements are usually performed when the space is unoccupied, despite the knowledge that occupancy can significantly affect the measured parameter values. This thesis develops new methods by which naturalistic signals such as speech and music can be used to measure acoustic parameters. Using naturalistic signals enables passive measurement during orchestral performances and spoken announcements, which makes in-situ measurement easy.
Two methods are described in this work: (1) a method using artificial neural networks (ANN), in which a network is trained to recognise acoustic parameters from received, reverberated signals, and (2) a method based on maximum likelihood estimation of the room's decay curve, from which the parameters are then calculated.
(1) The development of the neural network method focuses on a new pre-processor for use with music signals. The pre-processor uses a narrow-band filter bank whose centre frequencies are chosen from the equal temperament scale. Because the success of a machine learning method depends on the quality of its training data, realistic acoustic simulation algorithms were used to generate a large database of room impulse responses: room models were defined with realistic, randomly generated geometries and surface properties, and these models were then used to predict the room impulse responses.
(2) In the second approach, a statistical model of sound decay in a room was developed further. This model uses a maximum likelihood (ML) framework to yield a number of decay curve estimates from a received reverberant signal. The success of the method depends on several stages developed for the algorithm: (a) a pre-processor that selects appropriate decay phases for estimation, (b) a rigorous optimisation algorithm that ensures the correct maximum likelihood estimate is found, and (c) a method that yields a single optimum decay curve estimate from which the parameters are calculated.
The ANN and ML methods were tested using orchestral music and speech signals. The ANN method tended to perform well when estimating the early decay time (EDT): for both speech and music signals the error was within the subjective difference limens. However, its accuracy was lower for the reverberation time (Rt) and other parameters. By contrast, the ML method performed well for Rt, with results for both speech and music within the difference limens for reasonable reverberation times (< 4 s). Reasonable accuracy was also found for EDT, Clarity (C80), Centre time (Ts) and Deutlichkeit (D). The ML method can also produce accurate estimates of the binaural parameters Early Lateral Energy Fraction (LEF) and late lateral strength (LG).
A number of real-world measurements were carried out in concert halls, where the ML accuracy was shown to be sufficient for most parameters. The ML method has an advantage over the ANN method because it is truly blind (the ANN method requires a period of learning and is therefore semi-blind). The ML method relies on gaps of silence between notes or utterances; when no such silent regions are present, it does not produce an estimate. Accurate estimation therefore requires a long recording (hours of music or many minutes of speech) to ensure that at least some silent regions are present. This thesis shows that, given a sufficiently long recording, accurate estimates of many acoustic parameters can be obtained directly from speech and music.
Further extensions to the ML method detailed in this thesis combine the ML-estimated decay curve with cepstral methods that detect the locations of early reflections, which improves the accuracy of many of the parameter estimates.
    EThOS - Electronic Theses Online Service, GB, United Kingdom
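The ML approach in item (2) fits a statistical decay model to free-decay segments. As an assumption (the thesis's exact model and optimiser are not given here), the sketch uses a common exponentially damped Gaussian model, y[n] = aⁿ·w[n] with w[n] ~ N(0, σ²), removes σ² analytically and grid-searches T60 using a^(T60·fs) = 10⁻³ (a 60 dB amplitude drop). Segment selection (stage a) and the combination of many estimates into one curve (stage c) are omitted; function names, the grid and the synthetic test segment are hypothetical.

```python
import math
import random

def t60_to_decay(t60, fs):
    """Per-sample amplitude decay a with a**(t60 * fs) == 1e-3, i.e. -60 dB after t60 seconds."""
    return math.exp(-3.0 * math.log(10.0) / (t60 * fs))

def profile_nll(y, log_a):
    """Negative log-likelihood (constants dropped) of y[n] = a**n * w[n], w ~ N(0, sigma^2).

    For fixed a, the ML noise variance is sigma2 = mean(y[n]**2 * a**(-2n)); substituting it
    leaves N/2 * log(sigma2) + log(a) * sum(n), which is minimised over a.
    """
    n_samples = len(y)
    sigma2 = sum(v * v * math.exp(-2.0 * log_a * n) for n, v in enumerate(y)) / n_samples
    return 0.5 * n_samples * math.log(sigma2) + log_a * n_samples * (n_samples - 1) / 2.0

def estimate_t60(y, fs, t60_grid):
    """Grid-search ML estimate of T60 (seconds) from a single free-decay segment y."""
    return min(t60_grid, key=lambda t: profile_nll(y, math.log(t60_to_decay(t, fs))))

# Synthetic check on a hypothetical decay segment with a known T60
fs = 8000
true_t60 = 0.5
rng = random.Random(0)
a = t60_to_decay(true_t60, fs)
segment = [a ** n * rng.gauss(0.0, 1.0) for n in range(2000)]
grid = [round(0.1 + 0.01 * k, 2) for k in range(291)]  # 0.10 s ... 3.00 s
t60_hat = estimate_t60(segment, fs, grid)
```

Profiling out σ² reduces the search to one dimension, so a coarse grid (or any 1-D optimiser) suffices; with real signals the segment would have to be a genuine free decay, which is why the method depends on silent gaps between notes or utterances.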

    Source Separation for Hearing Aid Applications