
    Adabook and Multibook: adaptive boosting with chance correction

    There has been considerable interest in boosting and bagging, including the combination of the adaptive techniques of AdaBoost with the random selection with replacement techniques of Bagging. At the same time there has been a revisiting of the way we evaluate, with chance-corrected measures like Kappa, Informedness, Correlation or ROC AUC being advocated. This leads to the question of whether learning algorithms can do better by optimizing an appropriate chance-corrected measure. Indeed, it is possible for a weak learner to optimize Accuracy to the detriment of the more realistic chance-corrected measures, and when this happens the booster can give up too early. This phenomenon is known to occur with conventional Accuracy-based AdaBoost, and the MultiBoost algorithm has been developed to overcome such problems using restart techniques based on bagging. This paper thus complements the theoretical work showing the necessity of using chance-corrected measures for evaluation with empirical work showing how the use of a chance-corrected measure can improve boosting. We show that the early surrender problem occurs in MultiBoost too, in multiclass situations, so that chance-corrected AdaBook and MultiBook can beat standard MultiBoost or AdaBoost, and we further identify which chance-corrected measures to use when.
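    As a rough illustration of the chance-corrected measures this abstract advocates (a minimal sketch, not the authors' code), the snippet below computes Bookmaker Informedness and Cohen's Kappa directly from labels and shows how a degenerate majority-class learner can score high Accuracy yet zero on both; the function names and the toy data are assumptions.

        import numpy as np

        def informedness(y_true, y_pred):
            # Bookmaker Informedness for binary labels: sensitivity + specificity - 1.
            y_true, y_pred = np.asarray(y_true, bool), np.asarray(y_pred, bool)
            tp = np.sum(y_true & y_pred);  fn = np.sum(y_true & ~y_pred)
            tn = np.sum(~y_true & ~y_pred); fp = np.sum(~y_true & y_pred)
            sens = tp / (tp + fn) if (tp + fn) else 0.0
            spec = tn / (tn + fp) if (tn + fp) else 0.0
            return sens + spec - 1.0

        def cohen_kappa(y_true, y_pred):
            # Agreement corrected for the agreement expected by chance.
            y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
            labels = np.unique(np.concatenate([y_true, y_pred]))
            p_obs = np.mean(y_true == y_pred)
            p_chance = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in labels)
            return (p_obs - p_chance) / (1.0 - p_chance)

        # A weak learner that always predicts the majority class on 90/10 data:
        y_true = np.array([0] * 90 + [1] * 10)
        y_pred = np.zeros(100, dtype=int)
        print(np.mean(y_true == y_pred))     # Accuracy = 0.90
        print(informedness(y_true, y_pred))  # 0.0 -- no better than chance
        print(cohen_kappa(y_true, y_pred))   # 0.0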

    Objective and Subjective Evaluation of Wideband Speech Quality

    Traditional landline and cellular communications use a bandwidth of 300-3400 Hz for transmitting speech. This narrow bandwidth impacts the quality, intelligibility and naturalness of transmitted speech. There is an impending change within the telecommunication industry towards using wider-bandwidth speech, but the enlarged bandwidth also introduces a few challenges in speech processing. Echo and noise are two challenging issues in wideband telephony, due to increased perceptual sensitivity by users. Subjective and/or objective measurements of speech quality are important in benchmarking speech processing algorithms and evaluating the effect of parameters like noise, echo, and delay in wideband telephony. Subjective measures include ratings of speech quality by listeners, whereas objective measures compute a metric based on the reference and degraded speech samples. While subjective quality ratings are the 'gold standard', they are also time- and resource-consuming. An objective metric that correlates highly with subjective data is attractive, as it can act as a substitute for subjective quality scores in gauging the performance of different algorithms and devices. This thesis reports results from a series of experiments on subjective and objective speech quality evaluation for wideband telephony applications. First, a custom wideband noise reduction database was created that contained speech samples corrupted by different background noises at different signal-to-noise ratios (SNRs) and processed by six different noise reduction algorithms. Comprehensive subjective evaluation of this database revealed an interaction between algorithm performance, noise type and SNR. Several auditory-based objective metrics, such as the Loudness Pattern Distortion (LPD) measure based on the Moore-Glasberg auditory model, were evaluated in predicting the subjective scores. In addition, the performance of Bayesian Multivariate Regression Splines (BMLS) was also evaluated in terms of mapping the scores calculated by the objective metrics to the true quality scores. The combination of LPD and BMLS resulted in high correlation with the subjective scores and was used as a substitute for subjective scoring when fine-tuning the noise reduction algorithms. Second, the effect of echo and delay on wideband speech was evaluated in both listening and conversational contexts, through both subjective and objective measures. A database containing speech samples corrupted by echo with different delay and frequency response characteristics was created and was later used to collect subjective quality ratings. The LPD-BMLS objective metric was then validated using the subjective scores. Third, to evaluate the effect of echo and delay in a conversational context, a real-time simulator was developed. Pairs of subjects conversed over the simulated system and rated the quality of their conversations, which were degraded by different amounts of echo and delay. The quality scores were analysed, and the LPD-BMLS combination was found to be effective in predicting subjective impressions of quality for condition-averaged data.
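    As a toy illustration of the metric-validation step described above (not the thesis code, and with a simple polynomial fit standing in for the Bayesian regression splines), the sketch below maps a hypothetical objective distortion score to mean opinion scores and reports the correlations typically used to judge such a metric; all data here are synthetic placeholders.

        import numpy as np
        from scipy.stats import pearsonr, spearmanr

        rng = np.random.default_rng(0)

        # Placeholder data: one objective-metric value and one mean opinion score (MOS)
        # per processed condition (noise type x SNR x noise-reduction algorithm).
        objective = rng.uniform(0.0, 1.0, size=60)               # hypothetical distortion metric
        mos = 4.5 - 3.0 * objective + rng.normal(0.0, 0.2, 60)   # hypothetical listener ratings (1-5 scale)

        # Simple monotone mapping from metric to predicted MOS (stand-in for BMLS).
        coeffs = np.polyfit(objective, mos, deg=3)
        predicted_mos = np.polyval(coeffs, objective)

        print("Pearson r:   ", pearsonr(predicted_mos, mos)[0])
        print("Spearman rho:", spearmanr(predicted_mos, mos)[0])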

    Compensating first reflections in non-anechoic head-related transfer function measurements

    [EN] Personalized Head-Related Transfer Functions (HRTFs) are needed as part of the binaural sound individualization process in order to provide a high-quality immersive experience for a specific user. Signal processing methods for performing HRTF measurements in non-anechoic conditions are of high interest to avoid the complex and inconvenient access to anechoic facilities. Non-anechoic HRTF measurements capture the effect of room reflections, which should be correctly identified and eliminated to obtain HRTF estimates comparable to ones acquired in an anechoic setup. This paper proposes a sub-band, frequency-dependent processing method for reflection suppression in non-anechoic HRTF signals. Array processing techniques based on Plane Wave Decomposition (PWD) are adopted as an essential part of the solution for low frequency ranges, whereas the higher frequencies are easily handled by means of time-crop windowing methods. The formulation of the model, extraction of parameters and evaluation of the method are described in detail. In addition, a validation case study is presented showing the suppression of reflections from an HRTF measured in a real system. The results confirm that the method makes it possible to obtain processed HRTFs comparable to those acquired in anechoic conditions.
    This work has received funding from the Spanish Ministry of Science, Innovation and Universities, through projects RTI2018-097045-B-C21 and RTI2018-097045-B-C22, and Generalitat Valenciana under the AICO/2020/154 project grant.
    López Monfort, JJ.; Gutierrez-Parera, P.; Cobos, M. (2022). Compensating first reflections in non-anechoic head-related transfer function measurements. Applied Acoustics. 188:1-13. https://doi.org/10.1016/j.apacoust.2021.108523
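    The sketch below illustrates only the high-frequency part of the idea described above, namely time-crop windowing of a measured head-related impulse response so that it is truncated before the first room reflection arrives; the low-frequency plane-wave-decomposition stage of the paper is not reproduced, and the impulse response, sample rate and reflection onset used here are assumed placeholders.

        import numpy as np

        def crop_first_reflection(hrir, fs, reflection_onset_s, fade_s=0.0005):
            # Keep the direct-path portion of a non-anechoic HRIR and apply a short
            # half-cosine fade-out just before the first reflection arrives.
            n_keep = int(reflection_onset_s * fs)
            n_fade = max(1, int(fade_s * fs))
            window = np.zeros(len(hrir))
            window[:n_keep - n_fade] = 1.0
            window[n_keep - n_fade:n_keep] = 0.5 * (1.0 + np.cos(np.linspace(0.0, np.pi, n_fade)))
            return hrir * window

        # Hypothetical usage: 48 kHz measurement with a first reflection at about 3 ms.
        fs = 48000
        hrir = np.random.randn(2048) * np.exp(-np.arange(2048) / 200.0)  # placeholder impulse response
        hrir_cleaned = crop_first_reflection(hrir, fs, reflection_onset_s=0.003)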

    Study to determine potential flight applications and human factors design guidelines for voice recognition and synthesis systems

    A study was conducted to determine potential commercial aircraft flight deck applications and implementation guidelines for voice recognition and synthesis. First, a survey of voice recognition and synthesis technology was undertaken to develop a working knowledge base. Then, numerous potential aircraft and simulator flight deck voice applications were identified, and each proposed application was rated on a number of criteria in order to achieve an overall payoff rating. The potential voice recognition applications fell into five general categories: programming, interrogation, data entry, switch and mode selection, and continuous/time-critical action control. The ratings of the first three categories showed the most promise of being beneficial to flight deck operations. Possible applications of voice synthesis systems were categorized as automatic or pilot-selectable, and many were rated as being potentially beneficial. In addition, voice system implementation guidelines and pertinent performance criteria are proposed. Finally, the findings of this study are compared with those made in a recent NASA study of a 1995 transport concept.

    Overt social interaction and resting state in young adult males with autism: core and contextual neural features

    Conversation is an important and ubiquitous social behavior. Individuals with Autism Spectrum Disorder (autism) without intellectual disability often have normal structural language abilities but deficits in social aspects of communication like pragmatics, prosody, and eye contact. Previous studies of resting state activity suggest that intrinsic connections among neural circuits involved with social processing are disrupted in autism, but to date no neuroimaging study has examined neural activity during the most commonplace yet challenging social task: spontaneous conversation. Here we used functional MRI to scan autistic males (N=19) without intellectual disability and age- and IQ-matched typically developing controls (N=20) while they engaged in a total of 193 face-to-face interactions. Participants completed two kinds of tasks: Conversation, which had high social demand, and Repetition, which had low social demand. Autistic individuals showed abnormally increased task-driven inter-regional temporal correlation relative to controls, especially among social processing regions and during high social demand. Furthermore, these increased correlations were associated with parent ratings of participants’ social impairments. These results were then compared with previously acquired resting-state data (56 autism, 62 control participants). While some inter-regional correlation levels varied by task or rest context, others were strikingly similar across both task and rest, namely increased correlation among the thalamus, dorsal and ventral striatum, somatomotor, temporal and prefrontal cortex in the autistic individuals, relative to the control groups. These results suggest a basic distinction. Autistic cortico-cortical interactions vary by context, tending to increase relative to controls during Task and decrease during Rest. In contrast, striato- and thalamocortical relationships with socially engaged brain regions are increased in both Task and Rest, and may be core to the condition of autism.
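    A minimal sketch of the connectivity measure discussed above (not the study's analysis pipeline): pairwise Pearson correlation between region-of-interest time series, Fisher-z averaged across subjects so that two groups can be compared; the variable names and data layout are assumptions.

        import numpy as np

        def roi_correlation_matrix(timeseries):
            # Pairwise Pearson correlation between regions.
            # timeseries: array of shape (n_timepoints, n_regions).
            return np.corrcoef(timeseries, rowvar=False)

        def mean_group_connectivity(subject_runs):
            # Fisher-z transform each subject's correlation matrix, average, and back-transform.
            zs = [np.arctanh(np.clip(roi_correlation_matrix(ts), -0.999, 0.999))
                  for ts in subject_runs]
            return np.tanh(np.mean(zs, axis=0))

        # Hypothetical usage with ROI time series extracted during the Conversation task:
        # conn_autism  = mean_group_connectivity(autism_runs)
        # conn_control = mean_group_connectivity(control_runs)
        # increased = conn_autism - conn_control  # positive entries = increased inter-regional correlation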

    Voice interfaces in mobile human-robot collaboration for advanced manufacturing systems.

    Master's Degree. University of KwaZulu-Natal, Durban. Abstract available in PDF.

    Psycholinguistic and neurolinguistic investigations of scalar implicature

    The present study examines the representation and composition of meaning in scalar implicatures. Scalar implicature is the phenomenon whereby the use of a less informative term (e.g., some) is inferred to mean the negation of a more informative term (e.g., to mean not all). The experiments reported here investigate how the processing of the implicature-based aspect of meaning (e.g., the interpretation of some as meaning not all) differs from other types of meaning processing, and how that aspect of meaning is initially realized. The first three experiments measure event-related potentials (ERPs) to examine whether inferential pragmatic aspects of meaning are processed using different mechanisms than lexical or combinatorial semantic aspects of meaning, and whether inferential aspects of meaning can be realized rapidly. Participants read infelicitous quantifiers for which the semantic meaning (at least one of) was correct with respect to the context but the pragmatic meaning (not all of) was not, compared to quantifiers for which the semantic meaning was inconsistent with the context and no additional pragmatic meaning is available. Across experiments, quantifiers that were pragmatically inconsistent but not semantically inconsistent with the context elicited a broadly distributed, sustained negative component. This sustained negativity contrasts with the N400 effect typically elicited by nouns that are incongruent with their context, suggesting that the recognition of scalar implicature errors elicits a qualitatively different ERP signature than the recognition of lexico-semantic errors. The effect was also distinct from the ERP response elicited by quantifiers that were semantically inconsistent with a context. The sustained negativity may reflect cancellation of the pragmatic inference and retrieval of the semantic meaning. This process was also found to be independent from lexico-semantic processing: the N400 elicited by lexico-semantic violations was not modulated by the presence of a pragmatic inconsistency. These findings suggest there is a dissociation between the mechanisms for processing combinatorial semantic meaning and those for inference-based pragmatic meaning, that inferential pragmatic meaning can be realized rapidly, and that the computation of meaning involves continuous negotiation between different aspects of meaning. The next set of experiments examined how scalar implicature-based meanings are realized initially. Default processing accounts assume that the interpretation of some of as meaning not all of is realized easily and automatically (regardless of context), whereas context-driven processing accounts assume that it is realized effortfully and only in certain contexts. In two experiments, participants' self-paced reading times were recorded as they read vignettes in which the context did or did not bias the participants to make a scalar inference (to interpret some of as meaning not all of). The reading times in the first experiment suggested that the realization of the inference was influenced by the context: reading times to a target word later in the vignette were facilitated in contexts in which the scalar inference should be realized but not in contexts where it should not be realized. Importantly, however, reading times did not provide evidence for processing cost at the time the inference is realized, contrary to the predictions of context-driven processing accounts. 
    The results raise the question of why inferencing occurs only in certain contexts if it does not involve extra processing effort. In the subsequent experiment, reading times suggested that the inference may not have been realized when participants engaged in a secondary task that increased processing load. These results, together with the results of other recent experiments, suggest that inferencing may be effortless in certain contexts, effortful in other contexts, and not computed at all in still other contexts, depending on the strength of the bias created by the context. These findings may all be accounted for under a recently proposed constraint-based processing model of scalar implicature.
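    As a schematic of the ERP comparison described above (not the study's analysis code), the sketch below averages epoched EEG per condition and forms the difference waves in which a sustained negativity or an N400-like effect would appear; the array shapes, condition names and time window are assumptions.

        import numpy as np

        def erp(epochs):
            # Average epoched EEG; epochs has shape (n_trials, n_channels, n_times).
            return epochs.mean(axis=0)

        def difference_wave(epochs_violation, epochs_control):
            # Condition difference (violation minus control), per channel and time point.
            return erp(epochs_violation) - erp(epochs_control)

        # Hypothetical usage with epochs time-locked to the quantifier:
        # diff_pragmatic = difference_wave(pragmatically_inconsistent, consistent)  # sustained negativity
        # diff_semantic  = difference_wave(semantically_inconsistent, consistent)   # N400-like effect
        # window_mean = diff_pragmatic[:, t_start:t_end].mean(axis=1)               # mean amplitude per channel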

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user's signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the incoming far-end user's speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes, at a similar computational cost. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address DTD for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk. Using a standard evaluation technique, the proposed algorithm is shown to have detection performance comparable to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false double-talk indications upon initiation and in response to room changes, in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique.
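    A minimal sketch of the semi-supervised NMF idea summarised above (a simplified stand-in, not the thesis implementation): the dictionary is the union of an echo basis built from the far-end signal and a pre-trained near-end speaker basis, only the activations are learned on the microphone magnitude STFT, and the near-end component is recovered with a soft mask; the matrix names and iteration count are assumptions.

        import numpy as np

        def nmf_activations(V, W, n_iter=200, eps=1e-12):
            # Multiplicative KL-divergence updates for the activations H,
            # with the dictionary W held fixed (semi-supervised NMF).
            H = np.random.default_rng(0).random((W.shape[1], V.shape[1]))
            for _ in range(n_iter):
                WH = W @ H + eps
                H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
            return H

        def extract_near_end(V_mic, W_echo, W_near):
            # V_mic : magnitude STFT of the near-end microphone signal (freq x frames).
            # W_echo: basis formed from the incoming far-end speech (echo reference).
            # W_near: basis trained a priori on near-end speaker speech.
            W = np.hstack([W_echo, W_near])
            H = nmf_activations(V_mic, W)
            k = W_echo.shape[1]
            V_echo_hat = W_echo @ H[:k]
            V_near_hat = W_near @ H[k:]
            mask = V_near_hat / (V_echo_hat + V_near_hat + 1e-12)  # Wiener-like soft mask
            return mask * V_mic  # estimated near-end magnitude; pair with the microphone phase to resynthesise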