9 research outputs found

    Nonintrusive speech quality estimation using Gaussian mixture models

    An intelligent healthcare system for detection and classification to discriminate vocal fold disorders

    The growing population of senior citizens around the world will pose a major challenge in the future, as they will occupy a significant portion of healthcare facilities. It is therefore necessary to develop intelligent healthcare systems that can be deployed in smart homes and cities for remote diagnosis. To address this problem, an intelligent healthcare system is proposed in this study. The proposed system is based on the human auditory mechanism and is capable of detecting and classifying various types of vocal fold disorders. It implements the critical-bandwidth phenomenon using bandpass filters spaced over the Bark scale to simulate the human auditory mechanism. The system therefore acts like an expert clinician who can evaluate the voice of a patient by auditory perception. The experimental results show that the proposed system can detect the pathology with an accuracy of 99.72%. Moreover, the classification accuracy for vocal fold polyp, keratosis, vocal fold paralysis, vocal fold nodules, and adductor spasmodic dysphonia is 97.54%, 99.08%, 96.75%, 98.65%, 95.83%, and 95.83%, respectively. In addition, an experiment for paralysis versus all other disorders is also conducted, and an accuracy of 99.13% is achieved. The results show that the proposed system is accurate and reliable in vocal fold disorder assessment and can be deployed successfully for remote diagnosis. Moreover, the proposed system performs better than existing disorder assessment systems.
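    The abstract above mentions bandpass filters spaced over the Bark scale. As a rough illustration only, the sketch below computes band edges that are equally spaced in Bark. It assumes Traunmüller's (1990) closed-form approximation of the Bark scale; the function names, the 100 Hz to 8000 Hz range, and the choice of 20 bands are hypothetical and are not taken from the paper, which may use a different formula or filter design.

```python
def hz_to_bark(f_hz: float) -> float:
    """Traunmueller (1990) approximation of the Bark scale (an assumed choice)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z: float) -> float:
    """Algebraic inverse of hz_to_bark, valid for z < 26.28."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def bark_band_edges(f_low: float, f_high: float, n_bands: int) -> list[float]:
    """Return n_bands + 1 edge frequencies (Hz) equally spaced in Bark."""
    z_low, z_high = hz_to_bark(f_low), hz_to_bark(f_high)
    step = (z_high - z_low) / n_bands
    return [bark_to_hz(z_low + i * step) for i in range(n_bands + 1)]


# Hypothetical analysis range and band count, for illustration only.
edges = bark_band_edges(100.0, 8000.0, 20)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
```

    Each consecutive pair in `edges` could serve as the passband of one filter. Because the spacing is uniform in Bark, the bands get wider in Hz as frequency increases, which is the behaviour the critical-band idea is meant to capture.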

    Prepositional Phrase Attachment Ambiguities in Declarative and Interrogative Contexts: Oral Reading Data

    Certain English sentences containing multiple prepositional phrases (e.g., She had planned to cram the paperwork in the drawer into her briefcase) have been reported to be prone to mis-parsing of a kind that is standardly called a “garden path.” The mis-parse stems from the temporary ambiguity of the first prepositional phrase (PP1: in the drawer), which tends to be interpreted initially as the goal argument of the verb cram. If the sentence ended there, that would be correct. But that analysis is overridden when the second prepositional phrase (PP2: into her briefcase) is encountered, since the into phrase can only be interpreted as the goal argument of the verb. Thus, PP2 necessarily supplants PP1’s initially assigned position as goal, and PP1 must be reanalyzed as a modifier of the object NP (the paperwork). Interrogative versions of the same sentence structure (Had she planned to cram the paperwork in the drawer into her briefcase?) may have a different profile. They have been informally judged to be easier to process than their declarative counterparts, because they are less susceptible to the initial garden path analysis. The study presented here represents an attempt to find a behavioral correlate of this intuitive difference in processing difficulty. The experiment employs the Double Reading Paradigm (Fodor, Macaulay, Ronkos, Callahan, and Peckenpaugh, 2019). Participants were asked to read aloud a visually presented sentence twice, first without taking any time at all to preview the sentence content (Reading 1), and then again after unlimited preview (Reading 2). The experimental items were created in a 2 x 2 design with one factor being Speech Act (declarative vs. interrogative) and the other being PP2 Status, i.e., PP2 could only be an argument of the verb (Arg), as above, or else PP2 could be interpreted as a modifier (Mod) of the NP within the preceding PP, as in She had / Had she planned to cram the paperwork in the drawer of her filing cabinet(?). Participants’ recordings of Reading 1 and Reading 2 were subjected to prosodic coding by a linguist who was naive to the research question. Distributions of prosodic boundaries were statistically analyzed to extract any significant differences in prosodic boundary patterns as a function of Speech Act, Reading, or PP2 Status. Logistic mixed effect regression models indicated, as anticipated, a significant effect of PP2 Status across all analyses of prosodic phrasing, and a significant effect of Reading for both analyses of prosodic phrasing that included boundary strength. Speech Act was a significant predictor in one analysis of prosodic phrasing, but the hypothesized interaction (between Speech Act and PP2 Status) was not significant in any model. Another analysis concerned the amount of time a participant spent silently studying a sentence after Reading 1 to be confident they had understood it before reading it aloud again (Reading 2). The time between readings is referred to as the inter-reading time (IRT). It was assumed that a longer IRT signifies greater processing difficulty of the sentence. Thus, IRT was hypothesized to provide a behavioral correlate of the intuitive judgement that the interrogative garden paths are easier to process than the declarative ones. If a correlate had been found, it would have taken the form of an interaction between the two factors (Speech Act and PP2 Status) such that the IRT difference between Arg and Mod sentence versions was smaller for interrogatives than for declaratives. Ultimately, however, no statistically significant interaction between Speech Act and PP2 Status was found. Further studies seeking behavioral evidence of the informal intuition motivating this research are proposed. Also offered are possible explanations for why the intuition is apparently so strong for some English speakers, and why, if so, it is not reflected in IRT. A significant ancillary finding is that interrogatives are in general more difficult to process than corresponding declaratives. In addition, inter-reading time (IRT) in the Double Reading Paradigm is confirmed as a useful measure of sentence processing difficulty, given that within the declarative sentences, the garden-path (Arg) versions showed significantly longer IRTs than the non-garden-path (Mod) versions.
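    The hypothesized interaction in the abstract above is a difference of differences: the Arg minus Mod gap in IRT was expected to be smaller for interrogatives than for declaratives. The sketch below only illustrates that arithmetic. The IRT values are made-up placeholders, not data from the study (which found no significant interaction), and the function names are hypothetical. The study itself used mixed-effects regression rather than raw cell means.

```python
from statistics import mean

# Made-up inter-reading times in seconds, keyed by (Speech Act, PP2 Status).
# These are placeholders chosen to show the HYPOTHESIZED pattern only.
placeholder_irt = {
    ("declarative", "Arg"): [4.0, 5.0, 6.0],
    ("declarative", "Mod"): [2.0, 3.0, 4.0],
    ("interrogative", "Arg"): [4.5, 5.5, 6.5],
    ("interrogative", "Mod"): [3.5, 4.5, 5.5],
}


def garden_path_cost(irt: dict, speech_act: str) -> float:
    """Mean IRT for Arg items minus mean IRT for Mod items within one Speech Act."""
    return mean(irt[(speech_act, "Arg")]) - mean(irt[(speech_act, "Mod")])


def interaction_contrast(irt: dict) -> float:
    """Difference of differences: declarative cost minus interrogative cost."""
    return garden_path_cost(irt, "declarative") - garden_path_cost(irt, "interrogative")


contrast = interaction_contrast(placeholder_irt)
```

    A positive `contrast` corresponds to the pattern the intuition predicts. A value near zero, relative to its uncertainty, corresponds to the absence of an interaction, which is what the abstract reports.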

    Reverberation: models, estimation and application

    Reverberation models are required in many applications, such as acoustic measurements, speech dereverberation and robust automatic speech recognition. The aim of this thesis is to investigate different models and propose a perceptually relevant reverberation model with suitable parameter estimation techniques for different applications. Reverberation can be modelled in both the time and frequency domains. The model parameters give direct information about both physical and perceptual characteristics. These characteristics create a multidimensional parameter space of reverberation, which can to a large extent be captured by a time-frequency domain model. In this thesis, the relationship between physical and perceptual model parameters is discussed. In the first application, an intrusive technique is proposed to measure the reverberation, or reverberance (the perception of reverberation), and the colouration. The room decay rate parameter is of particular interest. In practical applications, a blind estimate of the decay rate of acoustic energy in a room is required. A statistical model for the distribution of the decay rate of the reverberant signal, named the eagleMax distribution, is proposed. The eagleMax distribution describes the reverberant speech decay rates as a random variable that is the maximum of the room decay rates and anechoic speech decay rates. Three methods were developed to estimate the mean room decay rate from the eagleMax distributions alone. The estimated room decay rates form a reverberation model that is discussed in the context of room acoustic measurements, speech dereverberation and robust automatic speech recognition individually.
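    The abstract above describes the reverberant speech decay rate as the maximum of two random variables. As a hedged illustration of what that implies, write the room decay rate as $D_r$, the anechoic speech decay rate as $D_s$, and the reverberant speech decay rate as $D$. These symbols, and the independence assumption below, are introduced here for illustration and are not stated in the abstract.

```latex
% Assumption (not from the abstract): D_r and D_s are independent.
\begin{align}
  D &= \max(D_r, D_s), \\
  F_D(x) &= \Pr(D_r \le x,\; D_s \le x) = F_{D_r}(x)\, F_{D_s}(x).
\end{align}
```

    Under that assumption, the distribution of the observed decay rates is the product of the two underlying cumulative distribution functions, which suggests why the room component might be recoverable from the observed distribution alone.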

    Speech assessment and characterization for law enforcement applications

    Speech signals acquired, transmitted or stored in non-ideal conditions are often degraded by one or more effects including, for example, additive noise. These degradations alter the signal properties in a manner that deteriorates the intelligibility or quality of the speech signal. In the law enforcement context, such degradations are commonplace due to limitations in the audio collection methodology, which is often required to be covert. In severe degradation conditions, the acquired signal may become unintelligible, losing its value in an investigation. In less severe conditions, a loss in signal quality may be encountered, which can lead to higher transcription time and cost. This thesis proposes a non-intrusive speech assessment framework from which algorithms for speech quality and intelligibility assessment are derived, to guide the collection and transcription of law enforcement audio. These methods are trained on a large database labelled using intrusive techniques (whose performance is verified with subjective scores) and are shown to perform favorably when compared with existing non-intrusive techniques. Additionally, a non-intrusive CODEC identification and verification algorithm is developed, which can identify a CODEC with an accuracy of 96.8 % and detect the presence of a CODEC with an accuracy higher than 97 % in the presence of additive noise. Finally, the speech description taxonomy framework is developed, with the aim of characterizing various aspects of a degraded speech signal, including the mechanism that results in a signal with particular characteristics, the vocabulary that can be used to describe those degradations and the measurable signal properties that can characterize the degradations. The taxonomy is implemented as a relational database that facilitates the modeling of the relationships between various attributes of a signal and promises to be a useful tool for training and guiding audio analysts.

    Contribution to quality of user experience provision over wireless networks

    The widespread expansion of wireless networks has brought attractive new possibilities to end users. In addition to the mobility provided by wireless devices, it is worth noting how easy it is for a user to configure a connection through a wireless network. Furthermore, the increasing bandwidth provided by the IEEE 802.11 family has made it possible to access highly demanding services such as multimedia communications. Multimedia traffic has unique characteristics that make it highly vulnerable to network impairments such as packet loss, delay, or jitter. Voice over IP (VoIP) communications, videoconferencing, and video streaming are examples of these highly demanding services, which must meet very strict requirements in order to be delivered with acceptable levels of quality. Meeting these tough requirements will become extremely important in the coming years, given that consumer video is expected to be the predominant traffic on the Internet. In wired systems, these requirements are met by using Quality of Service (QoS) techniques, such as Differentiated Services (DiffServ) and traffic engineering. However, applying these methods in wireless networks is not as simple, because many other factors, e.g., fading and interference, affect the quality of the provided service. The IEEE 802.11g standard, which is the most widespread technology for Wireless Local Area Networks (WLANs), defines two different architecture schemes. On the one hand, the infrastructure mode relies on a central point that manages the network, taking on network control tasks such as IP assignment, routing, and access security. The rest of the nodes composing the network act as hosts, i.e., they send and receive traffic through the central point. On the other hand, the IEEE 802.11 ad-hoc configuration mode is less widespread than the infrastructure mode. 
Under this scheme, there is no central point in the network; instead, all the nodes composing the network take on both host and router roles, which permits the quick deployment of a network without pre-existing infrastructure. This type of network, known as a Mobile Ad-hoc NETwork (MANET), has interesting characteristics for situations in which a communication system must be deployed quickly, e.g., tactical networks, disaster events, or temporary networks. The benefits provided by MANETs are varied, including high node mobility, extended network coverage, and improved network reliability through the avoidance of single points of failure. The dynamic nature of these networks requires the nodes to react to topology changes as fast as possible. Moreover, as mentioned above, the transmission of multimedia traffic entails real-time constraints that must be met to provide these services with acceptable levels of quality. For those reasons, efficient routing protocols are needed that provide enough reliability to the network with minimum impact on the quality of the service flowing through the nodes. Regarding quality measurements, the current trend is to estimate what the end user actually perceives when consuming the service. This paradigm is called Quality of user Experience (QoE) and differs from the traditional Quality of Service (QoS) approach in the human perspective given to quality estimations. Different approaches can be taken to measure the subjective opinion that a user has about a given service. The most accurate methodology is to perform subjective tests in which a panel of human testers rates the quality of the service under evaluation. This approach returns a quality score, the so-called Mean Opinion Score (MOS), for the considered service on a scale of 1 to 5. This methodology has several drawbacks, such as its high cost and the impossibility of performing tests in real time. 
For those reasons, several mathematical models have been presented in order to provide an estimation of the QoE (MOS) reached by different multimedia services. In this thesis, the focus is on evaluating and understanding the transmission of multimedia content over wireless networks from a QoE perspective. To this end, the QoE paradigm is first explored, with the aim of understanding how to evaluate the quality of a given multimedia service. Then, the influence of the impairments introduced by the wireless transmission channel on multimedia communications is analyzed. In addition, the operation of different WLAN schemes is evaluated in order to test their suitability for supporting highly demanding traffic such as multimedia transmission. Finally, as the main contribution of this thesis, new mechanisms and strategies to improve the quality of multimedia services distributed over IEEE 802.11 networks are presented. Specifically, the distribution of multimedia services over ad-hoc networks is studied in depth. A novel opportunistic routing protocol called JOKER (auto-adJustable Opportunistic acK/timEr-based Routing) is presented. This proposal provides better support for multimedia services while reducing energy consumption in comparison with the standard ad-hoc routing protocols.
    Universidad Politécnica de Cartagena. Programa Oficial de Doctorado en Tecnologías de la Información y Comunicaciones
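    The abstract above describes two ways of obtaining a MOS: averaging a panel's ratings on the 1 to 5 scale, and estimating it with a mathematical model. The sketch below illustrates both under stated assumptions. The panel ratings are hypothetical, and the model shown is the R-factor to MOS mapping of the ITU-T G.107 E-model, picked here as one well-known example for voice; the abstract does not say which models the thesis uses.

```python
from statistics import mean


def mean_opinion_score(ratings: list[int]) -> float:
    """Average a panel's ratings, each of which must lie on the 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return float(mean(ratings))


def r_factor_to_mos(r: float) -> float:
    """Map an E-model transmission rating R to an estimated MOS (ITU-T G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# Hypothetical ratings from a four-person panel, for illustration only.
panel_mos = mean_opinion_score([4, 5, 3, 4])
```

    The first function needs human testers for every measurement, which is the cost and real-time limitation the abstract points out. The second can be evaluated on the fly from network measurements, at the price of being only an estimate.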