270 research outputs found

    Perceptually optimised sign language video coding


    Content-prioritised video coding for British Sign Language communication.

    Video communication of British Sign Language (BSL) is important for remote interpersonal communication and for the equal provision of services for deaf people. However, the use of video telephony and video conferencing applications for BSL communication is limited by inadequate video quality. BSL is a highly structured, linguistically complete, natural language system that expresses vocabulary and grammar visually and spatially using a complex combination of facial expressions (such as eyebrow movements, eye blinks and mouth/lip shapes), hand gestures, body movements and finger-spelling that change in space and time. Accurate natural BSL communication places specific demands on visual media applications, which must compress video image data for efficient transmission. Current video compression schemes apply methods to reduce statistical redundancy and perceptual irrelevance in video image data based on a general model of Human Visual System (HVS) sensitivities. This thesis presents novel video image coding methods developed to achieve the conflicting requirements for high image quality and efficient coding. Novel methods of prioritising visually important video image content for optimised video coding are developed to exploit the HVS spatial and temporal response mechanisms of BSL users (determined by eye movement tracking) and the characteristics of BSL video image content. The methods implement an accurate model of HVS foveation, applied in the spatial and temporal domains, at the pre-processing stage of a current standard-based system (H.264). Comparison of the performance of the developed and standard coding systems, using methods of video quality evaluation developed for this thesis, demonstrates improved perceived quality at low bit rates. BSL users, broadcasters and service providers benefit from the perception of high-quality video over a range of available transmission bandwidths. The research community benefits from a new approach to video coding optimisation and a better understanding of the communication needs of deaf people.
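    The foveation-prioritised pre-processing described above can be illustrated with a small sketch. The eccentricity falloff, pixel-to-degree scale and QP values below are illustrative assumptions, not the thesis's eye-tracking-derived model; the sketch only shows the general idea of spending more bits (a finer quantisation parameter) on blocks near the viewer's fixation point, which for BSL is typically the signer's face.

```python
import math

def foveation_weight(ecc_deg, half_res_ecc=2.3):
    """Perceptual weight falling off with eccentricity (degrees from fixation).
    half_res_ecc is the eccentricity at which resolution roughly halves
    (an illustrative value, not the thesis's measured parameter)."""
    return half_res_ecc / (half_res_ecc + ecc_deg)

def qp_map(width, height, fix_x, fix_y, deg_per_block=0.5, qp_base=24, qp_max=40):
    """Assign a coarser quantisation parameter (QP) to blocks far from fixation.
    Blocks at the fixation point get qp_base; far periphery approaches qp_max."""
    qps = []
    for y in range(height):
        row = []
        for x in range(width):
            ecc = math.hypot(x - fix_x, y - fix_y) * deg_per_block
            w = foveation_weight(ecc)
            row.append(round(qp_base + (1.0 - w) * (qp_max - qp_base)))
        qps.append(row)
    return qps
```

In a real encoder this map would be applied per macroblock at the pre-processing stage, before standard H.264 rate control.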

    Semi-synchronous video for deaf telephony with an adapted synchronous codec

    Magister Scientiae - MSc. Communication tools such as text-based instant messaging, voice and video relay services, real-time video chat, and mobile SMS and MMS have successfully been used among Deaf people. Several years of field research with a local Deaf community revealed that disadvantaged South African Deaf people preferred to communicate with both Deaf and hearing peers in South African Sign Language rather than text. Synchronous video chat and video relay services provided such opportunities. Both types of services are commonly available in developed regions, but not in developing countries like South Africa. This thesis reports on a workaround approach to design and develop an asynchronous video communication tool that adapted synchronous video codecs to store-and-forward video delivery. This novel asynchronous video tool provided high-quality South African Sign Language video chat at the expense of some additional latency. Synchronous video codec adaptation consisted of comparing codecs and choosing one to optimise in order to minimise latency and preserve video quality. Traditional quality-of-service metrics only addressed real-time video quality and related services; there was no such standard for asynchronous video communication. Therefore, we also enhanced traditional objective video quality metrics with subjective assessment metrics conducted with the local Deaf community.

    ΠŸΡ€Π΅Π΄ΠΎΡΡ‚Π°Π²Π»Π΅Π½ΠΈΠ΅ мСдицинских услуг Π½Π° расстоянии для дистанционной ΠΎΡ†Π΅Π½ΠΊΠΈ слуха ΠΈ Π»ΠΎΠ³ΠΎΠΏΠ΅Π΄ΠΈΠΈ Π³Π»ΡƒΡ…ΠΈΠΌ людям послС ΠΊΠΎΡ…Π»Π΅Π°Ρ€Π½ΠΎΠΉ ΠΈΠΌΠΏΠ»Π°Π½Ρ‚Π°Ρ†ΠΈΠΈ

    Π’ этой ΡΡ‚Π°Ρ‚ΡŒΠ΅ прСдставлСна комплСксная тСстовая систСма дистанционной Ρ€Π΅Ρ‡Π΅Π²ΠΎΠΉ Ρ‚Π΅Ρ€Π°ΠΏΠΈΠΈ, которая ΠΎΠ±ΡŠΠ΅Π΄ΠΈΠ½ΡΠ΅Ρ‚ Ρ€Π΅Ρ‡Π΅Π²ΠΎΠΉ тСст для ΠΏΠ°Ρ†ΠΈΠ΅Π½Ρ‚ΠΎΠ² с ΠΈΠΌΠΏΠ»Π°Π½Ρ‚Π°Ρ‚Π°ΠΌΠΈ CI, ΠΊΠΎΡ‚ΠΎΡ€Ρ‹Π΅ Π½Π°ΠΏΡ€ΡΠΌΡƒΡŽ связаны с ΠΊΠΎΠΌΠΏΡŒΡŽΡ‚Π΅Ρ€ΠΎΠΌ с устройством AUX ΠΈΠ»ΠΈ Bluetooth. Π˜ΡΠΏΠΎΠ»ΡŒΠ·ΡƒΡ ΠΏΠΎΠ΄Ρ…ΠΎΠ΄ Ρ‚Π΅Π»Π΅Π·Ρ€ΠΈΡ‚Π΅Π»Π΅ΠΉ, ΠΌΡ‹ ΠΌΠΎΠΆΠ΅ΠΌ ΠΈΠ·Π±Π΅ΠΆΠ°Ρ‚ΡŒ участия Π»ΠΎΠ³ΠΎΠΏΠ΅Π΄Π°, сурдолога, ΠΈΠ»ΠΈ ΡƒΠΌΠ΅Π½ΡŒΡˆΠΈΡ‚ΡŒ Π½Π°Π³Ρ€ΡƒΠ·ΠΊΡƒ Π½Π° Π»ΠΎΠ³ΠΎΠΏΠ΅Π΄Π°. ΠŸΡ€Π΅ΠΈΠΌΡƒΡ‰Π΅ΡΡ‚Π²ΠΎ этой Ρ€Π°Π±ΠΎΡ‚Ρ‹ Π·Π°ΠΊΠ»ΡŽΡ‡Π°Π΅Ρ‚ΡΡ Π² Ρ‚ΠΎΠΌ, Ρ‡Ρ‚ΠΎΠ±Ρ‹ ΡΠ²ΡΠ·Π°Ρ‚ΡŒ ΠΎΠ΄Π½ΠΎΠ³ΠΎ Π²Ρ€Π°Ρ‡Π° с ΠΎΠ΄Π½ΠΈΠΌ ΠΏΠ°Ρ†ΠΈΠ΅Π½Ρ‚ΠΎΠΌ ΠΈΠ»ΠΈ мноТСством ΠΏΠ°Ρ†ΠΈΠ΅Π½Ρ‚ΠΎΠ², ΠΌΠ½ΠΎΠ³ΠΎ Π²Ρ€Π°Ρ‡Π΅ΠΉ с мноТСством ΠΏΠ°Ρ†ΠΈΠ΅Π½Ρ‚ΠΎΠ², Π° Ρ‚Π°ΠΊΠΆΠ΅ ΠΏΡ€Π΅Π΄ΠΎΡΡ‚Π°Π²Π»ΡΡ‚ΡŒ услуги Π² ΠΎΡ‚Π΄Π°Π»Π΅Π½Π½Ρ‹Ρ… Ρ€Π°ΠΉΠΎΠ½Π°Ρ… для ΡƒΠ»ΡƒΡ‡ΡˆΠ΅Π½ΠΈΡ здравоохранСния.This article presents a comprehensive remote Speech Therapy test system that integrates a speech test for CI implants patients which are directly connected with a computer with AUX or Bluetooth device. By using telehealth approach we can eliminate speech therapist or speech-language pathologist or may we can reduce the load on speech therapist. Advantage of this work is to connect number of patients with one doctor or number of doctors like one to one or one to many or many to many to provide services in remote area to improve health care

    An Investigation Into the Feasibility of Streamlining Language Sample Analysis Through Computer-Automated Transcription and Scoring

    The purpose of the study was to investigate the feasibility of streamlining the transcription and scoring portion of language sample analysis (LSA) through computer automation. LSA is a gold-standard procedure for examining children's language abilities that is underutilized by speech-language pathologists due to its time-consuming nature. To decrease the time associated with the process, the accuracy of transcripts produced automatically with Google Cloud Speech and the accuracy of scores generated by a hard-coded scoring function called the Literate Language Use in Narrative Analysis (LLUNA) were evaluated. A collection of narrative transcripts and audio recordings of narrative samples were selected to evaluate the accuracy of these automated systems. Samples were previously elicited from school-age children between the ages of 6;0 and 11;11 who were either typically developing (TD), at risk for language-related learning disabilities (AR), or had developmental language disorder (DLD). Transcription error of Google Cloud Speech transcripts was evaluated with a weighted word-error rate (WERw). Score accuracy was evaluated with a quadratic weighted kappa (Kqw). Results indicated an average WERw of 48% across all language sample recordings, with a median WERw of 40%. Several recording characteristics of samples were associated with transcription error, including the codec used to record the audio sample and the presence of background noise. Transcription error was lower on average for samples collected using a lossless codec that contained no background noise. Scoring accuracy of LLUNA was high across all six measures of literate language when generated from traditionally produced transcripts, regardless of age or language ability (TD, DLD, AR). Adverbs were most variable in their score accuracy. Scoring accuracy dropped when LLUNA generated scores from transcripts produced by Google Cloud Speech; however, LLUNA was more likely to generate accurate scores when transcripts had low to moderate levels of transcription error. This work provides additional support for the use of automated transcription under the right recording conditions and automated scoring of literate language indices. It also provides preliminary support for streamlining the entire LSA process by automating both transcription and scoring when high-quality recordings of language samples are utilized.
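    A weighted word-error rate like the WERw above is computed from a Levenshtein alignment of the automated transcript against the reference transcript. The sketch below uses illustrative per-error-type weights (the study's exact weighting scheme is not given here):

```python
def weighted_wer(ref, hyp, w_sub=1.0, w_del=1.0, w_ins=0.75):
    """Weighted word-error rate via word-level Levenshtein alignment.
    w_sub, w_del, w_ins weight substitutions, deletions and insertions;
    the values here are illustrative assumptions."""
    ref, hyp = ref.split(), hyp.split()
    # dp[i][j]: minimal weighted edit cost aligning ref[:i] with hyp[:j]
    dp = [[0.0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = i * w_del
    for j in range(1, len(hyp) + 1):
        dp[0][j] = j * w_ins
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (0.0 if ref[i - 1] == hyp[j - 1] else w_sub)
            dp[i][j] = min(sub, dp[i - 1][j] + w_del, dp[i][j - 1] + w_ins)
    # normalise by reference length, as for standard WER
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

With equal weights of 1.0 this reduces to the standard word-error rate.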

    The Effect of Narrow-Band Transmission on Recognition of Paralinguistic Information From Human Vocalizations

    Practically no knowledge exists on the effects of speech coding for narrow-band transmission of speech signals within certain frequency ranges, especially in relation to the recognition of paralinguistic cues in speech. We therefore investigated the impact of narrow-band standard speech coders on the machine-based classification of affective vocalizations and clinical vocal recordings. In addition, we analyzed the effect of speech low-pass filtering with a set of different cut-off frequencies, either chosen as static values in the 0.5-5 kHz range or given dynamically by different upper limits from the first five speech formants (F1-F5). Speech coding and recognition were tested, first, on short-term speaker states, using affective vocalizations from the Geneva Multimodal Emotion Portrayals. Second, in relation to long-term speaker traits, we tested vocal recordings from clinical populations involving speech impairments, as found in the Child Pathological Speech Database. We employed a large acoustic feature space derived from the Interspeech Computational Paralinguistics Challenge. Besides analyzing the outcome of the signal corruption itself, we analyzed the potential of matched and multicondition training as opposed to mismatched conditions. In the results, first, multicondition and matched-condition training significantly increase performance compared with mismatched conditions. Second, downgrades in classification accuracy occur only at comparably severe levels of low-pass filtering; they appear especially for multi-categorical rather than binary decisions, and can be dealt with reasonably by the aforementioned training strategies.
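    The low-pass filtering conditions can be sketched with a simple windowed-sinc FIR filter. The tap count and Hamming window below are illustrative assumptions (the study does not specify a filter design here); the cut-off is either a static value in the 0.5-5 kHz range or taken from a formant estimate such as F5:

```python
import math

def lowpass_fir(signal, cutoff_hz, fs, num_taps=101):
    """Apply a windowed-sinc FIR low-pass filter to a list of samples.
    cutoff_hz may be a static value (e.g. 1000.0) or a per-recording
    formant-based upper limit; fs is the sample rate in Hz."""
    fc = cutoff_hz / fs          # normalised cut-off, cycles per sample
    mid = num_taps // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        h = 2.0 * fc if k == 0 else math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        # Hamming window to control stopband ripple
        h *= 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(h)
    s = sum(taps)                # normalise for unity gain at DC
    taps = [t / s for t in taps]
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for n, t in enumerate(taps):
            j = i - n + mid      # centred convolution (delay compensated)
            if 0 <= j < len(signal):
                acc += t * signal[j]
        out.append(acc)
    return out
```

Content below the cut-off passes essentially unchanged, while components near the Nyquist frequency are strongly attenuated, mimicking the band limitation imposed on the classifier's input.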