Content-prioritised video coding for British Sign Language communication.
Video communication of British Sign Language (BSL) is important for remote interpersonal communication and for the equal provision of services for deaf people. However, the use of video telephony and video conferencing applications for BSL communication is limited by inadequate video quality. BSL is a highly structured, linguistically complete, natural language system that expresses vocabulary and grammar visually and spatially using a complex combination of facial expressions (such as eyebrow movements, eye blinks and mouth/lip shapes), hand gestures, body movements and finger-spelling that change in space and time. Accurate natural BSL communication places specific demands on visual media applications, which must compress video image data for efficient transmission. Current video compression schemes apply methods to reduce statistical redundancy and perceptual irrelevance in video image data based on a general model of Human Visual System (HVS) sensitivities. This thesis presents novel video image coding methods developed to achieve the conflicting requirements of high image quality and efficient coding. Novel methods of prioritising visually important video image content for optimised video coding are developed to exploit the HVS spatial and temporal response mechanisms of BSL users (determined by Eye Movement Tracking) and the characteristics of BSL video image content. The methods implement an accurate model of HVS foveation, applied in the spatial and temporal domains, at the pre-processing stage of a current standard-based system (H.264). Comparison of the performance of the developed and standard coding systems, using methods of video quality evaluation developed for this thesis, demonstrates improved perceived quality at low bit rates. BSL users, broadcasters and service providers benefit from the perception of high quality video over a range of available transmission bandwidths. The research community benefits from a new approach to video coding optimisation and a better understanding of the communication needs of deaf people.
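The foveation idea can be sketched as follows: detail is retained near an assumed fixation point (for BSL, typically the signer's face) and progressively smoothed with eccentricity before the frame reaches a standard encoder. This is an illustrative sketch only; the band radii, blur strengths and fixation point are assumptions, not the thesis's eye-tracking-derived HVS model.

```python
# Illustrative sketch of foveated pre-filtering ahead of a standard encoder.
# Band radii, blur strengths and the fixation point are assumptions, not the
# thesis's eye-tracking-derived model parameters.
import numpy as np
from scipy.ndimage import gaussian_filter

def foveate_frame(frame, fixation, band_radii=(64, 128, 256), sigmas=(1.0, 2.0, 4.0)):
    """Smooth a grayscale frame progressively more with distance from fixation."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ecc = np.hypot(ys - fixation[0], xs - fixation[1])   # eccentricity in pixels
    out = frame.astype(np.float64)
    for radius, sigma in zip(band_radii, sigmas):
        blurred = gaussian_filter(out, sigma)
        out = np.where(ecc > radius, blurred, out)       # blur only the periphery
    return out.astype(frame.dtype)

# Frames pre-filtered this way carry less high-frequency content in the
# periphery, so an unmodified H.264 encoder spends fewer bits there.
```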
Semi-synchronous video for deaf telephony with an adapted synchronous codec
Magister Scientiae - MSc. Communication tools such as text-based instant messaging, voice and video relay services, real-time video chat and mobile SMS and MMS have successfully been used among Deaf people. Several years of field research with a local Deaf community revealed that disadvantaged South African Deaf people preferred to communicate with both Deaf and hearing peers in South African Sign Language as opposed to text. Synchronous video chat and video relay services provided such opportunities. Both types of services are commonly available in developed regions, but not in developing countries like South Africa. This thesis reports on a workaround approach to design and develop an asynchronous video communication tool that adapted synchronous video codecs to store-and-forward video delivery. This novel asynchronous video tool provided high quality South African Sign Language video chat at the expense of some additional latency. Synchronous video codec adaptation consisted of comparing codecs and choosing one to optimise in order to minimise latency and preserve video quality. Traditional quality-of-service metrics only addressed real-time video quality and related services; there was no such standard for asynchronous video communication. Therefore, we also enhanced traditional objective video quality metrics with subjective assessment metrics conducted with the local Deaf community.
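A minimal sketch of the store-and-forward adaptation, assuming an ffmpeg/x264 toolchain (the thesis's actual codec choice and tuning are not reproduced here): the capture is encoded in short, independently decodable segments, and each segment is forwarded as soon as it closes, so playback can begin before recording ends.

```python
# Minimal sketch of semi-synchronous delivery: encode in short segments and
# forward each as soon as it closes. Segment length, codec settings and file
# names are assumptions, not the thesis's tuned configuration.
import pathlib
import shutil
import subprocess

def encode_segments(source, out_dir, seg_seconds=2):
    """Split a captured clip into independently decodable H.264 segments."""
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    subprocess.run([
        "ffmpeg", "-i", source,
        "-c:v", "libx264", "-preset", "ultrafast", "-tune", "zerolatency",
        "-f", "segment", "-segment_time", str(seg_seconds),
        "-reset_timestamps", "1",
        str(out / "seg_%03d.mp4"),
    ], check=True)
    return sorted(out.glob("seg_*.mp4"))

def forward(segment, inbox="peer_inbox"):
    """Stand-in for the network hop; a real tool would upload the segment."""
    pathlib.Path(inbox).mkdir(exist_ok=True)
    shutil.copy(segment, inbox)

for seg in encode_segments("captured_sasl_clip.avi", "segments"):
    forward(seg)   # latency per segment is roughly its own capture + encode time
```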
Provision of remote medical services for distance assessment of hearing and speech therapy for deaf people after cochlear implantation
This article presents a comprehensive remote speech therapy test system that integrates a speech test for cochlear implant (CI) patients whose implants are connected directly to a computer via an AUX or Bluetooth device. By using a telehealth approach, we can remove the need for an on-site speech therapist or speech-language pathologist, or at least reduce the load on the speech therapist.
The advantage of this work is that it connects one doctor with one patient or with many patients, or many doctors with many patients, and provides services in remote areas to improve healthcare.
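The article does not publish an implementation; as a hypothetical illustration, the pairing topologies it describes (one-to-one, one-to-many, many-to-many) amount to a simple session registry like the following, with all names invented for the example.

```python
# Hypothetical sketch of the clinician-patient pairing topologies described
# above; the article does not specify an implementation, and all identifiers
# here are invented.
from collections import defaultdict

class SessionRegistry:
    def __init__(self):
        self._links = defaultdict(set)   # clinician id -> set of patient ids

    def connect(self, clinician, patient):
        self._links[clinician].add(patient)

    def patients_of(self, clinician):
        return sorted(self._links[clinician])

    def clinicians_of(self, patient):
        return sorted(c for c, ps in self._links.items() if patient in ps)

reg = SessionRegistry()
reg.connect("dr_a", "patient_1")                  # one-to-one
for p in ("patient_2", "patient_3"):
    reg.connect("dr_a", p)                        # one-to-many
reg.connect("dr_b", "patient_2")                  # many-to-many overlap
print(reg.clinicians_of("patient_2"))             # ['dr_a', 'dr_b']
```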
An Investigation Into the Feasibility of Streamlining Language Sample Analysis Through Computer-Automated Transcription and Scoring
The purpose of the study was to investigate the feasibility of streamlining the transcription and scoring portion of language sample analysis (LSA) through computer-automation. LSA is a gold-standard procedure for examining children's language abilities that is underutilized by speech-language pathologists due to its time-consuming nature. To decrease the time associated with the process, the accuracy of transcripts produced automatically with Google Cloud Speech and the accuracy of scores generated by a hard-coded scoring function called the Literate Language Use in Narrative Analysis (LLUNA) were evaluated. A collection of narrative transcripts and audio recordings of narrative samples were selected to evaluate the accuracy of these automated systems. Samples were previously elicited from school-age children between the ages of 6;0-11;11 who were either typically developing (TD), at-risk for language-related learning disabilities (AR), or had developmental language disorder (DLD). Transcription error of Google Cloud Speech transcripts was evaluated with a weighted word-error rate (WERw). Score accuracy was evaluated with a quadratic weighted kappa (Kqw). Results indicated an average WERw of 48% across all language sample recordings, with a median WERw of 40%. Several recording characteristics of samples were associated with transcription error, including the codec used to record the audio sample and the presence of background noise. Transcription error was lower on average for samples collected using a lossless codec and containing no background noise. Scoring accuracy of LLUNA was high across all six measures of literate language when generated from traditionally produced transcripts, regardless of age or language ability (TD, DLD, AR). Adverbs were most variable in their score accuracy. Scoring accuracy dropped when LLUNA generated scores from transcripts produced by Google Cloud Speech; however, LLUNA was more likely to generate accurate scores when transcripts had low to moderate levels of transcription error. This work provides additional support for the use of automated transcription under the right recording conditions and automated scoring of literate language indices. It also provides preliminary support for streamlining the entire LSA process by automating both transcription and scoring, when high quality recordings of language samples are utilized.
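For readers unfamiliar with the two metrics, the sketch below computes the unweighted core of a word error rate via edit distance and the quadratic weighted kappa using scikit-learn; the study's WERw additionally weights error types, and that weighting scheme is not reproduced here. The score values in the example are invented.

```python
# Sketch of the two evaluation metrics named above. The study's WERw applies
# weights to error types; this minimal version shows the plain WER core plus
# quadratic weighted kappa (Kqw) as provided by scikit-learn.
import numpy as np
from sklearn.metrics import cohen_kappa_score

def word_error_rate(reference, hypothesis):
    """Levenshtein distance over words, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)   # deletions
    d[0, :] = np.arange(len(hyp) + 1)   # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i, j] = min(sub, d[i - 1, j] + 1, d[i, j - 1] + 1)
    return d[-1, -1] / max(len(ref), 1)

print(word_error_rate("the boy ran home", "the boy run home"))  # 0.25

# Kqw: agreement between human scores and automated scores on an ordinal
# scale, penalizing large disagreements quadratically.
human = [3, 1, 2, 4, 0, 2]   # invented example scores
auto  = [3, 1, 3, 4, 1, 2]
print(cohen_kappa_score(human, auto, weights="quadratic"))
```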
The Effect of Narrow-Band Transmission on Recognition of Paralinguistic Information From Human Vocalizations
Practically no knowledge exists on the effects of speech coding and recognition for narrow-band transmission of speech signals within certain frequency ranges, especially in relation to the recognition of paralinguistic cues in speech. We thus investigated the impact of narrow-band standard speech coders on the machine-based classification of affective vocalizations and clinical vocal recordings. In addition, we analyzed the effect of speech low-pass filtering by a set of different cut-off frequencies, either chosen as static values in the 0.5-5-kHz range or given dynamically by different upper limits from the first five speech formants (F1-F5). Speech coding and recognition were tested, first, according to short-term speaker states by using affective vocalizations as given by the Geneva Multimodal Emotion Portrayals. Second, in relation to long-term speaker traits, we tested vocal recordings from clinical populations involving speech impairments as found in the Child Pathological Speech Database. We employed a large acoustic feature space derived from the Interspeech Computational Paralinguistics Challenge. Besides analyzing the raw effect of signal degradation, we examined the potential of matched-condition and multicondition training as opposed to mismatched conditions. In the results, first, multicondition and matched-condition training significantly increased performance compared with mismatched conditions. Second, classification accuracy degraded only at comparatively severe levels of low-pass filtering. The degradation appeared mainly for multi-categorical rather than binary decisions and can be dealt with reasonably by the training strategies described above.
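The static low-pass condition can be reproduced in outline with a standard Butterworth design, as sketched below; the filter order, sampling rate and synthetic test signal are illustrative assumptions, and the formant-driven dynamic cut-offs are not shown.

```python
# Illustrative reproduction of the static low-pass condition: filter a speech
# signal at several cut-offs before feature extraction. Filter order, sampling
# rate and the synthetic signal are assumptions, not the paper's configuration.
import numpy as np
from scipy.signal import butter, filtfilt

def low_pass(signal, fs, cutoff_hz, order=8):
    """Zero-phase Butterworth low-pass filter."""
    b, a = butter(order, cutoff_hz, btype="low", fs=fs)
    return filtfilt(b, a, signal)

fs = 16_000                                 # assumed sampling rate
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

for cutoff in (500, 1000, 2000, 5000):      # static cut-offs in the 0.5-5 kHz range
    filtered = low_pass(speech, fs, cutoff)
    print(cutoff, float(np.sqrt(np.mean(filtered ** 2))))  # RMS after filtering
```

Energy above each cut-off is removed, so the 3-kHz component survives only the 5-kHz condition; classifier features extracted from the filtered signals then reflect the corresponding bandwidth restriction.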
- β¦