Content-prioritised video coding for British Sign Language communication.
Video communication of British Sign Language (BSL) is important for remote interpersonal communication and for the equal provision of services for deaf people. However, the use of video telephony and video conferencing applications for BSL communication is limited by inadequate video quality. BSL is a highly structured, linguistically complete, natural language system that expresses vocabulary and grammar visually and spatially using a complex combination of facial expressions (such as eyebrow movements, eye blinks and mouth/lip shapes), hand gestures, body movements and finger-spelling that change in space and time. Accurate natural BSL communication places specific demands on visual media applications, which must compress video image data for efficient transmission. Current video compression schemes apply methods to reduce statistical redundancy and perceptual irrelevance in video image data based on a general model of Human Visual System (HVS) sensitivities. This thesis presents novel video image coding methods developed to achieve the conflicting requirements of high image quality and efficient coding. Novel methods of prioritising visually important video image content for optimised video coding are developed to exploit the HVS spatial and temporal response mechanisms of BSL users (determined by Eye Movement Tracking) and the characteristics of BSL video image content. The methods implement an accurate model of HVS foveation, applied in the spatial and temporal domains, at the pre-processing stage of a current standard-based system (H.264). Comparison of the performance of the developed and standard coding systems, using methods of video quality evaluation developed for this thesis, demonstrates improved perceived quality at low bit rates. BSL users, broadcasters and service providers benefit from the perception of high quality video over a range of available transmission bandwidths. The research community benefits from a new approach to video coding optimisation and a better understanding of the communication needs of deaf people.
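The foveation idea can be sketched as follows: detail is retained near an assumed fixation point (for BSL, typically the signer's face) and progressively smoothed with eccentricity before the frame reaches a standard encoder. This is an illustrative sketch only; the band radii, blur strengths and fixation point are assumptions, not the thesis's eye-tracking-derived HVS model.

```python
# Illustrative sketch of foveated pre-filtering ahead of a standard encoder.
# Band radii, blur strengths and the fixation point are assumptions, not the
# thesis's eye-tracking-derived model parameters.
import numpy as np
from scipy.ndimage import gaussian_filter

def foveate_frame(frame, fixation, band_radii=(64, 128, 256), sigmas=(1.0, 2.0, 4.0)):
    """Smooth a grayscale frame progressively more with distance from fixation."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ecc = np.hypot(ys - fixation[0], xs - fixation[1])   # eccentricity in pixels
    out = frame.astype(np.float64)
    for radius, sigma in zip(band_radii, sigmas):
        blurred = gaussian_filter(out, sigma)
        out = np.where(ecc > radius, blurred, out)       # blur only the periphery
    return out.astype(frame.dtype)

# Frames pre-filtered this way carry less high-frequency content in the
# periphery, so an unmodified H.264 encoder spends fewer bits there.
```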
Semi-synchronous video for deaf telephony with an adapted synchronous codec
Magister Scientiae - MSc. Communication tools such as text-based instant messaging, voice and video relay services, real-time video chat and mobile SMS and MMS have successfully been used among Deaf people. Several years of field research with a local Deaf community revealed that disadvantaged South African Deaf people preferred to communicate with both Deaf and hearing peers in South African Sign Language as opposed to text. Synchronous video chat and video relay services provided such opportunities. Both types of services are commonly available in developed regions, but not in developing countries like South Africa. This thesis reports on a workaround approach to design and develop an asynchronous video communication tool that adapted synchronous video codecs to store-and-forward video delivery. This novel asynchronous video tool provided high quality South African Sign Language video chat at the expense of some additional latency. Synchronous video codec adaptation consisted of comparing codecs and choosing one to optimise in order to minimise latency and preserve video quality. Traditional quality-of-service metrics only addressed real-time video quality and related services; there was no such standard for asynchronous video communication. Therefore, we also enhanced traditional objective video quality metrics with subjective assessment metrics conducted with the local Deaf community.
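A minimal sketch of the store-and-forward adaptation, assuming an ffmpeg/x264 toolchain (the thesis's actual codec choice and tuning are not reproduced here): the capture is encoded in short, independently decodable segments, and each segment is forwarded as soon as it closes, so playback can begin before recording ends.

```python
# Minimal sketch of semi-synchronous delivery: encode in short segments and
# forward each as soon as it closes. Segment length, codec settings and file
# names are assumptions, not the thesis's tuned configuration.
import pathlib
import shutil
import subprocess

def encode_segments(source, out_dir, seg_seconds=2):
    """Split a captured clip into independently decodable H.264 segments."""
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    subprocess.run([
        "ffmpeg", "-i", source,
        "-c:v", "libx264", "-preset", "ultrafast", "-tune", "zerolatency",
        "-f", "segment", "-segment_time", str(seg_seconds),
        "-reset_timestamps", "1",
        str(out / "seg_%03d.mp4"),
    ], check=True)
    return sorted(out.glob("seg_*.mp4"))

def forward(segment, inbox="peer_inbox"):
    """Stand-in for the network hop; a real tool would upload the segment."""
    pathlib.Path(inbox).mkdir(exist_ok=True)
    shutil.copy(segment, inbox)

for seg in encode_segments("captured_sasl_clip.avi", "segments"):
    forward(seg)   # latency per segment is roughly its own capture + encode time
```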
Provision of remote medical services for distance assessment of hearing and speech therapy for deaf people after cochlear implantation
This article presents a comprehensive remote speech therapy test system that integrates a speech test for cochlear implant (CI) patients whose implants are connected directly to a computer via an AUX or Bluetooth device. By using a telehealth approach, we can remove the need for an on-site speech therapist or speech-language pathologist, or at least reduce the load on the speech therapist.
The advantage of this work is that it connects one doctor with one patient or with many patients, or many doctors with many patients, and provides services in remote areas to improve healthcare.
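The article does not publish an implementation; as a hypothetical illustration, the pairing topologies it describes (one-to-one, one-to-many, many-to-many) amount to a simple session registry like the following, with all names invented for the example.

```python
# Hypothetical sketch of the clinician-patient pairing topologies described
# above; the article does not specify an implementation, and all identifiers
# here are invented.
from collections import defaultdict

class SessionRegistry:
    def __init__(self):
        self._links = defaultdict(set)   # clinician id -> set of patient ids

    def connect(self, clinician, patient):
        self._links[clinician].add(patient)

    def patients_of(self, clinician):
        return sorted(self._links[clinician])

    def clinicians_of(self, patient):
        return sorted(c for c, ps in self._links.items() if patient in ps)

reg = SessionRegistry()
reg.connect("dr_a", "patient_1")                  # one-to-one
for p in ("patient_2", "patient_3"):
    reg.connect("dr_a", p)                        # one-to-many
reg.connect("dr_b", "patient_2")                  # many-to-many overlap
print(reg.clinicians_of("patient_2"))             # ['dr_a', 'dr_b']
```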
An Investigation Into the Feasibility of Streamlining Language Sample Analysis Through Computer-Automated Transcription and Scoring
The purpose of the study was to investigate the feasibility of streamlining the transcription and scoring portion of language sample analysis (LSA) through computer-automation. LSA is a gold-standard procedure for examining children's language abilities that is underutilized by speech-language pathologists due to its time-consuming nature. To decrease the time associated with the process, the accuracy of transcripts produced automatically with Google Cloud Speech and the accuracy of scores generated by a hard-coded scoring function called the Literate Language Use in Narrative Analysis (LLUNA) were evaluated. A collection of narrative transcripts and audio recordings of narrative samples were selected to evaluate the accuracy of these automated systems. Samples were previously elicited from school-age children between the ages of 6;0-11;11 who were either typically developing (TD), at-risk for language-related learning disabilities (AR), or had developmental language disorder (DLD). Transcription error of Google Cloud Speech transcripts was evaluated with a weighted word-error rate (WERw). Score accuracy was evaluated with a quadratic weighted kappa (Kqw). Results indicated an average WERw of 48% across all language sample recordings, with a median WERw of 40%. Several recording characteristics of samples were associated with transcription error, including the codec used to record the audio sample and the presence of background noise. Transcription error was lower on average for samples collected using a lossless codec and containing no background noise. Scoring accuracy of LLUNA was high across all six measures of literate language when generated from traditionally produced transcripts, regardless of age or language ability (TD, DLD, AR). Adverbs were most variable in their score accuracy. Scoring accuracy dropped when LLUNA generated scores from transcripts produced by Google Cloud Speech; however, LLUNA was more likely to generate accurate scores when transcripts had low to moderate levels of transcription error. This work provides additional support for the use of automated transcription under the right recording conditions and automated scoring of literate language indices. It also provides preliminary support for streamlining the entire LSA process by automating both transcription and scoring, when high quality recordings of language samples are utilized.
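For readers unfamiliar with the two metrics, the sketch below computes the unweighted core of a word error rate via edit distance and the quadratic weighted kappa using scikit-learn; the study's WERw additionally weights error types, and that weighting scheme is not reproduced here. The score values in the example are invented.

```python
# Sketch of the two evaluation metrics named above. The study's WERw applies
# weights to error types; this minimal version shows the plain WER core plus
# quadratic weighted kappa (Kqw) as provided by scikit-learn.
import numpy as np
from sklearn.metrics import cohen_kappa_score

def word_error_rate(reference, hypothesis):
    """Levenshtein distance over words, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)   # deletions
    d[0, :] = np.arange(len(hyp) + 1)   # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i, j] = min(sub, d[i - 1, j] + 1, d[i, j - 1] + 1)
    return d[-1, -1] / max(len(ref), 1)

print(word_error_rate("the boy ran home", "the boy run home"))  # 0.25

# Kqw: agreement between human scores and automated scores on an ordinal
# scale, penalizing large disagreements quadratically.
human = [3, 1, 2, 4, 0, 2]   # invented example scores
auto  = [3, 1, 3, 4, 1, 2]
print(cohen_kappa_score(human, auto, weights="quadratic"))
```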
The Effect of Narrow-Band Transmission on Recognition of Paralinguistic Information From Human Vocalizations
Practically no knowledge exists on the effects of speech coding and recognition for narrow-band transmission of speech signals within certain frequency ranges, especially in relation to the recognition of paralinguistic cues in speech. We thus investigated the impact of narrow-band standard speech coders on the machine-based classification of affective vocalizations and clinical vocal recordings. In addition, we analyzed the effect of speech low-pass filtering by a set of different cut-off frequencies, either chosen as static values in the 0.5-5-kHz range or given dynamically by different upper limits from the first five speech formants (F1-F5). Speech coding and recognition were tested, first, according to short-term speaker states by using affective vocalizations as given by the Geneva Multimodal Emotion Portrayals. Second, in relation to long-term speaker traits, we tested vocal recordings from clinical populations involving speech impairments as found in the Child Pathological Speech Database. We employed a large acoustic feature space derived from the Interspeech Computational Paralinguistics Challenge. Besides analyzing the raw effect of signal degradation, we examined the potential of matched-condition and multicondition training as opposed to mismatched conditions. In the results, first, multicondition and matched-condition training significantly increased performance compared with mismatched conditions. Second, classification accuracy degraded only at comparatively severe levels of low-pass filtering. The degradation appeared mainly for multi-categorical rather than binary decisions and can be dealt with reasonably by the training strategies described above.
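The static low-pass condition can be reproduced in outline with a standard Butterworth design, as sketched below; the filter order, sampling rate and synthetic test signal are illustrative assumptions, and the formant-driven dynamic cut-offs are not shown.

```python
# Illustrative reproduction of the static low-pass condition: filter a speech
# signal at several cut-offs before feature extraction. Filter order, sampling
# rate and the synthetic signal are assumptions, not the paper's configuration.
import numpy as np
from scipy.signal import butter, filtfilt

def low_pass(signal, fs, cutoff_hz, order=8):
    """Zero-phase Butterworth low-pass filter."""
    b, a = butter(order, cutoff_hz, btype="low", fs=fs)
    return filtfilt(b, a, signal)

fs = 16_000                                 # assumed sampling rate
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

for cutoff in (500, 1000, 2000, 5000):      # static cut-offs in the 0.5-5 kHz range
    filtered = low_pass(speech, fs, cutoff)
    print(cutoff, float(np.sqrt(np.mean(filtered ** 2))))  # RMS after filtering
```

Energy above each cut-off is removed, so the 3-kHz component survives only the 5-kHz condition; classifier features extracted from the filtered signals then reflect the corresponding bandwidth restriction.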
- β¦