82 research outputs found

    A study of anticipatory coarticulation for French speakers and for Mandarin Chinese speakers

    Anticipatory coarticulation is studied for two languages, French and Mandarin Chinese, within Vowel1-Consonant-Vowel2 sequences (V1CV2 henceforth). EMMA data and acoustic signals were collected. This paper analyzes more specifically the influences of V2 on V1 and of V2 on C. The results showed that, for the French speakers, vowel V2 influenced the whole V1CV2 sequence, while its influence was limited to the syllable CV2 for the Chinese speakers. This suggests that speech planning in French extends beyond the syllable, while planning in Chinese is confined to the syllable. The results demonstrate the impact of language-based constraints on articulatory planning in speech.

    Investigation of the effect of articulatory-based second language production learning on speech perception

    The effect of second language production training on perception has been explored previously, but it remains unclear whether such training by itself influences the perception of speech sounds. In previous work, participants heard the correct pronunciation of the target while simultaneously undergoing production training, making it unclear what component of the improvement was due to the production training alone. In the current study we have therefore modified our electromagnetic articulometer-based training system, which provides estimates of learner-specific, head-corrected tongue positions for a target utterance in real time, to eliminate the simultaneous presentation of audio stimuli. Japanese learners of the American English vowel /ae/ performed ABX perceptual testing on this vowel before and after the visually presented, articulatory-based pronunciation training. We examined whether the production-driven pronunciation improvement also induces a change in the perception of the second language sounds.

    Coverage based empirical modelling for EMS rescue system of Karachi (Pakistan)

    Emergency Medical Services (EMS) handle most emergencies. Most emergencies involve a single patient, and routine systems are not able to respond to emergencies with many casualties. Mass-casualty disaster response and EMS plans have become more common over the past decades in response to ordinary disasters and terrorist attacks. However, it may not be possible to construct such plans with limited resources and budget. Further factors that affect the EMS system include the number of ambulances deployed, their position/location, and dispatching strategies. Another factor is the variation in the number of vacant ambulances at different times of the day. To sustain coverage, ambulances must be located at stations in a functional state. In this paper we propose an optimization model for EMS to assist medical treatment in the region of Karachi, Pakistan, using two years of data (2010 and 2011). We also conducted an empirical analysis of ambulance response times, travel times to a hospital, and time spent at the hospital. Google Maps is used to help the EMS provider view and analyse the entire scene of the accident with the help of GPS or other sources of information. Physical simulation and results are used as part of the planning process, which shows the integrity and efficiency of the time threshold based on the acuity of the patient at the time the call to 15 is made (Rescue 15).

    Robust Face Recognition System Based on a Multi-Views Face Database

    In this chapter, we describe a new robust face recognition system based on a multi-views face database that derives some 3-D information from a set of face images. Our objective is to build an approximately 3-D system that improves the performance of face recognition. The main goals of this vision system are 1) to minimize the hardware resources, 2) to obtain high success rates of identity verification, and 3) to cope with real-time constraints. Using the multi-views database, we address the problem of face recognition by evaluating two methods, PCA and ICA, and comparing their relative performance. We explore the issues of subspace selection, algorithm comparison, and multi-views face recognition performance. To make full use of the multi-views property, we also propose a strategy of majority voting among the five views, which can improve the recognition rate. Experimental results show that ICA is a promising method among the many possible face recognition methods, and that the ICA algorithm with majority voting is currently the best choice for our purposes.
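The majority-voting step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each of the five views has already produced one identity label (for example from a PCA or ICA classifier, not shown here). The function name `majority_vote`, the example labels, and the tie-breaking rule are hypothetical and are not taken from the chapter.

```python
from collections import Counter


def majority_vote(view_predictions):
    """Return the identity label predicted by the largest number of views.

    Assumption: ties are broken in favour of the label that appears
    first in the input list. The abstract does not specify a tie rule.
    """
    if not view_predictions:
        raise ValueError("need at least one per-view prediction")
    counts = Counter(view_predictions)
    best = max(counts.values())
    # Scan in input order so that the tie rule above is deterministic.
    for label in view_predictions:
        if counts[label] == best:
            return label


# Hypothetical per-view outputs for one probe face seen from five views.
views = ["alice", "bob", "alice", "alice", "carol"]
print(majority_vote(views))  # prints "alice"
```

A single misclassified view is outvoted as long as most of the other views agree, which is one plausible reason such a scheme could raise the recognition rate.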

    A WEB-BASED CSCW SYSTEM FOR REMOTE SUBSTATION FAULT DIAGNOSIS

    Relying on Computer Supporte

    RTA Analysis & Existing Modelling for Emergency Medical Service

    Preventing accidents is extremely difficult without an analysis of the present situation, since identifying incident locations and safety-deficient areas is the key to working out an effective solution. Assessing the feasibility of using a Geographic Information System (GIS) to map incident locations from an existing data source is vital for estimating, by interpolation, the variation in the pattern of RTAs (Road Traffic Accidents). Generally, accident particulars such as location, date, time, sex and suspect are included in the GIS database. Here, ArcGIS (10.2.1) software is applied to identify incident locations in Karachi district. To reduce accidents in the study area and to sustain coverage for emergency response, further factors that affect the EMS system must be considered, including the number of ambulances deployed, their position/location, and dispatching strategies; the authors strongly recommend a coverage-based probabilistic model for solving the ambulance location problem for Rescue-15. GIS helps the respective authority assess and analyze the entire position of the accident with the help of GPS or additional sources of information, while the results are used as part of the planning process, based on the acuity of the patient at the time.

    Learning Speech Representation From Contrastive Token-Acoustic Pretraining

    For fine-grained generation and recognition tasks such as minimally-supervised text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR), the intermediate representations extracted from speech should serve as a "bridge" between text and acoustic information, containing information from both modalities. The semantic content is emphasized, while the paralinguistic information such as speaker identity and acoustic details should be de-emphasized. However, existing methods for extracting fine-grained intermediate representations from speech suffer from issues of excessive redundancy and dimension explosion. Contrastive learning is a good method for modeling intermediate representations from two modalities. However, existing contrastive learning methods in the audio field focus on extracting global descriptive information for downstream audio classification tasks, making them unsuitable for TTS, VC, and ASR tasks. To address these issues, we propose a method named "Contrastive Token-Acoustic Pretraining (CTAP)", which uses two encoders to bring phoneme and speech into a joint multimodal space, learning how to connect phoneme and speech at the frame level. The CTAP model is trained on 210k speech and phoneme text pairs, achieving minimally-supervised TTS, VC, and ASR. The proposed CTAP method offers a promising solution for fine-grained generation and recognition downstream tasks in speech processing
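The idea of pulling paired phoneme and speech frames together in a joint space can be sketched with a symmetric InfoNCE-style loss. This is an assumption for illustration only: the abstract does not state the exact CTAP objective, and the function name `frame_contrastive_loss`, the temperature value, and the toy embeddings are all hypothetical. Frame `i` of the speech encoder output is treated as the positive match for frame `i` of the phoneme encoder output, and every other frame is a negative.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def frame_contrastive_loss(speech_frames, phoneme_frames, temperature=0.07):
    """Symmetric contrastive loss over aligned frame embeddings (a sketch)."""
    if len(speech_frames) != len(phoneme_frames):
        raise ValueError("both sequences must have the same number of frames")
    s = [_normalize(v) for v in speech_frames]
    p = [_normalize(v) for v in phoneme_frames]
    n = len(s)
    # logits[i][j]: scaled similarity of speech frame i and phoneme frame j.
    logits = [[sum(a * b for a, b in zip(s[i], p[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Speech-to-phoneme direction: each row should peak on the diagonal.
    s2p = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Phoneme-to-speech direction: each column should peak on the diagonal.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    p2s = -sum(_log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (s2p + p2s)


# Toy embeddings: three frames, each along a different axis.
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = frame_contrastive_loss(frames, frames)
shuffled = frame_contrastive_loss(frames, [frames[1], frames[2], frames[0]])
print(aligned < shuffled)  # prints True
```

In this toy case the loss is near zero when frames are correctly paired and large when the pairing is shuffled, which is the behaviour a frame-level contrastive objective relies on.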

    Speech and Noise Dual-Stream Spectrogram Refine Network with Speech Distortion Loss for Robust Speech Recognition

    In recent years, the joint training of a speech enhancement front-end and an automatic speech recognition (ASR) back-end has been widely used to improve the robustness of ASR systems. Traditional joint training methods use only the enhanced speech as input for the back-end. However, it is difficult for speech enhancement systems to separate speech directly from the input because of the diverse types of noise with different intensities. Furthermore, speech distortion and residual noise are often observed in enhanced speech, and the distortion of speech and noise is different. Most existing methods focus on fusing enhanced and noisy features to address this issue. In this paper, we propose a dual-stream spectrogram refine network that simultaneously refines the speech and the noise and decouples the noise from the noisy input. Our proposed method achieves better performance, with a relative CER reduction of 8.6%.
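For readers unfamiliar with the metric, a relative character error rate (CER) reduction is conventionally the drop in CER divided by the baseline CER. The sketch below shows that arithmetic. The function name and the baseline and improved CER values are hypothetical, chosen only so the result matches the 8.6% figure; the abstract does not report the underlying CERs.

```python
def relative_reduction(baseline_cer, new_cer):
    """Relative error-rate reduction: (baseline - new) / baseline."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return (baseline_cer - new_cer) / baseline_cer


# Hypothetical numbers, not taken from the paper: CER falls from 10.00% to 9.14%.
print(f"{relative_reduction(10.0, 9.14):.1%}")  # prints "8.6%"
```

Note that a relative reduction of 8.6% is smaller in absolute terms: in this made-up case it corresponds to 0.86 percentage points.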