Subband modeling for spoofing detection in automatic speaker verification
Spectrograms, time-frequency representations of audio signals, have found widespread use in neural network-based spoofing detection. While deep models are trained on the fullband spectrum of the signal, we argue that not all frequency bands are useful for these tasks. In this paper, we systematically investigate the impact of different subbands and their importance on replay spoofing detection on two benchmark datasets: ASVspoof 2017 v2.0 and ASVspoof 2019 PA. We propose a joint subband modelling framework that employs n different sub-networks to learn subband-specific features. These are later combined and passed to a classifier, and the weights of the whole network are updated during training. Our findings on the ASVspoof 2017 dataset suggest that the most discriminative information appears to be in the first and the last 1 kHz frequency bands, and the joint model trained on these two subbands shows the best performance, outperforming the baselines by a large margin. However, these findings do not generalise to the ASVspoof 2019 PA dataset. This suggests that the datasets available for training these models do not reflect real-world replay conditions, and that datasets for training replay spoofing countermeasures need to be designed carefully.
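As an illustration of the subband selection described above, the following minimal sketch picks the spectrogram rows that cover the first and last 1 kHz of the spectrum. The 16 kHz sampling rate, the 512-point FFT, and the helper names `subband_bins` and `select_rows` are assumptions made for this example, not details taken from the paper.

```python
def subband_bins(sample_rate, n_fft, low_hz, high_hz):
    """Return indices of one-sided FFT bins whose centre frequency
    lies in the closed interval [low_hz, high_hz]."""
    n_bins = n_fft // 2 + 1            # one-sided spectrum, DC to Nyquist
    hz_per_bin = sample_rate / n_fft   # frequency spacing between bins
    return [k for k in range(n_bins) if low_hz <= k * hz_per_bin <= high_hz]


def select_rows(spectrogram, rows):
    """Keep only the given frequency rows of a spectrogram stored as a
    list of rows (one row per frequency bin, one column per frame)."""
    return [spectrogram[k] for k in rows]


SAMPLE_RATE = 16000  # assumed sampling rate in Hz
N_FFT = 512          # assumed FFT size
nyquist = SAMPLE_RATE / 2

# Bins covering the first and the last 1 kHz of the spectrum.
first_band = subband_bins(SAMPLE_RATE, N_FFT, 0, 1000)
last_band = subband_bins(SAMPLE_RATE, N_FFT, nyquist - 1000, nyquist)
```

In a joint model of the kind the abstract describes, each selected band would be fed to its own sub-network before the learned features are combined; that part is omitted here.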
Correcting respirable photometric particulate measurements using a gravimetric sampling method
According to the National Environmental Management: Air Quality Act of 2004, people have the right to clean air and a healthy environment. Particulate matter (PM) emissions pose a significant health threat. Both indoor and ambient air pollution contribute to the burden of disease associated with poor air quality. This is particularly true within the South African setting, where low-income households make use of different solid fuels for heating and cooking purposes, resulting in high levels of PM emissions. This paper focuses on the evaluation of mass concentration measurements recorded by continuous photometric PM instruments within KwaDela, a low-income settlement in Mpumalanga located on the South African Highveld, in order to obtain a photometric calibration factor for both the DustTrak Model 8530 and the SidePak AM510. Sampling took place during August 2014 for a period of seven days. The photometric and gravimetric instruments were collocated within the indoor environment of selected households. These instruments were all fitted with 10 mm Dorr-Oliver Cyclone inlets to obtain the respirable (PM4) cut-point. The study found that both instruments tend to overestimate the indoor particulate mass concentrations when compared to the reference gravimetric method. The estimated photometric calibration factors for the DustTrak Model 8530 and SidePak AM510 are 0.14 (95% CI: 0.09, 0.15) and 0.24 (95% CI: 0.16, 0.30) respectively. The overestimation by the photometric instruments is substantial. It is therefore important that the correction factors are applied to data collected in indoor environments prone to the combustion of solid fuels. The correction factors obtained from this and other studies vary as a result of the environment (ambient, indoor etc.) as well as the aerosol size fraction and the origin thereof. Thus, it is important to consider site-specific calibration factors when implementing these photometric light-scattering instruments.
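To make the arithmetic concrete, here is a minimal sketch of how such a calibration factor might be applied. It assumes the factor is multiplicative (corrected = factor × photometric reading), which is consistent with the instruments overestimating but is not stated explicitly in the abstract. The function name `correct_concentration` and the sample readings are hypothetical.

```python
# Calibration factors reported in the abstract above (point estimates only).
CALIBRATION_FACTORS = {
    "DustTrak 8530": 0.14,
    "SidePak AM510": 0.24,
}


def correct_concentration(raw_reading, instrument):
    """Scale a photometric reading by the instrument's calibration factor.

    Assumption: corrected = factor * raw_reading. The units of the result
    are the same as those of the raw reading.
    """
    return raw_reading * CALIBRATION_FACTORS[instrument]


# Hypothetical raw photometric readings, for illustration only.
readings = [500.0, 1200.0, 80.0]
corrected = [correct_concentration(r, "DustTrak 8530") for r in readings]
```

Under this assumption, a DustTrak reading of 1000 would correspond to a gravimetric-equivalent value of about 140 in the same units.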
Indoor and outdoor particulate matter concentrations on the Mpumalanga highveld – A case study
The household combustion of solid fuels, for the purpose of heating and cooking, is an activity practiced by many people in South Africa. Air pollution caused by the combustion of solid fuels in households has a significant influence on public health. People most affected are those considered to be the poorest, living in low-income settlements, where burning solid fuel is the primary source of energy. Insufficient data has been collected in South Africa to quantify the concentrations of particulate emissions that people are exposed to, especially the respirable fraction, associated with the combustion of solid fuels. The aim of this paper is to gain an understanding of the particulate matter (PM) concentrations a person living in a typical household in a low-income settlement in the South African Highveld is exposed to. It also seeks to demonstrate that the use of solid fuels in the household can lead to indoor air pollution concentrations reaching levels very similar to ambient PM concentrations, which could be well in excess of the National Ambient Air Quality Standards, representing a major national public health threat. A mobile monitoring station was used in KwaDela, Mpumalanga to measure both ambient particulate concentrations and meteorological conditions, while a range of dust/particulate monitors were used for indoor and personal particulate concentration measurements. Indoor and personal measurements are limited to the respirable fraction (PM4) as this fraction contributes significantly to the negative health impacts. The sampling for this case study took place from 7-19 August 2014. Highest particulate matter concentrations were evident during the early mornings and the early evenings, when solid fuel burning activities were at their highest. Indoor and personal daily average PM4 concentrations did not exceed the 24h National Ambient PM2.5 Standard of 65 μg/m3 nor did they exceed the 24h National Ambient PM10 Standard of 75 μg/m3. The outdoor PM2.5 concentrations were found to be below the standards for the duration of the sampling period. The outdoor PM10 concentrations exceeded the standards for one day during the sampling period. Results indicate that, although people in KwaDela may be exposed to ambient PM concentrations that can be non-compliant with ambient standards, the exposure to indoor air, where solid fuel is burnt, may be detrimental to their health.
Neural Mention Detection
Mention detection is an important preprocessing step for annotation and interpretation in applications such as NER and coreference resolution, but few stand-alone neural models able to handle the full range of mentions have been proposed. In this work, we propose and compare three neural network-based approaches to mention detection. The first approach is based on the mention detection part of a state-of-the-art coreference resolution system; the second uses ELMO embeddings together with a bidirectional LSTM and a biaffine classifier; the third approach uses the recently introduced BERT model. Our best model (using a biaffine classifier) achieves gains of up to 1.8 percentage points on mention recall when compared with a strong baseline in a HIGH RECALL coreference annotation setting. The same model achieves improvements of up to 5.3 and 6.2 p.p. when compared with the best-reported mention detection F1 on the CONLL and CRAC coreference data sets respectively in a HIGH F1 annotation setting. We then evaluate our models for coreference resolution by using mentions predicted by our best model in state-of-the-art coreference systems. The enhanced model achieved absolute improvements of up to 1.7 and 0.7 p.p. when compared with our strong baseline systems (pipeline system and end-to-end system) respectively. For nested NER, the evaluation of our model on the GENIA corpora shows that our model matches or outperforms state-of-the-art models despite not being specifically designed for this task.
Recommended from our members
Conceptualising quality of life for older people with aphasia
Background: There is an increasing need in speech and language therapy for clinicians to provide intervention in the context of the broader life quality issues for people with aphasia. However, there is no descriptive research that is explicitly focused on quality of life (QoL) from the perspectives of older people with aphasia.
Aims: The current study explores how older people with chronic aphasia who are living in the community describe their QoL in terms of what contributes to and detracts from the quality in their current and future lives. The study is descriptive in nature, and the purpose is to conceptualize the factors that influence QoL.
Methods & Procedures: Thirty older participants (16 women, 14 men) with mild to moderate aphasic impairment took part. All participants had adequate communication skills to participate: demonstrating reliable yes/no response and moderate auditory comprehension ability. Participants were interviewed in their own homes using six brief unprompted open questions about QoL, in a structured interview. The first five questions were drawn from previous gerontological research (Farquhar, 1995), and a sixth question specifically targeting communication was added. Content analysis was used, identifying discrete units of data and then coding these into concepts and factors. Additional demographic information was collected, and participants’ mood on day of interviewing was assessed using the Geriatric Depression Scale (Sheikh & Yesavage, 1986).
Outcomes & Results: Activities, verbal communication, people, and body functioning were the core factors in QoL for these participants, and they described how these factors both contributed to and detracted from quality in life. Other factors that influenced QoL included stroke, mobility, positive personal outlook, in/dependence, home and health. Whilst the findings are limited by the lack of probing of participants’ responses, the study does present preliminary evidence for what is important in QoL to older people with aphasia.
Conclusions: Quality of life for older people with predominantly mild to moderate chronic aphasia who are living in the community is multifactorial in nature. Some factors lie within the remit of speech and language therapy, some lie beyond the professional role, but all are relevant for consideration in rehabilitation and community practice. Further qualitative research is indicated to better understand QoL with aphasia, using in-depth interviewing with a broader range of people with aphasia.
Analysing the predictions of a CNN-based replay spoofing detection system
Playing recorded speech samples of an enrolled speaker, a "replay attack", is a simple approach to bypass an automatic speaker verification (ASV) system. The vulnerability of ASV systems to such attacks has been acknowledged and studied, but there has been no research into what spoofing detection systems are actually learning to discriminate. In this paper, we analyse the local behaviour of a replay spoofing detection system based on convolutional neural networks (CNNs) adapted from a state-of-the-art CNN (LCNN-FFT) submitted to the ASVspoof 2017 challenge. We generate temporal and spectral explanations for predictions of the model using the SLIME algorithm. Our findings suggest that in most instances of spoofing the model is using information in the first 400 milliseconds of each audio instance to make the class prediction. Knowledge of the characteristics that spoofing detection systems are exploiting can help build less vulnerable ASV systems, other spoofing detection systems, as well as better evaluation databases.
Bures and Statistical Distance for Squeezed Thermal States
We compute the Bures distance between two thermal squeezed states and deduce
the Statistical Distance metric. By computing the curvature of this metric we
can identify regions of parameter space most sensitive to changes in these
parameters and thus lead to optimum detection statistics.
Comment: 15 pages, 1 figure (not included - obtain from Author). To appear in Journal of Physics
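For reference, the standard definitions behind these quantities can be sketched as follows. This uses a common normalisation of the fidelity and the Bures distance; the paper itself may adopt a different convention, and the symbols $\rho_1$, $\rho_2$, $F$ and $d_B$ are notation chosen here, not taken from the abstract.

```latex
F(\rho_1, \rho_2) = \left( \operatorname{Tr} \sqrt{ \sqrt{\rho_1}\, \rho_2\, \sqrt{\rho_1} } \right)^{2},
\qquad
d_B^{2}(\rho_1, \rho_2) = 2 \left( 1 - \sqrt{F(\rho_1, \rho_2)} \right)
```

The metric on parameter space is then obtained from the infinitesimal form $ds^2 = d_B^2(\rho, \rho + d\rho)$, up to a convention-dependent constant factor.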
Discriminating disorder from difference using dynamic assessment with bilingual children
The DAPPLE (Dynamic Assessment of Preschoolers’ Proficiency in Learning English) is currently being developed in response to a clinical need. Children exposed to English as an additional language may be referred to speech and language therapy because their proficiency in English is not the same as that of their monolingual peers. Some, but not all, of these children are likely to have a core language learning difficulty. Clinicians need to be able to distinguish disorder from difference due to a child’s language learning context. The assessment used a test–teach–test format to examine children’s ability to learn vocabulary, sentence structure and phonology. The assessment, which takes less than 60 minutes to administer, was given to 26 children who were bilingual: 12 currently on a speech and language therapy caseload and 14 children matched for age and socio-economic status who had never been referred to speech and language therapy. The DAPPLE data clearly discriminated the two groups. The caseload group required a greater amount of prompting to identify targeted words in the receptive vocabulary assessment and performed less well in the post-teaching expressive component. For sentence structure, the caseload group required more cues to acquire the targeted clause elements in the teaching phase. The caseload group made more phoneme errors at the initial and final assessments than the controls, and the type of errors made differed. Teaching resulted in greater positive change in percent phonemes correct for the caseload participants. Qualitative analyses of individual children’s performance on the DAPPLE suggested that it has the potential to discriminate core language deficits from difference due to a bilingual language learning context. Future directions for development of the test are considered.
Early phonological and sociocognitive skills as predictors of later language and social communication outcomes
Background:  Previous studies of outcome for children with early language delay have focused on measures of early language as predictors of language outcome. This study investigates whether very early processing skills (VEPS) known to underpin language development will be better predictors of specific language and social communication outcomes than measures of language itself.
Method:  Participants were 163 children referred to clinical services with concerns about language at 2;6–3;6 years and followed up at 4–5 years. Novel assessments of phonological and sociocognitive processing were administered at Time 1 (T1), together with a standardised test of receptive and expressive language, and parental report of expressive vocabulary. The language test was re-administered at Time 2 (T2), together with assessments of morphosyntax and parental reports of social communication.
Results:  Intercorrelations at and between T1 and T2 were high, and dissociations were rare. Ordinal regressions were run, entering predictors singly and simultaneously. With the exception of the phonological task, every early measure on its own was significantly predictive of most outcomes, and receptive language was the strongest all-round predictor. Results of simultaneous entry, controlling for the effect of other predictors, showed that early language was the strongest predictor of general language outcome, but early phonology was the strongest predictor of a measure of morphosyntax, and early sociocognition the strongest predictor of social communication.
Conclusions:  Language measures which draw on a wide range of skills were the strongest overall predictors of general language outcomes. However, our VEPS measures were stronger predictors of specific outcomes. The clinical and theoretical implications of these findings are discussed.