Audiomate : a Python package for working with audio datasets
Machine learning tasks in the audio domain frequently require large datasets with training data. In recent years, numerous datasets have been made available for various purposes, for example, (Snyder, Chen, & Povey, 2015) and (Ardila et al., 2019). Unfortunately, most of the datasets are stored in widely differing formats. As a consequence, machine learning practitioners have to convert datasets into other formats before they can be used or combined. Furthermore, common tasks like reading, partitioning, or shuffling of datasets have to be developed over and over again for each format and require intimate knowledge of the formats. We propose Audiomate, a Python toolkit, to solve this problem. Audiomate provides a uniform programming interface to work with numerous datasets. Knowledge about the structure or on-disk format of the datasets is not necessary. Audiomate facilitates and simplifies a wide range of tasks:
• Reading and writing of numerous dataset formats using a uniform programming interface, for example (Snyder et al., 2015), (Panayotov, Chen, Povey, & Khudanpur, 2015) and (Ardila et al., 2019)
• Accessing metadata, like speaker information and labels
• Reading audio data (single files, batches of files)
• Retrieval of information about the data (e.g., number of speakers, total duration).
• Merging of multiple datasets (e.g., combine two speech datasets).
• Splitting data into smaller subsets (e.g., create training, validation, and test sets with a reasonable distribution of classes).
• Validation of data for specific requirements (e.g., check whether all samples were assigned a label).
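The splitting task in the list above (subsets with a reasonable distribution of classes) can be illustrated with a small stratified split. This is a minimal sketch in plain Python and does not use or reproduce Audiomate's API; the function name `stratified_split`, the utterance ids, and the integer weights are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(labels, weights, seed=0):
    """Split utterance ids into subsets while keeping the label mix similar.

    labels:  dict mapping utterance id -> class label (hypothetical input shape)
    weights: dict mapping subset name -> positive integer weight, e.g. 8/1/1
    """
    rng = random.Random(seed)
    total = sum(weights.values())

    # Group utterance ids by their label so each class is split separately.
    by_label = defaultdict(list)
    for utt_id, label in labels.items():
        by_label[label].append(utt_id)

    subsets = {name: [] for name in weights}
    for label in sorted(by_label):
        ids = sorted(by_label[label])
        rng.shuffle(ids)
        start = 0
        cumulative = 0
        for name, weight in weights.items():
            # Integer arithmetic avoids floating-point rounding at the boundaries;
            # the last subset always ends at len(ids) because cumulative == total.
            cumulative += weight
            end = (cumulative * len(ids)) // total
            subsets[name].extend(ids[start:end])
            start = end
    return subsets


# Usage: 10 "speech" and 10 "music" utterances, split 80/10/10.
labels = {f"utt{i}": ("speech" if i < 10 else "music") for i in range(20)}
parts = stratified_split(labels, {"train": 8, "dev": 1, "test": 1})
```

Splitting each class separately is what keeps the class proportions roughly equal across the subsets; a plain random split offers no such guarantee for small classes.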
Genome size rather than content might affect call properties in toads of three ploidy levels (Anura: Bufonidae: Bufo viridis subgroup)
In vertebrates, genome size has been shown to correlate with nuclear and cell sizes, and influences phenotypic features, such as brain complexity. In three different anuran families, advertisement calls of polyploids exhibit longer notes and intervals than diploids, and differences in cellular dimensions have been hypothesized to cause these modifications. We investigated this phenomenon in green toads (Bufo viridis subgroup) of three ploidy levels, in a different call type (release calls) that may evolve independently from advertisement calls, examining 1205 calls, from ten species, subspecies, and hybrid forms. Significant differences between pulse rates of six diploid and four polyploid (3n, 4n) green toad forms across a range of temperatures from 7 to 27 °C were found. Laboratory data supported differences in pulse rates of triploids vs. tetraploids, but failed to reach significance when including field recordings. This study supports the idea that genome size, irrespective of call type, phylogenetic context, and geographical background, might affect call properties in anurans and suggests a common principle governing this relationship. The nuclear-cell size ratio, affected by genome size, seems the most plausible explanation. However, we cannot rule out hypotheses under which call-influencing genes from an unexamined diploid ancestral species might also affect call properties in the hybrid-origin polyploids.
Older adults’ online information seeking and subjective well-being: The moderating role of internet skills
As increasing numbers of older adults incorporate the Internet into their lives, it is important to go beyond studying whether being an Internet user makes a difference for this population by examining how specific uses and skills relate to subjective well-being. This study examines the association between online information seeking and life satisfaction as one defining component of subjective well-being among 643 Swiss Internet users aged 60 and over. We find a positive relationship between online information seeking and older adults’ life satisfaction. Inspired by digital inequality research, we then explore whether this relationship is moderated by Internet skills. Results suggest that with increasing Internet skills, the association between online information seeking and life satisfaction gets stronger. We discuss the findings in light of both research on well-being and on digital inequality
ZHAW-InIT at GermEval 2020 task 4 : low-resource speech-to-text
This paper presents the contribution of ZHAW-InIT to Task 4 “Low-Resource STT” at GermEval 2020. The goal of the task is to develop a system for translating Swiss German dialect speech into Standard German text in the domain of parliamentary debates. Our approach is based on Jasper, a CNN Acoustic Model, which we fine-tune on the task data. We enhance the base system with an extended Language Model containing in-domain data and speed perturbation and run further experiments with post-processing. Our submission achieved first place with a final Word Error Rate of 40.29%.
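For readers unfamiliar with the metric reported above, Word Error Rate is commonly computed as the word-level edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the number of reference words. The following is a minimal sketch under that standard definition; it is not the GermEval scoring script, and the whitespace tokenization and the function name `word_error_rate` are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words yields an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Usage: one substitution and one deletion against a four-word reference.
wer = word_error_rate("a b c d", "a x c")
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.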
Speech recognition component for search-oriented conversational artificial intelligence
User experience is key to making a computer program successful. If the handling needs a lot of expertise, people will not use it. In an optimal scenario, the user does not need to learn new procedures to control a new application. Conversational agents try to achieve that by providing a user interface using natural language. With spoken natural language, the interaction can be simplified even more.
In order to create a conversational agent with spoken natural language, a reliable speech recognition system is essential. In this work different aspects of automatic speech recognition (ASR), for the application with a conversational agent, are explored. The goal of the conversational agent is to support people in the process of legal research. It has to find the correct information based on the user’s input.
To train a speech recognition system, data is needed. In a first step, two different ways to collect text data are explored. The text is needed to record speech data. With a grammar-based approach, manually crafted rules are used to generate sentences. Since grammars are restricted in variation, neural question generation was evaluated to produce open questions from specific input texts. In a next step, the performance of ASR systems was tested on task- and domain-specific data, using data recorded based on the generated text. Due to restricted time and resources, data was recorded only from one speaker. Since there was not enough data for further experiments on task-specific scenarios, open source German datasets were used to implement and improve acoustic models for generic speech recognition.
In order to build a speech recognition component for a conversational agent, different aspects influence the final result. Text generation for training language models or collecting speech data still needs grammar-based approaches for reliable results. Neural question generation produces too many invalid samples. Nevertheless, text generated with grammars can be employed to record speech and train language models. With adaptation using specific language models, open source ASR systems achieve similar results or even outperform commercial systems. For data with a very specific structure, open source systems can outperform commercial systems by about 30% absolute word error rate. Furthermore, for the acoustic model different approaches are feasible. Hybrid systems and end-to-end systems achieve similar results, but the hybrid system is still slightly better. End-to-end systems make adaptation to domain-specific use cases easier, since no phonetic transcriptions are needed. To go even further, an end-to-end system can be trained on character n-grams instead of only single characters. Models trained to predict tokens generated with byte pair encoding perform similarly to models based on single characters. With the integration of complex decoding strategies and language models, character-based models still perform better.
Improving Problem-Solving Skills with Smart Personal Assistants: Insights from a Quasi Field Experiment
Problem-solving skills are considered one of the most important learning goals for life. Therefore, educational institutions should help learners gain these skills despite organizational and financial restrictions. Even though there exists a growing body of research about the design and use of Smart Personal Assistants, such as Google’s Assistant or Amazon’s Alexa, little is known about their ability to help learners improve their problem-solving skills. Using a mixed-method approach, we investigate the value of newly emerging Smart Personal Assistants to improve long-term problem-solving skills with the help of a pre- and post-test quasi field experiment in a second grade class of a vocational business school in Switzerland. The results indicate that groups interacting with Smart Personal Assistants show significantly better problem-solving skills compared to learners using paper-based support, which is explained by changing learning processes. Our study contributes to existing intelligent tutoring system and technology-enhanced scaffolding research.
Disentangling Tinnitus Distress and Tinnitus Presence by Means of EEG Power Analysis
The present study investigated 24 individuals suffering from chronic tinnitus (TI) and 24 nonaffected controls (CO). We recorded resting-state EEG and collected psychometric data to obtain information about how chronic tinnitus experience affects the cognitive and emotional state of TI. The study was meant to disentangle TI with high distress from those who suffer less from persistent tinnitus based on both neurophysiological and behavioral data. A principal component analysis of psychometric data uncovers two distinct independent dimensions characterizing the individual tinnitus experience. These independent states are distress and presence; the latter is described as the perceived intensity of sound experience that increases with tinnitus duration devoid of any considerable emotional burden. Neuroplastic changes correlate with the two independent components. TI with high distress display increased EEG activity in the oscillatory range around 25 Hz (upper β-band) that agglomerates over frontal recording sites. TI with high presence show enhanced EEG signal strength in the δ-, α-, and lower γ-bands (30–40 Hz) over bilateral temporal and left perisylvian electrodes. Based on these differential patterns we suggest that the two dimensions, namely, distress and presence, should be considered as independent dimensions of chronic subjective tinnitus.
Serosurveillance of Schmallenberg virus in Switzerland using bulk tank milk samples.
Infections with Schmallenberg virus (SBV), a novel Orthobunyavirus transmitted by biting midges, can cause abortions and malformations of newborns and severe symptoms in adults of domestic and wild ruminants. Understanding the temporal and spatial distribution of the virus in a certain territory is important for the control and prevention of the disease. In this study, the seroprevalence of antibodies against SBV and the spatial spread of the virus were investigated in Swiss dairy cattle by applying a milk serology technique to bulk milk samples. The seroprevalence in cattle herds was significantly higher in December 2012 (99.5%) compared to July 2012 (19.7%). This high between-herd seroprevalence in cattle herds was observed shortly after the first detection of viral infections. Milk samples originating from farms with seropositive animals taken in December 2012 (n=209; mean 160%) revealed significantly higher S/P% ratios than samples collected in July 2012 (n=48; mean 103.6%). This finding suggests a high within-herd seroprevalence in infected herds, which makes testing of bulk tank milk samples a sensitive method for the identification of farms with past exposures to SBV. It also suggests that within-herd transmission followed by seroconversion still occurred between July and December. In July 2012, positive bulk tank milk samples were mainly restricted to the western part of Switzerland whereas in December 2012, all samples except one were positive. A spatial analysis revealed a separation of regions with and without positive farms in July 2012 and no spatial clustering within the regions with positive farms. In contrast to the spatial dispersion of bluetongue virus, a virus that is also transmitted by Culicoides midges, in 2008 in Switzerland, the spread of SBV occurred from the western to the eastern part of the country. The dispersed incursion of SBV took place in the western part of Switzerland and the virus spread rapidly to the remaining territory. This spatial pattern is consistent with the hypothesis that transmission by Culicoides midges was the main way of spreading.