8 research outputs found

    Towards an Open-Source Dutch Speech Recognition System for the Healthcare Domain

    Kaldi NL, currently the largest open-source generic automatic speech recognition (ASR) system for Dutch, does not include domain-specific healthcare jargon in its lexicon. Commercial alternatives (e.g., the Google ASR system) are also unsuitable for this purpose, not only because of the lexicon issue but also because they do not safeguard the privacy of sensitive data sufficiently and reliably. For these reasons, only a small share of medical staff in the Netherlands uses speech technology. This paper proposes an innovative ASR training method developed within the Homo Medicinalis (HoMed) project. On the semantic level, it specifically targets automatic transcription of doctor-patient consultation recordings, with a focus on the use of medicines. In the first stage of HoMed, the Kaldi NL language model (LM) is fine-tuned with lists of Dutch medical terms and transcriptions of Dutch online healthcare news bulletins. Despite the acoustic challenges and linguistic complexity of the domain, we reduced the word error rate (WER) by 5.2%. The proposed method could be employed for ASR domain adaptation to other domains with sensitive and special category data. These promising results allow us to apply this methodology to highly sensitive audiovisual recordings of patient consultations at the Netherlands Institute for Health Services Research (Nivel).

    Challenges on the Promising Road to Automatic Speech Recognition of Privacy-Sensitive Dutch Doctor-Patient Consultation Recordings

    In this paper, we present the currently running PDI-SSH project Homo Medicinalis (HoMed), in which we use machine learning to build an Automatic Speech Recognition (ASR) infrastructure for disclosing privacy-sensitive doctor-patient consultation recordings.

    Talking XTC: Drug discourse in post-war Dutch newspaper and radio debates

    Digital search and visualisation technologies are combined into one methodological approach, called the “leveled approach”, for structural public debate analysis of digital print and audiovisual media data archives. In this thesis, the approach is conceptualised, developed and used for research into drug discourse in Dutch news media debates. The thesis consists of four studies into the reputation of drugs in post-war Dutch newspaper and radio debates. As each study contributes to digital method development in Digital Humanities and to the field of drug history, a section describing the digital search and analysis trajectory in the digitised media archive (distant reading) precedes each historical narrative (close reading). The four studies explore how the reputation of amphetamine (in chapter 1) and ecstasy (in chapters 2, 3 and 4) developed in a context of national drug regulation in the Netherlands. In this way, the hypothesis that Dutch drug regulation has been subject to an increasingly strong imperative to regulate in the post-war period is studied in the media domain. The findings of the four studies lead to three main conclusions about the development of the reputation of drugs in a context of discursive dynamics specific to the newspaper and radio debates. First, the discursive formation of drugs developed at a pace that was to some degree independent from developments in drug regulation: public unrest in the newspapers preceded amphetamine regulation, while ecstasy was commonly treated as a soft drug on the radio for many years after being classified as a hard drug. Second, the reputation of these drugs developed in a cross-media landscape in which international issues and local issues also had significant effects. Third, the discursive formation of ecstasy is best understood as multifaceted and contested, revolving around contrasting discursive strands defined by meaning constellations of 1) descriptions of the substance; 2) commonly connected actors; and 3) settings. In newspaper articles these discursive strands appeared mostly independently from each other, whereas they were most obvious in clashes between disagreeing stakeholders in discussions on the radio. This shows that analysing radio and newspaper archives enables an enriched perspective on historical cross-media debates. I suggest two leads for further structural research of digitised media debates. First, the leveled approach can be used as a structural framework for combining distant and close reading in OCR- and/or ASR metadata-enriched archives. This makes cross-media public debate research possible across print and audiovisual media archives. Second, this thesis’ consistent explication of the search and visualisation trajectory - the explication of the iterative space between distant and close reading - shows how to achieve a level of transparency that fosters improved opportunities for self-reflection and peer review for cross-media public debate analysis based on distant and close reading. Moreover, this practice makes it possible to answer historical research questions using analysis of digital media data archives that face challenges related to (meta)data scarcity, uneven/changing (meta)data availability and continuous technological change.

    Operationalizing “public debates” across digitized heterogeneous mass media datasets in the development and use of the Media Suite

    In this paper, we propose a methodological operationalization of “public debates” as we focus on the research process of the CLARIAH research pilot Debate Research Across Media (DReAM). In this pilot, heterogeneous datasets (of digitized print and audiovisual media) were made searchable with tools of the CLARIAH Media Suite, using the leveled research approach that we coined previously (combining distant and close reading) to do historical public debate analysis. The qualitative research interest in public debates on drugs and regulation is historical, but in order to bridge the gap between distant and close reading of the combined digital datasets, a number of insights from media studies are taken into consideration. The natures of the different media, the type of analysis and focus on the source material itself, and the necessity to combine historical expertise with a sensibility towards discursive relations are all considered before we argue that the accommodation of this approach in the Media Suite helps the researcher to gain an improved understanding of historical public debates in mass media.
