
    TwistBytes - identification of Cuneiform languages and German dialects at VarDial 2019

    We describe our approaches for the German Dialect Identification (GDI) and the Cuneiform Language Identification (CLI) tasks at the VarDial Evaluation Campaign 2019. The goal was to identify dialects of Swiss German in GDI, and Sumerian and Akkadian in CLI. In GDI, the system should distinguish four dialects from the German-speaking part of Switzerland. Our system for GDI achieved third place out of 6 teams, with a macro-averaged F1 of 74.6%. In CLI, the system should distinguish seven languages written in cuneiform script. Our system achieved third place out of 8 teams, with a macro-averaged F1 of 74.7%.
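    The macro-averaged F1 used for ranking weights every dialect class equally, regardless of how frequent it is. A minimal sketch of the metric (this is the evaluation measure, not the submitted system; label names are hypothetical):

```python
def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: compute F1 per class, then take the unweighted mean."""
    f1s = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

    A class that is never predicted contributes an F1 of 0 to the average, which is why macro F1 penalizes systems that ignore rare classes.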

    TRANSLIT : a large-scale name transliteration resource

    Transliteration is the process of expressing a proper name from a source language in the characters of a target language (e.g. from Cyrillic to Latin characters). We present TRANSLIT, a large-scale corpus with approx. 1.6 million entries in more than 180 languages and about 3 million variants of person and geolocation names. The corpus is based on various public data sources, which have been transformed into a unified format to simplify their usage, plus a newly compiled dataset from Wikipedia. In addition, we apply several machine learning methods to establish baselines for automatically detecting transliterated names in various languages. Our best systems achieve an accuracy of 92% in identifying transliterated pairs.
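    To illustrate the pair-identification task itself (this is not one of the paper's machine-learning baselines), a simple heuristic romanizes the Cyrillic side with a hand-written mapping and thresholds the normalized edit distance. The mapping and threshold below are deliberately tiny and hypothetical:

```python
# Tiny, illustrative Cyrillic->Latin mapping -- far from complete.
CYR2LAT = {"а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
           "и": "i", "к": "k", "л": "l", "м": "m", "н": "n", "о": "o",
           "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f"}

def romanize(s):
    """Map known Cyrillic characters to Latin; leave everything else as-is."""
    return "".join(CYR2LAT.get(ch, ch) for ch in s.lower())

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def is_transliteration_pair(a, b, threshold=0.35):
    """Accept a pair if the normalized edit distance of the romanized forms is small."""
    ra, rb = romanize(a), romanize(b)
    return edit_distance(ra, rb) / max(len(ra), len(rb), 1) <= threshold
```

    The learned systems in the paper replace this hand-crafted mapping with models trained on the corpus itself.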

    Twist Bytes : German dialect identification with data mining optimization

    We describe our approaches used in the German Dialect Identification (GDI) task at the VarDial Evaluation Campaign 2018. The goal was to identify which of four dialects spoken in the German-speaking part of Switzerland a sentence belongs to. We adopted two different meta-classifier approaches and used data mining insights to improve the preprocessing and the meta-classifier parameters. In particular, we focused on different feature extraction methods and on how to combine them, since they influenced system performance very differently. Our system achieved second place out of 8 teams, with a macro-averaged F1 of 64.6%. We also participated in the surprise dialect task with a multi-label approach.
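    The meta-classifiers described above are trained on the outputs of several base systems; a much simpler combination scheme in the same spirit is plain majority voting over the base systems' predicted labels (an illustrative sketch, not the submitted approach; the dialect labels are hypothetical):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-system predictions: the most common label per instance wins.

    predictions: list of lists, one inner list of labels per base system,
    all covering the same sequence of instances.
    """
    combined = []
    for labels in zip(*predictions):  # one tuple of labels per instance
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined
```

    A trained meta-classifier generalizes this by learning, from held-out data, how much to trust each base system instead of counting votes equally.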

    ZHAW-InIT : social media geolocation at VarDial 2020

    We describe our approaches for the Social Media Geolocation (SMG) task at the VarDial Evaluation Campaign 2020. The goal was to predict the geographical location (latitude and longitude) of a given input text. There were three subtasks corresponding to German-speaking Switzerland (CH), Germany and Austria (DE-AT), and Croatia, Bosnia and Herzegovina, Montenegro and Serbia (BCMS). We submitted solutions to all subtasks but focused our development efforts on the CH subtask, where we achieved third place out of 16 submissions with a median distance of 15.93 km, the best result among the 14 unconstrained systems. In the DE-AT subtask, we ranked sixth out of ten submissions (fourth of 8 unconstrained systems), and for BCMS we achieved fourth place out of 13 submissions (second of 11 unconstrained systems).
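    The median-distance score reported above can be computed from great-circle distances between gold and predicted coordinates; a self-contained sketch using the haversine formula (the coordinate pairs in the test are invented, not task data):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere of Earth's mean radius (~6371 km)."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def median_distance_km(gold, pred):
    """Median of per-instance distances between gold and predicted (lat, lon) pairs."""
    dists = sorted(haversine_km(*g, *p) for g, p in zip(gold, pred))
    n, mid = len(dists), len(dists) // 2
    return dists[mid] if n % 2 else (dists[mid - 1] + dists[mid]) / 2
```

    Using the median rather than the mean keeps a few wildly mislocated texts from dominating the score.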

    ZHAW-InIT at GermEval 2020 task 4 : low-resource speech-to-text

    This paper presents the contribution of ZHAW-InIT to Task 4 "Low-Resource STT" at GermEval 2020. The goal of the task is to develop a system for translating Swiss German dialect speech into Standard German text in the domain of parliamentary debates. Our approach is based on Jasper, a CNN-based acoustic model, which we fine-tune on the task data. We enhance the base system with an extended language model containing in-domain data, apply speed perturbation, and run further experiments with post-processing. Our submission achieved first place with a final Word Error Rate of 40.29%.
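    Word Error Rate, the metric behind that ranking, is the word-level edit distance between hypothesis and reference divided by the reference length; a minimal sketch (the example sentences in the test are invented):

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences.
    prev = list(range(len(hyp) + 1))
    for i, rw in enumerate(ref, 1):
        cur = [i]
        for j, hw in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (rw != hw)))  # substitution
        prev = cur
    return prev[-1] / len(ref)
```

    Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.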

    Design patterns for resource-constrained automated deep-learning methods

    We present an extensive evaluation of a wide variety of promising design patterns for automated deep-learning (AutoDL) methods, organized according to the problem categories of the 2019 AutoDL challenges, which set the task of optimizing both model accuracy and search efficiency under tight time and computing constraints. We propose structured empirical evaluations as the most promising avenue to obtain design principles for deep-learning systems, given the absence of strong theoretical support. From these evaluations, we distill relevant patterns which give rise to neural network design recommendations. In particular, we establish (a) that very wide fully connected layers learn meaningful features faster; we illustrate (b) how the lack of pretraining in audio processing can be compensated by architecture search; we show (c) that in text processing, deep-learning-based methods only pull ahead of traditional methods for short texts of fewer than a thousand characters under tight resource limitations; and lastly we present (d) evidence that in very data- and computing-constrained settings, hyperparameter tuning of more traditional machine-learning methods outperforms deep-learning systems.

    Short-time dynamic patterns of bioaerosol generation and displacement in an indoor environment

    Short-time dynamics and distribution of airborne biological and total particles were assessed in a large university hallway by particle counting using laser particle counters and impaction air samplers. Particle numbers of four different size ranges were determined every 2 minutes over several hours. Bioaerosols (culturable bacteria and fungi determined as colony-forming units) were directly collected every 5 minutes on Petri dishes containing the corresponding growth medium. Results clearly show distinct short-time dynamics of particulate aerosols, both of biological and non-biological origin. These reproducible periodic patterns are closely related to the periods when lectures are held in lecture rooms and the intermissions in between, when students are present in the hallway. Peaks of airborne culturable bacteria were observed with a periodicity of 1 hour. Bioaerosol concentrations synchronously follow the variation in the total number of particles. These highly reproducible temporal dynamics have to be considered when monitoring indoor environments with respect to air quality.

    Fantastically reasonable: ambivalence in the representation of science and technology in super-hero comics

    A long-standing contrast in academic discussions of science concerns its perceived disenchanting or enchanting public impact. In one image, science displaces magical belief in unknowable entities with belief in knowable forces and processes and reduces all things to a single technical measure. In the other, science is itself magically transcendent, expressed in technological adulation and an image of scientists as wizards or priests. This paper shows that these contrasting images are also found in representations of science in super-hero comics, which, given their lowly status in Anglo-American culture, would seem an unlikely place to find such commonality with academic discourse. It is argued that this is evidence that the contrast constitutes an ambivalence arising from the dilemmas that science poses; they are shared rhetorics arising from, and reflexively feeding, a set of broad cultural concerns. This is explored through consideration of representations of science at a number of levels in the comics, with particular focus on the science-magic constellation, and on enchanted and disenchanted imagery in representations of technology and scientists. It is concluded that super-hero comics are one cultural arena where the public meaning of science is actively worked out, an activity that unites “expert” and “non-expert” alike.

    spMMMP at GermEval 2018 shared task : classification of offensive content in tweets using convolutional neural networks and gated recurrent units

    In this paper, we propose two different systems for classifying offensive language in micro-blog messages from Twitter ("tweets"). The first system uses an ensemble of convolutional neural networks (CNNs), whose outputs are fed to a meta-classifier for the final prediction. The second system combines a CNN and a gated recurrent unit (GRU) with a transfer-learning approach based on pretraining on a large, automatically translated dataset.