76 research outputs found

    Error Correction based on Error Signatures applied to automatic speech recognition


    Finding Biomarker Signatures in Pooled Sample Designs: A Simulation Framework for Methodological Comparisons

    Detection of discriminating patterns in gene expression data can be accomplished by various methods of statistical learning. It has been proposed that sample pooling in this context would have negative effects; however, pooling cannot always be avoided. We propose a simulation framework to explicitly investigate the parameters of patterns, experimental design, noise, and choice of method, in order to find out which effects on classification performance are to be expected. We use a two-group classification task and simulated gene expression data with independent differentially expressed genes, with bivariate linear patterns, and with the combination of both. Our results show a clear increase of prediction error with pool size. For pooled training sets, powered partial least squares discriminant analysis outperforms discriminant analysis, random forests, and support vector machines with linear or radial kernel in two of three simulated scenarios. The proposed simulation approach can be implemented to systematically investigate a number of additional scenarios of practical interest.
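The pooling effect described in this abstract can be illustrated with a small simulation. The sketch below is an illustrative assumption, not the authors' setup: the gene counts, group shift, and the use of a linear SVM (standing in for powered PLS-DA, which is not in scikit-learn) are all placeholders. Training samples are pooled by averaging, and prediction error is measured on unpooled test data.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def simulate_group(n, n_genes=200, n_de=20, shift=0.0):
    # independent genes; the first n_de are differentially expressed
    X = rng.normal(size=(n, n_genes))
    X[:, :n_de] += shift
    return X

def pool(X, pool_size):
    # average consecutive samples to mimic physical sample pooling
    n = (len(X) // pool_size) * pool_size
    return X[:n].reshape(-1, pool_size, X.shape[1]).mean(axis=1)

def error_for_pool_size(pool_size, n_train=60, n_test=200, shift=0.8):
    # pooled training set, unpooled test set, two-group task
    Xtr = np.vstack([pool(simulate_group(n_train, shift=s), pool_size)
                     for s in (0.0, shift)])
    ytr = np.repeat([0, 1], n_train // pool_size)
    Xte = np.vstack([simulate_group(n_test, shift=s) for s in (0.0, shift)])
    yte = np.repeat([0, 1], n_test)
    clf = SVC(kernel="linear").fit(Xtr, ytr)
    return 1.0 - clf.score(Xte, yte)  # prediction error
```

Sweeping `pool_size` over, say, 1, 3, and 5 with this function reproduces the kind of pool-size-versus-error curve the framework is designed to study.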

    Automating Behavioral Testing in Machine Translation

    Behavioral testing in NLP allows fine-grained evaluation of systems by examining their linguistic capabilities through the analysis of input-output behavior. Unfortunately, existing work on behavioral testing in Machine Translation (MT) is restricted to largely handcrafted tests covering a limited range of capabilities and languages. To address this limitation, we propose to use Large Language Models (LLMs) to generate a diverse set of source sentences tailored to test the behavior of MT models in a range of situations. We can then verify whether the MT model exhibits the expected behavior by matching against candidate sets that are also generated with LLMs. Our approach aims to make behavioral testing of MT systems practical while requiring only minimal human effort. In our experiments, we apply the proposed evaluation framework to several available MT systems, revealing that while pass rates generally follow the trends observable from traditional accuracy-based metrics, our method uncovers several important differences and potential bugs that go unnoticed when relying on accuracy alone.
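At its core, the candidate-set matching step reduces to a membership check per test sentence. A minimal sketch, assuming exact matching after simple normalization (the paper's matching criterion may be more elaborate):

```python
def pass_rate(mt_outputs, candidate_sets, normalize=str.casefold):
    """Fraction of behavioral-test sentences whose MT output matches
    any acceptable candidate; candidate sets would come from an LLM."""
    hits = sum(
        normalize(out.strip()) in {normalize(c.strip()) for c in cands}
        for out, cands in zip(mt_outputs, candidate_sets)
    )
    return hits / len(mt_outputs)
```

For example, `pass_rate(["Das ist gut."], [{"Das ist gut.", "Es ist gut."}])` returns 1.0, since the output is one of the acceptable candidates.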

    Towards Real-World Streaming Speech Translation for Code-Switched Speech

    Code-switching (CS), i.e. mixing different languages in a single sentence, is a common phenomenon in communication and can be challenging in many Natural Language Processing (NLP) settings. Previous studies on CS speech have shown promising results for end-to-end speech translation (ST), but have been limited to offline scenarios and to translation into one of the languages present in the source (monolingual transcription). In this paper, we focus on two essential yet unexplored areas for real-world CS speech translation: streaming settings, and translation into a third language (i.e., a language not included in the source). To this end, we extend the Fisher and Miami test and validation datasets to include new targets in Spanish and German. Using this data, we train a model for both offline and streaming ST, and we establish baseline results for these two settings.

    Biomarker discovery in heterogeneous tissue samples - taking the in-silico deconfounding approach (BMC Bioinformatics)

    Background: For heterogeneous tissues, such as blood, measurements of gene expression are confounded by relative proportions of cell types involved. Conclusions have to rely on estimation of gene expression signals for homogeneous cell populations, e.g. by applying micro-dissection, fluorescence activated cell sorting, or in-silico deconfounding. We studied feasibility and validity of a non-negative matrix decomposition algorithm using experimental gene expression data for blood and sorted cells from the same donor samples. Our objective was to optimize the algorithm regarding detection of differentially expressed genes and to enable its use for classification in the difficult scenario of reversely regulated genes. This would be of importance for the identification of candidate biomarkers in heterogeneous tissues. Results: Experimental data and simulation studies involving noise parameters estimated from these data revealed that for valid detection of differential gene expression, quantile normalization and use of non-log data are optimal. We demonstrate the feasibility of predicting proportions of constituting cell types from gene expression data of single samples, as a prerequisite for a deconfounding-based classification approach. Classification cross-validation errors with and without using deconfounding results are reported as well as sample-size dependencies. Implementation of the algorithm, simulation and analysis scripts are available. Conclusions: The deconfounding algorithm without decorrelation using quantile normalization on non-log data is proposed for biomarkers that are difficult to detect, and for cases where confounding by varying proportions of cell types is the suspected reason. 
In this case, a deconfounding ranking approach can be used as a powerful alternative to, or complement of, other statistical learning approaches to define candidate biomarkers for molecular diagnosis and prediction in biomedicine, under realistically noisy conditions and with moderate sample sizes.
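The decomposition behind this approach can be sketched as a non-negative matrix factorization: expression X (samples x genes) is modeled as mixing proportions C times cell-type signatures S. The sketch below uses scikit-learn's generic NMF as a stand-in for the paper's algorithm (the decorrelation-free variant and its noise handling are not reproduced), with quantile normalization applied to non-log data, as the abstract recommends:

```python
import numpy as np
from sklearn.decomposition import NMF

def quantile_normalize(X):
    # force every sample (row) to share the same empirical distribution
    ranks = X.argsort(axis=1).argsort(axis=1)
    mean_sorted = np.sort(X, axis=1).mean(axis=0)
    return mean_sorted[ranks]

def deconfound(X, n_cell_types):
    # X: samples x genes, non-log intensities (non-negative)
    Xn = quantile_normalize(X)
    model = NMF(n_components=n_cell_types, init="nndsvda", max_iter=1000)
    C = model.fit_transform(Xn)   # per-sample cell-type loadings
    S = model.components_         # cell-type-specific expression signals
    # rescale rows of C so they read as mixing proportions
    return C / C.sum(axis=1, keepdims=True), S
```

The rescaled rows of C are the per-sample cell-type proportions that the abstract proposes predicting from single samples; differential expression would then be assessed on the estimated signatures S rather than on the confounded mixture X.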

    Antibiotic-free segregational plasmid stabilization in Escherichia coli owing to the knockout of triosephosphate isomerase (tpiA)

    Selvamani RSV, Telaar M, Friehs K, Flaschel E. Antibiotic-free segregational plasmid stabilization in Escherichia coli owing to the knockout of triosephosphate isomerase (tpiA). Microbial Cell Factories. 2014;13(1):58. Background: Segregational stability of plasmids is a major concern for recombinant bacterial production strains. One of the best strategies to counteract plasmid loss is the use of auxotrophic mutants which are complemented with the lacking gene along with the product-relevant ones. However, these knockout mutants often show unwanted growth in complex standard media, or no growth at all under uncomplemented conditions. This led to the choice of a knockout gene that connects two essential arms of a central metabolic pathway, glycolysis. Results: Triosephosphate isomerase was chosen because its knockout has a pronounced effect on growth on glucose as well as on glycerol. On glycerol the effect is almost absolute, whereas on glucose growth is still possible, albeit at a considerably lower rate than usual. This feature is important because it may render cloning easier. This enzymatic activity was successfully tested as an alternative to antibiotic-based plasmid selection. Expression of a model recombinant beta-glucanase in continuous cultivation was possible with stable maintenance of the plasmid. In addition, the complementation of tpiA knockout strains by the corresponding plasmids and their growth characteristics were tested on a series of complex and synthetic media. The accumulation of methylglyoxal during the growth of tpiA-deficient strains was shown to be a possible cause of the growth disadvantage of these strains in comparison to the parent Keio Collection strain or the complemented knockout strain. Conclusion: Through the use of this new auxotrophic complementation system, antibiotic-free cloning and selection of recombinant plasmids were possible. Continuous cultivation and recombinant protein expression with high segregational stability over an extended period were also demonstrated.

    Integration of Language Identification into a Recognition System for Spoken Conversations Containing Code-Switches

    This paper describes the integration of language identification (LID) into a multilingual automatic speech recognition (ASR) system for spoken conversations containing code-switches between Mandarin and English. We apply a multistream approach to combine the acoustic model score and the language information at frame level, where the latter is provided by an LID component. Furthermore, we advance this multistream approach with a new method called "Language Lookahead", in which the language information of subsequent frames is used to improve accuracy. Both methods are evaluated using a set of controlled LID results with varying frame accuracies. Our results show that both approaches improve ASR performance by at least 4% relative if the LID achieves a minimum frame accuracy of 85%.
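The frame-level combination can be sketched as a weighted interpolation of the two score streams; with lookahead, LID scores from future frames are averaged in. This is a minimal sketch: the stream weight, the array shapes, and the averaging scheme are illustrative assumptions, and the paper's "Language Lookahead" may differ in detail.

```python
import numpy as np

def combine_streams(am_logp, lid_logp, w=0.3, lookahead=0):
    # am_logp, lid_logp: (n_frames, n_hyp) log-scores for the same
    # hypotheses (e.g. Mandarin vs. English states of a CS recognizer)
    lid = np.array([
        # average each frame's LID score with up to `lookahead`
        # future frames ("Language Lookahead" in spirit)
        lid_logp[t:t + lookahead + 1].mean(axis=0)
        for t in range(len(lid_logp))
    ])
    # multistream combination: weighted sum of log-scores per frame
    return (1.0 - w) * am_logp + w * lid
```

With `lookahead=0` this degenerates to the plain per-frame multistream combination; larger values let language evidence from upcoming frames influence the current frame's score.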

    Brain-to-text: Decoding spoken phrases from phone representations in the brain

    It has long been a matter of speculation whether communication between humans and machines based on natural speech-related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones, or one of a few isolated words. However, until now it has remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity during speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech.

    Workshops of the Sixth International Brain–Computer Interface Meeting: brain–computer interfaces past, present, and future

    Brain–computer interfaces (BCI) (also referred to as brain–machine interfaces; BMI) are, by definition, an interface between the human brain and a technological application. Brain activity for interpretation by the BCI can be acquired with either invasive or non-invasive methods. The key point is that the interpreted signals come directly from the brain, bypassing sensorimotor output channels that may or may not have impaired function. This paper provides a concise glimpse of the breadth of BCI research and development topics covered by the workshops of the 6th International Brain–Computer Interface Meeting.