76 research outputs found

    Error Correction based on Error Signatures applied to automatic speech recognition


    Finding Biomarker Signatures in Pooled Sample Designs: A Simulation Framework for Methodological Comparisons

    Detection of discriminating patterns in gene expression data can be accomplished by various methods of statistical learning. It has been proposed that sample pooling in this context would have negative effects; however, pooling cannot always be avoided. We propose a simulation framework to explicitly investigate the parameters of patterns, experimental design, noise, and choice of method, in order to find out which effects on classification performance are to be expected. We use a two-group classification task and simulated gene expression data with independent differentially expressed genes, with bivariate linear patterns, and with the combination of both. Our results show a clear increase of prediction error with pool size. For pooled training sets, powered partial least squares discriminant analysis outperforms discriminant analysis, random forests, and support vector machines with linear or radial kernel in two of three simulated scenarios. The proposed simulation approach can be implemented to systematically investigate a number of additional scenarios of practical interest.
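The pooling effect described in this abstract can be illustrated with a small simulation. The sketch below is an illustrative assumption, not the authors' setup: the gene counts, group shift, and the use of a linear SVM (standing in for powered PLS-DA, which is not in scikit-learn) are all placeholders. Training samples are pooled by averaging, and prediction error is measured on unpooled test data.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def simulate_group(n, n_genes=200, n_de=20, shift=0.0):
    # independent genes; the first n_de are differentially expressed
    X = rng.normal(size=(n, n_genes))
    X[:, :n_de] += shift
    return X

def pool(X, pool_size):
    # average consecutive samples to mimic physical sample pooling
    n = (len(X) // pool_size) * pool_size
    return X[:n].reshape(-1, pool_size, X.shape[1]).mean(axis=1)

def error_for_pool_size(pool_size, n_train=60, n_test=200, shift=0.8):
    # pooled training set, unpooled test set, two-group task
    Xtr = np.vstack([pool(simulate_group(n_train, shift=s), pool_size)
                     for s in (0.0, shift)])
    ytr = np.repeat([0, 1], n_train // pool_size)
    Xte = np.vstack([simulate_group(n_test, shift=s) for s in (0.0, shift)])
    yte = np.repeat([0, 1], n_test)
    clf = SVC(kernel="linear").fit(Xtr, ytr)
    return 1.0 - clf.score(Xte, yte)  # prediction error
```

Sweeping `pool_size` over, say, 1, 3, and 5 with this function reproduces the kind of pool-size-versus-error curve the framework is designed to study.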

    Automating Behavioral Testing in Machine Translation

    Behavioral testing in NLP allows fine-grained evaluation of systems by examining their linguistic capabilities through the analysis of input-output behavior. Unfortunately, existing work on behavioral testing in Machine Translation (MT) is restricted to largely handcrafted tests covering a limited range of capabilities and languages. To address this limitation, we propose to use Large Language Models (LLMs) to generate a diverse set of source sentences tailored to test the behavior of MT models in a range of situations. We can then verify whether the MT model exhibits the expected behavior by matching against candidate sets that are also generated with LLMs. Our approach aims to make behavioral testing of MT systems practical while requiring only minimal human effort. In our experiments, we apply the proposed evaluation framework to several available MT systems, revealing that while pass rates generally follow the trends observable from traditional accuracy-based metrics, our method uncovers several important differences and potential bugs that go unnoticed when relying on accuracy alone.
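At its core, the candidate-set matching step reduces to a membership check per test sentence. A minimal sketch, assuming exact matching after simple normalization (the paper's matching criterion may be more elaborate):

```python
def pass_rate(mt_outputs, candidate_sets, normalize=str.casefold):
    """Fraction of behavioral-test sentences whose MT output matches
    any acceptable candidate; candidate sets would come from an LLM."""
    hits = sum(
        normalize(out.strip()) in {normalize(c.strip()) for c in cands}
        for out, cands in zip(mt_outputs, candidate_sets)
    )
    return hits / len(mt_outputs)
```

For example, `pass_rate(["Das ist gut."], [{"Das ist gut.", "Es ist gut."}])` returns 1.0, since the output is one of the acceptable candidates.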

    Towards Real-World Streaming Speech Translation for Code-Switched Speech

    Code-switching (CS), i.e. mixing different languages in a single sentence, is a common phenomenon in communication and can be challenging in many Natural Language Processing (NLP) settings. Previous studies on CS speech have shown promising results for end-to-end speech translation (ST), but have been limited to offline scenarios and to translation into one of the languages present in the source (monolingual transcription). In this paper, we focus on two essential yet unexplored areas for real-world CS speech translation: streaming settings, and translation into a third language (i.e., a language not included in the source). To this end, we extend the Fisher and Miami test and validation datasets to include new targets in Spanish and German. Using this data, we train a model for both offline and streaming ST, and we establish baseline results for these two settings.

    Biomarker discovery in heterogeneous tissue samples - taking the in-silico deconfounding approach (BMC Bioinformatics)

    Background: For heterogeneous tissues, such as blood, measurements of gene expression are confounded by relative proportions of cell types involved. Conclusions have to rely on estimation of gene expression signals for homogeneous cell populations, e.g. by applying micro-dissection, fluorescence activated cell sorting, or in-silico deconfounding. We studied feasibility and validity of a non-negative matrix decomposition algorithm using experimental gene expression data for blood and sorted cells from the same donor samples. Our objective was to optimize the algorithm regarding detection of differentially expressed genes and to enable its use for classification in the difficult scenario of reversely regulated genes. This would be of importance for the identification of candidate biomarkers in heterogeneous tissues. Results: Experimental data and simulation studies involving noise parameters estimated from these data revealed that for valid detection of differential gene expression, quantile normalization and use of non-log data are optimal. We demonstrate the feasibility of predicting proportions of constituting cell types from gene expression data of single samples, as a prerequisite for a deconfounding-based classification approach. Classification cross-validation errors with and without using deconfounding results are reported as well as sample-size dependencies. Implementation of the algorithm, simulation and analysis scripts are available. Conclusions: The deconfounding algorithm without decorrelation using quantile normalization on non-log data is proposed for biomarkers that are difficult to detect, and for cases where confounding by varying proportions of cell types is the suspected reason. 
In this case, a deconfounding ranking approach can be used as a powerful alternative to, or complement of, other statistical learning approaches to define candidate biomarkers for molecular diagnosis and prediction in biomedicine, under realistically noisy conditions and with moderate sample sizes.
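The decomposition behind this approach can be sketched as a non-negative matrix factorization: expression X (samples x genes) is modeled as mixing proportions C times cell-type signatures S. The sketch below uses scikit-learn's generic NMF as a stand-in for the paper's algorithm (the decorrelation-free variant and its noise handling are not reproduced), with quantile normalization applied to non-log data, as the abstract recommends:

```python
import numpy as np
from sklearn.decomposition import NMF

def quantile_normalize(X):
    # force every sample (row) to share the same empirical distribution
    ranks = X.argsort(axis=1).argsort(axis=1)
    mean_sorted = np.sort(X, axis=1).mean(axis=0)
    return mean_sorted[ranks]

def deconfound(X, n_cell_types):
    # X: samples x genes, non-log intensities (non-negative)
    Xn = quantile_normalize(X)
    model = NMF(n_components=n_cell_types, init="nndsvda", max_iter=1000)
    C = model.fit_transform(Xn)   # per-sample cell-type loadings
    S = model.components_         # cell-type-specific expression signals
    # rescale rows of C so they read as mixing proportions
    return C / C.sum(axis=1, keepdims=True), S
```

The rescaled rows of C are the per-sample cell-type proportions that the abstract proposes predicting from single samples; differential expression would then be assessed on the estimated signatures S rather than on the confounded mixture X.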

    Antibiotic-free segregational plasmid stabilization in Escherichia coli owing to the knockout of triosephosphate isomerase (tpiA)

    Selvamani RSV, Telaar M, Friehs K, Flaschel E. Antibiotic-free segregational plasmid stabilization in Escherichia coli owing to the knockout of triosephosphate isomerase (tpiA). Microbial Cell Factories. 2014;13(1):58. Background: Segregational stability of plasmids is a major concern for recombinant bacterial production strains. One of the best strategies to counteract plasmid loss is the use of auxotrophic mutants which are complemented with the lacking gene along with the product-relevant ones. However, these knockout mutants often show unwanted growth in complex standard media, or no growth at all under uncomplemented conditions. This led to the choice of a knockout gene that connects two essential arms of a central metabolic pathway, glycolysis. Results: Triosephosphate isomerase was chosen because its knockout has a pronounced effect on growth on glucose as well as on glycerol. On glycerol the effect is almost absolute, whereas on glucose growth is still possible, albeit at a considerably lower rate than usual. This feature is important because it may render cloning easier. This enzymatic activity was successfully tested as an alternative to antibiotic-based plasmid selection. Expression of a model recombinant beta-glucanase in continuous cultivation was possible with stable maintenance of the plasmid. In addition, the complementation of tpiA knockout strains by the corresponding plasmids and their growth characteristics were tested on a series of complex and synthetic media. The accumulation of methylglyoxal during the growth of tpiA-deficient strains was shown to be a possible cause of the growth disadvantage of these strains in comparison to the parent Keio Collection strain or the complemented knockout strain. Conclusion: Through the use of this new auxotrophic complementation system, antibiotic-free cloning and selection of recombinant plasmids were possible. Continuous cultivation and recombinant protein expression with high segregational stability over an extended period were also demonstrated.

    Integration of Language Identification into a Recognition System for Spoken Conversations Containing Code-Switches

    This paper describes the integration of language identification (LID) into a multilingual automatic speech recognition (ASR) system for spoken conversations containing code-switches between Mandarin and English. We apply a multistream approach to combine the acoustic model score and the language information at frame level, where the latter is provided by an LID component. Furthermore, we advance this multistream approach with a new method called "Language Lookahead", in which the language information of subsequent frames is used to improve accuracy. Both methods are evaluated using a set of controlled LID results with varying frame accuracies. Our results show that both approaches improve ASR performance by at least 4% relative if the LID achieves a minimum frame accuracy of 85%.
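The frame-level combination can be sketched as a weighted interpolation of the two score streams; with lookahead, LID scores from future frames are averaged in. This is a minimal sketch: the stream weight, the array shapes, and the averaging scheme are illustrative assumptions, and the paper's "Language Lookahead" may differ in detail.

```python
import numpy as np

def combine_streams(am_logp, lid_logp, w=0.3, lookahead=0):
    # am_logp, lid_logp: (n_frames, n_hyp) log-scores for the same
    # hypotheses (e.g. Mandarin vs. English states of a CS recognizer)
    lid = np.array([
        # average each frame's LID score with up to `lookahead`
        # future frames ("Language Lookahead" in spirit)
        lid_logp[t:t + lookahead + 1].mean(axis=0)
        for t in range(len(lid_logp))
    ])
    # multistream combination: weighted sum of log-scores per frame
    return (1.0 - w) * am_logp + w * lid
```

With `lookahead=0` this degenerates to the plain per-frame multistream combination; larger values let language evidence from upcoming frames influence the current frame's score.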

    Brain-to-text: Decoding spoken phrases from phone representations in the brain

    It has long been a matter of speculation whether communication between humans and machines based on natural speech-related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones, or one of a few isolated words. However, until now it has remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity during speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech.

    Workshops of the Sixth International Brain–Computer Interface Meeting: brain–computer interfaces past, present, and future

    Brain–computer interfaces (BCI) (also referred to as brain–machine interfaces; BMI) are, by definition, an interface between the human brain and a technological application. Brain activity for interpretation by the BCI can be acquired with either invasive or non-invasive methods. The key point is that the interpreted signals come directly from the brain, bypassing sensorimotor output channels that may or may not have impaired function. This paper provides a concise glimpse of the breadth of BCI research and development topics covered by the workshops of the 6th International Brain–Computer Interface Meeting.