Automated Detection of Usage Errors in non-native English Writing

Author: Fujishima Satoru
Ishizaki Shun
Publication date: 26/10/2011

We investigate the use of a novelty detection algorithm for identifying inappropriate word combinations in a raw English corpus. We employ an unsupervised detection algorithm based on one-class support vector machines (OC-SVMs) and extract sentences containing word sequences whose frequency of appearance is significantly low in native English writing. Combined with n-gram language models and document categorization techniques, the OC-SVM classifier assigns given sentences to two groups: sentences containing errors and sentences without errors. Accuracies are 79.30% with the bigram model, 86.63% with the trigram model, and 34.34% with the four-gram model.
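The core idea in the abstract above, flagging word sequences that are rare in native writing, can be sketched with plain n-gram counts. This is only an illustration: it swaps the paper's OC-SVM classifier for a simple count threshold, and the tiny reference corpus, the function names, and the `min_count` parameter are all assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams (as tuples) of a token sequence, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_counts(reference_sentences, n):
    """Count n-grams over a reference corpus of native-speaker sentences."""
    counts = Counter()
    for sentence in reference_sentences:
        counts.update(ngrams(sentence.lower().split(), n))
    return counts


def flag_sentence(sentence, counts, n, min_count=1):
    """Return the n-grams seen fewer than min_count times in the reference."""
    return [g for g in ngrams(sentence.lower().split(), n) if counts[g] < min_count]


# Hypothetical reference corpus standing in for native English writing.
reference = [
    "she made a decision yesterday",
    "he made a mistake yesterday",
    "they made a decision today",
]
bigram_counts = build_counts(reference, 2)

ok = flag_sentence("she made a mistake yesterday", bigram_counts, 2)
suspect = flag_sentence("she did a decision yesterday", bigram_counts, 2)
print(ok)       # no unseen bigrams
print(suspect)  # bigrams around "did" are unseen in the reference
```

A real system would need a far larger reference corpus and a learned decision boundary, which is the role the one-class SVM plays in the paper.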

EEPIS Repository

Spoken Language Intent Detection using Confusion2Vec

Author: Georgiou Panayiotis
Shivakumar Prashanth Gurunath
Yang Mu
Publication venue: 'International Speech Communication Association'
Publication date: 01/07/2019

Decoding a speaker's intent is a crucial part of spoken language understanding (SLU). In real-life scenarios, noise and errors in the text transcriptions make the task more challenging. In this paper, we address spoken language intent detection under the noisy conditions imposed by automatic speech recognition (ASR) systems. We propose to employ the confusion2vec word feature representation to compensate for the errors made by ASR and to increase the robustness of the SLU system. Confusion2vec, motivated by human speech production and perception, models acoustic relationships between words in addition to the semantic and syntactic relations of words in human language. We hypothesize that ASR often makes errors involving acoustically similar words, and that confusion2vec, with its inherent model of acoustic relationships between words, is able to compensate for these errors. Through experiments on the ATIS benchmark dataset, we demonstrate that the proposed model is robust and achieves state-of-the-art results under noisy ASR conditions. Our system reduces classification error rate (CER) by 20.84% and improves robustness by 37.48% (lower CER degradation) relative to the previous state of the art when going from clean to noisy transcripts. Improvements are also demonstrated when training the intent detection models on noisy transcripts.
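The two relative figures quoted above can be read as ordinary relative reductions. The sketch below shows one plausible reading, assuming "robustness" means the relative drop in CER degradation between clean and noisy transcripts. The function name and all input numbers are made up for illustration and are not results from the paper.

```python
def relative_reduction(baseline, new):
    """Percentage by which `new` is lower than `baseline`."""
    return 100.0 * (baseline - new) / baseline


# Illustrative classification error rates in percent (not from the paper).
baseline_clean, baseline_noisy = 5.0, 9.0
proposed_clean, proposed_noisy = 4.0, 7.0

# Relative CER reduction on noisy transcripts.
cer_gain = relative_reduction(baseline_noisy, proposed_noisy)

# Degradation is the extra error incurred when moving from clean to noisy input.
baseline_degradation = baseline_noisy - baseline_clean
proposed_degradation = proposed_noisy - proposed_clean
robustness_gain = relative_reduction(baseline_degradation, proposed_degradation)

print(f"relative CER reduction: {cer_gain:.2f}%")
print(f"relative reduction in degradation: {robustness_gain:.2f}%")
```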

arXiv.org e-Print Archive

Crossref

Did I say dog or cat? A study of semantic error detection and correction in children

Author: Cathleen Cortis
J. Richard Hanley
Mary-Jane Budd
Nazbanou Nozari
Publication venue: 'Elsevier BV'
Publication date: 22/10/2015

Although naturalistic studies of spontaneous speech suggest that young children can monitor their speech, the mechanisms for detection and correction of speech errors in children are not well understood. In particular, there is little research on monitoring semantic errors in this population. This study provides a systematic investigation of detection and correction of semantic errors in children between the ages of 5 and 8 years as they produced sentences to describe simple visual events involving nine highly familiar animals (the moving animals task). Results showed that older children made fewer errors and corrected a larger proportion of the errors that they made than younger children. We then tested the prediction of a production-based account of error monitoring that the strength of the language production system, and specifically its semantic–lexical component, should be correlated with the ability to detect and repair semantic errors. Strength of semantic–lexical mapping, as well as lexical–phonological mapping, was estimated individually for children by fitting their error patterns, obtained from an independent picture-naming task, to a computational model of language production. Children’s picture-naming performance was predictive of their ability to monitor their semantic errors above and beyond age. This relationship was specific to the strength of the semantic–lexical part of the system, as predicted by the production-based monitor.

University of Essex Research Repository

Crossref

Uncertainty Detection as Approximate Max-Margin Sequence Labelling

Author: Dalianis Hercules
Eriksson Gunnar
Hassel Martin
Karlgren Jussi
Täckström Oscar
Velupillai Sumithra
Publication date: 01/01/2010

This paper reports experiments for the CoNLL 2010 shared task on learning to detect hedges and their scope in natural language text. We have addressed the experimental tasks as supervised linear maximum margin prediction problems. For sentence-level hedge detection in the biological domain we use an L1-regularised binary support vector machine, while for sentence-level weasel detection in the Wikipedia domain we use an L2-regularised approach. We model the in-sentence uncertainty cue and scope detection task as an L2-regularised approximate maximum margin sequence labelling problem, using the BIO encoding. In addition to surface-level features, we use a variety of linguistic features based on a functional dependency analysis. A greedy forward selection strategy is used in exploring the large set of potential features. Our official results for Task 1 are 85.2 F1-score for the biological domain and 55.4 F1-score for the Wikipedia set. For Task 2, our official results are 2.1 for the entire task with a score of 62.5 for cue detection. After resolving errors and final bugs, our final results are for Task 1, biological: 86.0, Wikipedia: 58.2; Task 2, scopes: 39.6 and cues: 78.5.
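The BIO encoding mentioned above turns span annotations into one tag per token (B for the first token of a span, I for tokens inside it, O for everything else), which is what lets cue detection be posed as sequence labelling. A minimal sketch follows; the example sentence, the span positions, and the function name are hypothetical and not taken from the shared-task data.

```python
def bio_encode(tokens, spans):
    """Tag each token B, I, or O given (start, end) token spans, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B"  # first token of the span
        for i in range(start + 1, end):
            tags[i] = "I"  # remaining tokens inside the span
    return tags


tokens = "These results may indicate that the gene is involved".split()
# Hypothetical annotation: "may indicate" (token positions 2 and 3) is a hedge cue.
tags = bio_encode(tokens, [(2, 4)])

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A sequence labeller is then trained to predict these tags from per-token features, and spans are recovered by reading each B tag together with the I tags that follow it.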

CiteSeerX

Publikationer från Stockholms universitet

Publikationer från Uppsala Universitet

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

MDB: Interactively Querying Datasets and Models

Author: Naik Aaditya
Naik Mayur
Stein Adam
Wong Eric
Wu Yinjun
Publication date: 13/08/2023

As models are trained and deployed, developers need to be able to systematically debug errors that emerge in the machine learning pipeline. We present MDB, a debugging framework for interactively querying datasets and models. MDB integrates functional programming with relational algebra to build expressive queries over a database of datasets and model predictions. Queries are reusable and easily modified, enabling debuggers to rapidly iterate and refine queries to discover and characterize errors and model behaviors. We evaluate MDB on object detection, bias discovery, image classification, and data imputation tasks across self-driving videos, large language models, and medical records. Our experiments show that MDB enables up to 10x faster and 40% shorter queries than other baselines. In a user study, we find developers can successfully construct complex queries that describe errors of machine learning models.
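To give a feel for what combining functional programming with relational algebra over predictions might look like, here is a small sketch in plain Python. It is not MDB's API: the `join` and `where` helpers, the record fields, and the data are all hypothetical, chosen only to show reusable, composable queries over labels and model predictions.

```python
def join(left, right, key):
    """Relational inner join of two lists of dict records on a shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def where(rows, predicate):
    """Relational selection: keep the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


# Hypothetical ground-truth labels and model predictions.
labels = [
    {"id": 1, "label": "cat"},
    {"id": 2, "label": "dog"},
    {"id": 3, "label": "dog"},
]
predictions = [
    {"id": 1, "pred": "cat", "score": 0.95},
    {"id": 2, "pred": "cat", "score": 0.90},
    {"id": 3, "pred": "cat", "score": 0.55},
]

# Reusable predicates that can be combined and refined during debugging.
def is_error(row):
    return row["pred"] != row["label"]

def is_confident(row):
    return row["score"] >= 0.8

joined = join(labels, predictions, "id")
errors = where(joined, is_error)
confident_errors = where(errors, is_confident)

print([row["id"] for row in errors])
print([row["id"] for row in confident_errors])
```

Because each step returns an ordinary list of records, a query can be narrowed by chaining another `where` call rather than rewriting it from scratch.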

arXiv.org e-Print Archive