Efficient Latent Semantic Extraction from Cross Domain Data with Declarative Language
With large amounts of data continuously generated by intelligent devices, efficient analysis of huge data collections to unearth valuable insights has become one of the most elusive challenges for both academia and industry. The key elements of a scalable analysis framework should include (1) an intuitive interface to describe the desired outcome, (2) a well-crafted model that integrates all available information sources to derive the optimal outcome, and (3) an efficient algorithm that performs the data integration and extraction within a reasonable amount of time. In this dissertation, we address these challenges by proposing (1) a cross-language interface for the succinct expression of recursive queries, (2) a domain-specific neural network model that can incorporate information from multiple modalities, and (3) a sample-efficient training method that can be used even for classifiers with an extremely large number of output classes. Our contributions in this thesis are thus threefold. First, for the recursive queries that are ubiquitous in advanced data analytics, we design, on top of BigDatalog and Apache Spark, a succinct and expressive analytics tool encapsulating the functionality and classical algorithms of Datalog, a quintessential logic programming language. We provide the Logical Library (LLib), a Spark MLlib-like high-level API supporting a wide range of recursive algorithms, and the Logical DataFrame (LFrame), an extension to Spark DataFrame supporting both relational and logical operations. LLib and LFrame enable smooth collaboration between logical applications and other Spark libraries, as well as cross-language logical programming in Scala, Java, or Python. Second, we use variants of recurrent neural networks (RNNs) to incorporate useful sequential information overlooked by conventional work in two different domains: Spoken Language Understanding (SLU) and Internet Embedding (IE).
In SLU, we address the problem caused by relying solely on the first best interpretation (hypothesis) of an audio command. We propose a series of new architectures comprising bidirectional LSTM and pooling layers that jointly use the texts or embedding vectors of the other hypotheses, which are usually neglected but carry valuable information missing from the first best hypothesis. In IE, we propose DIP, an extension of RNN, to build an internet coordinate system from IP address sequences, which conventional distance-based internet embedding algorithms also overlook even though they encode structural information about the network. Both DIP and the integration of all hypotheses bring significant performance improvements for the corresponding downstream tasks. Finally, we investigate training algorithms for multi-class classifiers with a large number of output classes, which are common in deep neural networks and typically implemented as a softmax final layer with one output neuron per class. To avoid the expensive computation of the softmax normalizing constant for each training data point, we analyze the well-known negative sampling method and improve it into the amplified negative sampling algorithm, which achieves much higher performance at lower training cost.
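As a rough illustration of the cost difference described above, the following Python sketch contrasts full softmax cross-entropy with standard negative sampling. It is not the amplified negative sampling algorithm of the dissertation, whose details are not given in this abstract; the function names, the uniform negative sampler, and the toy scores are all assumptions made for illustration.

```python
import math
import random


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def full_softmax_loss(scores, target):
    # Exact cross-entropy: the normalizing constant sums over every class.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]


def negative_sampling_loss(scores, target, num_negatives, rng):
    # Standard negative sampling (assumed uniform sampler, with replacement):
    # only the true class and a few sampled wrong classes are scored.
    negatives = []
    while len(negatives) < num_negatives:
        c = rng.randrange(len(scores))
        if c != target:
            negatives.append(c)
    loss = -log_sigmoid(scores[target])
    for c in negatives:
        loss -= log_sigmoid(-scores[c])
    return loss, negatives


# Toy example with hypothetical class scores (logits).
toy_scores = [2.0, 0.5, -1.0, 0.0]
exact = full_softmax_loss(toy_scores, target=0)
sampled, sampled_negatives = negative_sampling_loss(
    toy_scores, target=0, num_negatives=2, rng=random.Random(0)
)
```

With K output classes, the exact loss reads all K scores per training example, while the sampled loss reads only `num_negatives + 1` of them, which is where the saving in training cost comes from.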
SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition
Error correction in automatic speech recognition (ASR) aims to correct the
incorrect words in sentences generated by ASR models. Since recent ASR models
usually have low word error rate (WER), to avoid affecting originally correct
tokens, error correction models should only modify incorrect words, and
therefore detecting incorrect words is important for error correction. Previous
works on error correction either implicitly detect error words through
target-source attention or CTC (connectionist temporal classification) loss, or
explicitly locate specific deletion/substitution/insertion errors. However,
implicit error detection does not provide a clear signal about which tokens are
incorrect and explicit error detection suffers from low detection accuracy. In
this paper, we propose SoftCorrect with a soft error detection mechanism to
avoid the limitations of both explicit and implicit error detection.
Specifically, we first detect whether a token is correct or not through a
probability produced by a specially designed language model, and then design
a constrained CTC loss that only duplicates the detected incorrect tokens to
let the decoder focus on the correction of error tokens. Compared with implicit
error detection with CTC loss, SoftCorrect provides an explicit signal about which
words are incorrect and thus does not need to duplicate every token but only
incorrect tokens; compared with explicit error detection, SoftCorrect does not
detect specific deletion/substitution/insertion errors but just leaves it to
CTC loss. Experiments on AISHELL-1 and Aidatatang datasets show that
SoftCorrect achieves 26.1% and 9.4% CER reduction respectively, outperforming
previous works by a large margin, while still enjoying fast speed of parallel
generation.
Comment: AAAI 202
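For readers unfamiliar with the metric, the sketch below shows how character error rate (CER) is commonly computed from the Levenshtein edit distance, and how a relative reduction between two systems can be derived. Treating the reported reductions as relative is an assumption, since the abstract does not say; the function names and example strings are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    # Levenshtein distance: minimum number of character insertions,
    # deletions, and substitutions needed to turn ref into hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    # Character error rate: edit distance normalized by reference length.
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, corrected):
    # Fraction of the baseline error removed by the corrected system.
    return (baseline - corrected) / baseline
```

For example, `cer("abcd", "abxd")` is 0.25 because one of the four reference characters was substituted.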
Towards structured neural spoken dialogue modelling.
195 p. In this thesis, we try to alleviate some of the weaknesses of the current approaches to dialogue modelling, one of the most challenging areas of Artificial Intelligence. We target three different types of dialogues (open-domain, task-oriented and coaching sessions), and use mainly machine learning algorithms to train dialogue models. One challenge of open-domain chatbots is their lack of response variety, which can be tackled using Generative Adversarial Networks (GANs). We present two methodological contributions in this regard. On the one hand, we develop a method to circumvent the non-differentiability of text-processing GANs. On the other hand, we extend the conventional task of discriminators, which often operate at a single response level, to the batch level. Meanwhile, two crucial aspects of task-oriented systems are their understanding capabilities (because they need to correctly interpret what the user is looking for and their constraints) and the dialogue strategy. We propose a simple yet powerful way to improve spoken understanding and adapt the dialogue strategy by explicitly processing the user's speech signal through audio-processing transformer neural networks. Finally, coaching dialogues share properties of open-domain and task-oriented dialogues. They are somewhat task-oriented, but there is no rush to complete the task, and it is more important to converse calmly to make the users aware of their own problems. In this context, we describe our collaboration in the EMPATHIC project, where a Virtual Coach capable of carrying out coaching dialogues about nutrition was built, using a modular Spoken Dialogue System. Second, we model such dialogues with an end-to-end system based on Transfer Learning.
Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme
Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologie
Detecting grammatical errors with treebank-induced, probabilistic parsers
Today's grammar checkers often use hand-crafted rule systems that define acceptable language. The development of such rule systems is labour-intensive and has to be repeated for each language. At the same time, grammars automatically induced from syntactically annotated corpora (treebanks) are successfully employed in other applications, for example text understanding and machine translation. At first glance, treebank-induced grammars seem to be unsuitable for grammar checking as they massively over-generate and fail to reject ungrammatical input due to their high robustness. We present three new methods for judging the grammaticality of a sentence with probabilistic, treebank-induced grammars, demonstrating that such grammars can be successfully applied to automatically judge the grammaticality of an input string. Our best-performing method exploits the differences between parse results for grammars trained on grammatical and ungrammatical treebanks. The second approach builds an estimator of the probability of the most likely parse using grammatical training data that has previously been parsed and annotated with parse probabilities. If the estimated probability of an input sentence (whose grammaticality is to be judged by the system) is higher by a certain amount than the actual parse probability, the sentence is flagged as ungrammatical. The third approach extracts discriminative parse tree fragments in the form of CFG rules from parsed grammatical and ungrammatical corpora and trains a binary classifier to distinguish grammatical from ungrammatical sentences. The three approaches are evaluated on a large test set of grammatical and ungrammatical sentences. The ungrammatical test set is generated automatically by inserting common grammatical errors into the British National Corpus. The results are compared to two traditional approaches, one that uses a hand-crafted, discriminative grammar, the XLE ParGram English LFG, and one based on part-of-speech n-grams. 
In addition, the baseline methods and the new methods are combined in a machine learning-based framework, yielding further improvements.
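The decision rule of the second approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: probabilities are handled as log-probabilities, and the function name, the `margin` parameter, and the example values are hypothetical. How the thesis trains its estimator and chooses the threshold is not specified in this abstract.

```python
def flag_ungrammatical(estimated_logprob, actual_logprob, margin):
    # Flag the sentence when the estimated parse log-probability exceeds
    # the parser's actual log-probability by more than the margin.
    return (estimated_logprob - actual_logprob) > margin


# Hypothetical values: the estimator expects about -20.0 for a sentence
# like this one, but the parser's best parse only scores -35.0.
suspicious = flag_ungrammatical(-20.0, -35.0, margin=5.0)
acceptable = flag_ungrammatical(-20.0, -22.0, margin=5.0)
```

Here `suspicious` is `True` because the gap of 15.0 exceeds the margin, while `acceptable` is `False` because the gap of 2.0 does not.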