447 research outputs found
Spoken Language Intent Detection using Confusion2Vec
Decoding a speaker's intent is a crucial part of spoken language understanding
(SLU). The presence of noise or errors in text transcriptions in real-life
scenarios makes the task more challenging. In this paper, we address spoken
language intent detection under the noisy conditions imposed by automatic speech
recognition (ASR) systems. We propose to employ the confusion2vec word feature
representation to compensate for the errors made by ASR and to increase the
robustness of the SLU system. Confusion2vec, motivated by human speech
production and perception, models acoustic relationships between words in
addition to the semantic and syntactic relations of words in human language. We
hypothesize that ASR often makes errors involving acoustically similar words,
and that confusion2vec, with its inherent model of acoustic relationships between
words, is able to compensate for these errors. We demonstrate through experiments
on the ATIS benchmark dataset that the proposed model is robust and achieves
state-of-the-art results under noisy ASR conditions. Our system reduces
classification error rate (CER) by 20.84% and improves robustness by 37.48%
(lower CER degradation) relative to the previous state of the art when going from
clean to noisy transcripts. Improvements are also demonstrated when training
the intent detection models on noisy transcripts.
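The two percentages quoted in this abstract are relative figures. A minimal sketch of one plausible reading follows: the CER gain is the relative reduction in CER on noisy transcripts, and the robustness gain is the relative reduction in CER degradation (noisy CER minus clean CER). The function names and the numeric values are hypothetical and are not taken from the paper.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


def cer_degradation(cer_clean, cer_noisy):
    """Absolute CER increase when moving from clean to noisy transcripts."""
    return cer_noisy - cer_clean


# Hypothetical CER values in percent, NOT results from the paper.
baseline = {"clean": 2.0, "noisy": 6.0}
proposed = {"clean": 2.0, "noisy": 5.0}

# Relative CER reduction on noisy transcripts.
cer_gain = relative_reduction(baseline["noisy"], proposed["noisy"])

# Relative reduction in degradation, read here as the "robustness" gain.
robustness_gain = relative_reduction(
    cer_degradation(baseline["clean"], baseline["noisy"]),
    cer_degradation(proposed["clean"], proposed["noisy"]),
)

print(f"CER gain: {cer_gain:.2f}%  robustness gain: {robustness_gain:.2f}%")
```

Under this reading, a system can improve robustness even when its clean-transcript CER is unchanged, because only the gap between clean and noisy performance is compared.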
Semantic Detection of Targeted Attacks Using DOC2VEC Embedding
The targeted attack is a type of social engineering attack. Detecting it is considered a challenge because detection depends on semantic extraction of the attacker's intent. However, previous research has primarily relied on Natural Language Processing or Word Embedding techniques that lack the context of the attacker's text message. Based on Sentence Embedding and machine learning approaches, this paper introduces a model for semantic detection of targeted attacks. This model has the advantage of encoding relevant information, which helps to improve the performance of the multi-class classification process. Messages are categorized based on the type of security rule that the attacker has violated. The suggested model was tested using a dialogue dataset taken from phone calls, which was manually categorized into four categories. The text is pre-processed using natural language processing techniques, and the semantic features are extracted as Sentence Embedding vectors that are augmented with security policy sentences. Machine Learning algorithms are applied to classify text messages. The experimental results show that sentence embeddings with doc2vec achieved a high prediction accuracy of 96.8%, outperforming the method previously applied to the same dialog dataset.
Fine-Tuning BERT Models for Intent Recognition Using a Frequency Cut-Off Strategy for Domain-Specific Vocabulary Extension
The work leading to these results was supported by the Spanish Ministry of Science and Innovation through the R&D&i projects GOMINOLA (PID2020-118112RB-C21 and PID2020-118112RB-C22, funded by MCIN/AEI/10.13039/501100011033), CAVIAR (TEC2017-84593-C2-1-R, funded by MCIN/AEI/10.13039/501100011033/FEDER "Una manera de hacer Europa"), and AMICPoC (PDC2021-120846-C42, funded by MCIN/AEI/10.13039/501100011033 and by the European Union "NextGenerationEU/PRTR"). This research also received funding from the European Union's Horizon2020 research and innovation program under grant agreement No 823907 (http://menhirproject.eu, accessed on 2 February 2022). Furthermore, R.K.'s research was supported by the Spanish Ministry of Education (FPI grant PRE2018-083225).
Intent recognition is a key component of any task-oriented conversational system. The
intent recognizer can be used first to classify the user’s utterance into one of several predefined classes
(intents) that help to understand the user’s current goal. Then, the most adequate response can be
provided accordingly. Intent recognizers also often appear as a form of joint models for performing
the natural language understanding and dialog management tasks together as a single process, thus
simplifying the set of problems that a conversational system must solve. This happens to be especially
true for frequently asked question (FAQ) conversational systems. In this work, we first present an
exploratory analysis in which different deep learning (DL) models for intent detection and classification
were evaluated. In particular, we experimentally compare and analyze conventional recurrent
neural networks (RNN) and state-of-the-art transformer models. Our experiments confirmed that
the best performance is achieved by using transformers. Specifically, the best performance was achieved by
fine-tuning the so-called BETO model (a Spanish pretrained bidirectional encoder representations
from transformers (BERT) model from the Universidad de Chile) in our intent detection task. Then, as
the main contribution of the paper, we analyze the effect of inserting unseen domain words to extend
the vocabulary of the model as part of the fine-tuning or domain-adaptation process. In particular,
a very simple word frequency cut-off strategy is experimentally shown to be a suitable method for
driving the vocabulary learning decisions over unseen words. The results of our analysis show that
the proposed method helps to effectively extend the original vocabulary of the pretrained models.
We validated our approach with a selection of the corpus acquired with the Hispabot-Covid19 system
obtaining satisfactory results.
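The word frequency cut-off idea described in this abstract can be illustrated with a minimal sketch: count the words in the domain corpus, discard those already in the pretrained vocabulary, and keep only the unseen words that occur at least a threshold number of times. The function name, the whitespace tokenization, the toy corpus, and the threshold value are all assumptions made for illustration; the abstract does not specify the authors' exact procedure.

```python
from collections import Counter


def select_new_vocabulary(utterances, base_vocab, min_count):
    """Return unseen words whose corpus frequency is at least `min_count`.

    `base_vocab` stands in for the pretrained model's vocabulary.
    Whitespace tokenization is a simplifying assumption.
    """
    counts = Counter(
        word
        for utterance in utterances
        for word in utterance.lower().split()
    )
    unseen = {w: c for w, c in counts.items() if w not in base_vocab}
    # Keep frequent unseen words, most frequent first, ties alphabetical.
    kept = [w for w, c in unseen.items() if c >= min_count]
    return sorted(kept, key=lambda w: (-unseen[w], w))


# Toy domain corpus and toy base vocabulary (both hypothetical).
corpus = ["covid test near me", "covid vaccine", "pcr test", "covid pcr"]
base_vocab = {"test", "near", "me"}

new_words = select_new_vocabulary(corpus, base_vocab, min_count=2)
print(new_words)
```

With a library such as Hugging Face Transformers, the selected words could then be registered with `tokenizer.add_tokens(new_words)` followed by `model.resize_token_embeddings(len(tokenizer))` before fine-tuning; whether the authors used this route is not stated in the abstract.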
Vertical intent prediction approach based on Doc2vec and convolutional neural networks for improving vertical selection in aggregated search
Vertical selection is the task of selecting the most relevant verticals for a given query in order to improve the diversity and quality of web search results. This task requires not only predicting relevant verticals; these verticals must also be the ones the user expects to be relevant for their particular information need. Most existing works have focused on using traditional machine learning techniques to combine multiple types of features for selecting several relevant verticals. Although these techniques are very efficient, handling vertical selection with high accuracy is still a challenging research task. In this paper, we propose an approach for improving vertical selection in order to satisfy the user's vertical intent and reduce the user's browsing time and effort. First, it generates query embedding vectors using the doc2vec algorithm, which preserves syntactic and semantic information within each query. Second, this vector is used as input to a convolutional neural network model that enriches the representation of the query with multiple levels of abstraction, including rich semantic information, and then creates a global summarization of the query features. We demonstrate the effectiveness of our approach through comprehensive experimentation using various datasets. Our experimental findings show that our system achieves significant accuracy. Further, it makes accurate predictions on new, unseen data.
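The second stage described above, where a query embedding is passed through convolutional filters and then summarized globally, can be sketched in plain Python. This is an illustration of the general technique (1D convolution with ReLU followed by global max pooling), not the authors' architecture: the embedding values, the kernels, and the function names are hypothetical, and a real system would obtain the embedding from a trained doc2vec model and learn the kernels.

```python
def conv1d_relu(vector, kernel, bias=0.0):
    """Slide `kernel` over `vector` (valid padding) and apply ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(vector[i + j] * kernel[j] for j in range(k)) + bias)
        for i in range(len(vector) - k + 1)
    ]


def global_max_pool(feature_map):
    """Summarize one feature map by its strongest activation."""
    return max(feature_map)


def summarize_query(embedding, kernels):
    """One pooled feature per kernel: a fixed-size summary of the query."""
    return [global_max_pool(conv1d_relu(embedding, k)) for k in kernels]


# Hypothetical 4-dimensional query embedding and two hand-picked kernels.
query_embedding = [1.0, -2.0, 3.0, 0.5]
kernels = [[1.0, 1.0], [1.0, -1.0]]

summary = summarize_query(query_embedding, kernels)
print(summary)  # [3.5, 3.0]
```

The pooled vector has one entry per kernel regardless of the embedding length, which is what makes it usable as a fixed-size input to a final classifier over verticals.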