19,418 research outputs found

Query understanding enhanced by hierarchical parsing structures

Author: Jim Glass
Jingjing Liu
Panupong Pasupat
Scott Cyphers
Yining Wang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

Query understanding has been well studied in the areas of information retrieval and spoken language understanding (SLU). There are generally three layers of query understanding: domain classification, user intent detection, and semantic tagging. Classifiers can be applied to domain and intent detection in real systems, and semantic tagging (or slot filling) is commonly defined as a sequence-labeling task: mapping a sequence of words to a sequence of labels. Various statistical features (e.g., n-grams) can be extracted from annotated queries for learning label prediction models; however, linguistic characteristics of queries, such as hierarchical structures and semantic relationships, are usually neglected in the feature extraction process. In this work, we propose an approach that leverages linguistic knowledge encoded in hierarchical parse trees for query understanding. Specifically, for natural language queries, we extract a set of syntactic structural features and semantic dependency features from query parse trees to enhance inference model learning. Experiments on real natural language queries show that augmenting sequence labeling models with linguistic knowledge can improve query understanding performance in various domains. Index Terms: query understanding, semantic tagging, linguistic parsing
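
To make the sequence-labeling view of slot filling concrete, here is a minimal sketch of how a word-level label sequence maps back to slots. The BIO labeling scheme, the slot names, the example query, and the helper `bio_to_slots` are illustrative assumptions; the abstract does not specify the paper's label set or its parse-tree features.

```python
from typing import List, Tuple


def bio_to_slots(words: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-labelled words into (slot_name, text) pairs.

    "B-x" opens a slot named x, "I-x" continues the open slot x,
    and any other label (e.g. "O") closes the open slot.
    """
    slots: List[Tuple[str, str]] = []
    name, buffer = None, []
    for word, label in zip(words, labels):
        if label.startswith("B-"):
            if name is not None:
                slots.append((name, " ".join(buffer)))
            name, buffer = label[2:], [word]
        elif label.startswith("I-") and name == label[2:]:
            buffer.append(word)
        else:
            # "O" or an inconsistent "I-" label: close the open slot, if any.
            if name is not None:
                slots.append((name, " ".join(buffer)))
            name, buffer = None, []
    if name is not None:
        slots.append((name, " ".join(buffer)))
    return slots


# Hypothetical query and labels, one label per word.
words = ["cheap", "italian", "restaurants", "near", "harvard", "square"]
labels = ["B-price", "B-cuisine", "O", "O", "B-location", "I-location"]
print(bio_to_slots(words, labels))
# [('price', 'cheap'), ('cuisine', 'italian'), ('location', 'harvard square')]
```

A tagger trained on n-gram features would predict `labels` from `words`; the approach described above would add features derived from the query's parse tree to that prediction step.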

CiteSeerX

Crossref

Short Text Pre-training with Extended Token Classification for E-commerce Query Understanding

Author: Cao Tianyu
Goutam Rahul
Jiang Haoming
Li Zheng
Luo Chen
Tang Xianfeng
Yin Bing
Yin Qingyu
Zhang Danqing
Publication date: 08/10/2022

E-commerce query understanding is the process of inferring the shopping intent of customers by extracting semantic meaning from their search queries. The recent progress of pre-trained masked language models (MLM) in natural language processing is extremely attractive for developing effective query understanding models. Specifically, MLM learns contextual text embeddings by recovering the masked tokens in sentences. Such a pre-training process relies on sufficient contextual information. It is, however, less effective for search queries, which are usually short text. When masking is applied to short search queries, most contextual information is lost and the intent of the search queries may be changed. To mitigate these issues for MLM pre-training on search queries, we propose a novel pre-training task specifically designed for short text, called Extended Token Classification (ETC). Instead of masking the input text, our approach extends the input by inserting tokens via a generator network, and trains a discriminator to identify which tokens are inserted in the extended input. We conduct experiments in an e-commerce store to demonstrate the effectiveness of ETC.
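
The data-construction step behind ETC can be sketched as follows. This is only an illustration under stated assumptions: the paper inserts tokens with a generator network, whereas this sketch substitutes random sampling from a hypothetical candidate list, and the function name `extend_query` and the example query are invented for the example.

```python
import random
from typing import List, Sequence, Tuple


def extend_query(
    tokens: Sequence[str],
    candidates: Sequence[str],
    n_insert: int,
    rng: random.Random,
) -> Tuple[List[str], List[int]]:
    """Insert n_insert tokens into a query and label each position.

    Returns the extended token list and a parallel list of labels:
    1 for an inserted token, 0 for an original token. A discriminator
    would be trained to predict these labels from the extended input.
    Assumption: random sampling stands in for the generator network.
    """
    extended = list(tokens)
    labels = [0] * len(extended)
    for _ in range(n_insert):
        # randint is inclusive on both ends, so insertion at the end is possible.
        position = rng.randint(0, len(extended))
        extended.insert(position, rng.choice(candidates))
        labels.insert(position, 1)
    return extended, labels


rng = random.Random(0)
query = ["running", "shoes", "men"]
extended, labels = extend_query(query, ["red", "for", "size", "nike"], 2, rng)
print(extended)
print(labels)
```

Unlike masking, every original token remains in the input, so the short query keeps all of its context; only the inserted positions differ from the original.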

arXiv.org e-Print Archive

Incorporating Device Context In Natural Language Understanding

Author: Anonymous
Publication venue: Technical Disclosure Commons
Publication date: 15/10/2020

Automatic speech recognition (ASR) models are used to recognize user commands or queries in products such as smartphones, smart speakers/displays, and other products that enable speech interaction. Automatic speech recognition is a complex problem that requires correct processing of the acoustic and semantic signals from the voice input. Natural language understanding (NLU) systems sometimes fail to correctly interpret utterances that are associated with multiple possible intents. Per techniques described herein, device context features, such as the identity of the foreground application and other information, are used to disambiguate the intent of a voice query. Incorporating device context as input to NLU models improves their ability to correctly interpret utterances with ambiguous intent.

Technical Disclosure Commons

Zero-Shot Learning for Semantic Utterance Classification

Author: Dauphin Yann N.
Hakkani-Tur Dilek
Heck Larry
Tur Gokhan
Publication date: 01/01/2014

We propose a novel zero-shot learning method for semantic utterance classification (SUC). It learns a classifier $f: X \to Y$ for problems where none of the semantic categories $Y$ are present in the training set. The framework uncovers the link between categories and utterances using a semantic space. We show that this semantic space can be learned by deep neural networks trained on large amounts of search engine query log data. More precisely, we propose a novel method that can learn discriminative semantic features without supervision. It uses the zero-shot learning framework to guide the learning of the semantic features. We demonstrate the effectiveness of the zero-shot semantic learning algorithm on the SUC dataset collected by (Tur, 2012). Furthermore, we achieve state-of-the-art results by combining the semantic features with a supervised method.
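
The idea of classifying in a shared semantic space can be illustrated with a small sketch: utterances and category names are both embedded, and the nearest category wins, so no labelled utterances for that category are needed. Everything concrete here is an assumption for illustration: the paper learns the space with deep neural networks trained on query logs, while this sketch uses hand-made two-dimensional word vectors, averaged word embeddings, and cosine similarity.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed(text: str, word_vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    dim = len(next(iter(word_vectors.values())))
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def zero_shot_classify(
    utterance: str, categories: Sequence[str], word_vectors: Dict[str, List[float]]
) -> str:
    """Pick the category whose name is closest to the utterance in the space."""
    u = embed(utterance, word_vectors)
    return max(categories, key=lambda c: cosine(u, embed(c, word_vectors)))


# Hypothetical toy semantic space (hand-made, not learned).
toy_vectors = {
    "restaurant": [1.0, 0.0],
    "pizza": [0.9, 0.1],
    "weather": [0.0, 1.0],
    "rain": [0.1, 0.9],
    "find": [0.5, 0.5],
}
print(zero_shot_classify("find pizza", ["restaurant", "weather"], toy_vectors))
# restaurant
```

The quality of such a classifier depends entirely on the semantic space, which is why the method above focuses on learning discriminative semantic features.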

arXiv.org e-Print Archive

CiteSeerX