Search CORE


Multilingual Universal Sentence Encoder for Semantic Retrieval

Authors: Abrego Gustavo Hernandez, Ahmad Amin, Cer Daniel, Constant Noah, Guo Mandy, Kurzweil Ray, Law Jax, Strope Brian, Sung Yun-Hsuan, Tar Chris, Yang Yinfei, Yuan Steve
Publication date: 09/07/2019

We introduce two pre-trained, retrieval-focused multilingual sentence encoding models, based on the Transformer and CNN model architectures, respectively. The models embed text from 16 languages into a single semantic space using a multi-task trained dual-encoder that learns tied representations using translation-based bridge tasks (Chidambaram et al., 2018). The models provide performance that is competitive with the state of the art on semantic retrieval (SR), translation pair bitext retrieval (BR), and retrieval question answering (ReQA). On English transfer learning tasks, our sentence-level embeddings approach, and in some cases exceed, the performance of monolingual, English-only sentence embedding models. Our models are available for download on TensorFlow Hub. Comment: 6 pages, 6 tables, 2 listings, and 1 figure

arXiv.org e-Print Archive

Crossref
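The abstract above describes a dual-encoder used for retrieval. The key property is that queries and candidates are encoded independently by the same encoder, so candidate vectors can be precomputed and ranked against a query by similarity. The sketch below illustrates only that retrieval pattern, under stated assumptions: `toy_encode` is a hypothetical bag-of-words stand-in for the paper's Transformer/CNN encoders (which produce dense, learned, multilingual embeddings), and `retrieve` and the example sentences are invented for illustration.

```python
import math
import re
from collections import Counter


def toy_encode(text: str) -> Counter:
    """Hypothetical stand-in encoder: lowercase bag-of-words counts.
    A real dual encoder would return a dense, learned embedding."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, candidates: list[str], k: int = 1) -> list[tuple[float, str]]:
    """Encode the query and each candidate independently (the dual-encoder
    property that allows candidate vectors to be precomputed), then return
    the top-k candidates ranked by cosine similarity."""
    index = [(toy_encode(c), c) for c in candidates]  # could be built offline
    q = toy_encode(query)
    scored = [(cosine(q, vec), text) for vec, text in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


candidates = [
    "The weather is sunny today.",
    "To reset your password, open Settings.",
    "Bananas are rich in potassium.",
]
print(retrieve("How do I reset my password?", candidates)[0][1])
# → To reset your password, open Settings.
```

With learned embeddings, the same ranking step would also match paraphrases and translations that share no words with the query, which the word-overlap stand-in cannot do.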

What to prioritize? Natural Language Processing for the Development of a Modern Bug Tracking Solution in Hardware Development

Authors: Do Thi Thu Hang, Dobler Markus, Kühl Niklas
Publication venue: HICSS Conference Office
Publication date: 28/09/2021

Managing large numbers of incoming bug reports and finding the most critical issues in hardware development is time-consuming but crucial for reducing development costs. In this paper, we present an approach to predict the time to fix, the risk, and the complexity of debugging and resolving a bug report using several supervised machine learning algorithms, namely Random Forest, Naive Bayes, SVM, MLP, and XGBoost. Further, we investigate the effect of applying active learning, and we evaluate the impact of different text representation techniques, namely TF-IDF, Word2Vec, Universal Sentence Encoder, and XLNet, on the models' performance. The evaluation shows that combining text embeddings generated by the Universal Sentence Encoder with an MLP classifier outperforms all other methods and is well suited to predicting the risk and complexity of bug tickets.

arXiv.org e-Print Archive

ScholarSpace at University of Hawai'i at Manoa

AIS Electronic Library (AISeL)
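The abstract above frames risk prediction as supervised text classification of bug reports. As a minimal sketch of that setup, and not the authors' pipeline, the block below trains a multinomial Naive Bayes classifier (one of the five algorithms the paper compares) on plain word counts rather than the Universal Sentence Encoder embeddings the paper found best. The example tickets and the "high"/"low" risk labels are invented for illustration, and the `MultinomialNB` class here is a hand-written hypothetical, not a library API.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


class MultinomialNB:
    """Hypothetical minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)         # documents per label
        self.word_counts = defaultdict(Counter)     # label -> token counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_posterior(self, text: str, label: str) -> float:
        # log P(label) + sum of log P(token | label), up to a shared constant
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for token in tokenize(text):
            if token in self.vocab:  # skip tokens never seen in training
                score += math.log((counts[token] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.class_counts, key=lambda label: self.log_posterior(text, label))


# Invented example tickets with hypothetical risk labels.
tickets = [
    "chip fails timing after reset, data corruption on bus",
    "memory corruption causes system crash under load",
    "typo in register description of the datasheet",
    "update comment wording in test bench",
]
labels = ["high", "high", "low", "low"]

clf = MultinomialNB().fit(tickets, labels)
print(clf.predict("bus corruption causes crash"))    # → high
print(clf.predict("typo in the datasheet comment"))  # → low
```

Swapping the word-count features for dense sentence embeddings would require a classifier that accepts real-valued vectors (such as the MLP the paper reports as best), since multinomial Naive Bayes assumes count-like features.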

Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions

Authors: Fang Qixiang, Nguyen Dong, Oberski Daniel L
Publication venue: Springer Science and Business Media LLC
Publication date: 18/02/2022

Text embedding models from Natural Language Processing can map text data (e.g. words, sentences, documents) to supposedly meaningful numerical representations (a.k.a. text embeddings). While such models are increasingly applied in social science research, one important issue is often not addressed: the extent to which these embeddings are valid representations of constructs relevant to social science research. We therefore propose using the classic construct validity framework to evaluate the validity of text embeddings. We show how this framework can be adapted to the opaque and high-dimensional nature of text embeddings, with application to survey questions. We include several popular text embedding methods (e.g. fastText, GloVe, BERT, Sentence-BERT, Universal Sentence Encoder) in our construct validity analyses. We find evidence of convergent and discriminant validity in some cases. We also show that embeddings can be used to predict respondents' answers to completely new survey questions. Furthermore, BERT-based embedding techniques and the Universal Sentence Encoder provide more valid representations of survey questions than the other methods do. Our results thus highlight the necessity of examining the construct validity of text embeddings before deploying them in social science research. Comment: Under review

arXiv.org e-Print Archive

Utrecht University Repository
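The abstract above mentions convergent and discriminant validity without detailing the paper's procedure. As a rough, hypothetical illustration of the intuition only (not the authors' method), one can check whether embeddings of survey items meant to measure the same construct are more similar to each other than to items measuring a different construct. The 3-dimensional vectors and the construct labels "trust" and "health" below are made up; real text embeddings have hundreds of dimensions, and `within_between` is an invented helper name.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def within_between(embeddings: list[list[float]], constructs: list[str]) -> tuple[float, float]:
    """Hypothetical helper: mean cosine similarity over item pairs sharing a
    construct (within) versus pairs from different constructs (between).
    Higher within-similarity loosely echoes convergent validity; lower
    between-similarity loosely echoes discriminant validity."""
    within, between = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine(embeddings[i], embeddings[j])
        (within if constructs[i] == constructs[j] else between).append(sim)
    return sum(within) / len(within), sum(between) / len(between)


# Made-up toy "embeddings" for four survey items.
vectors = [
    [0.9, 0.1, 0.0],  # trust item 1
    [0.8, 0.2, 0.1],  # trust item 2
    [0.1, 0.9, 0.2],  # health item 1
    [0.0, 0.8, 0.3],  # health item 2
]
constructs = ["trust", "trust", "health", "health"]

within, between = within_between(vectors, constructs)
print(f"within={within:.3f} between={between:.3f}")
```

A formal construct-validity analysis involves more than this single comparison; the sketch only shows the kind of similarity contrast such an analysis can build on.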