A new benchmark dataset with production methodology for short text semantic similarity algorithms
Authors
James O'Shea
Zuhair Bandar
Keeley Crockett
Publication date
1 December 2013
Publisher
Association for Computing Machinery (ACM)
DOI
10.1145/2537046
Abstract
This research presents a new benchmark dataset for evaluating Short Text Semantic Similarity (STSS) measurement algorithms and the methodology used for its creation. The power of the dataset is evaluated by using it to compare two established algorithms, STASIS and Latent Semantic Analysis. This dataset focuses on measures for use in Conversational Agents; other potential applications include email processing and data mining of social networks. Such applications involve integrating the STSS algorithm in a complex system, but STSS algorithms must be evaluated in their own right and compared with others for their effectiveness before systems integration. Semantic similarity is an artifact of human perception; therefore its evaluation is inherently empirical and requires benchmark datasets derived from human similarity ratings. The new dataset of 64 sentence pairs, STSS-131, has been designed to meet these requirements drawing on a range of resources from traditional grammar to cognitive neuroscience. The human ratings are obtained from a set of trials using new and improved experimental methods, with validated measures and statistics. The results illustrate the increased challenge and the potential longevity of the STSS-131 dataset as the Gold Standard for future STSS algorithm evaluation. © 2013 ACM 1550-4875/2013/12-ART17 $15.00
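The evaluation approach the abstract describes — scoring an STSS algorithm by how closely its outputs track mean human similarity ratings over a benchmark of sentence pairs — is typically quantified with a correlation coefficient. The sketch below is an illustrative outline only, not the authors' code: the sentence pairs, ratings, and algorithm scores are invented placeholders, and Pearson correlation is computed by hand for self-containment.

```python
import math

# Hypothetical benchmark rows: (pair id, mean human rating, algorithm score).
# Human ratings here use an invented 0.0-4.0 scale; all values are placeholders,
# not data from STSS-131.
benchmark = [
    ("pair-01", 3.8, 0.95),
    ("pair-02", 1.2, 0.30),
    ("pair-03", 2.5, 0.60),
    ("pair-04", 0.4, 0.10),
    ("pair-05", 3.1, 0.80),
]

def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Correlate the algorithm's scores with the human gold standard:
# a higher r means the measure better reproduces human judgments.
human = [h for _, h, _ in benchmark]
algo = [a for _, _, a in benchmark]
r = pearson(human, algo)
print(f"Pearson r = {r:.3f}")
```

In this framing, competing measures (e.g. STASIS vs. Latent Semantic Analysis) would each be run over the same sentence pairs and compared by their resulting correlation with the human ratings.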
Available versions
Crossref
info:doi/10.1145%2F2537046
Last updated on 17/09/2020
E-space: Manchester Metropolitan University's Research Repository (supporting member)
oai:e-space.mmu.ac.uk:615505
Last updated on 02/01/2019