42,184 research outputs found
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR
Special Libraries, September 1976
Volume 67, Issue 9https://scholarworks.sjsu.edu/sla_sl_1976/1007/thumbnail.jp
Evolving text classification rules with genetic programming
We describe a novel method for using genetic programming to create compact classification rules using combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that the rules may have a number of other uses beyond classification and provide a basis for text mining applications
A retrieval-based dialogue system utilizing utterance and context embeddings
Finding semantically rich and computer-understandable representations for
textual dialogues, utterances and words is crucial for dialogue systems (or
conversational agents), as their performance mostly depends on understanding
the context of conversations. Recent research aims at finding distributed
vector representations (embeddings) for words, such that semantically similar
words are relatively close within the vector-space. Encoding the "meaning" of
text into vectors is a current trend, and text can range from words, phrases
and documents to actual human-to-human conversations. In recent research
approaches, responses have been generated utilizing a decoder architecture,
given the vector representation of the current conversation. In this paper, the
utilization of embeddings for answer retrieval is explored by using
Locality-Sensitive Hashing Forest (LSH Forest), an Approximate Nearest Neighbor
(ANN) model, to find similar conversations in a corpus and rank possible
candidates. Experimental results on the well-known Ubuntu Corpus (in English)
and a customer service chat dataset (in Dutch) show that, in combination with a
candidate selection method, retrieval-based approaches outperform generative
ones and reveal promising future research directions towards the usability of
such a system.Comment: A shorter version is accepted at ICMLA2017 conference;
acknowledgement added; typos correcte
Recommended from our members
Local search: A guide for the information retrieval practitioner
There are a number of combinatorial optimisation problems in information retrieval in which the use of local search methods are worthwhile. The purpose of this paper is to show how local search can be used to solve some well known tasks in information retrieval (IR), how previous research in the field is piecemeal, bereft of a structure and methodologically flawed, and to suggest more rigorous ways of applying local search methods to solve IR problems. We provide a query based taxonomy for analysing the use of local search in IR tasks and an overview of issues such as fitness functions, statistical significance and test collections when conducting experiments on combinatorial optimisation problems. The paper gives a guide on the pitfalls and problems for IR practitioners who wish to use local search to solve their research issues, and gives practical advice on the use of such methods. The query based taxonomy is a novel structure which can be used by the IR practitioner in order to examine the use of local search in IR
- …