A spoken document retrieval application in the oral history domain
The application of automatic speech recognition in the broadcast news domain is well studied. Recognition performance is generally high and accordingly, spoken document retrieval can successfully be applied in this domain, as demonstrated by a number of commercial systems. In other domains, a similar recognition performance is hard to obtain, or even far out of reach, for example due to lack of suitable training material. This is a serious impediment to the successful application of spoken document retrieval techniques to data other than news. This paper outlines our first steps towards a retrieval system that can automatically be adapted to new domains. We discuss our experience with a recently implemented spoken document retrieval application attached to a web-portal that aims at the disclosure of a multimedia data collection in the oral history domain. The paper illustrates that simply deploying an off-the-shelf broadcast news system in this task domain will produce error rates that are too high to be useful for retrieval tasks. By applying adaptation techniques on the acoustic level and language model level, system performance can be improved considerably, but additional research on unsupervised adaptation and search interfaces is required to create an adequate search environment based on speech transcripts.
Detecting missing content queries in an SMS-Based HIV/AIDS FAQ retrieval system
Automated Frequently Asked Question (FAQ) answering systems use pre-stored sets of question-answer pairs as an information source to answer natural language questions posed by the users. The main problem with this kind of information source is that there is no guarantee that there will be a relevant question-answer pair for all user queries. In this paper, we propose to deploy a binary classifier in an existing SMS-Based HIV/AIDS FAQ retrieval system to detect user queries that do not have a relevant question-answer pair in the FAQ document collection. Before deploying such a classifier, we first evaluate different feature sets for training in order to determine the sets of features that can build a model that yields the best classification accuracy. We carry out our evaluation using seven different feature sets generated from a query log before and after retrieval by the FAQ retrieval system. Our results suggest that combining different feature sets markedly improves the classification accuracy.
Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network
Capturing the compositional process which maps the meaning of words to that
of documents is a central challenge for researchers in Natural Language
Processing and Information Retrieval. We introduce a model that is able to
represent the meaning of documents by embedding them in a low dimensional
vector space, while preserving distinctions of word and sentence order crucial
for capturing nuanced semantics. Our model is based on an extended Dynamic
Convolution Neural Network, which learns convolution filters at both the
sentence and document level, hierarchically learning to capture and compose low
level lexical features into high level semantic concepts. We demonstrate the
effectiveness of this model on a range of document modelling tasks, achieving
strong results with no feature engineering and with a more compact model.
Inspired by recent advances in visualising deep convolution networks for
computer vision, we present a novel visualisation technique for our document
networks which not only provides insight into their learning process, but also
can be interpreted to produce a compelling automatic summarisation system for
texts.
Content Based Document Recommender using Deep Learning
With recent advances in information technology, there has been a huge
surge in the amount of data available. Information retrieval technology has not
kept pace with this rate of information generation, so users spend
excessive time retrieving relevant information. Although systems exist
for assisting users in searching a database and for filtering and recommending
relevant information, recommendation systems that use the content of documents
still have a long way to go before they mature. Here we present a Deep
Learning based supervised approach to recommend similar documents based on the
similarity of content. We combine the C-DSSM model with Word2Vec distributed
representations of words to create a novel model that classifies a document pair
as relevant/irrelevant by assigning a score to it. Using our model, retrieval of
documents can be done in O(1) time and the memory complexity is O(n), where n
is the number of documents. Comment: Accepted in ICICI 2017, Coimbatore, Indi
Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning
While billions of non-English speaking users rely on search engines every
day, the problem of ad-hoc information retrieval is rarely studied for
non-English languages. This is primarily due to a lack of data sets that are
suitable to train ranking algorithms. In this paper, we tackle the lack of data
by leveraging pre-trained multilingual language models to transfer a retrieval
system trained on English collections to non-English queries and documents. Our
model is evaluated in a zero-shot setting, meaning that we use it to predict
relevance scores for query-document pairs in languages never seen during
training. Our results show that the proposed approach can significantly
outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and
Spanish. We also show that augmenting the English training collection with some
examples from the target language can sometimes improve performance. Comment: ECIR 2020 (short
Photograph indexing and retrieval using star-graphs
We present in this paper a relational approach for indexing and retrieving photographs from a collection. Instead of using simple keywords as an indexing language, we propose to use star-graphs as document descriptors. A star-graph is a conceptual graph that contains a single relation, with some concepts linked to it. Star-graphs are elementary pieces of information describing combinations of concepts. We use star-graphs as descriptors - or index terms - for image content representation. This allows for relational indexing and expression of complex user needs, in comparison to classical text retrieval, where simple keywords are generally used as document descriptors. We present a document representation model and a weighting scheme for star-graphs inspired by the tf.idf used in text retrieval. We have applied our model to image retrieval, and show the system evaluation results.
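To make the idea of weighting star-graph descriptors concrete, the following is a minimal sketch that applies the classical tf.idf formula (term frequency times the log of inverse document frequency) to star-graphs treated as index terms. It is not the weighting scheme of the paper, which is only described as "inspired by" tf.idf. The `StarGraph` tuple representation, the function name `tf_idf`, and the sample photographs are all hypothetical and chosen for illustration only.

```python
import math
from collections import Counter

# Hypothetical representation (an assumption, not the paper's):
# a star-graph is one relation plus the concepts linked to it.
StarGraph = tuple[str, tuple[str, ...]]


def tf_idf(docs: dict[str, list[StarGraph]]) -> dict[str, dict[StarGraph, float]]:
    """Weight each star-graph descriptor with classical tf * log(N / df)."""
    n_docs = len(docs)

    # Document frequency: number of photographs containing each star-graph.
    df: Counter = Counter()
    for descriptors in docs.values():
        df.update(set(descriptors))

    weights: dict[str, dict[StarGraph, float]] = {}
    for doc_id, descriptors in docs.items():
        tf = Counter(descriptors)  # raw count of each star-graph in this photo
        weights[doc_id] = {
            sg: count * math.log(n_docs / df[sg]) for sg, count in tf.items()
        }
    return weights


# Illustrative (made-up) index of three photographs.
photos = {
    "p1": [("holds", ("man", "camera")), ("next_to", ("man", "tree"))],
    "p2": [("holds", ("man", "camera"))],
    "p3": [("rides", ("woman", "bicycle")), ("holds", ("man", "camera"))],
}
weights = tf_idf(photos)
```

In this sketch a star-graph that occurs in every photograph receives weight zero, while one that occurs in a single photograph receives the highest weight, which mirrors how tf.idf treats keywords in text retrieval.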