DWT-CompCNN: Deep Image Classification Network for High Throughput JPEG 2000 Compressed Documents
For any digital application involving document images, such as retrieval, the
classification of document images is an essential stage. Conventionally, the
input dataset consists of the full versions of the documents, that is, the
uncompressed document images, which is problematic because of the large storage
volume those full versions require. It would therefore be novel if the same
classification task could be accomplished directly on the compressed
representation of the documents (with only partial decompression), making the
whole process computationally more efficient.
In this research work, a novel deep learning model, DWT-CompCNN, is proposed for
the classification of documents compressed with the High Throughput JPEG 2000
(HTJ2K) algorithm. The proposed DWT-CompCNN comprises five convolutional
layers with 16, 32, 64, 128, and 256 filters, respectively, increasing layer by
layer to improve learning from the wavelet coefficients extracted from the
compressed images. Experiments performed on two benchmark datasets, Tobacco-3482
and RVL-CDIP, demonstrate that the proposed model is time- and space-efficient
and also achieves better classification accuracy in the compressed domain.
Comment: In Springer Journal - Pattern Analysis and Applications under Minor
Revision
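The wavelet coefficients the network consumes are what a JPEG 2000 decoder recovers before the inverse transform. HTJ2K replaces only the block coder of JPEG 2000, keeping its wavelet transforms, and the lossless path uses the reversible LeGall 5/3 lifting wavelet. The abstract does not say whether the paper's images use the 5/3 or the irreversible 9/7 filter, so the sketch below assumes the 5/3 case. It shows one 1-D lifting level (predict, then update) with symmetric boundary extension and its exact integer inverse; the 2-D transform applies the same step along rows and then columns to produce the LL, HL, LH, and HH subbands. The function names and the sample row are hypothetical.

```python
from typing import List, Tuple


def fwd_53(x: List[int]) -> Tuple[List[int], List[int]]:
    """One level of the reversible LeGall 5/3 lifting DWT on an even-length 1-D signal.

    Returns (s, d): low-pass (approximation) and high-pass (detail) coefficients.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("this sketch assumes an even-length signal of length >= 2")
    half = n // 2

    def xe(i: int) -> int:
        # Whole-sample symmetric extension on the right: x[n] maps to x[n - 2].
        return x[2 * (n - 1) - i] if i >= n else x[i]

    # Predict step: odd sample minus the floor-average of its even neighbours.
    # (Python's >> floors for negative ints too, matching the standard's floor.)
    d = [x[2 * k + 1] - ((x[2 * k] + xe(2 * k + 2)) >> 1) for k in range(half)]
    # Update step: even sample plus a rounded quarter of the adjacent details;
    # under symmetric extension d[-1] mirrors to d[0].
    s = [x[2 * k] + ((d[max(k - 1, 0)] + d[k] + 2) >> 2) for k in range(half)]
    return s, d


def inv_53(s: List[int], d: List[int]) -> List[int]:
    """Exact integer inverse of fwd_53 (lossless reconstruction)."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    # Undo the update step to recover the even samples.
    for k in range(half):
        x[2 * k] = s[k] - ((d[max(k - 1, 0)] + d[k] + 2) >> 2)
    # Undo the predict step to recover the odd samples (same right-edge mirror).
    for k in range(half):
        right = x[2 * k + 2] if 2 * k + 2 < n else x[n - 2]
        x[2 * k + 1] = d[k] + ((x[2 * k] + right) >> 1)
    return x


if __name__ == "__main__":
    row = [10, 12, 14, 16, 20, 18, 15, 11]  # hypothetical pixel row
    s, d = fwd_53(row)
    print("low-pass:", s, "high-pass:", d)
    assert inv_53(s, d) == row  # integer lifting is perfectly invertible
```

On smooth content the detail coefficients are near zero (a linear ramp yields all-zero details except at the boundary), which is one reason subband coefficients are a compact input for a CNN.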
Efficient Learning for Undirected Topic Models
The Replicated Softmax model, a well-known undirected topic model, is powerful in
extracting semantic representations of documents. Traditional learning
strategies such as Contrastive Divergence are very inefficient. This paper
provides a novel estimator, based on Noise Contrastive Estimation, to speed up
learning; it is extended for documents of variable lengths and weighted inputs.
Experiments on two benchmarks show that the new estimator achieves high
learning efficiency and high accuracy on document retrieval and classification.
Comment: Accepted by ACL-IJCNLP 2015 short paper. 6 pages
Improving average ranking precision in user searches for biomedical research datasets
Availability of research datasets is a keystone of health and life science
study reproducibility and scientific progress. Due to the heterogeneity and
complexity of these data, a main challenge to be overcome by research data
management systems is to provide users with the best answers for their search
queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we
investigate a novel ranking pipeline to improve the search of datasets used in
biomedical experiments. Our system comprises a query expansion model based on
word embeddings, a similarity measure algorithm that takes into consideration
the relevance of the query terms, and a dataset categorisation method that
boosts the rank of datasets matching query constraints. The system was
evaluated using a corpus with 800k datasets and 21 annotated user queries. Our
system provides competitive results when compared to the other challenge
participants. In the official run, it achieved the highest infAP among the
participants, 22.3% higher than the median infAP of the participants'
best submissions. Overall, it ranks in the top 2 if an aggregated metric using
the best official measures per participant is considered. The query expansion
method had a positive impact on the system's performance, improving our
baseline by up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively.
Our similarity measure algorithm appears robust, in particular compared to the
Divergence From Randomness framework, showing smaller performance variations
under different training conditions. Finally, the result categorisation did not
have a significant impact on the system's performance. We believe that our
solution could be used to enhance biomedical dataset management systems. In
particular, data-driven query expansion methods could be an alternative to
the complexity of biomedical terminologies.
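The abstract names a query expansion model "based on word embeddings" without giving its details. A common form of that idea, sketched below under that assumption, appends each query term's nearest neighbours in embedding space when their cosine similarity clears a threshold. The parameters `k` and `min_sim`, the helper names, and the toy 3-d vectors are all hypothetical; a real system would load embeddings trained on biomedical text.

```python
import math
from typing import Dict, List, Sequence

Embeddings = Dict[str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def expand_query(terms: List[str], emb: Embeddings,
                 k: int = 2, min_sim: float = 0.8) -> List[str]:
    """Append up to k nearest embedding neighbours per term (cosine >= min_sim)."""
    expanded = list(terms)
    for term in terms:
        if term not in emb:
            continue  # out-of-vocabulary terms are kept but not expanded
        # Rank candidates by descending similarity, breaking ties alphabetically.
        candidates = sorted(
            ((cosine(emb[term], vec), word)
             for word, vec in emb.items() if word not in terms),
            key=lambda pair: (-pair[0], pair[1]),
        )
        for sim, word in candidates[:k]:
            if sim >= min_sim and word not in expanded:
                expanded.append(word)
    return expanded


toy_emb = {  # hypothetical vectors, for illustration only
    "cancer": [1.0, 0.1, 0.0],
    "tumor": [0.9, 0.2, 0.0],
    "neoplasm": [0.8, 0.3, 0.1],
    "mouse": [0.0, 1.0, 0.2],
    "rna": [0.1, 0.0, 1.0],
}
print(expand_query(["cancer", "mouse"], toy_emb))
# ['cancer', 'mouse', 'tumor', 'neoplasm']
```

The similarity threshold is the main design lever here: too low and loosely related words dilute the query, which is one way expansion can hurt retrieval rather than help it.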
Queensland University of Technology at TREC 2005
The Information Retrieval and Web Intelligence (IR-WI) research group is a research team at the Faculty of Information Technology, QUT, Brisbane, Australia. The IR-WI group participated in the Terabyte and Robust tracks at TREC 2005, both for the first time. For the Robust track, we applied our existing information retrieval system, originally designed for structured (XML) retrieval, to the domain of document retrieval. For the Terabyte track, we experimented with the open-source IR system Zettair and performed two types of experiments. First, we compared Zettair's performance on a high-powered supercomputer and on a distributed system across seven midrange personal computers. Second, we compared Zettair's performance when using a standard TREC title query, a natural language query, and a query expanded with synonyms. We compared the systems in terms of both efficiency and retrieval performance. Our results indicate that the distributed system is faster than the supercomputer while slightly decreasing retrieval performance, that natural language queries also slightly decrease retrieval performance, and that our query expansion technique significantly decreased performance.
Combining relevance information in a synchronous collaborative information retrieval environment
Traditionally, information retrieval (IR) research has focussed on a single-user interaction modality, in which a user searches to satisfy an information need. Recent advances in both web technologies, such as the sociable web of Web 2.0, and computer hardware, such as tabletop interface devices, have enabled multiple users to collaborate on many computer-related tasks. Due to these advances, there is an increasing need to support two or more users searching together at the same time to satisfy a shared information need, which we refer to as Synchronous Collaborative Information Retrieval.
Synchronous Collaborative Information Retrieval (SCIR) represents a significant paradigm shift from traditional IR systems. To support an effective SCIR search, new techniques are required to coordinate users' activities. In this chapter we explore the effectiveness of a sharing-of-knowledge policy on a collaborating group. Sharing of knowledge refers to the process of passing relevance information across users: if one user finds items relevant to the search task, the whole group should benefit in the form of improved ranked lists returned to each searcher.
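The abstract does not specify how shared relevance information is folded into each searcher's ranking. As one illustrative stand-in, the sketch below uses positive-only Rocchio feedback: every user's relevant documents are pooled, and each user's query vector is moved toward the centroid of that pool, so one searcher's judgments reshape the other's ranked list. Rocchio, the weights `alpha` and `beta`, the helper names, and the toy documents are assumptions, not the chapter's method.

```python
from collections import Counter
from typing import Dict, List

Vec = Dict[str, float]  # sparse term -> weight vector


def centroid(docs: List[Vec]) -> Vec:
    total: Counter = Counter()
    for doc in docs:
        total.update(doc)  # Counter.update adds the term weights
    return {t: w / len(docs) for t, w in total.items()} if docs else {}


def rocchio(query: Vec, relevant: List[Vec],
            alpha: float = 1.0, beta: float = 0.75) -> Vec:
    """Positive-only Rocchio update: alpha * q + beta * centroid(relevant)."""
    new_q = {t: alpha * w for t, w in query.items()}
    for t, w in centroid(relevant).items():
        new_q[t] = new_q.get(t, 0.0) + beta * w
    return new_q


def shared_feedback(queries: Dict[str, Vec],
                    judged: Dict[str, List[Vec]]) -> Dict[str, Vec]:
    """Pool every user's relevant documents and apply them to each user's query."""
    pooled = [doc for docs in judged.values() for doc in docs]
    return {user: rocchio(q, pooled) for user, q in queries.items()}


def rank(query: Vec, docs: Dict[str, Vec]) -> List[str]:
    def score(doc_id: str) -> float:
        # Dot-product similarity between the query and a document.
        return sum(query.get(t, 0.0) * w for t, w in docs[doc_id].items())
    # Highest score first; ties broken by document id for determinism.
    return sorted(docs, key=lambda d: (-score(d), d))


docs = {"d1": {"jaguar": 1.0, "cat": 1.0}, "d2": {"jaguar": 1.0, "car": 1.0}}
initial = {"jaguar": 1.0}
# User A judges d2 relevant; user B has judged nothing yet.
updated = shared_feedback({"A": initial, "B": initial}, {"A": [docs["d2"]], "B": []})
print(rank(initial, docs), rank(updated["B"], docs))
```

Before sharing, user B's query ties on both documents; after A's judgment is shared, d2 moves to the top of B's list even though B never judged it.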
To evaluate the proposed techniques, we simulate two users searching together through an incremental feedback system. The simulation assumes that users decide on an initial query with which to begin the collaborative search, and then proceed through the search by providing relevance judgments to the system and receiving a new ranked list. To populate these simulations, we extract data from the interaction logs of various experimental IR systems from previous Text REtrieval Conference (TREC) workshops …