Topical relevance model
We introduce the topical relevance model (TRLM) as a generalisation of the standard relevance model (RLM). The TRLM alleviates the limitations of the RLM by exploiting the multi-topical structure of pseudo-relevant documents: intra-topical document and query term co-occurrences are favoured, whereas inter-topical ones are down-weighted. The multi-topical nature of pseudo-relevant documents results from the multi-faceted nature of the information need typically expressed in a query. The TRLM provides a framework to estimate a set of underlying hypothetical relevance models, one for each such aspect of the information need. Experimental results show that the TRLM significantly outperforms the RLM for ad-hoc and patent prior art search, and that it additionally outperforms recent extensions of the RLM.
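As a rough illustration of the intra- versus inter-topical weighting described above, the sketch below estimates a relevance model over pseudo-relevant documents and down-weights terms whose topic assignment (e.g. from LDA) differs from the query's. The weights and the `topic_of` mapping are illustrative assumptions, not the paper's actual estimator.

```python
from collections import Counter

def relevance_model(docs, query, topic_of, intra_weight=1.0, inter_weight=0.2):
    """Toy topical relevance model sketch (weights are invented).

    docs     : list of token lists (pseudo-relevant documents)
    query    : list of query terms
    topic_of : dict mapping a term to a topic id (e.g. an LDA assignment)

    Term/query co-occurrences within the query's topics are favoured;
    cross-topic ones are down-weighted by `inter_weight`.
    """
    scores = Counter()
    query_topics = {topic_of.get(q) for q in query}
    for doc in docs:
        tf = Counter(doc)
        # P(Q|D) under a simple unigram model with add-one smoothing
        p_q_d = 1.0
        for q in query:
            p_q_d *= (tf[q] + 1) / (len(doc) + len(tf))
        for w, n in tf.items():
            weight = intra_weight if topic_of.get(w) in query_topics else inter_weight
            scores[w] += weight * (n / len(doc)) * p_q_d
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}
```

Terms sharing a topic with the query end up with higher relevance-model probabilities than equally frequent terms from other topics, which is the effect the TRLM is after.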
Reply With: Proactive Recommendation of Email Attachments
Email responses often contain items, such as a file or a hyperlink to an external document, that are attached to or included inline in the body of the message. Analysis of an enterprise email corpus reveals that 35% of the time when users include these items as part of their response, the attachable item is already present in their inbox or sent folder. A modern email client can proactively retrieve relevant attachable items from the user's past emails based on the context of the current conversation and recommend them for inclusion, reducing the time and effort involved in composing the response. In this paper, we propose a weakly supervised learning framework for recommending attachable items to the user. As email search systems are commonly available, we constrain the recommendation task to formulating effective search queries from the context of the conversation. The query is submitted to an existing IR system to retrieve relevant items for attachment. We also present a novel strategy for generating labels from an email corpus, without the need for manual annotations, that can be used to train and evaluate the query formulation model. In addition, we describe a deep convolutional neural network that demonstrates satisfactory performance on this query formulation task when evaluated on the publicly available Avocado dataset and a proprietary dataset of internal emails obtained through an employee participation program.
Comment: CIKM 2017. Proceedings of the 26th ACM International Conference on Information and Knowledge Management, 2017.
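A drastically simplified stand-in for the query-formulation step can be sketched by scoring conversation terms with TF-IDF against the user's mailbox and issuing the top terms as the search query. The paper trains a CNN for this; the function below is only a hand-rolled, baseline-style illustration.

```python
import math
from collections import Counter

def formulate_query(context, mailbox, k=3):
    """Score the terms of the current conversation by TF-IDF against the
    user's past emails and return the top-k terms as a search query.
    A toy baseline, not the paper's learned formulation model."""
    df = Counter()
    for email in mailbox:
        df.update(set(email))
    tf = Counter(context)
    n = len(mailbox)
    # Only consider terms that actually occur in the mailbox,
    # since the query will be run against it.
    scores = {t: tf[t] * math.log((n + 1) / (df[t] + 1)) for t in tf if df[t]}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The returned terms would then be handed to the existing email search system to retrieve candidate attachments.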
Bibliometric cartography of information retrieval research by using co-word analysis
The aim of this study is to map the intellectual structure of the field of Information Retrieval (IR) during the period 1987-1997. Co-word analysis was employed to reveal patterns and trends in the IR field by measuring the association strengths of terms representative of relevant publications or other texts produced in the field. Data were collected from the Science Citation Index (SCI) and Social Science Citation Index (SSCI) for the period 1987-1997. In addition to the keywords assigned by the SCI and SSCI databases, other important keywords were extracted manually from titles and abstracts. These keywords were further standardized using vocabulary control tools. In order to trace the dynamic changes of the IR field, the 11-year span was divided into two consecutive periods: 1987-1991 and 1992-1997. The results show that the IR field has some established research themes and that it also changes rapidly to embrace new themes.
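The association strength at the heart of co-word analysis is commonly measured with the equivalence index; a minimal sketch (the keyword counts in the usage below are invented):

```python
def equivalence_index(cooc, freq, a, b):
    """Equivalence index E(a, b) = c_ab**2 / (c_a * c_b), a standard
    association-strength measure in co-word analysis, where c_ab is the
    number of publications indexed by both keywords and c_a, c_b are the
    keywords' individual occurrence frequencies."""
    c_ab = cooc.get((a, b), cooc.get((b, a), 0))
    return c_ab ** 2 / (freq[a] * freq[b])
```

Keyword pairs with high equivalence values cluster into the research themes that co-word maps visualize.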
System and method for searching a data base using a content-searchable memory
A dynamic storage device requires periodic refresh and includes logical operation circuitry within the refresh circuitry. The individual storage positions of the storage device are periodically read by a refresh amplifier, and a logical operation is then performed on the refresh data before the data are applied to the write amplifier. That operation allows implementation of associative database searching by cyclically executing "data compare" and other logical operations within the refresh circuitry. A system of content searching may be implemented in any storage device, dynamic or not, in which a comparand may be matched with any of a plurality of subunits of a word, and a storage bit is used to identify any words in which a mismatch occurs. Upon recognizing a match, the device can be commanded (a) to output the word or a selected portion (which may be different from the matched portion), (b) to move a selected portion of the word to a different location in the word, or (c) to alter the bits of the word or a selected portion. Arithmetical operations may be implemented through such alterations after matching. Off-chip storage systems for use with such devices are also disclosed.
Board of Regents, University of Texas System
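The comparand-against-subunit matching described above can be modelled in software as follows. In the hardware, every stored word is compared in parallel during the refresh cycle, whereas this sketch simply iterates; the field boundaries and stored data are invented for illustration.

```python
def associative_search(memory, comparand, field):
    """Content search sketch: compare `comparand` with one subunit of
    every stored word (the subunit given as a (start, end) slice) and
    return the indices of words whose match bit would be set."""
    start, end = field
    return [i for i, word in enumerate(memory) if word[start:end] == comparand]
```

The matched words could then be output, moved, or altered, corresponding to the patent's commands (a)-(c).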
Cross-language Text Classification with Convolutional Neural Networks From Scratch
Cross-language classification is an important task in multilingual learning, where documents in different languages often share the same set of categories. The main goal is to reduce the cost of labelling training data for each individual language. This article proposes a novel approach to multilingual text classification using convolutional neural networks, which learn representations that transfer across languages. Moreover, the method works for a new language that was not seen during training. The results of an empirical study on a large dataset covering 21 languages demonstrate the robustness and competitiveness of the presented approach.
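The core building block of such a from-scratch text CNN, a character-level 1-D convolution with ReLU and max-over-time pooling, can be sketched in plain Python; the tiny hand-written filters below are untrained placeholders, not a real model.

```python
def char_conv_features(text, kernels):
    """One character-level 1-D convolution with ReLU and max-over-time
    pooling, the basic block of a from-scratch text CNN. `kernels` is a
    list of filters; each filter is a list of per-position weight dicts
    mapping a character to a weight (toy filters, purely illustrative)."""
    text = text.lower()
    features = []
    for kernel in kernels:
        width = len(kernel)
        best = 0.0  # ReLU floor; take the max over all window positions
        for i in range(len(text) - width + 1):
            activation = sum(kernel[j].get(text[i + j], 0.0) for j in range(width))
            best = max(best, activation)
        features.append(best)
    return features
```

Because the input is raw characters rather than language-specific tokens, the same feature extractor applies unchanged to every language, which is what makes the cross-language setting workable.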
What Causes My Test Alarm? Automatic Cause Analysis for Test Alarms in System and Integration Testing
Driven by new software development processes and testing in clouds, system and integration testing nowadays tends to produce an enormous number of alarms. Such test alarms place an almost unbearable burden on software testing engineers, who have to manually analyze their causes. The causes are critical because they determine which stakeholders are responsible for fixing the bugs detected during testing. In this paper, we present a novel approach that aims to relieve the burden by automating the procedure. Our approach, called Cause Analysis Model, exploits information retrieval techniques to efficiently infer test alarm causes from test logs. We have developed a prototype and evaluated our tool on two industrial datasets with more than 14,000 test alarms. Experiments on the two datasets show that our tool achieves accuracies of 58.3% and 65.8%, respectively, outperforming the baseline algorithms by up to 13.3%. Our algorithm is also extremely efficient, spending about 0.1 s per cause analysis. Owing to these attractive experimental results, our industrial partner, a leading information and communication technology company, has deployed the tool; it achieves an average accuracy of 72% after two months of running, nearly three times more accurate than a previous strategy based on regular expressions.
Comment: 12 pages.
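In this spirit, a minimal IR-style cause classifier can be sketched as nearest-neighbour retrieval over historical test logs; the paper's Cause Analysis Model is considerably richer, and the log texts and cause labels below are invented.

```python
import math
from collections import Counter

def classify_alarm(log, labeled_logs):
    """Nearest-neighbour cause analysis sketch: represent each test log
    as a term-frequency vector and assign the cause label of the most
    cosine-similar historical log."""
    def cosine(a, b):
        dot = sum(a[t] * b.get(t, 0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    v = Counter(log.split())
    text, cause = max(labeled_logs,
                      key=lambda item: cosine(v, Counter(item[0].split())))
    return cause
```

Each lookup is a handful of vector operations, which is consistent with the sub-second per-alarm analysis time reported above.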
An investigation to study the feasibility of on-line bibliographic information retrieval system using an APP
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. This thesis reports an investigation into the feasibility of a searching mechanism using an APP suitable for on-line bibliographic retrieval operations, especially retrospective searches.
From a study of the searching methods used in conventional systems it is seen that elaborate file and data structures are introduced to improve the response time of the system. These consequently lead to software and hardware redundancies. To mask these complexities of the system, an expensive computer with higher capabilities and a more powerful instruction set is commonly used. Thus the service of the system becomes cost-ineffective.
On the other hand, the primitive operations of a searching mechanism, such as association, domain selection, intersection and union, are intrinsic features of an associative parallel processor (APP). It is therefore important to establish the feasibility of an APP as a cost-effective searching mechanism.
In this thesis a searching mechanism using an 'ON-THE-FLY' searching technique has been proposed. The parallel search unit uses a byte-oriented VRL-APP (BO-VRL-APP) for efficient character-string processing.
At the time of undertaking this work the specifications of neither the retrieval systems nor the BO-VRL-APPs were well established; hence a two-phase investigation was originated. In Phase I of the work a bottom-up approach was adopted to derive a formal and precise specification for the BO-VRL-APP. In Phase II a top-down approach was adopted for the implementation of the searching mechanism.
An experimental research vehicle has been developed to establish the feasibility of an APP as a cost-effective searching mechanism. Although rigorous proof of the feasibility has not been obtained, the thesis establishes that the APP is well suited to on-line bibliographic information retrieval operations, where substring searches, including boolean selection and threshold weights, are efficiently supported.
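The combination of substring search, boolean selection and threshold weights that the thesis targets can be illustrated in software. The BO-VRL-APP streams records past parallel matchers "on the fly"; this sequential sketch, with invented records and weights, only mimics the selection logic.

```python
def on_the_fly_search(records, terms, threshold):
    """Weighted substring search sketch: score each record by the sum of
    the weights of the substrings it contains, and select the records
    whose total weight meets the threshold (a software stand-in for the
    APP's parallel matchers and threshold logic)."""
    hits = []
    for i, record in enumerate(records):
        score = sum(weight for substring, weight in terms if substring in record)
        if score >= threshold:
            hits.append(i)
    return hits
```

Setting the threshold to the sum of all weights recovers a boolean AND over the substrings; lower thresholds give the weighted, partial-match selection described above.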
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for optimized
solution to a specific real world problem, big data system are not an exception
to any such rule. As far as the storage aspect of any big data system is
concerned, the primary facet in this regard is a storage infrastructure and
NoSQL seems to be the right technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents
feature and use case analysis and comparison of the four main data models
namely document oriented, key value, graph and wide column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings forth second facet of big data storage, big data file
formats, into picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage and its challenges and future prospects have
also been discussed
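To make the data-model contrast concrete, here is the same toy record expressed in two of the four models compared; the entity, field names and key scheme are illustrative, not taken from any particular product.

```python
# Document-oriented model: the whole entity travels together as one
# nested document, so a single read returns everything about it.
document = {
    "_id": "sensor-17",
    "site": "plant-A",
    "readings": [{"ts": 1, "temp": 21.5}, {"ts": 2, "temp": 21.9}],
}

# Key-value model: the application encodes the structure into the key,
# trading query flexibility for very simple, fast lookups.
key_value = {
    "sensor-17:site": "plant-A",
    "sensor-17:reading:1": 21.5,
    "sensor-17:reading:2": 21.9,
}
```

Which shape "fits" depends on the access pattern: fetching the entire entity favours the document model, while independent high-volume writes of individual fields favour the key-value model.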