Content Based Document Recommender using Deep Learning
Recent advances in information technology have produced a huge surge in the
amount of data available, but information retrieval technology has not kept
pace with this rate of information generation, so users spend excessive time
retrieving relevant information. Systems exist that help users search a
database and that filter and recommend relevant information, but recommendation
systems that use the content of documents still have a long way to go before
they mature. Here we present a Deep Learning based supervised approach to
recommend similar documents based on the similarity of their content. We combine
the C-DSSM model with Word2Vec distributed representations of words to create a
novel model that classifies a document pair as relevant or irrelevant by
assigning a score to it. Using our model, retrieval of documents can be done in
O(1) time and the memory complexity is O(n), where n is the number of documents.
Comment: Accepted in ICICI 2017, Coimbatore, Indi
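The abstract does not say how constant-time retrieval is obtained. One plausible reading, sketched below purely as an assumption, is that pair scores are computed offline and the best matches for each document are stored in a table, so that a recommendation becomes a single lookup. Cosine similarity over toy vectors stands in here for the learned C-DSSM score; the function names, the vectors, and the parameter k are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; a stand-in for the learned relevance score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_recommendation_table(doc_vectors, k=1):
    """Precompute, for every document id, its k highest-scoring neighbours.

    Building the table compares all pairs (quadratic work, done offline).
    The table itself holds k ids per document, so memory is O(n) for fixed k.
    """
    table = {}
    for doc_id, vec in doc_vectors.items():
        scored = [(cosine(vec, other), other_id)
                  for other_id, other in doc_vectors.items()
                  if other_id != doc_id]
        scored.sort(reverse=True)
        table[doc_id] = [other_id for _, other_id in scored[:k]]
    return table

# Hypothetical two-dimensional document vectors.
doc_vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
}
table = build_recommendation_table(doc_vectors, k=1)

# Retrieval is one dictionary lookup, independent of the number of documents.
print(table["a"])
```

The quadratic cost is paid once at build time; whether the authors amortise it this way is not stated in the abstract.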
Cross-Document Pattern Matching
We study a new variant of the string matching problem called cross-document
string matching, which is the problem of indexing a collection of documents to
support an efficient search for a pattern in a selected document, where the
pattern itself is a substring of another document. Several variants of this
problem are considered, and efficient linear-space solutions are proposed with
query time bounds that either do not depend at all on the pattern size or
depend on it in a very limited way (doubly logarithmic). As a side result, we
propose an improved solution to the weighted level ancestor problem
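To make the query concrete, the following is a naive baseline, not the indexed solution the abstract describes: it copies the pattern out of one document and scans the selected document for it, so its running time depends on both the pattern length and the document length. The function name and the argument layout (source index, half-open substring bounds, target index) are assumptions made for illustration.

```python
def cross_document_occurrences(docs, src, start, end, target):
    """Return start positions in docs[target] of the pattern docs[src][start:end].

    Naive scan: extracts the pattern and searches the target text directly.
    """
    pattern = docs[src][start:end]
    if not pattern:
        return []
    text = docs[target]
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        hits.append(pos)
        # Advance by one so overlapping occurrences are also reported.
        pos = text.find(pattern, pos + 1)
    return hits

docs = ["abracadabra", "cadabra cabra"]
# The pattern is docs[0][7:11], i.e. "abra"; search for it in docs[1].
hits = cross_document_occurrences(docs, 0, 7, 11, 1)
print(hits)
```

Because the pattern is given by a position in an already indexed document rather than as a fresh string, an index need not read the pattern character by character, which is what makes query times nearly independent of pattern size possible.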
Dynamic Range Majority Data Structures
Given a set of coloured points on the real line, we study the problem of
answering range α-majority (or "heavy hitter") queries on this set. More
specifically, for a query range, we want to return each colour that is
assigned to more than an α-fraction of the points contained in that range. We
present a new data structure for answering range α-majority queries on a
dynamic set of points, where 0 < α < 1. Our data structure uses O(n)
space, supports queries in […] time, and updates in […] amortized time. If the
coordinates of the points are integers, then the query time can be improved
to […]. For constant values of α, this improved query time matches an existing
lower bound, for any data structure with polylogarithmic update time. We also
generalize our data structure to handle sets of points in d dimensions, for
d ≥ 2, as well as dynamic arrays, in which each entry is a colour.
Comment: 16 pages, Preliminary version appeared in ISAAC 201
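The query semantics can be illustrated with a brute-force static version, which is not the dynamic data structure of the abstract: it locates the query range by binary search and then counts colours in time linear in the number of points inside the range. The fraction threshold is called alpha here, and the function name and the list-of-pairs input format are assumptions.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

def range_alpha_majority(points, lo, hi, alpha):
    """Colours assigned to more than an alpha-fraction of points in [lo, hi].

    points: list of (coordinate, colour) pairs sorted by coordinate.
    """
    coords = [x for x, _ in points]
    i = bisect_left(coords, lo)    # first point with coordinate >= lo
    j = bisect_right(coords, hi)   # one past the last point with coordinate <= hi
    counts = Counter(colour for _, colour in points[i:j])
    total = j - i
    return sorted(c for c, cnt in counts.items() if cnt > alpha * total)

points = [(1, "red"), (2, "red"), (3, "blue"), (5, "red"), (8, "green")]
majority = range_alpha_majority(points, 1, 5, 0.5)
print(majority)
```

At most 1/alpha colours can exceed the threshold, so the output size is bounded by 1/alpha however many points fall in the range.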