Half-quadratic transportation problems
We present a memory-efficient primal-dual algorithm for solving a relaxed
version of the general transportation problem. Our approach approximates the
original cost function with a differentiable one and solves the resulting
problem as a sequence of weighted quadratic transportation problems. The new
formulation allows us to solve differentiable, non-convex transportation problems.
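For orientation, the classical linear transportation problem can be written as below. This is the standard textbook form, not the relaxed formulation from the paper, and the symbols (costs c_ij, supplies a_i, demands b_j, flows x_ij) are assumed notation rather than the authors' own.

```latex
\begin{aligned}
\min_{x \ge 0} \quad & \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{j=1}^{n} x_{ij} = a_i, \quad i = 1, \dots, m, \\
& \sum_{i=1}^{m} x_{ij} = b_j, \quad j = 1, \dots, n.
\end{aligned}
```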
Content Based Document Recommender using Deep Learning
With recent advances in information technology, the amount of available data
has surged. Information retrieval technology has not kept pace with this rate
of information generation, so users spend excessive time retrieving relevant
information. Although systems exist that help users search a database and that
filter and recommend relevant information, recommendation systems that use the
content of documents still have a long way to go before they mature. Here we
present a deep learning based supervised approach to recommending similar
documents based on the similarity of their content. We combine the C-DSSM model
with Word2Vec distributed representations of words to create a novel model that
classifies a document pair as relevant or irrelevant by assigning it a score.
Using our model, documents can be retrieved in O(1) time with O(n) memory,
where n is the number of documents.
Comment: Accepted in ICICI 2017, Coimbatore, India
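The idea of assigning a relevance score to a document pair can be illustrated with a much simpler stand-in. The sketch below is not the C-DSSM model from the abstract: it averages word vectors and takes their cosine similarity. The toy embedding table and the names `average_vector`, `cosine_similarity`, and `score_pair` are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical 2-d "word vectors"; a real system would load trained Word2Vec vectors.
TOY_EMBEDDINGS = {
    "neural": [1.0, 0.0],
    "network": [0.9, 0.1],
    "retrieval": [0.0, 1.0],
}


def average_vector(tokens, embeddings):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        for k, value in enumerate(vec):
            total[k] += value
        count += 1
    if count == 0:
        return total
    return [value / count for value in total]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_pair(doc_a, doc_b, embeddings):
    """Score a document pair (assumption: whitespace tokenization, averaged vectors)."""
    vec_a = average_vector(doc_a.split(), embeddings)
    vec_b = average_vector(doc_b.split(), embeddings)
    return cosine_similarity(vec_a, vec_b)
```

A pair could then be labeled relevant when `score_pair` exceeds a chosen threshold; the threshold itself would have to be tuned on labeled data.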
Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology Based Representations
We investigate the pertinence of methods from algebraic topology for text
data analysis. These methods enable the development of
mathematically-principled isometric-invariant mappings from a set of vectors to
a document embedding, which is stable with respect to the geometry of the
document in the selected metric space. In this work, we evaluate the utility of
these topology-based document representations in traditional NLP tasks,
specifically document clustering and sentiment classification. We find that the
embeddings do not benefit text analysis. In fact, performance is worse than
simple techniques like , indicating that the geometry of the
document does not provide enough variability for classification on the basis of
topic or sentiment in the chosen datasets.
Comment: 5 pages, 3 figures. Rep4NLP workshop at ACL 201
Neural Networks for Information Retrieval
Machine learning plays a role in many aspects of modern IR systems, and deep
learning is applied in all of them. The fast pace of modern-day research has
given rise to many different approaches for many different IR problems. The
amount of information available can be overwhelming both for junior students
and for experienced researchers looking for new research topics and directions.
Additionally, it is interesting to see what key insights into IR problems the
new technologies are able to give us. The aim of this full-day tutorial is to
give a clear overview of current tried-and-trusted neural methods in IR and how
they benefit IR research. It covers key architectures, as well as the most
promising future directions.
Comment: Overview of full-day tutorial at SIGIR 201
Soft Seeded SSL Graphs for Unsupervised Semantic Similarity-based Retrieval
Semantic similarity based retrieval is playing an increasingly important role
in many IR systems, such as modern web search, question answering, and similar
document retrieval. Improvements in the retrieval of semantically similar
content matter greatly to applications such as Quora, Stack Overflow, and
Siri. We propose a novel unsupervised model for semantic similarity based
content retrieval, where we construct semantic flow graphs for each query, and
introduce the concept of "soft seeding" in graph based semi-supervised learning
(SSL) to convert this into an unsupervised model.
We demonstrate the effectiveness of our model on an equivalent question
retrieval problem on the Stack Exchange QA dataset, where our unsupervised
approach significantly outperforms the state-of-the-art unsupervised models,
and produces comparable results to the best supervised models. Our research
provides a method to tackle semantic similarity based retrieval without any
training data, and allows seamless extension to different domain QA
communities, as well as to other semantic equivalence tasks.
Comment: Published in Proceedings of the 2017 ACM Conference on Information
and Knowledge Management (CIKM '17)