Search Result Clustering via Randomized Partitioning of Query-Induced Subgraphs
In this paper, we present an approach to search result clustering based on
partitioning of the underlying link graph. We define the notion of a
"query-induced subgraph" and formulate search result clustering as the problem
of efficiently partitioning that subgraph into topic-related clusters. We also
propose a novel algorithm for approximate partitioning of such a graph, which
yields cluster quality comparable to that obtained by deterministic
algorithms while running in less computation time, making it suitable for
practical implementations. Finally, we present a practical clustering search
engine developed as part of this research and use it to measure the
real-world performance of the proposed concepts.
Comment: 16th Telecommunications Forum TELFOR 200
Recipe Suggestion Tool
There is currently a great need for a tool that searches cooking recipes by ingredients, country and recipe type. Current search engines do not provide this feature. Most recipe search results on current websites are not efficiently clustered by relevance or category, so users get lost in the huge result sets presented. These sites also do not provide links to view images of a recipe's ingredients.
My project aims to combine search by ingredients, suggestions for similar recipes, and images of the ingredients in one search engine, and to provide an intuitive interface for them. I explored different clustering algorithms to find an efficient one for clustering recipe data that matches a user's queries. As part of this project, I also built a FreeText search that lets users search recipes by ingredients, country and recipe type. I created a few charts to help users understand which ingredients are used most in recipes and which countries' ingredients are most common. The website also provides articles on making tasty recipes; on the article page, users can comment on and rate each article. The website is deployed on the Microsoft Azure platform.
TopSig: Topology Preserving Document Signatures
Performance comparisons between File Signatures and Inverted Files for text
retrieval have previously shown several significant shortcomings of file
signatures relative to inverted files. The inverted file approach underpins
most state-of-the-art search engine algorithms, such as Language and
Probabilistic models. It has been widely accepted that traditional file
signatures are inferior alternatives to inverted files. This paper describes
TopSig, a new approach to the construction of file signatures. Many advances in
semantic hashing and dimensionality reduction have been made in recent times,
but these have not so far been linked to general-purpose, signature-file-based
search engines. This paper introduces a different signature file approach that
builds upon and extends these recent advances. We are able to demonstrate
significant improvements in the performance of signature-file-based indexing
and retrieval, performance that is comparable to that of state-of-the-art
inverted-file-based systems, including Language models and BM25. These findings
suggest that file signatures offer a viable alternative to inverted files in
suitable settings, and from a theoretical perspective they position the file
signatures model in the class of Vector Space retrieval models.
Comment: 12 pages, 8 figures, CIKM 201
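To make the idea of a document signature concrete, the following is a minimal sketch of a generic random-projection signature. It is not the TopSig construction itself, which the abstract does not specify. The function names, the 64-bit width, and the whitespace tokenizer are assumptions chosen for illustration only.

```python
import hashlib
import random


def term_vector(term, bits, seed=0):
    """Return a pseudo-random +1/-1 vector that is fixed for a given term."""
    digest = hashlib.sha256(f"{seed}:{term}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [1 if rng.random() < 0.5 else -1 for _ in range(bits)]


def signature(text, bits=64, seed=0):
    """Sum the term vectors of a document and keep only the sign of each sum."""
    totals = [0] * bits
    for term in text.lower().split():  # naive tokenizer (assumption)
        for i, value in enumerate(term_vector(term, bits, seed)):
            totals[i] += value
    sig = 0
    for i, total in enumerate(totals):
        if total > 0:
            sig |= 1 << i  # one bit per projected dimension
    return sig


def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")
```

In this sketch, documents would be compared by the Hamming distance between their signatures rather than by looking terms up in an inverted file.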
Efficient Diversification of Web Search Results
In this paper we analyze the efficiency of various search results
diversification methods. While efficacy of diversification approaches has been
deeply investigated in the past, response time and scalability issues have been
rarely addressed. A unified framework for studying performance and feasibility
of result diversification solutions is thus proposed. First we define a new
methodology for detecting when, and how, query results need to be diversified.
To this purpose, we rely on the concept of "query refinement" to estimate the
probability that a query is ambiguous. Then, relying on this novel ambiguity
detection method, we deploy and compare, on a standard test set, three different
diversification methods: IASelect, xQuAD, and OptSelect. While the first two
are recent state-of-the-art proposals, the latter is an original algorithm
introduced in this paper. We evaluate both the efficiency and the effectiveness
of our approach against its competitors by using the standard TREC Web
diversification track testbed. Results show that OptSelect is able to run two
orders of magnitude faster than the two other state-of-the-art approaches and
to obtain comparable figures in diversification effectiveness.
Comment: VLDB201
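As a rough illustration of what result diversification does, here is a minimal greedy sketch in the style of maximal marginal relevance. It is a swapped-in generic technique, not IASelect, xQuAD, or OptSelect, none of which the abstract describes in detail. The `(doc_id, relevance, terms)` input format, the Jaccard redundancy measure, and the `trade_off` parameter are assumptions.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of terms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def diversify(ranked, k, trade_off=0.5):
    """Greedily pick k results, trading relevance against redundancy.

    ranked: list of (doc_id, relevance, terms) tuples (hypothetical format).
    """
    selected = []
    remaining = list(ranked)
    while remaining and len(selected) < k:

        def score(item):
            _, relevance, terms = item
            # Redundancy is the highest overlap with anything already chosen.
            redundancy = max(
                (jaccard(terms, chosen[2]) for chosen in selected),
                default=0.0,
            )
            return trade_off * relevance - (1 - trade_off) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [doc_id for doc_id, _, _ in selected]


results = [
    ("a", 1.0, ["jaguar", "car"]),
    ("b", 0.9, ["jaguar", "car"]),
    ("c", 0.8, ["jaguar", "cat"]),
]
top_two = diversify(results, 2)
```

With `trade_off=1.0` the function reduces to plain relevance ranking. Each selection step rescans all remaining candidates, which hints at why response time and scalability matter for this class of methods.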