12,666 research outputs found

    Deriving query suggestions for site search

    Get PDF
    Modern search engines have been moving away from simplistic interfaces that aimed at satisfying a user's need with a single-shot query. Interactive features are now integral parts of web search engines. However, generating good query modification suggestions remains a challenging issue. Query log analysis is one of the major strands of work in this direction. Although much research has been performed on query logs collected on the web as a whole, query log analysis to enhance search on smaller and more focused collections has attracted less attention, despite its increasing practical importance. In this article, we report on a systematic study of different query modification methods applied to a substantial query log collected on a local website that already uses an interactive search engine. We conducted experiments in which we asked users to assess the relevance of potential query modification suggestions that have been constructed using a range of log analysis methods and different baseline approaches. The experimental results demonstrate the usefulness of log analysis to extract query modification suggestions. Furthermore, our experiments demonstrate that a more fine-grained approach than grouping search requests into sessions allows for extraction of better refinement terms from query log files. © 2013 ASIS&T

    A Local Algorithm for Constructing Spanners in Minor-Free Graphs

    Get PDF
    Constructing a spanning tree of a graph is one of the most basic tasks in graph theory. We consider this problem in the setting of local algorithms: one wants to quickly determine whether a given edge ee is in a specific spanning tree, without computing the whole spanning tree, but rather by inspecting the local neighborhood of ee. The challenge is to maintain consistency. That is, to answer queries about different edges according to the same spanning tree. Since it is known that this problem cannot be solved without essentially viewing all the graph, we consider the relaxed version of finding a spanning subgraph with (1+ϵ)n(1+\epsilon)n edges (where nn is the number of vertices and ϵ\epsilon is a given sparsity parameter). It is known that this relaxed problem requires inspecting Ω(n)\Omega(\sqrt{n}) edges in general graphs, which motivates the study of natural restricted families of graphs. One such family is the family of graphs with an excluded minor. For this family there is an algorithm that achieves constant success probability, and inspects (d/ϵ)poly(h)log(1/ϵ)(d/\epsilon)^{poly(h)\log(1/\epsilon)} edges (for each edge it is queried on), where dd is the maximum degree in the graph and hh is the size of the excluded minor. The distances between pairs of vertices in the spanning subgraph GG' are at most a factor of poly(d,1/ϵ,h)poly(d, 1/\epsilon, h) larger than in GG. In this work, we show that for an input graph that is HH-minor free for any HH of size hh, this task can be performed by inspecting only poly(d,1/ϵ,h)poly(d, 1/\epsilon, h) edges. The distances between pairs of vertices in the spanning subgraph GG' are at most a factor of O~(hlog(d)/ϵ)\tilde{O}(h\log(d)/\epsilon) larger than in GG. Furthermore, the error probability of the new algorithm is significantly improved to Θ(1/n)\Theta(1/n). This algorithm can also be easily adapted to yield an efficient algorithm for the distributed setting

    Towards a Universal Wordnet by Learning from Combined Evidenc

    Get PDF
    Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification

    TopicViz: Semantic Navigation of Document Collections

    Full text link
    When people explore and manage information, they think in terms of topics and themes. However, the software that supports information exploration sees text at only the surface level. In this paper we show how topic modeling -- a technique for identifying latent themes across large collections of documents -- can support semantic exploration. We present TopicViz, an interactive environment for information exploration. TopicViz combines traditional search and citation-graph functionality with a range of novel interactive visualizations, centered around a force-directed layout that links documents to the latent themes discovered by the topic model. We describe several use scenarios in which TopicViz supports rapid sensemaking on large document collections

    SODA: Generating SQL for Business Users

    Full text link
    The purpose of data warehouses is to enable business analysts to make better decisions. Over the years the technology has matured and data warehouses have become extremely successful. As a consequence, more and more data has been added to the data warehouses and their schemas have become increasingly complex. These systems still work great in order to generate pre-canned reports. However, with their current complexity, they tend to be a poor match for non tech-savvy business analysts who need answers to ad-hoc queries that were not anticipated. This paper describes the design, implementation, and experience of the SODA system (Search over DAta Warehouse). SODA bridges the gap between the business needs of analysts and the technical complexity of current data warehouses. SODA enables a Google-like search experience for data warehouses by taking keyword queries of business users and automatically generating executable SQL. The key idea is to use a graph pattern matching algorithm that uses the metadata model of the data warehouse. Our results with real data from a global player in the financial services industry show that SODA produces queries with high precision and recall, and makes it much easier for business users to interactively explore highly-complex data warehouses.Comment: VLDB201
    corecore