103,578 research outputs found
Extracting Hierarchies of Search Tasks & Subtasks via a Bayesian Nonparametric Approach
A significant amount of search queries originate from some real world
information need or tasks. In order to improve the search experience of the end
users, it is important to have accurate representations of tasks. As a result,
significant amount of research has been devoted to extracting proper
representations of tasks in order to enable search systems to help users
complete their tasks, as well as providing the end user with better query
suggestions, for better recommendations, for satisfaction prediction, and for
improved personalization in terms of tasks. Most existing task extraction
methodologies focus on representing tasks as flat structures. However, tasks
often tend to have multiple subtasks associated with them and a more
naturalistic representation of tasks would be in terms of a hierarchy, where
each task can be composed of multiple (sub)tasks. To this end, we propose an
efficient Bayesian nonparametric model for extracting hierarchies of such tasks
\& subtasks. We evaluate our method based on real world query log data both
through quantitative and crowdsourced experiments and highlight the importance
of considering task/subtask hierarchies.Comment: 10 pages. Accepted at SIGIR 2017 as a full pape
Distributed resource discovery using a context sensitive infrastructure
Distributed Resource Discovery in a World Wide Web environment using full-text indices will never scale. The distinct properties of WWW information (volume, rate of change, topical diversity) limits the scaleability of traditional approaches to distributed Resource Discovery. An approach combining metadata clustering and query routing can, on the other hand, be proven to scale much better. This paper presents the Content-Sensitive Infrastructure, which is a design building on these results. We also present an analytical framework for comparing scaleability of different distribution strategies
Reasoning & Querying ā State of the Art
Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet where keyword search is used in many applications, e.g. search engines, has familiarized casual users with using keyword queries to retrieve information on the internet. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming at enabling simple querying of semi-structured data, which is relevant e.g. in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF
Image retrieval by hypertext links
This paper presents a model for retrieval of images from a large World Wide Web based collection. Rather than considering complex visual recognition algorithms, the model presented is based on combining evidence of the text content and hypertext structure of the Web. The paper shows that certain types of query are amply served by this form of representation. It also presents a novel means of gathering relevance judgements
The egalitarian effect of search engines
Search engines have become key media for our scientific, economic, and social
activities by enabling people to access information on the Web in spite of its
size and complexity. On the down side, search engines bias the traffic of users
according to their page-ranking strategies, and some have argued that they
create a vicious cycle that amplifies the dominance of established and already
popular sites. We show that, contrary to these prior claims and our own
intuition, the use of search engines actually has an egalitarian effect. We
reconcile theoretical arguments with empirical evidence showing that the
combination of retrieval by search engines and search behavior by users
mitigates the attraction of popular pages, directing more traffic toward less
popular sites, even in comparison to what would be expected from users randomly
surfing the Web.Comment: 9 pages, 8 figures, 2 appendices. The final version of this e-print
has been published on the Proc. Natl. Acad. Sci. USA 103(34), 12684-12689
(2006), http://www.pnas.org/cgi/content/abstract/103/34/1268
vSPARQL: A View Definition Language for the Semantic Web
Translational medicine applications would like to leverage the biological and biomedical ontologies, vocabularies, and data sets available on the semantic web. We present a general solution for RDF information set reuse inspired by database views. Our view definition language, vSPARQL, allows applications to specify the exact content that they are interested in and how that content should be restructured or modified. Applications can access relevant content by querying against these view definitions. We evaluate the expressivity of our approach by defining views for practical use cases and comparing our view definition language to existing query languages
Artequakt: Generating tailored biographies from automatically annotated fragments from the web
The Artequakt project seeks to automatically generate narrativebiographies of artists from knowledge that has been extracted from the Web and maintained in a knowledge base. An overview of the system architecture is presented here and the three key components of that architecture are explained in detail, namely knowledge extraction, information management and biography construction. Conclusions are drawn from the initial experiences of the project and future progress is detailed
- ā¦