Search CORE

104,320 research outputs found

Relevance ranking is not relevance ranking or, when the user is not the user, the search results are not search results

Author: David Bade
Publication venue: 'Emerald'

From document to entity retrieval : improving precision and performance of focused text search

Author: Rode Henning
Publication venue: University of Twente
Publication date: 01/01/2008

Text retrieval has been an active area of research for decades. Several issues have been studied over the entire period, like the development of statistical models for the estimation of relevance, or the challenge to keep retrieval tasks efficient with ever growing text collections. Especially in the last decade, we have also seen a diversification of retrieval tasks. Passage or XML retrieval systems allow a more focused search. Question answering or expert search systems do not even return a ranked list of text units, but for instance persons with expertise on a given topic. The sketched situation forms the starting point of this thesis, which presents a number of task-specific search solutions and tries to set them into more generic frameworks. In particular, we take a look at the three areas (1) context adaptivity of search, (2) efficient XML retrieval, and (3) entity ranking. In the first case, we show how different types of context information can be incorporated in the retrieval of documents. When users are searching for information, the search task is typically part of a wider working process. This search context, however, is often not reflected by the few search keywords stated to the retrieval system, though it can contain valuable information for query refinement. We address with this work two research questions related to the aim of developing context-aware retrieval systems. First, we show how already available information about the user’s context can be employed effectively to gain highly precise search results. Second, we investigate how such meta-data about the search context can be gathered. The proposed “query profiles” have a central role in the query refinement process. They automatically detect necessary context information and help the user to explicitly express context-dependent search constraints.
The effectiveness of the approach is tested with retrieval experiments on newspaper data. When documents are not regarded as a simple sequence of words, but their content is structured in a machine readable form, it is attractive to try to develop retrieval systems that make use of the additional structure information. Structured retrieval first asks for the design of a suitable language that enables the user to express queries on content and structure. We investigate here existing query languages, whether and how they support the basic needs of structured querying. However, our main focus lies on the efficiency of structured retrieval systems. Conventional inverted indices for document retrieval systems are not suitable for maintaining structure indices. We identify base operations involved in the execution of structured queries and show how they can be supported by new indices and algorithms on a database system. Efficient query processing has to be concerned with the optimization of query plans as well. We investigate low-level query plans of physical database operators for the execution of simple query patterns. Furthermore, it is demonstrated how complex queries benefit from higher level query optimization. New search tasks and interfaces for the presentation of search results, like faceted search applications, question answering, expert search, and automatic timeline construction, come with the need to rank entities instead of documents. By entities we mean unique (named) existences, such as persons, organizations or dates. Modern language processing tools are able to automatically detect and categorize named entities in large text collections. In order to estimate their relevance to a given search topic, we develop retrieval models for entities which are based on the relevance of texts that mention the entity.
A graph-based relevance propagation framework is introduced for this purpose that makes it possible to derive the relevance of entities. Several options for the modeling of entity containment graphs and different relevance propagation approaches are tested, demonstrating the usefulness of the graph-based ranking framework.
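The idea of deriving an entity's relevance from the relevance of the texts that mention it can be illustrated with a minimal Python sketch. It performs a single propagation step over a bipartite document-entity containment graph. The function name, the sample data and the plain summation are assumptions made for illustration; the abstract does not specify the thesis's actual propagation models.

```python
from collections import defaultdict


def propagate_entity_relevance(doc_scores, doc_entities):
    """One-step propagation (illustrative assumption): each entity receives
    the sum of the relevance scores of the documents that mention it."""
    entity_scores = defaultdict(float)
    for doc_id, score in doc_scores.items():
        for entity in doc_entities.get(doc_id, ()):
            entity_scores[entity] += score
    # Highest score first; ties broken alphabetically for a stable ranking.
    return sorted(entity_scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical document relevance scores and entity mentions.
doc_scores = {"d1": 0.5, "d2": 0.25, "d3": 0.125}
doc_entities = {"d1": ["Ada", "Turing"], "d2": ["Turing"], "d3": ["Hopper"]}

ranking = propagate_entity_relevance(doc_scores, doc_entities)
for entity, score in ranking:
    print(entity, score)
```

Summation favours entities mentioned in many relevant documents; other aggregation choices (for example a maximum or a length-normalised sum) would plug into the same loop.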

University of Twente Research Information

Search Engine Optimisation. PageRank best Practices

Author: Ferré Viñes Neus
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/07/2008

Project carried out in collaboration with RWTH Aachen. Since the explosion of the Internet age, the need to search for information online has grown at the speed of light. As a consequence, new marketing disciplines have arisen in the digital world. This thesis describes, in the search engine marketing framework, how the ranking in the search engine results page (SERP) can be influenced. Wikipedia describes search engine marketing or SEM as a form of Internet marketing that seeks to promote websites by increasing their visibility in search engine result pages (SERPs). Therefore, the importance of being searchable and visible to users reveals needs for improvement for website designers. Different factors are used to produce search rankings. One of them is PageRank. The present thesis focuses on how Google’s PageRank makes use of the linking structure of the Web in order to maximise the relevance of the results in a web search. PageRank used to be a puzzle for webmasters because of the secrecy surrounding it. The formula that lies behind PageRank enabled the founders of Google to convert a PhD into one of the most successful companies ever. The uniqueness of PageRank in contrast to other Web search engines consists in providing the user with the greatest relevance of the results for a specific query, thus providing the most satisfactory user experience. Google does use PageRank as part of its ranking formula. Although it is not as important as many believe, it is nevertheless a measure of a web page’s popularity, and gives a certain indication of how “important” Google considers a page to be. The goal of search marketing is being visible to the end user. Two different fields within search marketing can be pointed out: Search Engine Optimisation and search engine marketing.
This study focuses on the first one, Search Engine Optimisation, which refers to all types of initiatives and actions taken by website designers in order to increase relevance for the search engines. It is about design, optimising content, linking structure (internal and external) and other page-specific factors. Because of the predominance of Google, this thesis looks at which steps can be taken on a given website to optimise it for Google’s PageRank algorithm. Moreover, other factors which also have an influence are analysed.
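To make concrete how a link-based score can be computed from the linking structure of a set of pages, here is a minimal Python sketch of the commonly published simplified PageRank power iteration. The abstract does not give the formula, so the damping factor of 0.85, the handling of pages without outgoing links and the toy link graph are all assumptions for illustration, not a description of Google's production ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping=0.85 is the commonly cited default (an assumption here).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its damped rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page site.
links = {"home": ["about", "blog"], "about": ["home"], "blog": ["home", "about"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy graph the page that receives links from both other pages ends up with the highest score, which is the intuition behind optimising internal and external linking structure.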

UPCommons. Portal del coneixement obert de la UPC

News vertical search using user-generated content

Author: McCreadie Richard
Publication date: 01/01/2012

The thesis investigates how content produced by end-users on the World Wide Web, referred to as user-generated content, can enhance the news vertical aspect of a universal Web search engine, such that news-related queries can be satisfied more accurately, comprehensively and in a more timely manner. We propose a news search framework to describe the news vertical aspect of a universal Web search engine. This framework is comprised of four components, each providing a different piece of functionality. The Top Events Identification component identifies the most important events that are happening at any given moment using discussion in user-generated content streams. The News Query Classification component classifies incoming queries as news-related or not in real-time. The Ranking News-Related Content component finds and ranks relevant content for news-related user queries from multiple streams of news and user-generated content. Finally, the News-Related Content Integration component merges the previously ranked content for the user query into the Web search ranking. In this thesis, we argue that user-generated content can be leveraged in one or more of these components to better satisfy news-related user queries. Potential enhancements include the faster identification of news queries relating to breaking news events, more accurate classification of news-related queries, increased coverage of the events searched for by the user or increased freshness in the results returned. Approaches to tackle each of the four components of the news search framework are proposed, which aim to leverage user-generated content. Together, these approaches form the news vertical component of a universal Web search engine. Each approach proposed for a component is thoroughly evaluated using one or more datasets developed for that component.
Conclusions are derived concerning whether the use of user-generated content enhances the component in question using an appropriate measure, namely: effectiveness when ranking events by their current importance/newsworthiness for the Top Events Identification component; classification accuracy over different types of query for the News Query Classification component; relevance of the documents returned for the Ranking News-Related Content component; and end-user preference for rankings integrating user-generated content in comparison to the unaltered Web search ranking for the News-Related Content Integration component. Analysis of the proposed approaches themselves, the effective settings for the deployment of those approaches and insights into their behaviour are also discussed. In particular, the evaluation of the Top Events Identification component examines how effectively events (represented by newswire articles) can be ranked by their importance using two different streams of user-generated content, namely blog posts and Twitter tweets. Evaluation of the proposed approaches for this component indicates that blog posts are an effective source of evidence to use when ranking events and that these approaches achieve state-of-the-art effectiveness. The same approaches, when driven instead by a stream of tweets, provide a story ranking performance that is significantly more effective than random, but is not consistent across all of the datasets and approaches tested. Insights are provided into the reasons for this with regard to the transient nature of discussion in Twitter. Through the evaluation of the News Query Classification component, we show that the use of timely features extracted from different news and user-generated content sources can increase the accuracy of news query classification over relying upon newswire provider streams alone.
Evidence also suggests that the usefulness of the user-generated content sources varies as news events mature, with some sources becoming more influential over time as new content is published, leading to an upward trend in classification accuracy. The Ranking News-Related Content component evaluation investigates how to effectively rank content from the blogosphere and Twitter for news-related user queries. Of the approaches tested, we show that learning to rank approaches using features specific to blog posts/tweets lead to state-of-the-art ranking effectiveness under real-time constraints. Finally, this thesis demonstrates that the majority of end-users prefer rankings integrated with user-generated content for news-related queries to rankings containing only Web search results or integrated with only newswire articles. Of the user-generated content sources tested, the most popular source is shown to be Twitter, particularly for queries relating to breaking events. The central contributions of this thesis are the introduction of a news search framework, the approaches to tackle each of the four components of the framework that integrate user-generated content and their subsequent evaluation in a simulated real-time setting. This thesis draws insights from a broad range of experiments spanning the entire search process for news-related queries. The experiments reported in this thesis demonstrate the potential and scope for enhancements that can be brought about by leveraging user-generated content for real-time news search and related applications.

Glasgow Theses Service

CiteSeerX


Clustering Information Retrieval Search Outputs

Author: Kural S.

Users are known to have difficulties in dealing with information retrieval search outputs, especially if the outputs are above a certain size. It has been argued by several researchers that search output clustering can help users in their interaction with IR systems. Clustering may provide users an overview of the output by exploiting the topicality information that resides in the output but has not been used in the retrieval stage. It can enable them to find the relevant documents more easily and also help them to form an understanding of the different facets of the query that have been provided for their inspection. This project aimed to investigate the viability of using clustering as a way of mediating users’ interaction with search outputs and attempted to identify its possible benefits. Can & Ozkarahan’s (90) C3M algorithm was used to test the effectiveness of clustering as a way of search output presentation. C3M is a relatively simple, non-hierarchical method that has been shown to give compatible or superior results to best-known hierarchical methods. The method was implemented in TCL and linked to the department’s experimental IR system Okapi. Implementation included a procedure of term selection for document representation which preceded the clustering process and a procedure involving cluster representation for users’ viewing following the clustering process. After some tuning of the implementation parameters for the databases used, several experiments were designed and conducted to assess whether clusters could group documents in useful ways. One group of experiments aimed to assess the ability of the implementation to bring together topically related documents. It was quite difficult to gather data for such an assessment, but the existence of a set of data generated for the TREC Interactive track (1996) enabled us to design experiments that at least approximately satisfied our objective.
TREC provided a set of queries, and groups of relevant documents with facet assignments made by expert users. It was thus possible to make an inference by measuring the correlation between the clusters relevant documents were assigned to and the facet assignments made for the documents by TREC experts. The utility of this data set was limited for various reasons discussed in the related chapters; however, it can be concluded that clusters cannot be relied on to bring together relevant documents assigned to a certain facet. While there was some correlation between the cluster and facet assignments of the documents when the clustering was done only on relevant documents, no correlation could be found when the clustering was based on results of queries defined by City participants to the Interactive track. Another group of experiments was conducted to compare output clustering with relevance ranking as a search output representation method. This comparison was necessary as an immediate consequence of clustering search output would be the loss of relevance ranking. It had to be assessed whether clustering could help users to find the relevant documents more easily than by relevance ranking, before any clustering solution could be proposed as an alternative to relevance ranked output. For this purpose, two sets of user experiments (n=20 and n=57) were conducted based on the users’ own information needs. While changes were made to the implementation between the first and the second set of experiments, the experimental design was almost the same in both runs. Users were first asked to rank clusters formed from the search output (top 50 documents) and then make relevance judgements for the individual documents for the same output. The precision of the cluster(s) marked best by the users was then compared to precision values that would be attained by relevance ranking at comparable thresholds.
The results from the first group of user experiments were not conclusive (in some part due to the smallness of the data set), but they drew our attention to the importance of representation of clusters and documents for users’ viewing. After some changes to the implementation, mainly related to representation issues, and an intermediate set of 10 experiments to assess two new representation formats, a set of 57 user experiments was conducted to measure and compare precision values attainable by clustering versus relevance ranking. These experiments revealed no significant precision difference between clustered outputs and ranked lists. The number of cases where one method performed better than the other was slightly higher for the ranked lists at the top cluster level and slightly higher for the clustered representation at the top two clusters level. However, the overall average precision values were higher for the ranked list at both levels. As such, clustering did not appear to be preferable to ranked lists, especially as it also represented overheads in both computing time and resources involved in creation of the clusters, and the time and effort taken by the users to inspect them. An interesting outcome of the user experiments was the ability of the users to identify clusters that do not include relevant information. There were fewer relevant documents among the clusters marked last by the users as compared to the documents ranked last at similar threshold levels. This brought out the possibility of using clusters as an exclusion tool to improve the precision of ranked lists. After exclusion of documents from the last cluster, ranked lists performed significantly better than the clusters at the top cluster level.
There was also some evidence (consisting of observation of users during the experiments and a few user comments) that clusters could be used to provide the users with a glimpse of the search results, in order to decide whether to inspect the search results or initiate a new query straight away. In summary, cumulative experiment results imply that clustering cannot outperform relevance ranking, and seems to deserve only a secondary role in users’ interaction with IR systems. However, it should also be noted that the experiment results are not representative of the whole set of possible user types and search situations and it may be possible to identify search situations where clustering can be more beneficial than relevance ranking.
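The comparison described above, precision of the cluster a user marks best against precision of the ranked list at a comparable threshold, can be sketched in a few lines of Python. The document identifiers, relevance judgements and the choice of "cluster size" as the comparable threshold are hypothetical assumptions for illustration; the abstract does not state how thresholds were matched.

```python
def precision(doc_ids, relevant):
    """Fraction of the given documents that were judged relevant."""
    if not doc_ids:
        return 0.0
    return sum(1 for doc_id in doc_ids if doc_id in relevant) / len(doc_ids)


# Hypothetical search output in relevance-ranked order, with user judgements.
ranked = ["d1", "d2", "d3", "d4", "d5", "d6"]
relevant = {"d1", "d3", "d6"}

# Hypothetical cluster the user marked as best.
best_cluster = ["d1", "d3", "d6", "d5"]

cluster_precision = precision(best_cluster, relevant)
# Assumed threshold: cut the ranked list at the same number of documents.
ranked_precision = precision(ranked[:len(best_cluster)], relevant)

print("best cluster:", cluster_precision)
print("ranked list at same cut-off:", ranked_precision)
```

Cutting the ranked list at the cluster's size keeps the two figures comparable, since both are then computed over the same number of inspected documents.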

City Research Online

Sharpening the Search Saw: Lessons from Expert Searchers

Author: Tucker Virginia M.
Publication venue: SJSU ScholarWorks
Publication date: 01/01/2015

Many students consider themselves to be proficient searchers and yet are disappointed or frustrated when faced with the task of locating relevant scholarly articles for a literature review. This bleak experience is common among higher education students, even for those in library and information science programs who have heightened appreciation for information resources and yet may settle for “good enough Googling” (Plosker, 2004, p. 34). This is in large part due to reliance on web search engines that have evolved relevance ranking into a vastly intelligent business, one in which we are both its customers and product (Vaidhyanathan, 2011). Google’s Hummingbird nest of search algorithms (Sullivan, 2013) provides quick and targeted hits, yet it can trigger blinders-on trust in first-page results. Concern for student search practices ranges from this permissive trust all the way to lost ability to recall facts and formulate questions (Abilock, 2015), lack of confidence in one’s own knowledge (Carr, 2010), and increased dependence on single search boxes that encourage stream-of-consciousness user input (Tucker, 2013); indeed, students may be high in tech savvy but lacking the critical thinking skills needed for information research tasks (Katz, 2007). Students have come to rely on web search engine intelligence—and it is inarguably colossal—to such an extent that they may fail to formulate a question before charging forward to search for its answer. “Google is known as a search engine, yet there is barely any searching involved anymore. The gap between a question crystallizing in your mind and an answer appearing at the top of your screen is shrinking all the time. As a consequence, our ability to ask questions is atrophying” (Leslie, 2015, para. 4). Highly accomplished students often lament their lack of skills for higher-level searching that calls for formulating pointed questions when struggling to develop a solid literature review. 
In addition, many are unaware that search results are filtered based on previous searches, location, and other factors extracted from personal search patterns by the search engine. Two students working side by side and entering the same search terms may receive quite different results on Google, yet the extent to which this ‘filter bubble’ (Pariser, 2011) is personalizing their search results is difficult to assess and to overcome. Just as important, it can be impossible to know what a search might be missing: how to know what’s not there? This portrayal of the information landscape may appear gloomy but, in fact, it could not be a more inspiring environment in which to do research, to find connections in ideas, and to benefit from and generate new ideas. A few lessons from expert searchers, focused on critical concepts and search practices, can sharpen a student’s search saw and move the proficient student-researcher, desiring more relevant and comprehensive search results, into a trajectory toward search expertise. For the lessons involved in this journey, the focus is on two areas: first, the critical concepts, called threshold concepts (Meyer & Land, 2003), found to be necessary for developing search expertise (Tucker et al., 2014); and, second, four strategic areas within search that can have significant and immediate impact on improving search results for research literature. The latter are grounded in the threshold concepts and positioned for application to literature reviews for graduate student studies.

Queensland University of Technology ePrints Archive

SJSU ScholarWorks

The use of implicit evidence for relevance feedback in web retrieval

Author: A. Spink
B.J. Jansen
G. Salton
J. Grundin
J.A. Konstan
M. Beaulieu
S. E. Maxwell
S.E. Robertson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2002

In this paper we report on the application of two contrasting types of relevance feedback for web retrieval. We compare two systems: one using explicit relevance feedback (where searchers explicitly have to mark documents relevant) and one using implicit relevance feedback (where the system endeavours to estimate relevance by mining the searcher's interaction). The feedback is used to update the display according to the user's interaction. Our research focuses on the degree to which implicit evidence of document relevance can be substituted for explicit evidence. We examine the two variations in terms of both user opinion and search effectiveness.

CiteSeerX

Crossref

University of Strathclyde Institutional Repository

Enlighten

Contextualised Browsing in a Digital Library's Living Lab

Author: Belkin Nicholas J.
Carevic Zeljko
Kanoulas Evangelos
Mayr Philipp
Pharo Nils
Sepliarskaia Anna
White Ryen W
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/04/2018

Contextualisation has proven to be effective in tailoring search results towards the users' information need. While this is true for a basic query search, the usage of contextual session information during exploratory search, especially on the level of browsing, has so far been underexposed in research. In this paper, we present two approaches that contextualise browsing on the level of structured metadata in a Digital Library (DL): (1) one variant is based on document similarity and (2) one variant utilises implicit session information, such as queries and different document metadata encountered during the session of a user. We evaluate our approaches in a living lab environment using a DL in the social sciences and compare our contextualisation approaches against a non-contextualised approach. For a period of more than three months we analysed 47,444 unique retrieval sessions that contain search activities on the level of browsing. Our results show that a contextualisation of browsing significantly outperforms our baseline in terms of the position of the first clicked item in the result set. The mean rank of the first clicked document (measured as mean first relevant, MFR) was 4.52 using a non-contextualised ranking compared to 3.04 when re-ranking the result lists based on similarity to the previously viewed document. Furthermore, we observed that both contextual approaches show a noticeably higher click-through rate. A contextualisation based on document similarity leads to almost twice as many document views compared to the non-contextualised ranking. Comment: 10 pages, 2 figures, paper accepted at JCDL 201
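The mean first relevant (MFR) measure used above, the mean rank of the first clicked document, can be sketched in Python as follows. The session data are made up, and two details are assumptions not confirmed by the abstract: "first clicked" is read as first in time, and sessions without any click are skipped.

```python
def mean_first_relevant(sessions):
    """Mean 1-based rank of the first clicked result per session.

    sessions is a list of click sequences in chronological order;
    sessions without clicks are skipped (an assumption of this sketch).
    """
    first_ranks = [clicks[0] for clicks in sessions if clicks]
    if not first_ranks:
        raise ValueError("no session contains a click")
    return sum(first_ranks) / len(first_ranks)


# Hypothetical click logs: each inner list holds the ranks a user clicked.
baseline_sessions = [[5, 7], [4], [3, 9], []]
reranked_sessions = [[2], [4, 6], [3]]

baseline_mfr = mean_first_relevant(baseline_sessions)
reranked_mfr = mean_first_relevant(reranked_sessions)

print("baseline MFR:", baseline_mfr)
print("re-ranked MFR:", reranked_mfr)
```

A lower MFR means users found something worth clicking nearer the top of the list, which is the direction of improvement reported for the contextualised rankings.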

arXiv.org e-Print Archive

Crossref