1,060 research outputs found

    New perspectives on Web search engine research

    Get PDF
    Purpose–The purpose of this chapter is to give an overview of the context of Web search and search engine-related research, as well as to introduce the reader to the sections and chapters of the book. Methodology/approach–We review literature dealing with various aspects of search engines, with special emphasis on emerging areas of Web searching, search engine evaluation going beyond traditional methods, and new perspectives on Webs earching. Findings–The approaches to studying Web search engines are manifold. Given the importance of Web search engines for knowledge acquisition, research from different perspectives needs to be integrated into a more cohesive perspective. Researchlimitations/implications–The chapter suggests a basis for research in the field and also introduces further research directions. Originality/valueofpaper–The chapter gives a concise overview of the topics dealt with in the book and also shows directions for researchers interested in Web search engines

    An Empirical Examination of the Associations between Social Tags and Web Queries

    Get PDF
    Introduction. We aim to discover the associations between social tags for a Web page and Web queries that would retrieve the same Webpage in three major search engines. Method. 4,827 query terms were submitted to the three major search engines to acquire search engine results pages. A series of Perl scripts were written to read search engine results pages and to identify, analyse, and extract organic links Analysis. Web pages from the organic links in search engine results pages were examined to see whether and how they had been tagged in Delicious. Only the Webpages tagged by at least 100 taggers were included in this study. The top thirty popular social tags used were harvested. The two sets of data were quantitatively analysed to investigate the research questions. Results. At least 60% of search engines\u27 query terms overlapped with social tags in Delicious; higher ranked social tags were more likely to be used as query terms for the same Web resources; and the co-occurring pattern of query terms and social tags over social ranking resembled a power law distribution. Conclusions. Socially tagged resources are likely to be highly ranked in search engine results pages. The findings can be applicable to the future study of Web resource related tasks such as Web searching and Web indexing

    Using contextual information to understand searching and browsing behavior

    Get PDF
    There is great imbalance in the richness of information on the web and the succinctness and poverty of search requests of web users, making their queries only a partial description of the underlying complex information needs. Finding ways to better leverage contextual information and make search context-aware holds the promise to dramatically improve the search experience of users. We conducted a series of studies to discover, model and utilize contextual information in order to understand and improve users' searching and browsing behavior on the web. Our results capture important aspects of context under the realistic conditions of different online search services, aiming to ensure that our scientific insights and solutions transfer to the operational settings of real world applications

    Searching or surfing : how do students who use the Web locate information resources?

    Get PDF
    SIGLEAvailable from British Library Document Supply Centre- DSC:DXN062800 / BLDSC - British Library Document Supply CentreGBUnited Kingdo

    GAUGING PUBLIC INTEREST FROM SERVER LOGS, SURVEYS AND INLINKS A Multi-Method Approach to Analyze News Websites

    Get PDF
    As the World Wide Web (the Web) has turned into a full-fledged medium to disseminate news, it is very important for journalism and information science researchers to investigate how Web users access online news reports and how to interpret such usage patterns. This doctoral thesis collected and analyzed Web server log statistics, online surveys results, online reprints of the top 50 news reports, as well as external inlinks data of a leading comprehensive online newspaper (the People\u27s Daily Online) in China, one of the biggest Web/information markets in today\u27s world. The aim of the thesis was to explore various methods to gauge the public interest from a Webometrics perspective. A total of 129 days of Web server log statistics, including the top 50 Chinese and English news stories with the highest daily pageview numbers, the comments attracted by these news items and the emailed frequencies of the same stories were collected from October 2007 to September 2008. These top 50 news items’positions on the Chinese and English homepages and the top 50 queries submitted to the website search engine of the People’s Daily Online were also retrieved. Results of the two online surveys launched in March 2008 and March 2009 were collected after their respective closing dates. The external inlinks to the People’s Daily Online were retrieved by Yahoo! (Chinese and English versions), and the online reprints were retrieved by Google. Besides the general usage patterns identified from the top 50 news stories, this study, by conducting statistical tests on the data sets, also reveals the following findings. First, the editors’ choices and the readers’ favorites do not always match each other; thus content of news title is more important than its homepage position in attracting online visits. Second, the Chinese and English readers’ interests in the same events are different. Third, the pageview numbers and comments posted to the news items reflect the unfavorable attitudes of the Chinese people toward the United States and Japan, which might offer us a method to investigate the public interest in some other issues or nations after necessary modifications. More importantly, some publicly available data, such as the comments posted to the news stories and online survey results, further show that the pageview measure does reflect readers’ interests/needs truthfully, as proved by the strong correlations between the top news reports and relevant top queries. The external ininks to the news websites and the online reprints of the top news items help us examine readers\u27 interests from other perspectives, as well as establish online profiles of the news websites. Such publicly accessible information could be an alternative data source for researchers to study readers\u27 interests when the Web server log data are not available. This doctoral thesis not only shows the usefulness of Web server log statistics, survey results, and other publicly accessible data in studying Web user’s information needs, but also offers practical suggestions for online news sites to improve their contents and homepage designs. However, no single method can draw a complete picture of the online news readers’ interests. The above mentioned research methodologies should be employed together, in order to make more comprehensive conclusions. Future research is especially needed to investigate the continuously rapid growth of the “Mobile News Readers,” which poses both challenges and opportunities to the press industry in the 21st century

    Addressing the new generation of spam (Spam 2.0) through Web usage models

    Get PDF
    New Internet collaborative media introduce new ways of communicating that are not immune to abuse. A fake eye-catching profile in social networking websites, a promotional review, a response to a thread in online forums with unsolicited content or a manipulated Wiki page, are examples of new the generation of spam on the web, referred to as Web 2.0 Spam or Spam 2.0. Spam 2.0 is defined as the propagation of unsolicited, anonymous, mass content to infiltrate legitimate Web 2.0 applications.The current literature does not address Spam 2.0 in depth and the outcome of efforts to date are inadequate. The aim of this research is to formalise a definition for Spam 2.0 and provide Spam 2.0 filtering solutions. Early-detection, extendibility, robustness and adaptability are key factors in the design of the proposed method.This dissertation provides a comprehensive survey of the state-of-the-art web spam and Spam 2.0 filtering methods to highlight the unresolved issues and open problems, while at the same time effectively capturing the knowledge in the domain of spam filtering.This dissertation proposes three solutions in the area of Spam 2.0 filtering including: (1) characterising and profiling Spam 2.0, (2) Early-Detection based Spam 2.0 Filtering (EDSF) approach, and (3) On-the-Fly Spam 2.0 Filtering (OFSF) approach. All the proposed solutions are tested against real-world datasets and their performance is compared with that of existing Spam 2.0 filtering methods.This work has coined the term ‘Spam 2.0’, provided insight into the nature of Spam 2.0, and proposed filtering mechanisms to address this new and rapidly evolving problem
    corecore