34 research outputs found

    A three-year study on the freshness of Web search engine databases

    This paper deals with one aspect of the index quality of search engines: index freshness. The purpose is to analyse the update strategies of the major Web search engines Google, Yahoo, and MSN/Live.com. We tested how these engines updated 40 daily updated pages and 30 irregularly updated pages. We used data from a time span of six weeks in each of the years 2005, 2006, and 2007. We found that the best search engine in terms of up-to-dateness changes over the years and that none of the engines has an ideal solution for index freshness. Frequency distributions for the pages’ ages are skewed, which means that search engines do differentiate between often- and seldom-updated pages. This is confirmed by the difference between the average ages of daily updated pages and our control group of pages. Indexing patterns are often irregular, and there seems to be no clear policy regarding when to revisit Web pages. A major problem identified in our research is the delay in making crawled pages available for searching, which differs from one engine to another.
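
    As a rough illustration of the freshness measure described in this abstract, the sketch below computes the age of an indexed page copy in days and summarises a set of ages. The function name, the one-day freshness cut-off, and the sample dates are assumptions made for illustration; they are not taken from the study.

```python
from datetime import date
from statistics import mean


def page_age_days(checked_on: date, indexed_version: date) -> int:
    """Age of the engine's copy: days between the check and the indexed version."""
    return (checked_on - indexed_version).days


# Hypothetical observations: (date the engine was checked, date of its indexed copy).
observations = [
    (date(2007, 2, 1), date(2007, 2, 1)),
    (date(2007, 2, 1), date(2007, 1, 31)),
    (date(2007, 2, 1), date(2007, 1, 28)),
]

ages = [page_age_days(checked, indexed) for checked, indexed in observations]
mean_age = mean(ages)

# Share of copies at most one day old (the cut-off is an assumption).
fresh_share = sum(age <= 1 for age in ages) / len(ages)
```

    Comparing the mean age with the share of fresh copies hints at why a skewed age distribution matters: a few very old copies can raise the average even when most copies are current.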

    What Users See – Structures in Search Engine Results Pages

    This paper investigates the composition of search engine results pages. We define what elements the most popular web search engines use on their results pages (e.g., organic results, advertisements, shortcuts) and to what degree they are used for popular vs. rare queries. To this end, we sent 500 queries of both types to the major search engines Google, Yahoo, Live.com and Ask. We counted how often the different elements are used by the individual engines. In total, our study is based on 42,758 elements. Findings include that search engines use quite different approaches to results page composition, and therefore the user gets to see quite different results sets depending on the search engine and search query used. Organic results still play the major role in the results pages, but different shortcuts are of some importance, too. Regarding the frequency of certain hosts within the results sets, we find that all search engines show Wikipedia results quite often, while other hosts shown depend on the search engine used. Both Google and Yahoo prefer results from their own offerings (such as YouTube or Yahoo Answers). Since we used the .com interfaces of the search engines, results may not be valid for other country-specific interfaces.

    A Proposed Architecture for Continuous Web Monitoring Through Online Crawling of Blogs

    Learning promptly what is published on the Web can greatly help psychologists, marketers, and political analysts to understand, analyse, decide, and act correctly according to society's different needs. The sheer volume of information on the Web prevents us from continuously monitoring the whole Web online. Restricting attention to a selected set of blogs limits the working domain and makes online crawling feasible. In this article, an architecture is proposed that continuously crawls the relevant blogs online using a focused crawler, and then investigates and analyses the collected data. Online fetching is driven by the latest announcements from ping servers. A weighted graph is built from the important key phrases being targeted, so that a focused crawler can fetch the complete texts of the relevant Web pages based on that graph. Comment: 10 pages, 2 figures
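
    The fetching step described in this abstract can be sketched roughly as follows. This is a simplified stand-in, not the authors' architecture: it replaces the weighted graph with a flat table of key-phrase weights, and the phrase weights, the threshold, and the function names are all hypothetical.

```python
import heapq

# Hypothetical key-phrase weights; the abstract gives no concrete values.
KEY_PHRASE_WEIGHTS = {"election": 3.0, "economy": 2.0, "health": 1.0}


def relevance(text, weights=KEY_PHRASE_WEIGHTS):
    """Sum the weights of the key phrases, counting each occurrence in the text."""
    lowered = text.lower()
    return sum(weight * lowered.count(phrase) for phrase, weight in weights.items())


def prioritise(pings, threshold=1.0):
    """Order ping-server announcements (url, summary) by descending relevance.

    Announcements scoring below the threshold are dropped, so the crawler
    fetches full texts only for posts that look relevant.
    """
    heap = []
    for order, (url, summary) in enumerate(pings):
        score = relevance(summary)
        if score >= threshold:
            # Negate the score for max-first order; `order` breaks ties stably.
            heapq.heappush(heap, (-score, order, url))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

    A priority queue keeps the most relevant announcements at the front, which suits a crawler that can fetch only part of the stream between ping updates.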

    Why we need an independent index of the Web

    The path to greater diversity, as we have seen, cannot be achieved by merely hoping for a new search engine, nor will government support for a single alternative achieve this goal. What is instead required is to create the conditions that will make establishing such a search engine possible in the first place. I describe how building and maintaining a proprietary index is the greatest deterrent to such an undertaking. We must first overcome this obstacle. Doing so will still not solve the problem of the lack of diversity in the search engine marketplace. But it may establish the conditions necessary to achieve that desired end.

    Allied medical sciences students' experiences with technology: are they digitally literate?

    Objective: The ability to use digital resources is important for medical students. To use these resources, they need the ability to utilize digital technology, which is referred to as digital literacy. However, how effectively students can use these facilities is a question that needs to be addressed. The present study therefore investigated the digital literacy level of students of the Allied medical sciences of Shahid Beheshti University of Medical Sciences. Materials and methods: This cross-sectional study was performed at the Faculty of Allied medical sciences of Shahid Beheshti University of Medical Sciences in the academic year 2016-2017 using a researcher-made questionnaire containing 23 closed questions in four sections. A total of 115 students in three educational grades (bachelor, master, and PhD) were included in this study. A Z-test was used to evaluate the relationship, if any, between internet skills and students' academic achievements. Results: Almost half of the students (51.3%) had not completed any computer courses on basic ICT skills. The findings showed that 41.2% of PhD students were aware of the concept of digital literacy; meanwhile, only 11% of bachelor students and 20.6% of master students knew the actual meaning of this concept. Public search engines were a favored option for finding specialized terminology at all grades. Furthermore, there was a significant difference between the level of familiarity with the Internet and the students' grade (p≤0.05). Conclusion: Digital literacy training courses can enhance digital literacy skills significantly. Most students agreed with the inclusion of digital literacy courses in their curriculum. Therefore, they should be supported by educators and librarians in order to use the Internet and information technology effectively, and to overcome the problems of finding and using information to gain academic achievement.