A three-year study on the freshness of Web search engine databases
This paper deals with one aspect of the index quality of search engines: index freshness. The purpose is to analyse the update strategies of the major Web search engines Google, Yahoo, and MSN/Live.com. We conducted a test of the
updates of 40 daily updated pages and 30 irregularly updated pages. We used data from a time span of six weeks in the years 2005, 2006, and 2007. We found that the best search engine in terms of up-to-dateness changes over the years and that none of the engines has an ideal solution for index freshness. Frequency distributions for the pages’ ages are skewed, which means that search engines do differentiate between often- and seldom-updated pages. This is confirmed by the difference between the average ages of daily updated pages and our control group of pages. Indexing patterns are often irregular, and there seems to be no clear policy regarding when to revisit Web pages. A major problem identified in our research is the delay in making crawled pages available for searching, which differs from one engine to another.
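The page-age measurement the abstract describes can be sketched as follows. The dates below are invented illustration data, not the study's measurements; the idea is simply that a page's "age" in an index is the gap between a change to the page and the appearance of the updated copy in the index, and a mean well above the median signals the right-skewed distribution the authors report.

```python
# Hypothetical sketch: a page's index "age" is the number of days between
# a change to the page and the date the updated copy shows up in the index.
from datetime import date
from statistics import mean, median

# (page_changed, updated_copy_indexed) pairs -- invented illustration data
observations = [
    (date(2007, 3, 1), date(2007, 3, 2)),
    (date(2007, 3, 1), date(2007, 3, 3)),
    (date(2007, 3, 1), date(2007, 3, 10)),  # a long-tail straggler
]

ages = [(indexed - changed).days for changed, indexed in observations]
print("mean age:", mean(ages), "days; median:", median(ages), "days")
# mean (4) well above median (2): a right-skewed age distribution
```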
What Users See – Structures in Search Engine Results Pages
This paper investigates the composition of search engine results pages. We define what elements the most
popular web search engines use on their results pages (e.g., organic results, advertisements, shortcuts) and to
what degree they are used for popular vs. rare queries. To this end, we sent 500 queries of each type to the
major search engines Google, Yahoo, Live.com and Ask. We count how often the different elements are used by
the individual engines. In total, our study is based on 42,758 elements. Findings include that search engines use
quite different approaches to results page composition; therefore, users see quite different result
sets depending on the search engine and search query used. Organic results still play the major role on the results
pages, but various shortcuts are of some importance, too. Regarding the frequency of particular hosts within the
result sets, we find that all search engines show Wikipedia results quite often, while the other hosts shown depend
on the search engine used. Both Google and Yahoo prefer results from their own offerings (such as YouTube or
Yahoo Answers). Since we used the .com interfaces of the search engines, results may not be valid for other
country-specific interfaces.
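The tallying step the abstract describes can be sketched like this. Engine names aside, the element labels and observation records are invented; the point is only the shape of the analysis, counting element types per engine and comparing their shares:

```python
# Minimal sketch of counting results-page element types per engine.
# The observation data below is invented for illustration.
from collections import Counter

# Each record: (engine, element type observed on one results page)
observations = [
    ("google", "organic"), ("google", "organic"), ("google", "ad"),
    ("google", "shortcut"), ("yahoo", "organic"), ("yahoo", "ad"),
]

counts: dict[str, Counter] = {}
for engine, element in observations:
    counts.setdefault(engine, Counter())[element] += 1

for engine, tally in counts.items():
    total = sum(tally.values())
    shares = {el: round(n / total, 2) for el, n in tally.items()}
    print(engine, shares)  # per-engine share of each element type
```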
A Proposed Architecture for Continuous Web Monitoring Through Online Crawling of Blogs
Being informed in a timely manner of what is posted on the Web can greatly
help psychologists, marketers, and political analysts to understand,
analyse, make decisions, and act appropriately on society's different
needs. The great volume of information on the Web prevents us from
continuously investigating the whole of the Web online. Focusing on
selected blogs limits the working domain and makes online crawling of the
Web feasible. In this article, an architecture is proposed that
continuously crawls the related blogs online, using a focused crawler, and
investigates and analyses the obtained data. Online fetching is driven by
the latest announcements of the ping server machines. A weighted graph is
built around the targeted important key phrases, so that a focused crawler
can fetch the complete texts of the related Web pages based on
the weighted graph.
Comment: 10 pages, 2 figures
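One plausible reading of the weighted-graph idea is a priority-driven fetch order: pages whose text matches more highly weighted key phrases are fetched first. The key phrases, weights, and page texts below are assumptions for illustration, not the paper's actual design; a real system would fetch the URLs announced by ping servers and parse the returned HTML.

```python
# Sketch of the scoring idea behind a focused crawler: pages matching
# more highly weighted key phrases are fetched first. Phrase weights
# and page texts are invented illustration data.
import heapq

phrase_weights = {"election": 3.0, "poll": 2.0, "weather": 0.5}  # assumed

def score(text: str) -> float:
    """Sum the weights of key phrases found in the page text."""
    words = text.lower().split()
    return sum(w for phrase, w in phrase_weights.items() if phrase in words)

# Store (negated score, url) so heapq pops the highest-scoring page first.
frontier: list[tuple[float, str]] = []
pages = {
    "blog.example/a": "election poll results tonight",
    "blog.example/b": "weather report",
}
for url, text in pages.items():
    heapq.heappush(frontier, (-score(text), url))

order = []
while frontier:
    neg_score, url = heapq.heappop(frontier)
    order.append(url)
    print(url, -neg_score)  # fetch in descending order of relevance
```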
Why we need an independent index of the Web
The path to greater diversity, as we have seen, cannot be achieved merely by
hoping for a new search engine, nor will government support for a single
alternative achieve this goal. What is instead required is to create the
conditions that will make establishing such a search engine possible in the
first place. I describe how building and maintaining a proprietary index is the
greatest deterrent to such an undertaking. We must first overcome this
obstacle. Doing so will still not solve the problem of the lack of diversity in
the search engine marketplace. But it may establish the conditions necessary to
achieve that desired end.
Allied medical sciences students' experiences with technology: are they digitally literate?
Objective: The ability to use digital resources is important for medical students. To use digital resources, they need the ability to work with digital technologies, which is referred to as digital literacy. However, how effectively students can actually use these facilities is a question that needs to be addressed. The present study therefore investigated the digital literacy level of students at the Faculty of Allied Medical Sciences of Shahid Beheshti University of Medical Sciences.
Materials and methods: This cross-sectional study was performed at the Faculty of Allied Medical Sciences of Shahid Beheshti University of Medical Sciences in the academic year 2016-2017, using a researcher-made questionnaire containing 23 closed questions in four sections. A total of 115 students at three degree levels (bachelor's, master's, and PhD) were included in the study. A Z-test was used to evaluate whether any relationship exists between internet skills and students' academic achievement.
Results: Almost half of the students (51.3%) had not completed any computer courses on basic ICT skills. The findings showed that 41.2% of PhD students were aware of the digital literacy concept, whereas only 11% of bachelor's students and 20.6% of master's students knew the actual meaning of this concept. The use of general-purpose search engines was the favourite option for finding specialized terminology at all levels. Furthermore, there was a significant difference between the level of familiarity with the Internet and the students' level of study (p≤0.05).
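The Z-test mentioned in the methods can be read as a two-proportion z-test comparing awareness rates between student groups. A minimal sketch follows; the counts and the helper `two_proportion_z` are hypothetical illustrations, not taken from the paper.

```python
# Hedged sketch of a two-proportion z-test, one plausible form of the
# paper's Z-test: compare the share of "aware" students in two groups.
# The counts below are invented for illustration.
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Return (z statistic, two-sided p-value) for H0: p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                     # pooled proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-sided
    return z, p_value

# e.g. 14 of 34 PhD students aware vs. 5 of 45 bachelor's students (invented)
z, p = two_proportion_z(14, 34, 5, 45)
print(f"z = {z:.2f}, p = {p:.4f}")
```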
Conclusion: Digital literacy training courses can enhance digital literacy skills significantly. Most students agreed with the inclusion of digital literacy courses in their curriculum. Therefore, students should be supported by educators and librarians so that they can use the Internet and information technology effectively and overcome the problems of finding and using information needed for academic achievement.