143,574 research outputs found
Smart Search: A Firefox Add-On to Compute a Web Traffic Ranking
Search engines results are typically ordered according to some notion of importance of a web page as well as relevance of the content of a web page to a query. Web page importance is usually calculated based on some graph theoretic properties of the web. Another common technique to measure page importance is to make use of the traffic that goes to a particular web page as measured by a browser toolbar. Currently, there are some traffic ranking tools available like www.alexa.com, www.ranking.com, www.compete.com that give such analytic as to the number of users who visit a web site. Alexa provides the traffic rank for a website based on two factors: The number of users that view a website and the number of pages viewed. The Alexa toolbar is not open-source.The main goal of our project was to create a Smart Search Firefox add-on for the Yioop search engine, an open source search engine developed by my project advisor, Dr. Chris Pollett. This add-on would provide similar analytic data to the Yioop search engine, but in a transparent and open-source way. With the results received from the Smart Search toolbar extension, the Yioop search engine refines the search results as well as provides user centric-search results. Eventually, users would benefit from these better search results
Query independent measures of annotation and annotator impact
The modern-day web-user plays a far more active role in the creation of content for the web as a whole. In this paper we present Annoby, a free-text annotation system built to give users a more interactive experience of the events of the Rugby World Cup 2007. Annotations can be used for query-independent ranking of both the annotations and the original recorded video footage (or documents) which has been annotated, based on the social interactions of a community of users. We present two algorithms, AuthorRank and MessageRank, designed to take advantage of these interactions so as to provide a means of ranking documents by their social impact
From Linked Data to Relevant Data -- Time is the Essence
The Semantic Web initiative puts emphasis not primarily on putting data on
the Web, but rather on creating links in a way that both humans and machines
can explore the Web of data. When such users access the Web, they leave a trail
as Web servers maintain a history of requests. Web usage mining approaches have
been studied since the beginning of the Web given the log's huge potential for
purposes such as resource annotation, personalization, forecasting etc.
However, the impact of any such efforts has not really gone beyond generating
statistics detailing who, when, and how Web pages maintained by a Web server
were visited.Comment: 1st International Workshop on Usage Analysis and the Web of Data
(USEWOD2011) in the 20th International World Wide Web Conference (WWW2011),
Hyderabad, India, March 28th, 201
A Relation-Based Page Rank Algorithm for Semantic Web Search Engines
With the tremendous growth of information available to end users through the Web, search engines come to play ever a more critical role. Nevertheless, because of their general-purpose approach, it is always less uncommon that obtained result sets provide a burden of useless pages. The next-generation Web architecture, represented by the Semantic Web, provides the layered architecture possibly allowing overcoming this limitation. Several search engines have been proposed, which allow increasing information retrieval accuracy by exploiting a key content of Semantic Web resources, that is, relations. However, in order to rank results, most of the existing solutions need to work on the whole annotated knowledge base. In this paper, we propose a relation-based page rank algorithm to be used in conjunction with Semantic Web search engines that simply relies on information that could be extracted from user queries and on annotated resources. Relevance is measured as the probability that a retrieved resource actually contains those relations whose existence was assumed by the user at the time of query definitio
A Brief History of Web Crawlers
Web crawlers visit internet applications, collect data, and learn about new
web pages from visited pages. Web crawlers have a long and interesting history.
Early web crawlers collected statistics about the web. In addition to
collecting statistics about the web and indexing the applications for search
engines, modern crawlers can be used to perform accessibility and vulnerability
checks on the application. Quick expansion of the web, and the complexity added
to web applications have made the process of crawling a very challenging one.
Throughout the history of web crawling many researchers and industrial groups
addressed different issues and challenges that web crawlers face. Different
solutions have been proposed to reduce the time and cost of crawling.
Performing an exhaustive crawl is a challenging question. Additionally
capturing the model of a modern web application and extracting data from it
automatically is another open question. What follows is a brief history of
different technique and algorithms used from the early days of crawling up to
the recent days. We introduce criteria to evaluate the relative performance of
web crawlers. Based on these criteria we plot the evolution of web crawlers and
compare their performanc
Characterizing and modeling the dynamics of online popularity
Online popularity has enormous impact on opinions, culture, policy, and
profits. We provide a quantitative, large scale, temporal analysis of the
dynamics of online content popularity in two massive model systems, the
Wikipedia and an entire country's Web space. We find that the dynamics of
popularity are characterized by bursts, displaying characteristic features of
critical systems such as fat-tailed distributions of magnitude and inter-event
time. We propose a minimal model combining the classic preferential popularity
increase mechanism with the occurrence of random popularity shifts due to
exogenous factors. The model recovers the critical features observed in the
empirical analysis of the systems analyzed here, highlighting the key factors
needed in the description of popularity dynamics.Comment: 5 pages, 4 figures. Modeling part detailed. Final version published
in Physical Review Letter
- …