7,644 research outputs found
Agents, Bookmarks and Clicks: A topical model of Web traffic
Analysis of aggregate and individual Web traffic has shown that PageRank is a
poor model of how people navigate the Web. Using the empirical traffic patterns
generated by a thousand users, we characterize several properties of Web
traffic that cannot be reproduced by Markovian models. We examine both
aggregate statistics capturing collective behavior, such as page and link
traffic, and individual statistics, such as entropy and session size. No model
currently explains all of these empirical observations simultaneously. We show
that all of these traffic patterns can be explained by an agent-based model
that takes into account several realistic browsing behaviors. First, agents
maintain individual lists of bookmarks (a non-Markovian memory mechanism) that
are used as teleportation targets. Second, agents can retreat along visited
links, a branching mechanism that also allows us to reproduce behaviors such as
the use of a back button and tabbed browsing. Finally, agents are sustained by
visiting novel pages of topical interest, with adjacent pages being more
topically related to each other than distant ones. This modulates the
probability that an agent continues to browse or starts a new session, allowing
us to recreate heterogeneous session lengths. The resulting model is capable of
reproducing the collective and individual behaviors we observe in the empirical
data, reconciling the narrowly focused browsing patterns of individual users
with the extreme heterogeneity of aggregate traffic measurements. This result
allows us to identify a few salient features that are necessary and sufficient
to interpret the browsing patterns observed in our data. In addition to the
descriptive and explanatory power of such a model, our results may lead the way
to more sophisticated, realistic, and effective ranking and crawling
algorithms.Comment: 10 pages, 16 figures, 1 table - Long version of paper to appear in
Proceedings of the 21th ACM conference on Hypertext and Hypermedi
Large-Scale Web Traffic Log Analyzer using Cloudera Impala on Hadoop Distributed File System
AbstractResource planning and data analysis are important for network services in order to increase the service efficiency. Nowadays, Large websites or web servers have a large number of visitors, which mean a large web traffic log need to be stored in the plain text or the relational database. However plain text and relational database are not efficient to handle a large number of data. Moreover, the web traffic log analysis hardware or software that can handle such a big data is also expensive. This research paper proposes the design of a large-scale web traffic log analyzer using PHP language to show the visitors' traffic data analysis in the form of charts. The Hadoop Distributed File System (HDFS) is used in conjunction with other related techniques to gather and store visitors' traffic log. Cloudera Impala is used to query web traffic log stored in HDFS while Apache Thrift is an intermediary connecting Cloudera Impala to PHP web. Upon testing our large-scale web traffic log analyzer on HDFS Cluster of 8 nodes with 50 gigabytes of traffic log, our system can query and analysis web traffic log then display the result in about 4 second
Realistic Traffic Generation for Web Robots
Critical to evaluating the capacity, scalability, and availability of web
systems are realistic web traffic generators. Web traffic generation is a
classic research problem, no generator accounts for the characteristics of web
robots or crawlers that are now the dominant source of traffic to a web server.
Administrators are thus unable to test, stress, and evaluate how their systems
perform in the face of ever increasing levels of web robot traffic. To resolve
this problem, this paper introduces a novel approach to generate synthetic web
robot traffic with high fidelity. It generates traffic that accounts for both
the temporal and behavioral qualities of robot traffic by statistical and
Bayesian models that are fitted to the properties of robot traffic seen in web
logs from North America and Europe. We evaluate our traffic generator by
comparing the characteristics of generated traffic to those of the original
data. We look at session arrival rates, inter-arrival times and session
lengths, comparing and contrasting them between generated and real traffic.
Finally, we show that our generated traffic affects cache performance similarly
to actual traffic, using the common LRU and LFU eviction policies.Comment: 8 page
Point Process Models of 1/f Noise and Internet Traffic
We present a simple model reproducing the long-range autocorrelations and the
power spectrum of the web traffic. The model assumes the traffic as Poisson
flow of files with size distributed according to the power-law. In this model
the long-range autocorrelations are independent of the network properties as
well as of inter-packet time distribution.Comment: 6 pages, 2 figures, CNET2004 Proceedings AI
Smart Search: A Firefox Add-On to Compute a Web Traffic Ranking
Search engines results are typically ordered according to some notion of importance of a web page as well as relevance of the content of a web page to a query. Web page importance is usually calculated based on some graph theoretic properties of the web. Another common technique to measure page importance is to make use of the traffic that goes to a particular web page as measured by a browser toolbar. Currently, there are some traffic ranking tools available like www.alexa.com, www.ranking.com, www.compete.com that give such analytic as to the number of users who visit a web site. Alexa provides the traffic rank for a website based on two factors: The number of users that view a website and the number of pages viewed. The Alexa toolbar is not open-source.The main goal of our project was to create a Smart Search Firefox add-on for the Yioop search engine, an open source search engine developed by my project advisor, Dr. Chris Pollett. This add-on would provide similar analytic data to the Yioop search engine, but in a transparent and open-source way. With the results received from the Smart Search toolbar extension, the Yioop search engine refines the search results as well as provides user centric-search results. Eventually, users would benefit from these better search results
Web Traffic Time Series Forecasting
Online web traffic forecasting is one of the most crucial elements of maintaining and improving websites and digital platforms. Traffic patterns usually predict future online traffic, including page views, unique visitors, session duration, and bounce rates. However, it is challenging to forecast non-stationary online web traffic, particularly when the data has spikes or irregular patterns. This non-stationary property demands a more advanced forecasting technique. In this study, we provide a neural networkbased method, Spiking Neural Networks (SNNs), for dealing with the data spikes and irregular patterns in non-stationary data. In our study, we compared the forecasting results of SNNs with traditional and popular time-series prediction methods like Long Short-Term Memory (LSTM) networks and Seasonal AutoRegressive Integrated Moving Average with exogenous variables (SARIMAX). The evaluation was based on prediction error metrics such as Mean Square Error (RMSE) and the Mean Absolute Error (MAE). Our results found that SNNs worked better in forecasting the non-stationary web traffic data when compared to the traditional methods. This effective forecasting technique by SNNs can be crucial in sectors like e-commerce and digital marketing, where accurately predicting the traffic helps optimize resources and improve digital strategies
- âĶ