"A Two-Stage Prediction Model for Web Page Transition"
Utilizing data from a log file, we propose a two-stage model for step-ahead web page prediction that permits adaptive page customization in real time. The first stage predicts a viewer's next page based on a variant of a Markov transition matrix computed from the page sequences of other visitors who read the same pages as that viewer did thus far. The second stage re-analyzes the incorrect exit/continuation predictions of the first stage through data mining, incorporating the visitor's viewing behavior observed from the log file. The two-stage process takes advantage of the robust, theory-driven nature of statistical modeling for extracting the overall features of the data, and the flexible, data-driven nature of data mining for capturing any idiosyncrasies and complications unresolved in the first stage. The empirical result with a test site suggests that the first stage alone is sufficiently accurate (50.3%) in predicting page transitions. Prediction of site exit was even better, with 100% of the exit and 90.8% of the continuation predictions being correct. The result was compared against other models for predictive accuracy.
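The first stage's transition-matrix idea can be sketched as a plain first-order Markov model estimated from logged sessions. This is a simplification: the paper's variant conditions on visitors who share the viewer's page history, and the session data and function name here are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sessions extracted from a server log: each is an ordered
# list of page IDs viewed by one visitor; "EXIT" marks leaving the site.
sessions = [
    ["home", "products", "pricing", "EXIT"],
    ["home", "blog", "products", "pricing", "EXIT"],
    ["home", "products", "EXIT"],
]

# Count observed transitions page -> next page (a first-order Markov model).
counts = defaultdict(Counter)
for session in sessions:
    for cur, nxt in zip(session, session[1:]):
        counts[cur][nxt] += 1

def predict_next(page):
    """Return the most likely next page and its estimated probability."""
    total = sum(counts[page].values())
    nxt, n = counts[page].most_common(1)[0]
    return nxt, n / total

page, prob = predict_next("products")  # "pricing" with probability 2/3
```

Exit prediction falls out of the same matrix: "EXIT" is just another state, so the continuation-vs-exit decision compares the transition mass into "EXIT" against everything else.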
Using Markov Chains for link prediction in adaptive web sites
The large number of Web pages on many Web sites has raised navigational problems. Markov chains have recently been used to model user navigational behavior on the World Wide Web (WWW). In this paper, we propose a method for constructing a Markov model of a Web site based on past visitor behavior. We use the Markov model to make link predictions that assist new users to navigate the Web site. An algorithm for transition probability matrix compression has been used to cluster Web pages with similar transition behaviors and compress the transition matrix to an optimal size for efficient probability calculation in link prediction. A maximal forward path method is used to further improve the efficiency of link prediction. Link prediction has been implemented in an online system called ONE (Online Navigation Explorer) to assist users' navigation in the adaptive Web site.
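The maximal forward path method mentioned above has a standard formulation: each backward move in a session (revisiting a page already on the current path) closes one maximal forward reference. A minimal sketch, with function name and sample session of my own choosing:

```python
def maximal_forward_paths(session):
    """Split a session into maximal forward references: a backward
    move (revisiting a page on the current path) closes one forward
    path and truncates the walk back to the revisited page."""
    paths, path = [], []
    forward = True
    for page in session:
        if page in path:                  # backward reference
            if forward:
                paths.append(list(path))  # the path so far was maximal
            path = path[: path.index(page) + 1]
            forward = False
        else:
            path.append(page)
            forward = True
    if forward and path:
        paths.append(path)
    return paths

# A-B-C, back to B, then D yields two forward paths.
paths = maximal_forward_paths(["A", "B", "C", "B", "D"])
```

Filtering sessions down to forward paths removes backtracking noise before transition probabilities are estimated, which is where the claimed efficiency gain comes from.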
Web Site Personalization based on Link Analysis and Navigational Patterns
The continuous growth in the size and use of the World Wide Web imposes new methods of design and development of on-line information services. The need for predicting users' needs in order to improve the usability and user retention of a web site is more than evident and can be addressed by personalizing it. Recommendation algorithms aim at proposing "next" pages to users based on their current visit and past users' navigational patterns. In the vast majority of related algorithms, however, only the usage data are used to produce recommendations, disregarding the structural properties of the web graph. Thus, pages that are important in terms of PageRank authority score may be underrated. In this work we present UPR, a PageRank-style algorithm which combines usage data and link analysis techniques for assigning probabilities to web pages based on their importance in the web site's navigational graph. We propose the application of a localized version of UPR (l-UPR) to personalized navigational sub-graphs for online web page ranking and recommendation. Moreover, we propose a hybrid probabilistic predictive model that combines Markov models with link analysis for assigning prior probabilities. We prove, through experimentation, that this approach results in more objective and representative predictions than the ones produced by pure usage-based approaches.
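A PageRank-style score driven by usage data, in the spirit of UPR, can be sketched with ordinary power iteration where transition probabilities come from click counts rather than uniform out-link weights. The graph, the counts, and the omission of UPR's structural blending are all simplifications here:

```python
# Power-iteration sketch: rank mass flows along edges in proportion
# to observed click counts (illustrative data, not the paper's).
damping = 0.85
usage = {  # page -> {next page: observed click count}
    "home": {"products": 8, "blog": 2},
    "products": {"pricing": 5, "home": 1},
    "blog": {"products": 3},
    "pricing": {"home": 2},
}
pages = list(usage)
rank = {p: 1 / len(pages) for p in pages}
for _ in range(50):
    new = {p: (1 - damping) / len(pages) for p in pages}
    for p, outs in usage.items():
        total = sum(outs.values())
        for q, c in outs.items():
            new[q] += damping * rank[p] * c / total
    rank = new
```

Because every page here has out-links, the scores remain a probability distribution; heavily clicked pages accumulate more rank than structurally equivalent but rarely visited ones.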
Evaluating Variable Length Markov Chain Models for Analysis of User Web Navigation Sessions
Markov models have been widely used to represent and analyse user web navigation data. In previous work we have proposed a method to dynamically extend the order of a Markov chain model and a complementary method for assessing the predictive power of such a variable length Markov chain. Herein, we review these two methods and propose a novel method for measuring the ability of a variable length Markov model to summarise user web navigation sessions up to a given length. While the summarisation ability of a model is important to enable the identification of user navigation patterns, the ability to make predictions is important in order to foresee the next link choice of a user after following a given trail so as, for example, to personalise a web site. We present an extensive experimental evaluation providing strong evidence that prediction accuracy increases linearly with summarisation ability.
Generating dynamic higher-order Markov models in web usage mining
Markov models have been widely used for modelling users' web navigation behaviour. In previous work we have presented a dynamic clustering-based Markov model that accurately represents second-order transition probabilities given by a collection of navigation sessions. Herein, we propose a generalisation of the method that takes into account higher-order conditional probabilities. The method makes use of the state cloning concept together with a clustering technique to separate the navigation paths that reveal differences in the conditional probabilities. We report on experiments conducted with three real world data sets. The results show that some pages require a long history to understand the users' choice of link, while others require only a short history. We also show that the number of additional states induced by the method can be controlled through a probability threshold parameter.
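One simple way to exploit variable-length history, in the spirit of the higher-order models above, is back-off prediction: condition on the longest history suffix observed in training and fall back to shorter suffixes when the long one is unseen. This sketch is illustrative only and is not the paper's state-cloning construction:

```python
from collections import Counter, defaultdict

MAX_ORDER = 3  # longest history suffix to condition on

def train(sessions):
    """Count next-page frequencies for every history suffix up to MAX_ORDER."""
    model = defaultdict(Counter)
    for s in sessions:
        for i in range(1, len(s)):
            for k in range(1, MAX_ORDER + 1):
                if i - k >= 0:
                    model[tuple(s[i - k:i])][s[i]] += 1
    return model

def predict(model, history):
    """Back off from the longest known suffix to shorter ones."""
    for k in range(min(MAX_ORDER, len(history)), 0, -1):
        ctx = tuple(history[-k:])
        if ctx in model:
            return model[ctx].most_common(1)[0][0]
    return None

# Toy sessions where only two-step history disambiguates the next page.
model = train([["a", "b", "c"], ["x", "b", "d"]])
```

The toy data makes the point of the abstract concrete: after "b" alone the next page is ambiguous, while the two-page histories "a, b" and "x, b" resolve it.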
Retrospective Higher-Order Markov Processes for User Trails
Users form information trails as they browse the web, check in with a geolocation, rate items, or consume media. A common problem is to predict what a user might do next for the purposes of guidance, recommendation, or prefetching. First-order and higher-order Markov chains have been widely used to study such sequences of data. First-order Markov chains are easy to estimate, but lack accuracy when history matters. Higher-order Markov chains, in contrast, have too many parameters and suffer from overfitting the training data. Fitting these parameters with regularization and smoothing offers only mild improvements. In this paper we propose the retrospective higher-order Markov process (RHOMP) as a low-parameter model for such sequences. This model is a special case of a higher-order Markov chain where the transitions depend retrospectively on a single history state instead of an arbitrary combination of history states. There are two immediate computational advantages: the number of parameters is linear in the order of the Markov chain, and the model can be fit to large state spaces. Furthermore, by providing a specific structure to the higher-order chain, RHOMPs improve model accuracy by efficiently utilizing history states without the risk of overfitting the data. We demonstrate how to estimate a RHOMP from data and demonstrate the effectiveness of our method on various real application datasets spanning geolocation data, review sequences, and business locations. The RHOMP model uniformly outperforms higher-order Markov chains, Kneser-Ney regularization, and tensor factorizations in terms of prediction accuracy.
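The retrospective structure can be sketched as a weighted choice of a single history state, each lag acting through its own first-order transition matrix. The matrices and weights below are toy values; the paper estimates them from data:

```python
import numpy as np

# Toy second-order RHOMP over 3 states (all values invented).
P1 = np.array([[0.7, 0.2, 0.1],   # transitions given the current state
               [0.1, 0.8, 0.1],
               [0.3, 0.3, 0.4]])
P2 = np.array([[0.5, 0.4, 0.1],   # transitions given the state one step back
               [0.2, 0.6, 0.2],
               [0.1, 0.1, 0.8]])
alpha = np.array([0.6, 0.4])      # probability of "looking back" 1 or 2 steps

def next_distribution(current, previous):
    """P(next) = alpha_1 * P1[current] + alpha_2 * P2[previous]:
    the walker retrospectively picks ONE history state to condition on."""
    return alpha[0] * P1[current] + alpha[1] * P2[previous]

dist = next_distribution(current=1, previous=0)  # a valid distribution
```

The parameter count is what the abstract promises: two n-by-n matrices plus two weights, linear in the order, instead of the n-cubed table a full second-order chain would need.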
edge2vec: Representation learning using edge semantics for biomedical knowledge discovery
Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology to richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real-world biomedical problems. In this paper, we propose the edge2vec model, which represents graphs considering edge semantics. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embeddings on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by incorporating edge types into node embedding learning in heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks. We propose this method for its added value relative to existing graph analytical methodology, and in the real-world context of applicability to biomedical knowledge discovery.
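The role of the edge-type transition matrix can be illustrated with a random walker that biases its next edge type on the type it arrived by. The toy graph and the hand-set matrix M below stand in for the EM-trained matrix of the paper; walks generated this way would then feed a skip-gram-style embedding step.

```python
import random

# Toy heterogeneous biomedical graph: each directed edge carries a type.
graph = {
    "geneA": [("drugX", "targets"), ("geneB", "coexpressed")],
    "geneB": [("drugX", "targets")],
    "drugX": [("diseaseY", "treats")],
    "diseaseY": [],
}
M = {  # P(next edge type | previous edge type), illustrative values
    "coexpressed": {"coexpressed": 0.2, "targets": 0.7, "treats": 0.1},
    "targets":     {"coexpressed": 0.1, "targets": 0.2, "treats": 0.7},
    "treats":      {"coexpressed": 0.4, "targets": 0.4, "treats": 0.2},
}

def walk(start, length, seed=0):
    """Random walk whose edge choice is weighted by M given the
    edge type used on the previous step (uniform on the first step)."""
    rng = random.Random(seed)
    node, prev_t, path = start, None, [start]
    for _ in range(length):
        edges = graph[node]
        if not edges:
            break
        if prev_t is None:
            nxt, t = rng.choice(edges)
        else:
            weights = [M[prev_t][t] for _, t in edges]
            nxt, t = rng.choices(edges, weights=weights)[0]
        node, prev_t = nxt, t
        path.append(node)
    return path

p = walk("geneA", 3)
```

Compared with a homogeneous-graph walker such as node2vec's, the only change is the M-weighted edge choice, which is exactly where the edge semantics enter.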
Social Dynamics of Digg
Online social media provide multiple ways to find interesting content. One important method is highlighting content recommended by a user's friends. We examine this process on one such site, the news aggregator Digg. With a stochastic model of user behavior, we distinguish the effects of content visibility and interestingness to users. We find a wide range of interest and distinguish stories primarily of interest to a user's friends from those of interest to the entire user community. We show how this model predicts a story's eventual popularity from users' early reactions to it, and estimate the prediction reliability. This modeling framework can help evaluate alternative design choices for displaying content on the site.