    Automatically identifying coherent web sessions from browser logs

    Due to the increasing diversity in both users' behaviour and the types of web tasks they perform, many studies in information retrieval (IR) are turning towards session-based retrieval rather than single URL-query pairs. However, extracting meaningful session data from raw, discrete logs is still a significant challenge. Most prior studies have been based on datasets where the logs of each user's web history were simply divided by fixed periods of inactivity, such as 5, 15, or 30 minutes [52,31]. There have also been some attempts to go beyond these simplistic fixed timeouts [91], but rather than covering all web activities, they focus on search-related activities only. Consequently, a meaningful way is needed to cluster all activities in a web browser, including both searching and browsing. The goal of this study is to find a better way to automatically segment users' web activity into sessions. There are three research stages: 1) how people conceive of sessions, i.e. their mental model of session segmentation, 2) how these self-identified sessions appear in practically implemented weblogs, and 3) how these sessions can be identified algorithmically from browser activity, and how each algorithm performs. To answer these questions, a qualitative study was first conducted, and a taxonomy of six factors related to user-identified sessions was generated. Next, a Chrome Extension was built that recorded user-identified sessions in practice, together with comprehensive web logs covering both user interactions and visit details. This provided a ground-truth dataset to support further evaluation. Finally, several algorithmic approaches for automatically clustering web activities into groups closer to user-identified sessions were evaluated.
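
    The fixed-timeout baseline described above can be sketched as follows. This is a minimal illustration, not the study's method: the function name `split_by_timeout` and the sample visit times are hypothetical, and it assumes each user's visit timestamps have already been extracted from the browser log.

```python
from datetime import datetime, timedelta


def split_by_timeout(timestamps, timeout=timedelta(minutes=30)):
    """Split one user's visit timestamps into sessions (hypothetical helper).

    A new session starts whenever the gap since the previous visit exceeds
    `timeout`, i.e. a fixed inactivity threshold such as 5, 15, or 30 minutes.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)  # still within the inactivity window
        else:
            sessions.append([ts])  # gap too long (or first visit): new session
    return sessions


if __name__ == "__main__":
    base = datetime(2024, 1, 1, 9, 0)
    offsets = [0, 2, 10, 30, 35, 120]  # hypothetical minutes after 09:00
    visits = [base + timedelta(minutes=m) for m in offsets]
    for minutes in (5, 15, 30):
        sessions = split_by_timeout(visits, timedelta(minutes=minutes))
        print(minutes, [len(s) for s in sessions])
    # prints: 5 [2, 1, 2, 1] / 15 [3, 2, 1] / 30 [5, 1]
```

    Raising the timeout from 5 to 30 minutes turns the same six visits from four sessions into two, which illustrates why a single fixed threshold is a fragile basis for segmentation.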

    Rough Sets Clustering and Markov model for Web Access Prediction

    Discovering user access patterns from web access logs is increasingly important for building adaptive web servers tailored to each user's behavior. The variety of user behaviors in accessing information is also growing, which has a great impact on network utilization. In this paper, we present a rough set clustering method to cluster web transactions from web access logs, and use a Markov model for next-access prediction. Using this approach, users can effectively mine web log records to discover and predict access patterns. We perform experiments using real web trace logs collected from the www.dusit.ac.th servers. To improve the prediction ratio, the model includes a rough sets scheme in which a similarity measure computes the similarity between two sequences using the upper approximation.
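
    A plain first-order Markov next-access predictor, the component the abstract pairs with clustering, might look like the sketch below. It assumes sessions are already available as ordered lists of page identifiers and omits the rough-set clustering and upper-approximation similarity step entirely; `fit_markov`, `predict_next`, and the sample pages are hypothetical.

```python
from collections import Counter, defaultdict


def fit_markov(sessions):
    """Count first-order transitions (current page -> next page) over all sessions."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, page):
    """Return (most likely next page, estimated probability), or None if `page`
    was never followed by another page in the training sessions."""
    counts = transitions.get(page)  # .get does not insert into the defaultdict
    if not counts:
        return None
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())


if __name__ == "__main__":
    sessions = [  # hypothetical sessions of page identifiers
        ["home", "news", "sport"],
        ["home", "news", "weather"],
        ["home", "news", "sport"],
        ["home", "about"],
    ]
    model = fit_markov(sessions)
    print(predict_next(model, "home"))   # ('news', 0.75)
    print(predict_next(model, "news"))   # ('sport', 0.6666666666666666)
    print(predict_next(model, "sport"))  # None
```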

    Web User Session Characterization via Clustering Techniques

    We focus on the identification and definition of "Web user-sessions", an aggregation of several TCP connections generated by the same source host, grouped on the basis of TCP connection opening times. Identifying a user session is non-trivial: traditional approaches rely on threshold-based mechanisms, which are very sensitive to the value assumed for the threshold, and that value may be difficult to set correctly. By applying clustering techniques, we define a novel methodology to identify Web user-sessions without requiring an a priori definition of threshold values. We analyze the characteristics of user sessions extracted from real traces, studying the statistical properties of the identified sessions. The study shows that Web user-session arrivals tend to be Poisson, although correlation may arise during periods of anomalous network or host operation.
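
    For contrast with a hand-picked threshold, one simple way to derive a boundary from the data itself is to cluster the inter-arrival gaps. The sketch below uses 1-D k-means with k = 2 on log-scaled gaps; this is a swapped-in illustration, not the clustering methodology of the paper, and `two_means_threshold`, `count_sessions`, and the sample gaps are hypothetical.

```python
import math


def two_means_threshold(gaps, max_iter=100):
    """Pick a session-boundary gap from the data instead of fixing it a priori.

    Runs 1-D k-means with k = 2 on log(gap) to separate short (intra-session)
    gaps from long (inter-session) gaps, then returns the midpoint between the
    two cluster centroids, mapped back to the original time unit.
    """
    logs = sorted(math.log(g) for g in gaps)  # gaps must be > 0
    lo, hi = logs[0], logs[-1]  # start the centroids at the extremes
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        short = [x for x in logs if x <= mid]
        long_ = [x for x in logs if x > mid]
        if not short or not long_:
            break  # all gaps look alike: no split to refine
        new_lo, new_hi = sum(short) / len(short), sum(long_) / len(long_)
        if (new_lo, new_hi) == (lo, hi):
            break  # assignments stable: converged
        lo, hi = new_lo, new_hi
    return math.exp((lo + hi) / 2)


def count_sessions(gaps, threshold):
    """Number of sessions when every gap above `threshold` starts a new one."""
    return 1 + sum(g > threshold for g in gaps)


if __name__ == "__main__":
    # hypothetical gaps (seconds) between consecutive connection openings
    gaps = [1, 2, 1.5, 3, 600, 2, 1, 900, 2.5]
    t = two_means_threshold(gaps)
    print(round(t, 1), count_sessions(gaps, t))
```

    Working on log-scaled gaps keeps a few very long idle periods from dragging the boundary towards them, since intra-session and inter-session gaps often differ by orders of magnitude.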

    J2EE application for clustered servers : focus on balancing workloads among clustered servers : a thesis presented in partial fulfilment of the requirements for the degree of Master of Information Science in Computer Science at Massey University, Albany, New Zealand

    J2EE has become a de facto platform for developing enterprise applications, not only because of its standards-based methodology but also because it reduces the cost and complexity of developing multi-tier enterprise applications. J2EE-based application servers keep business logic separate from front-end (client-side) applications and back-end database servers. The standardized components and containers simplify J2EE application design. The containers automatically manage fundamental system-level services for their components, which lets component design focus on business requirements and business logic. This study applies the latest J2EE technologies to configure an online benchmark enterprise application, MG Project. The application focuses on three types of component design: servlets, entity beans, and session beans. Servlets run on the Tomcat web server; EJB components (session beans and entity beans) run on the JBoss application server; and the database runs on the PostgreSQL database server. This benchmark application is used to test the performance of clustered JBoss under various load-balancing policies applied at the EJB level. This research also studies the effect of these load-balancing policies on the performance of clustered JBoss. In addition to the four built-in load-balancing policies, i.e. First Available, First Available Identical All Proxies, Random Robin, and Round Robin, the study extends the JBoss load-balance policy interface to design two dynamic load-balancing policies: a dynamic policy and a dynamic weight-based policy. The purpose of these dynamic policies is to minimize response time and obtain better performance by dispatching incoming requests to the appropriate server. However, a more accurate policy usually means more communication and calculation, which places an extra burden on a heavily loaded application server and can lead to drops in performance.
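
    The difference between a static and a load-aware policy can be illustrated in a few lines. This Python sketch is not JBoss code and does not use the JBoss API: `RoundRobin` mimics the rotating behaviour of the built-in Round Robin policy, while `WeightedByLoad` is a hypothetical dynamic weight-based policy that assumes each server periodically reports a numeric load value.

```python
import itertools
import random


class RoundRobin:
    """Static policy: hand requests to servers in a fixed rotating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def choose(self):
        return next(self._cycle)


class WeightedByLoad:
    """Hypothetical dynamic weight-based policy: choose a server at random with
    probability proportional to 1 / (1 + its last reported load)."""

    def __init__(self, servers, rng=None):
        self.servers = list(servers)
        self.load = {s: 0.0 for s in self.servers}  # no report yet: treated as idle
        self.rng = rng if rng is not None else random.Random()

    def report_load(self, server, load):
        # In a real cluster each update costs network traffic and computation.
        self.load[server] = load

    def choose(self):
        weights = [1.0 / (1.0 + self.load[s]) for s in self.servers]
        return self.rng.choices(self.servers, weights=weights, k=1)[0]


if __name__ == "__main__":
    rr = RoundRobin(["node1", "node2", "node3"])
    print([rr.choose() for _ in range(4)])  # ['node1', 'node2', 'node3', 'node1']

    wl = WeightedByLoad(["node1", "node2"], rng=random.Random(42))
    wl.report_load("node1", 9.0)  # busy: weight 0.1
    wl.report_load("node2", 0.0)  # idle: weight 1.0
    picks = [wl.choose() for _ in range(1000)]
    print(picks.count("node2") / len(picks))  # expected share 1 / 1.1, about 0.91
```

    Each call to `report_load` stands for the extra communication the abstract warns about: fresher load data allows better dispatch decisions, but adds overhead on servers that may already be busy.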

    Extracting Hierarchies of Search Tasks & Subtasks via a Bayesian Nonparametric Approach

    A significant number of search queries originate from real-world information needs or tasks. To improve the search experience of end users, it is important to have accurate representations of tasks. As a result, a significant amount of research has been devoted to extracting proper representations of tasks, in order to enable search systems to help users complete their tasks and to provide better query suggestions, better recommendations, satisfaction prediction, and improved task-based personalization. Most existing task extraction methodologies represent tasks as flat structures. However, tasks often have multiple associated subtasks, and a more natural representation would be a hierarchy in which each task can be composed of multiple (sub)tasks. To this end, we propose an efficient Bayesian nonparametric model for extracting hierarchies of such tasks & subtasks. We evaluate our method on real-world query log data through both quantitative and crowdsourced experiments, and highlight the importance of considering task/subtask hierarchies. Comment: 10 pages. Accepted at SIGIR 2017 as a full paper.

    Cross Validation Of Neural Network Applications For Automatic New Topic Identification

    There are recent studies in the literature on automatic topic-shift identification in Web search engine user sessions; however, most of this work applied topic-shift identification algorithms to data logs from a single search engine. The purpose of this study is to cross-validate an artificial neural network application that automatically identifies topic changes in web search engine user sessions, using data logs from different search engines to train and test the neural network. Sample data logs from the Norwegian search engine FAST (currently owned by Overture) and from Excite are used in this study. The findings suggest that it may be possible to successfully identify topic shifts and continuations in the user sessions of a particular search engine using neural networks trained on a different search engine's data log.

    Generating dynamic higher-order Markov models in web usage mining

    Markov models have been widely used for modelling users' web navigation behaviour. In previous work we presented a dynamic clustering-based Markov model that accurately represents the second-order transition probabilities given by a collection of navigation sessions. Herein, we propose a generalisation of the method that takes higher-order conditional probabilities into account. The method uses the state cloning concept together with a clustering technique to separate the navigation paths that reveal differences in the conditional probabilities. We report on experiments conducted with three real-world data sets. The results show that some pages require a long history to understand users' choice of link, while others require only a short history. We also show that the number of additional states induced by the method can be controlled through a probability threshold parameter.
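
    Why some pages need a longer history can be seen by estimating transition probabilities of different orders from the same sessions. The sketch below does plain n-gram counting over hypothetical sessions; it does not implement the paper's clustering-based state cloning, and `transition_probs` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def transition_probs(sessions, order):
    """Estimate P(next page | previous `order` pages) by counting n-grams."""
    counts = defaultdict(Counter)
    for s in sessions:
        for i in range(order, len(s)):
            context = tuple(s[i - order:i])  # the `order` pages before position i
            counts[context][s[i]] += 1
    probs = {}
    for context, ctr in counts.items():
        total = sum(ctr.values())
        probs[context] = {page: n / total for page, n in ctr.items()}
    return probs


if __name__ == "__main__":
    sessions = [["A", "C", "D"], ["B", "C", "E"], ["A", "C", "D"], ["B", "C", "E"]]
    print(transition_probs(sessions, 1)[("C",)])      # {'D': 0.5, 'E': 0.5}
    print(transition_probs(sessions, 2)[("A", "C")])  # {'D': 1.0}
    print(transition_probs(sessions, 2)[("B", "C")])  # {'E': 1.0}
```

    With a first-order context, page C leads to D or E with probability 0.5 each; conditioning on the previous page as well makes the choice deterministic (A, C to D and B, C to E). This is the kind of difference in conditional probabilities that cloning state C is meant to capture.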