
    Clustering Web Sessions Using Extended General Pages


    Workload characterization and customer interaction at e-commerce web servers

    Electronic commerce servers have a significant presence in today's Internet. Corporations want to maintain high availability, sufficient capacity, and satisfactory performance for their E-commerce Web systems, and want to provide satisfactory services to customers. Workload characterization and the analysis of customers' interactions with Web sites are the bases upon which to analyze server performance, plan system capacity, manage system resources, and personalize services at the Web site. To date, little empirical evidence has been reported that identifies the characteristics of Web workloads for E-commerce systems and the behaviours of customers. This thesis analyzes the Web access logs at public Web sites for three organizations: a car rental company, an IT company, and the Computer Science department of the University of Saskatchewan. In these case studies, the characteristics of Web workloads are explored at the request level, function level, resource level, and session level; customers' interactions with Web sites are analyzed by identifying and characterizing session groups. The main E-commerce Web workload characteristics and performance implications are: i) Requests for dynamic Web objects are an important part of the workload. These requests should be characterized separately since the system processes them differently; ii) Some popular image files, which are embedded in the same Web page, are always requested together. If these files are requested and sent in a bundle, a system can greatly reduce the overhead of processing requests for them; iii) The percentage of requests for each Web page category tends to be stable in the workload when the time scale is large enough. This observation is helpful in forecasting workload composition; iv) The Secure Sockets Layer (SSL) protocol is heavily used, and most Web objects are either requested primarily through SSL or primarily not through SSL; and v) Session groups with different characteristics are identified for all logs. The analysis of session groups may be helpful in improving system performance, maximizing revenue throughput of the system, providing better services to customers, and managing and planning system resources. A hybrid clustering algorithm, which combines the minimum spanning tree method with the k-means clustering algorithm, is proposed to identify session clusters. Session clusters obtained using the three session representations Pages Requested, Navigation Pattern, and Resource Usage are similar enough that the different session representations can be used interchangeably to produce similar groupings. A grouping based on one session representation is therefore believed to be sufficient to answer questions about server performance, resource management, capacity planning, and Web site personalization that previously would have required multiple different groupings. Grouping by Pages Requested is recommended since it is the simplest and data on Web pages requested is relatively easy to obtain from HTTP logs.
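
    The thesis's hybrid clustering algorithm combines the minimum spanning tree (MST) method with k-means. The sketch below shows one way such a hybrid could work, assuming sessions are represented as numeric feature vectors (for example, request counts per page category); the function name and the strategy of cutting the k-1 heaviest MST edges to seed k-means are illustrative assumptions, not the thesis's exact procedure.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans

def mst_kmeans_sessions(X, k):
    """Cluster session vectors X (n_sessions x n_features) into k groups by
    cutting the k-1 heaviest edges of a minimum spanning tree and using the
    resulting components' centroids to seed k-means."""
    dist = squareform(pdist(X))                   # dense pairwise distance matrix
    mst = minimum_spanning_tree(dist).toarray()   # nonzero entries are MST edges

    # Remove the k-1 heaviest MST edges to obtain k preliminary components.
    edges = np.argwhere(mst > 0)
    weights = mst[mst > 0]
    for i, j in edges[np.argsort(weights)[-(k - 1):]]:
        mst[i, j] = 0.0

    # Centroids of the components become the initial k-means centres.
    _, labels = connected_components(mst, directed=False)
    seeds = np.vstack([X[labels == c].mean(axis=0) for c in range(k)])
    return KMeans(n_clusters=k, init=seeds, n_init=1).fit_predict(X)

# Toy sessions described by request counts for three page categories.
sessions = np.array([[5, 0, 1], [4, 1, 0], [0, 6, 2], [1, 5, 3], [0, 0, 9]], dtype=float)
print(mst_kmeans_sessions(sessions, k=3))
```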

    User-adaptive website with information palettes

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 75-76). The majority of existing websites on the Internet do not adapt to the individual user. Instead, they serve the same static content, created beforehand, to everyone who visits the site. However, it has been shown that different people have different cognitive styles, or preferred ways in which they think, perceive information, and solve problems. Each cognitive style calls for a certain type of information presented in a certain way. In this thesis, I design and implement a framework for creating user-adaptive websites that can infer a user's cognitive style from the webpages he or she visits and serve adaptive information palettes with content suited for that cognitive style. Specifically, the system first assigns ratings to each webpage, defining how each one rates along a set of cognitive style dimensions. It then tracks a user's session on a website, compares it to sessions of past users, clusters similar sessions together, and computes the likely cognitive style of the user using a weighted average of the ratings of the webpages in the user's current session and in the cluster. I implemented this system as a customer advocacy website for General Motors. The website successfully infers users' cognitive styles and displays suitable information palettes. By Qiuyuan Jimmy Li, M.Eng.
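
    The inference step described above computes a weighted average of webpage ratings from the current session and from the cluster of similar past sessions. A minimal sketch of that idea follows, assuming each page carries a rating vector along a set of cognitive style dimensions; the dimension names, rating values, and the fixed session/cluster weighting are illustrative, not taken from the thesis.

```python
from typing import Dict, List

# Hypothetical cognitive-style dimensions; the ratings used below are illustrative only.
DIMENSIONS = ["analytic_vs_holistic", "verbal_vs_visual"]

def infer_cognitive_style(session_pages: List[str],
                          cluster_pages: List[str],
                          page_ratings: Dict[str, List[float]],
                          session_weight: float = 0.7) -> List[float]:
    """Estimate a user's cognitive style as a weighted average of the ratings of
    pages in the current session and pages seen by similar past sessions."""
    def mean_rating(pages):
        rated = [page_ratings[p] for p in pages if p in page_ratings]
        if not rated:
            return [0.0] * len(DIMENSIONS)
        return [sum(vals) / len(rated) for vals in zip(*rated)]

    session_avg = mean_rating(session_pages)
    cluster_avg = mean_rating(cluster_pages)
    return [session_weight * s + (1 - session_weight) * c
            for s, c in zip(session_avg, cluster_avg)]

# Toy usage with made-up page ratings on each dimension (range -1..1).
ratings = {"/specs": [0.8, 0.6], "/gallery": [-0.5, -0.9], "/reviews": [0.2, 0.4]}
print(infer_cognitive_style(["/specs", "/reviews"], ["/specs", "/gallery"], ratings))
```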

    Web structure mining of dynamic pages

    Restricting web structure mining to static web content decreases the accuracy of mined outcomes and degrades the quality of decision-making. By mining the structure of hidden web data, the accuracy of mined outcomes can be improved, thus enhancing the reliability and quality of decision making. Data mining is the automated or semi-automated exploration and analysis of large volumes of data in order to reveal meaningful patterns. Web mining is the discovery and analysis of useful information from the World Wide Web; it helps web search engines find high-quality web pages and enhances web clickstream analysis. One branch of web mining is web structure mining, the goal of which is to generate a structural summary of a Web site and its Web pages; it tries to discover the link structure of hyperlinks at the inter-document level. In recent years, Web link structure mining has been widely used to infer important information about Web pages. However, a major part of the web is in hidden form, also called the Deep Web or Hidden Web, which refers to documents on the Web that are dynamic and not accessible by general search engines; most search engine spiders can access only the publicly indexable Web (the visible Web). Most documents in the hidden Web, including pages hidden behind search forms, specialized databases, and dynamically generated Web pages, are not accessible to general Web mining applications. Dynamic content generation is used in modern web pages, and user forms are used to collect information from a particular user and store it in a database. The link structure lying behind these forms cannot be accessed during conventional mining procedures. To access these links, user forms are filled automatically by a rule-based framework that can read a web page containing dynamic content such as ActiveX controls, input boxes, command buttons, and combo boxes. After reading these controls, dummy values are filled into the available fields and the doGet or doPost methods are automatically executed to acquire the link of the next subsequent web page. The accuracy of web page hierarchical structures can be substantially improved by including these hidden web pages in the process of Web structure mining. The designed system framework is robust enough to process dynamic Web pages along with static ones.
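
    The rule-based framework described above reads a page's form controls, fills dummy values, and executes doGet or doPost to reach the next dynamically generated page. A minimal sketch of that idea follows, using requests and BeautifulSoup rather than the thesis's own framework; the dummy values and field handling are illustrative assumptions, and real form crawling needs considerably more care.

```python
import urllib.parse

import requests
from bs4 import BeautifulSoup

# Illustrative dummy values keyed by input type.
DUMMY = {"text": "test", "email": "test@example.com", "number": "1"}

def submit_forms_and_collect_links(page_url: str) -> list:
    """Fill each form on the page with dummy values, submit it via GET or POST
    (mirroring doGet/doPost), and return the links found on the response pages."""
    soup = BeautifulSoup(requests.get(page_url, timeout=10).text, "html.parser")
    links = []
    for form in soup.find_all("form"):
        # Build a field -> dummy-value dictionary from the form's input controls.
        data = {}
        for inp in form.find_all(["input", "select", "textarea"]):
            name = inp.get("name")
            if not name:
                continue
            data[name] = inp.get("value") or DUMMY.get(inp.get("type", "text"), "test")

        action = urllib.parse.urljoin(page_url, form.get("action") or page_url)
        method = (form.get("method") or "get").lower()
        resp = (requests.post(action, data=data, timeout=10) if method == "post"
                else requests.get(action, params=data, timeout=10))

        # Harvest hyperlinks from the dynamically generated result page.
        result = BeautifulSoup(resp.text, "html.parser")
        links += [urllib.parse.urljoin(action, a["href"])
                  for a in result.find_all("a", href=True)]
    return links
```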

    Big Data

    This thesis aims to analyse the big data market; the providers are covered, along with some interesting use cases. Nowadays the term big data draws a lot of attention, from both a business and a personal perspective. For decades, companies have made business decisions through their Business Intelligence departments, based on transactional data that was typically stored in relational databases. However, regulatory compliance, increased competition, and other pressures have created an insatiable need for companies to accumulate and analyze large, fast-growing quantities of data that go beyond this traditional critical data.

    Data Exfiltration:A Review of External Attack Vectors and Countermeasures

    Context: One of the main targets of cyber-attacks is data exfiltration, the leakage of sensitive or private data to an unauthorized entity. Data exfiltration can be perpetrated by an outsider or an insider of an organization. Given the increasing number of data exfiltration incidents, a large number of data exfiltration countermeasures have been developed. These countermeasures aim to detect, prevent, or investigate exfiltration of sensitive or private data. With the growing interest in data exfiltration, it is important to review data exfiltration attack vectors and countermeasures to support future research in this field. Objective: This paper is aimed at identifying and critically analysing data exfiltration attack vectors and countermeasures, reporting the state of the art, and determining gaps for future research. Method: We have followed a structured process for selecting 108 papers from seven publication databases. The thematic analysis method has been applied to analyse the data extracted from the reviewed papers. Results: We have developed a classification of (1) data exfiltration attack vectors used by external attackers and (2) the countermeasures against external attacks, and we have mapped the countermeasures to attack vectors. Furthermore, we have explored the applicability of various countermeasures for different states of data (i.e., in use, in transit, or at rest). Conclusion: This review has revealed that (a) most of the state of the art is focussed on preventive and detective countermeasures, and significant research is required on developing investigative countermeasures, which are equally important; (b) several data exfiltration countermeasures are not able to respond in real time, which indicates that research effort needs to be invested in enabling them to respond in real time; (c) a number of data exfiltration countermeasures do not take privacy and ethical concerns into consideration, which may become an obstacle to their full adoption; (d) existing research is primarily focussed on protecting data in the 'in use' state, so future research needs to be directed towards securing data in the 'at rest' and 'in transit' states; and (e) there is no standard or framework for the evaluation of data exfiltration countermeasures. We assert the need for developing such an evaluation framework.
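
    The review maps countermeasures to attack vectors, to a goal (preventive, detective, or investigative), and to the states of data they protect (in use, in transit, at rest). The sketch below shows one possible way to represent and query such a mapping; the example countermeasure and attack vector names are illustrative placeholders, not entries from the paper's classification.

```python
from dataclasses import dataclass
from typing import List

DATA_STATES = ("in use", "in transit", "at rest")
GOALS = ("preventive", "detective", "investigative")

@dataclass
class Countermeasure:
    """One entry in a countermeasure-to-attack-vector mapping."""
    name: str
    goal: str                   # one of GOALS
    attack_vectors: List[str]   # vectors it addresses (illustrative labels)
    data_states: List[str]      # states of data it protects

# Illustrative catalogue entries, not taken from the paper.
catalogue = [
    Countermeasure("network egress monitoring", "detective",
                   ["network-based exfiltration"], ["in transit"]),
    Countermeasure("full-disk encryption", "preventive",
                   ["physical media theft"], ["at rest"]),
]

def applicable(state: str, goal: str) -> List[str]:
    """Return the names of countermeasures matching a data state and a goal."""
    return [c.name for c in catalogue if state in c.data_states and c.goal == goal]

print(applicable("in transit", "detective"))
```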

    Intranet of the future: functional study, comparison of products and practical implementation

    1. Introduction: The project fulfilled three goals: 1) to perform a study of the functionalities which have to be covered in a modern intranet (web 2.0, unified communication, collaboration, etc.); 2) to perform a comparison of tools on the market which can be used to implement intranets (commercial and open source products); 3) to test three of these tools (Oracle WebCenter, Liferay Portal and Microsoft SharePoint) and develop a prototype with Oracle WebCenter. In addition, the project includes research on the evolution of intranets over time, as well as work to discover the current state of this kind of platform across the world. This introductory research also covers topics which are not strictly technical but concern the use of the Intranet: an analysis of the importance of the human role in managing the Intranet, the process of deploying a new Intranet in an organization, and methods to evaluate the performance of the new system.
    2. Functional study: The approach taken to fulfil this goal is to develop a theoretical model describing the relationship between the Intranet and its users, together with a complete set of functionalities which could be covered in the Intranet of the future. These functionalities are categorized in groups; the project describes these groups and the functionalities included in them.
    3. Comparison of products: The project describes and compares several technologies which can be used to develop the Intranet modelled previously. The purpose is to discover the strengths and weaknesses of each technology if it were used to develop the desired Intranet. After this review, the project focuses on three technologies and performs an extensive evaluation and comparison of them, highlighting where each offers better solutions and performance than the other possibilities.
    4. Practical implementation: The project focuses on three technologies: Oracle WebCenter, Liferay Portal and Microsoft SharePoint. A prototype covering a set of functionalities of the modelled Intranet has been built with Oracle WebCenter.

    A new technique for intelligent web personal recommendation

    Personal recommendation systems are very important in web applications nowadays because of the huge volume of information available on the World Wide Web and the need to save users' time and provide appropriate, desired information, knowledge, items, etc. The most popular recommendation systems are collaborative filtering systems, which suffer from problems such as cold start, privacy, user identification, and scalability. In this thesis, we propose a new method to solve the cold start problem while taking the privacy issue into consideration. The method is shown to perform very well in comparison with alternative methods, while having better properties regarding user privacy. The cold start problem covers the situation when a recommendation system has insufficient information about a new user's preferences (the user cold start problem), as well as the case of newly added items (the item cold start problem), in which the system is unable to provide recommendations. Some systems use users' demographic data as a basis for generating recommendations in such cases (e.g. the Triadic Aspect method), but this addresses only the user cold start problem and compromises user privacy. Some systems use user 'stereotypes' to generate recommendations, but stereotypes often do not reflect the actual preferences of individual users. Other systems use 'filterbots', injecting pseudo-users or bots into the system and treating them as real users, but this leads to poor accuracy. We propose the active node method, which uses previous and recent users' browsing targets and browsing patterns to infer preferences and generate recommendations (node recommendations, in which a single suggestion is given, and batch recommendations, in which a set of possible target nodes is shown to the user at once). We compared the active node method with three alternative methods (the Triadic Aspect Method, the Naïve Filterbots Method, and the MediaScout Stereotype Method), using a dataset collected from online web news to generate recommendations with our method and with the three alternatives. We calculated the levels of novelty, coverage, and precision in these experiments, and found that our method achieves higher novelty in batch recommendation and higher coverage and precision in node recommendation compared to the alternative methods. Further, we develop a variant of the active node method that incorporates semantic structure elements. A further experimental evaluation with real data and users showed that semantic node recommendation with the active node method achieved higher novelty than non-semantic node recommendation, and semantic batch recommendation achieved higher coverage and precision than non-semantic batch recommendation.
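
    The evaluation compares methods by novelty, coverage, and precision. The sketch below shows one common reading of these metrics for per-user recommendation sets; the exact definitions used in the thesis may differ, and the user and page identifiers are purely illustrative.

```python
from typing import Dict, Set

def evaluate_recommendations(recs: Dict[str, Set[str]],
                             relevant: Dict[str, Set[str]],
                             history: Dict[str, Set[str]],
                             catalogue: Set[str]) -> Dict[str, float]:
    """Compute precision, coverage, and novelty for per-user recommendation sets.

    precision: fraction of recommended items the user actually went on to visit.
    coverage:  fraction of the catalogue that appears in some recommendation list.
    novelty:   fraction of recommended items the user had not seen before.
    """
    hits, total, unseen = 0, 0, 0
    recommended_items: Set[str] = set()
    for user, items in recs.items():
        recommended_items |= items
        total += len(items)
        hits += len(items & relevant.get(user, set()))
        unseen += len(items - history.get(user, set()))
    return {
        "precision": hits / total if total else 0.0,
        "coverage": len(recommended_items) / len(catalogue) if catalogue else 0.0,
        "novelty": unseen / total if total else 0.0,
    }

# Toy usage with hypothetical news-page identifiers.
recs = {"u1": {"n1", "n2"}, "u2": {"n3"}}
relevant = {"u1": {"n2"}, "u2": {"n3", "n4"}}
history = {"u1": {"n1"}, "u2": set()}
print(evaluate_recommendations(recs, relevant, history, {"n1", "n2", "n3", "n4", "n5"}))
```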