6 research outputs found

    Discovering Potential User Browsing Behaviors Using Custom-Built Apriori Algorithm

    User Identification, Classification and Recommendation in Web Usage Mining -An Approach for Personalized Web Mining

    Abstract: In recent years, Web Analytics (WA) has become an emerging research topic, driven by extensive advances in techniques for accessing the web content that millions of people share online. If a personalization system relies on usage-based results alone, information related to the searched theme may not always be recognized. This work introduces a new method for a Personalized Web Search system that lets users reach relevant web pages of their choice from the URL list. The first stage deals with Semantic Web Personalization, which merges content semantics with usage data, both expressed as ontology terms. The system supports the computation of semantically enhanced navigational patterns, so that useful recommendations can be generated. The authors state that no system other than the semantic web personalization system described here is employed on non-semantic web sites. The second stage aims to improve the quality of the recommendations by using the underlying structure of the website. Finally, testing is carried out using a prolonged database link. The variation among the different classes of parameters is then analyzed, with privacy formulated in terms of memory usage and execution time

    Workload characterization and customer interaction at e-commerce web servers

    Electronic commerce servers have a significant presence in today's Internet. Corporations want to maintain high availability, sufficient capacity, and satisfactory performance for their E-commerce Web systems, and want to provide satisfactory services to customers. Workload characterization and the analysis of customers' interactions with Web sites are the bases upon which to analyze server performance, plan system capacity, manage system resources, and personalize services at the Web site. To date, little empirical evidence has been published that identifies the characteristics of Web workloads of E-commerce systems and the behaviours of customers. This thesis analyzes the Web access logs at public Web sites for three organizations: a car rental company, an IT company, and the Computer Science department of the University of Saskatchewan. In these case studies, the characteristics of Web workloads are explored at the request level, function level, resource level, and session level; customers' interactions with Web sites are analyzed by identifying and characterizing session groups. The main E-commerce Web workload characteristics and performance implications are: i) The requests for dynamic Web objects are an important part of the workload. These requests should be characterized separately since the system processes them differently; ii) Some popular image files, which are embedded in the same Web page, are always requested together. If these files are requested and sent in a bundle, a system will greatly reduce the overhead of processing requests for these files; iii) The percentage of requests for each Web page category tends to be stable in the workload when the time scale is large enough. This observation is helpful in forecasting workload composition; iv) The Secure Sockets Layer protocol (SSL) is heavily used and most Web objects are either requested primarily through SSL or primarily not through SSL; and v) Session groups of different characteristics are identified for all logs. The analysis of session groups may be helpful in improving system performance, maximizing revenue throughput of the system, providing better services to customers, and managing and planning system resources. A hybrid clustering algorithm, which combines the minimum spanning tree method with the k-means clustering algorithm, is proposed to identify session clusters. Session clusters obtained using the three session representations (Pages Requested, Navigation Pattern, and Resource Usage) are similar enough that different session representations can be used interchangeably to produce similar groupings. The grouping based on one session representation is believed to be sufficient to answer questions in server performance, resource management, capacity planning, and Web site personalization, which previously would have required multiple different groupings. Grouping by Pages Requested is recommended since it is the simplest, and data on Web pages requested is relatively easy to obtain from HTTP logs
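    The abstract above names a hybrid of the minimum spanning tree method and k-means but does not say how the two are combined. The following is a minimal sketch of one plausible combination, not the thesis's actual algorithm: it assumes sessions are already encoded as numeric vectors, cuts the k-1 longest MST edges to obtain initial groups, and then refines those groups with ordinary k-means iterations. All function names are hypothetical.

```python
from math import dist


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [(float("inf"), -1)] * n  # (distance to the tree, nearest tree node)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][0], best[u][1], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v][0]:
                    best[v] = (d, u)
    return edges


def mst_seed_labels(points, k):
    """Assumed seeding step: drop the k-1 longest MST edges, label the components."""
    kept = sorted(mst_edges(points))[: len(points) - k]
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, i, j in kept:
        parent[find(i)] = find(j)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


def centroids(points, labels, k, previous=None):
    """Mean of each cluster; an empty cluster keeps its previous centre."""
    dims = len(points[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for d in range(dims):
            sums[lab][d] += p[d]
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else previous[c]
        for c in range(k)
    ]


def hybrid_cluster(points, k, iterations=20):
    """MST-seeded k-means: returns one cluster label per point."""
    labels = mst_seed_labels(points, k)
    centers = centroids(points, labels, k)
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        centers = centroids(points, labels, k, centers)
    return labels
```

    Seeding from the MST gives k-means a deterministic starting point instead of random centres, which is one reason such a combination might be chosen; whether the thesis uses it this way is an assumption.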

    Identificação biométrica e comportamental de utilizadores em cenários de intrusão

    Master's dissertation in Informatics Engineering. Session hijacking and identity theft are increasingly common problems in today's computer systems. With the growing use of online services, people are more exposed to techniques, technological or social, that can be used to gain access to their personal accounts on services such as e-mail, social networks, and bank accounts. Currently, the dominant method of authentication is the combination of username and password. This method can be unreliable, because these credentials can be shared, forgotten, or stolen. Several techniques can be combined to strengthen security, among them access cards (tokens), digital certificates, and biometrics. None of them completely solves the problem. Access cards, such as those used at ATMs, can be stolen or duplicated, as is frequently reported in bank fraud cases. Certificates share the weakness of tokens, since they can be distributed by e-mail or on USB devices. Physiological biometrics (fingerprint, iris, retina, hand geometry, etc.) are somewhat intrusive, require the purchase of expensive equipment, and may not work in all scenarios. A possible solution to these problems is behavioral biometrics: the way we behave and act on a computer can be used as biometric information. This information, usually supplemented with more data (e.g. contextual data), can be used to identify an individual unequivocally, or at least with a certain degree of confidence. The information collected may range from the way of typing on a keyboard (keystroke dynamics) and skill with the mouse (mouse dynamics) to habits, clicks, number of pages open, source of access, etc., and is then processed by behavioral algorithms to identify and authenticate the user. In this work we present the implementation of a system that strengthens existing authentication and intrusion detection systems by checking behavioral profiles of the account owner. The system will not be costly, since it uses only basic hardware. It will also be completely invisible to the user: it works unobtrusively, collecting data in the background and authenticating the user continuously and silently. The aim is to present a system capable of recognizing biometric patterns and, through behavioral algorithms and complex event processing, creating user profiles that are used for identification and continuous authentication to services
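    To make the idea of checking a behavioral profile concrete, here is a minimal sketch based on keystroke timing only. It is an illustration under stated assumptions, not the dissertation's system: key-press timestamps are assumed to be available in milliseconds, the profile is just the mean and standard deviation of the gaps between consecutive key presses, and the function names and the threshold of 3.0 are hypothetical.

```python
from statistics import mean, stdev


def intervals(timestamps):
    """Gaps between consecutive key-press timestamps (same unit as the input)."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def build_profile(sessions):
    """Summarise the owner's typing rhythm from several recorded sessions."""
    gaps = [g for session in sessions for g in intervals(session)]
    return {"mean": mean(gaps), "stdev": stdev(gaps)}


def deviation(profile, timestamps):
    """How many profile standard deviations the sample's mean gap is from the profile mean."""
    sample_mean = mean(intervals(timestamps))
    spread = max(profile["stdev"], 1e-9)  # guard against a perfectly regular profile
    return abs(sample_mean - profile["mean"]) / spread


def looks_like_owner(profile, timestamps, threshold=3.0):
    """Accept the sample if its typing rhythm stays within the assumed threshold."""
    return deviation(profile, timestamps) <= threshold
```

    A real system would combine many such features (mouse dynamics, habits, access origin) and re-evaluate them continuously; a single timing statistic is used here only to keep the sketch short.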

    Web structure mining of dynamic pages

    Restricting web structure mining to static web content decreases the accuracy of the mined outcomes and affects the quality of decision making. Extending structure mining to hidden web data can improve the accuracy of the mined outcomes and thus the reliability and quality of decision making. Data mining is the automated or semi-automated exploration and analysis of large volumes of data in order to reveal meaningful patterns. Web mining is the discovery and analysis of useful information from the World Wide Web; it helps web search engines find high-quality web pages and enhances web clickstream analysis. One branch of web mining is web structure mining, whose goal is to generate a structural summary of the Web site and its Web pages. Web structure mining tries to discover the link structure of the hyperlinks at the inter-document level. In recent years, Web link structure mining has been widely used to infer important information about Web pages. However, a major part of the web is hidden. This Deep Web or Hidden Web consists of documents that are dynamic and not accessible by general search engines; most search engine spiders can access only the publicly indexable Web (the visible Web). Most documents in the hidden Web, including pages hidden behind search forms, specialized databases, and dynamically generated Web pages, are not accessible by general Web mining applications. Modern web pages generate content dynamically, and user forms collect information from a particular user and store it in a database. The link structure behind these forms cannot be accessed by conventional mining procedures. To access these links, user forms are filled automatically by a rule-based framework that can robustly read a web page containing dynamic content such as ActiveX controls like input boxes, command buttons, combo boxes, etc. After these controls are read, dummy values are filled into the available fields and the doGet or doPost methods are executed automatically to acquire the link of the subsequent web page. Including these hidden web pages in the process of Web structure mining can substantially improve the accuracy of web page hierarchical structures. The designed system framework is robust enough to process dynamic Web pages along with static ones
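    The form-filling step described above can be sketched with the Python standard library. This is an illustration under stated assumptions rather than the paper's rule-based framework: it reads plain HTML form fields only, fills every empty field with one hypothetical dummy value, and builds the GET or POST request that would be submitted without actually sending it. The names `FormCollector`, `build_request`, and `DUMMY_VALUE` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

DUMMY_VALUE = "test"  # hypothetical placeholder used for every empty field
FIELD_TAGS = ("input", "select", "textarea")
SKIPPED_TYPES = ("submit", "button", "reset", "image")


class FormCollector(HTMLParser):
    """Collects each form's action, method, and named fields from an HTML page."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._inside_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._inside_form = True
            self.forms.append({
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            })
        elif self._inside_form and tag in FIELD_TAGS and attrs.get("name"):
            if (attrs.get("type") or "text").lower() not in SKIPPED_TYPES:
                # Keep a preset value if the page supplies one, else use the dummy.
                self.forms[-1]["fields"][attrs["name"]] = attrs.get("value") or DUMMY_VALUE

    def handle_endtag(self, tag):
        if tag == "form":
            self._inside_form = False


def build_request(page_url, form):
    """Return (method, url, body) for submitting the form with its filled fields."""
    target = urljoin(page_url, form["action"])
    query = urlencode(form["fields"])
    if form["method"] == "post":
        return ("POST", target, query)
    return ("GET", f"{target}?{query}" if query else target, None)


def requests_for_page(page_url, html):
    """Parse a page and build one request per form found on it."""
    collector = FormCollector()
    collector.feed(html)
    return [build_request(page_url, form) for form in collector.forms]
```

    A crawler would then issue each request and add the resulting page to the link graph; script-driven controls would need a browser engine and are outside this sketch.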

    A Framework for Personal Web Usage Mining

    In this paper, we propose mining Web usage data on the client side, or personal Web usage mining, as a complement to server-side Web usage mining. Mining client-side Web usage data yields more complete knowledge about Web usage. A framework for personal Web usage mining is proposed. Some related issues and applications of personal Web usage mining are also discussed