
    Multi-dimensional clustering in user profiling

    User profiling has attracted an enormous number of technological methods and applications. With the increasing number of products and services, user profiling has created opportunities to catch the attention of the user as well as to achieve high user satisfaction. Providing users with what they want, when and how they want it, depends largely on understanding them. The user profile is the representation of the user and holds the information about the user; these profiles are the outcome of user profiling. Personalization is the adaptation of services to meet the user's needs and expectations; therefore, knowledge about the user leads to a personalized user experience. In user profiling applications the major challenge is to build and handle user profiles. In the literature there are two main user profiling methods: collaborative and content-based. Apart from these traditional profiling methods, a number of classification and clustering algorithms have been used to classify user-related information to create user profiles. However, the profiling achieved by these works lacks accuracy, because all information within the profile has the same influence during profiling even though some of it is irrelevant. In this thesis, a primary aim is to provide an insight into the concept of user profiling. For this purpose a comprehensive background study of the literature was conducted and summarized in this thesis. Furthermore, existing user profiling methods as well as classification and clustering algorithms were investigated. Being one of the objectives of this study, the use of these algorithms for user profiling was examined. A number of classification and clustering algorithms, such as Bayesian Networks (BN) and Decision Trees (DTs), were simulated using user profiles, and their classification accuracy was evaluated. Additionally, a novel clustering algorithm for user profiling, namely Multi-Dimensional Clustering (MDC), is proposed. MDC is a modified version of the Instance Based Learner (IBL) algorithm. In IBL every feature has an equal effect on the classification regardless of its relevance; MDC differs from IBL by assigning weights to feature values to distinguish the effect of each feature on clustering. Existing feature weighting methods, for instance Cross Category Feature (CCF), have also been investigated. In this thesis, three feature value weighting methods are proposed for the MDC: the MDC weight method by Cross Clustering (MDC-CC), the MDC weight method by Balanced Clustering (MDC-BC) and the MDC weight method by changing the Lower-limit to Zero (MDC-LZ). All of these weighted MDC algorithms have been tested and evaluated. Additional simulations were carried out with existing weighted and non-weighted IBL algorithms (i.e. K-Star and Locally Weighted Learning (LWL)) to demonstrate the performance of the proposed methods. Furthermore, a real-life scenario is implemented to show how MDC can be used for user profiling to improve personalized service provisioning in mobile environments. The experiments presented in this thesis used user profile datasets that reflect the users' personal information, preferences and interests. The simulations with existing classification and clustering algorithms (e.g. Bayesian Networks (BN), Naïve Bayesian (NB), Lazy learning of Bayesian Rules (LBR), Iterative Dichotomiser 3 (ID3)) were performed on the WEKA (version 3.5.7) machine learning platform, which serves as a workbench for a collection of popular learning schemes implemented in Java. In addition, MDC-CC, MDC-BC and MDC-LZ were implemented on NetBeans IDE 6.1 Beta as a Java application and in MATLAB. Finally, the real-life scenario was implemented as a Java Mobile Application (Java ME) on NetBeans IDE 7.1. All simulation results were evaluated based on error rate and accuracy.
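
    The central modification MDC makes to IBL can be illustrated compactly. Below is a minimal Java sketch of a feature-weighted instance-based classifier: every feature contributes to the distance in proportion to a weight, so irrelevant profile attributes can be damped or removed. The feature names, weight values and 1-nearest-neighbour decision rule are illustrative assumptions, not the thesis's actual MDC-CC, MDC-BC or MDC-LZ weighting schemes.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch of a feature-weighted instance-based classifier.
 *  Unlike plain IBL, each feature contributes to the distance in
 *  proportion to a weight, so irrelevant features can be damped. */
public class WeightedIbl {

    record Instance(double[] features, String label) {}

    private final List<Instance> memory = new ArrayList<>();
    private final double[] weights; // one weight per feature (illustrative values here)

    WeightedIbl(double[] weights) { this.weights = weights; }

    void learn(double[] features, String label) {
        memory.add(new Instance(features, label));
    }

    /** Weighted squared Euclidean distance; weight 0 removes a feature entirely. */
    private double distance(double[] a, double[] b) {
        double d = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            d += weights[i] * diff * diff;
        }
        return d;
    }

    /** 1-nearest-neighbour prediction under the weighted distance. */
    String classify(double[] query) {
        Instance best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Instance inst : memory) {
            double d = distance(query, inst.features());
            if (d < bestDist) { bestDist = d; best = inst; }
        }
        return best == null ? null : best.label();
    }

    public static void main(String[] args) {
        // Hypothetical profile features: [age, sports interest, music interest];
        // a low weight on age marks it as nearly irrelevant for this grouping.
        WeightedIbl clf = new WeightedIbl(new double[] {0.1, 1.0, 1.0});
        clf.learn(new double[] {25, 0.9, 0.1}, "sports-fan");
        clf.learn(new double[] {60, 0.1, 0.9}, "music-fan");
        System.out.println(clf.classify(new double[] {30, 0.8, 0.2})); // sports-fan
    }
}
```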

    Website Personalization Based on Demographic Data

    This study focuses on website personalization based on users' demographic data. The main demographic data used in this study are age, gender, race and occupation. These data were obtained through a user profiling technique conducted during the study. The gathered data were analyzed to find the relationship between users' demographic data and their preferences for a website design, and the findings were used as a guideline to develop a website that fulfills visitors' needs. The topic chosen was obesity. HCI issues, namely effectiveness and satisfaction, are considered among the important factors in this study. The methodologies used are the website personalization process, the incremental model, a combination of these two methods, and Cascading Style Sheets (CSS), all of which are discussed in detail in Chapter 3. The effectiveness and evaluation of the personalized website that was built are then discussed. Finally, a conclusion presents the results of the respondents' evaluation of the websites.
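
    As a rough illustration of the personalization step described above, the following Java sketch maps a visitor's demographic profile to one of several pre-built CSS variants. The profile fields echo the study's demographic attributes, but the stylesheet names and selection rules are hypothetical, not taken from the study.

```java
/** Sketch: choosing a stylesheet variant from a visitor's demographic profile.
 *  CSS file names and selection rules are illustrative only. */
public class DemographicTheme {

    record Profile(int age, String gender, String occupation) {}

    /** Map a profile to one of several CSS variants prepared in advance. */
    static String stylesheetFor(Profile p) {
        if (p.age() >= 55) return "large-print.css"; // bigger fonts, higher contrast
        if (p.age() <= 18) return "youth.css";       // brighter palette
        return "default.css";
    }

    public static void main(String[] args) {
        Profile visitor = new Profile(62, "female", "teacher");
        // The chosen file would be injected into the page's <link rel="stylesheet"> tag.
        System.out.println(stylesheetFor(visitor)); // large-print.css
    }
}
```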

    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains; others heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for performing data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed to work in a given domain in other domains.
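
    As a concrete instance of the wrapper-style extraction techniques such surveys cover, the following Java sketch uses the jsoup HTML parser to turn a semi-structured page fragment into records via hand-written CSS selectors. The HTML snippet and selectors are invented for illustration; real wrappers would also handle missing elements and page variations.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

/** Sketch of a wrapper-style extractor: hand-written CSS selectors
 *  turn semi-structured HTML into (name, price) records. */
public class PriceWrapper {
    public static void main(String[] args) {
        String html = """
            <ul id="products">
              <li><span class="name">Phone</span><span class="price">199</span></li>
              <li><span class="name">Tablet</span><span class="price">299</span></li>
            </ul>""";
        Document doc = Jsoup.parse(html);
        // Each <li> is one record; the selectors encode the page's structure.
        for (Element item : doc.select("#products li")) {
            String name = item.selectFirst(".name").text();
            String price = item.selectFirst(".price").text();
            System.out.println(name + " -> " + price);
        }
    }
}
```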

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of CHORUS and establishing the existing landscape of multimedia search engines, we identified and analyzed gaps in the European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use-cases, and socio-economic and legal aspects. These were assessed in two central studies: firstly, a concerted vision of the functional breakdown of a generic multimedia search engine, and secondly, representative use-case descriptions with the related discussion on requirements for technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained we identified two types of gaps, namely core technological gaps that involve research challenges, and "enablers", which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges.

    AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments

    This report considers the application of Artificial Intelligence (AI) techniques to the problem of misuse detection and misuse localisation within telecommunications environments. A broad survey of techniques is provided, covering inter alia rule-based systems, model-based systems, case-based reasoning, pattern matching, clustering and feature extraction, artificial neural networks, genetic algorithms, artificial immune systems, agent-based systems, data mining and a variety of hybrid approaches. The report then considers the central issue of event correlation, which is at the heart of many misuse detection and localisation systems. The notion of inferring misuse by correlating individual temporally distributed events within a multiple data stream environment is explored, and a range of techniques is examined, covering model-based approaches, `programmed' AI and machine learning paradigms. It is found that, in general, correlation is best achieved via rule-based approaches, but that these suffer from a number of drawbacks, such as the difficulty of developing and maintaining an appropriate knowledge base, and the lack of ability to generalise from known misuses to new, unseen misuses. Two distinct approaches are evident. One attempts to encode knowledge of known misuses, typically within rules, and uses this to screen events. This approach cannot generally detect misuses for which it has not been programmed, i.e. it is prone to issuing false negatives. The other attempts to `learn' the features of event patterns that constitute normal behaviour, and, by observing patterns that do not match expected behaviour, detects when a misuse has occurred. This approach is prone to issuing false positives, i.e. inferring misuse from innocent patterns of behaviour that the system was not trained to recognise. Contemporary approaches are seen to favour hybridisation, often combining detection or localisation mechanisms for both abnormal and normal behaviour, the former to capture known cases of misuse, the latter to capture unknown cases. In some systems, these mechanisms even update each other to increase detection rates and lower false positive rates. It is concluded that hybridisation offers the most promising future direction, but that a rule- or state-based component is likely to remain, being the most natural approach to the correlation of complex events. The challenge, then, is to mitigate the weaknesses of canonical programmed systems such that learning, generalisation and adaptation are more readily facilitated.
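
    The two complementary approaches, and the hybridisation the report favours, can be sketched in a few lines of Java. A signature rule captures one known misuse (and would miss unknown ones), while a statistical profile of normal behaviour flags deviations (and may raise false positives). The event shape, thresholds and 3-sigma rule are illustrative assumptions, not drawn from the report.

```java
import java.util.List;

/** Sketch of the hybrid idea: a signature rule for known misuse plus a
 *  statistical profile of normal behaviour for unknown misuse. */
public class HybridDetector {

    record Event(String type, int callsPerMinute) {}

    // "Programmed" component: encodes one known misuse signature.
    static boolean matchesKnownMisuse(Event e) {
        return e.type().equals("premium-rate-burst");
    }

    // "Learned" component: mean/stddev of normal call rates from history.
    private final double mean, std;

    HybridDetector(List<Integer> normalHistory) {
        mean = normalHistory.stream().mapToInt(Integer::intValue).average().orElse(0);
        double variance = normalHistory.stream()
                .mapToDouble(v -> (v - mean) * (v - mean)).average().orElse(0);
        std = Math.sqrt(variance);
    }

    boolean isAnomalous(Event e) {
        return Math.abs(e.callsPerMinute() - mean) > 3 * std; // 3-sigma rule
    }

    /** Misuse is flagged if either component fires; a real hybrid system
     *  might also feed confirmed alarms back to refine both components. */
    boolean flag(Event e) {
        return matchesKnownMisuse(e) || isAnomalous(e);
    }

    public static void main(String[] args) {
        HybridDetector d = new HybridDetector(List.of(10, 12, 11, 9, 10, 13));
        System.out.println(d.flag(new Event("voice-call", 80)));         // true: anomalous rate
        System.out.println(d.flag(new Event("premium-rate-burst", 11))); // true: known signature
        System.out.println(d.flag(new Event("voice-call", 11)));         // false
    }
}
```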

    Performance Evaluation of Network Anomaly Detection Systems

    Nowadays, there is a huge and growing concern about security in information and communication technology (ICT) among the scientific community, because any attack or anomaly in the network can greatly affect many domains such as national security, private data storage, social welfare and economic issues. Anomaly detection is therefore a broad research area, and many different techniques and approaches for this purpose have emerged through the years. Attacks, problems and internal failures, when not detected early, may badly harm an entire network system. Thus, this thesis presents an autonomous profile-based anomaly detection system based on the statistical method Principal Component Analysis (PCADS-AD). This approach creates a network profile called the Digital Signature of Network Segment using Flow Analysis (DSNSF) that denotes the predicted normal behavior of network traffic activity through historical data analysis. That digital signature is used as a threshold for volume anomaly detection, flagging disparities from the normal traffic trend. The proposed system uses seven traffic flow attributes: bits, packets and number of flows to detect problems, and source and destination IP addresses and ports to provide the network administrator with the information necessary to solve them. Through evaluation techniques, the addition of a different anomaly detection approach, and comparisons to other methods performed in this thesis using real network traffic data, the results showed good traffic prediction by the DSNSF and encouraging false alarm generation and detection accuracy in the detection scheme. The low complexity and agility of the proposed system also make it applicable to real-time detection. These results seek to contribute to the advance of the state of the art in methods and strategies for anomaly detection, aiming to surpass some of the challenges that emerge from the constant growth in complexity, speed and size of today's large-scale networks.
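
    The signature-as-threshold idea can be sketched briefly. The Java fragment below builds a per-interval baseline from historical traffic volumes and flags bins that stray too far from it. The real DSNSF is constructed with PCA, which is omitted here; the plain per-bin mean and the tolerance value are illustrative assumptions.

```java
import java.util.Arrays;

/** Sketch of the signature-as-threshold idea behind DSNSF: predict each
 *  time bin's normal traffic volume from historical days, then flag bins
 *  whose observed volume strays too far from the prediction. */
public class TrafficSignature {

    /** history[day][bin] = traffic volume (e.g. bits) in that bin on that day. */
    static double[] buildSignature(double[][] history) {
        int bins = history[0].length;
        double[] signature = new double[bins];
        for (double[] day : history)
            for (int b = 0; b < bins; b++) signature[b] += day[b] / history.length;
        return signature;
    }

    /** Volume anomaly: an observed bin deviates from the signature by more
     *  than `tolerance`, expressed as a fraction of the predicted value. */
    static boolean[] detect(double[] signature, double[] observed, double tolerance) {
        boolean[] alarms = new boolean[observed.length];
        for (int b = 0; b < observed.length; b++)
            alarms[b] = Math.abs(observed[b] - signature[b]) > tolerance * signature[b];
        return alarms;
    }

    public static void main(String[] args) {
        double[][] history = { {100, 200, 300}, {110, 190, 310}, {90, 210, 290} };
        double[] signature = buildSignature(history); // [100, 200, 300]
        double[] today     = {105, 600, 295};         // spike in bin 1
        System.out.println(Arrays.toString(detect(signature, today, 0.5)));
        // -> [false, true, false]
    }
}
```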

    Analysis Guided Visual Exploration of Multivariate Data

    Visualization systems traditionally focus on the graphical representation of information. They tend not to provide integrated analytical services that could aid users in tackling complex knowledge discovery tasks. Users' exploration in such environments is usually impeded by several problems: 1) valuable information is hard to discover when too much data is visualized on the screen; 2) users have to manage and organize their discoveries offline, because no systematic discovery management mechanism exists; 3) discoveries based on visual exploration alone may lack accuracy; and 4) users have no convenient access to the important knowledge learned by other users. To tackle these problems, it has been recognized that analytical tools must be introduced into visualization systems. In this paper, we present a novel analysis-guided exploration system, called the Nugget Management System (NMS). It leverages the collaborative effort of human comprehensibility and machine computation to facilitate users' visual exploration process. Specifically, NMS first extracts the valuable information (nuggets) hidden in datasets based on the interests of users. Given that similar nuggets may be re-discovered by different users, NMS consolidates the nugget candidate set by clustering nuggets based on their semantic similarity. To address inaccurate discoveries, data mining techniques are applied to refine the nuggets to best represent the patterns existing in the datasets. Lastly, the resulting well-organized nugget pool is used to guide users' exploration. To evaluate the effectiveness of NMS, we integrated it into XmdvTool, a freeware multivariate visualization system. User studies were performed to compare users' efficiency and accuracy in finishing tasks on real datasets, with and without the help of NMS. Our user studies confirmed the effectiveness of NMS. Keywords: Visual Analytics, Visual Knowledg
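
    The consolidation step, where semantically similar nuggets re-discovered by different users are merged, can be sketched as a greedy similarity-threshold grouping. In the Java sketch below, cosine similarity over a numeric "query signature" stands in for NMS's semantic similarity measure; the nugget representation and the threshold value are illustrative assumptions, not NMS's actual design.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of consolidating re-discovered "nuggets": greedily keep one
 *  representative per group of nuggets whose pairwise similarity
 *  exceeds a threshold. */
public class NuggetConsolidation {

    record Nugget(String description, double[] querySignature) {}

    /** Cosine similarity of the nuggets' query signatures. */
    static double similarity(Nugget a, Nugget b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.querySignature().length; i++) {
            dot += a.querySignature()[i] * b.querySignature()[i];
            na  += a.querySignature()[i] * a.querySignature()[i];
            nb  += b.querySignature()[i] * b.querySignature()[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    static List<Nugget> consolidate(List<Nugget> nuggets, double threshold) {
        List<Nugget> representatives = new ArrayList<>();
        for (Nugget n : nuggets) {
            boolean duplicate = representatives.stream()
                    .anyMatch(r -> similarity(r, n) >= threshold);
            if (!duplicate) representatives.add(n); // first discovery wins
        }
        return representatives;
    }

    public static void main(String[] args) {
        List<Nugget> found = List.of(
            new Nugget("high price vs low mileage", new double[] {1, 0.9, 0}),
            new Nugget("price against mileage",     new double[] {0.9, 1, 0}),
            new Nugget("outliers in horsepower",    new double[] {0, 0.1, 1}));
        consolidate(found, 0.8).forEach(n -> System.out.println(n.description()));
        // -> keeps one representative of the first two, plus the third
    }
}
```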