Multi-dimensional clustering in user profiling
User profiling has attracted a great number of technological methods and
applications. With the increasing number of products and services, user profiling
creates opportunities to capture the attention of the user as well as to achieve
high user satisfaction. Providing users with what they want, when and how they
want it, depends largely on understanding them. The user profile is the
representation of the user and holds the information about the user; these
profiles are the outcome of user profiling.
Personalization is the adaptation of services to meet the user's needs and
expectations; knowledge about the user therefore leads to a personalized user
experience. The major challenge in user profiling applications is building and
maintaining user profiles. The literature offers two main user profiling methods,
collaborative and content-based. Apart from these traditional profiling methods,
a number of classification and clustering algorithms have been used to classify
user-related information to create user profiles. However, the profiles achieved
through these works lack accuracy, because all information within a profile has
the same influence during profiling, even though some of it is irrelevant.
A primary aim of this thesis is to provide insight into the concept of user
profiling. For this purpose, a comprehensive background study of the literature
was conducted and is summarized in this thesis. Furthermore, existing user
profiling methods, as well as classification and clustering algorithms, were
investigated; as one of the objectives of this study, the use of these algorithms
for user profiling was examined. A number of classification and clustering
algorithms, such as Bayesian Networks (BN) and Decision Trees (DTs), were
simulated using user profiles, and their classification accuracy was evaluated.
Additionally, a novel clustering algorithm for user profiling, Multi-Dimensional
Clustering (MDC), is proposed.
MDC is a modified version of the Instance Based Learner (IBL) algorithm. In
IBL, every feature has an equal effect on the classification regardless of its
relevance. MDC differs from IBL by assigning weights to feature values to
distinguish the effect of each feature on clustering. Existing feature weighting
methods, for instance Cross Category Feature (CCF), were also investigated. In
this thesis, three feature value weighting methods are proposed for MDC: the
MDC weight method by Cross Clustering (MDC-CC), the MDC weight method
by Balanced Clustering (MDC-BC), and the MDC weight method by changing
the Lower-limit to Zero (MDC-LZ). All of these weighted MDC algorithms were
tested and evaluated. Additional simulations were carried out with existing
weighted and non-weighted IBL algorithms (i.e. K-Star and Locally Weighted
Learning (LWL)) in order to demonstrate the performance of the proposed
methods. Furthermore, a real-life scenario was implemented to show how MDC
can be used for user profiling to improve personalized service provisioning in
mobile environments.
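The weighting idea at the heart of MDC can be sketched with a toy feature-weighted nearest-neighbour classifier. This is a minimal illustration assuming a weighted Euclidean distance; the profile data, weights and function names are invented for the example and do not reproduce the thesis's actual MDC-CC, MDC-BC or MDC-LZ schemes:

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance: features with larger weights
    influence the distance (and hence the clustering) more."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def classify(instance, labelled, weights):
    """1-nearest-neighbour over the weighted distance, as in IBL,
    except that feature weights scale each dimension's contribution."""
    return min(labelled, key=lambda item: weighted_distance(instance, item[0], weights))[1]

# Toy profiles: (feature vector, cluster label); the second feature is noise.
profiles = [((0.9, 0.1), "sports"), ((0.1, 0.9), "music")]

print(classify((0.7, 0.85), profiles, (1.0, 1.0)))  # "music": the noisy feature dominates
print(classify((0.7, 0.85), profiles, (1.0, 0.1)))  # "sports": weighting suppresses the noise
```

With equal weights (plain IBL) the irrelevant second feature decides the outcome; down-weighting it flips the decision, which is the effect MDC's feature value weighting is designed to achieve.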
The experiments presented in this thesis were conducted using user profile
datasets that reflect the users' personal information, preferences and interests.
The simulations with existing classification and clustering algorithms (e.g.
Bayesian Networks (BN), Naïve Bayes (NB), Lazy learning of Bayesian Rules
(LBR) and Iterative Dichotomiser 3 (ID3)) were performed on the WEKA
(version 3.5.7) machine learning platform, a workbench offering a collection of
popular learning schemes implemented in Java. In addition, MDC-CC, MDC-BC
and MDC-LZ were implemented in NetBeans IDE 6.1 Beta as a Java application
and in MATLAB. Finally, the real-life scenario was implemented as a Java ME
mobile application in NetBeans IDE 7.1. All simulation results were evaluated
based on error rate and accuracy.
Website Personalization Based on Demographic Data
This study focuses on website personalization based on users' demographic
data. The main demographic data used in this study are age, gender, race and
occupation. These data are obtained through a user profiling technique
conducted during the study. The gathered data are analysed to find the
relationship between users' demographic data and their preferences for a
website design, and are then used as a guideline to develop a website that
fulfils visitors' needs. The topic chosen was obesity. The HCI issues considered
among the important factors in this study are effectiveness and satisfaction.
The methodologies used are the website personalization process, the
incremental model, a combination of these two methods, and Cascading Style
Sheets (CSS), all discussed in detail in Chapter 3. After that, the effectiveness
and evaluation of the personalized website that has been built are discussed.
Finally, a conclusion presents the respondents' evaluation of the websites.
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims to provide a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple classification
framework in which existing Web Data Extraction applications are grouped into
two main classes, namely applications at the Enterprise level and at the Social
Web level. At the Enterprise level, Web Data Extraction techniques emerge as a
key tool for data analysis in Business and Competitive Intelligence systems as
well as for business process re-engineering. At the Social Web level, Web Data
Extraction techniques make it possible to gather the large amounts of
structured data continuously generated and disseminated by Web 2.0, Social
Media and Online Social Network users, which offers unprecedented
opportunities to analyze human behavior at a very large scale. We also discuss
the potential for cross-fertilization, i.e., the possibility of re-using Web Data
Extraction techniques originally designed for one domain in other domains.
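A wrapper-style extractor of the kind surveyed here can be illustrated with Python's standard-library HTML parser. This is a toy sketch assuming a hand-written extraction rule and an invented HTML snippet; real systems induce or learn such rules per site:

```python
from html.parser import HTMLParser

class PriceExtractor(HTMLParser):
    """Collects the text of every element whose class attribute is 'price'.
    A toy wrapper: the rule (class == 'price') is hard-coded for illustration."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # Fire the rule when an element carries class="price".
        if ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())
            self.in_price = False

html = '<ul><li><span class="price">9.99</span></li><li><span class="price">12.50</span></li></ul>'
extractor = PriceExtractor()
extractor.feed(html)
print(extractor.prices)  # ['9.99', '12.50']
```

The interesting research questions begin where this sketch stops: inducing the rule automatically, keeping it robust to layout changes, and scaling extraction across sites.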
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state of the art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we identified and analyzed gaps within the European research effort during our second year.
In this period we focused on three directions, notably technological issues; user-centred issues and use-cases; and
socio-economic and legal aspects. These were assessed through two central studies: firstly, a concerted vision of the
functional breakdown of a generic multimedia search engine, and secondly, representative use-case descriptions with a
related discussion of the requirements they pose as technological challenges. Both studies were carried out in
cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines
cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU
project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two
types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not
necessarily technical research challenges but which have an impact on innovation progress. New socio-economic trends
are presented, as well as emerging legal challenges.
AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments
This report considers the application of Artificial Intelligence (AI) techniques to
the problem of misuse detection and misuse localisation within telecommunications
environments. A broad survey of techniques is provided, covering inter alia
rule-based systems, model-based systems, case-based reasoning, pattern matching,
clustering and feature extraction, artificial neural networks, genetic algorithms,
artificial immune systems, agent-based systems, data mining and a variety of hybrid
approaches. The report then considers the central issue of event correlation, which
is at the heart of many misuse detection and localisation systems. The notion of
being able to infer misuse by correlating individual, temporally distributed
events within a multiple data stream environment is explored, and a range of
techniques is reviewed, covering model-based approaches, `programmed' AI and
machine learning paradigms. It is found that, in general, correlation is best
achieved via rule-based approaches, but that these suffer from a number of
drawbacks, such as the difficulty of developing and maintaining an appropriate
knowledge base, and the lack of ability to generalise from known misuses to new,
unseen misuses. Two distinct approaches are evident. One attempts to encode
knowledge of known misuses, typically within rules, and uses this to screen
events. This approach cannot generally detect misuses for which it has not been
programmed, i.e. it is prone to issuing false negatives. The other attempts to
`learn' the features of event patterns that constitute normal behaviour and, by
observing patterns that do not match expected behaviour, to detect when a misuse
has occurred. This approach is prone to issuing false positives, i.e. inferring
misuse from innocent patterns of behaviour that the system was not trained to
recognise. Contemporary approaches are seen to favour hybridisation, often
combining detection or localisation mechanisms for both abnormal and normal
behaviour, the former to capture known cases of misuse, the latter to capture
unknown cases. In some systems, these mechanisms even update each other to
increase detection rates and lower false positive rates. It is concluded that
hybridisation offers the most promising future direction, but that a rule- or
state-based component is likely to remain, being the most natural approach to the
correlation of complex events. The challenge, then, is to mitigate the weaknesses
of canonical programmed systems such that learning, generalisation and adaptation
are more readily facilitated.
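The two complementary approaches and their hybridisation can be sketched as follows. This is a deliberately small illustration: the event names are invented, and a set of known-normal events stands in for what would in practice be a learned behavioural model:

```python
# Rule-based component: signatures of known misuses (prone to false negatives
# on misuses it was never programmed with).
SIGNATURES = {"repeated_login_failure", "premium_number_flood"}

class HybridDetector:
    """Hybrid scheme: a rule base flags known misuses, while a profile of
    normal behaviour flags deviations (prone to false positives)."""
    def __init__(self, normal_events):
        self.normal = set(normal_events)  # stand-in for a learned model

    def check(self, event):
        if event in SIGNATURES:
            return "misuse (known signature)"
        if event not in self.normal:
            return "anomaly (deviates from normal profile)"
        return "normal"

detector = HybridDetector(["call_setup", "sms_send"])
print(detector.check("premium_number_flood"))  # caught by the rule base
print(detector.check("call_setup"))            # matches the normal profile
print(detector.check("imsi_catcher_probe"))    # unknown event: flagged as anomaly
```

The systems surveyed go further by letting the two components update each other, e.g. promoting confirmed anomalies into new signatures.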
Performance Evaluation of Network Anomaly Detection Systems
Nowadays, there is a huge and growing concern about security in information and
communication technology (ICT) among the scientific community, because any attack
or anomaly in the network can greatly affect many domains such as national security,
private data storage, social welfare and economic issues. Anomaly detection is
therefore a broad research area, and many different techniques and approaches for
this purpose have emerged through the years.
Attacks, problems and internal failures, when not detected early, may badly harm
an entire network system. This thesis therefore presents an autonomous
profile-based anomaly detection system based on the statistical method Principal
Component Analysis (PCADS-AD). This approach creates a network profile called the
Digital Signature of Network Segment using Flow Analysis (DSNSF), which denotes
the predicted normal behavior of network traffic activity derived from historical
data analysis. That digital signature is used as a threshold for volume anomaly
detection, to detect disparities in the normal traffic trend. The proposed system
uses seven traffic flow attributes: Bits, Packets and Number of Flows, to detect
problems; and Source and Destination IP addresses and Ports, to provide the
network administrator with the information necessary to solve them.
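The thresholding idea behind a signature-based volume detector can be sketched as follows. This is a deliberately simplified illustration using a per-bin mean and standard-deviation band over invented traffic figures; the thesis's PCA-based DSNSF construction is considerably more elaborate:

```python
from statistics import mean, stdev

# Toy history: bits observed in each hourly bin across 30 past days.
history = {hour: [100 + (day * 7 + hour) % 9 for day in range(30)]
           for hour in range(24)}

# DSNSF-style signature: a predicted value plus a tolerance band per bin.
signature = {hour: (mean(values), stdev(values))
             for hour, values in history.items()}

def is_anomalous(hour, observed_bits, k=3.0):
    """Flag a bin whose volume strays more than k standard deviations
    from the signature's prediction for that hour (illustrative only)."""
    predicted, spread = signature[hour]
    return abs(observed_bits - predicted) > k * spread

print(is_anomalous(12, 104))   # within the normal band
print(is_anomalous(12, 900))   # volume spike: anomalous
```

The signature here plays the role of the predicted normal behaviour; anything outside the band is reported as a volume anomaly for the administrator to investigate.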
Through evaluation techniques, the addition of a different anomaly detection
approach, and comparisons with other methods, all performed in this thesis using
real network traffic data, the results showed good traffic prediction by the
DSNSF and encouraging false-alarm rates and detection accuracy for the detection
scheme.
The observed results seek to contribute to advancing the state of the art in
methods and strategies for anomaly detection, aiming to overcome some of the
challenges that emerge from the constant growth in complexity, speed and size of
today's large-scale networks, while providing high-value results for better
detection in real time. Moreover, the low complexity and agility of the proposed
system allow it to be applied to real-time detection.
Analysis Guided Visual Exploration of Multivariate Data
Visualization systems traditionally focus on graphical representation of information. They tend not to provide integrated analytical services that could aid users in tackling complex knowledge discovery tasks. Users' exploration in such environments is usually impeded by several problems: 1) valuable information is hard to discover when too much data is visualized on the screen; 2) users have to manage and organize their discoveries offline, because no systematic discovery management mechanism exists; 3) discoveries based on visual exploration alone may lack accuracy; and 4) users have no convenient access to the important knowledge learned by other users. To tackle these problems, it has been recognized that analytical tools must be introduced into visualization systems. In this paper, we present a novel analysis-guided exploration system, called the Nugget Management System (NMS). It leverages the collaborative effort of human comprehensibility and machine computation to facilitate users' visual exploration process. Specifically, NMS first extracts the valuable information (nuggets) hidden in datasets based on the interests of users. Given that similar nuggets may be re-discovered by different users, NMS consolidates the nugget candidate set by clustering based on semantic similarity. To solve the problem of inaccurate discoveries, data mining techniques are applied to refine the nuggets to best represent the patterns existing in the datasets. Lastly, the resulting well-organized nugget pool is used to guide users' exploration. To evaluate the effectiveness of NMS, we integrated NMS into XmdvTool, a freeware multivariate visualization system. User studies were performed to compare users' efficiency and accuracy in finishing tasks on real datasets, with and without the help of NMS. Our user studies confirmed the effectiveness of NMS. Keywords: Visual Analytics, Visual Knowledge
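The consolidation step, which groups near-duplicate nuggets, can be sketched as a greedy similarity pass over vector representations. This is illustrative only: the cosine measure, the threshold and the vectors are invented stand-ins for NMS's actual semantic-similarity clustering:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two nugget vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def consolidate(nuggets, threshold=0.95):
    """Greedy consolidation: a nugget joins the first cluster whose
    representative it closely resembles, otherwise it starts a new cluster."""
    clusters = []
    for vec in nuggets:
        for cluster in clusters:
            if cosine(vec, cluster[0]) >= threshold:
                cluster.append(vec)
                break
        else:
            clusters.append([vec])
    return clusters

# Two users rediscover nearly the same nugget; a third finds something new.
nuggets = [(1.0, 0.0, 0.2), (0.98, 0.05, 0.21), (0.0, 1.0, 0.0)]
print(len(consolidate(nuggets)))  # 2 clusters
```

Collapsing rediscoveries this way keeps the nugget pool compact, so that the guidance shown to users is not dominated by duplicates.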