1,792 research outputs found
Web usage mining for click fraud detection
Estágio realizado na AuditMark e orientado pelo Eng.º Pedro FortunaTese de mestrado integrado. Engenharia Informática e Computação. Faculdade de Engenharia. Universidade do Porto. 201
A Comprehensive Survey of Data Mining-based Fraud Detection Research
This survey paper categorises, compares, and summarises from almost all
published technical and review articles in automated fraud detection within the
last 10 years. It defines the professional fraudster, formalises the main types
and subtypes of known fraud, and presents the nature of data evidence collected
within affected industries. Within the business context of mining the data to
achieve higher cost savings, this research presents methods and techniques
together with their problems. Compared to all related reviews on fraud
detection, this survey covers much more technical articles and is the only one,
to the best of our knowledge, which proposes alternative data and solutions
from related domains.Comment: 14 page
AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis
Recently, sound recognition has been used to identify sounds, such as car and
river. However, sounds have nuances that may be better described by
adjective-noun pairs such as slow car, and verb-noun pairs such as flying
insects, which are under explored. Therefore, in this work we investigate the
relation between audio content and both adjective-noun pairs and verb-noun
pairs. Due to the lack of datasets with these kinds of annotations, we
collected and processed the AudioPairBank corpus consisting of a combined total
of 1,123 pairs and over 33,000 audio files. One contribution is the previously
unavailable documentation of the challenges and implications of collecting
audio recordings with these type of labels. A second contribution is to show
the degree of correlation between the audio content and the labels through
sound recognition experiments, which yielded results of 70% accuracy, hence
also providing a performance benchmark. The results and study in this paper
encourage further exploration of the nuances in audio and are meant to
complement similar research performed on images and text in multimedia
analysis.Comment: This paper is a revised version of "AudioSentibank: Large-scale
Semantic Ontology of Acoustic Concepts for Audio Content Analysis
Graph Mining for Cybersecurity: A Survey
The explosive growth of cyber attacks nowadays, such as malware, spam, and
intrusions, caused severe consequences on society. Securing cyberspace has
become an utmost concern for organizations and governments. Traditional Machine
Learning (ML) based methods are extensively used in detecting cyber threats,
but they hardly model the correlations between real-world cyber entities. In
recent years, with the proliferation of graph mining techniques, many
researchers investigated these techniques for capturing correlations between
cyber entities and achieving high performance. It is imperative to summarize
existing graph-based cybersecurity solutions to provide a guide for future
studies. Therefore, as a key contribution of this paper, we provide a
comprehensive review of graph mining for cybersecurity, including an overview
of cybersecurity tasks, the typical graph mining techniques, and the general
process of applying them to cybersecurity, as well as various solutions for
different cybersecurity tasks. For each task, we probe into relevant methods
and highlight the graph types, graph approaches, and task levels in their
modeling. Furthermore, we collect open datasets and toolkits for graph-based
cybersecurity. Finally, we outlook the potential directions of this field for
future research
Mining a Small Medical Data Set by Integrating the Decision Tree and t-test
[[abstract]]Although several researchers have used statistical methods to prove that aspiration followed by the injection of 95% ethanol left in situ (retention) is an effective treatment for ovarian endometriomas, very few discuss the different conditions that could generate different recovery rates for the patients. Therefore, this study adopts the statistical method and decision tree techniques together to analyze the postoperative status of ovarian endometriosis patients under different conditions. Since our collected data set is small, containing only 212 records, we use all of these data as the training data. Therefore, instead of using a resultant tree to generate rules directly, we use the value of each node as a cut point to generate all possible rules from the tree first. Then, using t-test, we verify the rules to discover some useful description rules after all possible rules from the tree have been generated. Experimental results show that our approach can find some new interesting knowledge about recurrent ovarian endometriomas under different conditions.[[journaltype]]國外[[incitationindex]]EI[[booktype]]紙本[[countrycodes]]FI
Detecting Structure In Chaos: A Customer Process Analysis Method
Detecting typical patterns in customer processes is the precondition for gaining an understanding about customer issues and needs in the course of performing their processes. Such insights can be translated into customer-centric service offerings that provide added value by enabling customers to reach their process objectives more effectively and rapidly, and with less effort. However, customer processes performed in less restrictive environments are extremely heterogeneous, which makes them difficult to analyse. Current approaches deal with this issue by considering customer processes in large scope and low detail, or vice versa. However, both views are required to understand customer processes comprehensively. Therefore, we present a novel customer process analysis method capable of detecting the hidden activity-cluster structure of customer processes. Consequently, both the detailed level of process activities and the aggregated cluster level are available for customer process analysis, which increases the chances of detecting patterns in these heterogeneous processes. We apply the method to two datasets and evaluate the results’ validity and utility. Moreover, we demonstrate that the method outperforms alternative solution technologies. Finally, we provide new insights into customer process theory
Applications of Virtual Reality
Information Technology is growing rapidly. With the birth of high-resolution graphics, high-speed computing and user interaction devices Virtual Reality has emerged as a major new technology in the mid 90es, last century. Virtual Reality technology is currently used in a broad range of applications. The best known are games, movies, simulations, therapy. From a manufacturing standpoint, there are some attractive applications including training, education, collaborative work and learning. This book provides an up-to-date discussion of the current research in Virtual Reality and its applications. It describes the current Virtual Reality state-of-the-art and points out many areas where there is still work to be done. We have chosen certain areas to cover in this book, which we believe will have potential significant impact on Virtual Reality and its applications. This book provides a definitive resource for wide variety of people including academicians, designers, developers, educators, engineers, practitioners, researchers, and graduate students
Prometheus: a generic e-commerce crawler for the study of business markets and other e-commerce problems
Dissertação de mestrado em Computer ScienceThe continuous social and economic development has led over time to an increase in consumption,
as well as greater demand from the consumer for better and cheaper products.
Hence, the selling price of a product assumes a fundamental role in the purchase decision
by the consumer. In this context, online stores must carefully analyse and define the best
price for each product, based on several factors such as production/acquisition cost, positioning
of the product (e.g. anchor product) and the competition companies strategy. The
work done by market analysts changed drastically over the last years.
As the number of Web sites increases exponentially, the number of E-commerce web
sites also prosperous. Web page classification becomes more important in fields like Web
mining and information retrieval. The traditional classifiers are usually hand-crafted and
non-adaptive, that makes them inappropriate to use in a broader context. We introduce an
ensemble of methods and the posterior study of its results to create a more generic and
modular crawler and scraper for detection and information extraction on E-commerce web
pages. The collected information may then be processed and used in the pricing decision.
This framework goes by the name Prometheus and has the goal of extracting knowledge
from E-commerce Web sites.
The process requires crawling an online store and gathering product pages. This implies
that given a web page the framework must be able to determine if it is a product page.
In order to achieve this we classify the pages in three categories: catalogue, product and
”spam”. The page classification stage was addressed based on the html text as well as on
the visual layout, featuring both traditional methods and Deep Learning approaches.
Once a set of product pages has been identified we proceed to the extraction of the pricing
information. This is not a trivial task due to the disparity of approaches to create a web
page. Furthermore, most product pages are dynamic in the sense that they are truly a page
for a family of related products. For instance, when visiting a shoe store, for a particular
model there are probably a number of sizes and colours available. Such a model may be
displayed in a single dynamic web page making it necessary for our framework to explore
all the relevant combinations. This process is called scraping and is the last stage of the
Prometheus framework.O contínuo desenvolvimento social e económico tem conduzido ao longo do tempo a um
aumento do consumo, assim como a uma maior exigência do consumidor por produtos
melhores e mais baratos. Naturalmente, o preço de venda de um produto assume um papel
fundamental na decisão de compra por parte de um consumidor. Nesse sentido, as lojas
online precisam de analisar e definir qual o melhor preço para cada produto, tendo como
base diversos fatores, tais como o custo de produção/venda, posicionamento do produto
(e.g. produto âncora) e as próprias estratégias das empresas concorrentes. O trabalho dos
analistas de mercado mudou drasticamente nos últimos anos.
O crescimento de sites na Web tem sido exponencial, o número de sites E-commerce
também tem prosperado. A classificação de páginas da Web torna-se cada vez mais importante,
especialmente em campos como mineração de dados na Web e coleta/extração
de informações. Os classificadores tradicionais são geralmente feitos manualmente e não
adaptativos, o que os torna inadequados num contexto mais amplo. Nós introduzimos
um conjunto de métodos e o estudo posterior dos seus resultados para criar um crawler
e scraper mais genéricos e modulares para extração de conhecimento em páginas de Ecommerce.
A informação recolhida pode então ser processada e utilizada na tomada de
decisão sobre o preço de venda. Esta Framework chama-se Prometheus e tem como intuito
extrair conhecimento de Web sites de E-commerce.
Este processo necessita realizar a navegação sobre lojas online e armazenar páginas de
produto. Isto implica que dado uma página web a framework seja capaz de determinar
se é uma página de produto. Para atingir este objetivo nós classificamos as páginas em
três categorias: catálogo, produto e spam. A classificação das páginas foi realizada tendo
em conta o html e o aspeto visual das páginas, utilizando tanto métodos tradicionais como
Deep Learning.
Depois de identificar um conjunto de páginas de produto procedemos à extração de
informação sobre o preço. Este processo não é trivial devido à quantidade de abordagens
possíveis para criar uma página web. A maioria dos produtos são dinâmicos no sentido
em que um produto é na realidade uma família de produtos relacionados. Por exemplo,
quando visitamos uma loja online de sapatos, para um modelo em especifico existe
a provavelmente um conjunto de tamanhos e cores disponíveis. Esse modelo pode ser
apresentado numa única página dinâmica fazendo com que seja necessário para a nossa
Framework explorar estas combinações relevantes. Este processo é chamado de scraping e
é o último passo da Framework Prometheus
Products and Services
Today’s global economy offers more opportunities, but is also more complex and competitive than ever before. This fact leads to a wide range of research activity in different fields of interest, especially in the so-called high-tech sectors. This book is a result of widespread research and development activity from many researchers worldwide, covering the aspects of development activities in general, as well as various aspects of the practical application of knowledge
- …