8,131 research outputs found
An interactive human centered data science approach towards crime pattern analysis
The traditional machine learning systems lack a pathway for a human to integrate their domain knowledge into the underlying machine learning algorithms. The utilization of such systems, for domains where decisions can have serious consequences (e.g. medical decision-making and crime analysis), requires the incorporation of human experts' domain knowledge. The challenge, however, is how to effectively incorporate domain expert knowledge with machine learning algorithms to develop effective models for better decision making.
In crime analysis, the key challenge is to identify plausible linkages in unstructured crime reports for the hypothesis formulation. Crime analysts painstakingly perform time-consuming searches of many different structured and unstructured databases to collate these associations without any proper visualization. To tackle these challenges and aiming towards facilitating the crime analysis, in this paper, we examine unstructured crime reports through text mining to extract plausible associations. Specifically, we present associative questioning based searching model to elicit multi-level associations among crime entities. We coupled this model with partition clustering to develop an interactive, human-assisted knowledge discovery and data mining scheme.
The proposed human-centered knowledge discovery and data mining scheme for crime text mining is able to extract plausible associations between crimes, identifying crime pattern, grouping similar crimes, eliciting co-offender network and suspect list based on spatial-temporal and behavioral similarity. These similarities are quantified through calculating Cosine, Jacquard,
and Euclidean distances. Additionally, each suspect is also ranked by a similarity score in the plausible suspect list. These associations are then visualized through creating a two-dimensional re-configurable crime cluster space along with a bipartite knowledge graph. This proposed scheme also inspects the grand challenge of integrating effective human interaction
with the machine learning algorithms through a visualization feedback loop. It allows the analyst to feed his/her domain knowledge including choosing of similarity functions for identifying associations, dynamic feature selection for interactive clustering of crimes and assigning weights to each component of the crime pattern to rank suspects for an unsolved crime.
We demonstrate the proposed scheme through a case study using the Anonymized burglary dataset. The scheme is found to facilitate human reasoning and analytic discourse for intelligence analysis
Role of Semantic web in the changing context of Enterprise Collaboration
In order to compete with the global giants, enterprises are concentrating on
their core competencies and collaborating with organizations that compliment their
skills and core activities. The current trend is to develop temporary alliances of
independent enterprises, in which companies can come together to share skills, core
competencies and resources. However, knowledge sharing and communication
among multidiscipline companies is a complex and challenging problem. In a
collaborative environment, the meaning of knowledge is drastically affected by the
context in which it is viewed and interpreted; thus necessitating the treatment of
structure as well as semantics of the data stored in enterprise repositories. Keeping
the present market and technological scenario in mind, this research aims to propose
tools and techniques that can enable companies to assimilate distributed information
resources and achieve their business goals
Enable Reliable and Secure Data Transmission in Resource-Constrained Emerging Networks
The increasing deployment of wireless devices has connected humans and objects all around the world, benefiting our daily life and the entire society in many aspects. Achieving those connectivity motivates the emergence of different types of paradigms, such as cellular networks, large-scale Internet of Things (IoT), cognitive networks, etc. Among these networks, enabling reliable and secure data transmission requires various resources including spectrum, energy, and computational capability. However, these resources are usually limited in many scenarios, especially when the number of devices is considerably large, bringing catastrophic consequences to data transmission. For example, given the fact that most of IoT devices have limited computational abilities and inadequate security protocols, data transmission is vulnerable to various attacks such as eavesdropping and replay attacks, for which traditional security approaches are unable to address. On the other hand, in the cellular network, the ever-increasing data traffic has exacerbated the depletion of spectrum along with the energy consumption. As a result, mobile users experience significant congestion and delays when they request data from the cellular service provider, especially in many crowded areas.
In this dissertation, we target on reliable and secure data transmission in resource-constrained emerging networks. The first two works investigate new security challenges in the current heterogeneous IoT environment, and then provide certain countermeasures for reliable data communication. To be specific, we identify a new physical-layer attack, the signal emulation attack, in the heterogeneous environment, such as smart home IoT. To defend against the attack, we propose two defense strategies with the help of a commonly found wireless device. In addition, to enable secure data transmission in large-scale IoT network, e.g., the industrial IoT, we apply the amply-and-forward cooperative communication to increase the secrecy capacity by incentivizing relay IoT devices. Besides security concerns in IoT network, we seek data traffic alleviation approaches to achieve reliable and energy-efficient data transmission for a group of users in the cellular network. The concept of mobile participation is introduced to assist data offloading from the base station to users in the group by leveraging the mobility of users and the social features among a group of users. Following with that, we deploy device-to-device data offloading within the group to achieve the energy efficiency at the user side while adapting to their increasing traffic demands. In the end, we consider a perpendicular topic - dynamic spectrum access (DSA) - to alleviate the spectrum scarcity issue in cognitive radio network, where the spectrum resource is limited to users. Specifically, we focus on the security concerns and further propose two physical-layer schemes to prevent spectrum misuse in DSA in both additive white Gaussian noise and fading environments
Prometheus: a generic e-commerce crawler for the study of business markets and other e-commerce problems
Dissertação de mestrado em Computer ScienceThe continuous social and economic development has led over time to an increase in consumption,
as well as greater demand from the consumer for better and cheaper products.
Hence, the selling price of a product assumes a fundamental role in the purchase decision
by the consumer. In this context, online stores must carefully analyse and define the best
price for each product, based on several factors such as production/acquisition cost, positioning
of the product (e.g. anchor product) and the competition companies strategy. The
work done by market analysts changed drastically over the last years.
As the number of Web sites increases exponentially, the number of E-commerce web
sites also prosperous. Web page classification becomes more important in fields like Web
mining and information retrieval. The traditional classifiers are usually hand-crafted and
non-adaptive, that makes them inappropriate to use in a broader context. We introduce an
ensemble of methods and the posterior study of its results to create a more generic and
modular crawler and scraper for detection and information extraction on E-commerce web
pages. The collected information may then be processed and used in the pricing decision.
This framework goes by the name Prometheus and has the goal of extracting knowledge
from E-commerce Web sites.
The process requires crawling an online store and gathering product pages. This implies
that given a web page the framework must be able to determine if it is a product page.
In order to achieve this we classify the pages in three categories: catalogue, product and
”spam”. The page classification stage was addressed based on the html text as well as on
the visual layout, featuring both traditional methods and Deep Learning approaches.
Once a set of product pages has been identified we proceed to the extraction of the pricing
information. This is not a trivial task due to the disparity of approaches to create a web
page. Furthermore, most product pages are dynamic in the sense that they are truly a page
for a family of related products. For instance, when visiting a shoe store, for a particular
model there are probably a number of sizes and colours available. Such a model may be
displayed in a single dynamic web page making it necessary for our framework to explore
all the relevant combinations. This process is called scraping and is the last stage of the
Prometheus framework.O contínuo desenvolvimento social e económico tem conduzido ao longo do tempo a um
aumento do consumo, assim como a uma maior exigência do consumidor por produtos
melhores e mais baratos. Naturalmente, o preço de venda de um produto assume um papel
fundamental na decisão de compra por parte de um consumidor. Nesse sentido, as lojas
online precisam de analisar e definir qual o melhor preço para cada produto, tendo como
base diversos fatores, tais como o custo de produção/venda, posicionamento do produto
(e.g. produto âncora) e as próprias estratégias das empresas concorrentes. O trabalho dos
analistas de mercado mudou drasticamente nos últimos anos.
O crescimento de sites na Web tem sido exponencial, o número de sites E-commerce
também tem prosperado. A classificação de páginas da Web torna-se cada vez mais importante,
especialmente em campos como mineração de dados na Web e coleta/extração
de informações. Os classificadores tradicionais são geralmente feitos manualmente e não
adaptativos, o que os torna inadequados num contexto mais amplo. Nós introduzimos
um conjunto de métodos e o estudo posterior dos seus resultados para criar um crawler
e scraper mais genéricos e modulares para extração de conhecimento em páginas de Ecommerce.
A informação recolhida pode então ser processada e utilizada na tomada de
decisão sobre o preço de venda. Esta Framework chama-se Prometheus e tem como intuito
extrair conhecimento de Web sites de E-commerce.
Este processo necessita realizar a navegação sobre lojas online e armazenar páginas de
produto. Isto implica que dado uma página web a framework seja capaz de determinar
se é uma página de produto. Para atingir este objetivo nós classificamos as páginas em
três categorias: catálogo, produto e spam. A classificação das páginas foi realizada tendo
em conta o html e o aspeto visual das páginas, utilizando tanto métodos tradicionais como
Deep Learning.
Depois de identificar um conjunto de páginas de produto procedemos à extração de
informação sobre o preço. Este processo não é trivial devido à quantidade de abordagens
possíveis para criar uma página web. A maioria dos produtos são dinâmicos no sentido
em que um produto é na realidade uma família de produtos relacionados. Por exemplo,
quando visitamos uma loja online de sapatos, para um modelo em especifico existe
a provavelmente um conjunto de tamanhos e cores disponíveis. Esse modelo pode ser
apresentado numa única página dinâmica fazendo com que seja necessário para a nossa
Framework explorar estas combinações relevantes. Este processo é chamado de scraping e
é o último passo da Framework Prometheus
- …