8,130 research outputs found

    An interactive human centered data science approach towards crime pattern analysis

    Get PDF
    The traditional machine learning systems lack a pathway for a human to integrate their domain knowledge into the underlying machine learning algorithms. The utilization of such systems, for domains where decisions can have serious consequences (e.g. medical decision-making and crime analysis), requires the incorporation of human experts' domain knowledge. The challenge, however, is how to effectively incorporate domain expert knowledge with machine learning algorithms to develop effective models for better decision making. In crime analysis, the key challenge is to identify plausible linkages in unstructured crime reports for the hypothesis formulation. Crime analysts painstakingly perform time-consuming searches of many different structured and unstructured databases to collate these associations without any proper visualization. To tackle these challenges and aiming towards facilitating the crime analysis, in this paper, we examine unstructured crime reports through text mining to extract plausible associations. Specifically, we present associative questioning based searching model to elicit multi-level associations among crime entities. We coupled this model with partition clustering to develop an interactive, human-assisted knowledge discovery and data mining scheme. The proposed human-centered knowledge discovery and data mining scheme for crime text mining is able to extract plausible associations between crimes, identifying crime pattern, grouping similar crimes, eliciting co-offender network and suspect list based on spatial-temporal and behavioral similarity. These similarities are quantified through calculating Cosine, Jacquard, and Euclidean distances. Additionally, each suspect is also ranked by a similarity score in the plausible suspect list. These associations are then visualized through creating a two-dimensional re-configurable crime cluster space along with a bipartite knowledge graph. This proposed scheme also inspects the grand challenge of integrating effective human interaction with the machine learning algorithms through a visualization feedback loop. It allows the analyst to feed his/her domain knowledge including choosing of similarity functions for identifying associations, dynamic feature selection for interactive clustering of crimes and assigning weights to each component of the crime pattern to rank suspects for an unsolved crime. We demonstrate the proposed scheme through a case study using the Anonymized burglary dataset. The scheme is found to facilitate human reasoning and analytic discourse for intelligence analysis

    Role of Semantic web in the changing context of Enterprise Collaboration

    Get PDF
    In order to compete with the global giants, enterprises are concentrating on their core competencies and collaborating with organizations that compliment their skills and core activities. The current trend is to develop temporary alliances of independent enterprises, in which companies can come together to share skills, core competencies and resources. However, knowledge sharing and communication among multidiscipline companies is a complex and challenging problem. In a collaborative environment, the meaning of knowledge is drastically affected by the context in which it is viewed and interpreted; thus necessitating the treatment of structure as well as semantics of the data stored in enterprise repositories. Keeping the present market and technological scenario in mind, this research aims to propose tools and techniques that can enable companies to assimilate distributed information resources and achieve their business goals

    Automated subject classification of textual web documents

    Full text link

    Enable Reliable and Secure Data Transmission in Resource-Constrained Emerging Networks

    Get PDF
    The increasing deployment of wireless devices has connected humans and objects all around the world, benefiting our daily life and the entire society in many aspects. Achieving those connectivity motivates the emergence of different types of paradigms, such as cellular networks, large-scale Internet of Things (IoT), cognitive networks, etc. Among these networks, enabling reliable and secure data transmission requires various resources including spectrum, energy, and computational capability. However, these resources are usually limited in many scenarios, especially when the number of devices is considerably large, bringing catastrophic consequences to data transmission. For example, given the fact that most of IoT devices have limited computational abilities and inadequate security protocols, data transmission is vulnerable to various attacks such as eavesdropping and replay attacks, for which traditional security approaches are unable to address. On the other hand, in the cellular network, the ever-increasing data traffic has exacerbated the depletion of spectrum along with the energy consumption. As a result, mobile users experience significant congestion and delays when they request data from the cellular service provider, especially in many crowded areas. In this dissertation, we target on reliable and secure data transmission in resource-constrained emerging networks. The first two works investigate new security challenges in the current heterogeneous IoT environment, and then provide certain countermeasures for reliable data communication. To be specific, we identify a new physical-layer attack, the signal emulation attack, in the heterogeneous environment, such as smart home IoT. To defend against the attack, we propose two defense strategies with the help of a commonly found wireless device. In addition, to enable secure data transmission in large-scale IoT network, e.g., the industrial IoT, we apply the amply-and-forward cooperative communication to increase the secrecy capacity by incentivizing relay IoT devices. Besides security concerns in IoT network, we seek data traffic alleviation approaches to achieve reliable and energy-efficient data transmission for a group of users in the cellular network. The concept of mobile participation is introduced to assist data offloading from the base station to users in the group by leveraging the mobility of users and the social features among a group of users. Following with that, we deploy device-to-device data offloading within the group to achieve the energy efficiency at the user side while adapting to their increasing traffic demands. In the end, we consider a perpendicular topic - dynamic spectrum access (DSA) - to alleviate the spectrum scarcity issue in cognitive radio network, where the spectrum resource is limited to users. Specifically, we focus on the security concerns and further propose two physical-layer schemes to prevent spectrum misuse in DSA in both additive white Gaussian noise and fading environments

    Prometheus: a generic e-commerce crawler for the study of business markets and other e-commerce problems

    Get PDF
    Dissertação de mestrado em Computer ScienceThe continuous social and economic development has led over time to an increase in consumption, as well as greater demand from the consumer for better and cheaper products. Hence, the selling price of a product assumes a fundamental role in the purchase decision by the consumer. In this context, online stores must carefully analyse and define the best price for each product, based on several factors such as production/acquisition cost, positioning of the product (e.g. anchor product) and the competition companies strategy. The work done by market analysts changed drastically over the last years. As the number of Web sites increases exponentially, the number of E-commerce web sites also prosperous. Web page classification becomes more important in fields like Web mining and information retrieval. The traditional classifiers are usually hand-crafted and non-adaptive, that makes them inappropriate to use in a broader context. We introduce an ensemble of methods and the posterior study of its results to create a more generic and modular crawler and scraper for detection and information extraction on E-commerce web pages. The collected information may then be processed and used in the pricing decision. This framework goes by the name Prometheus and has the goal of extracting knowledge from E-commerce Web sites. The process requires crawling an online store and gathering product pages. This implies that given a web page the framework must be able to determine if it is a product page. In order to achieve this we classify the pages in three categories: catalogue, product and ”spam”. The page classification stage was addressed based on the html text as well as on the visual layout, featuring both traditional methods and Deep Learning approaches. Once a set of product pages has been identified we proceed to the extraction of the pricing information. This is not a trivial task due to the disparity of approaches to create a web page. Furthermore, most product pages are dynamic in the sense that they are truly a page for a family of related products. For instance, when visiting a shoe store, for a particular model there are probably a number of sizes and colours available. Such a model may be displayed in a single dynamic web page making it necessary for our framework to explore all the relevant combinations. This process is called scraping and is the last stage of the Prometheus framework.O contínuo desenvolvimento social e económico tem conduzido ao longo do tempo a um aumento do consumo, assim como a uma maior exigência do consumidor por produtos melhores e mais baratos. Naturalmente, o preço de venda de um produto assume um papel fundamental na decisão de compra por parte de um consumidor. Nesse sentido, as lojas online precisam de analisar e definir qual o melhor preço para cada produto, tendo como base diversos fatores, tais como o custo de produção/venda, posicionamento do produto (e.g. produto âncora) e as próprias estratégias das empresas concorrentes. O trabalho dos analistas de mercado mudou drasticamente nos últimos anos. O crescimento de sites na Web tem sido exponencial, o número de sites E-commerce também tem prosperado. A classificação de páginas da Web torna-se cada vez mais importante, especialmente em campos como mineração de dados na Web e coleta/extração de informações. Os classificadores tradicionais são geralmente feitos manualmente e não adaptativos, o que os torna inadequados num contexto mais amplo. Nós introduzimos um conjunto de métodos e o estudo posterior dos seus resultados para criar um crawler e scraper mais genéricos e modulares para extração de conhecimento em páginas de Ecommerce. A informação recolhida pode então ser processada e utilizada na tomada de decisão sobre o preço de venda. Esta Framework chama-se Prometheus e tem como intuito extrair conhecimento de Web sites de E-commerce. Este processo necessita realizar a navegação sobre lojas online e armazenar páginas de produto. Isto implica que dado uma página web a framework seja capaz de determinar se é uma página de produto. Para atingir este objetivo nós classificamos as páginas em três categorias: catálogo, produto e spam. A classificação das páginas foi realizada tendo em conta o html e o aspeto visual das páginas, utilizando tanto métodos tradicionais como Deep Learning. Depois de identificar um conjunto de páginas de produto procedemos à extração de informação sobre o preço. Este processo não é trivial devido à quantidade de abordagens possíveis para criar uma página web. A maioria dos produtos são dinâmicos no sentido em que um produto é na realidade uma família de produtos relacionados. Por exemplo, quando visitamos uma loja online de sapatos, para um modelo em especifico existe a provavelmente um conjunto de tamanhos e cores disponíveis. Esse modelo pode ser apresentado numa única página dinâmica fazendo com que seja necessário para a nossa Framework explorar estas combinações relevantes. Este processo é chamado de scraping e é o último passo da Framework Prometheus
    corecore