275 research outputs found

    A Survey on Behavioral Pattern Mining from Sensor Data in Internet of Things

    The deployment of large-scale wireless sensor networks (WSNs) for Internet of Things (IoT) applications is increasing steadily, especially with the emergence of smart city services. The sensor data streams generated by these applications are largely dynamic, heterogeneous, and often geographically distributed over large areas. For high-value use in business, industry and services, these data streams must be mined to extract insightful knowledge, for example for monitoring (e.g., discovering certain behaviors over a deployed area) or network diagnostics (e.g., predicting faulty sensor nodes). However, due to the inherent constraints of sensor networks and application requirements, traditional data mining techniques cannot be directly used to mine IoT data streams efficiently and accurately in real time. In the last decade, a number of works proposing behavioral pattern mining algorithms for sensor networks have been reported in the literature. This paper presents the technical challenges that need to be considered for mining sensor data. It then provides a thorough review of the mining techniques proposed in the recent literature to mine behavioral patterns from sensor data in IoT, highlighting and comparing their characteristics and differences. We also propose a behavioral pattern mining framework for IoT and discuss possible future research directions in this area. © 2013 IEEE

    Recommender Systems for Grocery Retail - A Machine Learning Approach

    Recommender systems are present in our daily activities at different moments, such as when choosing a song to listen to or when shopping online. It is an everyday reality for people to rely on computer systems to simplify regular decisions. Grocery shopping is an essential and frequent part of people's lives. Despite being a common habit, each customer has unique routines, needs and preferences regarding products and brands. This information is valuable for grocery retailers to know their customers better and to improve their marketing and operational activities. This dissertation aims to apply machine learning algorithms to the development of a recommender system capable of preparing personalized grocery shopping lists. The proposed architecture is designed to allow integration with different grocery retailers and to support distinct TensorFlow algorithms. The process of extracting information from the dataset as features was explored, as well as the tuning of the model hyperparameters, to obtain better results. The recommendation engine is exposed via a distributed software architecture designed to allow retailers to integrate the recommender system with different existing solutions (e.g., websites or mobile applications). A case study to validate the implemented solution was performed, integrating it with a public dataset provided by Instacart. A comparison study between different machine learning algorithms over the adopted dataset led to the choice of the gradient boosted trees algorithm. The solution developed in the case study was compared against two non-machine learning approaches, a pattern mining-based solution and a SQL-based heuristic, at predicting the last purchase of 360 arbitrary test customers. Different evaluation metrics (namely, the average accuracy, precision, recall, and F1-score) were recorded. The way association rules with different strengths were reflected in the predictions of the developed solution was also analyzed. The gradient boosted trees-based implementation from the case study outperformed the compared solutions on the evaluation metrics and showed a higher capability of predicting at least one correct item per customer. It also became evident that the strongest association rules were frequently found in the recommendations. The adopted solution and algorithm have shown promising results and a remarkable capability to provide meaningful predictions to the different customers, evidencing their capability to add value to grocery retail. Nevertheless, there is still potential for further expansion.
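
The evaluation described above (average precision, recall and F1-score over test customers, plus whether at least one recommended item was correct) can be illustrated with a minimal sketch. This is not the dissertation's code: the function names and the toy baskets are hypothetical, and accuracy is left out because it depends on how non-purchased products are counted, which the abstract does not specify.

```python
def basket_scores(predicted, actual):
    """Precision, recall and F1 for one customer's predicted vs. actual basket."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)  # correctly recommended items
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def average_scores(pairs):
    """Average the per-customer scores over (predicted, actual) pairs."""
    scores = [basket_scores(p, a) for p, a in pairs]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


def hit_rate(pairs):
    """Fraction of customers with at least one correctly predicted item."""
    return sum(1 for p, a in pairs if set(p) & set(a)) / len(pairs)


# Hypothetical toy data: two customers.
pairs = [
    (["milk", "eggs"], ["milk", "bread"]),
    (["tea"], ["tea"]),
]
avg_precision, avg_recall, avg_f1 = average_scores(pairs)
```

Averaging per customer (rather than pooling all items) gives every customer equal weight, which matches the abstract's phrasing of "average" metrics; pooling would be an equally valid but different choice.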

    Trade marketing analytics in consumer goods industry

    Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management. We address the transparency of trade spend in the consumer goods industry and propose a set of business performance indicators that follow the Pareto (80/20) rule, a popular concept in optimization problem solving. The discovery of power laws in the behavior of travelling salespersons, the buying patterns of customers, the popularity of products, and market demand fluctuations leads to better-informed decisions among all those involved in planning, execution, and post-promotion evaluation. The practical result of our work is a prototype implementation of the proposed measures. The most remarkable finding is the consistency of the travelling salesperson's journey between customer locations. Loyalty to a brand, or brand market power (whatever drives field sales representatives to put at least one product of the market player of interest into nearly every market basket), fits a small-world model. This behavior varies from person to person, yet for a given person it remains the same after reassignment to a different territory. For the industrialization stage of this project, we outline key design considerations for an information system capable of handling a real-time workload scalable to petabytes. We built our analyses for collaborative processes of integrated planning that require the joint effort of a multidisciplinary team. Field tests demonstrate how insights from data can trigger business transformation. We therefore recommend that system integrators include Knowledge Discovery in information system deployment projects.
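
As a rough illustration of a Pareto-style indicator of the kind mentioned above, the sketch below computes the share of a total contributed by the top 20% of entries. The function name, the default fraction, and the revenue figures are hypothetical and are not taken from the project.

```python
def top_share(values, top_fraction=0.2):
    """Share of the total contributed by the largest `top_fraction` of entries."""
    ranked = sorted(values, reverse=True)
    # Always keep at least one entry so small inputs still give a result.
    k = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical revenue per customer: the top 2 of 10 customers bring in 80%.
revenue = [50, 30, 5, 5, 2, 2, 2, 2, 1, 1]
share = top_share(revenue)
```

A value near 0.8 for the top 20% would be consistent with the 80/20 rule; values far from it suggest the quantity is more evenly (or more extremely) concentrated.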

    IoT trust and reputation: a survey and taxonomy

    IoT is one of the fastest-growing technologies, and it is estimated that more than a billion devices will be in use across the globe by the end of 2030. To maximize the capability of these connected entities, trust and reputation among IoT entities are essential. Several trust management models have been proposed for the IoT environment; however, these schemes have not fully addressed the features of IoT devices, such as a device's role, its type, and its dynamic behavior in a smart environment. As a result, traditional trust and reputation models are insufficient to handle these characteristics and the uncertainty risks that arise when connecting nodes to the network. Although research is ongoing and various articles suggest promising solutions for constrained environments, research on trust and reputation is still in its infancy. In this paper, we carry out a comprehensive literature review of state-of-the-art research on the trust and reputation of IoT devices and systems. Specifically, we first propose a new taxonomy that organizes trust and reputation models by the way trust is managed. The proposed taxonomy comprises traditional trust management-based systems and artificial intelligence-based systems, and combines both classes, which encourages existing schemes to adopt these emerging concepts. This combination of conventional mathematical models and advanced ML models results in design schemes that are more robust and efficient. We then compare and analyse the methods and applications of these systems based on community-accepted performance metrics, e.g. scalability, delay, cooperativeness and efficiency. Finally, building on the findings of the analysis, we identify and discuss open research issues and challenges, and point out future research directions. Comment: 20 pages, 5 Figures, 3 tables, Journal of cloud computin


    Analysis and evaluation of Wi-Fi indoor positioning systems using smartphones

    This paper analyzes the main Machine Learning algorithms applied to indoor location. New technologies face new challenges. Satellite positioning has become a typical application of mobile phones, but it stops working satisfactorily in enclosed spaces, so indoor positioning remains an unresolved problem. This circumstance motivates research into new methods. After the introduction, the first chapter presents current positioning methods and the problem of positioning indoors. This part of the work gives an overview of the current state of the art. It presents a taxonomy that helps classify the different types of indoor positioning, along with a selection of current commercial solutions. The second chapter focuses on the algorithms to be analyzed. It explains how the most widely used Machine Learning algorithms work, with the aim of presenting them theoretically. These algorithms were not designed for indoor location but can be used for countless applications. The third chapter covers how the tools used, Weka and Python, work, and presents the results obtained after thousands of executions with different algorithms and parameters, which reveal the main problems of Machine Learning. The fourth chapter collects the results and presents the conclusions drawn.

    Frequent itemset mining on multiprocessor systems

    Frequent itemset mining is an important building block in many data mining applications such as market basket analysis, recommendation, web mining, fraud detection, and gene expression analysis. In many of them, the datasets being mined can easily grow to hundreds of gigabytes or even terabytes of data. Hence, efficient algorithms are required to process such large amounts of data. In recent years, many frequent-itemset mining algorithms have been proposed, which however (1) often have high memory requirements and (2) do not exploit the large degrees of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, however, the use of these data structures forces the algorithms to go out-of-core, i.e., they have to access secondary memory, which leads to serious performance degradation. Exploiting available parallelism is also required to mine large datasets because the serial performance of processors has almost stopped increasing. Algorithms should therefore exploit the large number of available threads as well as other kinds of parallelism (e.g., vector instruction sets) besides thread-level parallelism. In this work, we tackle the high memory requirements of frequent itemset mining in two ways: we (1) compress the datasets being mined, because they must be kept in main memory during several mining invocations, and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that show good compression performance on a wide variety of realistic datasets, i.e., the size of the datasets is reduced by up to 6.4x. The encodings can further be applied directly while loading the dataset from disk or network. Since encoding and decoding are repeatedly required for loading and mining the datasets, we reduce their costs by providing parallel encodings that achieve high throughput for both tasks. For a memory-efficient representation of the mining algorithms' intermediate data, we propose compact data structures and even employ explicit compression. Both methods together reduce the intermediate data's size by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined. To cope with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. The hot spots, which form basic building blocks of these algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For all of them, we discuss how to exploit available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep the sequential fraction of the algorithms as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Even single-threaded, our algorithms are often up to an order of magnitude faster than existing highly optimized algorithms, and they scale almost linearly on a large 32-core multiprocessor system. Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms used for mining other types of itemsets.
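
One of the hot spots listed above, intersecting lists of sorted integers, is the core step when the support of an itemset is computed from per-item lists of transaction IDs. The sketch below is a plain serial illustration under that assumption; it is not the thesis's parallel or compressed implementation, and all function names and the toy transactions are hypothetical.

```python
def intersect_sorted(a, b):
    """Intersect two ascending lists of transaction IDs with a two-pointer merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def tid_lists(transactions):
    """Map each item to the ascending list of IDs of transactions containing it."""
    lists = {}
    for tid, items in enumerate(transactions):
        for item in set(items):  # ignore repeated items within one transaction
            lists.setdefault(item, []).append(tid)
    return lists


def support(itemset, lists):
    """Number of transactions containing every item of a non-empty itemset."""
    items = list(itemset)
    tids = lists.get(items[0], [])
    for item in items[1:]:
        tids = intersect_sorted(tids, lists.get(item, []))
    return len(tids)


# Hypothetical toy dataset of four transactions.
transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
lists = tid_lists(transactions)
```

Because transaction IDs are appended in increasing order, each list is already sorted, so the merge runs in time linear in the combined length of the two lists.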