35 research outputs found

    New Approach for Market Intelligence Using Artificial and Computational Intelligence

    Small and medium-sized retailers are central to the private sector and a vital contributor to economic growth, but they often face enormous challenges in unleashing their full potential. Financial pitfalls, inadequate access to markets, and difficulties in exploiting technology have prevented them from achieving optimal productivity. Market Intelligence (MI) is the knowledge extracted from numerous internal and external data sources, aimed at providing a holistic view of the state of the market and influencing marketing-related decision-making processes in real time. A related, burgeoning phenomenon and crucial topic in the field of marketing is Artificial Intelligence (AI), which entails fundamental changes to the skill sets marketers require. A vast amount of knowledge is stored in retailers’ point-of-sale databases. The format of this data often makes the knowledge it stores hard to access and identify. As a powerful AI technique, Association Rules Mining helps to identify frequently associated patterns stored in large databases to predict customers’ shopping journeys. Consequently, the method has emerged as the key driver of cross-selling and upselling in the retail industry. At the core of this approach is Market Basket Analysis, which captures knowledge from heterogeneous customer shopping patterns and examines the effects of marketing initiatives. Apriori, which enumerates frequent itemsets purchased together (as market baskets), is the central algorithm in the analysis process. Problems occur, as Apriori lacks computational speed and has weaknesses in providing intelligent decision support. As the number of simultaneous database scans grows, the computation cost increases and performance decreases dramatically. Moreover, there are shortcomings in decision support, especially in the methods of finding rarely occurring events and identifying a brand’s trending popularity before it peaks.
As the objective of this research is to find intelligent ways to help small and medium-sized retailers grow with an MI strategy, we demonstrate the effects of AI with algorithms for data preprocessing, market segmentation, and finding market trends. Using the sales database of a small, local retailer, we show how our Åbo algorithm increases mining performance and intelligence, and how it helps to extract valuable marketing insights to assess demand dynamics and product popularity trends. We also show how this results in commercial advantage and tangible return on investment. Additionally, an enhanced normal distribution method assists data preprocessing and helps to explore different types of potential anomalies.
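
For readers unfamiliar with Apriori, the following is a minimal textbook-style sketch of the level-wise search the abstract refers to. It is not the Åbo algorithm, whose details the abstract does not give; the function name `apriori`, the toy `baskets` data, and the `min_support` threshold are all assumptions made for illustration. Note that each level needs one full pass over the transactions, which is the scan cost the abstract criticizes.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    Plain level-wise Apriori; an illustrative sketch only.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # One database scan per level to count the candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical point-of-sale baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]
frequent = apriori(baskets, min_support=3)
```

With this toy data, every single item and every pair reaches the threshold of three baskets, while the full three-item basket appears only twice and is dropped.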

    Frequent itemset mining on multiprocessor systems

    Frequent itemset mining is an important building block in many data mining applications like market basket analysis, recommendation, web mining, fraud detection, and gene expression analysis. In many of them, the datasets being mined can easily grow to hundreds of gigabytes or even terabytes of data. Hence, efficient algorithms are required to process such large amounts of data. In recent years, many frequent-itemset mining algorithms have been proposed, which however (1) often have high memory requirements and (2) do not exploit the large degrees of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, however, the use of these data structures forces the algorithms to go out-of-core, i.e., they have to access secondary memory, which leads to serious performance degradation. Exploiting available parallelism is further required to mine large datasets because the serial performance of processors has almost stopped increasing. Algorithms should therefore exploit the large number of available threads and also other kinds of parallelism (e.g., vector instruction sets) besides thread-level parallelism. In this work, we tackle the high memory requirements of frequent itemset mining in two ways: we (1) compress the datasets being mined because they must be kept in main memory during several mining invocations and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that show good compression performance on a wide variety of realistic datasets, i.e., the size of the datasets is reduced by up to 6.4x. The encodings can further be applied directly while loading the dataset from disk or network.
Since encoding and decoding are repeatedly required for loading and mining the datasets, we reduce their costs by providing parallel encodings that achieve high throughput for both tasks. For a memory-efficient representation of the mining algorithms’ intermediate data, we propose compact data structures and even employ explicit compression. Both methods together reduce the intermediate data’s size by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined. For coping with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. The hot spots, which form basic building blocks of these algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For all of them, we discuss how to exploit available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep the sequential fraction of the algorithms as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Even single-threaded, our algorithms are often up to an order of magnitude faster than existing highly optimized algorithms, and they further scale almost linearly on a large 32-core multiprocessor system. Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms that are used for mining other types of itemsets.
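
Hot spot (4) above, intersecting sorted integer lists or bitmaps, can be illustrated with a small serial sketch. In vertical mining layouts, each item keeps the sorted IDs of the transactions that contain it, and the support of an itemset is the size of the intersection. The helper names `intersect_sorted` and `to_bitmap` and the sample transaction IDs are assumptions for illustration; the parallel and vectorized variants described in the abstract are not reproduced here.

```python
def intersect_sorted(a, b):
    """Merge-style intersection of two ascending lists of transaction IDs."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot occur in b any more
        else:
            j += 1  # b[j] cannot occur in a any more
    return out


def to_bitmap(tids):
    """Encode transaction IDs as bits of a Python integer."""
    bitmap = 0
    for t in tids:
        bitmap |= 1 << t
    return bitmap


# Hypothetical transaction-ID lists for two items.
tids_bread = [0, 1, 3, 4]
tids_milk = [0, 1, 2, 4]

common = intersect_sorted(tids_bread, tids_milk)
support_from_lists = len(common)

# The same support via bitmaps: a bitwise AND followed by a population count.
both = to_bitmap(tids_bread) & to_bitmap(tids_milk)
support_from_bitmaps = bin(both).count("1")
```

The list form suits sparse data, while the bitmap form turns the intersection into word-wise AND operations, which is the kind of operation that maps well onto vector instruction sets.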

    Annales Mathematicae et Informaticae 2020


    Uncertainty Management of Intelligent Feature Selection in Wireless Sensor Networks

    Wireless sensor networks (WSN) are envisioned to revolutionize the paradigm of monitoring complex real-world systems at a very high resolution. However, the deployment of a large number of unattended sensor nodes in hostile environments, frequent changes of environment dynamics, and severe resource constraints pose uncertainties and limit the potential use of WSN in complex real-world applications. Although uncertainty management in Artificial Intelligence (AI) is well developed and well investigated, its implications in wireless sensor environments are inadequately addressed. This dissertation addresses uncertainty management issues of spatio-temporal patterns generated from sensor data. It provides a framework for characterizing spatio-temporal patterns in WSN. Using rough set theory and temporal reasoning, a novel formalism has been developed to characterize and quantify the uncertainties in predicting spatio-temporal patterns from sensor data. This research also uncovers the trade-off among the uncertainty measures, which can be used to develop a multi-objective optimization model for real-time decision making in sensor data aggregation and sampling.
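
As background on the rough set theory mentioned above, here is a minimal sketch of the standard lower and upper approximations of a set under an indiscernibility relation. It shows only the textbook definitions, not the dissertation's formalism, and the node names, attributes, and `event` set are hypothetical.

```python
from collections import defaultdict


def approximations(objects, attributes, target):
    """Return (lower, upper) approximations of `target`.

    Objects with identical values on `attributes` are indiscernible
    and fall into the same equivalence class.
    """
    classes = defaultdict(set)
    for name, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(name)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the target
            lower |= block
        if block & target:    # class overlaps the target
            upper |= block
    return lower, upper


# Hypothetical discretized sensor readings.
readings = {
    "n1": {"temp": "high", "humidity": "low"},
    "n2": {"temp": "high", "humidity": "low"},
    "n3": {"temp": "low", "humidity": "low"},
    "n4": {"temp": "high", "humidity": "high"},
}
event = {"n1", "n4"}  # nodes that actually observed the event

lower, upper = approximations(readings, ["temp", "humidity"], event)
accuracy = len(lower) / len(upper)  # 1.0 would mean no uncertainty
```

Here n1 and n2 are indiscernible but only n1 observed the event, so both land in the boundary region (upper minus lower), and the accuracy ratio drops below one. Ratios of this kind are one common way to quantify uncertainty in rough set theory.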

    On Privacy-Enhanced Distributed Analytics in Online Social Networks

    More than half of the world's population benefits from online social network (OSN) services. A considerable part of these services is based on applying analytics to user data to infer users' preferences and enrich their experience accordingly. At the same time, user data is monetized by service providers to run their business models. Therefore, providers tend to extensively collect (personal) data about users. However, this data is oftentimes used for various purposes without the informed consent of the users. Providers share this data in different forms with third parties (e.g., data brokers). Moreover, sensitive user data has repeatedly been subject to unauthorized access by malicious parties. These issues have demonstrated the insufficient commitment of providers to user privacy, and consequently, raised users' concerns. Despite the emergence of privacy regulations (e.g., GDPR and CCPA), recent studies showed that the collection of personal user data and the sharing of sensitive data are still continuously increasing. A number of privacy-friendly OSNs have been proposed to enhance user privacy by reducing the need for central service providers. However, this improvement in privacy protection usually comes at the cost of losing social connectivity and many analytics-based services of the widespread OSNs. This dissertation addresses this issue by first proposing an approach to privacy-friendly OSNs that maintains established social connections. Second, approaches that allow users to collaboratively apply distributed analytics while preserving their privacy are presented. Finally, the dissertation contributes to better assessment and mitigation of the risks associated with distributed analytics. These three research directions are treated through the following six contributions. Conceptualizing Hybrid Online Social Networks: We conceptualize a hybrid approach to privacy-friendly OSNs, HOSN. This approach combines the benefits of using COSNs and DOSNs.
Users can maintain their social experience in their preferred COSN while being provided with additional means to enhance their privacy. Users can seamlessly post public content or private content that is accessible only by authorized users (friends) beyond the reach of the service providers. Improving the Trustworthiness of HOSNs: We conceptualize software features to address users' privacy concerns in OSNs. We prototype these features in our HOSN approach and evaluate their impact on the privacy concerns and the trustworthiness of the approach. Also, we analyze the relationships between four important aspects that influence users' behavior in OSNs: privacy concerns, trust beliefs, risk beliefs, and the willingness to use. Privacy-Enhanced Association Rule Mining: We present an approach that enables users to efficiently apply privacy-enhanced association rule mining on distributed data. This approach can be employed in DOSNs and HOSNs to generate recommendations. We leverage a privacy-enhanced distributed graph sampling method to reduce the data required for the mining and lower the communication and computational overhead. Then, we apply a distributed frequent itemset mining algorithm in a privacy-friendly manner. Privacy Enhancements on Federated Learning (FL): We identify several privacy-related issues in the emerging distributed machine learning technique, FL. These issues are mainly due to the centralized nature of this technique. We discuss tackling these issues by applying FL in a hierarchical architecture. The benefits of this approach include a reduction in the centralization of control and the ability to place defense and verification methods more flexibly and efficiently within the hierarchy. Systematic Analysis of Threats in Federated Learning: We conduct a critical study of the existing attacks in FL to better understand the actual risk of these attacks under real-world scenarios.
First, we structure the literature in this field and show the research foci and gaps. Then, we highlight a number of issues in (1) the assumptions commonly made by researchers and (2) the evaluation practices. Finally, we discuss the implications of these issues for the applicability of the proposed attacks and recommend several remedies. Label Leakage from Gradients: We identify a risk of information leakage when sharing gradients in FL. We demonstrate the severity of this risk by proposing a novel attack that extracts the user annotations that describe the data (i.e., ground-truth labels) from gradients. We show the high effectiveness of the attack under different settings such as different datasets and model architectures. We also test several defense mechanisms to mitigate this attack and identify the effective ones.
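
To make the label-leakage risk concrete, the sketch below shows a well-known special case rather than the attack proposed in the dissertation: for a single example trained with softmax cross-entropy, the gradient with respect to the output-layer bias equals the predicted probabilities minus the one-hot label, so only the true class has a negative entry. The function names and the sample logits are assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bias_gradient(logits, label):
    """Gradient of cross-entropy loss w.r.t. the output bias (one example).

    Equals softmax(logits) minus the one-hot encoding of `label`.
    """
    probs = softmax(logits)
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]


def infer_label(gradient):
    """Guess the label as the index of the most negative gradient entry."""
    return min(range(len(gradient)), key=gradient.__getitem__)


# Hypothetical logits for a 4-class model; the client's true label is 2.
logits = [2.0, -1.0, 0.5, 0.1]
shared_gradient = bias_gradient(logits, label=2)
recovered = infer_label(shared_gradient)
```

This simple trick relies on a batch size of one. Gradients averaged over larger batches mix the contributions of several examples, which is why more elaborate attacks, and evaluations across different settings, are needed.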

    Deklarative Verarbeitung von Datenströmen in Sensornetzwerken

    Sensors can now be found in many facets of everyday life, and are used to capture and transfer both physical and chemical characteristics into digitally analyzable data. Wireless sensor networks play a central role in the growing industrial use of wide-range, primarily autonomous surveillance of regions or buildings. The development of suitable systems involves a number of challenges. Current solutions are often designed with a specific task in mind, rendering them unsuitable for use in other environments. Suitable solutions for distributed systems are therefore continuously built from scratch on both the hardware and software levels, more often than not resulting in products in the market's higher price segments. Users would therefore profit from the reuse of existing modules in both areas of development. Once prefabricated solutions are available, the remaining challenge is to find a suitable combination of these solutions that fulfills the user's specifications. However, the development of suitable solutions often requires expert knowledge, especially in the case of wireless sensor networks, in which resources are limited. The primary focus of this dissertation is energy-efficient data analysis in sensor networks. The AnduIN system, which is outlined in this dissertation, plays a central role in this task by reducing the software design phase to the mere formulation of the solution's specifications in a declarative query language. The system then reaches the user's defined goals in a fully automated fashion. Thus, the user is integrated into the design process only through the original definition of desired characteristics. The continuous surveillance of objects using wireless sensor networks depends strongly on a plethora of parameters. Experience has shown that energy consumption is one of the major weaknesses of wireless data transfer.
One strategy for the reduction of energy consumption is to reduce the communication overhead by implementing an early analysis of measurement data on the sensor nodes. Often, it is neither possible nor practical to perform the complete data analysis of complex algorithms within the sensor network. In this case, portions of the analysis must be performed on a central computing unit. The AnduIN system integrates both simple methods and complex methods that are evaluated only partially in-network. The system autonomously decides which application fragments are executed on which components based on a multi-dimensional cost model. This work also includes various novel methods for the analysis of sensor data, such as methods for evaluating spatial data, data cleaning using adaptive burst detection, and the identification of frequent patterns using quantitative itemsets.