
    RETAIL DATA ANALYTICS USING GRAPH DATABASE

    Big data is an area focused on storing, processing and visualizing huge amounts of data. Today, data is growing faster than ever before, and we need to find the right tools and applications and build an environment that helps us obtain valuable insights from it. Retail is one of the domains that collects huge amounts of transaction data every day. Retailers need to understand their customers’ purchasing patterns and behavior in order to make better business decisions. Market basket analysis is a field of data mining that focuses on discovering patterns in retail transaction data. Our goal is to find tools and applications that retailers can use to quickly understand their data and make better business decisions. Because of the volume and complexity of the data, these activities cannot be done manually. Trends change very quickly, and retailers want to adapt to change and act quickly. This requires automated processes and efficient, fast algorithms. In our work, we mine transaction data by modeling it as graphs. We use clustering algorithms to discover communities (clusters) in the data and then use these clusters to build a recommendation system that recommends products to customers based on their buying behavior.
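
The abstract does not say which graph model or clustering algorithm is used. The sketch below assumes a product co-purchase graph (products as nodes, edge weight = number of baskets containing both products) and uses weighted label propagation as a stand-in community-detection method; recommendations are unowned products from the same communities as the customer's basket, ranked by co-purchase weight. All function names and the toy baskets are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_copurchase_graph(transactions):
    """Weighted product graph: edge weight = number of baskets containing both products."""
    graph = defaultdict(lambda: defaultdict(int))
    for basket in transactions:
        items = sorted(set(basket))
        for item in items:
            graph.setdefault(item, defaultdict(int))  # keep products that never co-occur
        for a, b in combinations(items, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph, max_iter=20):
    """Asynchronous label propagation: each node adopts the label with the largest total
    edge weight among its neighbours (ties -> smallest label, for determinism)."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            weight_by_label = defaultdict(int)
            for neighbour, weight in graph[node].items():
                weight_by_label[labels[neighbour]] += weight
            if not weight_by_label:
                continue  # isolated product keeps its own label
            best = max(weight_by_label.values())
            new_label = min(lbl for lbl, w in weight_by_label.items() if w == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def recommend(basket, graph, labels, k=3):
    """Rank unowned products in the basket's communities by co-purchase weight."""
    owned = set(basket)
    communities = {labels[item] for item in owned if item in labels}
    scores = defaultdict(int)
    for item in owned:
        for neighbour, weight in graph.get(item, {}).items():
            if neighbour not in owned and labels[neighbour] in communities:
                scores[neighbour] += weight
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


# Hypothetical toy baskets with two obvious product groups.
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["beer", "chips", "salsa"],
    ["beer", "chips"],
    ["chips", "salsa"],
]
graph = build_copurchase_graph(transactions)
labels = label_propagation(graph)
print(recommend(["bread"], graph, labels))  # ['butter', 'milk']
```

Label propagation is used here only because it is short and needs no parameters; a modularity-based method such as Louvain would be an equally reasonable assumption.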

    Privacy and spectral analysis of social network randomization

    Social networks are of significant importance in various application domains. Understanding the general properties of real social networks has gained much attention due to the proliferation of networked data. Many applications of networks, such as anonymous web browsing and data publishing, require relationship anonymity due to the sensitive, stigmatizing, or confidential nature of the relationship. One general approach to this problem is to randomize the edges in true networks and release only the randomized networks for data analysis. Our research focuses on developing randomization techniques such that the released networks preserve data utility while also preserving data privacy. Data privacy refers to the sensitive information in the network data. Network data released after a simple randomization could incur various disclosures, including identity disclosure, link disclosure and attribute disclosure. Data utility refers to the information, features, and patterns contained in the network data. Many important features may not be preserved in network data released after a simple randomization. In this dissertation, we develop advanced randomization techniques that better preserve the data utility of network data while still preserving data privacy. Specifically, we develop two advanced randomization strategies that preserve either the spectral properties of the network or its real features (e.g., modularity). We quantify to what extent various randomization techniques can protect data privacy when attackers use different attacks or have different background knowledge. To measure data utility, we also develop a consistent spectral framework to measure the non-randomness (importance) of the edges, nodes, and the overall graph. Exploiting the spectral space of the network topology, we further develop fraud detection techniques for various collaborative attacks in social networks. Extensive theoretical analysis and empirical evaluations are conducted to demonstrate the efficacy of our techniques.
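
To make the utility concern concrete, the sketch below assumes one simple edge-randomization scheme (delete k true edges and add k false edges, both chosen uniformly at random, so the edge count is preserved) and measures one spectral property, the largest adjacency eigenvalue, before and after. The scheme stands in for the "simple randomization" the abstract mentions; it is not the dissertation's spectrum-preserving technique, and the function names are hypothetical.

```python
import math
import random


def rand_add_del(edges, n, k, seed=0):
    """Assumed baseline randomization: delete k true edges and add k false edges,
    chosen uniformly at random, so the number of edges is unchanged."""
    rng = random.Random(seed)
    true_edges = sorted({tuple(sorted(e)) for e in edges})
    true_set = set(true_edges)
    non_edges = [(u, v) for u in range(n) for v in range(u + 1, n) if (u, v) not in true_set]
    deleted = set(rng.sample(true_edges, k))
    added = set(rng.sample(non_edges, k))
    return (true_set - deleted) | added


def largest_eigenvalue(edges, n, iters=500):
    """Largest adjacency eigenvalue via power iteration on A + I.

    The +I shift keeps the eigenvalue -lambda_max (present on bipartite graphs)
    from competing with lambda_max, so the iteration settles."""
    neighbours = [[] for _ in range(n)]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    x = [1.0 / math.sqrt(n)] * n
    estimate = 0.0
    for _ in range(iters):
        y = [x[i] + sum(x[j] for j in neighbours[i]) for i in range(n)]  # y = (A + I) x
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
        estimate = norm
    return estimate - 1.0  # undo the shift


# Hypothetical example: an 8-node ring randomized with k = 2.
ring = [(i, (i + 1) % 8) for i in range(8)]
released = rand_add_del(ring, n=8, k=2, seed=42)
# The edge count is preserved; the leading eigenvalue may change.
print(len(released), largest_eigenvalue(ring, 8), largest_eigenvalue(released, 8))
```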

    Artificial Intelligence in Cybersecurity (Umělá inteligence v kybernetické bezpečnosti)

    Artificial intelligence (AI) and machine learning (ML) have grown rapidly in recent years, and their practical applications can be seen in many fields, ranging from facial recognition to image analysis. Recent developments in artificial intelligence have vast transformative potential for both cybersecurity defenders and cybercriminals. Anti-malware solutions adopt intelligent techniques to detect and prevent threats in the digital space. In contrast, cybercriminals are also aware of the new prospects and are likely to adapt AI techniques to new generations of malware, to attack vectors and to their operations in general. This thesis presents the advances made so far in applying AI techniques in cybersecurity to combat cyber threats, and demonstrates how this promising technology can be a useful tool for detecting and preventing cyberattacks. Furthermore, the research examines how transnational criminal organizations and cybercriminals may leverage developing AI technology to conduct more sophisticated criminal activities. Next, the research outlines a possible dynamic new kind of malware, called X-Ware and X-sWarm, which simulates the behaviour of a swarm system and integrates a neural network to operate more efficiently, so that it could serve not only as a cyber weapon but also as the basis for a forthcoming anti-malware solution. The research proposes how to record and visualize the behaviour of this type of malware as it propagates through the file system and computer networks, both by analysing its dynamics (the virus process is known) and by analysing observed data (the virus process is not known and we observe only data from the system). Finally, the thesis proposes a paradigm for an anti-malware solution, named the "Multi agent antivirus system", which gives insight into developing a more robust, adaptive and flexible defence system.

    460 - Department of Computer Science

    Method for Enabling Causal Inference in Relational Domains

    The analysis of data from complex systems is quickly becoming a fundamental aspect of modern business, government, and science. The field of causal learning is concerned with developing statistical methods that allow practitioners to make inferences about unseen interventions. This field has seen significant advances in recent years. However, the vast majority of this work assumes that data instances are independent, whereas many systems are best described in terms of interconnected instances, i.e., relational systems. This discrepancy prevents causal inference techniques from being reliably applied in many real-world settings. In this thesis, I present three contributions to the field of causal inference that seek to enable the analysis of relational systems. First, I present theory for consistently testing statistical dependence in relational domains, and show how the significance of this test can be measured in practice using a novel bootstrap method for structured domains. Second, I show that statistical dependence in relational domains is inherently asymmetric, implying a simple test of causal direction from observational data. This test requires no assumptions about either the marginal distributions of the variables or the functional form of the dependence. Third, I describe relational causal adjustment, a procedure to identify the effects of arbitrary interventions from observational relational data via an extension of Pearl's backdoor criterion. A series of evaluations on synthetic domains shows that the estimates obtained by relational causal adjustment are close to those obtained from explicit experimentation.
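
Relational causal adjustment extends Pearl's backdoor criterion. As background on the non-relational starting point only, the sketch below implements the standard backdoor adjustment formula, P(Y=1 | do(X=x)) = Σ_z P(Y=1 | X=x, Z=z) P(Z=z), for independent records with a single discrete variable Z that is assumed to satisfy the backdoor criterion. It does not implement the thesis's relational extension; the helper `stratum` and the data are hypothetical.

```python
from collections import Counter


def backdoor_adjusted(records, x):
    """Estimate P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z).

    records: (x, z, y) tuples with binary y; Z is *assumed* to satisfy the
    backdoor criterion relative to (X, Y)."""
    records = list(records)
    z_counts = Counter(z for _, z, _ in records)
    estimate = 0.0
    for z, n_z in z_counts.items():
        ys = [yi for xi, zi, yi in records if xi == x and zi == z]
        if not ys:
            raise ValueError(f"no records with X={x}, Z={z}; P(Y | X, Z) is undefined")
        estimate += (sum(ys) / len(ys)) * (n_z / len(records))
    return estimate


def naive_conditional(records, x):
    """P(Y=1 | X=x): the purely observational estimate, with no adjustment."""
    ys = [yi for xi, _, yi in records if xi == x]
    return sum(ys) / len(ys)


def stratum(x, z, n_ones, n_total):
    """Hypothetical helper: n_total records for (x, z), n_ones of them with Y=1."""
    return [(x, z, 1)] * n_ones + [(x, z, 0)] * (n_total - n_ones)


# Hypothetical data in which Z influences both X and Y.
records = stratum(0, 0, 2, 4) + stratum(1, 0, 3, 4) + stratum(0, 1, 0, 2) + stratum(1, 1, 3, 6)
print(naive_conditional(records, 1))  # 0.6
print(backdoor_adjusted(records, 1))  # 0.625
```

The gap between the two printed numbers is the confounding bias that adjustment removes; the relational setting adds the difficulty that Z, X and Y may live on different, linked entities.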

    Supporting meaningful social networks

    Recent years have seen exponential growth of social network sites (SNSs) such as Friendster, MySpace and Facebook. SNSs flatten the real-world social network by making personal information and social structure visible to users outside their ego-centric networks. They provide a new basis of trust and credibility on top of the Internet and Web infrastructure for users to communicate and share information. On the vast majority of social networks, it takes only a few clicks to befriend other members. People’s dynamic, ever-changing real-world connections are translated into static links which, once formed, are permanent and thus require zero maintenance. The existence of static links as a public exhibition of private connections causes the problem of friendship inflation: the online practice whereby users acquire many more “friends” on SNSs than they can actually maintain in the real world. There is mounting evidence from both social science and statistical analysis that most SNSs have an inflated number of digital friendship connections. The theory of friendship inflation is also supported by our observation of Facebook users at the University of Southampton over nearly three years. Friendship inflation can devalue the social graph and eventually lead to the decline of a social network site. From Sixdegrees.com to Facebook.com, many social networks have risen and fallen, and we argue that friendship inflation is one of the main forces driving this cycle. Despite the gravity of the issue, surprisingly little academic research has been carried out to address the problem. The thesis proposes a novel algorithm, called ActiveLink, to identify meaningful online social connections. The innovation of the algorithm lies in combining preferential attachment and assortativity, which allows it to identify long-range connections that simple reciprocity algorithms may not capture. We have tested the key ideas of the algorithm on a data set of 22,553 Facebook users in the University of Southampton network. To better support the development of SNSs, we discuss an SNS model called RealSpace, a social network architecture based on active links. The system introduces three other algorithms: social connectivity, proximity index and community structure detection. Finally, we look at problems relating to improving the network model and social network systems.
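
The abstract names preferential attachment and assortativity as the two ingredients of ActiveLink but does not give the algorithm itself. As background only, the sketch below computes the standard versions of both measures on an undirected friendship edge list: the preferential-attachment score deg(u)·deg(v) for a pair of users, and Newman's degree assortativity coefficient for the whole graph. It is not ActiveLink, and how the thesis combines the two is not reproduced here; the function names are hypothetical.

```python
import math
from collections import defaultdict


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def preferential_attachment(deg, u, v):
    """Standard preferential-attachment link score: deg(u) * deg(v)."""
    return deg[u] * deg[v]


def degree_assortativity(edges):
    """Newman's degree assortativity r: Pearson correlation of the degrees at the two
    ends of each edge, counting every edge in both directions. Negative r means
    high-degree nodes tend to link to low-degree nodes."""
    deg = degrees(edges)
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    n = len(xs)
    mean = sum(xs) / n  # identical for xs and ys by symmetry
    cov = sum((a - mean) * (b - mean) for a, b in zip(xs, ys)) / n
    var = sum((a - mean) ** 2 for a in xs) / n  # also equals the variance of ys
    if var == 0:
        return float("nan")  # every endpoint has the same degree, e.g. a regular graph
    return cov / var


# Hypothetical toy graphs.
star = [(0, 1), (0, 2), (0, 3)]
path = [(0, 1), (1, 2), (2, 3)]
print(degree_assortativity(star))  # -1.0
print(preferential_attachment(degrees(star), 0, 1))  # 3
```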