481 research outputs found

    Behavioral analysis in cybersecurity using machine learning: a study based on graph representation, class imbalance and temporal dissection

    The main goal of this thesis is to improve behavioral cybersecurity analysis using machine learning, exploiting graph structures and temporal dissection, and addressing imbalance problems. This main objective is divided into four specific goals: OBJ1: To study the influence of the temporal resolution on highlighting micro-dynamics in the entity behavior classification problem. In real use cases, time-series information may not be enough to describe entity behavior. For this reason, we plan to exploit graph structures to integrate both structured and unstructured data in a representation of entities and their relationships. In this way, it will be possible to appreciate not only a single temporal communication but the whole behavior of these entities. Nevertheless, entity behaviors evolve over time, so a static graph may not be enough to describe all these changes. For this reason, we propose to use a temporal dissection to create temporal subgraphs and thereby analyze the influence of the temporal resolution on the graph creation and the entity behaviors within. Furthermore, we propose to study how the temporal granularity should be used to highlight network micro-dynamics and short-term behavioral changes, which can be a hint of suspicious activities. OBJ2: To develop novel sampling methods that work with disconnected graphs for addressing imbalanced problems while avoiding component topology changes. Graph imbalance is a common and challenging problem, and traditional graph sampling techniques that work directly on these structures cannot be used without modifying the graph's intrinsic information or introducing bias. Furthermore, existing techniques have proven limited when disconnected graphs are used. For this reason, novel resampling methods for balancing the number of nodes that can be applied directly to disconnected graphs, without altering component topologies, need to be introduced. 
In particular, we propose to take advantage of the existence of disconnected graphs to detect and replicate the most relevant graph components without changing their topology, while considering traditional data-level strategies for handling the entity behaviors within. OBJ3: To study the usefulness of generative adversarial networks (GANs) for addressing the class imbalance problem in cybersecurity applications. Although traditional data-level pre-processing techniques have proven effective for addressing class imbalance problems, they have also shown adverse effects when highly variable datasets are used, as is common in cybersecurity. For this reason, new techniques that can exploit the overall data distribution for learning highly variable behaviors should be investigated. In this sense, GANs have shown promising results in the image and video domains; however, their extension to tabular data is not trivial. For this reason, we propose to adapt GANs to work with cybersecurity data and exploit their ability to learn and reproduce the input distribution for addressing the class imbalance problem (as an oversampling technique). Furthermore, since no single GAN solution works for every scenario, we propose to study several GAN architectures with several training configurations to determine which is the best option for a cybersecurity application. OBJ4: To analyze temporal data trends and performance drift for enhancing cyber threat analysis. Temporal dynamics and incoming new data can affect the quality of the predictions, compromising model reliability. This phenomenon causes models to become outdated without anyone noticing. In this sense, it is very important to be able to extract more insightful information from the application domain by analyzing data trends, learning processes, and performance drifts over time. 
For this reason, we propose to develop a systematic approach for analyzing how data quality and quantity affect the learning process. Moreover, in the context of cyber threat intelligence (CTI), we propose to study the relations between temporal performance drifts and the input data distribution to detect possible model limitations, enhancing cyber threat analysis. Programa de Doctorado en Ciencias y Tecnologías Industriales (RD 99/2011) Industria Zientzietako eta Teknologietako Doktoretza Programa (ED 99/2011
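
The temporal dissection described in OBJ1 can be pictured with a small sketch. It assumes fixed-width time windows and events given as `(timestamp, source, destination)` tuples; the function name `temporal_subgraphs`, the host names, and the window sizes are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def temporal_subgraphs(events, window):
    """Group (timestamp, src, dst) events into one edge set per time window."""
    graphs = defaultdict(set)
    for ts, src, dst in events:
        # Integer division maps each timestamp to the index of its window.
        graphs[ts // window].add((src, dst))
    return dict(sorted(graphs.items()))


# Hypothetical communication events (timestamps in seconds).
events = [
    (0, "h1", "h2"),
    (5, "h1", "h3"),
    (12, "h2", "h3"),
    (61, "h1", "h2"),
    (65, "h4", "h1"),
]

coarse = temporal_subgraphs(events, window=60)  # one subgraph per minute
fine = temporal_subgraphs(events, window=10)    # one subgraph per 10 seconds

for name, graphs in (("coarse", coarse), ("fine", fine)):
    for index, edges in graphs.items():
        print(name, index, sorted(edges))
```

With the coarse resolution the first three events collapse into a single subgraph, whereas the finer resolution separates the `h2`-to-`h3` communication into its own window. This is the kind of short-term change that a coarser or static graph would hide.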

    K-Means Algorithm for Recognizing Fraud Users on a Bitcoin Exchange Platform

    This paper addresses the recognition of fraudulent users on a Bitcoin exchange website, bitcoin-otc. According to the online rating records provided by the website, some users behave significantly differently from others. Given this, the classical K-means clustering algorithm is proposed to identify these abnormal users. K-means is an unsupervised clustering algorithm that groups users based on feature similarity, so its performance depends on the chosen features. This paper explores real record data to find the best collection of features, e.g., the mean of total ratings sent. Since the selected features are not directly observable in the record set, the website should offer these features to potential traders.
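
As a rough illustration of the idea in this abstract, the sketch below runs plain K-means (Lloyd's algorithm) on two per-user features and flags the smallest cluster as abnormal. The feature pair (mean rating sent, mean rating received), the toy values, the deterministic initialization from the first `k` points, and the "smallest cluster is abnormal" rule are all assumptions made for illustration; the paper's actual feature set and decision rule are not given here.

```python
import math
from collections import Counter


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm with a deterministic start (the first k points)."""
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iters):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, labels


# Hypothetical users: (mean rating sent, mean rating received).
users = {
    "u1": (2.0, 1.5),
    "u2": (1.5, 2.0),
    "u3": (3.0, 2.0),
    "u4": (1.0, 2.5),
    "u5": (-9.0, -8.0),
    "u6": (-8.0, -9.5),
}

names = list(users)
centroids, labels = kmeans([users[n] for n in names], k=2)

# Assumption: treat the least populated cluster as the abnormal one.
smallest = min(Counter(labels).items(), key=lambda kv: kv[1])[0]
flagged = [n for n, lab in zip(names, labels) if lab == smallest]
print(flagged)
```

Because K-means only groups by distance in feature space, the result depends entirely on which features are supplied, which is the point the abstract makes about feature selection.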

    The bow tie structure of the Bitcoin users graph

    The availability of the entire Bitcoin transaction history, stored in its public blockchain, offers interesting opportunities for analysing the transaction graph to obtain insight into user behaviour. This paper presents an analysis of the Bitcoin users graph, obtained by clustering the transaction graph, to highlight its connectivity structure and the economic meaning of the different components obtained. In fact, the bow tie structure, already observed for the graph of the web, is augmented, in the Bitcoin users graph, with economic information about the entities involved. We study the connectivity components of the users graph individually, to infer their macroscopic contribution to the whole economy. We define and evaluate a set of measures on the nodes inside each component to characterize and quantify such a contribution. We also perform a temporal analysis of the evolution of the resulting bow tie structure. Our findings confirm our hypothesis on the semantics of the components, defined in terms of their economic role in the flow of value inside the graph.
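
For readers unfamiliar with the bow tie decomposition, the sketch below splits a small directed graph into the usual parts: the largest strongly connected component (the core), the IN set of nodes that can reach the core, the OUT set of nodes reachable from the core, and everything else. The edge list and node names are hypothetical, and the quadratic way of finding the largest component is chosen for brevity rather than for blockchain-scale graphs; it is not the authors' implementation.

```python
from collections import defaultdict, deque


def reachable(adj, sources):
    """Return every node reachable from `sources` by following `adj` (BFS)."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def bow_tie(edges):
    """Split a directed graph into core, IN, OUT and remaining nodes."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    nodes = set()
    for u, v in edges:
        fwd[u].add(v)
        bwd[v].add(u)
        nodes.update((u, v))

    # Largest strongly connected component: a node's component is the set
    # of nodes it can reach that can also reach it back.
    core, unassigned = set(), set(nodes)
    while unassigned:
        v = next(iter(unassigned))
        scc = reachable(fwd, [v]) & reachable(bwd, [v])
        unassigned -= scc
        if len(scc) > len(core):
            core = scc

    out_part = reachable(fwd, core) - core  # reachable from the core
    in_part = reachable(bwd, core) - core   # able to reach the core
    other = nodes - core - in_part - out_part
    return {"core": core, "in": in_part, "out": out_part, "other": other}


# Hypothetical "user pays user" edges.
edges = [
    ("a", "b"), ("b", "c"), ("c", "a"),  # a cycle forming the core
    ("x", "a"),                          # x only sends value into the core
    ("c", "y"),                          # y only receives value from the core
    ("x", "t"),                          # t hangs off IN without reaching the core
]

parts = bow_tie(edges)
for name, members in parts.items():
    print(name, sorted(members))
```

In the setting of the abstract, each part would then be characterized by economic measures on its nodes, which this sketch does not attempt.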

    Cryptocurrencies as a financial asset: a systematic analysis

    This paper provides a systematic review of the empirical literature based on the major topics that have been associated with the market for cryptocurrencies since their development as a financial asset in 2009. Despite astonishing price appreciation in recent years, cryptocurrencies have been subjected to accusations of pricing bubbles central to the trilemma that exists between regulatory oversight, the potential for illicit use through its anonymity within a young under-developed exchange system, and infrastructural breaches influenced by the growth of cybercriminality. Each influences the perception of the role of cryptocurrencies as a credible investment asset class and legitimate store of value.

    Modelling Determinants of Cryptocurrency Prices: A Bayesian Network Approach

    The growth of market capitalisation and the number of altcoins (cryptocurrencies other than Bitcoin) provide investment opportunities and complicate the prediction of their price movements. A significant challenge in this volatile and relatively immature market is predicting cryptocurrency prices, which requires identifying the factors that influence them. This study investigates the factors influencing altcoin prices from a causal analysis perspective using Bayesian networks. In particular, the research question concerns the nature of the interactions between five leading altcoins, traditional financial assets including gold, oil, and the S&P 500, and social media. To answer this question, we create causal networks built from the historic price data of five traditional financial assets, social media data, and price data of altcoins. The resulting networks are used for causal reasoning and diagnosis, and the results indicate that social media (in particular Twitter data in this study) is the most significant factor influencing the prices of altcoins. Furthermore, it is not possible to generalise the coins' reactions to changes in the factors. Consequently, the coins need to be studied separately for a particular price movement investigation.