Processing social media text for the quantamental analyses of cryptoasset time series

Abstract

This thesis analyses social media text to identify which events and concerns are associated with changes between phases of rising and falling cryptoasset prices. A new cryptoasset classification system, based on token functionality, identifies Bitcoin as the largest example of a 'crypto-transaction' system and Ethereum as the largest example of a 'crypto-fuel' system. The price of ether is only weakly correlated with that of bitcoin (Spearman's rho = 0.3849). Both bitcoin and ether show distinct phases of rising and falling prices, and each has a large, dedicated social media forum on Reddit. A process is developed to extract the events and concerns discussed on social media that are associated with these different phases of price movement. This data-driven approach avoids the need to decide in advance which social media metrics are relevant. First, a new, non-parametric Data-Driven Phasic Word Identification methodology is developed to find words associated with the phase of declining bitcoin prices in 2017-18. This approach is then extended to find the context of these words, from which topics are inferred. Next, neural networks (word2vec) are applied to move the analysis from extracting individual words to extracting topics. Finally, this work supports the development of a framework for identifying which events and concerns are plausible causes of changes between phases in the ether and bitcoin price series. Consistent with Bitcoin providing a form of money and Ethereum providing a platform for developing applications, the results show a one-off effect of regulatory bans on the bitcoin price and recurring effects of rival innovations on the ether price. The results also suggest that technical traders, whose influence is captured through discussion of market prices, affect both cryptoassets. This thesis demonstrates the value of a quantamental approach to the analysis of cryptoasset prices.