A methodology for the resolution of cashtag collisions on Twitter – A natural language processing & data fusion approach
© 2019 The Authors. Investors use social media such as Twitter to share news about financial stocks listed on international stock exchanges. Company ticker symbols uniquely identify companies listed on a given stock exchange and can be embedded within tweets to create clickable hyperlinks, referred to as cashtags, that allow investors to associate their tweets with specific companies. The main limitation is that identical ticker symbols are present on exchanges all over the world, so searching for such a cashtag on Twitter returns a stream of tweets matching any company to which the cashtag refers; we refer to this as a cashtag collision. Colliding cashtags can sow confusion for investors seeking news about a specific company, and resolving them would benefit investors who rely on the speed of tweets for financial information, saving them valuable time. We propose a methodology to resolve this problem that combines Natural Language Processing and Data Fusion to construct company-specific corpora, which aid in the detection and resolution of colliding cashtags so that tweets can be classified as being related to a specific stock exchange or not. Supervised machine learning classifiers are trained twice on each tweet: once on a count vectorisation of the tweet text, and again with the assistance of features contained in the company-specific corpora. We validate the methodology with an experiment involving companies listed on the London Stock Exchange. Results show that several machine learning classifiers benefit from the custom corpora, yielding higher classification accuracy in the prediction and resolution of colliding cashtags.
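The two training regimes described above (count vectorisation alone, then counts assisted by a company-specific corpus) can be illustrated with a small sketch. It is not the authors' implementation: the corpus terms, the single `__corpus_hits__` feature standing in for the corpus-derived features, the toy labelled tweets, and the choice of a multinomial Naive Bayes classifier (one example of a supervised classifier) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical company-specific corpus for the LSE-listed company behind a
# ticker; the terms are made up for illustration.
LSE_CORPUS = {"TSCO": {"tesco", "ftse", "clubcard", "supermarket", "pence"}}


def tokenize(text):
    """Lower-cased word tokens; a leading '$' is kept so cashtags stay distinct."""
    return re.findall(r"\$?[a-z0-9]+", text.lower())


def count_features(text):
    """Regime 1: plain count vectorisation (bag of words) of the tweet text."""
    return Counter(tokenize(text))


def corpus_features(text, ticker):
    """Regime 2: counts plus a hypothetical feature counting corpus-term hits."""
    feats = count_features(text)
    corpus = LSE_CORPUS.get(ticker, set())
    hits = sum(1 for tok in tokenize(text) if tok in corpus)
    if hits:
        feats["__corpus_hits__"] = hits
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {c: math.log(y.count(c) / len(y)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for feats, label in zip(X, y):
            self.counts[label].update(feats)
        self.vocab = set().union(*X)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, feats):
        v = len(self.vocab)

        def log_score(c):
            s = self.log_prior[c]
            for tok, n in feats.items():
                if tok in self.vocab:  # tokens unseen in training are ignored
                    s += n * math.log((self.counts[c][tok] + 1) / (self.totals[c] + v))
            return s

        return max(self.classes, key=log_score)


# Toy labelled tweets: 1 = about the LSE-listed Tesco PLC, 0 = not (made up).
train = [
    ("$TSCO shares up in FTSE trading, Tesco results strong", "TSCO", 1),
    ("Tesco clubcard sales lift $TSCO 5 pence", "TSCO", 1),
    ("$TSCO Tractor Supply beats Nasdaq estimates", "TSCO", 0),
    ("Tractor Supply $TSCO farm store expansion", "TSCO", 0),
]
y = [label for _, _, label in train]
clf_count = NaiveBayes().fit([count_features(t) for t, _, _ in train], y)
clf_corpus = NaiveBayes().fit([corpus_features(t, k) for t, k, _ in train], y)

tweet = "$TSCO supermarket"
print(clf_count.predict(count_features(tweet)))            # 0: "supermarket" never seen in training
print(clf_corpus.predict(corpus_features(tweet, "TSCO")))  # 1: the corpus hit tips the balance
```

The toy example shows why the corpus can help: a term absent from the labelled tweets carries no signal in the count-only model, but it can still count towards a corpus-derived feature.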
A Smart Data Ecosystem for the Monitoring of Financial Market Irregularities
Investments made on the stock market depend on timely and credible information being made available to investors. Such information can be sourced from online news articles, broker agencies, and discussion platforms such as financial discussion boards and Twitter. The monitoring of such discussion is a challenging yet necessary task to support the transparency of the financial market. Although financial discussion boards are typically monitored by administrators who respond to other users reporting posts for misconduct, actively monitoring social media such as Twitter remains a difficult task.
Users sharing news about stock-listed companies on Twitter can embed cashtags in their tweets that mimic a company's stock ticker symbol (e.g. TSCO on the London Stock Exchange refers to Tesco PLC). A cashtag is simply the ticker characters prefixed with a '$' symbol, which then becomes a clickable hyperlink, similar to a hashtag. Twitter, however, does not distinguish between companies with identical ticker symbols that belong to different exchanges. TSCO, for example, refers to Tesco PLC on the London Stock Exchange but also to the Tractor Supply Company listed on the NASDAQ. This research refers to such scenarios as a 'cashtag collision'. Investors who wish to capitalise on the fast dissemination that Twitter provides may be misled by tweets containing colliding cashtags. Further exacerbating this issue is the presence of tweets referring to cryptocurrencies, which also feature cashtags that may be identical to those used for stock-listed companies.
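Detecting a collision starts with extracting cashtags and checking whether a ticker maps to more than one listing. The sketch below assumes a hard-coded, hypothetical ticker-to-listing map and a simple pattern (a '$' followed by one to six letters); real ticker formats vary, so both are illustrative assumptions.

```python
import re

# Hypothetical ticker -> listings map; a real system would load exchange
# listing data rather than hard-code it.
LISTINGS = {
    "TSCO": [("LSE", "Tesco PLC"), ("NASDAQ", "Tractor Supply Company")],
    "BARC": [("LSE", "Barclays PLC")],
}

# Assumption: a cashtag is '$' followed by 1-6 letters and a word boundary.
# Real tickers can contain digits or dots (e.g. share classes); this ignores them.
CASHTAG = re.compile(r"\$([A-Za-z]{1,6})\b")


def extract_cashtags(tweet):
    """Return the tickers of all cashtags in the tweet, upper-cased, in order."""
    return [t.upper() for t in CASHTAG.findall(tweet)]


def find_collisions(tweet):
    """Map each cashtag that matches more than one listing to its candidates."""
    return {
        t: LISTINGS[t]
        for t in extract_cashtags(tweet)
        if len(LISTINGS.get(t, [])) > 1
    }


print(extract_cashtags("Results day for $TSCO and $barc"))  # ['TSCO', 'BARC']
print(find_collisions("Results day for $TSCO and $barc"))
# {'TSCO': [('LSE', 'Tesco PLC'), ('NASDAQ', 'Tractor Supply Company')]}
```

Cryptocurrency cashtags would only be reported as collisions if they were added to the map alongside the stock listings.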
A system capable of identifying stock-specific tweets by resolving such collisions, and of assessing the credibility of such messages, would greatly benefit a financial market monitoring system by filtering out non-significant messages. This project has involved the design and development of a novel, multi-layered smart data ecosystem to monitor potential irregularities within the financial market. The ecosystem is primarily concerned with the communicative behaviour of participants on discussion platforms and with the activity surrounding company events (e.g. a broker rating being issued for a company). A wide array of data sources, such as tweets, discussion board posts, broker ratings, and share prices, is collected to support this process. A novel data fusion model fuses these data sources together to synchronise them, combining the data for a given time window (based on the company the data refers to and the date and time) so that analysis is easier to undertake. This data fusion model, located within the data layer of the ecosystem, uses supervised machine learning classifiers (owing to the domain expertise needed to accurately describe the origin of a tweet in a binary way), trained on a novel set of features, to classify tweets as being related to a London Stock Exchange-listed company or not. Experiments involving the training of such classifiers have achieved accuracy scores of up to 94.9%.
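The fusion step described above, which combines data sources for a given company and time window, can be sketched as a grouping operation. The one-hour window, the record schema (`company`, `time`, `source`, `payload`) and the placeholder payloads are hypothetical choices for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=1)  # assumed window length; the source does not fix one
EPOCH = datetime(1970, 1, 1)


def window_start(ts, window=WINDOW):
    """Floor a naive datetime to the start of its fixed-length window."""
    return EPOCH + ((ts - EPOCH) // window) * window


def fuse(records, window=WINDOW):
    """Group records from several sources by (company, window start).

    Each record is a dict with 'company', 'time', 'source' and 'payload'
    keys; this schema is hypothetical.
    """
    fused = defaultdict(lambda: defaultdict(list))
    for r in records:
        key = (r["company"], window_start(r["time"], window))
        fused[key][r["source"]].append(r["payload"])
    return fused


records = [
    {"company": "TSCO", "time": datetime(2019, 5, 1, 9, 15), "source": "tweet", "payload": "tweet A"},
    {"company": "TSCO", "time": datetime(2019, 5, 1, 9, 40), "source": "broker_rating", "payload": "rating B"},
    {"company": "TSCO", "time": datetime(2019, 5, 1, 10, 5), "source": "share_price", "payload": "price C"},
    {"company": "BARC", "time": datetime(2019, 5, 1, 9, 20), "source": "board_post", "payload": "post D"},
]
fused = fuse(records)
for (company, start), sources in sorted(fused.items()):
    print(company, start.strftime("%H:%M"), dict(sources))
# BARC 09:00 {'board_post': ['post D']}
# TSCO 09:00 {'tweet': ['tweet A'], 'broker_rating': ['rating B']}
# TSCO 10:00 {'share_price': ['price C']}
```

Fixed, non-overlapping windows keep the sketch simple; sliding or event-centred windows would be an equally plausible design.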
The ecosystem also adopts supervised learning to classify tweets by their credibility. Credibility classifiers are trained on both general features found in all tweets and a novel set of features found only within financial stock tweets. The experiments in which these credibility classifiers were trained have yielded AUC scores of up to 94.3.
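For readers unfamiliar with the AUC metric reported above, the sketch below computes it through its rank interpretation: the probability that a randomly chosen credible tweet receives a higher classifier score than a randomly chosen non-credible one. The labels and scores are made up; the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """ROC AUC via its rank interpretation: the probability that a randomly
    chosen positive (credible) example is scored above a randomly chosen
    negative one, counting ties as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]          # 1 = credible, 0 = not credible (made up)
scores = [0.9, 0.4, 0.6, 0.1]  # hypothetical classifier probabilities
print(roc_auc(labels, scores))  # 0.75
```

The function returns a value between 0 and 1; multiplying by 100 expresses it on a percentage scale.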
Once the data has been fused and irrelevant tweets have been identified, unsupervised clustering algorithms are used within the detection layer of the ecosystem to cluster the tweets and posts for a specific time window or event and flag those that are potentially irregular. The results are then presented to the user within the presentation and decision layer, where the user may choose to perform further analysis or additional clustering.
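One way to picture the detection layer is to cluster per-window activity and flag unusual windows. The sketch assumes plain k-means with deterministic farthest-first seeding, made-up per-window features, and a hypothetical rule that windows in the smallest cluster are potentially irregular; the source does not specify which clustering algorithms or features are used.

```python
import math


def init_centroids(points, k):
    """Farthest-first seeding: deterministic and spreads the seeds apart."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's k-means on tuples of numbers; returns (centroids, labels)."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append(tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, labels


def flag_irregular(windows, k=2):
    """Hypothetical rule: windows falling in the smallest cluster are flagged
    as potentially irregular."""
    _, labels = kmeans([w["features"] for w in windows], k)
    sizes = {c: labels.count(c) for c in set(labels)}
    smallest = min(sizes, key=sizes.get)
    return [w["id"] for w, lab in zip(windows, labels) if lab == smallest]


# Made-up features per hourly window: (tweets, board posts, |price change| in %).
windows = [
    {"id": "09:00", "features": (12, 3, 0.4)},
    {"id": "10:00", "features": (9, 2, 0.3)},
    {"id": "11:00", "features": (11, 4, 0.5)},
    {"id": "12:00", "features": (10, 3, 0.2)},
    {"id": "13:00", "features": (95, 30, 4.8)},
    {"id": "14:00", "features": (13, 2, 0.6)},
]
print(flag_irregular(windows))  # ['13:00']
```

The features are left unscaled here, so the tweet count dominates the distance; a realistic pipeline would standardise each feature before clustering.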
Credibility assessment of financial stock tweets
© 2020 The Authors. Social media plays an important role in facilitating conversations and news dissemination. In particular, Twitter has recently been used by investors to facilitate discussions surrounding stock exchange-listed companies. Investors depend on timely, credible information being made available in order to make well-informed investment decisions, with credibility defined as the believability of information. Much work has been done on assessing credibility on Twitter in domains such as politics and natural disaster events, but work on assessing the credibility of financial statements is scant within the literature. Investments made on the basis of apocryphal information could undermine social media's aim of providing a transparent arena for sharing news and encouraging discussion of stock market events. This paper presents a novel methodology to assess the credibility of financial stock market tweets, which is evaluated by conducting an experiment using tweets pertaining to companies listed on the London Stock Exchange. Three sets of traditional machine learning classifiers (using three different feature sets) are trained using an annotated dataset. We highlight the importance of considering features specific to the domain in which credibility is to be assessed; in the case of this paper, financial features. In total, after discarding non-informative features, 34 general features are combined with over 15 novel financial features for training classifiers. Results show that classifiers trained on both general and financial features can outperform classifiers trained on general features alone, with Random Forest being the top performer, although the Random Forest model requires more features (37) than other classifiers (such as K-Nearest Neighbours, which requires 9) to achieve such performance.
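The distinction between general and financial features, and the three feature sets built from them, can be sketched as follows. The specific features, the financial-term lexicon and the tweet fields (`text`, `followers`, `verified`) are hypothetical illustrations; the paper's actual 34 general and 15+ financial features are not reproduced here.

```python
import re

# Hypothetical financial lexicon, for illustration only.
FINANCIAL_TERMS = {"buy", "sell", "dividend", "profit", "shares", "broker", "target"}


def general_features(tweet):
    """Features available for any tweet (an illustrative subset)."""
    text = tweet["text"]
    return {
        "length": len(text),
        "n_hashtags": len(re.findall(r"#\w+", text)),
        "n_urls": len(re.findall(r"https?://\S+", text)),
        "followers": tweet["followers"],
        "verified": int(tweet["verified"]),
    }


def financial_features(tweet):
    """Features that only make sense for stock tweets (an illustrative subset)."""
    text = tweet["text"]
    words = re.findall(r"[a-z]+", text.lower())
    return {
        "n_cashtags": len(re.findall(r"\$[A-Za-z]+", text)),
        "n_financial_terms": sum(w in FINANCIAL_TERMS for w in words),
        "mentions_percentage": int(bool(re.search(r"\d+(\.\d+)?%", text))),
        "mentions_price": int(bool(re.search(r"[£$€]\d", text))),
    }


def feature_sets(tweet):
    """The three feature sets: general only, financial only, and both combined."""
    g, f = general_features(tweet), financial_features(tweet)
    return {"general": g, "financial": f, "combined": {**g, **f}}


tweet = {
    "text": "$TSCO broker target raised to 280p, shares up 3.5% #FTSE https://example.com/x",
    "followers": 1200,
    "verified": False,
}
print(feature_sets(tweet)["financial"])
# {'n_cashtags': 1, 'n_financial_terms': 3, 'mentions_percentage': 1, 'mentions_price': 0}
```

Each of the three dictionaries could then be vectorised and used to train its own set of classifiers, mirroring the three-way comparison described in the abstract.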
A Netnographic Study of the Network Communication Patterns of the Dogecoin Cryptocurrency Community on Twitter
The cryptocurrency Dogecoin was initially regarded as a meme coin, but its exchange rate rose by 800% in January 2021 and by a further 400% in April 2021. This rise is closely tied to the strong support of the Dogecoin cryptocurrency community and of top public profiles on Twitter. This study uses digital netnography to examine the network communication patterns of the Dogecoin cryptocurrency community on Twitter. The community studied is not centred on any particular community account but comprises all Twitter accounts actively discussing Dogecoin. The study covers the period from 1 April to 9 May 2021, which coincides with several important events. The data consist of all Twitter conversations containing the keyword "Doge", collected using the social network analysis tools Brand24 and Netlytic. The study identifies five types of interaction that make up the communication pattern of the Dogecoin network. These communication patterns can inform developers of Dogecoin and other cryptocurrencies about the importance of providing information that convinces the community to keep holding a cryptocurrency, of fostering a community whose members support and encourage one another, and of collaborating with top public profiles to provide reassurance and confirmation that address the community's unease about the high volatility of a cryptocurrency.