Reverse Intervention for Dealing with Malicious Information in Online Social Networks
Malicious information is often hidden in the massive data flow of online social networks. In the “We Media” era, if the system is left without intervention, malicious information may spread quickly to the entire network, causing severe economic and political losses. This paper adopts a reverse intervention strategy from the perspective of topology control, so that the spread of malicious information can be suppressed at minimum cost. As information spreads, social networks often exhibit a community structure, and multiple promoters of malicious information may appear. Therefore, this paper adopts a divide-and-conquer strategy and proposes an intervention algorithm based on subgraph partitioning, in which influential nodes are selected either to be blocked or to publish clarifications. The algorithm consists of two phases. First, a subgraph partitioning method based on community structure is given to quickly extract the community structure of the information dissemination network. Second, within each of the resulting subgraphs, a node-blocking and clarification-publishing algorithm based on the Jordan center is proposed. Experiments show that, compared with the benchmark algorithms, the proposed algorithm effectively suppresses the spread of malicious information with low time complexity.
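The two phases above can be sketched as follows. This is a minimal illustration and not the paper's algorithm. The paper's own subgraph partitioning method is not specified here, so phase one swaps in the local-moving step of the Louvain modularity heuristic. Phase two computes the Jordan center of each community, meaning the nodes whose greatest BFS distance to any other node in the subgraph is smallest. The sketch assumes an undirected, unweighted graph given as an adjacency dict with sortable node IDs. The function names are hypothetical.

```python
from collections import Counter, deque


def local_moving_communities(adj, max_passes=20):
    """Partition an undirected graph by greedy modularity local moving
    (the first phase of the Louvain heuristic). adj maps each sortable
    node ID to a list of its neighbours; self-loops are not expected."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # 2 * number of edges
    if two_m == 0:
        return [{v} for v in adj]
    deg = {v: len(adj[v]) for v in adj}
    comm = {v: v for v in adj}        # each node starts in its own community
    tot = dict(deg)                   # total degree of each community
    for _ in range(max_passes):
        moved = False
        for v in sorted(adj):
            old = comm[v]
            tot[old] -= deg[v]        # take v out of its community
            links = Counter(comm[u] for u in adj[v])  # v's edges into each community
            # Modularity gain (up to a constant factor) of putting v into community c.
            best, best_gain = old, links[old] - tot[old] * deg[v] / two_m
            for c, k_in in sorted(links.items()):
                gain = k_in - tot[c] * deg[v] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            comm[v] = best
            tot[best] += deg[v]
            moved = moved or best != old
        if not moved:
            break
    groups = {}
    for v, c in comm.items():
        groups.setdefault(c, set()).add(v)
    return list(groups.values())


def jordan_center(adj, nodes):
    """Nodes of minimum eccentricity in the subgraph induced by `nodes`."""
    nodes = set(nodes)
    best, centers = float("inf"), []
    for s in sorted(nodes):
        dist = {s: 0}
        queue = deque([s])
        while queue:                  # BFS restricted to the subgraph
            v = queue.popleft()
            for u in adj[v]:
                if u in nodes and u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        # A node that cannot reach the whole subgraph has infinite eccentricity.
        ecc = max(dist.values()) if len(dist) == len(nodes) else float("inf")
        if ecc < best:
            best, centers = ecc, [s]
        elif ecc == best:
            centers.append(s)
    return centers
```

On two triangles joined by a single edge, `local_moving_communities` returns the two triangles as separate communities. On a five-node path, `jordan_center` returns only the middle node. The nodes returned for each community could then serve as candidates for blocking or clarification. How the paper chooses between those two actions is not reproduced here.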
Credibility assessment of financial stock tweets
© 2020 The Authors.
Social media plays an important role in facilitating conversations and news dissemination. In particular, investors have recently used Twitter to facilitate discussions about stock exchange-listed companies. Investors depend on timely, credible information to make well-informed investment decisions, with credibility defined as the believability of information. Much work has been done on assessing credibility on Twitter in domains such as politics and natural disaster events, but work on assessing the credibility of financial statements is scant in the literature. Investments made on the basis of apocryphal information could undermine social media's aim of providing a transparent arena for sharing news and encouraging discussion of stock market events. This paper presents a novel methodology to assess the credibility of financial stock market tweets, evaluated through an experiment using tweets about companies listed on the London Stock Exchange. Three sets of traditional machine learning classifiers (using three different feature sets) are trained on an annotated dataset. We highlight the importance of considering features specific to the domain in which credibility is assessed; in this paper, these are financial features. In total, after discarding non-informative features, 34 general features are combined with over 15 novel financial features for training classifiers. Results show that classifiers trained on both general and financial features can outperform classifiers trained on general features alone. Random Forest is the top performer, although it requires more features (37) than other classifiers (e.g. 9 for K-Nearest Neighbours) to achieve this performance.
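The paper's exact feature definitions are not listed in this abstract. The sketch below only illustrates the idea of combining a general feature set with a domain-specific financial one into a single vector for a classifier. Every feature name, regular expression and keyword list here is a hypothetical stand-in. None of them is one of the 34 general or over 15 financial features used in the study.

```python
import re

# Hypothetical patterns; not the study's feature definitions.
URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
CASHTAG = re.compile(r"\$[A-Za-z]{1,6}\b")
PRICE = re.compile(r"£\s?\d+(?:\.\d+)?|\b\d+(?:\.\d+)?p\b")  # e.g. "£3.10" or "300p"
PERCENT = re.compile(r"[-+]?\d+(?:\.\d+)?%")
FIN_WORDS = {"buy", "sell", "bullish", "bearish", "dividend", "target", "upgrade", "downgrade"}


def general_features(text):
    """Features that apply to any tweet (hypothetical examples)."""
    letters = [c for c in text if c.isalpha()]
    return {
        "n_chars": len(text),
        "n_words": len(text.split()),
        "n_urls": len(URL.findall(text)),
        "n_hashtags": len(HASHTAG.findall(text)),
        "n_mentions": len(MENTION.findall(text)),
        "n_exclaims": text.count("!"),
        "upper_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
    }


def financial_features(text):
    """Features that only make sense for financial stock tweets (hypothetical examples)."""
    words = re.findall(r"[a-z]+", text.lower())
    return {
        "n_cashtags": len(CASHTAG.findall(text)),
        "has_price": int(bool(PRICE.search(text))),
        "n_percents": len(PERCENT.findall(text)),
        "n_fin_words": sum(w in FIN_WORDS for w in words),
    }


def feature_vector(text, use_financial=True):
    """Concatenate the feature sets in a fixed (sorted-name) order."""
    feats = general_features(text)
    if use_financial:
        feats.update(financial_features(text))
    return [feats[k] for k in sorted(feats)]
```

Passing `use_financial=False` yields the general-only vector. This mirrors the comparison between classifiers trained on general features alone and those trained on both sets.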
A Smart Data Ecosystem for the Monitoring of Financial Market Irregularities
Investments made on the stock market depend on timely and credible information being made available to investors. Such information can be sourced from online news articles, broker agencies, and discussion platforms such as financial discussion boards and Twitter. The monitoring of such discussion is a challenging yet necessary task to support the transparency of the financial market. Although financial discussion boards are typically monitored by administrators who respond to other users reporting posts for misconduct, actively monitoring social media such as Twitter remains a difficult task.
Users sharing news about stock-listed companies on Twitter can embed cashtags in their tweets that mimic a company’s stock ticker symbol (e.g. TSCO on the London Stock Exchange refers to Tesco PLC). A cashtag is simply the ticker characters prefixed with a ‘$’ symbol, which then becomes a clickable hyperlink, similar to a hashtag. Twitter, however, does not distinguish between companies with identical ticker symbols that are listed on different exchanges. TSCO, for example, refers to Tesco PLC on the London Stock Exchange but also to the Tractor Supply Company listed on the NASDAQ. This research refers to such scenarios as a ‘cashtag collision’. Investors who wish to capitalise on the fast dissemination that Twitter provides may become susceptible to tweets containing colliding cashtags. Further exacerbating this issue is the presence of tweets referring to cryptocurrencies, which also use cashtags that may be identical to those of stock-listed companies.
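Once a ticker-to-listing table is available, a cashtag collision can be detected mechanically. The sketch below assumes a hypothetical `LISTINGS` table and `CRYPTO` symbol set. In practice these would be built from exchange and cryptocurrency listings. The code only reports cashtags that map to more than one instrument. It does not decide which instrument a tweet actually refers to. The project addresses that step with supervised classifiers.

```python
import re

CASHTAG = re.compile(r"\$([A-Za-z]{1,6})\b")

# Hypothetical listing table: ticker -> list of (exchange, company) pairs.
LISTINGS = {
    "TSCO": [("LSE", "Tesco PLC"), ("NASDAQ", "Tractor Supply Company")],
    "BARC": [("LSE", "Barclays PLC")],
}
# Hypothetical set of cryptocurrency symbols that also appear as cashtags.
CRYPTO = {"BTC", "ETH"}


def extract_cashtags(text):
    """Return the ticker part of every cashtag, upper-cased."""
    return [m.upper() for m in CASHTAG.findall(text)]


def collisions(text, listings=LISTINGS, crypto=CRYPTO):
    """Map each colliding cashtag in text to its candidate instruments."""
    result = {}
    for tag in extract_cashtags(text):
        candidates = [exchange for exchange, _ in listings.get(tag, [])]
        if tag in crypto:
            candidates.append("CRYPTO")
        if len(candidates) > 1:       # ambiguous: more than one possible referent
            result[tag] = candidates
    return result
```

For example, `collisions("Loving $TSCO and $BARC today")` flags only `TSCO`, with candidates `["LSE", "NASDAQ"]`.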
A system capable of identifying stock-specific tweets by resolving such collisions, and of assessing the credibility of such messages, would greatly benefit a financial market monitoring system by filtering out non-significant messages. This project has involved the design and development of a novel, multi-layered, smart data ecosystem to monitor potential irregularities within the financial market. The ecosystem is primarily concerned with participants’ communicative behaviour on discussion platforms and with the activity surrounding company events (e.g. a broker rating being issued for a company).
A wide array of data sources, such as tweets, discussion board posts, broker ratings, and share prices, is collected to support this process. A novel data fusion model combines these sources for a given time window. The sources are keyed on the company the data refers to and on the date and time, which synchronises the data and makes it easier to analyse. This data fusion model, located within the data layer of the ecosystem, uses supervised machine learning classifiers trained on a novel set of features to classify whether or not a tweet relates to a London Stock Exchange-listed company. A supervised approach is used because accurately labelling a tweet’s origin in this binary way requires domain expertise. Experiments involving the training of such classifiers have achieved accuracy scores of up to 94.9%.
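The fusion step described above amounts to grouping heterogeneous records under a shared (company, time window) key. The following is a minimal sketch under two assumptions: windows have a fixed width and are aligned to midnight, and records are already tagged with a company identifier. The record layout and function names are hypothetical. The project's actual fusion model may align data differently.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def window_start(ts, width):
    """Floor a timestamp to the start of its fixed-width window (aligned to midnight)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // width) * width  # timedelta // timedelta -> int


def fuse(sources, width=timedelta(hours=1)):
    """Group records from several sources by (company, window start).

    sources maps a source name (e.g. "tweets") to an iterable of
    (company, timestamp, payload) tuples. The result maps each
    (company, window start) key to {source name: [payloads]}.
    """
    fused = defaultdict(lambda: defaultdict(list))
    for name, records in sources.items():
        for company, ts, payload in records:
            fused[(company, window_start(ts, width))][name].append(payload)
    return fused
```

With one-hour windows, a tweet at 09:15 and a broker rating at 09:40 for the same company fall into the same 09:00 bucket. A tweet at 10:05 starts a new bucket.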
The ecosystem also uses supervised learning to classify the credibility of tweets. Credibility classifiers are trained on both general features found in all tweets and a novel set of features found only in financial stock tweets. Training these credibility classifiers has yielded AUC scores of up to 94.3%.
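ROC AUC can be read as the probability that a classifier scores a randomly chosen positive example above a randomly chosen negative one, with ties counted as half. The quadratic-time sketch below computes AUC directly from that definition. It returns a value between 0 and 1, so a figure of 94.3 reported as a percentage corresponds to 0.943. The function name is hypothetical. Library implementations such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison (the Mann-Whitney formulation).

    labels are 0/1 ground-truth classes; scores are classifier outputs,
    higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0           # positive ranked above negative
            elif p == n:
                wins += 0.5           # tie counts as half
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])` is 0.75, because three of the four positive/negative pairs are ordered correctly.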
Once the data has been fused and irrelevant tweets have been identified, unsupervised clustering algorithms within the detection layer of the ecosystem cluster the tweets and posts for a specific time window or event to flag potentially irregular activity. The results are then presented to the user within the presentation and decision layer, where the user may perform further analysis or additional clustering.
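The abstract does not name the clustering algorithms used in the detection layer. The following only illustrates the general idea. It runs plain k-means, with deterministic farthest-first initialisation, over hypothetical per-window activity vectors. Members of the smallest cluster are then treated as candidates for review. Both the feature vectors and the smallest-cluster heuristic are assumptions, not the ecosystem's documented method.

```python
import math
from collections import Counter


def kmeans(points, k, iters=100):
    """Lloyd's k-means over tuples of numbers, with farthest-first initialisation."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    assign = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_assign = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_assign == assign:
            break                     # converged
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return assign, centroids


def smallest_cluster(points, k):
    """Indices of the points in the smallest cluster (hypothetical review heuristic)."""
    assign, _ = kmeans(points, k)
    sizes = Counter(assign)
    target = min(sizes, key=lambda j: (sizes[j], j))
    return [i for i, a in enumerate(assign) if a == target]
```

As an example, suppose each tuple holds hypothetical (tweet count, post count) values for one time window. Then `smallest_cluster([(10, 2), (12, 3), (11, 2), (9, 3), (80, 40)], k=2)` returns `[4]`, singling out the window with unusually high activity.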