61 research outputs found

    Six papers on computational methods for the analysis of structured and unstructured data in the economic domain

    This work investigates the application of computational methods to structured and unstructured data. The domains of application are two closely connected fields with the common goal of promoting the stability of the financial system: systemic risk and bank supervision. The work explores different families of models and applies them to different tasks: graphical Gaussian network models to address bank interconnectivity, topic models to monitor bank news, and deep learning for text classification. New applications and variants of these models are investigated, with particular attention to the combined use of textual and structured data. The penultimate chapter introduces a deep-learning-based sentiment polarity classification tool for Italian, intended to simplify future research relying on sentiment analysis. The different models have proven useful for leveraging numerical (structured) and textual (unstructured) data. Graphical Gaussian models and topic models have been adopted for inspection and descriptive tasks, while deep learning has been applied mainly to predictive (classification) problems. Overall, the integration of textual and numerical information has proven useful for analysis related to systemic risk and bank supervision: it has led either to higher predictive performance or to an enhanced capability to explain phenomena and correlate them with other events.

    The predictor impact of Web Search and Social Media

    In recent years, web search and social media have emerged online. Search engine technology has had to speed up to keep pace with the growth of the World Wide Web, which has turned the Internet into a vast information space with heterogeneous and poorly managed content. Millions of people all over the world search online for information every day, which makes web search queries a valuable source of information. Because of the huge amount of available information, searching has become the dominant use of the Internet. Users who interact with search engines daily produce valuable data about many aspects of the world. Social media increasingly pervades many areas of life, enabling communication among users and collecting massive amounts of information for social media companies that want to refine their products. Popular services like Twitter and Facebook attract many users who share facts about their daily lives. This kind of content has become more present on the web and, because of its public nature, even appears in the results of search engines such as Google and Bing. With the explosion of user-generated content came the need for politicians, analysts and researchers to monitor the content of different users. During my PhD, I decided to investigate whether social media activity or information collected from web search could be used profitably for predictive purposes. I studied whether a relationship exists between particular phenomena and the volume of search data for the examined topic on web search engines. Then, I analyzed the related social media volume to discover whether the chatter of the community can be used to make qualitative predictions about the considered phenomena, attempting to establish whether there is any correlation. In parallel, I applied automated sentiment analysis to short messages shared by users on Twitter in order to automatically analyze people's opinions, sentiments, evaluations and attitudes.

    Exploring the value of big data analysis of Twitter tweets and share prices

    Over the past decade, the use of social media (SM) such as Facebook, Twitter, Pinterest and Tumblr has dramatically increased. Using SM, millions of users are creating large amounts of data every day. According to some estimates, ninety per cent of the content on the Internet is now user generated. SM can be seen as a distributed content creation and sharing platform based on Web 2.0 technologies. SM sites make it very easy for their users to publish text, pictures, links, messages or videos without needing to be able to program. Users post reviews of products and services they bought, write about their interests and intentions, or give their opinions and views on political subjects. SM has also been a key factor in mass movements such as the Arab Spring and the Occupy Wall Street protests and is used for human aid and disaster relief (HADR). There is a growing interest in SM analysis from organisations for detecting new trends, getting user opinions on their products and services, or finding out about their online reputation. Companies such as Amazon or eBay use SM data for their recommendation engines and to generate more business. TV stations buy data about opinions on their TV programs from Facebook to find out how popular a certain TV show is. Companies such as Topsy, Gnip, DataSift and Zoomph have built their entire business models around SM analysis. The purpose of this thesis is to explore the economic value of Twitter tweets. The economic value is determined by trying to predict the share price of a company. If the share price of a company can be predicted using SM data, it should be possible to deduce a monetary value. There is limited research on determining the economic value of SM data for “nowcasting” (predicting the present) and for forecasting. 
This study aims to determine the monetary value of Twitter by correlating the daily frequencies of positive and negative Tweets about the Apple company and some of its most popular products with the development of the Apple Inc. share price. If the number of positive tweets about Apple increases and the share price follows this development, the tweets have predictive information about the share price. A literature review has found that there is a growing interest in analysing SM data from different industries. A lot of research is conducted studying SM from various perspectives. Many studies try to determine the impact of online marketing campaigns or try to quantify the value of social capital. Others, in the area of behavioural economics, focus on the influence of SM on decision-making. There are studies trying to predict financial indicators such as the Dow Jones Industrial Average (DJIA). However, the literature review has indicated that there is no study correlating sentiment polarity on products and companies in tweets with the share price of the company. The theoretical framework used in this study is based on Computational Social Science (CSS) and Big Data. Supporting theories of CSS are Social Media Mining (SMM) and sentiment analysis. Supporting theories of Big Data are Data Mining (DM) and Predictive Analysis (PA). Machine learning (ML) techniques have been adopted to analyse and classify the tweets. In the first stage of the study, a body of tweets was collected and pre-processed, and then analysed for their sentiment polarity towards Apple Inc., the iPad and the iPhone. Several datasets were created using different pre-processing and analysis methods. The tweet frequencies were then represented as time series. The time series were analysed against the share price time series using the Granger causality test to determine if one time series has predictive information about the share price time series over the same period of time. 
For this study, several Predictive Analytics (PA) techniques on tweets were evaluated to predict the Apple share price. To collect and analyse the data, a framework was developed based on the LingPipe (LingPipe 2015) Natural Language Processing (NLP) tool kit for sentiment analysis, and on R, the functional language and environment for statistical computing, for correlation analysis. Twitter provides an API (Application Programming Interface) to access and collect its data programmatically. Although no clear correlation could be determined, at least one dataset was shown to have some predictive information on the development of the Apple share price. The other datasets did not show any predictive capability. There are many data analysis and PA techniques. The techniques applied in this study did not indicate a direct correlation. However, some results suggest that this is due to noise or asymmetric distributions in the datasets. The study contributes to the literature by providing a quantitative analysis of SM data, for example tweets about Apple and its most popular products, the iPad and iPhone. It shows how SM data can be used for PA. It contributes to the literature on Big Data and SMM by showing how SM data can be collected, analysed and classified, and by exploring whether the share price of a company can be determined based on sentiment time series. It may ultimately lead to better decision making, for instance for investments or share buybacks.
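The Granger causality test mentioned in this abstract can be illustrated with a minimal sketch. This is not the thesis's implementation (which used R); it is a plain Python illustration on synthetic data, and the names `solve`, `rss`, `granger_f`, `x` and `y` are hypothetical. The idea: fit an autoregression of `y` on its own lags, fit a second one that also includes lags of `x`, and compare the residual sums of squares with an F statistic. A large value suggests that past values of `x` carry predictive information about `y`.

```python
import random

def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest entry of this column onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def rss(rows, targets):
    """Residual sum of squares of an ordinary least squares fit (normal equations)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))

def granger_f(x, y, lag=1):
    """F statistic for the hypothesis that lags of x help predict y."""
    steps = range(lag, len(y))
    targets = [y[t] for t in steps]
    # Restricted model: intercept plus lags of y only.
    restricted = [[1.0] + [y[t - i] for i in range(1, lag + 1)] for t in steps]
    # Unrestricted model: the same regressors plus lags of x.
    unrestricted = [r + [x[t - i] for i in range(1, lag + 1)]
                    for r, t in zip(restricted, steps)]
    rss_r = rss(restricted, targets)
    rss_u = rss(unrestricted, targets)
    dof = len(targets) - 2 * lag - 1
    return ((rss_r - rss_u) / lag) / (rss_u / dof)

# Synthetic example (assumed data, not from the thesis):
# y follows x with a one-step delay plus noise.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + random.gauss(0, 0.5) for t in range(1, 200)]

print("F statistic, x -> y:", granger_f(x, y, lag=1))
print("F statistic, y -> x:", granger_f(y, x, lag=1))
```

In this toy setup the statistic for `x -> y` should come out far larger than for `y -> x`, because only `y` was constructed from lagged values of `x`. In practice the statistic would be compared against an F distribution to obtain a p-value, and the lag order would be chosen by a model selection criterion; both steps are omitted here for brevity.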

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim for scientifically ascertaining the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested

    Diffusion of Falsehoods on Social Media

    Misinformation has captured the interest of academia in recent years, with several studies looking at the topic broadly. However, these studies mostly focused on rumors, which are social in nature and can be classified as either false or real. In this research, we attempt to bridge the gap in the literature by examining the impacts of user characteristics and content features on the diffusion of (mis)information using verified true and false information. We apply a topic allocation model augmented by both supervised and unsupervised machine learning algorithms to identify tweets on novel topics. We find that retweet count is higher for fake news, novel tweets, and tweets with negative sentiment and lower lexical structure. In addition, our results show that the impacts of sentiment are opposite for fake news versus real news. We also find that tweets on the environment have a lower retweet count than the baseline religious news, and that real social news tweets are shared more often than fake social news. Furthermore, our studies show the counterintuitive nature of current correction endeavors by FEMA and other fact-checking organizations in combating falsehoods. Specifically, we show that even though fake news causes an increase in correction messages, they influenced the propagation of falsehoods. Finally, our empirical results reveal that correction messages, tweets with positive sentiment and emotionally charged tweets morph faster over time. Word count and past morphing history also positively affect morphing behavior.

    Towards a National Security Analysis Approach via Machine Learning and Social Media Analytics

    Various severe threats at national and international level, such as health crises, radicalisation, or organised crime, have the potential to unbalance a nation's stability. Such threats impact directly on elements linked to people's security, known in the literature as human security components. Protecting citizens from such risks is the primary objective of the various organisations charged with protecting the legitimacy, stability and security of the state. Given the importance of maintaining security and stability, governments across the globe have been developing a variety of strategies to diminish or negate the devastating effects of the aforementioned threats. Technological progress plays a pivotal role in the evolution of these strategies. Most recently, artificial intelligence has enabled the examination of large volumes of data and the creation of bespoke analytical tools that are able to perform complex tasks towards the analysis of multiple scenarios, tasks that would usually require significant amounts of human resources. Several research projects have already proposed and studied the use of artificial intelligence to analyse crucial problems that impact national security components, such as violence or ideology. However, all this prior research focused on isolated components, whereas understanding national security issues requires studying and analysing a multitude of closely interrelated elements and constructing a holistic view of the problem. The work documented in this thesis aims at filling this gap. Its main contribution is the creation of a complete pipeline for constructing a big picture that helps understand national security problems. The proposed pipeline covers different stages and begins with the analysis of the unfolding event, which produces timely detection points that indicate that society might head toward a disruptive situation. 
Then, a further examination based on machine learning techniques enables the interpretation of an already confirmed crisis in terms of high-level national security concepts. Apart from using widely accepted national security theoretical constructions developed over years of social and political research, the second pillar of the approach is the modern computational paradigms, especially machine learning and its applications in natural language processing

    Digital user's decision journey

    The landscape of the Internet is continually evolving. This creates huge opportunities for different industries to optimize vital online channels, resulting in various forms of new Internet services. As a result, digital users are interacting with many digital systems and exhibiting dynamic behaviors. Their shopping behaviors are drastically different today from what they used to be, with offline and online shopping interacting with each other. They have many channels for accessing online media, but their consumption patterns on different channels are quite different. They engage in philanthropy online to help others, but their heterogeneous motivations and the different fundraising campaigns lead to distinct paths to contribution. Understanding the decision-making process behind digital users' dynamic behaviors as they interact with various digital systems is critical for firms seeking to improve user experience and their bottom line. In this thesis, I study digital users’ decision journeys and the corresponding digital technology firms’ strategies using interdisciplinary approaches that combine econometrics, economic structural modeling and machine learning. The uncovered decision journeys not only offer empirical managerial insights but also provide guidelines for introducing interventions to better serve digital users.

    Combining machine-based and econometrics methods for policy analytics insights

    National Research Foundation (NRF) Singapore under International Research Centre @ Singapore Funding Initiative