Textual Information and IPO Underpricing: A Machine Learning Approach
This study examines the predictive power of textual information from S-1 filings in explaining IPO underpricing. Our empirical approach differs from previous research in that we use several machine learning algorithms to predict whether or not an IPO will be underpriced. We analyze a large sample of 2,481 U.S. IPOs from 1997 to 2016 and find that textual information can effectively complement traditional financial variables in terms of prediction accuracy. In fact, models that use both textual data and financial variables as inputs outperform models that use a single type of input. We attribute our findings to the fact that textual information can reduce the ex-ante valuation uncertainty of IPO firms, thus leading to more accurate estimates.
Essays on Financial Applications of Nonlinear Models
In this thesis, we examine the relationship between news and the
stock market. Further, we explore methods and build new nonlinear
models for forecasting stock price movement and portfolio
optimization based on past stock prices and on one type of big
data, news items, which are obtained through the RavenPack News
Analytics Global Equities editions.
The thesis consists of three essays. In Essay 1, we investigate
the relationship between news items and stock prices using the
artificial neural network (ANN) model. First, we use Granger
causality to ascertain how news items affect stock prices. The
results show that news volume does not Granger-cause stock price
changes, whereas news sentiment does. Second, we test the
semi-strong form of the efficient market hypothesis, whereas most
existing research on the hypothesis focuses on the weak-form
version. Our ANN strategies consistently outperform the passive
buy-and-hold strategy, a finding that is apparently at odds with
the efficient market hypothesis. Finally, using news sentiment
analytics from RavenPack Dow Jones News Analytics, we show
positive profitability in out-of-sample prediction using the
proposed ANN strategies for Google Inc. (NASDAQ: GOOG).
In Essay 2, we expand the utility of the information from news
volume and news sentiments to encompass portfolio
diversification. For the Dow Jones Industrial Average (DJIA)
components, we assign different weights to build portfolios
according to their weekly news volumes or news sentiments. Our
results show that news volume contributes to portfolio variance
both in-sample and out-of-sample. Positive news sentiment
contributes to the portfolio return in-sample, while negative
news sentiment contributes to the portfolio return out-of-sample,
which is a consequence of investors overreacting to news
sentiment. Further, we propose a novel approach to portfolio
diversification using the k-Nearest Neighbors (kNN) algorithm,
based on the idea that news sentiment correlates with stock
returns. Out-of-sample results indicate that this strategy
dominates the benchmark DJIA index portfolio.
In Essay 3, we propose a new model called the Combined Markov and
Hidden Markov Model (CMHMM), in which the observation is affected
by both a Markov model and a hidden Markov model (HMM). The three
fundamental questions of the CMHMM are discussed. Further, the
application of the CMHMM, in which the news sentiment is one
observation and the stock return is the other, is discussed. The
empirical results of the trading strategy based on the CMHMM show
the potential applications of the proposed model in finance.
This thesis contributes to the literature in a number of ways.
First, it extends the literature on financial applications of
nonlinear models. We explore applications of ANNs and kNN in the
financial market. In addition, the proposed CMHMM adheres to the
nature of the stock market and has better potential predictive
ability. Second, the empirical results from this dissertation
contribute to the understanding of the relationship between news
and the stock market. For instance, our research found that news
volume contributes to the portfolio return and that investors
overreact to news sentiment, a phenomenon that has been discussed
by other scholars from different angles.
A system to predict the S&P 500 using a bio-inspired algorithm
The goal of this research was to develop an algorithmic system capable of predicting the directional trend of the S&P 500 financial index. The approach I have taken was inspired by the biology of the human retina. Extensive research has been published attempting to predict different financial markets using historical data, testing on an in-sample and trend basis, with many studies employing sophisticated mathematical techniques. On reviewing and evaluating these in-sample methodologies, I found that they were unable to achieve sufficiently reliable prediction performance for commercial exploitation. For these reasons, I moved to an out-of-sample strategy and am able to predict tomorrow’s (t+1) directional trend of the S&P 500 with 55.1% accuracy.
The key elements that underpin my bio-inspired out-of-sample system are:
(1) Identification of 51 financial market data (FMD) inputs, including other indices, currency pairs, and swap rates, that affect the 500 component companies of the S&P 500.
(2) The use of an extensive historical data set, comprising the actual daily closing prices of the chosen 51 FMD inputs and the S&P 500.
(3) The ability to compute this large data set in a time frame of less than 24 hours.
The data set was fed into a linear regression algorithm to determine the predicted value of tomorrow’s (t+1) S&P 500 closing price. This process was initially carried out in MATLAB, which proved the concept of my approach, but (3) above was not met. To handle such a large data set and complete the prediction on time, I decided to adopt a novel graphics processing unit (GPU) based computational architecture. Through extensive optimisation of my GPU engine, I was able to achieve a speedup of 150x, which was sufficient to meet (3).
In achieving my optimum directional accuracy of 55.1%, an extensive range of tests exploring a number of trade-offs was carried out using an 8-year data set. The results I have obtained will form the basis of a commercial investment fund.
It should be noted that my algorithm uses financial data from the past 60 days, and as such would not be able to predict rapid market changes such as a stock market crash.
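The regression step described above can be sketched roughly as follows. This is a minimal illustration only, assuming ordinary least squares over a rolling 60-day window and random synthetic data in place of the 51 FMD inputs. The function names, the window handling, and the use of NumPy are assumptions made for the example and are not the author's MATLAB or GPU implementation.

```python
import numpy as np


def predict_next_direction(features, closes, window=60):
    """Fit OLS on the last `window` days and predict the sign of the next move.

    features: array of shape (T, k), daily input values (hypothetical inputs)
    closes:   array of shape (T,), daily index closing prices
    Returns +1 if the predicted next close is above closes[-1], else -1.
    """
    # Pair the inputs of day t with the close of day t+1, most recent window only.
    X = features[-window - 1:-1]
    y = closes[-window:]
    X1 = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    x_today = np.concatenate([[1.0], features[-1]])
    predicted_close = float(x_today @ coef)
    return 1 if predicted_close > closes[-1] else -1


def hit_rate(features, closes, window=60):
    """Walk forward one day at a time and score out-of-sample direction calls."""
    hits, total = 0, 0
    for t in range(window + 1, len(closes)):
        # Only data up to day t-1 is visible when predicting day t.
        pred = predict_next_direction(features[:t], closes[:t], window)
        actual = 1 if closes[t] > closes[t - 1] else -1
        hits += pred == actual
        total += 1
    return hits / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(300, 5))            # synthetic stand-in inputs
    closes = 100 + np.cumsum(rng.normal(size=300))  # synthetic random walk
    print(f"hit rate on random data: {hit_rate(feats, closes):.3f}")
```

Each prediction is scored only against a day the model has not seen, which is the sense in which the evaluation is out-of-sample.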
Predicting Startup Success Using Publicly Available Data
Predicting the success of an early-stage startup has always been a major effort for investors and venture funds. Statistically, about 305 million startups are created each year, but fewer than 10% of them succeed in becoming profitable businesses. Accurately identifying the signs of startup growth is the work of countless investors, and in recent years, research has turned to machine learning in hopes of improving the accuracy and speed of startup success prediction.
To learn about a startup, investors have to navigate many different internet sources and often rely on personal intuition to determine the startup’s potential and likelihood of success. This thesis explores whether online data about a company, particularly general company data, previous funding events, published news articles, internet presence, and social media activity can be used to identify fast-growing startups. Data collected from Crunchbase, the Google Search API, and Twitter was used to predict whether a company will raise a round of funding within a fixed time horizon.
A total of ten machine learning models were evaluated, and the CatBoost ensemble method achieved the best performance, with precision, recall, and F1 scores of 0.663, 0.827, and 0.736 respectively for predicting funding within 3 years. The same ensemble method achieved F1 scores of 0.528, 0.683, 0.736, 0.763, and 0.777 at predicting funding 1 to 5 years into the future, respectively. The final objective was to predict whether a startup that had already raised an angel or seed round would raise another investment within a one-year horizon. The CatBoost model with a 0.75 cutoff achieved precision and F0.1 scores of 0.790 and 0.774, beating the results of previous work in this field.
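As a quick check on how the reported scores relate, the F-beta score is the weighted harmonic mean of precision and recall. The sketch below uses the standard formula. The function name `f_beta` is chosen for this example and is not taken from the thesis.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (standard F-beta formula)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision 0.663 and recall 0.827 reproduce the reported 3-year F1 score.
print(round(f_beta(0.663, 0.827), 3))  # 0.736
```

With beta = 0.1, the score weights precision far more heavily than recall, which suits a setting where a false positive (backing a startup that does not raise again) is the costlier error.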
Can Deep Learning Techniques Improve the Risk Adjusted Returns from Enhanced Indexing Investment Strategies
Deep learning techniques have been widely applied in the field of stock market prediction, particularly with respect to the implementation of active trading strategies. However, portfolio management, and passive portfolio management in particular, has been much less well served by research to date. This research project investigates the science underlying the implementation of portfolio management strategies in practice, focusing on enhanced indexing strategies. Enhanced indexing is a passive management approach that introduces an element of active management, with the aim of achieving a level of active return through small adjustments to the portfolio weights. The project then investigates current applications of deep learning techniques in financial market prediction and in the specific area of portfolio management. A series of successively deeper neural network models was then developed and assessed in terms of their ability to accurately predict whether a sample of stocks would outperform or underperform the selected benchmark index. The predictions generated by these models were then used to guide the adjustment of portfolio weightings to implement and forward test an enhanced indexing strategy on a hypothetical stock portfolio.
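One simple way to picture "small adjustments to the portfolio weights" is a bounded multiplicative tilt of the benchmark weights, driven by a model's predicted probability of outperformance. The sketch below is an illustrative assumption only: the function `tilt_weights`, the `max_tilt` parameter, and the linear mapping are hypothetical and are not the weighting scheme used in the project.

```python
import numpy as np


def tilt_weights(benchmark_weights, outperform_prob, max_tilt=0.1):
    """Tilt benchmark weights toward stocks predicted to outperform.

    benchmark_weights: index weights, summing to 1
    outperform_prob:   model probability in [0, 1] that each stock beats the index
    max_tilt:          largest relative over- or underweight before renormalising
    """
    w = np.asarray(benchmark_weights, dtype=float)
    p = np.asarray(outperform_prob, dtype=float)
    # Map probabilities in [0, 1] to a multiplier in [1 - max_tilt, 1 + max_tilt].
    tilted = w * (1.0 + max_tilt * (2.0 * p - 1.0))
    return tilted / tilted.sum()  # renormalise so the weights sum to 1


if __name__ == "__main__":
    weights = tilt_weights([0.5, 0.3, 0.2], [0.9, 0.5, 0.1])
    print(weights)
```

A probability of 0.5 for every stock leaves the benchmark weights unchanged, so the portfolio stays close to the index unless the model expresses a view.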