884 research outputs found
La Semaine égyptienne, from 1926 to 1939, or literature as an elsewhere
A history of a literary and intellectual review in francophone Egypt in the interwar period
Spartan Daily, May 14, 1962
Volume 49, Issue 118
German FinBERT: A German Pre-trained Language Model
This study presents German FinBERT, a novel pre-trained German language model
tailored for financial textual data. The model is trained through a
comprehensive pre-training process, leveraging a substantial corpus comprising
financial reports, ad-hoc announcements and news related to German companies.
The corpus size is comparable to the data sets commonly used for training
standard BERT models. I evaluate the performance of German FinBERT on
downstream tasks, specifically sentiment prediction, topic recognition, and
question answering, against generic German language models. My results
demonstrate improved performance on finance-specific data, indicating the
efficacy of German FinBERT in capturing domain-specific nuances. These
findings suggest that German FinBERT holds promise as a valuable tool for
financial text analysis, potentially benefiting various applications in the
financial domain.
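For illustration, here is a minimal sketch of how such a domain-specific checkpoint could be queried for sentiment through the Hugging Face transformers pipeline. The model id is a placeholder, since the abstract does not name the published checkpoint; the example sentences are likewise invented.

```python
# Illustrative sketch only: "german-finbert-placeholder" stands in for the
# paper's checkpoint, whose published id is not given in the abstract.
from transformers import pipeline

classifier = pipeline("text-classification", model="german-finbert-placeholder")

sentences = [
    "Der Umsatz stieg im dritten Quartal deutlich an.",  # positive report style
    "Das Unternehmen warnt vor einem Gewinnrückgang.",   # negative ad-hoc style
]
for s in sentences:
    result = classifier(s)[0]
    print(f"{result['label']} ({result['score']:.3f}): {s}")
```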
Innovative Sentiment Analysis and Prediction of Stock Price Using FinBERT, GPT-4 and Logistic Regression: A Data-Driven Approach
This study explores the comparative performance of cutting-edge AI models, i.e., Finance Bidirectional Encoder Representations from Transformers (FinBERT), Generative Pre-trained Transformer 4 (GPT-4), and Logistic Regression, for sentiment analysis and stock index prediction using financial news and the NGX All-Share Index data label. By leveraging advanced natural language processing models like GPT-4 and FinBERT, alongside a traditional machine learning model, Logistic Regression, we aim to classify market sentiment, generate sentiment scores, and predict market price movements. This research highlights global AI advancements in stock markets, showcasing how state-of-the-art language models can contribute to understanding complex financial data. The models were assessed using metrics such as accuracy, precision, recall, F1 score, and ROC AUC. Results indicate that Logistic Regression outperformed the more computationally intensive FinBERT and the predefined approach of the versatile GPT-4, with an accuracy of 81.83% and a ROC AUC of 89.76%. The GPT-4 predefined approach exhibited a lower accuracy of 54.19% but demonstrated strong potential in handling complex data. FinBERT, while offering more sophisticated analysis, was resource-demanding and yielded moderate performance. Hyperparameter optimization using Optuna and cross-validation techniques ensured the robustness of the models. This study highlights the strengths and limitations of practical AI approaches to stock market prediction and presents Logistic Regression as the most efficient model for this task, with FinBERT and GPT-4 representing emerging tools with potential for future exploration and innovation in AI-driven financial analytics.
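A minimal sketch of the logistic-regression baseline described above: TF-IDF features over news headlines with binary next-day direction labels, evaluated with two of the study's metrics. The headlines and labels are invented for illustration; they are not the study's data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

headlines = [
    "NGX rallies on strong bank earnings",
    "Equities slump as inflation fears mount",
    "Index closes flat amid light trading",
    "Oil gains lift the all-share index",
]
labels = [1, 0, 0, 1]  # 1 = index up next day, 0 = down (toy labels)

X = TfidfVectorizer().fit_transform(headlines)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.5, stratify=labels, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
print("ROC AUC :", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```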
Sentiment trading with large language models
We analyse the performance of the large language models (LLMs) OPT, BERT, and FinBERT, alongside the traditional Loughran-McDonald dictionary, in the sentiment analysis of 965,375 U.S. financial news articles from 2010 to 2023. Our findings reveal that the GPT-3-based OPT model significantly outperforms the others, predicting stock market returns with an accuracy of 74.4%. A long-short strategy based on OPT, accounting for 10 basis points (bps) in transaction costs, yields an exceptional Sharpe ratio of 3.05. From August 2021 to July 2023, this strategy produces an impressive 355% gain, outperforming other strategies and traditional market portfolios. This underscores the transformative potential of LLMs in financial market prediction and portfolio management, and the necessity of employing sophisticated language models to develop effective investment strategies based on news sentiment.
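A back-of-envelope sketch of the long-short evaluation: subtract 10 bps of transaction costs from the daily long-minus-short return and annualize the Sharpe ratio. The return series below is random noise, purely for illustration, not the paper's strategy returns.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days = 252
long_leg = rng.normal(0.0008, 0.01, n_days)   # daily return, long portfolio
short_leg = rng.normal(0.0002, 0.01, n_days)  # daily return, shorted portfolio
cost = 0.0010                                 # 10 basis points per day

strategy = long_leg - short_leg - cost
sharpe = np.sqrt(252) * strategy.mean() / strategy.std(ddof=1)
print(f"annualized Sharpe ratio: {sharpe:.2f}")
```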
Prediction of Stock Market Volatility Utilizing Sentiment from News and Social Media Texts: A study on the practical implementation of sentiment analysis and deep learning models for predicting day-ahead volatility
This thesis studies the impact of sentiment on volatility prediction for 100 of the largest stocks in the S&P 500 index. The purpose is to determine whether sentiment can improve the forecast of day-ahead volatility, where volatility is measured as the realized volatility of intraday returns.
The textual data has been gathered from three sources: Eikon, Twitter, and Reddit. It consists of 397,564 headlines from Eikon, 35,811,098 tweets, and 4,109,008 comments from Reddit; these counts refer to the uncleaned data before filtration. The data covers the period from 01.08.2021 to 31.08.2022.
Sentiment is calculated with the FinBERT model, an NLP model created by further pre-training BERT on financial text. To predict volatility from the FinBERT sentiment, three deep learning models are applied: a feed-forward neural network, a recurrent neural network, and a long short-term memory model. They are used to solve both regression and classification problems.
The inference analysis shows significant effects from the computed sentiment variables and implies a correlation between the number of text items and volatility, in line with previous literature on sentiment and volatility. The results from the deep learning models show that sentiment improves volatility prediction, both in terms of lower MSE and MAE for the regression problem and higher accuracy for the classification problem.
Moreover, this thesis examines potential weaknesses that could influence the validity of the results, including how sentiment is represented, noise in the data, and the fact that the FinBERT model is not trained on financially oriented text from social media.
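A sketch of the day-ahead target construction implied above: realized volatility as the square root of the sum of squared intraday log returns, shifted so that day t's sentiment features would predict day t+1's volatility. The prices here are simulated, purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
idx = pd.date_range("2021-08-02 09:30", periods=5 * 78, freq="5min")
prices = pd.Series(
    100 * np.exp(np.cumsum(rng.normal(0, 0.001, len(idx)))), index=idx
)

# Realized volatility per day: sqrt of summed squared intraday log returns.
intraday_ret = np.log(prices).diff().dropna()
realized_vol = intraday_ret.groupby(intraday_ret.index.date).apply(
    lambda r: np.sqrt((r**2).sum())
)

# Day-ahead setup: features from day t, target is realized vol on day t+1.
target = realized_vol.shift(-1).dropna()
print(target.head())
```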
Decoding the numbers and language behind financial statement fraud
Financial statement fraud, alongside corruption and asset misappropriation, costs companies over 5 trillion US dollars annually. Timely detection of this offense plays a crucial role in limiting the damage suffered, so automated methods capable of identifying high-probability fraud occurrences are essential. This study therefore evaluates the potential of Large Language Models (LLMs) such as BERT and FinBERT by comparing their performance to that of well-established models like Logistic Regression and XGBoost.
To accomplish this, we analysed the Management's Discussion & Analysis (MD&A) section of 1,850 10-K reports (1,436 non-fraud and 414 fraud), alongside financial ratios and raw accounting variables, from companies known to have manipulated at least one report between 1993 and 2014. Models were trained on three variable types: financial, textual, and a combination of both. Evaluation used three metrics, AUC, NDCG@k, and a threshold-based 'capture' rate, since for this problem predicted probabilities can be more informative than class labels.
The results suggest that the last part of the MD&A section captures more relevant information than the beginning. Additionally, rank-averaging predictions from models based on the first and last parts of the section did not yield significant improvements despite the improved capture. FinBERT outperformed BERT, achieved AUC scores comparable to traditional models that leverage OpenAI's 'text-embedding-3-large', and surpassed them in both NDCG@k and capture rates. Thus, FinBERT's domain-specific pretraining proved particularly advantageous for fraud detection performance.
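A hedged sketch of the ranking-style evaluation: NDCG@k via scikit-learn, plus a simple top-k "capture" (share of all fraud cases ranked in the top k). The study's capture is threshold-based, so this top-k variant is an assumption for illustration, as are the toy scores and labels below.

```python
import numpy as np
from sklearn.metrics import ndcg_score

y_true = np.array([0, 1, 0, 0, 1, 0, 1, 0, 0, 0])  # 1 = fraudulent report
y_prob = np.array([0.10, 0.90, 0.20, 0.80, 0.70,
                   0.15, 0.60, 0.05, 0.40, 0.25])  # predicted probabilities

k = 3
ndcg = ndcg_score(y_true.reshape(1, -1), y_prob.reshape(1, -1), k=k)

# Top-k capture: fraction of all fraud cases among the k highest scores.
top_k = np.argsort(y_prob)[::-1][:k]
capture = y_true[top_k].sum() / y_true.sum()

print(f"NDCG@{k}   : {ndcg:.3f}")
print(f"capture@{k}: {capture:.0%}")
```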
- …
