4 research outputs found
Auto-clustering of Financial Reports Based on Formatting Style and Author’s Fingerprint
Peer reviewed
We present a new clustering algorithm for financial reports that is based on the reports’ formatting and style. The algorithm uses layout and content information to automatically generate as many clusters as needed. This reduces the effort of labeling the reports in order to train text-based machine learning models for extracting person or company names, addresses, financial categories, etc. In addition, the algorithm produces a set of sub-clusters inside each cluster, where each sub-cluster corresponds to a set of reports made by the same author (person or firm). The sub-cluster information allows us to evaluate how the author changes over time. We applied the algorithm to a dataset of over 38,000 financial reports (the last annual account presented by each company) from the Luxembourg Business Registers (LBR) and found 2,165 clusters containing between 2 and 850 documents each, with a median of 4 and an average of 14. When adding 2,500 new documents (previous annual accounts presented by companies) to the existing cluster set, we found that 67.3% of the financial reports were placed in the correct cluster and sub-cluster. Of the remaining documents, 65% were placed in a different sub-cluster because the company changed the formatting style, which is expected and correct behavior. Finally, by labeling 11% of the entire dataset, we can replicate these labels to up to 72% of the dataset while keeping high feature coverage.
U-AGR-7012 - BRIDGES2020/IS/15403349/SCRiPT_Yoba Cont (01/04/2021 - 31/03/2024) - BRORSSON Mats Hakan
9. Industry, innovation and infrastructur
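As a quick check on the placement figures quoted in the abstract above, the short calculation below combines the two reported percentages. Adding them into a single "acceptable" rate is our reading of the abstract, not a number the authors report, and the variable names are illustrative.

```python
# Figures quoted in the abstract (fractions of the 2,500 newly added documents).
correct_rate = 0.673       # placed in the correct cluster and sub-cluster
style_change_share = 0.65  # share of the REMAINING documents that moved to a
                           # different sub-cluster because the style changed

# Documents that were not placed in the original cluster and sub-cluster.
remaining_rate = 1.0 - correct_rate

# Part of the remainder that the abstract describes as expected behavior.
expected_move_rate = remaining_rate * style_change_share

# Assumption: treat both groups together as acceptable placements.
acceptable_rate = correct_rate + expected_move_rate

print(f"remaining:      {remaining_rate:.1%}")
print(f"expected moves: {expected_move_rate:.1%}")
print(f"acceptable:     {acceptable_rate:.1%}")
```

Under this reading, roughly 21% of the new documents change sub-cluster for the expected reason, which leaves about 11% unexplained by the figures in the abstract.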
Effective Automatic Feature Engineering on Financial Statements for Bankruptcy Prediction
Peer reviewed
Feature engineering on financial records for bankruptcy prediction has traditionally relied heavily on domain knowledge and typically yields a range of financial ratios with limited complexity and feature utilization because they are designed manually. It is often a time-consuming and error-prone procedure, confined to the domain experts’ experience, that does not take into account the characteristics of different data sets. In this paper, we propose an automated feature engineering approach to generate effective, explainable, and extensible model training features. The experiments were conducted using a publicly available record of financial statements submitted to the Luxembourg Business Registers. This approach aims to improve bankruptcy prediction for professionals who may not possess the necessary engineering expertise or sufficient data. The experimental results suggest that the proposed approach can provide valuable features for model training and that, in most cases, the resulting models outperform those built with traditional and other well-known approaches.
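To make the idea of automated feature engineering concrete, the sketch below generates every pairwise ratio between statement items and ranks the ratios by absolute correlation with a bankruptcy label. This is a generic illustration and not the method proposed in the paper; the column names, the toy figures, and the helper functions `pairwise_ratio_features`, `abs_correlation`, and `rank_features` are all hypothetical.

```python
from itertools import permutations
from statistics import mean


def pairwise_ratio_features(rows, eps=1e-9):
    """Build every ordered ratio a/b between the numeric columns of each row."""
    columns = sorted(rows[0].keys())
    features = []
    for row in rows:
        feats = {}
        for a, b in permutations(columns, 2):
            denom = row[b]
            # Guard against division by (near) zero with a neutral value.
            feats[f"{a}/{b}"] = row[a] / denom if abs(denom) > eps else 0.0
        features.append(feats)
    return features


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def rank_features(features, labels, top_k=3):
    """Return the top_k feature names with their scores, best first."""
    names = list(features[0].keys())
    scored = [(abs_correlation([f[n] for f in features], labels), n) for n in names]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in scored[:top_k]]


# Toy, made-up statements (not taken from the LBR data); label 1 = bankrupt.
rows = [
    {"total_assets": 100.0, "total_debt": 90.0, "equity": 10.0},
    {"total_assets": 200.0, "total_debt": 60.0, "equity": 140.0},
    {"total_assets": 150.0, "total_debt": 140.0, "equity": 10.0},
    {"total_assets": 300.0, "total_debt": 90.0, "equity": 210.0},
]
labels = [1, 0, 1, 0]

features = pairwise_ratio_features(rows)
ranking = rank_features(features, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because each generated feature is a named ratio of two statement items, a ranked list like this stays readable to a domain expert, which is one plausible way to keep automatically generated features explainable.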
Towards an autonomous vision-based unmanned aerial system against wildlife poachers.
Poaching is an illegal activity that remains out of control in many countries. According to a 2014 report by the United Nations and Interpol, the illegal trade in wildlife and natural resources amounts to nearly $213 billion every year and is even helping to fund armed conflicts. Poaching activities around the world are pushing many animal species further to the brink of extinction. Unfortunately, the traditional methods of fighting poachers are not enough, hence the demand for more efficient approaches. In this context, the use of new sensor and algorithm technologies, as well as aerial platforms, is crucial to counter the sharp increase in poaching activities in the last few years. Our work focuses on the use of vision sensors on UAVs for the detection and tracking of animals and poachers, as well as the use of such sensors to control quadrotors during autonomous vehicle following and autonomous landing.