13,263 research outputs found

    VAT tax gap prediction: a 2-steps Gradient Boosting approach

    Tax evasion is the illegal non-payment or underpayment of taxes by individuals, corporations, and trusts. The revenue loss from tax avoidance can undermine the effectiveness and equity of government policies. A standard measure of tax evasion is the tax gap, which can be estimated as the difference between the total amount of tax theoretically collectable and the total amount of tax actually collected in a given period. This paper presents an original contribution to the bottom-up approach, which is based on the results of fiscal audits, through the use of machine learning. The major disadvantage of bottom-up approaches is selection bias when audited taxpayers are not randomly selected, as is the case for audits performed by the Italian Revenue Agency. Our proposal, based on a 2-step Gradient Boosting model, produces a robust tax gap estimate and embeds a correction for selection bias that does not require any assumptions about the underlying data distribution. The 2-step Gradient Boosting approach is used to estimate the Italian value-added tax (VAT) gap for individual firms for fiscal year 2011. It draws on fiscal and administrative data, including income tax returns, gathered from the Tax Administration Data Base. The proposed method significantly improves predictive performance compared with classical parametric approaches. Comment: 27 pages, 4 figures, 8 tables. Presented at the NTTS 2019 conference. Under review at another peer-reviewed journal.
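The abstract does not spell out how the two steps are built. The following is a minimal sketch of one plausible reading, assuming a control-function-style design; it is not necessarily the authors' method. Step 1 fits a gradient-boosted classifier for the probability that a firm is audited. Step 2 fits a gradient-boosted regressor for evaded VAT on audited firms only, using the step-1 probability as an extra feature. The tax gap is then the sum of predicted evasion over all firms. The sketch assumes scikit-learn's `GradientBoostingClassifier` and `GradientBoostingRegressor`. The function name `two_step_tax_gap` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor


def two_step_tax_gap(X, audited, evaded, random_state=0):
    """Hypothetical 2-step estimate of the total VAT gap (illustrative sketch).

    X        -- (n_firms, n_features) fiscal/administrative features, all firms
    audited  -- boolean array, True where the firm was audited
    evaded   -- evaded VAT found by audits (only read where audited is True)
    """
    # Step 1: model the non-random audit selection, P(audited | X).
    selector = GradientBoostingClassifier(random_state=random_state)
    selector.fit(X, audited)
    p_audit = selector.predict_proba(X)[:, 1]

    # Step 2: model evaded VAT on audited firms only, with the step-1
    # probability appended as a control for selection (assumed design).
    X_aug = np.column_stack([X, p_audit])
    outcome = GradientBoostingRegressor(random_state=random_state)
    outcome.fit(X_aug[audited], evaded[audited])

    # Predict evasion for every firm, clipping at zero because evasion
    # cannot be negative. Under our assumption, each firm's evaded VAT is its
    # share of "theoretically collectable minus actually collected" tax.
    per_firm_gap = np.clip(outcome.predict(X_aug), 0.0, None)
    return per_firm_gap.sum(), per_firm_gap, p_audit


# Synthetic illustration only: audits target firms with high X[:, 0].
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 4))
audited = rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))
evaded = np.maximum(0.0, 100 + 50 * X[:, 0] + 20 * X[:, 1] + rng.normal(0, 10, n))

total_gap, per_firm, p = two_step_tax_gap(X, audited, evaded)
print(f"Estimated total gap: {total_gap:,.0f} over {n} firms")
```

Using the estimated selection probability as a flexible regressor follows the spirit of Heckman's two-step estimator without its normality assumption. For the correction to be identified, such designs typically need some variable that drives audit selection but not evasion. The synthetic data above does not provide one and only demonstrates the mechanics.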

    Developing a Prediction Model for Assessing the Bankruptcy Risk of Finnish SMEs (Ennustemallin kehittäminen suomalaisten PK-yritysten konkurssiriskin määritykseen)

    Bankruptcy prediction is a subject of significant interest to both academics and practitioners because of its vast economic and societal impact. Academic research in the field is extensive and diverse, and no consensus has formed regarding the superiority of different prediction methods or predictor variables. Most studies focus on large companies; small and medium-sized enterprises (SMEs) have received less attention, mainly due to data unavailability. Despite recent academic advances, simple statistical models are still favored in practical use, largely because they are easy to understand and interpret. This master's thesis aims to construct a high-performing but user-friendly and interpretable bankruptcy prediction model for Finnish SMEs using financial statement data from 2008–2010. A literature review explores the key aspects of bankruptcy prediction, and its findings are used to design an empirical study. Five prediction models are trained on different predictor subsets and training samples, and two models are chosen for detailed examination based on the findings. A random forest model that uses all available predictors and the unadjusted training data, with its imbalance of bankrupt and non-bankrupt firms, performs best. It outperforms a benchmark model on both key metrics, and it is deemed easy to use and interpretable; it is therefore recommended for practical application. Equity ratio and financial expenses to total assets consistently rank as the two best predictors across models; otherwise, the findings on predictor importance are mixed but mainly in line with prevailing views in the related literature. This study shows that constructing an accurate but practical bankruptcy prediction model is feasible, and it serves as a guideline for future scholars and practitioners seeking to achieve the same.
Further research avenues are identified based on the empirical findings and the extant literature. In particular, this study raises an important question about whether the most commonly used performance metrics in bankruptcy prediction are appropriate, given how rare bankruptcy cases are. The area under the precision-recall curve (PR AUC), which is widely used in other fields, is deemed a suitable alternative and is recommended for measuring model performance in future bankruptcy prediction studies. Keywords: bankruptcy prediction, credit risk, machine learning.
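The following is a minimal sketch of the evaluation the thesis recommends. It assumes scikit-learn and uses synthetic data in place of the Finnish financial statements. A random forest is trained on the unadjusted, imbalanced training set, and ROC AUC is reported next to PR AUC. A random classifier scores 0.5 ROC AUC regardless of class balance, but its PR AUC equals the positive-class prevalence. This is why PR AUC is more informative when bankruptcies are rare.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for financial-statement ratios: roughly 3% "bankrupt" firms.
X, y = make_classification(n_samples=4000, n_features=10, n_informative=5,
                           weights=[0.97], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Train on the unadjusted (imbalanced) data: no resampling or class weights.
model = RandomForestClassifier(n_estimators=300, random_state=0)
model.fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # predicted bankruptcy probability

roc_auc = roc_auc_score(y_test, scores)
pr_auc = average_precision_score(y_test, scores)  # step-wise PR AUC estimate
prevalence = y_test.mean()  # expected PR AUC of a random classifier

print(f"ROC AUC {roc_auc:.3f} (random = 0.500)")
print(f"PR AUC  {pr_auc:.3f} (random = {prevalence:.3f})")
```

`average_precision_score` summarizes the precision-recall curve as a weighted mean of precisions at each threshold rather than as a trapezoidal area. The scikit-learn documentation notes that linear interpolation of the PR curve can be overly optimistic.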

    See5 Algorithm versus Discriminant Analysis. An Application to the Prediction of Insolvency in Spanish Non-life Insurance Companies

    Predicting the insolvency of insurance companies has become an important problem in financial research, owing to the need to protect the general public while minimizing the associated costs. Most methods applied to this question in the past are traditional statistical techniques that use financial ratios as explanatory variables. However, these variables do not usually satisfy the statistical assumptions of those techniques, which complicates their application. In this paper, we compare the performance of a well-known parametric statistical technique (Linear Discriminant Analysis) with that of a non-parametric machine learning technique (See5). We apply both methods to predicting the insolvency of Spanish non-life insurance companies on the basis of a set of financial ratios. The results indicate that the machine learning technique performs better, which shows that it can be a useful tool for evaluating the insolvency of insurance firms. Keywords: Insolvency, Insurance Companies, Discriminant Analysis, See5.
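The following is a minimal comparison in the spirit of this study, under several assumptions. scikit-learn has no See5/C5.0, so a CART `DecisionTreeClassifier` stands in as the non-parametric learner. The data are synthetic and exponentiated so that the features are skewed like many financial ratios; they are not the Spanish insurance data. Balanced accuracy under stratified 5-fold cross-validation is one reasonable metric, not necessarily the one used in the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic "financial ratios": exponentiating makes them right-skewed,
# violating the normality that LDA assumes.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           weights=[0.8], random_state=1)
X = np.exp(X)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    # CART stands in for See5/C5.0, which scikit-learn does not provide.
    "Decision tree": DecisionTreeClassifier(max_depth=4, random_state=1),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
results = {name: cross_val_score(m, X, y, cv=cv, scoring="balanced_accuracy")
           for name, m in models.items()}

for name, s in results.items():
    print(f"{name}: balanced accuracy {s.mean():.3f} +/- {s.std():.3f}")
```

Tree splits depend only on the order of feature values, so the skew introduced by `np.exp` does not affect the tree. LDA, in contrast, fits class means and a shared covariance and is sensitive to it.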

    Company bankruptcy prediction framework based on the most influential features using XGBoost and stacking ensemble learning

    Bankruptcy is often a serious problem for companies, because it can cause losses to their stakeholders, such as owners, investors, employees, and consumers. One way to prevent bankruptcy is to predict its likelihood from the company's financial data. This study therefore aims to find the best model or method for predicting company bankruptcy, using the Polish companies bankruptcy dataset. The prediction process combines feature selection with ensemble learning. Features are selected using XGBoost feature importance with a weight value filter of 10. The ensemble learning method is stacking, which consists of base models and a meta-learner. The base models are K-nearest neighbors, decision tree, SVM, and random forest, and the meta-learner is LightGBM. The stacking model outperforms the base models, reaching an accuracy of 97%.
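The following is a minimal sketch of the two stages using only scikit-learn, with substitutions named plainly. XGBoost's "weight" importance counts how many times a feature is used to split across all trees. Rather than calling XGBoost, the sketch reproduces that count on the trees of scikit-learn's `GradientBoostingClassifier`. It reads the abstract's "weight value filter of 10" as keeping features used in at least 10 splits, which is an assumption. `GradientBoostingClassifier` also stands in for the LightGBM meta-learner, and synthetic data replaces the Polish dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, n_features=30, n_informative=6,
                           weights=[0.9], random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=2)

# 1) Feature selection: count how often each feature is used to split across
#    all boosted trees (analogue of XGBoost "weight" importance), then keep
#    features whose count reaches the assumed threshold of 10.
booster = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=2)
booster.fit(X_train, y_train)
split_counts = np.zeros(X.shape[1], dtype=int)
for tree in booster.estimators_.ravel():
    used = tree.tree_.feature
    used = used[used >= 0]  # negative entries mark leaf nodes
    split_counts += np.bincount(used, minlength=X.shape[1])
selected = np.flatnonzero(split_counts >= 10)

# 2) Stacking: KNN, decision tree, SVM and random forest as base models;
#    GradientBoostingClassifier stands in for the LightGBM meta-learner.
base_models = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("tree", DecisionTreeClassifier(random_state=2)),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=2))),
    ("rf", RandomForestClassifier(random_state=2)),
]
stack = StackingClassifier(estimators=base_models,
                           final_estimator=GradientBoostingClassifier(random_state=2),
                           cv=5)
stack.fit(X_train[:, selected], y_train)
acc = stack.score(X_test[:, selected], y_test)
print(f"{selected.size} features kept, stacking accuracy {acc:.3f}")
```

On this roughly 90/10 synthetic split, always predicting the majority class already scores about 0.9 accuracy. An accuracy figure is therefore easier to judge next to a metric that is sensitive to the minority class.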

    The Impacts of Machine Learning in Financial Crisis Prediction

    Financial crisis prediction is one of the most complex and anticipated problems facing corporate firms, small businesses, investors, and even governments. We therefore investigate the current impact of machine learning (ML), a newly adopted technique, on tackling this problem in both the private and public sectors. The study uses a systematic literature review to assess the impact of ML on financial crisis prediction (FCP). The selected literature establishes the important role ML plays in predicting bankruptcy and creditworthiness, tasks that other methods did not handle appropriately. Machine learning also helps with data handling, data privacy, and confidentiality. This study presents a leading approach to achieving financial growth and plasticity in corporate organizations. We therefore recommend a real-time study to investigate the impact of ML on FCP.