66 research outputs found

    Ennustemallin kehittäminen suomalaisten PK-yritysten konkurssiriskin määritykseen (Developing a prediction model for assessing the bankruptcy risk of Finnish SMEs)

    Bankruptcy prediction is a subject of significant interest to both academics and practitioners because of its vast economic and societal impact. Academic research in the field is extensive and diverse; no consensus has formed regarding the superiority of different prediction methods or predictor variables. Most studies focus on large companies; small and medium-sized enterprises (SMEs) have received less attention, mainly due to data unavailability. Despite recent academic advances, simple statistical models are still favored in practical use, largely due to their understandability and interpretability. This study aims to construct a high-performing but user-friendly and interpretable bankruptcy prediction model for Finnish SMEs using financial statement data from 2008–2010. A literature review is conducted to explore the key aspects of bankruptcy prediction; the findings are used for designing an empirical study. Five prediction models are trained on different predictor subsets and training samples, and two models are chosen for detailed examination based on the findings. A prediction model using the random forest method, utilizing all available predictors and the unadjusted training data containing an imbalance of bankrupt and non-bankrupt firms, is found to perform best. Superior performance compared to a benchmark model is observed in terms of both key metrics, and the random forest model is deemed easy to use and interpretable; it is therefore recommended for practical application. Equity ratio and financial expenses to total assets consistently rank as the best two predictors for different models; otherwise the findings on predictor importance are mixed, but mainly in line with the prevalent views in the related literature. This study shows that constructing an accurate but practical bankruptcy prediction model is feasible, and serves as a guideline for future scholars and practitioners seeking to achieve the same. 
Several avenues for further research are identified from the empirical findings and the extant literature. In particular, this study questions whether the performance metrics most commonly used in bankruptcy prediction are appropriate, given how rare bankruptcies are. Area under the precision-recall curve (PR AUC), which is widely used in other fields of study, is deemed a suitable alternative and is recommended for measuring model performance in future bankruptcy prediction studies.
Keywords: bankruptcy prediction, credit risk, machine learning
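The abstract above recommends PR AUC for settings where bankruptcies are rare. The sketch below is a minimal illustration of why, not the thesis author's method: it computes ROC AUC and average precision (a common estimator of PR AUC) in plain Python for a made-up sample of 100 firms, 2 of them bankrupt. The scores and the helper names `average_precision` and `roc_auc` are assumptions introduced for this example only.

```python
def average_precision(y_true, scores):
    """Average precision, a common estimator of the area under the PR curve.

    Assumes there are no tied scores and at least one positive label.
    """
    # Rank observations from highest to lowest predicted risk.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / n_pos


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 2 bankrupt firms (label 1) and 98 healthy firms (label 0).
# Five healthy firms are wrongly scored above both bankrupt firms.
y_true = [1, 1] + [0] * 98
scores = [0.80, 0.70] + [0.95, 0.94, 0.93, 0.92, 0.91] + [k / 1000 for k in range(1, 94)]

roc = roc_auc(y_true, scores)
pr = average_precision(y_true, scores)
print(f"ROC AUC = {roc:.3f}, PR AUC (average precision) = {pr:.3f}")
```

On this toy sample ROC AUC is about 0.949 while average precision is about 0.226: five false alarms barely dent ROC AUC because they are a small share of the 98 healthy firms, but they dominate the top of the ranking that precision measures.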

    A Corpus Driven Computational Intelligence Framework for Deception Detection in Financial Text

    Financial fraud rampages onwards, seemingly uncontained. The annual cost of fraud in the UK is estimated to be as high as £193bn [1]. Taking a data science perspective that has hitherto been less explored, this thesis demonstrates how using linguistic features to drive data mining algorithms can aid in unravelling fraud. To this end, the spotlight is turned on Financial Statement Fraud (FSF), known to be the costliest type of fraud [2]. A new corpus of 6.3 million words is composed of the narrative sections of 102 annual reports/10-Ks from firms formally indicted for FSF, juxtaposed with 306 non-fraud firms of similar size and industrial grouping. Unlike similar studies, this thesis takes a wide-angled view and extracts a range of features of different categories from the corpus. These linguistic correlates of deception are uncovered using a variety of techniques and tools. Corpus linguistics methodology is applied to extract keywords and to examine linguistic structure. N-grams are extracted to draw out collocations. Readability measurement in financial text is advanced through the extraction of new indices that probe the text at a deeper level. Cognitive and perceptual processes are also picked out. Tone, intention and liquidity are gauged using customised word lists. Linguistic ratios are derived from grammatical constructs and word categories. An attempt is also made to determine ‘what’ was said as opposed to ‘how’. Further, a new module is developed to condense synonyms into concepts. Lastly, frequency counts of keywords unearthed by a previous content analysis study on financial narrative are also used. These features are then used to drive machine learning based classification and clustering algorithms to determine whether they aid in discriminating a fraud firm from a non-fraud firm. The models built typically exceed a classification accuracy of 70%. The above process is amalgamated into a framework.
The process outlined, driven by empirical data, demonstrates in a practical way how linguistic analysis could aid in fraud detection, and constitutes a unique contribution to deception detection studies.

    The detection of fraudulent financial statements using textual and financial data

    Fraudulent financial statements prevent markets from allocating resources efficiently and impose considerable economic cost. Market participants therefore strive to identify them. Reliable automated fraud detection systems based on publicly available data may help to allocate audit resources more effectively. This study examines how quantitative data (financials) and corporate narratives can both be used to identify accounting fraud (proxied by the SEC’s AAERs). The detection models rest on a foundation in fraud theory, which highlights how accounting fraud is carried out and discusses why companies engage in fraudulent alteration of financial records. The study relies on a comprehensive methodological approach to create the detection model: the design process is divided into eight design questions and three enhancing questions, shedding light on important issues in creating, improving and testing the model. The corporate narratives are analysed using multi-word phrases, with an extensive language standardisation that captures narrative peculiarities more precisely and partly addresses context. The narrative clues are enriched with predictors from company financials that proved successful in previous studies. The results indicate a reliable and robust detection performance over a timeframe of 15 years. Furthermore, they suggest that text-based predictors are superior to financial ratios and that a combination of both is required to achieve the best possible results. Moreover, text-based predictors vary considerably over time, which shows the importance of updating fraud detection systems frequently.
The achieved detection performance was slightly higher on average than that of comparable approaches.

    Aplicaciones en Economía del Aprendizaje Automático (Applications of Machine Learning in Economics)

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Ciencias Económicas y Empresariales, defended on 06-05-2022. This thesis examines problems in economics from a Machine Learning perspective. Emphasis is placed on the interpretability of Machine Learning algorithms as opposed to black-box prediction models. Chapter 1 provides an overview of the terminology and Machine Learning methods used throughout the thesis. It aims to build a roadmap from simple decision tree models to more advanced boosted ensemble algorithms. Other Machine Learning models are also explained. A discussion of the advances of Machine Learning in economics is provided, along with some of the pitfalls that Machine Learning faces. Moreover, an example shows how Shapley values from coalitional game theory are used to help draw inferences from the Machine Learning models' predictions. Chapter 2 analyses the problem of bankruptcy prediction in the Spanish economy and how Machine Learning not only provides greater predictive accuracy but can also provide a different interpretation of the results that traditional econometric models cannot. Several financial ratios are constructed and passed to a series of Machine Learning algorithms. Case studies are provided which may aid financial institutions in making better decisions. A section containing supplementary material based on further analysis is also provided...
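The abstract above mentions Shapley values from coalitional game theory as a way to interpret model predictions. As a minimal sketch of the underlying idea, and not the thesis's own implementation, the code below computes exact Shapley values by averaging each player's marginal contribution over all orderings. The toy scoring function and the feature names (`equity_ratio`, `financial_expenses`, `firm_size`) are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `value` maps a frozenset of players to a number. The cost grows
    factorially with the number of players, so this suits toy cases only.
    """
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orders) for p, t in totals.items()}


def toy_risk_score(features):
    """Hypothetical model output when only `features` are 'switched on'."""
    score = 0.0
    if "equity_ratio" in features:
        score += 0.3
    if "financial_expenses" in features:
        score += 0.1
    # Interaction term: an extra 0.2 only when both features are present.
    if {"equity_ratio", "financial_expenses"} <= features:
        score += 0.2
    return score  # "firm_size" never changes the score


features = ["equity_ratio", "financial_expenses", "firm_size"]
phi = shapley_values(features, toy_risk_score)
for name in features:
    print(f"{name}: {phi[name]:.2f}")
```

The interaction term is split equally between the two features that create it, the irrelevant feature receives zero, and the three values sum to the score of the full feature set (0.6). Practical libraries approximate this computation, since enumerating all orderings is infeasible for more than a handful of features.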

    Data Science in Healthcare

    Data science is an interdisciplinary field that applies numerous techniques, such as machine learning, neural networks, and deep learning, to create value based on extracting knowledge and insights from available data. Advances in data science have a significant impact on healthcare. While advances in the sharing of medical information result in better and earlier diagnoses as well as more patient-tailored treatments, information management is also affected by trends such as increased patient centricity (with shared decision making), self-care (e.g., using wearables), and integrated care delivery. The delivery of health services is being revolutionized through the sharing and integration of health data across organizational boundaries. Via data science, researchers can deliver new approaches to merge, analyze, and process complex data and gain more actionable insights, understanding, and knowledge at the individual and population levels. This Special Issue focuses on how data science is used in healthcare (e.g., through predictive modeling) and on related topics, such as data sharing and data management

    A survey of the application of soft computing to investment and financial trading

    Operations Management

    Global competition has caused fundamental changes in the competitive environment of the manufacturing and service industries. Firms should develop strategic objectives that, upon achievement, result in a competitive advantage in the market place. The forces of globalization on one hand and rapidly growing marketing opportunities overseas, especially in emerging economies on the other, have led to the expansion of operations on a global scale. The book aims to cover the main topics characterizing operations management including both strategic issues and practical applications. A global environmental business including both manufacturing and services is analyzed. The book contains original research and application chapters from different perspectives. It is enriched through the analyses of case studies