    Computational Modelling for Bankruptcy Prediction: Semantic data Analysis Integrating Graph Database and Financial Ontology

    In this paper, we propose a novel methodology for constructing a Bankruptcy Prediction Computation Model that aims to analyse a company's financial status accurately. Built on semantic data analysis and management, the methodology places a Semantic Database System at its core. The system comprises three layers: an Ontology of Bankruptcy Prediction, a Semantic Search Engine, and a Semantic Analysis Graph Database system. The Ontological layer defines the basic concepts of financial risk management as well as the objects that serve as sources of knowledge for predicting a company's bankruptcy. The Graph Database layer uses semantic data technology and serves as the semantic data repository for our model. The article provides a detailed description of the construction of the Ontology and its informal conceptual representation. We also present a working prototype of the Graph Database system, built with Neo4j, and show the connections between well-known financial ratios. We argue that this methodology, which uses state-of-the-art semantic data management mechanisms, enables data processing and the relevant computations more efficiently than approaches based on a traditional relational database. This gives us solid grounds for building a system capable of handling data of any complexity level.
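
    The idea of storing financial ratios as connected nodes can be illustrated with a minimal in-memory sketch. It is written in plain Python rather than Neo4j, and it is not the schema used in the paper: the node names, the relationship labels (`NUMERATOR`, `DENOMINATOR`) and the figures are all hypothetical. The ratio definitions are the usual textbook ones (current ratio, cash ratio, debt-to-equity).

```python
from collections import defaultdict

# Hypothetical edges: (ratio) -[role]-> (balance-sheet item).
# This is an illustrative stand-in for a property graph, not the paper's model.
EDGES = [
    ("CurrentRatio", "NUMERATOR", "CurrentAssets"),
    ("CurrentRatio", "DENOMINATOR", "CurrentLiabilities"),
    ("CashRatio", "NUMERATOR", "Cash"),
    ("CashRatio", "DENOMINATOR", "CurrentLiabilities"),
    ("DebtToEquity", "NUMERATOR", "TotalLiabilities"),
    ("DebtToEquity", "DENOMINATOR", "Equity"),
]


def build_index(edges):
    """Index the edges in both directions for fast traversal."""
    parts = defaultdict(dict)    # ratio -> {role: item}
    used_by = defaultdict(set)   # item -> {ratios that use it}
    for ratio, role, item in edges:
        parts[ratio][role] = item
        used_by[item].add(ratio)
    return parts, used_by


def compute_ratio(ratio, figures, parts):
    """Evaluate a ratio by following its NUMERATOR and DENOMINATOR edges."""
    roles = parts[ratio]
    return figures[roles["NUMERATOR"]] / figures[roles["DENOMINATOR"]]


def related_ratios(ratio, parts, used_by):
    """Return the other ratios that share at least one item with `ratio`."""
    related = set()
    for item in parts[ratio].values():
        related |= used_by[item]
    related.discard(ratio)
    return sorted(related)


PARTS, USED_BY = build_index(EDGES)

# Invented figures for a single hypothetical company.
FIGURES = {
    "CurrentAssets": 500.0,
    "CurrentLiabilities": 250.0,
    "Cash": 100.0,
    "TotalLiabilities": 600.0,
    "Equity": 400.0,
}

if __name__ == "__main__":
    print(compute_ratio("CurrentRatio", FIGURES, PARTS))
    print(related_ratios("CurrentRatio", PARTS, USED_BY))
```

    In this sketch two ratios are connected when they share a balance-sheet item, which is one simple way to read "connection between financial ratios"; a graph database would answer the same question with a traversal query instead of a dictionary lookup.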

    RDF Data Management for Vessel and Patent Information

    The Resource Description Framework (RDF) is widely used to represent information on the Web. Efforts have been made to map RDF data to a relational representation, and several systems have adopted this method. RDF is queried using SPARQL, the W3C-recommended standard query language for graph data represented as RDF triples. A set of tools and technologies is implemented and tested using SPARQL, with Apache Jena Fuseki as the RDF triplestore. The increasing size of RDF data requires an efficient system for storage and querying. Efficient retrieval of RDF data requires a framework that covers design, analysis, query optimization, storage and processing for the organizations concerned. Together, SPARQL and RDF make it easier to merge results from multiple data sources. This thesis provides an overview of a method for managing data, based on the existing data of two organizations, Port-MIS and KIPRIS. The RDF model is designed to enable web-based representation and information exchange, and to suggest a promising direction for future research.
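
    The point about merging data sources can be sketched in a few lines of plain Python. This is only a stand-in for a triplestore such as Apache Jena Fuseki and for SPARQL itself: triples are modelled as tuples, `None` plays the role of a query variable, and the namespace, predicates and records below are invented for illustration and are not taken from Port-MIS or KIPRIS.

```python
# Hypothetical namespace and data; nothing here comes from the real datasets.
EX = "http://example.org/"

# Each source is a set of (subject, predicate, object) triples.
port_mis = {
    (EX + "vessel/1", EX + "name", "Sample Vessel"),
    (EX + "vessel/1", EX + "owner", EX + "org/A"),
}
kipris = {
    (EX + "patent/9", EX + "title", "Sample Patent"),
    (EX + "patent/9", EX + "applicant", EX + "org/A"),
}


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    loosely mirroring a variable in a SPARQL triple pattern."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Because both sources use the same identifier for the organization,
# merging them is a plain set union.
merged = port_mis | kipris


def vessels_and_patents_by_org(graph, org):
    """Find the vessels owned by, and patents applied for by, one organization."""
    vessels = [s for s, _, _ in match(graph, p=EX + "owner", o=org)]
    patents = [s for s, _, _ in match(graph, p=EX + "applicant", o=org)]
    return vessels, patents


if __name__ == "__main__":
    print(vessels_and_patents_by_org(merged, EX + "org/A"))
```

    The merge works without any schema mapping only because both hypothetical sources identify the organization with the same IRI; with real data, aligning identifiers is usually the harder part.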

    Informative Armstrong RDF datasets for n-Ary relations

    The W3C-standardized Semantic Web languages enable users to capture data without a schema in a manner that is intuitive to them. The challenge is that, for the data to be useful, it should be possible to query the data and to query it efficiently, which necessitates a schema. Understanding the structure of data is thus important to both users and storage implementers: the structure gives users insight into how to query the data, while storage implementers can use the structure to optimize queries. In this paper we propose that data mining routines be used to infer candidate n-ary relations with related uniqueness and null-free constraints, which can be used to construct an informative Armstrong RDF dataset. The benefit of an informative Armstrong RDF dataset is that it provides example data, based on the original data, that is a fraction of the size of the original data while capturing the constraints of the original data faithfully. A case study on a DBPedia person dataset showed that the associated informative Armstrong RDF dataset contained 0.00003% of the statements of the original DBPedia dataset. https://www.iospress.nl/bookserie/frontiers-in-artificial-intelligence-and-applications
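
    To make the two kinds of constraint concrete, the following sketch checks them by brute force on a tiny invented dataset. It is not the mining routine proposed in the paper, which the abstract does not describe; it only tests single properties, and the subjects, properties and values are hypothetical. Here a property is treated as null-free if every subject has a value for it, and as unique if no two subjects share the same value set (subjects lacking the property are ignored for uniqueness).

```python
def infer_constraints(triples, subjects, properties):
    """Return (null_free, unique) sets of properties for the given subjects.

    Brute-force illustration only: single properties, no n-ary combinations.
    """
    values = {}  # (subject, property) -> set of objects
    for s, p, o in triples:
        values.setdefault((s, p), set()).add(o)

    # Null-free: every subject has at least one value for the property.
    null_free = {
        p for p in properties
        if all((s, p) in values for s in subjects)
    }

    # Unique: among subjects that have the property, no value set repeats.
    unique = set()
    for p in properties:
        seen = [frozenset(values[(s, p)]) for s in subjects if (s, p) in values]
        if len(seen) == len(set(seen)):
            unique.add(p)

    return null_free, unique


# Hypothetical person records expressed as triples.
TRIPLES = [
    ("p1", "name", "Ann"), ("p2", "name", "Bob"), ("p3", "name", "Cy"),
    ("p1", "email", "a@example.org"), ("p2", "email", "b@example.org"),
    ("p1", "birthPlace", "Oslo"), ("p2", "birthPlace", "Oslo"),
    ("p3", "birthPlace", "Bergen"),
]
SUBJECTS = ["p1", "p2", "p3"]
PROPERTIES = ["name", "email", "birthPlace"]

if __name__ == "__main__":
    null_free, unique = infer_constraints(TRIPLES, SUBJECTS, PROPERTIES)
    print(sorted(null_free), sorted(unique))
```

    In this toy data `email` is unique but not null-free and `birthPlace` is null-free but not unique, which shows why the two constraints have to be tracked separately.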

    A Linked Data Service for Nature Observation Datasets

    Publishing biological observation datasets as linked data makes it possible to combine several datasets with one another. By combining several datasets that relate to the same subject, a better understanding of the phenomenon of interest can be reached than by studying the datasets separately. This enables more precise conclusions to be drawn from the datasets and makes it possible to look for expected or unexpected connections between them. The RDF data model used in linked data makes the datasets machine-readable and provides an easy way to refer to all of their parts. Datasets published as linked data can easily be enriched with further datasets. This thesis addresses the modelling, processing and use as linked data of the observation dataset of the Hanko Bird Observatory and the weather observation dataset of the Finnish Meteorological Institute from Russarö, Hanko. The datasets are modelled using the RDF Data Cube vocabulary, which improves their interoperability. The bird observation dataset has been annotated with species information using an ontology of Finnish birds, which has been enriched with, among other things, an ontology of species identification characteristics and with conservation status data. The datasets are published on the Linked Data Finland platform, and a prototype visualisation service has been developed to illustrate the connections between the datasets. Weather is known to be one of the most important factors affecting the intensity of daily bird migration. The visualisation service aims to show the user how weather affects the number of bird observations and, in particular, the observed bird migration. Better knowledge of the relationships between the datasets enables more precise conclusions to be drawn from the bird observation dataset. The methods presented in the thesis can be generalised beyond bird and weather observation datasets to other datasets of similar structure.
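
    Combining the two datasets comes down to aligning observations on a shared dimension, here the date. The sketch below illustrates this in plain Python, with each observation held as a record of dimensions and measures, loosely in the spirit of the RDF Data Cube vocabulary. It does not reproduce the thesis's data model or its visualisation service, and every field name and value is invented.

```python
from datetime import date

# Hypothetical bird observations: dimensions (date, species) and a measure (count).
bird_obs = [
    {"date": date(2020, 4, 1), "species": "species-A", "count": 120},
    {"date": date(2020, 4, 1), "species": "species-B", "count": 30},
    {"date": date(2020, 4, 2), "species": "species-A", "count": 10},
]

# Hypothetical weather observations: dimension (date) and a measure (wind speed).
weather_obs = [
    {"date": date(2020, 4, 1), "wind_speed_ms": 3.0},
    {"date": date(2020, 4, 2), "wind_speed_ms": 12.0},
    {"date": date(2020, 4, 3), "wind_speed_ms": 5.0},
]


def daily_totals(observations):
    """Sum the bird counts over all species for each date."""
    totals = {}
    for obs in observations:
        totals[obs["date"]] = totals.get(obs["date"], 0) + obs["count"]
    return totals


def join_on_date(birds, weather):
    """Pair each day's bird total with the weather observed on the same day.

    Days present in only one of the datasets are dropped.
    """
    totals = daily_totals(birds)
    weather_by_date = {w["date"]: w for w in weather}
    return [
        {
            "date": d,
            "birds": totals[d],
            "wind_speed_ms": weather_by_date[d]["wind_speed_ms"],
        }
        for d in sorted(totals)
        if d in weather_by_date
    ]


if __name__ == "__main__":
    for row in join_on_date(bird_obs, weather_obs):
        print(row)
```

    With linked data the shared dimension would be identified by a common IRI in both datasets, so the join would not depend on matching field names as it does in this sketch.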

    Generic Architecture for Predictive Computational Modelling with Application to Financial Data Analysis: Integration of Semantic Approach and Machine Learning

    This PhD thesis introduces a Generic Architecture for Predictive Computational Modelling (PCM) capable of automating analytical conclusions regarding quantitative data structured as a data frame. The model involves heterogeneous data mining based on a semantic approach, graph-based methods (ontology, knowledge graphs, graph databases) and advanced machine learning methods. The main focus of my research is data pre-processing aimed at a more efficient selection of input features for the computational model. Since the model I propose is generic, it can be applied to data mining of all quantitative datasets (containing two-dimensional, size-mutable, heterogeneous tabular data); however, it is best suited to highly interconnected data. To adapt this generic model to a specific use case, an Ontology is needed as the formal conceptual representation of the relevant domain knowledge. I chose financial and market data for my use cases. In the course of practical experiments, I evaluated the effectiveness of the PCM model for the financial risk analysis of UK companies and for forecasting the FTSE100 market index. The tests confirmed that the PCM model produces more accurate outcomes than stand-alone traditional machine learning methods. By critically evaluating this architecture, I proved its validity and suggested directions for future research.

