
    Identifying and Consolidating Knowledge Engineering Requirements

    Knowledge engineering is the process of creating and maintaining knowledge-producing systems. Throughout the history of computer science and AI, knowledge engineering workflows have been widely used because high-quality knowledge is assumed to be crucial for reliable intelligent agents. However, the landscape of knowledge engineering has changed, presenting four challenges: unaddressed stakeholder requirements, mismatched technologies, adoption barriers for new organizations, and misalignment with software engineering practices. In this paper, we propose to address these challenges by developing a reference architecture using a mainstream software methodology. By studying the requirements of different stakeholders and eras, we identify 23 essential quality attributes for evaluating reference architectures. We assess three candidate architectures from recent literature based on these attributes. Finally, we discuss the next steps towards a comprehensive reference architecture, including prioritizing quality attributes, integrating components with complementary strengths, and supporting missing socio-technical requirements. As this endeavor requires a collaborative effort, we invite all knowledge engineering researchers and practitioners to join us.
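
    One simple way to make the assessment step above concrete is a weighted scoring matrix over quality attributes. The sketch below is purely illustrative: the attribute names, weights, and scores are hypothetical placeholders, not the 23 attributes or the three candidate architectures studied in the paper.

```python
# Hypothetical sketch: scoring candidate reference architectures against
# prioritized quality attributes. All names, weights, and scores below are
# illustrative placeholders, not values from the paper.

CANDIDATES = {
    "Architecture A": {"scalability": 4, "maintainability": 3, "stakeholder_fit": 2},
    "Architecture B": {"scalability": 2, "maintainability": 4, "stakeholder_fit": 4},
    "Architecture C": {"scalability": 3, "maintainability": 2, "stakeholder_fit": 3},
}

# Weights encode how strongly each quality attribute is prioritized (0..1).
WEIGHTS = {"scalability": 0.5, "maintainability": 0.3, "stakeholder_fit": 0.2}

def weighted_score(scores: dict[str, int]) -> float:
    """Weighted sum of attribute scores; missing attributes count as zero."""
    return sum(WEIGHTS.get(attr, 0.0) * value for attr, value in scores.items())

if __name__ == "__main__":
    ranked = sorted(CANDIDATES.items(), key=lambda kv: -weighted_score(kv[1]))
    for name, scores in ranked:
        print(f"{name}: {weighted_score(scores):.2f}")
```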

    General Course Catalog [2022/23 academic year]

    General Course Catalog, 2022/23 academic year

    LASSO – an observatorium for the dynamic selection, analysis and comparison of software

    Mining software repositories at the scale of 'big code' (i.e., big data) is a challenging activity. As well as finding a suitable software corpus and making it programmatically accessible through an index or database, researchers and practitioners have to establish an efficient analysis infrastructure and precisely define the metrics and data extraction approaches to be applied. Moreover, for analysis results to be generalisable, these tasks have to be applied at a large enough scale to have statistical significance, and if they are to be repeatable, the artefacts need to be carefully maintained and curated over time. Today, however, a lot of this work is still performed by human beings on a case-by-case basis, with the level of effort involved often having a significant negative impact on the generalisability and repeatability of studies, and thus on their overall scientific value. The general-purpose 'code mining' repositories and infrastructures that have emerged in recent years represent a significant step forward because they automate many software mining tasks at an ultra-large scale and allow researchers and practitioners to focus on defining the questions they would like to explore at an abstract level. However, they are currently limited to static analysis and data extraction techniques, and thus cannot support (i.e., help automate) any studies which involve the execution of software systems. This includes experimental validations of techniques and tools that hypothesise about the behaviour (i.e., semantics) of software, or data analysis and extraction techniques that aim to measure dynamic properties of software. In this thesis, a platform called LASSO (Large-Scale Software Observatorium) is introduced that overcomes this limitation by automating the collection of dynamic (i.e., execution-based) information about software alongside static information. It features a single, ultra-large scale corpus of executable software systems created by amalgamating existing Open Source software repositories and a dedicated DSL for defining abstract selection and analysis pipelines. Its key innovations are integrated capabilities for searching for and selecting software systems based on their exhibited behaviour and an 'arena' that allows their responses to software tests to be compared in a purely data-driven way. We call the platform a 'software observatorium' since it is a place where the behaviour of large numbers of software systems can be observed, analysed and compared.
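
    The 'arena' concept described above can be illustrated with a small, hypothetical sketch: several candidate implementations are executed against the same test inputs and their observed responses are compared purely as data. This is not LASSO's DSL or API; the candidate functions and tests below are stand-ins.

```python
# Conceptual sketch of an execution 'arena': run several candidate
# implementations against the same test inputs and compare their observed
# responses in a data-driven way. NOT LASSO's DSL or API; the candidates
# and tests are hypothetical stand-ins.

from typing import Any, Callable

def arena(candidates: dict[str, Callable[..., Any]],
          test_inputs: list[tuple]) -> dict[str, list[Any]]:
    """Record each candidate's response to every test input."""
    responses: dict[str, list[Any]] = {}
    for name, impl in candidates.items():
        row = []
        for args in test_inputs:
            try:
                row.append(impl(*args))
            except Exception as exc:  # a crash is also an observable behaviour
                row.append(f"error: {type(exc).__name__}")
        responses[name] = row
    return responses

def behaviourally_equivalent(responses: dict[str, list[Any]]) -> bool:
    """True if all candidates produced identical response vectors."""
    rows = list(responses.values())
    return all(row == rows[0] for row in rows[1:])

if __name__ == "__main__":
    candidates = {
        "impl_a": lambda xs: sorted(xs),
        "impl_b": lambda xs: sorted(xs, reverse=True),
    }
    tests = [([3, 1, 2],), ([],)]
    table = arena(candidates, tests)
    print(table)
    print("equivalent:", behaviourally_equivalent(table))
```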

    Hybrid Database for XML Resource Management

    Although XML has been used in software applications for a considerable amount of time, managing XML files is not a common skill in the realm of backend software design. This is primarily because JSON has become a more prevalent file format and is supported by numerous SQL and NoSQL databases. In this thesis, we will delve into the fundamentals and implementation of a web application that utilizes a hybrid database, with the goal of determining whether it is suitable for managing XML resources. Upon closer examination of the existing architecture, the client discovered a problem with upgrading their project. Further investigation revealed that the current approach of storing XML files in a single folder had serious flaws that could cause issues. As a result, a decision was made to revamp the entire web application, with hybrid databases being chosen as the preferred solution due to the application's XML storage concept. It is worth noting that there exists a type of database specifically designed for XML resources, known as a native XML database. However, the development team thoroughly reviewed all the requirements provided by the product owner, Niko Siltala, and assessed the compatibility of both native XML databases and hybrid databases for the new application. Based on our analysis, it was concluded that the hybrid database is the most suitable option for the project. The changes were successfully designed and implemented, and the development team determined that hybrid databases are a viable option for managing a significant number of XML file dependencies. There were no significant obstacles encountered that would hinder the use of this type of database. The advantages of using hybrid databases were observed, including streamlined XML file storage, the ability to mix XPath/XQuery in SQL queries, and simplified codebases.
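
    The ability to mix XPath with SQL, mentioned above, can be sketched briefly. The thesis does not name the database product used, so this example assumes PostgreSQL (whose xml type and built-in xpath() and xpath_exists() functions support the pattern) and a hypothetical resources table; it is an illustration, not the project's actual schema or code.

```python
# Hedged sketch: mixing XPath with SQL in a hybrid relational/XML store.
# Assumes PostgreSQL and a hypothetical 'resources' table, purely for
# illustration. Requires: pip install psycopg2-binary

import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS resources (
    id      serial PRIMARY KEY,
    name    text NOT NULL,
    content xml  NOT NULL        -- the stored XML resource
);
"""

# XPath is evaluated inside the SQL query itself, so filtering and projection
# over XML content happen in the database rather than in application code.
QUERY = """
SELECT name,
       (xpath('/resource/version/text()', content))[1] AS version
FROM resources
WHERE xpath_exists('/resource[@type="component"]', content);
"""

def list_component_versions(dsn: str) -> list[tuple]:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(DDL)
        cur.execute(QUERY)
        return cur.fetchall()

if __name__ == "__main__":
    # The connection string is a placeholder for a real database.
    print(list_component_versions("dbname=xmldemo user=postgres"))
```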

    Multidimensional framework for analysing next-generation sequencing data in a clinical diagnostic environment

    Next-generation sequencing (NGS), also called massively parallel sequencing, is a high-throughput technology that allows the determination of the nucleotide sequences of entire or specific regions of the genome. The application of this technology in a clinical environment enables personalized diagnostics for patients, for instance, allowing the identification of variants that might cause a disease. In this sense, clinical diagnostic laboratories are responsible for providing a robust and appropriate workflow that provides genomic information ready to be interpreted by a clinician. The Molecular Biology CORE Laboratory in the Hospital Clinic de Barcelona performs hundreds of analyses each year, providing service to several diagnostic laboratories. Besides, with the increasing number of NGS applications in clinical diagnostics, the number of analyses is expected to keep growing in the following years. Quality data is generated from different sources in each of these NGS analyses, including laboratory procedures, DNA sequencing, and bioinformatics analyses. These quality data must be carefully evaluated and validated to ensure the results' reliability. Moreover, the accumulation of quality data from each analysis can be used to assess the performance of the laboratory and to identify potential sources of technical artefacts that might lower the quality of the experiments. Hence, a database is needed to store and manage quality data for easy accessibility and analysis over time. In this thesis, we aim to develop a data warehouse to analyze and monitor NGS quality data coming from different data sources. To do that, we will perform the following steps: 1) design a multidimensional data model to ensure that data will be efficiently stored; 2) extract data from the different sources; 3) load the database; 4) design a visualization tool to enable descriptive analyses of the quality data. The designed tool will allow the historical exploration of quality parameters, as well as the evaluation of an experiment's quality metrics compared to the rest of the experiments. With this tool, we enable the identification of areas for improvement by discovering sources of variation that might affect the quality of clinical NGS data.
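
    The multidimensional (star-schema) idea behind step 1 can be sketched as follows. The dimensions, metrics, and example values below are hypothetical, and SQLite is used only for portability; this is not the warehouse schema designed in the thesis.

```python
# Illustrative sketch of a star-schema model for NGS quality data.
# Dimensions, metrics, and values are hypothetical examples.

import sqlite3

SCHEMA = """
CREATE TABLE dim_run (
    run_id      INTEGER PRIMARY KEY,
    run_date    TEXT,
    sequencer   TEXT
);
CREATE TABLE dim_sample (
    sample_id   INTEGER PRIMARY KEY,
    panel       TEXT
);
-- Fact table: one row per (run, sample) with its quality metrics.
CREATE TABLE fact_quality (
    run_id          INTEGER REFERENCES dim_run(run_id),
    sample_id       INTEGER REFERENCES dim_sample(sample_id),
    mean_coverage   REAL,       -- e.g. mean read depth
    pct_q30         REAL,       -- % of bases with Phred quality >= 30
    duplication_pct REAL
);
"""

def load_demo(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dim_run VALUES (1, '2023-05-02', 'sequencer_x')")
    conn.execute("INSERT INTO dim_sample VALUES (10, 'example_panel')")
    conn.execute("INSERT INTO fact_quality VALUES (1, 10, 512.3, 93.1, 8.4)")

def coverage_by_panel(conn: sqlite3.Connection):
    """Descriptive query: average coverage per panel over all runs."""
    return conn.execute("""
        SELECT s.panel, AVG(f.mean_coverage)
        FROM fact_quality f JOIN dim_sample s USING (sample_id)
        GROUP BY s.panel
    """).fetchall()

if __name__ == "__main__":
    with sqlite3.connect(":memory:") as conn:
        load_demo(conn)
        print(coverage_by_panel(conn))
```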

    Framework for dependency analysis of software artifacts

    This thesis aims to provide familiarity with component-based systems, graph data representation and analysis, and the existing methods and tools for static analysis of component-based systems being developed at the Department of Computer Science at the University of West Bohemia in Pilsen, Czech Republic. Based on the findings, the result of this thesis is a framework design and implementation with an emphasis on support for development in multiple programming languages and on the ability to process large datasets. The resulting framework can then support research on component-based systems. The author proposes a generalization and extension of the framework for software artifact dependency analysis created as part of M. Hotovec's master's thesis. The framework's data storage model has also been analyzed with an emphasis on graph databases. The ArangoDB database was eventually chosen as the storage solution, and a core library in Java has been implemented to allow the development of framework tools. The resulting design decisions allow the framework to be used in a broader range of use cases, such as component compatibility extraction and verification, which has been demonstrated by replicating this functionality in a framework tool created as part of this thesis.
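
    The core idea of dependency analysis over software artifacts can be illustrated with a small graph example. The thesis stores its graphs in ArangoDB and implements the core library in Java; the sketch below instead uses Python with networkx purely to show the concept, and the artifact names are hypothetical.

```python
# Conceptual sketch of software-artifact dependency analysis on a directed
# graph. Uses networkx for illustration only (not the thesis's ArangoDB/Java
# stack); artifact names are hypothetical. Requires: pip install networkx

import networkx as nx

def build_dependency_graph(edges: list[tuple[str, str]]) -> nx.DiGraph:
    """Each edge (a, b) means: artifact a depends on artifact b."""
    graph = nx.DiGraph()
    graph.add_edges_from(edges)
    return graph

def transitive_dependencies(graph: nx.DiGraph, artifact: str) -> set[str]:
    """Everything the artifact depends on, directly or indirectly."""
    return nx.descendants(graph, artifact)

if __name__ == "__main__":
    g = build_dependency_graph([
        ("app.jar", "service-api.jar"),
        ("service-api.jar", "commons-lang.jar"),
        ("app.jar", "logging.jar"),
    ])
    print(transitive_dependencies(g, "app.jar"))
    # Dependency cycles usually indicate a packaging or design problem.
    print(list(nx.simple_cycles(g)))
```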

    High Frequency Physiological Data Quality Modelling in the Intensive Care Unit

    Intensive care medicine is a resource-intensive environment in which technical and clinical decision making relies on rapidly assimilating a huge amount of categorical and time-series physiologic data. These signals are presented at variable frequencies and with variable quality. Intensive care clinicians rely on high frequency measurements of the patient's physiologic state to assess critical illness and the response to therapies. Physiological waveforms have the potential to reveal details about the patient state in very fine resolution, and can assist, augment, or even automate decision making in intensive care. However, these high frequency time-series physiologic signals pose many challenges for modelling. These signals contain noise, artefacts, and systematic timing errors, all of which can impact the quality and accuracy of models being developed and the reproducibility of results. In this context, the central theme of this thesis is to model the process of data collection in an intensive care environment from a statistical, metrological, and biosignals engineering perspective with the aim of identifying, quantifying, and, where possible, correcting errors introduced by the data collection systems. Three different aspects of physiological measurement were explored in detail, namely measurement of blood oxygenation, measurement of blood pressure, and measurement of time. A literature review of sources of errors and uncertainty in timing systems used in intensive care units was undertaken. A signal alignment algorithm was developed and applied to approximately 34,000 patient-hours of electroencephalography and physiological waveforms collected simultaneously at the bedside using two different medical devices.
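
    One common approach to the kind of signal alignment mentioned above is cross-correlation-based lag estimation. The sketch below illustrates that general technique on synthetic data; it is not claimed to be the specific algorithm developed in the thesis, and the sampling rate is an arbitrary example.

```python
# Hedged sketch: estimating the time offset between two simultaneously
# recorded signals via cross-correlation. A standard alignment technique,
# not the thesis's specific algorithm; the sampling rate is arbitrary.

import numpy as np

def estimate_lag(reference: np.ndarray, other: np.ndarray, fs: float) -> float:
    """Return the delay (in seconds) of `other` relative to `reference`."""
    ref = reference - reference.mean()
    oth = other - other.mean()
    xcorr = np.correlate(oth, ref, mode="full")
    # The index of the correlation peak maps to the sample lag.
    lag_samples = int(np.argmax(xcorr)) - (len(ref) - 1)
    return lag_samples / fs

if __name__ == "__main__":
    fs = 125.0                                   # example sampling rate in Hz
    t = np.arange(0, 10, 1 / fs)
    reference = np.exp(-((t - 3.0) ** 2) / 0.1)  # synthetic pulse at t = 3 s
    delayed = np.roll(reference, 250)            # simulate a 2-second timing error
    print(f"estimated lag: {estimate_lag(reference, delayed, fs):.2f} s")
```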