
    LeafAI: query generator for clinical cohort discovery rivaling a human programmer

    Objective: Identifying study-eligible patients within clinical databases is a critical step in clinical research. However, accurate query design typically requires extensive technical and biomedical expertise. We sought to create a system capable of generating data model-agnostic queries while also providing novel logical reasoning capabilities for complex clinical trial eligibility criteria. Materials and Methods: The task of query creation from eligibility criteria requires solving several text-processing problems, including named entity recognition and relation extraction, sequence-to-sequence transformation, normalization, and reasoning. We incorporated hybrid deep learning and rule-based modules for these tasks, as well as a knowledge base built from the Unified Medical Language System (UMLS) and linked ontologies. To enable data model-agnostic query creation, we introduce a novel method for tagging database schema elements using UMLS concepts. To evaluate our system, called LeafAI, we compared its ability to identify patients who had been enrolled in 8 clinical trials conducted at our institution with that of a human database programmer. We measured performance by the number of actually enrolled patients matched by the generated queries. Results: LeafAI matched a mean of 43% of enrolled patients, with 27,225 patients found eligible across the 8 clinical trials, compared with 27% matched and 14,587 found eligible by the human database programmer's queries. The human programmer spent 26 total hours crafting queries, compared with several minutes for LeafAI. Conclusions: Our work contributes a state-of-the-art, data model-agnostic query generation system capable of conditional reasoning using a knowledge base. We demonstrate that LeafAI can rival a human programmer in finding patients eligible for clinical trials.
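    The central data model-agnostic idea, tagging database schema elements with UMLS concepts so that an eligibility criterion parsed once can be rendered against any tagged schema, can be illustrated with a minimal sketch. The schema names, concept mappings, and Criterion structure below are hypothetical illustrations, not LeafAI's actual implementation.

```python
# Minimal sketch (hypothetical, not LeafAI's code) of rendering one parsed
# eligibility criterion against two differently structured data models whose
# schema elements have been tagged with UMLS concepts.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Criterion:
    cui: str                      # UMLS concept unique identifier for the clinical concept
    operator: str                 # e.g. ">=", "<=", "=", or "EXISTS"
    value: Optional[str] = None

# Each data model tags the table/column that stores a given UMLS concept.
SCHEMA_TAGS = {
    "model_a": {                                                      # hypothetical OMOP-like layout
        "C0001779": ("person", "age"),                                # Age
        "C0011849": ("condition_occurrence", "condition_concept_id"), # Diabetes mellitus
    },
    "model_b": {                                                      # hypothetical site-specific EHR schema
        "C0001779": ("patients", "age_years"),
        "C0011849": ("diagnoses", "icd_code"),
    },
}

def to_sql_condition(criterion: Criterion, data_model: str) -> str:
    """Render one parsed criterion as a SQL fragment for the chosen data model."""
    table, column = SCHEMA_TAGS[data_model][criterion.cui]
    if criterion.operator == "EXISTS":
        return f"EXISTS (SELECT 1 FROM {table} t WHERE t.{column} IS NOT NULL)"
    return f"{table}.{column} {criterion.operator} {criterion.value}"

# "Age >= 18" is expressed once and rendered against two different data models.
age_at_least_18 = Criterion(cui="C0001779", operator=">=", value="18")
print(to_sql_condition(age_at_least_18, "model_a"))   # person.age >= 18
print(to_sql_condition(age_at_least_18, "model_b"))   # patients.age_years >= 18
```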

    The design of a data model (DM) for managing durability index (DI) results for national road infrastructure

    As part of an R 1.14 billion, 64-month concrete construction mega-project that began in May 2013, the Mt Edgecombe Interchange, comprising two incrementally launched bridges, the longer at 948 metres and the other, which joins uMhlanga to the N2 North, at 440 metres, demands adequate systems for measuring durability compliance. Construction contracts of this nature generate thousands of test results that must be assessed for variability, outliers and compliance for quality assurance, in line with current performance-based specifications such as those in COTO (2018a; 2018b), derived from COLTO (1998), which require judgement based on statistical principles. Since the inception of Durability Index (DI) performance-based specifications in 2008, over 12,000 DI test results or determinations have accumulated in a repository at the University of Cape Town. The performance-based approach in South Africa is therefore a decade into maturity, and considerable amounts of actual site data are collected daily; these data are significant for refining the DI values in performance-based specifications, for the long-term monitoring of Reinforced Concrete (RC) structures in a full-scale environment, and for other research and development (R&D) initiatives. Data modelling can be defined as the process of designing a data model (DM) for data to be stored in a database. A DM is commonly divided into three main types: a conceptual DM defines what the system contains; a logical DM defines how the system should be implemented regardless of the Database Management System (DBMS); and a physical DM describes how the system will be implemented using a specific DBMS. The main objective of this study is to design a data model (DM) that is essentially a conceptual and logical representation of the physical database required to ensure durability compliance for RC structures. Database design principles are needed to guide the entire process and produce a good database design. Duplicate or redundant data consumes unnecessary storage and increases the probability of errors and inconsistencies. Therefore, subdividing the data within the conceptual data model (DM) into distinct groups or topics, which are broken down further into subject-based tables, helps eliminate redundant data. The data contained within the database must be correct and complete; incorrect or incomplete information results in reports with mistakes, and any decisions based on such data will be misinformed. The database must therefore support and ensure the accuracy and integrity of the information as well as accommodate data processing and reporting requirements. An explanation and critique of the current durability specification is also presented, since information is required on how to join the database tables to create meaningful output. The conceptual data model (DM) established the basic concepts and the scope for the physical database by designing a modular structure, or general layout, for the database. This process established the entities or data objects (distinct groups), their attributes (properties of distinct groups) and their relationships (dependencies or associations between groups). The logical database design phase is divided into two main steps. In the first step, a data model (DM) is created to ensure minimal redundancy and the capability to support user transactions.
The output of this step is a logical data model (DM), which is a complete and accurate representation of the topics to be supported by the database. In the second step, the Entity Relationship Diagram (ERD) is mapped to a set of tables. The structure of each table is checked using normalization, an effective means of ensuring that the tables are structurally consistent and logical, with minimal redundancy. The tables were also checked to ensure that they are capable of supporting the required transactions, and the required integrity constraints on the database were defined. The logical data model (DM) then added further detail to the conceptual data model (DM) elements by defining the database tables, or the basic information required for the physical database. This process established the structure of the data elements, set the relationships between them and provided the foundation for the physical database. A prototype of the designed data model (DM), founded on 53 basic information database tables, is presented. The breakdown of database tables for the six modules is: references (1), concrete composition (13), execution (4), environment (7), specimens (2) and material tests (26). Correlations between different input parameters were identified, which added further information to the logical data model (DM) elements by strengthening the relations between the topics. The extraction of information, or output parameters, according to specification limits was conducted by analysing data from five different projects, which served as input for a total of 1,054 DI test results or 4,216 determinations. The results were used to conduct parametric studies on the DI values that predominantly affect concrete durability in RC structures. Lastly, a method is proposed that uses joint probability density functions of Durability Index (DI) test results and the achieved cover depth to calculate the probability that both random variables fall outside specification limits.
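    The final step, estimating the probability that both the DI result and the achieved cover depth fall outside their specification limits, can be sketched with an assumed bivariate normal model of the two variables. The means, standard deviations, correlation and limits below are hypothetical placeholders, not values from the study.

```python
# Sketch of the joint-probability idea: P(DI below its limit AND cover depth
# below its limit) under an assumed bivariate normal model. All parameter
# values and limits are hypothetical placeholders.
import numpy as np
from scipy.stats import multivariate_normal

mean = np.array([10.2, 45.0])   # [mean DI value, mean cover depth (mm)] - hypothetical
std = np.array([0.35, 6.0])     # standard deviations - hypothetical
rho = 0.2                       # assumed correlation between DI and cover depth
cov = np.array([[std[0] ** 2,            rho * std[0] * std[1]],
                [rho * std[0] * std[1],  std[1] ** 2          ]])

limits = np.array([9.7, 40.0])  # lower specification limits for (DI, cover depth)

# For "higher is better" variables, non-compliance means falling below the limit,
# so the joint CDF evaluated at the limits gives P(both out of specification).
p_both_out = multivariate_normal(mean=mean, cov=cov).cdf(limits)
print(f"P(DI and cover depth both below their limits) = {p_both_out:.4f}")
```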

    Creating Responsive Information Systems with the Help of SSADM

    In this paper, a programme of research is outlined. Firstly, the concept of responsive information systems is defined, and the notions of capacity planning and software performance engineering are clarified. Secondly, the purpose of the proposed capacity planning methodology, its interface to information systems analysis and development methodologies (such as SSADM), and the advantages of a knowledge-based approach are discussed. The interfaces to CASE tools, more precisely to data dictionaries or repositories (IRDS), are examined in the context of a particular systems analysis and design methodology (e.g. SSADM).

    Multi-agent based simulation of self-governing knowledge commons

    The potential of user-generated sensor data for participatory sensing has motivated the formation of organisations focused on the exploitation of collected information and associated knowledge. Given the power and value of both the raw data and the derived knowledge, we advocate an open approach to data and intellectual-property rights. By treating user-generated content, as well as derived information and knowledge, as a common-pool resource, we hypothesise that all participants can be compensated fairly for their input. To test this hypothesis, we undertake an extensive review of experimental, commercial and social participatory-sensing applications, from which we identify that a decentralised, community-oriented governance model is required to support this open approach. We show that the Institutional Analysis and Design framework introduced by Elinor Ostrom, in conjunction with a framework for self-organising electronic institutions, can be used to give both an architectural and an algorithmic base for the necessary governance model, in terms of operational and collective-choice rules specified in computational logic. As a basis for understanding the effect of governance on these applications, we develop a testbed which joins our logical formulation of the knowledge commons with a generic model of the participatory-sensing problem. This requires a multi-agent platform for the simulation of autonomous and dynamic agents, and a method of executing the logical calculus in which our electronic institution is specified. To this end, we first develop a general-purpose, high-performance platform for multi-agent based simulation, Presage2. Secondly, we propose a method for translating event-calculus axioms into rules compatible with business rule engines, and provide an implementation for JBoss Drools along with a suite of modules for electronic institutions. Through our simulations we show that, when building electronic institutions for managing participatory sensing as a knowledge commons, proper enfranchisement of agents (as outlined in Ostrom's work) is key to striking a balance between endurance, fairness and reduction of greedy behaviour. We conclude with a set of guidelines for engineering knowledge commons for the next generation of participatory-sensing applications.
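    The underlying event-calculus idea, that events initiate and terminate fluents and a fluent holds at time t if it was initiated earlier and not terminated since, can be shown with a minimal self-contained sketch. This is not the Presage2 or JBoss Drools implementation described above, and the institutional events and fluents are hypothetical examples.

```python
# Minimal event-calculus sketch: fluents are initiated/terminated by events,
# and holds_at(fluent, agent, t) checks for an earlier initiation with no
# subsequent termination before t. The "institution" rules are hypothetical.
from collections import namedtuple

Event = namedtuple("Event", ["time", "name", "agent"])

# Map event names to the fluent they initiate or terminate.
INITIATES = {"appropriate": "has_allocation", "join": "is_member"}
TERMINATES = {"relinquish": "has_allocation", "leave": "is_member"}

def holds_at(fluent, agent, t, narrative):
    """True if `fluent(agent)` was initiated before t and not terminated since."""
    holds = False
    for ev in sorted(narrative, key=lambda e: e.time):
        if ev.time >= t or ev.agent != agent:
            continue
        if INITIATES.get(ev.name) == fluent:
            holds = True
        elif TERMINATES.get(ev.name) == fluent:
            holds = False
    return holds

# Example narrative: agent a1 joins, appropriates a resource, then relinquishes it.
narrative = [Event(1, "join", "a1"), Event(2, "appropriate", "a1"),
             Event(5, "relinquish", "a1")]
print(holds_at("has_allocation", "a1", 4, narrative))  # True
print(holds_at("has_allocation", "a1", 6, narrative))  # False
```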

    Designing a customer data model and defining customer master data in a Finnish SaaS company

    In this study, a logical customer data model is designed and the customer master data within it is defined for a case company. In the process of defining customer data and the customer itself, a business glossary is created to provide clear definitions of a customer and to unify the vocabulary across the case company. Defining this key vocabulary provides the basis for defining the customer data and the customer master data. In addition, quality aspects are studied for ensuring high-quality customer data in the future. This study aims to understand what customer data is in the case company and to model it as a logical data model that unifies siloed operations, systems, and data. The case company is a Finnish Software as a Service company. It is in the middle of a merging process due to recent company acquisitions and wants to have common customer data and customer master data, but it has not yet defined its master data. It is important to identify which data is critical to the business so that the case company can have a single version of the truth and development activities can be targeted in the right direction to ensure the greatest benefit. The research method of this study is design research. The empirical part of the study is carried out in two rounds of workshops. The first round analyses the current situation based on the processes and the different functions in the company that work with customer data; its outcome is the customer terminology with its definitions and the customer data model. The second round concentrates on iterative development of the terminology and the customer data model, and on further identifying the development needs, restrictions, and possibilities of having a common customer data model and master data. After the workshops, the terminology and data model are developed with internal experts. Lastly, a review event is held, where the participants see and comment on the designed customer data model and the identified customer master data. After this study, the case company has a clear definition of what a customer is and how it should be modelled as a logical data model in the future, giving one common customer data structure to unify the case company. The case company also has the most important, necessary, common customer data, the customer master data, defined. The next step after this study is to plan the implementation of the customer data designs, taking into account the quality principles defined in this study to support the sustainability of the designs.
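    As an illustration only, a logical customer model that separates governed master data from operational data might be sketched as below; the entities, attribute names and master-data split are hypothetical and are not the case company's actual model.

```python
# Hypothetical sketch of a logical customer data model: a master-data entity
# holding the agreed "single truth" attributes, kept separate from system- or
# function-specific operational data. All names are illustrative only.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CustomerMaster:            # customer master data: shared, governed attributes
    customer_id: str             # globally unique identifier across merged systems
    legal_name: str
    business_id: str             # e.g. national company registration number
    country: str
    status: str = "active"

@dataclass
class CustomerContact:           # operational data linked to the master record
    customer_id: str             # reference to CustomerMaster.customer_id
    name: str
    email: str
    role: Optional[str] = None

@dataclass
class Subscription:              # SaaS-specific operational data
    customer_id: str
    product: str
    seats: int

# One master record can be referenced by many operational records, which is
# what lets previously siloed systems agree on "who the customer is".
master = CustomerMaster("CUST-001", "Example Oy", "1234567-8", "FI")
contacts: List[CustomerContact] = [CustomerContact("CUST-001", "A. Person", "a@example.com")]
```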

    Predicting Network Attacks Using Ontology-Driven Inference

    Graph knowledge models and ontologies are powerful modeling and reasoning tools. We propose an effective approach to modeling network attacks and to attack prediction, which play important roles in security management. The goals of this study are twofold: first, we model network attacks, their prerequisites and their consequences using knowledge representation methods, in order to provide description logic reasoning and inference over attack domain concepts; second, we propose an ontology-based system that predicts potential attacks using inference over observed information provided by sensory inputs. We generate our ontology and evaluate the corresponding methods using the CAPEC, CWE, and CVE hierarchical datasets. Experimental results show significant capability improvements compared to traditional hierarchical and relational models. The proposed method also reduces false alarms and improves intrusion detection effectiveness.
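    The prerequisite/consequence reasoning can be illustrated independently of any particular ontology language with a small forward-chaining sketch; the attack patterns and observed conditions below are hypothetical stand-ins rather than real CAPEC entries.

```python
# Forward-chaining sketch of attack prediction over prerequisites/consequences:
# an attack is predicted when all its prerequisites hold among the observed
# facts, and its consequences become new facts that may enable further attacks.
# The attack patterns here are hypothetical stand-ins, not CAPEC data.

ATTACKS = {
    "sql_injection":    {"prerequisites": {"web_form_exposed", "input_unsanitised"},
                         "consequences": {"db_credentials_leaked"}},
    "lateral_movement": {"prerequisites": {"db_credentials_leaked", "flat_network"},
                         "consequences": {"internal_host_compromised"}},
}

def predict(observations: set) -> list:
    """Return attacks predicted from sensor observations, in firing order."""
    facts = set(observations)
    predicted = []
    changed = True
    while changed:
        changed = False
        for name, attack in ATTACKS.items():
            if name not in predicted and attack["prerequisites"] <= facts:
                predicted.append(name)
                facts |= attack["consequences"]
                changed = True
    return predicted

# Sensory input: an exposed, unsanitised web form on a flat network.
print(predict({"web_form_exposed", "input_unsanitised", "flat_network"}))
# -> ['sql_injection', 'lateral_movement']
```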

    Modeling of Traceability Information System for Material Flow Control Data.

    This paper focuses on data modeling for traceability of material and work flow in the information layer of a manufacturing control system. The model is able to trace all associated data throughout product manufacturing, from order to final product. Dynamic data processing of Quality and Purchase activities is considered in the data modeling, as well as Order and Operation data based on lot particulars. The modeling consists of four steps that are integrated into one final model. Entity-Relationship modeling is proposed as the data modeling methodology. The model is re-engineered with Toad Data Modeler software in the physical modeling step. The developed model promises to handle the fundamental issues of a traceability system effectively. It supports customization and real-time control of material flow at all levels of the manufacturing process. Through enhanced visibility and dynamic storage and retrieval of data, all traceability uses and applications are supported. The designed solution is initially applicable as a reference data model in similar lot-based traceability systems.
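    A minimal sketch of the lot-based traceability idea follows; the entities and fields are hypothetical examples rather than the paper's Toad Data Modeler schema. Each lot records the lots it consumed and the operation that produced it, so a recursive walk reconstructs the material flow from final product back to purchased material.

```python
# Lot-based traceability sketch: each lot stores which input lots it consumed
# and which operation produced it, so tracing back from a finished-product lot
# recovers the full material flow. Entities/fields are hypothetical examples.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Lot:
    lot_id: str
    material: str
    operation: str                                       # operation that produced this lot
    consumed: List[str] = field(default_factory=list)    # input lot ids

def trace_back(lot_id: str, lots: Dict[str, Lot], depth: int = 0) -> None:
    """Print the upstream genealogy of a lot (product -> operations -> raw materials)."""
    lot = lots[lot_id]
    print("  " * depth + f"{lot.lot_id}: {lot.material} via {lot.operation}")
    for parent in lot.consumed:
        trace_back(parent, lots, depth + 1)

lots = {
    "L-100": Lot("L-100", "steel coil", "purchase"),
    "L-200": Lot("L-200", "machined part", "milling", consumed=["L-100"]),
    "L-300": Lot("L-300", "assembly", "assembly", consumed=["L-200"]),
}
trace_back("L-300", lots)   # assembly -> machined part -> steel coil
```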