    Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective

    This paper presents a Lisp architecture for a portable NLP system, termed LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard, customized and in-house developed NLP tools. Our system facilitates portability across different institutions and data systems by incorporating an enriched Common Data Model (CDM) to standardize necessary data elements. It uses UMLS to perform domain adaptation when integrating generic-domain NLP tools. It also features stand-off annotations that are specified by positional reference to the original document. We built an interval-tree-based search engine to efficiently query and retrieve the stand-off annotations by specifying positional requirements. We also developed a utility to convert an inline annotation format to stand-off annotations to enable the reuse of clinical text datasets with inline annotations. We experimented with our system on several NLP-facilitated tasks, including computational phenotyping for lymphoma patients and semantic relation extraction for clinical notes. These experiments showcased the broader applicability and utility of LAPNLP. Comment: 6 pages, accepted by IEEE BIBM 2018 as regular paper
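
    The positional querying of stand-off annotations described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than Lisp and is not LAPNLP's implementation: the names `Annotation`, `IntervalNode` and `query`, the labels, and the half-open character offsets are all assumptions made for illustration. The structure is a simple centered interval tree that scans each node's own spans linearly, which a production index would avoid.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """Stand-off annotation: a label plus character offsets into the source text."""
    start: int  # inclusive offset (assumed convention)
    end: int    # exclusive offset (assumed convention)
    label: str  # hypothetical label names, for illustration only

    def __post_init__(self) -> None:
        if self.start >= self.end:
            raise ValueError("annotation must cover at least one character")


class IntervalNode:
    """Node of a centered interval tree over half-open [start, end) spans."""

    def __init__(self, annotations: List[Annotation]) -> None:
        starts = sorted(a.start for a in annotations)
        self.center = starts[len(starts) // 2]
        # Spans containing the center stay here; the rest go left or right.
        self.here = [a for a in annotations if a.start <= self.center < a.end]
        left = [a for a in annotations if a.end <= self.center]
        right = [a for a in annotations if a.start > self.center]
        self.left = IntervalNode(left) if left else None
        self.right = IntervalNode(right) if right else None

    def collect(self, q_start: int, q_end: int, out: List[Annotation]) -> None:
        """Append every annotation overlapping [q_start, q_end) to out."""
        for a in self.here:
            if a.start < q_end and q_start < a.end:
                out.append(a)
        # Left spans end at or before the center, right spans start after it,
        # so a subtree is visited only if the query reaches past the center.
        if self.left is not None and q_start < self.center:
            self.left.collect(q_start, q_end, out)
        if self.right is not None and q_end > self.center:
            self.right.collect(q_start, q_end, out)


def query(root: Optional[IntervalNode], q_start: int, q_end: int) -> List[Annotation]:
    """Return annotations overlapping [q_start, q_end), ordered by position."""
    out: List[Annotation] = []
    if root is not None:
        root.collect(q_start, q_end, out)
    return sorted(out, key=lambda a: (a.start, a.end))


text = "Patient denies fever but reports cough."
index = IntervalNode([
    Annotation(0, 7, "PERSON"),
    Annotation(8, 20, "NEGATED"),
    Annotation(15, 20, "SYMPTOM"),
    Annotation(33, 38, "SYMPTOM"),
])
for hit in query(index, 15, 20):
    print(hit.label, repr(text[hit.start:hit.end]))
```

    Because the annotations hold only offsets and a label, the source text is never modified, which is the property that makes stand-off annotation reusable across tools.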

    Operations on (ordered) interval sets

    Intervals play an important role in various kinds of database applications in practice, for example in historical, spatial, and temporal databases. As a consequence, there is a practical need for a clear and proper treatment of various useful operations on intervals and interval sets in a database context. However, the semantics of some important operations on interval sets are not always treated, or not treated very clearly, in the literature; e.g., they are often defined in an algorithmic rather than a declarative manner. Moreover, implementation proposals are often not as straightforward as they could be. This paper presents a declarative treatment of various operations on interval sets, also introducing some new notions (such as ordered interval sets, their visible points, and their surface). Then the paper formally "links" such (mathematical) intervals to their database representations. Finally, the paper provides straightforward translations from these formal database representations to standard SQL, without the need for SQL extensions.
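
    As a rough illustration of what operations on interval sets look like, here is a minimal Python sketch of normalization, union and intersection. It does not reproduce the paper's definitions or its SQL translations: the half-open integer representation and the function names `normalize`, `union` and `intersection` are assumptions chosen for this example.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumed representation


def normalize(intervals: Iterable[Interval]) -> List[Interval]:
    """Return the sorted, disjoint intervals covering the same points.

    Empty intervals are dropped; overlapping or adjacent ones are merged.
    """
    merged: List[Interval] = []
    for start, end in sorted(i for i in intervals if i[0] < i[1]):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union(a: Iterable[Interval], b: Iterable[Interval]) -> List[Interval]:
    """Points covered by either interval set."""
    return normalize(list(a) + list(b))


def intersection(a: Iterable[Interval], b: Iterable[Interval]) -> List[Interval]:
    """Points covered by both interval sets, via a two-pointer sweep."""
    xs, ys = normalize(a), normalize(b)
    i = j = 0
    out: List[Interval] = []
    while i < len(xs) and j < len(ys):
        lo = max(xs[i][0], ys[j][0])
        hi = min(xs[i][1], ys[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval finishes first.
        if xs[i][1] < ys[j][1]:
            i += 1
        else:
            j += 1
    return out


print(normalize([(5, 8), (1, 3), (2, 4)]))
print(intersection([(1, 4), (5, 8)], [(3, 6)]))
```

    Defining each operation by the set of points it covers, rather than by the merging procedure, is the declarative view the abstract argues for; the code above is only one algorithm that satisfies such a definition.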

    Interval Slopes as Numerical Abstract Domain for Floating-Point Variables

    The design of embedded control systems is mainly done with model-based tools such as Matlab/Simulink. Numerical simulation is the central technique of development and verification of such tools. Floating-point arithmetic, which is well known to provide only approximate results, is omnipresent in this activity. In order to validate the behaviors of numerical simulations using abstract interpretation-based static analysis, we present, theoretically and with experiments, a new partially relational abstract domain dedicated to floating-point variables. It comes from interval expansion of non-linear functions using slopes and is able to mimic all the behaviors of floating-point arithmetic. Hence it is suited to proving the absence of run-time errors or to analyzing the numerical precision of embedded control systems.

    ATLAS: A flexible and extensible architecture for linguistic annotation

    We describe a formal model for annotating linguistic artifacts, from which we derive an application programming interface (API) to a suite of tools for manipulating these annotations. The abstract logical model provides for a range of storage formats and promotes the reuse of tools that interact through this API. We focus first on "Annotation Graphs," a graph model for annotations on linear signals (such as text and speech) indexed by intervals, for which efficient database storage and querying techniques are applicable. We note how a wide range of existing annotated corpora can be mapped to this annotation graph model. This model is then generalized to encompass a wider variety of linguistic "signals," including both naturally occurring phenomena (as recorded in images, video, multi-modal interactions, etc.) and the derived resources that are increasingly important to the engineering of natural language processing systems (such as word lists, dictionaries, aligned bilingual corpora, etc.). We conclude with a review of the current efforts towards implementing key pieces of this architecture. Comment: 8 pages, 9 figures

    A Generic Approach to Supporting the Management of Computerised Clinical Guidelines and Protocols

    Clinical guidelines or protocols (CGPs) are statements that are systematically developed for the purpose of guiding the clinician and the patient in making decisions about appropriate healthcare for specific clinical problems. Using CGPs is one of the most effective and proven ways to attain improved quality, optimised resource utilisation, cost containment and reduced variation in healthcare practice. CGPs exist mainly as paper-based natural language statements, but are increasingly being computerised. Supporting computerised CGPs in a healthcare environment so that they are incorporated into the routine used daily by clinicians is complex and presents major information management challenges. This thesis contends that the management of computerised CGPs should incorporate their manipulation (operations and queries), in addition to their specification and execution, as part of a single unified management framework. The thesis applies modern advanced database technology to the task of managing computerised CGPs. The event-condition-action (ECA) rule paradigm is recognised to have huge potential in supporting computerised CGPs. In this thesis, a unified generic framework, called SpEM, and an approach, called MonCooS, were developed to enable computerised CGPs to be specified using a specification language, called PLAN, which follows the ECA rule paradigm; executed using a software mechanism based on the ECA mechanism within a modern database system; and manipulated using a manipulation language, called TOPSQL. The MonCooS approach focuses on providing clinicians with assistance in monitoring and coordinating clinical interventions while leaving the reasoning task to domain experts. A proof-of-concept system, TOPS, was developed to show that CGP management can be easily attained, within the SpEM framework, by using the MonCooS approach. TOPS is used to evaluate the framework and approach in a case study to manage a microalbuminuria protocol for diabetic patients. SpEM and MonCooS were found to be promising in supporting the full-scale management of information and knowledge for the computerised clinical protocol. Active capability within modern database management systems (DBMS) still has significant limitations in supporting some requirements of this application domain. These limitations lead to pointers for further improvements in DBMS functionality for ECA rule support. The main contributions of this thesis are: a generic and unified framework for the management of CGPs; a general platform and an advanced software mechanism for the manipulation of information and knowledge in computerised CGPs; a requirement for further development of the active functionality within modern DBMS; and a case study for the computer-based management of microalbuminuria in diabetes patients.

    A General Framework for Representing, Reasoning and Querying with Annotated Semantic Web Data

    We describe a generic framework for representing and reasoning with annotated Semantic Web data, a task becoming more important with the recent increase in inconsistent and unreliable metadata on the web. We formalise the annotated language and the corresponding deductive system, and address the query answering problem. Previous contributions on specific RDF annotation domains are encompassed by our unified reasoning formalism, as we show by instantiating it on (i) temporal, (ii) fuzzy, and (iii) provenance annotations. Moreover, we provide a generic method for combining multiple annotation domains, making it possible to represent, e.g., temporally annotated fuzzy RDF. Furthermore, we address the development of a query language -- AnQL -- that is inspired by SPARQL, including several features of SPARQL 1.1 (subqueries, aggregates, assignment, solution modifiers) along with the formal definitions of their semantics.

    Modeling temporal dimensions of semistructured data

    In this paper we propose an approach to correctly manage valid-time semantics for semistructured temporal clinical information. In particular, we use a graph-based data model to represent radiological clinical data, focusing on the patient model of the well-known DICOM standard, and define the set of (graphical) constraints needed to guarantee that the history of the given application domain is consistent.

    Four Lessons in Versatility or How Query Languages Adapt to the Web

    Exposing not only human-centered information, but also machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled new kinds of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposure has fractured the Web into islands of data, each in different Web formats: some providers choose XML, others RDF, still others JSON or OWL, for their data, even in similar domains. This fracturing stifles innovation, as application builders have to cope not only with one Web stack (e.g., XML technology) but with several, each of considerable complexity. With Xcerpt we have developed a rule- and pattern-based query language that aims to shield application builders from much of this complexity: in a single query language, XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply to querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear time and space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards more convenient, yet highly efficient data access in a “Web of Data”.
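
    The idea behind interval labelings can be shown for the simplest case, a tree. The Python sketch below is not Xcerpt's evaluation algorithm and does not cover the graph labelings the abstract refers to; the function names `label_tree` and `is_ancestor` and the sample element names are assumptions made for illustration. Each node receives a half-open interval from a depth-first traversal, so that an ancestor test reduces to interval containment.

```python
from typing import Dict, List, Tuple

Label = Tuple[int, int]  # half-open [start, end) assigned by depth-first traversal


def label_tree(children: Dict[str, List[str]], root: str) -> Dict[str, Label]:
    """Assign each node an interval that encloses the intervals of its descendants."""
    labels: Dict[str, Label] = {}
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        # counter now points just past the last descendant of this node.
        labels[node] = (start, counter)

    visit(root)
    return labels


def is_ancestor(labels: Dict[str, Label], ancestor: str, descendant: str) -> bool:
    """True if ancestor is a proper ancestor of descendant."""
    a_start, a_end = labels[ancestor]
    d_start, d_end = labels[descendant]
    return a_start < d_start and d_end <= a_end


# Hypothetical XML-like document structure.
tree = {"doc": ["head", "body"], "body": ["p1", "p2"]}
labels = label_tree(tree, "doc")
print(labels)
print(is_ancestor(labels, "doc", "p2"), is_ancestor(labels, "head", "p1"))
```

    Once the labels are computed in a single traversal, each structural test costs two integer comparisons, which is what makes such labelings attractive for query evaluation over tree-shaped data.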