
    Security and confidentiality approach for the Clinical E-Science Framework (CLEF)

    CLEF is an MRC-sponsored project in the E-Science programme that aims to establish policies and infrastructure for the next generation of integrated clinical and bioscience research. One of the major goals of the project is to provide a pseudonymised repository of histories of cancer patients that can be accessed by researchers. Robust mechanisms and policies are needed to ensure that patient privacy and confidentiality are preserved while delivering a repository of such medically rich information for the purposes of scientific research. This paper summarises the overall approach adopted by CLEF to meet data protection requirements, including the data flows and pseudonymisation mechanisms that are currently being developed. Intended constraints and monitoring policies that will apply to research interrogation of the repository are also outlined. Once evaluated, it is hoped that the CLEF approach can serve as a model for other distributed electronic health record repositories to be accessed for research.

    On Quantifying Qualitative Geospatial Data: A Probabilistic Approach

    Living in the era of data deluge, we have witnessed a web content explosion, largely due to the massive availability of User-Generated Content (UGC). In this work, we specifically consider the problem of geospatial information extraction and representation, where one can exploit diverse sources of information (such as image and audio data, text data, etc.), going beyond traditional volunteered geographic information. Our ambition is to include available narrative information in an effort to better explain geospatial relationships: with spatial reasoning being a basic form of human cognition, narratives expressing such experiences typically contain qualitative spatial data, i.e., spatial objects and spatial relationships. To this end, we formulate a quantitative approach for the representation of qualitative spatial relations extracted from UGC in the form of texts. The proposed method quantifies such relations based on multiple text observations. Such observations provide distance and orientation features which are utilized by a greedy Expectation Maximization (EM)-based algorithm to infer a probability distribution over predefined spatial relationships; the latter represent the quantified relationships under user-defined probabilistic assumptions. We evaluate the applicability and quality of the proposed approach using real UGC data originating from an actual travel blog text corpus. To verify the quality of the result, we generate grid-based maps visualizing the spatial extent of the various relations.

    Rhetorical structure and reader manipulation in Agatha Christie's Murder on the Orient Express

    This paper describes Agatha Christie’s use of rhetoric to convince readers of the ‘truth’ of her detective’s solution in Murder on the Orient Express, and uses an adaptation of Rhetorical Structure Theory (RST) designed for analyses of long extracts of a narrative text. The paper aims to demonstrate, firstly, the rhetorical practice of Christie and, secondly, a tabular, non-diagrammatic exposition of RST, with some suggestions for future alterations to this method.

    Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective

    This paper presents a Lisp architecture for a portable NLP system, termed LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard, customized and in-house developed NLP tools. Our system facilitates portability across different institutions and data systems by incorporating an enriched Common Data Model (CDM) to standardize necessary data elements. It utilizes UMLS to perform domain adaptation when integrating generic domain NLP tools. It also features stand-off annotations that are specified by positional reference to the original document. We built an interval-tree-based search engine to efficiently query and retrieve the stand-off annotations by specifying positional requirements. We also developed a utility to convert an inline annotation format to stand-off annotations to enable the reuse of clinical text datasets with inline annotations. We experimented with our system on several NLP-facilitated tasks including computational phenotyping for lymphoma patients and semantic relation extraction for clinical notes. These experiments showcased the broader applicability and utility of LAPNLP. Comment: 6 pages, accepted by IEEE BIBM 2018 as regular paper
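
The stand-off idea in this abstract can be illustrated with a minimal Python sketch. It is not LAPNLP code (the system itself is written in Lisp), and every name here is hypothetical: the `Annotation` class, the `<label>text</label>` inline format, and both helper functions are assumptions made for illustration. The positional query is a plain linear scan, not the interval tree the abstract mentions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # inclusive character offset into the plain text
    end: int    # exclusive character offset into the plain text
    label: str


# Assumed inline format: <label>covered text</label>, with no nesting.
INLINE_TAG = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def inline_to_standoff(marked_up):
    """Strip inline tags; return (plain_text, list of stand-off annotations)."""
    parts = []
    annotations = []
    cursor = 0      # current position in the marked-up input
    plain_len = 0   # length of the plain text built so far
    for match in INLINE_TAG.finditer(marked_up):
        before = marked_up[cursor:match.start()]
        parts.append(before)
        plain_len += len(before)
        label, covered = match.group(1), match.group(2)
        annotations.append(Annotation(plain_len, plain_len + len(covered), label))
        parts.append(covered)
        plain_len += len(covered)
        cursor = match.end()
    parts.append(marked_up[cursor:])
    return "".join(parts), annotations


def overlapping(annotations, start, end):
    """Linear scan for annotations overlapping the half-open span [start, end)."""
    return [a for a in annotations if a.start < end and start < a.end]


text, anns = inline_to_standoff(
    "Patient has <problem>lymphoma</problem>, "
    "treated with <treatment>CHOP</treatment>."
)
for a in overlapping(anns, 0, len(text)):
    print(a.label, repr(text[a.start:a.end]))
```

Because each annotation stores only offsets and a label, the original text stays untouched and several tools can annotate the same document independently. An interval tree would replace the linear scan in `overlapping` once the number of annotations grows large.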

    Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

    Machine Learning has been a big success story during the AI resurgence. One particular standout success relates to learning from a massive amount of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition for utilizing knowledge whenever it is available or can be created purposefully. In this paper, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data and continued incorporation of knowledge in learning techniques. Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). arXiv admin note: substantial text overlap with arXiv:1610.0770