6,031 research outputs found

    Rationale in Development Chat Messages: An Exploratory Study

    Chat messages of development teams play an increasingly significant role in software development, having replaced emails in some cases. Chat messages contain information about discussed issues, considered alternatives and argumentation leading to the decisions made during software development. These elements, defined as rationale, are invaluable during software evolution for documenting and reusing development knowledge. Rationale is also essential for coping with changes and for effective maintenance of the software system. However, exploiting the rationale hidden in the chat messages is challenging due to the high volume of unstructured messages covering a wide range of topics. This work presents the results of an exploratory study examining the frequency of rationale in chat messages, the completeness of the available rationale and the potential of automatic techniques for rationale extraction. For this purpose, we apply content analysis and machine learning techniques on more than 8,700 chat messages from three software development projects. Our results show that chat messages are a rich source of rationale and that machine learning is a promising technique for detecting rationale and identifying different rationale elements. Comment: 11 pages, 6 figures. The 14th International Conference on Mining Software Repositories (MSR'17).
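
    As a rough illustration of the kind of machine-learning pipeline the study describes, the sketch below trains a simple rationale-vs-other classifier over labelled chat messages with scikit-learn. The example messages, labels and model choice are placeholder assumptions for demonstration, not the authors' dataset or classifier.

```python
# Minimal sketch of a rationale classifier for chat messages, assuming a small
# labelled sample (the paper's MSR'17 data is not reproduced here).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Placeholder examples: each chat message is labelled as containing rationale or not.
messages = [
    "We should switch to PostgreSQL because SQLite locks under concurrent writes.",
    "I considered Redis, but the persistence requirements ruled it out.",
    "Anyone up for lunch?",
    "Build is green again.",
]
labels = ["rationale", "rationale", "other", "other"]

X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.5, random_state=0, stratify=labels
)

# Bag-of-words features plus a linear classifier: a common baseline for this task.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```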

    Automatic extraction of robotic surgery actions from text and kinematic data

    The latest generation of robotic systems is becoming increasingly autonomous due to technological advancements and artificial intelligence. The medical field, particularly surgery, is also interested in these technologies because automation would benefit surgeons and patients. While the research community is active in this direction, commercial surgical robots do not currently operate autonomously due to the risks involved in dealing with human patients: it is still considered safer to rely on human surgeons' intelligence for decision-making issues. This means that robots must possess human-like intelligence, including various reasoning capabilities and extensive knowledge, to become more autonomous and credible. Indeed, as current research in the field demonstrates, one of the most critical aspects in developing autonomous systems is the acquisition and management of knowledge. In particular, a surgical robot must base its actions on solid procedural surgical knowledge to operate autonomously, safely, and expertly. This thesis investigates different possibilities for automatically extracting and managing knowledge from text and kinematic data. In the first part, we investigated the possibility of extracting procedural surgical knowledge from real intervention descriptions available in textbooks and academic papers in the robotic-surgical domain, by exploiting Transformer-based pre-trained language models. In particular, we released SurgicBERTa, a RoBERTa-based pre-trained language model for surgical literature understanding. It has been used to detect procedural sentences in books and extract procedural elements from them. Then, with some use cases, we explored the possibilities of translating written instructions into logical rules usable for robotic planning. Since not all the knowledge required for automating a procedure is written in texts, we introduced the concept of surgical commonsense, showing how it relates to different autonomy levels. In the second part of the thesis, we analyzed surgical procedures at a lower level of granularity, showing how each surgical gesture is associated with a given combination of kinematic data.
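
    The thesis describes using SurgicBERTa to detect procedural sentences in surgical texts. A minimal sketch of that kind of sentence classification with the Hugging Face transformers library is shown below; the checkpoint path and the label mapping are hypothetical placeholders, since the exact fine-tuned model released with the thesis is not specified here.

```python
# Hedged sketch: detecting procedural sentences with a RoBERTa-style encoder.
# "path/to/surgicberta-procedural-classifier" is a placeholder, not the actual
# SurgicBERTa release; the fine-tuned classification head is an assumption.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "path/to/surgicberta-procedural-classifier"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sentences = [
    "Divide the short gastric vessels close to the spleen using the vessel sealer.",
    "Robotic surgery has grown rapidly over the last decade.",
]

inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Assume label 1 = "procedural" and label 0 = "non-procedural" for this sketch.
for sentence, pred in zip(sentences, logits.argmax(dim=-1).tolist()):
    print(pred, sentence)
```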

    Provider-specific quality measurement for ERCP using natural language processing

    Background and Aims Natural language processing (NLP) is an information retrieval technique that has been shown to accurately identify quality measures for colonoscopy. There are no systematic methods by which to track adherence to quality measures for ERCP, the highest-risk endoscopic procedure widely used in practice. Our aim was to demonstrate the feasibility of using NLP to measure adherence to ERCP quality indicators across individual providers. Methods ERCPs performed by 6 providers at a single institution from 2006 to 2014 were identified. Quality measures were defined using society guidelines and expert opinion, and then extracted using a combination of NLP and data mining (e.g., ICD-9-CM codes). Validation for each quality measure was performed by manual record review. Quality measures were grouped into preprocedure (5), intraprocedure (6), and postprocedure (2). NLP was evaluated using measures of precision and accuracy. Results A total of 23,674 ERCPs were analyzed (average patient age, 52.9 ± 17.8 years; 14,113 [59.6%] were women). Among 13 quality measures, precision of NLP ranged from 84% to 100%, with intraprocedure measures having lower precision (84% for precut sphincterotomy). Accuracy of NLP ranged from 90% to 100%, with intraprocedure measures having lower accuracy (90% for pancreatic stent placement). Conclusions NLP in conjunction with data mining facilitates individualized tracking of ERCP providers for quality metrics without the need for manual medical record review. Incorporation of these tools across multiple centers may permit tracking of ERCP quality measures through national registries.
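
    The abstract does not give implementation detail, but a combination of text matching over procedure reports and coded administrative data, as described, might look like the sketch below. The report text, measure patterns and diagnosis codes are illustrative assumptions, not the study's actual rules.

```python
# Illustrative sketch: rule-based extraction of ERCP quality measures from a
# procedure report, combined with a lookup of ICD-9-CM-style coded data.
# The patterns and codes below are assumptions for demonstration only.
import re

report = (
    "Indication: choledocholithiasis. Informed consent was obtained. "
    "A precut sphincterotomy was performed and a pancreatic stent was placed."
)
diagnosis_codes = {"574.51"}  # hypothetical coded diagnoses for this encounter

quality_measures = {
    "documented_indication": r"\bindication\b",
    "informed_consent": r"\binformed consent\b",
    "precut_sphincterotomy": r"\bprecut sphincterotomy\b",
    "pancreatic_stent_placement": r"\bpancreatic stent\b",
}

# NLP component: flag each measure if its pattern appears in the report text.
adherence = {
    name: bool(re.search(pattern, report, flags=re.IGNORECASE))
    for name, pattern in quality_measures.items()
}

# Data-mining component: combine with coded data, as the study describes.
adherence["coded_biliary_indication"] = bool(diagnosis_codes & {"574.51", "576.2"})
print(adherence)
```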

    Concealment and Discovery: The Role of Information Security in Biomedical Data Re-Use

    This paper analyses the role of information security (IS) in shaping the dissemination and re-use of biomedical data, as well as the embedding of such data in the material, social and regulatory landscapes of research. We consider the data management practices adopted by two UK-based data linkage infrastructures: the Secure Anonymised Information Linkage, a Welsh databank that facilitates appropriate re-use of health data derived from research and routine medical practice in the region; and the Medical and Environmental Data Mash-up Infrastructure, a project bringing together researchers from the University of Exeter, the London School of Hygiene and Tropical Medicine, the Met Office and Public Health England to link and analyse complex meteorological, environmental and epidemiological data. Through an in-depth analysis of how data are sourced, processed and analysed in these two cases, we show that IS takes two distinct forms: epistemic IS, focused on protecting the reliability and reusability of data as they move across platforms and research contexts; and infrastructural IS, concerned with protecting data from external attacks, mishandling and use disruption. These two dimensions are intertwined and mutually constitutive, and yet are often perceived by researchers as being in tension with each other. We discuss how such tensions emerge when the two dimensions of IS are operationalised in ways that put them at cross purposes with each other, thus exemplifying the vulnerability of data management strategies to broader governance and technological regimes. We also show that whenever biomedical researchers manage to overcome the conflict, the interplay between epistemic and infrastructural IS prompts critical questions concerning data sources, formats, metadata and potential uses, resulting in an improved understanding of the wider context of research and the development of relevant resources. This informs and significantly improves the re-usability of biomedical data, while encouraging exploratory analyses of secondary data sources.

    Standardizing New Diagnostic Tests to Facilitate Rapid Responses to The Covid-19 Pandemic

    To enhance data interoperability, an expeditious and accurate standardization solution for naming rapidly emerging novel lab tests is highly desirable, as it would diminish confusion in early responses to pandemic outbreaks. This is a preliminary study to explore the roles and implementation of medical informatics technology, especially natural language processing and ontology methods, in standardizing information about emerging lab tests during a pandemic, thereby facilitating rapid responses to the pandemic. The ultimate goal of this study is to develop an informatics framework for rapid standardization of lab test names during a pandemic to better prepare for future public health threats. We first constructed an information model for lab tests approved during the COVID-19 pandemic and built a named entity recognition tool that can automatically extract lab test information specified in the information model from the Emergency Use Authorization (EUA) documents of the U.S. Food and Drug Administration (FDA), thus creating a catalog of approved lab tests with detailed information. To facilitate the standardization of lab testing data in electronic health records, we further developed COVID-19 TestNorm, a tool that normalizes the names of various COVID-19 lab tests used by different healthcare facilities into standard Logical Observation Identifiers Names and Codes (LOINC). The overall accuracy of COVID-19 TestNorm was 98.9% on the development set and 97.4% on the independent test set. Lastly, we conducted a clinical study on COVID-19 re-positivity to demonstrate the utility of standardized lab test information in supporting clinical research. We believe that the results of this study indicate the great potential of medical informatics technologies for facilitating rapid responses to both current and future pandemics.
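
    To make the normalization step concrete, the sketch below maps free-text COVID-19 test names to LOINC codes via a synonym table with light string cleanup. This is not the COVID-19 TestNorm implementation; the synonym entries and the example LOINC code are illustrative assumptions, and the real tool relies on a much richer rule and terminology pipeline.

```python
# Minimal sketch of normalizing free-text COVID-19 lab test names to LOINC codes.
# The synonym table and the example code below are illustrative assumptions only.
import re

SYNONYMS_TO_LOINC = {
    "sars-cov-2 rna nasopharyngeal swab rt-pcr": "94500-6",  # illustrative mapping
    "covid-19 pcr test": "94500-6",
}

def normalize_test_name(raw_name: str) -> str | None:
    """Lower-case, strip punctuation and extra whitespace, then look up a code."""
    key = re.sub(r"[^a-z0-9\- ]", " ", raw_name.lower())
    key = re.sub(r"\s+", " ", key).strip()
    return SYNONYMS_TO_LOINC.get(key)

print(normalize_test_name("COVID-19 PCR Test"))       # -> 94500-6
print(normalize_test_name("Unknown serology panel"))  # -> None (unmapped)
```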

    PD-L1 testing for lung cancer in the UK: recognizing the challenges for implementation.

    A new approach to the management of non-small-cell lung cancer (NSCLC) has recently emerged that works by manipulating the immune checkpoint controlled by programmed death receptor 1 (PD-1) and its ligand programmed death ligand 1 (PD-L1). Several drugs targeting PD-1 (pembrolizumab and nivolumab) or PD-L1 (atezolizumab, durvalumab, and avelumab) have been approved or are in the late stages of development. Inevitably, the introduction of these drugs will put pressure on healthcare systems, and there is a need to stratify patients to identify those who are most likely to benefit from such treatment. There is evidence that responsiveness to PD-1 inhibitors may be predicted by expression of PD-L1 on neoplastic cells. Hence, there is considerable interest in using PD-L1 immunohistochemical staining to guide the use of PD-1-targeted treatments in patients with NSCLC. This article reviews the current knowledge about PD-L1 testing and identifies current research requirements. Key factors to consider include the source and timing of sample collection, pre-analytical steps (sample tracking, fixation, tissue processing, sectioning, and tissue prioritization), analytical decisions (choice of biomarker assay/kit and automated staining platform, with verification of standardized assays or validation of laboratory-devised techniques, internal and external quality assurance, and audit), and reporting and interpretation of the results. This review addresses the need for integration of PD-L1 immunohistochemistry with other tests as part of locally agreed pathways and protocols. There remain areas of uncertainty, and guidance should be updated regularly as new information becomes available.
