
    Scalable Approaches for Auditing the Completeness of Biomedical Ontologies

    Get PDF
    An ontology provides a formalized representation of knowledge within a domain. In biomedicine, ontologies have been widely used in modern biomedical applications to enable semantic interoperability and facilitate data exchange. Given the important roles that biomedical ontologies play, quality issues such as incompleteness, if not addressed, can affect the quality of downstream ontology-driven applications. However, biomedical ontologies often have large sizes and complex structures, making it infeasible to uncover potential quality issues through manual effort. In this dissertation, we introduce automated and scalable approaches for auditing the completeness of biomedical ontologies. We mainly focus on two incompleteness issues: missing hierarchical relations and missing concepts. To identify missing hierarchical relations, we develop three approaches: a lexical-based approach, a hybrid approach utilizing both lexical features and logical definitions, and an approach based on concept name transformation. To identify missing concepts, a lexical-based Formal Concept Analysis (FCA) method is proposed for concept enrichment. We also predict proper concept names for the missing concepts using deep learning techniques. Manual review by domain experts is performed to evaluate these approaches. In addition, we leverage extrinsic knowledge (i.e., external ontologies) to help validate the detected incompleteness issues. The auditing approaches have been applied to a variety of biomedical ontologies, including SNOMED CT, the National Cancer Institute (NCI) Thesaurus, and the Gene Ontology.

    In the first, lexical-based approach to identifying missing hierarchical relations, each concept is modeled with an enriched set of lexical features, leveraging words and noun phrases in the name of the concept itself and the concept's ancestors. Given a pair of concepts that are not linked by a hierarchical relation, if the enriched lexical attributes of one concept are a superset of the other's, a potentially missing hierarchical relation is suggested. Applying this approach to the September 2017 release of SNOMED CT (US edition) suggested 38,615 potentially missing hierarchical relations. A domain expert reviewed a random sample of 100 of them and confirmed 90 as valid (a precision of 90%).

    In the second work, a hybrid approach is proposed to detect missing hierarchical relations in non-lattice subgraphs. For each concept, its lexical features are harmonized with role definitions to provide a more comprehensive semantic model. A two-step subsumption testing is then performed to automatically suggest potentially missing hierarchical relations. This approach identified 55 potentially missing hierarchical relations in the 19.08d version of the NCI Thesaurus. Of these, 29 were confirmed as valid by the curators of the NCI Enterprise Vocabulary Service (EVS) and have been incorporated into newer versions of the NCI Thesaurus, and 7 further revealed incorrect existing hierarchical relations in the NCI Thesaurus.

    In the third work, we introduce a transformation-based method that leverages Unified Medical Language System (UMLS) knowledge to identify missing hierarchical relations in its source ontologies. Given a concept name, noun chunks within it are identified and replaced by their more general counterparts to generate new concept names that should be more general than the original.
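
    As a rough illustration of this name-transformation idea (a minimal sketch, not the dissertation's implementation; the spaCy model name and the head-noun heuristic standing in for the "more general counterpart" lookup are assumptions), one can generalize a concept name by replacing each noun chunk with its syntactic head:

        # Sketch: generate candidate more-general names by generalizing
        # one noun chunk at a time (head-noun heuristic as a stand-in
        # for a proper "more general counterpart" lookup).
        import spacy

        nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

        def generalized_names(concept_name):
            doc = nlp(concept_name)
            for chunk in doc.noun_chunks:
                if len(chunk) > 1:  # only modified chunks can be generalized
                    # e.g., "acute inflammation" -> "inflammation"
                    yield (concept_name[:chunk.start_char]
                           + chunk.root.text
                           + concept_name[chunk.end_char:])

        for name in generalized_names("acute inflammation of the lung"):
            print(name)  # e.g., "inflammation of the lung"

    A missing hierarchical relation would then be suggested when a generated name matches an existing concept that is not already an ancestor of the original.
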
    Applying this method to the UMLS (2019AB release) detected a total of 39,359 potentially missing hierarchical relations in 13 source ontologies. Domain experts evaluated a random sample of 200 potentially missing hierarchical relations identified in SNOMED CT (US edition) and 100 in the Gene Ontology, and confirmed 173 of the 200 and 63 of the 100, indicating that our method achieved a precision of 86.5% for SNOMED CT and 63% for the Gene Ontology.

    For concept enrichment, we introduce a lexical method based on FCA to identify potentially missing concepts. Lexical features (i.e., words appearing in concept names) serve as FCA attributes when generating the formal context. Applying multistage intersection on these attributes yields newly formalized concepts along with bags of words that can be used to name them. This method was applied to the Disease or Disorder sub-hierarchy in the 19.08d version of the NCI Thesaurus and identified 8,983 potentially missing concepts. A preliminary evaluation validated that 592 of the 8,983 potentially missing concepts are included in external ontologies in the UMLS.

    After obtaining new concepts and their relevant bags of words, we further developed deep learning-based approaches to automatically predict concept names that comply with the naming convention of a specific ontology. We explored a simple neural network, Long Short-Term Memory (LSTM) networks, and a Convolutional Neural Network (CNN) combined with LSTM. Our experiments showed that the LSTM-based approach achieved the best performance, with an F1 score of 63.41% for predicting names of newly added concepts in the March 2018 release of SNOMED CT (US edition) and an F1 score of 73.95% for naming missing concepts revealed by our previous work.

    In the last part of this dissertation, extrinsic knowledge is leveraged to collect supporting evidence for the detected incompleteness issues. We present a work in which cross-ontology evaluation based on extrinsic knowledge from the UMLS is used to help validate potentially missing hierarchical relations, aiming to relieve the heavy workload of manual review.
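
    To make the first, lexical-based approach described above concrete, here is a minimal sketch under simplified assumptions (names and parent links as plain dictionaries, single words as features without noun phrases) of the enriched-lexical-feature subsumption test:

        # Sketch: suggest a missing is-a relation between two unlinked
        # concepts when one concept's enriched word set strictly contains
        # the other's (more features = more specific concept).
        from itertools import combinations

        def enriched_features(concept, names, parents):
            """Words from the concept's own name and all ancestor names."""
            words = set(names[concept].lower().split())
            stack, seen = list(parents.get(concept, [])), set()
            while stack:
                anc = stack.pop()
                if anc not in seen:
                    seen.add(anc)
                    words |= set(names[anc].lower().split())
                    stack.extend(parents.get(anc, []))
            return words

        def suggest_missing_isa(names, parents, linked):
            feats = {c: enriched_features(c, names, parents) for c in names}
            for a, b in combinations(names, 2):
                if linked(a, b):
                    continue
                if feats[a] > feats[b]:
                    yield (a, b)   # candidate: a is-a b
                elif feats[b] > feats[a]:
                    yield (b, a)   # candidate: b is-a a
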

    Improving Pneumonia Classification and Lesion Detection Using Spatial Attention Superposition and Multilayer Feature Fusion

    Get PDF
    Pneumonia is a severe inflammation of the lung that can cause serious complications. Chest X-rays (CXRs) are commonly used to diagnose pneumonia. In this paper, we propose a deep-learning-based method with spatial attention superposition (SAS) and multilayer feature fusion (MFF) to facilitate pneumonia diagnosis based on CXRs. Specifically, an SAS module, which takes advantage of channel and spatial attention mechanisms, was designed to identify intrinsic imaging features of pneumonia-related lesions and their locations, and an MFF module was designed to harmonize disparate features from different channels and emphasize important information. These two modules were concatenated to extract the critical image features that serve as the basis for pneumonia diagnosis. We further embedded the proposed modules into a baseline neural network and developed a model called SAS-MFF-YOLO to diagnose pneumonia. To validate the effectiveness of our model, extensive experiments were conducted on two CXR datasets, provided by the Radiological Society of North America (RSNA) and the AI Research Institute. SAS-MFF-YOLO achieved a precision of 88.1% and a recall of 98.2% for pneumonia classification, and an AP50 of 99% for lesion detection, on the AI Research Institute dataset. Visualization of intermediate feature maps showed that our method facilitates uncovering pneumonia-related lesions in CXRs. Our results demonstrate that our approach can enhance overall pneumonia detection performance on CXR imaging.
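
    As a rough PyTorch sketch of the channel-plus-spatial attention idea behind such an SAS module (a CBAM-style stand-in, not the published module; the layer sizes are illustrative assumptions):

        # Sketch: channel attention reweights feature maps; spatial
        # attention highlights lesion locations in the spatial plane.
        import torch
        import torch.nn as nn

        class ChannelSpatialAttention(nn.Module):
            def __init__(self, channels, reduction=16):
                super().__init__()
                self.channel = nn.Sequential(     # squeeze-and-excite style
                    nn.AdaptiveAvgPool2d(1),
                    nn.Conv2d(channels, channels // reduction, 1),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(channels // reduction, channels, 1),
                    nn.Sigmoid(),
                )
                self.spatial = nn.Sequential(     # 7x7 conv over pooled maps
                    nn.Conv2d(2, 1, kernel_size=7, padding=3),
                    nn.Sigmoid(),
                )

            def forward(self, x):
                x = x * self.channel(x)                  # channel reweighting
                avg = x.mean(dim=1, keepdim=True)        # channel-wise mean map
                mx, _ = x.max(dim=1, keepdim=True)       # channel-wise max map
                return x * self.spatial(torch.cat([avg, mx], dim=1))

        x = torch.randn(1, 64, 32, 32)
        print(ChannelSpatialAttention(64)(x).shape)  # torch.Size([1, 64, 32, 32])
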

    ESSM: An Extractive Summarization Model with Enhanced Spatial-Temporal Information and Span Mask Encoding

    Get PDF
    Extractive reading comprehension extracts consecutive subsequences from a given article to answer a given question. Previous work often adopted Byte Pair Encoding (BPE), which can cause semantically correlated words to be separated, and previous feature extraction strategies could not effectively capture global semantic information. In this paper, an extractive summarization model with enhanced spatial-temporal information and span mask encoding (ESSM) is proposed to promote global semantic information. ESSM utilizes an Embedding Layer to reduce the semantic segmentation of correlated words and adopts a TemporalConvNet Layer to mitigate the loss of feature information. The model can also handle unanswerable questions. To verify the effectiveness of the model, experiments were conducted on the SQuAD1.1 and SQuAD2.0 datasets. Our model achieved an EM of 86.31% and an F1 score of 92.49% on SQuAD1.1; the corresponding numbers for SQuAD2.0 are 80.54% and 83.27%. These results show that the model is effective for extractive QA tasks.
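
    For context, the span-extraction step common to such models can be sketched as follows (a generic start/end scoring head over any encoder's token vectors; not the ESSM architecture itself, whose layer names above are specific to the paper):

        # Sketch: score each token as answer start/end, then pick the
        # highest-scoring consecutive span (start <= end, bounded length).
        import torch
        import torch.nn as nn

        class SpanHead(nn.Module):
            def __init__(self, hidden_size):
                super().__init__()
                self.scores = nn.Linear(hidden_size, 2)  # start & end logits

            def forward(self, token_states):             # (seq_len, hidden)
                start, end = self.scores(token_states).unbind(dim=-1)
                return start, end

        def best_span(start_logits, end_logits, max_len=30):
            n = start_logits.size(0)
            pair = start_logits[:, None] + end_logits[None, :]
            keep = torch.triu(torch.ones(n, n, dtype=torch.bool))   # end >= start
            keep &= ~torch.triu(torch.ones(n, n, dtype=torch.bool),
                                diagonal=max_len)                   # bounded length
            pair = pair.masked_fill(~keep, float("-inf"))
            return divmod(int(pair.argmax()), n)  # (start, end) token indices
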

    A Policy Optimization Method Towards Optimal-time Stability

    Full text link
    In current model-free reinforcement learning (RL) algorithms, stability criteria based on sampling methods are commonly utilized to guide policy optimization. However, these criteria only guarantee the infinite-time convergence of the system's state to an equilibrium point, which leads to sub-optimality of the policy. In this paper, we propose a policy optimization technique incorporating sampling-based Lyapunov stability. Our approach enables the system's state to reach an equilibrium point within an optimal time and maintain stability thereafter, referred to as "optimal-time stability". To achieve this, we integrate the optimization method into the Actor-Critic framework, resulting in the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm. In evaluations conducted on ten robotic tasks, our approach significantly outperforms previous studies, effectively guiding the system to generate stable patterns.
    Comment: 27 pages, 11 figures. 7th Annual Conference on Robot Learning, 2023.
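
    As a hedged sketch of how a sampling-based Lyapunov decrease condition can enter an actor loss (illustrative only; not the ALAC algorithm, whose adaptive machinery is described in the paper, and with L, alpha, and lam as assumed names):

        # Sketch: a learned Lyapunov candidate L(s) should decrease along
        # sampled transitions; the policy is penalized where it does not.
        import torch

        def lyapunov_penalty(L, states, next_states, alpha=0.1):
            """Mean violation of the condition L(s') - (1 - alpha) * L(s) <= 0."""
            return torch.relu(L(next_states) - (1.0 - alpha) * L(states)).mean()

        def actor_loss(policy_objective, L, states, next_states, lam=1.0):
            # Standard actor objective plus a Lagrangian-style stability term.
            return policy_objective + lam * lyapunov_penalty(L, states, next_states)
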

    Synthesis, Characterization, and Evaluation of a Novel Amphiphilic Polymer RGD-PEG-Chol for Target Drug Delivery System

    Get PDF
    An amphiphilic polymer, RGD-PEG-Chol, which can be produced at large scale and very low cost, has been synthesized successfully. The synthesized intermediates and final products were characterized and confirmed by 1H nuclear magnetic resonance (1H NMR) and Fourier transform infrared (FT-IR) spectroscopy. Paclitaxel- (PTX-) loaded liposomes based on RGD-PEG-Chol were then prepared by the film formation method. The liposomes had a size below 100 nm and significantly enhanced the cytotoxicity of paclitaxel to B16F10 cells, as demonstrated by the MTT test (IC50 = 0.079 μg/mL for RGD-modified PTX-loaded liposomes compared to 9.57 μg/mL for free PTX). Flow cytometry analysis revealed that the cellular uptake of coumarin encapsulated in the RGD-PEG-Chol-modified liposomes was increased in HUVEC cells. This work provides a reasonable, facile, and economical approach to preparing peptide-modified liposome materials with controllable performance, and the obtained linear RGD-modified PTX-loaded liposomes may be attractive as a drug delivery system.

    Conceptual design and progress of transmitting ~MV DC HV into 4 K LHe detectors

    Full text link
    A dual-phase TPC (Time Projection Chamber) is more capable of characterizing an event than a single-phase one because it can, in principle, reconstruct the 3D (X-Y-Z) image of the event, while a single-phase detector can only show a 2D (X-Y) picture. As a result, richer physics is expected from a dual-phase detector than from a single-phase one. However, to build such a detector, DC HV (High Voltage) must be delivered into the chamber (to establish a static electric field), which is a challenging task, especially for an LHe detector, due to the extremely low temperature, ~4 K, and the very high voltage, ~MV (million volts). This article introduces a conceptual design for transmitting ~MV DC into a 4 K LHe detector. We also report progress on manufacturing a 100 kV DC feedthrough capable of working at 4 K. Surprisingly, we realized that the technology developed here might be a valuable reference for scientists and engineers aiming to build residential bases on the Moon or Mars.

    Searching for ER and/or NR-like dark matter signals with the especially low background liquid helium TPCs

    Full text link
    In the Dark Matter (DM) direct detection community, the absence of convincing signals has been the "new normal" for decades. Among other possibilities, this "new normal" might indicate that DM-matter interactions could generate not only the hypothetical NR (Nuclear Recoil) events but also ER (Electron Recoil) ones, which have historically been tagged as backgrounds. Further, we argue that ER and NR-like DM signals could co-exist in the same dataset of a DM detector. In total, then, there are three scenarios in which we can search for DM signals: (i) ER excess only, (ii) NR excess only, and (iii) ER and NR excesses combined. To effectively identify any possible DM signal under these three scenarios, a DM detector should (a) have minimal ER and NR backgrounds and (b) be capable of discriminating ER events from NR ones. Accordingly, we introduce the newly established ALETHEIA project, which implements liquid-helium-filled TPCs (Time Projection Chambers) in hunting for DM. Thanks to the nearly single-digit number of ER and NR background events expected for a 1 ton*yr exposure, the ALETHEIA detectors should presumably be able to identify any form of DM-induced excess in the ROI (Region Of Interest). As far as we know, ALETHEIA is the first DM direct detection experiment claiming such an inclusive search; conventional detectors search for DM mainly in the "ER excess only" and/or "NR excess only" channels, not the "ER and NR excesses combined" channel. In addition, we introduce a preliminary scheme for one of the most challenging R&D tasks, transmitting 500+ kV into a 4 K LHe detector.

    Enhancing Neural Text Detector Robustness with μAttacking and RR-Training

    Get PDF
    With advanced neural network techniques, language models can generate content that looks genuinely created by humans. Such progress benefits society in numerous ways, but it may also bring threats we have not seen before. A neural text detector is a classification model that separates machine-generated text from human-written text. Unfortunately, a pretrained neural text detector may be vulnerable to adversarial attacks that aim to fool it into making wrong classification decisions. In this work, we propose μAttacking, a mutation-based general framework that can be used to systematically evaluate the robustness of neural text detectors. Our experiments demonstrate that μAttacking identifies a detector's flaws effectively. Inspired by the insights revealed by μAttacking, we also propose RR-training, a straightforward but effective strategy for improving the robustness of neural text detectors through finetuning. Compared with normal finetuning, our experiments demonstrated that RR-training increased model robustness by up to 11.33% with little additional effort when finetuning a neural text detector. We believe μAttacking and RR-training are useful tools for developing and evaluating neural language models.
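
    As a rough illustration of mutation-based robustness testing (hypothetical mutation operators and a user-supplied detector callable; the published μAttacking framework defines its own operators), in Python:

        # Sketch: apply small, meaning-preserving edits to machine-generated
        # text and count how often any mutant flips the detector's decision.
        import random

        def swap_homoglyph(text):
            # Replace a Latin character with a visually similar Cyrillic one.
            table = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}
            idx = [i for i, ch in enumerate(text) if ch in table]
            if not idx:
                return text
            i = random.choice(idx)
            return text[:i] + table[text[i]] + text[i + 1:]

        def insert_space(text):
            i = random.randrange(1, len(text))
            return text[:i] + " " + text[i:]

        MUTATORS = [swap_homoglyph, insert_space]

        def attack_success_rate(detector, machine_texts, n_mutants=10):
            """Fraction of samples where some mutant is classified 'human'."""
            flipped = sum(
                any(detector(random.choice(MUTATORS)(t)) == "human"
                    for _ in range(n_mutants))
                for t in machine_texts)
            return flipped / len(machine_texts)
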

    A New Method for De-Noising of Well Test Pressure Data Based on Legendre Approximation

    No full text
    In this paper, the removal of noise from well test data is considered. We use a Legendre expansion to approximate the well test data, and a truncation strategy is employed to reduce noise. The truncation parameter is chosen by a discrepancy principle, and a corresponding convergence result is obtained. The theoretical analysis shows that a good numerical approximation can be obtained by the new method. Moreover, the method directly yields stable numerical derivatives of the pressure data. Finally, we give some numerical tests to show the effectiveness of the method.
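
    As a minimal sketch of the idea (with a fixed truncation order standing in for the paper's discrepancy-principle choice, and a synthetic test signal), using NumPy's Legendre utilities:

        # Sketch: fit a truncated Legendre expansion to noisy pressure data;
        # the truncation suppresses noise, and differentiating the expansion
        # gives a stable derivative estimate.
        import numpy as np
        from numpy.polynomial import legendre as leg

        t = np.linspace(-1.0, 1.0, 400)          # time rescaled to [-1, 1]
        pressure = np.exp(-2 * t) + 0.05 * np.random.randn(t.size)  # noisy data

        deg = 12                                 # truncation parameter (fixed here)
        coef = leg.legfit(t, pressure, deg)      # least-squares Legendre fit
        smooth = leg.legval(t, coef)             # de-noised pressure
        dsmooth = leg.legval(t, leg.legder(coef))  # stable pressure derivative
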