81 research outputs found

    Competence-based Multimodal Curriculum Learning for Medical Report Generation

    Medical report generation, which aims to produce long and coherent descriptions of medical images, has attracted growing research interest recently. Unlike general image captioning, medical report generation is more challenging for data-driven neural models, mainly because of 1) serious data bias and 2) limited medical data. To alleviate the data bias and make the best use of the available data, we propose a Competence-based Multimodal Curriculum Learning framework (CMCL). Specifically, CMCL simulates the learning process of radiologists and optimizes the model in a step-by-step manner. First, CMCL estimates the difficulty of each training instance and evaluates the competence of the current model; second, CMCL selects the most suitable batch of training instances given the current model competence. By iterating these two steps, CMCL gradually improves the model's performance. Experiments on the public IU-Xray and MIMIC-CXR datasets show that CMCL can be incorporated into existing models to improve their performance. Comment: Accepted by ACL 2021 (Oral)
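
The two alternating steps described in this abstract (estimate difficulty and competence, then pick a suitable batch) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the square-root competence schedule is an assumption borrowed from generic competence-based curriculum learning, and the names `competence`, `difficulty_ranks`, `select_batch`, and the parameter `c0` are hypothetical. The abstract does not say how difficulty is measured, so raw difficulty scores are simply taken as input.

```python
import math
import random


def competence(step, total_steps, c0=0.1):
    """Assumed square-root schedule: grows from c0 at step 0 to 1.0 at total_steps."""
    if step >= total_steps:
        return 1.0
    return min(1.0, math.sqrt(step * (1 - c0 ** 2) / total_steps + c0 ** 2))


def difficulty_ranks(scores):
    """Map raw difficulty scores to (0, 1] by rank; the easiest instance gets 1/n."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for position, index in enumerate(order):
        ranks[index] = (position + 1) / len(scores)
    return ranks


def select_batch(ranks, step, total_steps, batch_size, rng):
    """Sample a batch from instances whose difficulty rank is within current competence."""
    c = competence(step, total_steps)
    eligible = [i for i, r in enumerate(ranks) if r <= c]
    if not eligible:
        # Fall back to the single easiest instance so training can always start.
        eligible = [min(range(len(ranks)), key=lambda i: ranks[i])]
    return rng.sample(eligible, min(batch_size, len(eligible)))


if __name__ == "__main__":
    ranks = difficulty_ranks([3.0, 1.0, 2.0, 0.5])
    rng = random.Random(0)
    for step in (0, 50, 100):
        print(step, select_batch(ranks, step, 100, 2, rng))
```

Early in training only the easiest instances are eligible; by the final step the whole training set can be sampled.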

    Aligning Source Visual and Target Language Domains for Unpaired Video Captioning

    Training a supervised video captioning model requires coupled video-caption pairs. However, for many target languages, sufficient paired data are not available. To this end, we introduce the unpaired video captioning task, which aims to train models without coupled video-caption pairs in the target language. To solve the task, a natural choice is a two-step pipeline system: first use a video-to-pivot captioning model to generate captions in a pivot language, then use a pivot-to-target translation model to translate the pivot captions into the target language. However, in such a pipeline system, 1) visual information cannot reach the translation model, which leads to visually irrelevant target captions; and 2) errors in the generated pivot captions are propagated to the translation model, resulting in disfluent target captions. To address these problems, we propose the Unpaired Video Captioning with Visual Injection system (UVC-VI). UVC-VI first introduces the Visual Injection Module (VIM), which aligns the source visual and target language domains to inject the source visual information into the target language domain. Meanwhile, VIM directly connects the encoder of the video-to-pivot model and the decoder of the pivot-to-target model, allowing end-to-end inference that skips the generation of pivot captions entirely. To enhance the cross-modality injection of the VIM, UVC-VI further introduces a pluggable video encoder, the Multimodal Collaborative Encoder (MCE). The experiments show that UVC-VI outperforms pipeline systems and exceeds several supervised systems. Furthermore, equipping existing supervised systems with our MCE achieves 4% and 7% relative margins in CIDEr score over current state-of-the-art models on the benchmark MSVD and MSR-VTT datasets, respectively. Comment: Published in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

    Contrastive Attention for Automatic Chest X-ray Report Generation

    Recently, chest X-ray report generation, which aims to automatically generate descriptions of given chest X-ray images, has received growing research interest. The key challenge of chest X-ray report generation is to accurately capture and describe the abnormal regions. In most cases, the normal regions dominate the entire chest X-ray image, and the corresponding descriptions of these normal regions dominate the final report. Because of this data bias, learning-based models may fail to attend to abnormal regions. In this work, to effectively capture and describe abnormal regions, we propose the Contrastive Attention (CA) model. Instead of focusing solely on the current input image, the CA model compares the current input image with normal images to distill the contrastive information. The acquired contrastive information can better represent the visual features of abnormal regions. In experiments on the public IU-X-ray and MIMIC-CXR datasets, incorporating our CA into several existing models boosts their performance across most metrics. In addition, our analysis shows that the CA model can help existing models better attend to the abnormal regions and provide more accurate descriptions, which are crucial for an interpretable diagnosis. Specifically, we achieve state-of-the-art results on the two public datasets. Comment: Appears in Findings of ACL 2021 (The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021))
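
The idea of comparing an input image with normal images to distill contrastive information can be illustrated with a deliberately simplified sketch. Everything here is an assumption made for illustration: the function name `contrastive_feature`, the use of single feature vectors, and the choice to subtract an attention-weighted summary of the normal pool from the input. The abstract does not specify the CA model's architecture, and this is not it.

```python
import numpy as np


def contrastive_feature(x, normal_pool, temperature=1.0):
    """Subtract an attention-weighted summary of normal features from the input.

    x:           (d,) feature vector of the current image (hypothetical input)
    normal_pool: (n, d) feature vectors of normal images (hypothetical input)
    """
    # Scaled dot-product attention scores of the input against each normal image.
    scores = normal_pool @ x / (np.sqrt(x.shape[0]) * temperature)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    normal_summary = weights @ normal_pool   # the "closest normal" representation
    # What remains after removing the normal component is the contrastive signal.
    return x - normal_summary


if __name__ == "__main__":
    pool = np.array([[1.0, 0.0], [0.0, 1.0]])
    print(contrastive_feature(np.array([5.0, 0.0]), pool))
```

In this toy setting an input identical to the only normal image yields a zero vector, so any non-zero residual reflects deviation from the normal pool.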

    Arsenic speciation in saliva of acute promyelocytic leukemia patients undergoing arsenic trioxide treatment

    Arsenic trioxide has been successfully used as a therapeutic in the treatment of acute promyelocytic leukemia (APL). Detailed monitoring of the therapeutic arsenic and its metabolites in various accessible specimens of APL patients can contribute to improving treatment efficacy and minimizing arsenic-induced side effects. This article focuses on the determination of arsenic species in saliva samples from APL patients undergoing arsenic treatment. Saliva samples were collected from nine APL patients over three consecutive days. The patients received 10 mg arsenic trioxide each day via intravenous infusion. The saliva samples were analyzed using high-performance liquid chromatography coupled with inductively coupled plasma mass spectrometry. Monomethylarsonous acid and monomethylmonothioarsonic acid were identified along with arsenite, dimethylarsinic acid, monomethylarsonic acid, and arsenate. Arsenite was the predominant arsenic species, accounting for 71.8% of total arsenic in the saliva. Following the arsenic infusion each day, the percentage of methylated arsenicals significantly decreased, possibly suggesting that the arsenic methylation process was saturated by the high doses immediately after the arsenic infusion. The temporal profiles of arsenic species in saliva following each arsenic infusion over 3 days have provided information on arsenic exposure, metabolism, and excretion. These results suggest that saliva can be used as an appropriate clinical biomarker for monitoring arsenic species in APL patients. [Figure: see text]

    Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

    Most video-and-language representation learning approaches employ contrastive learning, e.g., CLIP, to project the video and text features into a common latent space according to the semantic similarities of text-video pairs. However, such learned shared latent spaces are often not optimal, and the modality gap between visual and textual representations cannot be fully eliminated. In this paper, we propose Expectation-Maximization Contrastive Learning (EMCL) to learn compact video-and-language representations. Specifically, we use the Expectation-Maximization algorithm to find a compact set of bases for the latent space, so that the features can be concisely represented as linear combinations of these bases. Such feature decomposition of video-and-language representations reduces the rank of the latent space, resulting in increased representing power for the semantics. Extensive experiments on three benchmark text-video retrieval datasets show that EMCL learns more discriminative video-and-language representations than previous methods and significantly outperforms previous state-of-the-art methods across all metrics. More encouragingly, the proposed method can be applied to boost the performance of existing approaches either as a jointly trained layer or as an out-of-the-box inference module with no extra training, making it easy to incorporate into existing methods. Comment: Accepted to NeurIPS 202
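
A rough sketch of the decomposition idea in this abstract (use EM to find a small set of bases, then express each feature as a linear combination of them) might look like the following. It is a generic soft-assignment EM loop written under stated assumptions, not the EMCL implementation: the function names `responsibilities` and `em_bases`, the softmax E-step, and the parameters `sigma`, `iters`, and `seed` are all hypothetical choices.

```python
import numpy as np


def responsibilities(features, bases, sigma):
    """E-step: soft assignment of each feature to each basis (rows sum to 1)."""
    logits = features @ bases.T / sigma
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(logits)
    return resp / resp.sum(axis=1, keepdims=True)


def em_bases(features, k, iters=10, sigma=1.0, seed=0):
    """Alternate E- and M-steps, then rebuild the features from k bases."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    # Initialise the bases from k distinct feature vectors.
    bases = features[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(iters):
        resp = responsibilities(features, bases, sigma)
        # M-step: each basis becomes the responsibility-weighted mean of the features.
        bases = (resp.T @ features) / (resp.sum(axis=0)[:, None] + 1e-8)
    resp = responsibilities(features, bases, sigma)
    recon = resp @ bases  # each row is a linear combination of the k bases
    return bases, resp, recon


if __name__ == "__main__":
    feats = np.random.default_rng(1).normal(size=(20, 8))
    bases, resp, recon = em_bases(feats, k=3)
    print(np.linalg.matrix_rank(feats), np.linalg.matrix_rank(recon))
```

Because the reconstruction is the product of an n-by-k and a k-by-d matrix, its rank is at most k, which is the sense in which the decomposition reduces the rank of the representation.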

    Case report: Treatment of two cases of recurrent/refractory early T-cell precursor acute lymphoblastic leukemia with venetoclax combined with the CAG regimen

    Early T-cell precursor acute lymphoblastic leukemia (ETP-ALL) is a highly aggressive subtype of T-ALL. No standard chemotherapy regimen exists for patients with recurrent/refractory (R/R) ETP-ALL; in these patients, the primary goal of salvage therapy is to achieve remission as a foundation for consolidation and intensification treatments. This study reports the cases of two patients with R/R ETP-ALL who underwent salvage therapy with venetoclax combined with the CAG regimen and achieved complete remission in the bone marrow. Flow cytometry results were negative for minimal residual disease. Both patients were bridged to allogeneic hematopoietic stem cell transplantation (HSCT) and remained in complete remission over a 3-year follow-up period. These cases show that the use of venetoclax combined with the CAG regimen may offer patients with R/R ETP-ALL an opportunity for allogeneic HSCT.

    Case Report: Sequential Chemotherapy and Immunotherapy Produce Sustained Response in Osteosarcoma With High Tumor Mutational Burden

    Background: Immunotherapy has provided an effective method for the treatment of many cancers. However, its efficacy in osteosarcoma has not been satisfactory so far. Case Presentation: Here, we present a case of osteosarcoma treated with sequential chemotherapy and immunotherapy that showed promising therapeutic potential. The 29-year-old female patient presented with osteosarcoma of the 9th rib and suspected metastasis in the right lung lower lobe. Surgery was performed to remove the primary lesion, and a series of chemotherapies were given afterward, adjusted according to response and tolerance. The right lung lower lobe metastasis was initially under control but progressed (PD) 9 months after the initiation of therapy. The lesion was surgically removed and subsequent chemotherapy was implemented. The patient tolerated chemotherapy well and remained well for approximately 11 months before the discovery of 11th rib and right lung upper lobe metastases. Surgery was then performed on both lesions and achieved a complete response. Brief post-surgical chemotherapy and subsequent long-term immunotherapy (pembrolizumab) maintained continuous remission for 33 months. The patient survived for 60 months with well-controlled disease from the time of confirmed diagnosis. Genetic alterations of all primary and metastatic lesions were investigated by whole-exome sequencing (WES). Substantial similarity in mutational landscape was revealed between the primary lesion and the 11th rib metastasis and between the two lung metastases, while substantial heterogeneity was found between the rib lesions and the lung metastases. The tumor mutational burden (TMB) for the 9th rib primary lesion, the metastatic 11th rib lesion, and the metastatic right upper and lower lobe nodule tissues was 8.02, 2.38, 4.61, and 0.14 mutations/Mb, respectively. The primary lesion exhibited the most diverse copy number variation (CNV) changes among all lesions. Furthermore, pathway enrichment analysis also suggested significant heterogeneity among the lesions. Conclusions: Surgery with sequential chemotherapy and maintenance immunotherapy was shown, for the first time, to produce a good response in an osteosarcoma patient who had high-TMB tumor lesions and good tolerance for chemotherapy and immunotherapy.

    Femtosecond gas phase electron diffraction with MeV electrons

    We present results on ultrafast gas electron diffraction (UGED) experiments with femtosecond resolution using the MeV electron gun at SLAC National Accelerator Laboratory. UGED is a promising method to investigate molecular dynamics in the gas phase because electron pulses can probe the structure with a high spatial resolution. Until recently, however, it was not possible for UGED to reach the relevant timescale for the motion of the nuclei during a molecular reaction. Using MeV electron pulses has allowed us to overcome the main challenges in reaching femtosecond resolution, namely delivering short electron pulses on a gas target, overcoming the effect of velocity mismatch between the pump laser pulses and the probe electron pulses, and maintaining a low timing jitter. At electron kinetic energies above 3 MeV, the velocity mismatch between laser and electron pulses becomes negligible. The relativistic electrons are also less susceptible to temporal broadening due to the Coulomb force. One of the challenges of diffraction with relativistic electrons is that the small de Broglie wavelength results in very small diffraction angles. In this paper we describe the new setup and its characterization, including capturing static diffraction patterns of molecules in the gas phase, finding time-zero with sub-picosecond accuracy, and first time-resolved diffraction experiments. The new device can achieve a temporal resolution of 100 fs root-mean-square and sub-angstrom spatial resolution. The collimation of the beam is sufficient to measure the diffraction pattern, and the transverse coherence is on the order of 2 nm. Currently, the temporal resolution is limited both by the duration of the electron pulse on target and by the timing jitter, while the spatial resolution is limited by the average electron beam current and the signal-to-noise ratio of the detection system. We also discuss plans for improving both the temporal resolution and the spatial resolution.
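
The statements about negligible velocity mismatch and a small de Broglie wavelength at MeV energies follow from standard relativistic kinematics, which can be checked with a short calculation. The function name `electron_beta_and_wavelength_pm` is hypothetical, and the 100 keV comparison point is an illustrative choice that does not come from the abstract; the constants are the electron rest energy and hc rounded from standard tables.

```python
import math

ELECTRON_REST_ENERGY_MEV = 0.51099895  # m_e c^2
HC_MEV_PM = 1.239841984                # h*c in MeV * picometre


def electron_beta_and_wavelength_pm(kinetic_energy_mev):
    """Return (v/c, de Broglie wavelength in pm) for an electron of given kinetic energy."""
    gamma = 1.0 + kinetic_energy_mev / ELECTRON_REST_ENERGY_MEV
    beta = math.sqrt(1.0 - 1.0 / gamma ** 2)
    # Relativistic momentum: (pc)^2 = T^2 + 2 T m c^2
    pc_mev = math.sqrt(kinetic_energy_mev ** 2
                       + 2.0 * kinetic_energy_mev * ELECTRON_REST_ENERGY_MEV)
    return beta, HC_MEV_PM / pc_mev


if __name__ == "__main__":
    for energy_mev in (0.1, 3.0):
        beta, wavelength_pm = electron_beta_and_wavelength_pm(energy_mev)
        print(f"T = {energy_mev} MeV: v/c = {beta:.4f}, wavelength = {wavelength_pm:.3f} pm")
```

At 3 MeV the electron travels at about 0.99c with a wavelength of roughly 0.36 pm, compared with about 0.55c and 3.7 pm at 100 keV, which is consistent with both the reduced velocity mismatch and the very small diffraction angles mentioned above.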