    TLMCM Network for Medical Image Hierarchical Multi-Label Classification

    Medical Image Hierarchical Multi-Label Classification (MI-HMC) is of paramount importance in modern healthcare, and it presents two significant challenges: data imbalance and the hierarchy constraint. Existing solutions involve complex model architecture design or domain-specific preprocessing, which demand considerable expertise or implementation effort. To address these limitations, this paper proposes the Transfer Learning with Maximum Constraint Module (TLMCM) network for the MI-HMC task. The TLMCM network offers a novel approach to overcoming these challenges, outperforming existing methods on the Area Under the Average Precision and Recall Curve ($AU\overline{(PRC)}$) metric. In addition, this research proposes two novel accuracy metrics, $EMR$ and $HammingAccuracy$, which have not been extensively explored in the context of the MI-HMC task. Experimental results demonstrate that the TLMCM network achieves high multi-label prediction accuracy (80% to 90%) on MI-HMC tasks, making it a valuable contribution to healthcare applications.
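
    The abstract does not define the Maximum Constraint Module or the two metrics. The NumPy sketch below assumes the module follows the max-constraint idea used in coherent hierarchical multi-label classification (each class score becomes the maximum of its own score and its descendants' scores, so a parent never scores below its child), and assumes EMR means exact match ratio and HammingAccuracy means 1 minus the Hamming loss. The hierarchy matrix R, the scores, and the labels are made-up illustrations, not TLMCM's data or code.

```python
import numpy as np

def max_constraint(probs, R):
    # probs: (n_samples, n_classes) per-class sigmoid scores in [0, 1]
    # R: (n_classes, n_classes), R[i, j] = 1 if class j is class i itself or a descendant of i
    probs = np.asarray(probs, dtype=float)
    R = np.asarray(R, dtype=float)
    # Entry [s, i, j] equals probs[s, j] when j lies under i, else 0; take the max over j.
    return (probs[:, None, :] * R[None, :, :]).max(axis=2)

def exact_match_ratio(y_true, y_pred):
    # Share of samples whose whole label vector is predicted exactly.
    y_true, y_pred = np.asarray(y_true, bool), np.asarray(y_pred, bool)
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

def hamming_accuracy(y_true, y_pred):
    # Share of individual (sample, label) decisions that are correct, i.e. 1 - Hamming loss.
    y_true, y_pred = np.asarray(y_true, bool), np.asarray(y_pred, bool)
    return float(np.mean(y_true == y_pred))

# Hypothetical 4-class hierarchy: class 0 is the parent of classes 1 and 2; class 3 is a separate root.
R = [[1, 1, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
probs = [[0.30, 0.80, 0.10, 0.05],   # raw scores violate the hierarchy (child 0.80 > parent 0.30)
         [0.70, 0.20, 0.60, 0.10],
         [0.10, 0.05, 0.05, 0.90]]
scores = max_constraint(probs, R)
y_pred = (scores >= 0.5).astype(int)
y_true = [[1, 1, 0, 0],
          [1, 1, 1, 0],
          [0, 0, 0, 1]]
print(scores[0])  # parent score raised to 0.8
print(exact_match_ratio(y_true, y_pred), hamming_accuracy(y_true, y_pred))
```

    Here two of the three samples match exactly (EMR of 2/3) while 11 of the 12 individual label decisions are correct (Hamming accuracy of 11/12), which shows why the two metrics are reported together: EMR is strict about the whole label set, Hamming accuracy credits partial correctness.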

    WSPAlign: Word Alignment Pre-training via Large-Scale Weakly Supervised Span Prediction

    Most existing word alignment methods rely on manually aligned datasets or parallel corpora, which limits their usefulness. To mitigate this dependence on manual data, we broaden the source of supervision by relaxing the requirement for correct, fully aligned, and parallel sentences. Specifically, we construct noisy, partially aligned, and non-parallel paragraphs. We then use this large-scale weakly supervised dataset for word alignment pre-training via span prediction. Extensive experiments in various settings empirically demonstrate that our approach, named WSPAlign, is an effective and scalable way to pre-train word aligners without manual data. When fine-tuned on standard benchmarks, WSPAlign sets a new state of the art, improving upon the best supervised baseline by 3.3 to 6.1 points in F1 and 1.5 to 6.1 points in AER. Furthermore, WSPAlign achieves competitive performance compared with the corresponding baselines in few-shot, zero-shot, and cross-lingual tests, which demonstrates that WSPAlign is potentially more practical for low-resource languages than existing methods. Comment: To appear at ACL 202
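
    The gains above are reported in F1 and AER (alignment error rate). As a rough illustration of how these two numbers relate, here is a small sketch that assumes the benchmarks use the standard sure/possible gold links, with precision measured against possible links and recall against sure links. The function name and the toy alignment are hypothetical and not taken from WSPAlign.

```python
def alignment_scores(predicted, sure, possible):
    # Each argument is a set of (source_index, target_index) links.
    # Sure links are treated as a subset of possible links.
    A, S = set(predicted), set(sure)
    P = set(possible) | S
    precision = len(A & P) / len(A) if A else 0.0
    recall = len(A & S) / len(S) if S else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Alignment error rate: lower is better
    aer = 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S)) if (A or S) else 0.0
    return precision, recall, f1, aer

# Toy gold standard: 3 sure links plus 1 extra possible link (made-up indices)
sure = {(0, 0), (1, 2), (2, 1)}
possible = {(3, 3)}
predicted = {(0, 0), (1, 2), (3, 3), (2, 2)}
print(alignment_scores(predicted, sure, possible))
# precision 3/4, recall 2/3, F1 12/17 (about 0.706), AER 2/7 (about 0.286)
```

    Because AER is an error rate, an "improvement" in AER corresponds to a decrease, whereas an improvement in F1 is an increase.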

    Molecular Joint Representation Learning via Multi-modal Information

    In recent years, artificial intelligence has played an important role in accelerating the whole drug discovery process. Various molecular representation schemes in different modalities (e.g., textual sequences or graphs) have been developed. By digitally encoding them, different chemical information can be learned through corresponding network structures. Molecular graphs and the Simplified Molecular Input Line Entry System (SMILES) are currently popular means for molecular representation learning. Previous works have attempted to combine both of them to address the loss of specific information in single-modal representations on various tasks. To further fuse such multi-modal information, the correspondence between the chemical features learned from different representations should be considered. To realize this, we propose a novel framework of molecular joint representation learning via Multi-Modal information of SMILES and molecular Graphs, called MMSG. We improve the self-attention mechanism by introducing a bond-level graph representation as an attention bias in the Transformer to reinforce feature correspondence between multi-modal information. We further propose a Bidirectional Message Communication Graph Neural Network (BMC GNN) to strengthen the information flow aggregated from graphs for further combination. Numerous experiments on public property prediction datasets have demonstrated the effectiveness of our model.
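
    The abstract does not say exactly how the bond-level representation enters the attention. One common pattern in graph Transformers is to add a bias term to the attention logits, softmax(QK^T / sqrt(d) + B)V. The NumPy sketch below illustrates that pattern under two assumptions: one SMILES token per heavy atom, and a hypothetical scalar bias per bond type (bond_bias_matrix and the value 2.0 are illustrative, not MMSG's learned parameters).

```python
import numpy as np

def biased_self_attention(X, Wq, Wk, Wv, bias):
    # X: (n_tokens, d_model) token embeddings; bias: (n_tokens, n_tokens) additive attention bias
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    logits = Q @ K.T / np.sqrt(K.shape[-1]) + bias
    logits = logits - logits.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

def bond_bias_matrix(n_atoms, bonds, bond_type_bias):
    # Hypothetical: a scalar bias per bond type, placed symmetrically on bonded atom pairs.
    B = np.zeros((n_atoms, n_atoms))
    for i, j, bond_type in bonds:
        B[i, j] = B[j, i] = bond_type_bias[bond_type]
    return B

# Ethanol (SMILES "CCO"), assuming one token per heavy atom: C0-C1 and C1-O2 single bonds.
bonds = [(0, 1, "single"), (1, 2, "single")]
bias = bond_bias_matrix(3, bonds, {"single": 2.0})
rng = np.random.default_rng(0)
d = 4
X = rng.normal(size=(3, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = biased_self_attention(X, Wq, Wk, Wv, bias)
print(attn.round(3))
```

    The bias only shifts the logits, so the content-based attention from Q and K is kept while bonded atom pairs receive extra weight; with Wq set to zero, the attention pattern is driven by the bond bias alone.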

    Leveraging Multi-lingual Positive Instances in Contrastive Learning to Improve Sentence Embedding

    Learning multi-lingual sentence embeddings is a fundamental and significant task in natural language processing. Recent approaches to learning both mono-lingual and multi-lingual sentence embeddings are mainly based on contrastive learning (CL) with an anchor, one positive, and multiple negative instances. In this work, we argue that leveraging multiple positives should be considered for multi-lingual sentence embeddings because (1) positives in a diverse set of languages can benefit cross-lingual learning, and (2) transitive similarity across multiple positives can provide reliable structural information to learn. To investigate the impact of CL with multiple positives, we propose a novel approach, MPCL, that effectively utilizes multiple positive instances to improve the learning of multi-lingual sentence embeddings. Our experimental results on various backbone models and downstream tasks indicate that, compared with conventional CL, MPCL leads to better retrieval, semantic similarity, and classification performance. We also observe that on unseen languages, sentence embedding models trained on multiple positives have better cross-lingual transfer performance than models trained on a single positive instance. Comment: 14 pages, 4 figures
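
    To make "CL with multiple positives" concrete, the sketch below shows one generic multi-positive InfoNCE objective: the standard single-positive term is averaged over every positive, with all positives and negatives in the denominator. This is a supervised-contrastive-style formulation chosen for illustration and is not necessarily MPCL's exact loss; the function name, temperature, and toy vectors are assumptions.

```python
import numpy as np

def multi_positive_info_nce(anchor, positives, negatives, temperature=0.05):
    # anchor: (d,), positives: (p, d), negatives: (m, d)
    def normalize(x):
        x = np.asarray(x, dtype=float)
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(anchor)
    pos_logits = normalize(positives) @ a / temperature  # cosine similarity / temperature, shape (p,)
    neg_logits = normalize(negatives) @ a / temperature  # shape (m,)
    all_logits = np.concatenate([pos_logits, neg_logits])
    # Stable log-sum-exp over every candidate (all positives and all negatives)
    m = all_logits.max()
    log_denominator = m + np.log(np.exp(all_logits - m).sum())
    # Average the per-positive InfoNCE term -log(exp(s_pos) / sum_k exp(s_k))
    return float(np.mean(log_denominator - pos_logits))

anchor = [1.0, 0.0, 0.0]                 # e.g. an English sentence embedding (toy values)
close_positives = [[1.0, 0.1, 0.0],      # e.g. translations into three other languages
                   [1.0, 0.0, 0.1],
                   [0.9, 0.1, 0.1]]
far_positives = [[0.0, 1.0, 0.1],
                 [0.1, 0.0, 1.0],
                 [-1.0, 0.1, 0.0]]
negatives = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [-1.0, 0.0, 0.0]]
print(multi_positive_info_nce(anchor, close_positives, negatives))  # close to log(3)
print(multi_positive_info_nce(anchor, far_positives, negatives))    # much larger
```

    With p positives this loss can never drop below log(p), so its minimum is reached when all positives are equally close to the anchor and the negatives are far away, which is one way to read the "transitive similarity across multiple positives" argument.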

    cTFbase: a database for comparative genomics of transcription factors in cyanobacteria

    BACKGROUND: Comprehensive identification and classification of the transcription factors (TFs) in a given genome is an important aspect of understanding the transcriptional regulatory networks of a specific organism. Cyanobacteria are an ancient group of gram-negative bacteria with strong variation in genome size, ranging from about 1.6 to 9.1 Mb, and little is known about their TF repertoires. We therefore constructed the cTFbase database to classify and analyze all the putative TFs in cyanobacterial genomes, followed by a genome-wide comparative analysis. DESCRIPTION: In the current release, cTFbase contains 1288 putative TFs identified from 21 fully sequenced cyanobacterial genomes. Through its user-friendly interactive interface, users can apply various criteria to retrieve all TF sequences and their detailed annotation information, including sequence features, domain architecture, and sequence similarity against the linked databases. Furthermore, cTFbase provides phylogenetic trees of individual TF families, multiple sequence alignments of the DNA-binding domains, and ortholog identification across any selected genomes. Comparative analysis revealed great variability of the TF sequences in cyanobacterial genomes. The high variance in gene number and domain organization may be related to their diverse biological functions and their adaptation to various environmental conditions. CONCLUSION: cTFbase provides a centralized warehouse for comparative analysis of putative TFs in cyanobacterial genomes. The availability of such an extensive database should be of great interest to researchers working on TFs or transcriptional regulatory networks in cyanobacteria. cTFbase is freely accessible at and will be continuously updated as newly sequenced cyanobacterial genomes become available.
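
    The abstract mentions ortholog identification between selected genomes but does not describe the method. A widely used, simple criterion is the reciprocal best hit: genes a (genome A) and b (genome B) are called orthologs when b is a's highest-scoring match in B and a is b's highest-scoring match in A. The sketch below assumes precomputed pairwise similarity scores (for example, alignment bit scores); the gene identifiers and scores are hypothetical, and cTFbase may use a different procedure.

```python
def reciprocal_best_hits(scores):
    # scores: dict mapping (gene_in_genome_A, gene_in_genome_B) -> similarity score,
    # e.g. a pairwise alignment bit score (higher means more similar)
    best_a_to_b, best_b_to_a = {}, {}
    for (a, b), s in scores.items():
        # Track the highest-scoring partner in each direction
        if a not in best_a_to_b or s > best_a_to_b[a][1]:
            best_a_to_b[a] = (b, s)
        if b not in best_b_to_a or s > best_b_to_a[b][1]:
            best_b_to_a[b] = (a, s)
    # Keep pairs that are each other's best hit
    return sorted((a, b) for a, (b, _) in best_a_to_b.items() if best_b_to_a[b][0] == a)

# Hypothetical TF identifiers and scores for two cyanobacterial genomes
scores = {
    ("A_tf1", "B_tf1"): 250.0,
    ("A_tf1", "B_tf2"): 90.0,
    ("A_tf2", "B_tf2"): 180.0,
    ("A_tf3", "B_tf2"): 200.0,
    ("A_tf3", "B_tf3"): 60.0,
}
print(reciprocal_best_hits(scores))  # [('A_tf1', 'B_tf1'), ('A_tf3', 'B_tf2')]
```

    In this toy case A_tf2 gets no ortholog: its best hit is B_tf2, but B_tf2's best hit is A_tf3. Requiring agreement in both directions is what makes the criterion conservative.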

    Generation of Monoclonal Antibodies against Highly Conserved Antigens

    Background: Therapeutic antibody development is one of the fastest-growing areas of the pharmaceutical industry. Generating high-quality monoclonal antibodies against a given therapeutic target is crucial for the success of drug development. However, because of immune tolerance, some proteins that are highly conserved between mice and humans are poorly immunogenic in mice, making it difficult to generate antibodies using a conventional approach. Methodology/Principal Findings: In this report, the impaired immune tolerance of NZB/W mice was exploited to generate monoclonal antibodies against highly conserved or self-antigens. Using two highly conserved human antigens (MIF and HMGB1) and one mouse self-antigen (TNF-alpha) as examples, we demonstrate that multiple clones of high-affinity, highly specific antibodies with the desired biological activities can be generated, using the NZB/W mouse as the immunization host and a T cell-specific tag fused to a recombinant antigen to stimulate the immune system. Conclusions/Significance: We developed an efficient and universal method for generating surrogate or therapeuti