390 research outputs found

    A Hybrid Model for Monolingual and Multilingual Toxic Comment Detection

    Social media provides a public and convenient platform for people to communicate. However, it is also open to hateful behavior and toxic comments. Social networks, like Facebook, Twitter, and many others, have been working on developing effective toxic comment detection methods to provide better service. A monolingual language model focuses on a single language and provides high detection accuracy. A multilingual language model provides better generalization performance. To improve the effectiveness of detecting toxic comments in multiple languages, we propose a hybrid model that fuses a monolingual model and a multilingual model. We use labeled data to fine-tune the monolingual pre-trained model. For the multilingual pre-trained model, we first apply masked language modeling on unlabeled data as a semi-supervised fine-tuning step and then fine-tune the model on labeled data. In this way, we can fully utilize the large amount of unlabeled data, reduce dependence on labeled comment data, and improve the effectiveness of detection. We also design several comparative experiments. The results demonstrate the effectiveness and advantage of our proposed model, especially compared to the XLM-RoBERTa multilingual fine-tuning model.

    Diagnostic Evaluation of Policy-Gradient-Based Ranking

    Learning-to-rank has been intensively studied and has shown steadily increasing value in a wide range of domains, such as web search, recommender systems, dialogue systems, machine translation, and even computational biology, to name a few. In light of recent advances in neural networks, there has been a strong and continuing interest in exploring how to deploy popular techniques, such as reinforcement learning and adversarial learning, to solve ranking problems. However, armed with these techniques, most studies tend to show only how effective a new method is. A comprehensive comparison between techniques and an in-depth analysis of their deficiencies have been largely overlooked. This paper is motivated by the observation that recent ranking methods based on either reinforcement learning or adversarial learning boil down to policy-gradient-based optimization. Using widely used benchmark collections with complete information (where relevance labels are known for all items), such as MSLR-WEB30K and Yahoo-Set1, we thoroughly investigate the extent to which policy-gradient-based ranking methods are effective. On the one hand, we analytically identify the pitfalls of policy-gradient-based ranking. On the other hand, we experimentally compare a wide range of representative methods. The experimental results echo our analysis and show that policy-gradient-based ranking methods are, by a large margin, inferior to many conventional ranking methods. Regardless of whether we use reinforcement learning or adversarial learning, the failures are largely attributable to the gradient estimation based on sampled rankings, which significantly diverge from ideal rankings. In particular, the larger the number of documents per query and the more fine-grained the ground-truth labels, the more policy-gradient-based ranking suffers. Careful examination of this weakness is highly recommended for developing enhanced methods based on policy gradient.
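To make "gradient estimation based on sampled rankings" concrete, the following is a minimal sketch of a REINFORCE-style estimator for a ranking policy. It is not the paper's implementation: the Plackett-Luce sampling policy, the NDCG reward, and all function names (`sample_ranking`, `log_prob_grad`, `reinforce_gradient`) are assumptions chosen for illustration.

```python
import math
import random


def sample_ranking(scores, rng):
    """Sample a permutation of document indices from a Plackett-Luce policy."""
    remaining = list(range(len(scores)))
    ranking = []
    while remaining:
        weights = [math.exp(scores[i]) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(pick)
        remaining.remove(pick)
    return ranking


def dcg(ranking, labels):
    """Discounted cumulative gain with exponential gain 2^label - 1."""
    return sum((2 ** labels[d] - 1) / math.log2(pos + 2)
               for pos, d in enumerate(ranking))


def ndcg(ranking, labels):
    """DCG normalised by the DCG of the ideal (label-sorted) ranking."""
    ideal = dcg(sorted(range(len(labels)), key=lambda d: -labels[d]), labels)
    return dcg(ranking, labels) / ideal if ideal > 0 else 0.0


def log_prob_grad(ranking, scores):
    """Gradient of log P(ranking | scores) w.r.t. each score (Plackett-Luce)."""
    grad = [0.0] * len(scores)
    remaining = list(ranking)
    for d in ranking:
        z = sum(math.exp(scores[j]) for j in remaining)
        for j in remaining:
            grad[j] -= math.exp(scores[j]) / z  # softmax term over remaining docs
        grad[d] += 1.0                          # indicator for the chosen doc
        remaining.remove(d)
    return grad


def reinforce_gradient(scores, labels, num_samples, rng):
    """Monte Carlo policy-gradient estimate: mean of reward * grad log-prob."""
    grad = [0.0] * len(scores)
    for _ in range(num_samples):
        ranking = sample_ranking(scores, rng)
        reward = ndcg(ranking, labels)
        g = log_prob_grad(ranking, scores)
        for i in range(len(scores)):
            grad[i] += reward * g[i] / num_samples
    return grad


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [2, 0, 1, 0, 0, 1]   # hypothetical graded relevance labels
    scores = [0.0] * len(labels)  # untrained policy: all rankings equally likely
    for n in (1, 10, 1000):
        print(n, [round(x, 3) for x in reinforce_gradient(scores, labels, n, rng)])
```

With few samples the estimate depends heavily on which rankings happened to be drawn, and the number of possible rankings grows factorially with the number of documents per query, which is one way to read the abstract's remark that sampled rankings diverge from ideal ones.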

    BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation

    We present a large-scale video subtitle translation dataset, BigVideo, to facilitate the study of multimodal machine translation. Compared with the widely used How2 and VaTeX datasets, BigVideo is more than 10 times larger, consisting of 4.5 million sentence pairs and 9,981 hours of videos. We also introduce two deliberately designed test sets to verify the necessity of visual information: Ambiguous, which contains ambiguous words, and Unambiguous, in which the text context is self-contained for translation. To better model the common semantics shared across texts and videos, we introduce a contrastive learning method in the cross-modal encoder. Extensive experiments on BigVideo show that: a) Visual information consistently improves the NMT model in terms of BLEU, BLEURT, and COMET on both Ambiguous and Unambiguous test sets. b) Visual information helps disambiguation, compared to the strong text baseline on terminology-targeted scores and human evaluation. Dataset and our implementations are available at https://github.com/DeepLearnXMU/BigVideo-VMT. Comment: Accepted to ACL 2023 Finding

    Combining Context and Knowledge Representations for Chemical-Disease Relation Extraction

    Automatically extracting the relationships between chemicals and diseases is highly important to various areas of biomedical research and health care. Biomedical experts have built many large-scale knowledge bases (KBs) to advance the development of biomedical research. KBs contain huge amounts of structured information about entities and relationships and therefore play a pivotal role in chemical-disease relation (CDR) extraction. However, previous research has paid little attention to the prior knowledge existing in KBs. This paper proposes a neural network-based attention model (NAM) for CDR extraction, which makes full use of context information in documents and prior knowledge in KBs. For a pair of entities in a document, an attention mechanism is employed to select important context words with respect to the relation representations learned from KBs. Experiments on the BioCreative V CDR dataset show that combining context and knowledge representations through the attention mechanism can significantly improve CDR extraction performance while achieving results comparable with state-of-the-art systems. Comment: Published in IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11 pages, 5 figure

    Stationary shapes of deformable particles moving at low Reynolds numbers

    Lecture Notes of the Summer School "Microswimmers - From Single Particle Motion to Collective Behaviour", organised by the DFG Priority Programme SPP 1726 (Forschungszentrum Jülich, 2015). Comment: Pages C7.1-16 of G. Gompper et al. (ed.), Microswimmers - From Single Particle Motion to Collective Behaviour, Lecture Notes of the DFG SPP 1726 Summer School 2015, Forschungszentrum Jülich GmbH, Schriften des Forschungszentrums Jülich, Reihe Key Technologies, Vol 110, ISBN 978-3-95806-083-

    Sensing remote nuclear spins

    Sensing single nuclear spins is a central challenge in magnetic-resonance-based imaging techniques. Although different methods, especially diamond-defect-based sensing and imaging techniques, have in principle shown sufficient sensitivity, signals from single nuclear spins are usually too weak to be distinguished from background noise. Here, we present the detection and identification of remote single C-13 nuclear spins embedded in the nuclear spin bath surrounding the single electron spin of a nitrogen-vacancy centre in diamond. With dynamical decoupling control of the centre electron spin, the weak magnetic field of ~10 nT from a single nuclear spin located ~3 nm from the centre, with hyperfine coupling as weak as ~500 Hz, is amplified and detected. The quantum nature of the coupling is confirmed, and the precise position and the vector components of the nuclear field are determined. Given the distance over which nuclear magnetic fields can be detected, the technique marks a firm step towards imaging, detecting and controlling nuclear spin species external to the diamond sensor.
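The orders of magnitude quoted above (~10 nT, ~3 nm, ~500 Hz) can be cross-checked with a point-dipole estimate. The sketch below is a back-of-the-envelope calculation, not the authors' analysis: it assumes a pure dipole-dipole interaction, ignores the angular factor (which changes the result by a factor of order one), and uses rounded textbook constants; the function names are hypothetical.

```python
# Rounded physical constants (SI units).
MU0_OVER_4PI = 1e-7            # T*m/A
PLANCK_H = 6.626e-34           # J*s
GAMMA_E_HZ_PER_T = 28.025e9    # electron gyromagnetic ratio / (2*pi)
GAMMA_C13_HZ_PER_T = 10.708e6  # 13C gyromagnetic ratio / (2*pi)

# Magnetic moment of a spin-1/2 13C nucleus: mu = h * (gamma / 2pi) * I, I = 1/2.
C13_MOMENT = PLANCK_H * GAMMA_C13_HZ_PER_T * 0.5  # J/T


def dipole_field_scale(distance_m):
    """Characteristic dipole field (mu0/4pi) * mu / r^3, angular factor omitted."""
    return MU0_OVER_4PI * C13_MOMENT / distance_m ** 3


def dipolar_coupling_scale_hz(distance_m):
    """Characteristic electron-13C dipolar coupling in Hz, angular factor omitted."""
    return (MU0_OVER_4PI * PLANCK_H * GAMMA_E_HZ_PER_T * GAMMA_C13_HZ_PER_T
            / distance_m ** 3)


if __name__ == "__main__":
    r = 3e-9  # 3 nm, the distance quoted in the abstract
    print(f"field scale:    {dipole_field_scale(r) * 1e9:.1f} nT")
    print(f"coupling scale: {dipolar_coupling_scale_hz(r):.0f} Hz")
```

Under these assumptions both scales come out at roughly ten nanotesla and several hundred hertz at 3 nm, consistent with the figures in the abstract. Because both quantities fall off as 1/r^3, doubling the distance reduces them by a factor of eight.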

    Synthesis, self-assembly, and immunological activity of α-galactose-functionalized dendron–lipid amphiphiles

    Nanoassemblies presenting multivalent displays of biologically active carbohydrates are of significant interest for a wide array of biomedical applications ranging from drug delivery to immunotherapy. In this study, glycodendron–lipid hybrids were developed as a new and tunable class of dendritic amphiphiles. A modular synthesis was used to prepare dendron–lipid hybrids comprising distearylglycerol and 0 through 4th generation polyester dendrons with peripheral protected amines. Following deprotection of the amines, an isothiocyanate derivative of C-linked α-galactose (α-Gal) was conjugated to the dendron peripheries, affording amphiphiles with 1 to 16 α-Gal moieties. Self-assembly in water through a solvent exchange process resulted in vesicles for the 0 through 2nd generation systems and micelles for the 3rd and 4th generation systems. The critical aggregation concentrations decreased with increasing dendron generation, suggesting that the effects of increasing molar mass dominated over the effects of increasing the hydrophilic weight fraction. The binding of the assemblies to Griffonia simplicifolia Lectin I (GSL 1), a protein with specificity for α-Gal, was studied by quantifying the binding of fluorescently labeled assemblies to GSL 1-coated beads. It was found that binding was enhanced for amphiphiles containing higher generation dendrons. Despite their substantial structural differences from the natural ligands for the CD1d receptor, the glycodendron–lipid hybrids were capable of stimulating invariant natural killer T (iNKT) cells, a class of innate-like T cells that recognize lipid and glycolipid antigens presented by CD1d and that are implicated in a wide range of diseases and conditions including but not limited to infectious diseases, diabetes and cancer.

    Tuning the Properties of ZnO, Hematite, and Ag Nanoparticles by Adjusting the Surface Charge

    Nanomaterials have become a central focus of scientific research and technological development over the last decade due to their broad applications in a variety of physicochemical and biological fields, including lasers,[1] solar cells,[2] catalysts,[3] sensors,[4–6] biological labels,[7] drug delivery,[8,9] and cancer therapy.[10–13] Controlling the size and/or shape of nanoparticles (NPs) has been widely used to modify and improve NP properties for designated applications.[1,6,11,14–19] Recently, it has been found that adjusting the surface charge (SC) can be an effective method to modify the cytotoxicity, cellular uptake, and targeting specificity of NPs.[9,10,12,20–22] Electrons and/or other electrical charges play an essential role in many key material properties, such as electrostatic interactions, photoluminescence (PL), magnetism, plasmon properties, chemical bonds, and related chemical properties.