Towards Few-shot Entity Recognition in Document Images: A Graph Neural Network Approach Robust to Image Manipulation
Recent advances in incorporating layout information, typically bounding box
coordinates, into pre-trained language models have achieved significant
performance in entity recognition from document images. Using coordinates can
easily model the absolute position of each token, but they might be sensitive
to manipulations in document images (e.g., shifting, rotation or scaling),
especially when the training data is limited in few-shot settings. In this
paper, we propose to further introduce the topological adjacency relationship
among the tokens, emphasizing their relative position information.
Specifically, we consider the tokens in the documents as nodes and formulate
the edges based on the topological heuristics from the k-nearest bounding
boxes. Such adjacency graphs are invariant to affine transformations including
shifting, rotations and scaling. We incorporate these graphs into the
pre-trained language model by adding graph neural network layers on top of the
language model embeddings, leading to a novel model LAGER. Extensive
experiments on two benchmark datasets show that LAGER significantly outperforms
strong baselines under different few-shot settings and also demonstrates better
robustness to manipulations.
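The invariance claim can be checked with a minimal sketch. It assumes plain Euclidean k-nearest neighbours on bounding-box centers as a stand-in for the paper's topological heuristics, which the abstract does not spell out; the function names and the toy boxes are hypothetical. The check covers shifting, rotation and uniform scaling only, since a general affine map (for example a shear) can change nearest-neighbour order.

```python
import math

def box_center(box):
    # box = (x0, y0, x1, y1); return its center point
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def knn_edges(centers, k):
    # Directed edge (i, j) for each of the k nearest neighbours j of token i.
    edges = set()
    for i, (xi, yi) in enumerate(centers):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centers)
            if j != i
        )
        for _, j in dists[:k]:
            edges.add((i, j))
    return edges

def transform(points, scale=1.0, angle=0.0, shift=(0.0, 0.0)):
    # Rotate by `angle` (radians), scale uniformly, then shift.
    c, s = math.cos(angle), math.sin(angle)
    return [
        (scale * (c * x - s * y) + shift[0], scale * (s * x + c * y) + shift[1])
        for x, y in points
    ]

# Toy token boxes (hypothetical data, chosen so that no distances tie).
boxes = [(0, 0, 10, 4), (14, 0, 30, 4), (0, 8, 7, 12), (20, 9, 33, 13), (40, 20, 52, 24)]
centers = [box_center(b) for b in boxes]

original = knn_edges(centers, k=2)
moved = knn_edges(
    transform(centers, scale=1.7, angle=math.radians(25), shift=(30.0, -12.0)), k=2
)
print(original == moved)
```

The absolute coordinates change under the transformation, while the edge set stays the same, which is the property the abstract relies on.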
ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation
Despite remarkable advances that large language models have achieved in
chatbots, maintaining a non-toxic user-AI interactive environment has become
increasingly critical nowadays. However, previous efforts in toxicity detection
have been mostly based on benchmarks derived from social media content, leaving
the unique challenges inherent to real-world user-AI interactions
insufficiently explored. In this work, we introduce ToxicChat, a novel
benchmark based on real user queries from an open-source chatbot. This
benchmark contains the rich, nuanced phenomena that can be tricky for current
toxicity detection models to identify, revealing a significant domain
difference compared to social media content. Our systematic evaluation of
models trained on existing toxicity datasets has shown their shortcomings when
applied to this unique domain of ToxicChat. Our work illuminates the
potentially overlooked challenges of toxicity detection in real-world user-AI
conversations. In the future, ToxicChat can be a valuable resource to drive
further advancements toward building a safe and healthy environment for user-AI
interactions.
Predicting miRNA-disease associations based on multi-view information fusion
MicroRNAs (miRNAs) play an important role in various biological processes, and their abnormal expression can lead to disease. Exploring the potential relationships between miRNAs and diseases can contribute to the diagnosis and treatment of complex diseases. The growing number of databases storing miRNA and disease information provides opportunities to develop computational methods for discovering unobserved disease-related miRNAs, but effectively learning and fusing information from multi-source data remains a challenge. In this study, we propose a multi-view information fusion based method for miRNA-disease association (MDA) prediction, named MVIFMDA. Firstly, multiple heterogeneous networks are constructed by combining the known MDAs with different similarities of miRNAs and diseases based on multi-source information. Secondly, the topology features of miRNAs and diseases are obtained by applying a graph convolutional network to each heterogeneous network view. Moreover, we design an attention strategy at the topology representation level to adaptively fuse representations that carry different structural information. Meanwhile, we learn the attribute representations of miRNAs and diseases from their similarity attribute views with convolutional neural networks. Finally, the complicated associations between miRNAs and diseases are reconstructed by applying a bilinear decoder to the combined features, which combine topology and attribute representations. Experimental results on the public dataset demonstrate that our proposed model consistently outperforms baseline methods. The case studies further show the ability of the MVIFMDA model to infer underlying associations between miRNAs and diseases.
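The final decoding step can be illustrated with a minimal sketch of a bilinear decoder. The abstract does not say how the topology and attribute representations are combined or which output activation is used, so the concatenation, the sigmoid, and all names and numbers below are assumptions for illustration only.

```python
import math

def bilinear_score(m, W, d):
    # Score = sigmoid(m^T W d) for a miRNA embedding m (length p),
    # a weight matrix W (p x q) and a disease embedding d (length q).
    mW = [sum(m[a] * W[a][b] for a in range(len(m))) for b in range(len(d))]
    z = sum(mW[b] * d[b] for b in range(len(d)))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical one-dimensional topology and attribute features,
# combined by concatenation (an assumption, not the paper's stated method).
mirna = [1.0] + [2.0]        # topology part + attribute part
disease = [0.5] + [-0.25]    # topology part + attribute part
W = [[1.0, 0.0],
     [0.0, 1.0]]             # identity weights, for illustration only

score = bilinear_score(mirna, W, disease)
print(score)
```

In a trained model W would be learned, and the score would be read as the predicted strength of the miRNA-disease association.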
Identify the radiotherapy-induced abnormal changes in the patients with nasopharyngeal carcinoma
Radiotherapy (RT) is the standard treatment for nasopharyngeal carcinoma, and it often causes unavoidable brain injury during treatment. The majority of patients show no abnormal signal or density changes on conventional magnetic resonance imaging (MRI) and computed tomography (CT) examinations during long-term follow-up after radiation therapy. However, by the time changes are visible on CT and conventional MRI, the damage is often severe and lacks effective treatment, seriously affecting patient prognosis. Therefore, the present study aimed to investigate the abnormal changes in nasopharyngeal carcinoma (NPC) patients after RT. We used a machine learning framework with two parts, feature extraction and classification, to automatically detect the brain injury. Our results showed that the method could effectively identify the abnormal regions induced by radiotherapy. The highest classification accuracy was 82.5% in the abnormal brain regions. The parahippocampal gyrus was the region with the highest accuracy, which suggests that the parahippocampal gyrus may be the most sensitive to radiotherapy and involved in the pathogenesis of radiotherapy-induced brain injury in NPC patients.
Refined Edge Usage of Graph Neural Networks for Edge Prediction
Graph Neural Networks (GNNs), originally proposed for node classification,
have also motivated many recent works on edge prediction (a.k.a., link
prediction). However, existing methods lack elaborate design regarding the
distinctions between two tasks that have been frequently overlooked: (i) edges
only constitute the topology in the node classification task but can be used as
both the topology and the supervisions (i.e., labels) in the edge prediction
task; (ii) the node classification makes prediction over each individual node,
while the edge prediction is determined by each pair of nodes. To this end,
we propose a novel edge prediction paradigm named Edge-aware Message PassIng
neuRal nEtworks (EMPIRE). Concretely, we first introduce an edge splitting
technique to specify use of each edge where each edge is solely used as either
the topology or the supervision (named as topology edge or supervision edge).
We then develop a new message passing mechanism that generates the messages to
source nodes (through topology edges) being aware of target nodes (through
supervision edges). In order to emphasize the differences between pairs
connected by supervision edges and pairs unconnected, we further weight the
messages to highlight the relative ones that can reflect the differences. In
addition, we design a novel negative node-pair sampling trick that efficiently
samples 'hard' negative instances in the supervision instances, and can
significantly improve the performance. Experimental results verify that the
proposed method can significantly outperform existing state-of-the-art models
regarding the edge prediction task on multiple homogeneous and heterogeneous
graph datasets.
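The edge splitting idea, where each edge serves as either topology or supervision but never both, can be sketched as follows. The random assignment, the `supervision_ratio` parameter and the function name are assumptions for illustration; the abstract does not describe how EMPIRE chooses the split.

```python
import random

def split_edges(edges, supervision_ratio=0.3, seed=0):
    # Assign every edge to exactly one role:
    # topology (used for message passing) or supervision (used as a label).
    rng = random.Random(seed)
    shuffled = sorted(set(edges))   # deduplicate and fix the order before shuffling
    rng.shuffle(shuffled)
    n_sup = int(round(len(shuffled) * supervision_ratio))
    supervision = set(shuffled[:n_sup])
    topology = set(shuffled[n_sup:])
    return topology, supervision

# Hypothetical toy graph with 10 undirected edges given as (u, v) pairs.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4),
         (3, 4), (3, 5), (4, 5), (5, 6), (6, 7)]
topology, supervision = split_edges(edges)
print(len(topology), len(supervision))
```

Keeping the two sets disjoint means a supervision edge never leaks into the neighbourhood that is aggregated when that same edge is being predicted.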
New highly-anisotropic Rh-based Heusler compound for magnetic recording
The development of high-density magnetic recording media is limited by the
superparamagnetism in very small ferromagnetic crystals. Hard magnetic
materials with strong perpendicular anisotropy offer stability and high
recording density. To overcome the difficulty of writing media with a large
coercivity, heat assisted magnetic recording (HAMR) has been developed, rapidly
heating the media to the Curie temperature Tc before writing, followed by rapid
cooling. Requirements are a suitable Tc, coupled with anisotropic thermal
conductivity and hard magnetic properties. Here we introduce Rh2CoSb as a new
hard magnet with potential for thin film magnetic recording. A
magnetocrystalline anisotropy of 3.6 MJ m⁻³ is combined with a saturation
magnetization of μ0Ms = 0.52 T at 2 K (2.2 MJ m⁻³ and 0.44 T at
room-temperature). The magnetic hardness parameter of 3.7 at room temperature
is the highest observed for any rare-earth free hard magnet. The anisotropy is
related to an unquenched orbital moment of 0.42 μB on Co, which is
hybridized with neighbouring Rh atoms with a large spin-orbit interaction.
Moreover, the pronounced temperature-dependence of the anisotropy that follows
from its Tc of 450 K, together with a high thermal conductivity of 20 W m⁻¹ K⁻¹,
makes Rh2CoSb a candidate for development for heat assisted writing with a
recording density in excess of 10 Tb/in².
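The quoted hardness parameter can be cross-checked against the other numbers in the abstract. The sketch below assumes the usual definition κ = sqrt(K / (μ0 Ms²)), which the abstract does not state explicitly; the function name is hypothetical.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in T m / A

def hardness_parameter(K, mu0_Ms):
    # kappa = sqrt(K / (mu0 * Ms^2)), where mu0 * Ms^2 = (mu0 * Ms)^2 / mu0.
    # K in J/m^3, mu0_Ms in tesla.
    return math.sqrt(K * MU0 / mu0_Ms ** 2)

kappa_rt = hardness_parameter(2.2e6, 0.44)  # room-temperature values from the abstract
kappa_2k = hardness_parameter(3.6e6, 0.52)  # values at 2 K from the abstract
print(round(kappa_rt, 2), round(kappa_2k, 2))
```

With the rounded room-temperature inputs this gives a value of about 3.8, close to the 3.7 quoted above; the small difference presumably comes from rounding of the published inputs.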
MEI Encoding of the Earliest Notation in Staffless Neumes
The Optical Neume Recognition Project (ONRP) aims at the digital encoding of musical notation signs from around the year 1000, an ambitious undertaking that led the project members to evaluate a wide range of methodological approaches. The optical music recognition software is meant to capture a staffless notation from one of the oldest surviving sources with notation signs, the Hartker Antiphonary from the Benedictine Abbey of St. Gallen (Switzerland), which is preserved today in two volumes in the Stiftsbibliothek in St. Gallen. Because of its handwritten, staffless notation, this Gregorian chant poses many challenges to the researcher. The work comprises more than 300 different neume signs and their notation, which are to be captured and described with the help of the Music Encoding Initiative (MEI). The following article describes the adaptation process for applying MEI to the notation of neumes without staff lines. It describes properties of neume notation to make clear where the challenges of this work lie, as well as how the classifier works, a kind of digital neume dictionary.