Resistance Born of Empathy: Resistance Activities under Information Censorship in China
Tohoku University, doctoral thesis (Doctor of Literature); summary only
Only Positive Cases: 5-fold High-order Attention Interaction Model for Skin Segmentation Derived Classification
Computer-aided diagnosis of skin diseases is an important tool, but its
interpretability is currently poor: dermatologists and patients cannot
intuitively follow the learning and prediction process of neural networks,
which undermines the credibility of computer-aided diagnosis. In addition,
traditional methods must be trained on negative samples to predict the
presence or absence of a lesion, yet medical data is often in short supply.
In this paper, we propose a multiple
high-order attention interaction model (MHA-UNet) for use in a highly
explainable skin lesion segmentation task. MHA-UNet is able to obtain the
presence or absence of a lesion by explainable reasoning without the need for
training on negative samples. Specifically, we propose a high-order attention
interaction mechanism that introduces squeeze attention to a higher level for
feature attention. In addition, a multiple high-order attention interaction
module (MHAblock) is proposed that combines features of different orders. To
classify the presence or absence of lesions, we conducted classification
experiments on several publicly available datasets without negative samples,
based on explainable reasoning about the interaction of the five attention
orders of MHAblock. The highest positive detection rate obtained in these
experiments was 81.0% and the highest negative detection rate was 83.5%. For
segmentation, comparison experiments of the proposed method with 13 medical
segmentation models, and external validation experiments with 8
state-of-the-art models on three public datasets and our clinical dataset,
demonstrate the state-of-the-art performance of our model. The code is
available at https://github.com/wurenkai/MHA-UNet.
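As an illustration of the squeeze-attention idea the abstract builds on, the following NumPy sketch gates a feature map with a globally pooled channel descriptor and multiplies the outputs of several stacked orders together. The function names, the pooling choice, and the elementwise fusion are our own assumptions for illustration, not MHA-UNet's actual implementation.

```python
import numpy as np

def squeeze_attention(x):
    """Channel-wise squeeze attention: pool the spatial dims to a global
    descriptor, then gate the feature map with a sigmoid of that descriptor.
    x: (C, H, W) feature map."""
    squeezed = x.mean(axis=(1, 2), keepdims=True)   # (C, 1, 1) global context
    gate = 1.0 / (1.0 + np.exp(-squeezed))          # sigmoid gating weights
    return x * gate                                 # re-weighted features

def high_order_interaction(x, order=5):
    """Apply squeeze attention `order` times and multiply the per-order
    outputs elementwise, so higher orders interact with lower ones."""
    outputs = []
    current = x
    for _ in range(order):
        current = squeeze_attention(current)
        outputs.append(current)
    fused = outputs[0]
    for o in outputs[1:]:
        fused = fused * o   # elementwise interaction across orders
    return outputs, fused

feat = np.random.default_rng(0).normal(size=(8, 16, 16))
per_order, fused = high_order_interaction(feat, order=5)
```

With five orders, the model can reason over the per-order attention maps (here `per_order`) to explain a presence/absence decision without ever seeing negative training samples.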
Real-Time Marker Localization Learning for GelStereo Tactile Sensing
Visuotactile sensing technology is becoming more popular in tactile sensing,
but the effectiveness of existing marker localization methods remains
underexplored. Instead of contour-based blob detection, this
paper presents a learning-based marker localization network for GelStereo
visuotactile sensing called Marknet. Specifically, the Marknet presents a grid
regression architecture to incorporate the distribution of the GelStereo
markers. Furthermore, a marker rationality evaluator (MRE) is modelled to
screen suitable prediction results. The experimental results show that the
Marknet combined with MRE achieves 93.90% precision for irregular markers in
contact areas, which outperforms the traditional contour-based blob detection
method by a large margin of 42.32%. Meanwhile, through GPU acceleration, the
proposed learning-based marker localization method achieves better real-time
performance than the blob detection interface provided by the OpenCV library,
which we believe will yield considerable gains in perceptual sensitivity
across various robotic manipulation tasks.
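The grid-regression formulation can be sketched as follows: each grid cell predicts a confidence and a sub-cell offset, and a toy stand-in for the marker rationality evaluator (MRE) suppresses near-duplicate detections. The cell layout, thresholds, and distance-based filter are illustrative assumptions, not Marknet's actual design.

```python
import numpy as np

def decode_grid_predictions(pred, cell_size, conf_thresh=0.5):
    """pred: (S, S, 3) array of (confidence, dx, dy) per grid cell, with
    dx, dy in [0, 1] relative to the cell. Returns pixel coordinates of
    markers whose confidence clears the threshold."""
    S = pred.shape[0]
    markers = []
    for i in range(S):
        for j in range(S):
            conf, dx, dy = pred[i, j]
            if conf >= conf_thresh:
                x = (j + dx) * cell_size
                y = (i + dy) * cell_size
                markers.append((x, y, conf))
    return markers

def rationality_filter(markers, min_dist):
    """Toy stand-in for the MRE: keep markers in confidence order and drop
    any marker closer than min_dist to one already kept (a duplicate
    detection of the same physical marker)."""
    kept = []
    for m in sorted(markers, key=lambda m: -m[2]):
        if all((m[0] - k[0]) ** 2 + (m[1] - k[1]) ** 2 >= min_dist ** 2
               for k in kept):
            kept.append(m)
    return kept

pred = np.zeros((4, 4, 3))
pred[1, 2] = (0.9, 0.50, 0.5)   # confident marker near (25, 15)
pred[1, 1] = (0.6, 0.95, 0.5)   # duplicate detection near (19.5, 15)
pred[3, 0] = (0.8, 0.20, 0.2)   # separate marker near (2, 32)
decoded = decode_grid_predictions(pred, cell_size=10)
filtered = rationality_filter(decoded, min_dist=8)
```

Anchoring regression targets to grid cells incorporates the roughly regular spacing of GelStereo markers as a prior, which is what makes the approach robust for irregular markers in contact areas.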
MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications
Large Language Models (LLMs) have demonstrated remarkable performance across
various natural language tasks, marking significant strides towards general
artificial intelligence. While progress toward general artificial
intelligence is driven by increasingly large-scale models, an alternative
path is to develop lightweight custom models that better serve specific
domains, given the high cost of training and deploying LLMs and the scarcity
of resources. In this paper, we present MindLLM, a novel series of bilingual
lightweight large language models, trained from scratch, alleviating such
burdens by offering models with 1.3 billion and 3 billion parameters. A
thorough account of experiences accrued during large model development is
given, covering every step of the process, including data construction, model
architecture, evaluation, and applications. Such insights are hopefully
valuable for fellow academics and developers. MindLLM consistently matches or
surpasses the performance of other open-source larger models on some public
benchmarks. We also introduce an innovative instruction tuning framework
tailored for smaller models to enhance their capabilities efficiently.
Moreover, we explore the application of MindLLM in specific vertical domains
such as law and finance, underscoring the agility and adaptability of our
lightweight models. Comment: Work in progress.
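Instruction tuning for smaller models typically formats each sample as a prompt/response pair and masks the prompt tokens out of the loss; the sketch below shows that generic recipe with a toy tokenizer. The template, the `-100` ignore index, and the masking scheme are common-practice assumptions, not MindLLM's actual framework.

```python
def build_supervised_example(instruction, response, tokenize, ignore_index=-100):
    """Format one instruction-tuning sample. The loss is computed only on
    the response tokens; prompt positions get ignore_index so they are
    masked out of the training objective."""
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    prompt_ids = tokenize(prompt)
    response_ids = tokenize(response)
    input_ids = prompt_ids + response_ids
    labels = [ignore_index] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

# toy whitespace "tokenizer" purely for illustration
toy_tok = lambda s: [hash(w) % 1000 for w in s.split()]
ids, labels = build_supervised_example("Define PIR.", "A privacy primitive.", toy_tok)
```

Masking the prompt keeps a small model's limited capacity focused on producing responses rather than on reproducing the instruction template.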
Crust: Verifiable And Efficient Private Information Retrieval with Sublinear Online Time
Private Information Retrieval (PIR) is a cryptographic primitive that enables a user to retrieve information from a database without revealing which item they are seeking, thus preserving their privacy. However, PIR schemes suffer from high computation overhead. By running an offline preprocessing phase, PIR schemes can achieve sublinear online server computation. On the other hand, although protocols for honest-but-curious servers have been well studied in both single-server and multi-server scenarios, little work addresses the case where the server is malicious. In this paper, we propose a simple but efficient sublinear PIR scheme named Crust. The scheme is tailored for verifiability and provides privacy and data integrity against malicious servers. Our scheme can work with two servers or a single server. Aside from verifiability, our scheme is very efficient: compared to state-of-the-art two-server and single-server sublinear PIR schemes, it is 22x more efficient in online computation. To the best of our knowledge, this is the first PIR scheme that achieves verifiability as well as amortized sublinear server computation.
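The two-server setting Crust builds on can be illustrated with the classic information-theoretic two-server XOR PIR (a textbook protocol, not Crust itself, which adds preprocessing and verifiability): the client sends a random index set to one server and the same set with the target index toggled to the other; neither set alone reveals the query, and XORing the two answers recovers the bit.

```python
import secrets

def xor_bits(db, subset):
    """Server side: XOR of the database bits at the queried indices."""
    acc = 0
    for idx in subset:
        acc ^= db[idx]
    return acc

def pir_query(db, i):
    """Client side of classic two-server XOR PIR. Each server sees a
    uniformly random index set, so neither learns i; the two answers
    differ exactly in the contribution of db[i]."""
    n = len(db)
    s1 = {j for j in range(n) if secrets.randbits(1)}
    s2 = s1 ^ {i}               # symmetric difference toggles index i
    a1 = xor_bits(db, s1)       # answer from server 1
    a2 = xor_bits(db, s2)       # answer from server 2
    return a1 ^ a2              # equals db[i]

db = [1, 0, 1, 1, 0, 0, 1, 0]
results = [pir_query(db, i) for i in range(len(db))]
```

Here each query costs the servers a linear scan; schemes like Crust use the offline preprocessing phase precisely to push that online cost below linear, and add verification so a malicious server's wrong answer is detected rather than silently accepted.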
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
Large Language Models (LLMs) have shown immense potential in multimodal
applications, yet the convergence of textual and musical domains remains
relatively unexplored. To address this gap, we present MusiLingo, a novel
system for music caption generation and music-related query responses.
MusiLingo employs a single projection layer to align music representations from
the pre-trained frozen music audio model MERT with the frozen LLaMA language
model, bridging the gap between music audio and textual contexts. We train it
on an extensive music caption dataset and fine-tune it with instructional data.
Due to the scarcity of high-quality music Q&A datasets, we created the
MusicInstruct (MI) dataset from MusicCaps, tailored for open-ended music
inquiries. Empirical evaluations demonstrate its competitive performance in
generating music captions and composing music-related Q&A pairs. Our introduced
dataset enables notable advancements beyond previous ones.
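The single-projection-layer alignment can be sketched in NumPy as one learned matrix mapping frozen audio-model features into the LLM's embedding space, so projected music frames can be prepended to text token embeddings. The dimensions below (768 for MERT-style features, 4096 for LLaMA-style embeddings) are assumptions for illustration, not MusiLingo's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
AUDIO_DIM, LLM_DIM = 768, 4096   # assumed dims for MERT / LLaMA embeddings

# the single projection layer: one weight matrix plus bias (the only
# trainable piece when both the audio model and the LLM are frozen)
W = rng.normal(scale=0.02, size=(AUDIO_DIM, LLM_DIM))
b = np.zeros(LLM_DIM)

def project_audio(audio_feats):
    """Map frozen audio features (T, AUDIO_DIM) into the LLM's embedding
    space so they can be prepended to text token embeddings."""
    return audio_feats @ W + b

audio = rng.normal(size=(10, AUDIO_DIM))   # 10 frames of music features
text_emb = rng.normal(size=(5, LLM_DIM))   # 5 text token embeddings
fused = np.concatenate([project_audio(audio), text_emb], axis=0)
```

Training only this projection (plus light instruction fine-tuning) is what keeps the alignment cheap while still bridging the music and text modalities.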