Efficient Querying from Weighted Binary Codes
Binary codes are widely used to represent data because of their small storage
footprint and efficient computation. However, they suffer from an ambiguity
problem: many binary codes share the same Hamming distance to a query. To
alleviate this problem, weighted binary codes assign a different weight to each
bit and compare codes by the weighted Hamming distance. Until now, querying
weighted binary codes efficiently has remained an open issue. In this paper, we
propose a new method that ranks weighted binary codes and efficiently returns
the weighted binary codes nearest to the query. Building on multi-index hash
tables, our method introduces two algorithms, a table bucket finding algorithm
and a table merging algorithm, which select the nearest weighted binary codes
of the query in a non-exhaustive and accurate way. The proposed algorithms are
justified by proofs of their theoretical properties. Experiments on three
large-scale datasets validate both the search efficiency and the search
accuracy of our method. In particular, with up to one billion weighted binary
codes, our method is more than 1000 times faster than a linear scan.
Comment: 13 pages, accepted by AAAI202
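For intuition only, here is a minimal NumPy sketch of how a weighted Hamming distance breaks the ties that the plain Hamming distance leaves among binary codes; the random codes, the random per-bit weights, and the brute-force scan are illustrative assumptions, not the multi-index hashing algorithms proposed in the paper.

```python
import numpy as np

def hamming_distance(query, codes):
    # Number of differing bits between the query and every database code.
    return np.count_nonzero(query != codes, axis=1)

def weighted_hamming_distance(query, codes, weights):
    # Sum of the per-bit weights over the bits where the codes differ.
    return ((query != codes) * weights).sum(axis=1)

rng = np.random.default_rng(0)
n_codes, n_bits = 10_000, 32
codes = rng.integers(0, 2, size=(n_codes, n_bits), dtype=np.uint8)
query = rng.integers(0, 2, size=n_bits, dtype=np.uint8)
weights = rng.random(n_bits)  # hypothetical per-bit weights (e.g. learned bit reliabilities)

plain = hamming_distance(query, codes)
weighted = weighted_hamming_distance(query, codes, weights)

# Many codes tie at the minimum plain Hamming distance ...
ties = np.flatnonzero(plain == plain.min())
# ... while the weighted distance yields an unambiguous nearest code among them.
best = ties[np.argmin(weighted[ties])]
print(f"{ties.size} codes tie at Hamming distance {plain.min()}; "
      f"weighted Hamming distance selects code {best}")
```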
How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation
Previous studies have investigated sarcasm generation as a text-to-text
generation problem, i.e., generating a sarcastic sentence for an input
sentence. In this paper, we study a new problem of cross-modal sarcasm
generation (CMSG), i.e., generating a sarcastic description for a given image.
CMSG is challenging because models must capture the characteristics of sarcasm
as well as the correlation between the two modalities. In addition, there
should be some inconsistency between the two modalities, which requires
imagination. Moreover, high-quality training data is scarce. To address these
problems, we take a step toward generating sarcastic descriptions from images
without paired training data and propose an Extraction-Generation-Ranking
based Modular method (EGRM) for cross-modal sarcasm generation. Specifically,
EGRM first extracts diverse information from an image at different levels and
uses the obtained image tags, sentimental descriptive caption, and
commonsense-based consequence to generate candidate sarcastic texts. Then, a
comprehensive ranking algorithm that considers the image-text relation,
sarcasticness, and grammaticality is proposed to select the final text from
the candidates. Human evaluation on five criteria over a total of 1200
generated image-text pairs from eight systems, together with auxiliary
automatic evaluation, shows the superiority of our method.
A Segmentation Foundation Model for Diverse-type Tumors
Large pre-trained models, with their numerous parameters and extensive
training datasets, have shown excellent performance in various tasks. However,
many publicly available medical image datasets contain too little data, so
there are few large-scale models in medical imaging. We propose a large-scale
Tumor Segmentation Foundation Model (TSFM) with 1.6 billion parameters, built
on a Resblock backbone with a Transformer bottleneck, which has good transfer
ability to downstream tasks. To make TSFM perform well in tumor segmentation,
we make full use of the strong spatial correlation between tumors and organs
in medical images and fuse 7 tumor datasets and 3 multi-organ datasets into a
3D medical dataset pool of 2779 cases with a total of 300k medical images,
which currently exceeds many other single publicly available datasets in size.
TSFM is a pre-trained model for medical image segmentation that can also be
transferred to multiple downstream tasks for fine-tuning. Across various tumor
types, the average performance of our pre-trained model is 2% higher than that
of nnU-Net. In the transfer learning task, TSFM needs only 5% of nnU-Net's
training epochs to achieve similar performance and surpasses nnU-Net by 2% on
average with 10% of the training epochs. The pre-trained TSFM and its code
will be released soon.
Comment: 10 pages, 2 figures. About medical image segmentation and foundation models
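To make the Resblock-backbone / Transformer-bottleneck description concrete, here is a toy PyTorch sketch of a residual 3D encoder feeding a Transformer bottleneck and a small decoder; the `TinySegNet` class, the layer sizes, and the voxel-token layout are illustrative assumptions, orders of magnitude smaller than the 1.6-billion-parameter TSFM.

```python
import torch
import torch.nn as nn

class ResBlock3D(nn.Module):
    """A plain 3D residual block, standing in for the Resblock backbone units."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm3d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))

class TinySegNet(nn.Module):
    """Toy encoder -> Transformer bottleneck -> decoder; not the real TSFM."""
    def __init__(self, in_channels=1, base=16, num_classes=2):
        super().__init__()
        self.stem = nn.Conv3d(in_channels, base, kernel_size=3, padding=1)
        self.encoder = nn.Sequential(
            ResBlock3D(base),
            nn.Conv3d(base, base * 2, kernel_size=2, stride=2),  # downsample by 2
        )
        layer = nn.TransformerEncoderLayer(d_model=base * 2, nhead=4, batch_first=True)
        self.bottleneck = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(base * 2, base, kernel_size=2, stride=2),  # upsample by 2
            ResBlock3D(base),
            nn.Conv3d(base, num_classes, kernel_size=1),
        )

    def forward(self, x):
        f = self.encoder(self.stem(x))         # (B, C, D, H, W) feature volume
        b, c, d, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)  # (B, D*H*W, C) voxel tokens
        tokens = self.bottleneck(tokens)
        f = tokens.transpose(1, 2).reshape(b, c, d, h, w)
        return self.decoder(f)                 # per-voxel class logits

model = TinySegNet()
logits = model(torch.randn(1, 1, 16, 32, 32))  # one single-channel 3D volume
print(logits.shape)                            # torch.Size([1, 2, 16, 32, 32])
```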
MPCPA: Multi-Center Privacy Computing with Predictions Aggregation based on Denoising Diffusion Probabilistic Model
Privacy-preserving computing is crucial for multi-center machine learning in
many applications such as healthcare and finance. In this paper, we propose a
Multi-center Privacy Computing framework with Predictions Aggregation (MPCPA)
based on the denoising diffusion probabilistic model (DDPM), which comprises
conditional diffusion model training, DDPM data generation, a classifier, and
a prediction aggregation strategy. Compared to federated learning, this
framework requires less communication and leverages high-quality generated
data to support robust privacy computing. Experimental validation across
multiple datasets demonstrates that the proposed framework outperforms classic
federated learning and approaches the performance of centralized learning with
the original data. Moreover, our approach demonstrates robust security,
effectively addressing challenges such as image memorization and membership
inference attacks. Our experiments underscore the efficacy of the proposed
framework for privacy computing; the code will be released soon.
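As a minimal illustration of only the prediction-aggregation step (conditional DDPM training and data generation are not sketched), the snippet below soft-votes the class probabilities produced by each center's classifier; the `aggregate_predictions` helper and the simple averaging rule are assumptions and need not match the paper's aggregation strategy.

```python
import numpy as np

def aggregate_predictions(center_probs):
    # Soft voting: average the class-probability matrices produced by the
    # centers' classifiers for the same test samples, then take the argmax.
    stacked = np.stack(center_probs, axis=0)  # (n_centers, n_samples, n_classes)
    mean_probs = stacked.mean(axis=0)         # (n_samples, n_classes)
    return mean_probs.argmax(axis=1)          # final predicted class per sample

# Toy usage: three centers, four test samples, two classes.
rng = np.random.default_rng(42)
center_probs = [rng.dirichlet(np.ones(2), size=4) for _ in range(3)]
print(aggregate_predictions(center_probs))   # aggregated class labels
```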