54 research outputs found
Deep Image Compression Using Scene Text Quality Assessment
Image compression is a fundamental technology for Internet communication
engineering. However, a high compression rate with general methods may degrade
images, resulting in unreadable text. In this paper, we propose an image
compression method for maintaining text quality. We developed a scene text
image quality assessment model to assess text quality in compressed images.
Guided by this model, the method iteratively searches for the most strongly
compressed image that still preserves high-quality text. Objective and
subjective results showed that the proposed method was superior to existing
methods. Furthermore, the proposed assessment model outperformed other
deep-learning regression models.
Comment: Accepted by Pattern Recognition, 202
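The iterative search described above can be sketched as follows. This is a hedged illustration, not the paper's algorithm: it assumes the assessment score increases monotonically with the codec quality setting, so the most strongly compressed acceptable image can be found by binary search. `compress` and `assess` are hypothetical stand-ins for the codec and the text-quality model.

```python
def search_best_compression(image, compress, assess, threshold,
                            q_min=1, q_max=100):
    """Find the lowest quality setting whose compressed image still scores
    at least `threshold` on the text-quality model (assumed monotone)."""
    best = q_max
    lo, hi = q_min, q_max
    while lo <= hi:
        mid = (lo + hi) // 2
        if assess(compress(image, mid)) >= threshold:
            best = mid      # text still readable: try compressing harder
            hi = mid - 1
        else:
            lo = mid + 1    # text degraded: back off
    return best, compress(image, best)
```

With mock functions where the score equals `quality / 100`, a threshold of 0.4 selects quality 40, the lowest setting that still passes.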
Automatic Discrimination between Scomber japonicus and Scomber australasicus by Geometric and Texture Features
This paper proposes a method for automatic discrimination of two mackerel species: Scomber japonicus (chub mackerel) and Scomber australasicus (blue mackerel). Because S. japonicus has a much higher market price than S. australasicus, the two species must be properly sorted before shipment, but their similar appearance makes discrimination difficult. These species can be effectively distinguished using the ratio of the base length between the dorsal fin’s first and ninth spines to the fork length. However, manual measurement of this ratio is time-consuming and reduces fish freshness. The proposed technique instead uses image processing to measure these lengths. We were able to successfully discriminate between the two species using the ratio as a geometric feature, in combination with several texture features. We then quantitatively verified the effectiveness of the proposed method and demonstrated that it is highly accurate in classifying mackerel.
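The key geometric feature above (dorsal-fin base length over fork length) can be sketched from landmark coordinates as below. The cutoff value and the direction of the comparison are illustrative assumptions, not values from the paper.

```python
import math

def dorsal_fork_ratio(spine1, spine9, snout, fork):
    """Ratio of the dorsal-fin base length (1st to 9th spine) to fork length.
    Inputs are (x, y) landmark coordinates measured from the fish image."""
    return math.dist(spine1, spine9) / math.dist(snout, fork)

def classify(ratio, cutoff=0.18):
    """Threshold classifier; 0.18 and the inequality direction are
    illustrative assumptions, not the paper's fitted values."""
    return "S. japonicus" if ratio < cutoff else "S. australasicus"
```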
Infrared Image Super-Resolution: Systematic Review, and Future Trends
Image Super-Resolution (SR) is essential for a wide range of computer vision
and image processing tasks. Infrared (IR, or thermal) image super-resolution
remains an active concern in the development of deep learning. This survey
aims to provide a comprehensive perspective of IR image
super-resolution, including its applications, hardware imaging system dilemmas,
and taxonomy of image processing methodologies. In addition, the datasets and
evaluation metrics in IR image super-resolution tasks are also discussed.
Furthermore, the deficiencies in current technologies and possible promising
directions for the community to explore are highlighted. To cope with the rapid
development in this field, we intend to regularly update a list of relevant
work at \url{https://github.com/yongsongH/Infrared_Image_SR_Survey}.
Comment: Submitted to IEEE TNNL
Activity Recognition Using Gazed Text and Viewpoint Information for User Support Systems
The development of information technology has added many conveniences to our lives. On the other hand, however, we have to deal with various kinds of information, which can be a difficult task for elderly people or those who are not familiar with information devices. A technology to recognize each person’s activity and provide appropriate support based on that activity could be useful for such people. In this paper, we propose a novel fine-grained activity recognition method for user support systems that focuses on identifying the text at which a user is gazing, based on the idea that the content of the text is related to the activity of the user. It is necessary to keep in mind that the meaning of the text depends on its location. To tackle this problem, we propose the simultaneous use of a wearable device and a fixed camera. To obtain the global location of the text, we perform image matching using the local features of the images obtained by these two devices. Then, we generate a feature vector based on this information and the content of the text. To show the effectiveness of the proposed approach, we performed activity recognition experiments with six subjects in a laboratory environment.
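One plausible encoding of the fused feature vector described above is to concatenate a text-content embedding with the text's global location recovered from image matching. The layout and the location weight are hypothetical; the abstract does not specify the exact vector construction.

```python
def activity_feature(text_embedding, global_xy, w_loc=1.0):
    """Concatenate gazed-text content features with the text's global
    location (obtained from wearable/fixed-camera image matching).
    `w_loc` is an assumed scaling factor for the location terms."""
    x, y = global_xy
    return list(text_embedding) + [w_loc * x, w_loc * y]
```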
Fidelity-Controllable Extreme Image Compression with Generative Adversarial Networks
We propose a GAN-based image compression method working at extremely low
bitrates below 0.1 bpp. Most existing learned image compression methods suffer
from blur at extremely low bitrates. Although GANs can help reconstruct sharp
images, they have two drawbacks: GAN training is unstable, and the
reconstructions often contain unpleasant noise or artifacts. To address both
drawbacks, our method adopts two-stage training and network interpolation.
The two-stage training stabilizes optimization.
Moreover, the network interpolation utilizes the models in both stages and
reduces undesirable noise and artifacts, while maintaining important edges.
Hence, we can control the trade-off between perceptual quality and fidelity
without re-training models. The experimental results show that our model can
reconstruct high quality images. Furthermore, our user study confirms that our
reconstructions are preferred over those of a state-of-the-art GAN-based image
compression model. The code will be available.
Comment: 8 pages, 11 figures
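Network interpolation, as mentioned above, is in general a per-parameter blend of two trained models (here, the stage-1 and stage-2 networks). The sketch below assumes a simple dict-of-floats weight representation rather than the paper's actual framework.

```python
def interpolate_networks(theta_stage1, theta_stage2, alpha):
    """Blend corresponding parameters of two trained models.
    alpha=0 keeps the stage-1 (fidelity-oriented) model; alpha=1 keeps
    the stage-2 (perceptual/GAN) model; intermediate values trade the
    two off without any retraining."""
    return {name: (1.0 - alpha) * theta_stage1[name] + alpha * theta_stage2[name]
            for name in theta_stage1}
```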
Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence
Visual-semantic embedding aims to learn a joint embedding space where related
video and sentence instances are located close to each other. Most existing
methods put instances in a single embedding space. However, they struggle to
embed instances due to the difficulty of matching visual dynamics in videos to
textual features in sentences. A single space is not enough to accommodate
various videos and sentences. In this paper, we propose a novel framework that
maps instances into multiple individual embedding spaces so that we can capture
multiple relationships between instances, leading to compelling video
retrieval. We propose to produce a final similarity between instances by fusing
similarities measured in each embedding space using a weighted sum strategy. We
determine the weights according to a sentence. Therefore, we can flexibly
emphasize an embedding space. We conducted sentence-to-video retrieval
experiments on a benchmark dataset. The proposed method achieved strong
performance, competitive with state-of-the-art methods. These experimental
results demonstrate the effectiveness of the proposed multiple embedding
approach compared to existing methods.
Comment: 8 pages, 5 figures
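The weighted-sum fusion described above can be sketched as below. Producing the weights as a softmax over sentence-conditioned logits is an assumption, since the abstract does not specify how the sentence determines the weights.

```python
import math

def fused_similarity(per_space_sims, sentence_logits):
    """Fuse similarities from multiple embedding spaces with a weighted sum.
    Weights are a softmax over sentence-derived logits (assumed form), so
    a sentence can flexibly emphasize particular embedding spaces."""
    exps = [math.exp(l) for l in sentence_logits]
    z = sum(exps)
    return sum(s * (e / z) for s, e in zip(per_space_sims, exps))
```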
Target-oriented Domain Adaptation for Infrared Image Super-Resolution
Recent efforts have explored leveraging visible light images to enrich
texture details in infrared (IR) super-resolution. However, this direct
adaptation approach often becomes a double-edged sword, as it improves texture
at the cost of introducing noise and blurring artifacts. To address these
challenges, we propose the Target-oriented Domain Adaptation SRGAN (DASRGAN),
an innovative framework specifically engineered for robust IR super-resolution
model adaptation. DASRGAN operates on the synergy of two key components: 1)
Texture-Oriented Adaptation (TOA) to refine texture details meticulously, and
2) Noise-Oriented Adaptation (NOA), dedicated to minimizing noise transfer.
Specifically, TOA uniquely integrates a specialized discriminator,
incorporating a prior extraction branch, and employs a Sobel-guided adversarial
loss to align texture distributions effectively. Concurrently, NOA utilizes a
noise adversarial loss to distinctly separate the generative and Gaussian noise
pattern distributions during adversarial training. Our extensive experiments
confirm DASRGAN's superiority. Comparative analyses against leading methods
across multiple benchmarks and upsampling factors reveal that DASRGAN sets new
state-of-the-art performance standards. Code is available at
\url{https://github.com/yongsongH/DASRGAN}.
Comment: 11 pages, 9 figures
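A Sobel-guided loss, in the general spirit of the TOA component above, compares edge responses of two images. This pure-Python sketch over lists of lists is illustrative only and is not DASRGAN's actual loss.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_response(img, kernel):
    """3x3 convolution on a 2D list; border pixels are left at zero."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(kernel[j][i] * img[y + j - 1][x + i - 1]
                            for j in range(3) for i in range(3))
    return out

def sobel_l1(a, b):
    """L1 distance between the Sobel edge maps of two grayscale images,
    penalizing texture/edge mismatch between a reconstruction and a target."""
    loss = 0.0
    for k in (SOBEL_X, SOBEL_Y):
        ga, gb = sobel_response(a, k), sobel_response(b, k)
        loss += sum(abs(ga[y][x] - gb[y][x])
                    for y in range(len(a)) for x in range(len(a[0])))
    return loss
```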
- …