54 research outputs found
Deep Image Compression Using Scene Text Quality Assessment
Image compression is a fundamental technology for Internet communication
engineering. However, a high compression rate with general methods may degrade
images, resulting in unreadable text. In this paper, we propose an image
compression method for maintaining text quality. We developed a scene text
image quality assessment model to assess text quality in compressed images.
Guided by this model, the method iteratively searches for the most strongly
compressed image that still preserves high-quality text. Objective and
subjective results showed that the proposed method was superior to existing
methods. Furthermore, the proposed assessment model outperformed other
deep-learning regression models.
Comment: Accepted by Pattern Recognition, 202
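The iterative search described above can be sketched as follows. This is a hedged illustration, not the paper's algorithm: it assumes the assessment score increases monotonically with the codec quality setting, so the most strongly compressed acceptable image can be found by binary search. `compress` and `assess` are hypothetical stand-ins for the codec and the text-quality model.

```python
def search_best_compression(image, compress, assess, threshold,
                            q_min=1, q_max=100):
    """Find the lowest quality setting whose compressed image still scores
    at least `threshold` on the text-quality model (assumed monotone)."""
    best = q_max
    lo, hi = q_min, q_max
    while lo <= hi:
        mid = (lo + hi) // 2
        if assess(compress(image, mid)) >= threshold:
            best = mid      # text still readable: try compressing harder
            hi = mid - 1
        else:
            lo = mid + 1    # text degraded: back off
    return best, compress(image, best)
```

With mock functions where the score equals `quality / 100`, a threshold of 0.4 selects quality 40, the lowest setting that still passes.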
Automatic Discrimination between Scomber japonicus and Scomber australasicus by Geometric and Texture Features
This paper proposes a method for automatic discrimination of two mackerel species: Scomber japonicus (chub mackerel) and Scomber australasicus (blue mackerel). Because S. japonicus has a much higher market price than S. australasicus, the two species must be properly sorted before shipment, but their similar appearance makes discrimination difficult. These species can be effectively distinguished using the ratio of the base length between the dorsal fin’s first and ninth spines to the fork length. However, manual measurement of this ratio is time-consuming and reduces fish freshness. The proposed technique instead uses image processing to measure these lengths. We were able to successfully discriminate between the two species using the ratio as a geometric feature, in combination with several texture features. We then quantitatively verified the effectiveness of the proposed method and demonstrated that it is highly accurate in classifying mackerel.
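The key geometric feature above (dorsal-fin base length over fork length) can be sketched from landmark coordinates as below. The cutoff value and the direction of the comparison are illustrative assumptions, not values from the paper.

```python
import math

def dorsal_fork_ratio(spine1, spine9, snout, fork):
    """Ratio of the dorsal-fin base length (1st to 9th spine) to fork length.
    Inputs are (x, y) landmark coordinates measured from the fish image."""
    return math.dist(spine1, spine9) / math.dist(snout, fork)

def classify(ratio, cutoff=0.18):
    """Threshold classifier; 0.18 and the inequality direction are
    illustrative assumptions, not the paper's fitted values."""
    return "S. japonicus" if ratio < cutoff else "S. australasicus"
```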
Infrared Image Super-Resolution: Systematic Review, and Future Trends
Image Super-Resolution (SR) is essential for a wide range of computer vision
and image processing tasks. Infrared (IR, or thermal) image super-resolution
remains an active concern in the development of deep learning. This survey
aims to provide a comprehensive perspective of IR image
super-resolution, including its applications, hardware imaging system dilemmas,
and taxonomy of image processing methodologies. In addition, the datasets and
evaluation metrics in IR image super-resolution tasks are also discussed.
Furthermore, the deficiencies in current technologies and possible promising
directions for the community to explore are highlighted. To cope with the rapid
development in this field, we intend to regularly update a list of relevant
work at \url{https://github.com/yongsongH/Infrared_Image_SR_Survey}.
Comment: Submitted to IEEE TNNL
Activity Recognition Using Gazed Text and Viewpoint Information for User Support Systems
The development of information technology has added many conveniences to our lives. On the other hand, however, we have to deal with various kinds of information, which can be a difficult task for elderly people or those who are not familiar with information devices. A technology to recognize each person’s activity and provide appropriate support based on that activity could be useful for such people. In this paper, we propose a novel fine-grained activity recognition method for user support systems that focuses on identifying the text at which a user is gazing, based on the idea that the content of the text is related to the activity of the user. It is necessary to keep in mind that the meaning of the text depends on its location. To tackle this problem, we propose the simultaneous use of a wearable device and a fixed camera. To obtain the global location of the text, we perform image matching using the local features of the images obtained by these two devices. Then, we generate a feature vector based on this information and the content of the text. To show the effectiveness of the proposed approach, we performed activity recognition experiments with six subjects in a laboratory environment.
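One plausible encoding of the fused feature vector described above is to concatenate a text-content embedding with the text's global location recovered from image matching. The layout and the location weight are hypothetical; the abstract does not specify the exact vector construction.

```python
def activity_feature(text_embedding, global_xy, w_loc=1.0):
    """Concatenate gazed-text content features with the text's global
    location (obtained from wearable/fixed-camera image matching).
    `w_loc` is an assumed scaling factor for the location terms."""
    x, y = global_xy
    return list(text_embedding) + [w_loc * x, w_loc * y]
```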
Fidelity-Controllable Extreme Image Compression with Generative Adversarial Networks
We propose a GAN-based image compression method working at extremely low
bitrates below 0.1 bpp. Most existing learned image compression methods suffer
from blur at extremely low bitrates. Although GANs can help reconstruct sharp
images, they have two drawbacks: GAN training is unstable, and the
reconstructions often contain unpleasant noise or artifacts. To address both
drawbacks, our method adopts two-stage training and network interpolation.
The two-stage training stabilizes optimization.
Moreover, the network interpolation utilizes the models in both stages and
reduces undesirable noise and artifacts, while maintaining important edges.
Hence, we can control the trade-off between perceptual quality and fidelity
without re-training models. The experimental results show that our model can
reconstruct high quality images. Furthermore, our user study confirms that our
reconstructions are preferred over those of a state-of-the-art GAN-based image
compression model. The code will be available.
Comment: 8 pages, 11 figures
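Network interpolation, as mentioned above, is in general a per-parameter blend of two trained models (here, the stage-1 and stage-2 networks). The sketch below assumes a simple dict-of-floats weight representation rather than the paper's actual framework.

```python
def interpolate_networks(theta_stage1, theta_stage2, alpha):
    """Blend corresponding parameters of two trained models.
    alpha=0 keeps the stage-1 (fidelity-oriented) model; alpha=1 keeps
    the stage-2 (perceptual/GAN) model; intermediate values trade the
    two off without any retraining."""
    return {name: (1.0 - alpha) * theta_stage1[name] + alpha * theta_stage2[name]
            for name in theta_stage1}
```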
Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence
Visual-semantic embedding aims to learn a joint embedding space where related
video and sentence instances are located close to each other. Most existing
methods put instances in a single embedding space. However, they struggle to
embed instances due to the difficulty of matching visual dynamics in videos to
textual features in sentences. A single space is not enough to accommodate
various videos and sentences. In this paper, we propose a novel framework that
maps instances into multiple individual embedding spaces so that we can capture
multiple relationships between instances, leading to compelling video
retrieval. We propose to produce a final similarity between instances by fusing
similarities measured in each embedding space using a weighted sum strategy. We
determine the weights according to a sentence. Therefore, we can flexibly
emphasize an embedding space. We conducted sentence-to-video retrieval
experiments on a benchmark dataset. The proposed method achieved strong
performance, competitive with state-of-the-art methods. These experimental
results demonstrate the effectiveness of the proposed multiple embedding
approach compared to existing methods.
Comment: 8 pages, 5 figures
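The weighted-sum fusion described above can be sketched as below. Producing the weights as a softmax over sentence-conditioned logits is an assumption, since the abstract does not specify how the sentence determines the weights.

```python
import math

def fused_similarity(per_space_sims, sentence_logits):
    """Fuse similarities from multiple embedding spaces with a weighted sum.
    Weights are a softmax over sentence-derived logits (assumed form), so
    a sentence can flexibly emphasize particular embedding spaces."""
    exps = [math.exp(l) for l in sentence_logits]
    z = sum(exps)
    return sum(s * (e / z) for s, e in zip(per_space_sims, exps))
```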
Target-oriented Domain Adaptation for Infrared Image Super-Resolution
Recent efforts have explored leveraging visible light images to enrich
texture details in infrared (IR) super-resolution. However, this direct
adaptation approach often becomes a double-edged sword, as it improves texture
at the cost of introducing noise and blurring artifacts. To address these
challenges, we propose the Target-oriented Domain Adaptation SRGAN (DASRGAN),
an innovative framework specifically engineered for robust IR super-resolution
model adaptation. DASRGAN operates on the synergy of two key components: 1)
Texture-Oriented Adaptation (TOA) to refine texture details meticulously, and
2) Noise-Oriented Adaptation (NOA), dedicated to minimizing noise transfer.
Specifically, TOA uniquely integrates a specialized discriminator,
incorporating a prior extraction branch, and employs a Sobel-guided adversarial
loss to align texture distributions effectively. Concurrently, NOA utilizes a
noise adversarial loss to distinctly separate the generative and Gaussian noise
pattern distributions during adversarial training. Our extensive experiments
confirm DASRGAN's superiority. Comparative analyses against leading methods
across multiple benchmarks and upsampling factors reveal that DASRGAN sets new
state-of-the-art performance standards. Code is available at
\url{https://github.com/yongsongH/DASRGAN}.
Comment: 11 pages, 9 figures
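A Sobel-guided loss, in the general spirit of the TOA component above, compares edge responses of two images. This pure-Python sketch over lists of lists is illustrative only and is not DASRGAN's actual loss.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_response(img, kernel):
    """3x3 convolution on a 2D list; border pixels are left at zero."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(kernel[j][i] * img[y + j - 1][x + i - 1]
                            for j in range(3) for i in range(3))
    return out

def sobel_l1(a, b):
    """L1 distance between the Sobel edge maps of two grayscale images,
    penalizing texture/edge mismatch between a reconstruction and a target."""
    loss = 0.0
    for k in (SOBEL_X, SOBEL_Y):
        ga, gb = sobel_response(a, k), sobel_response(b, k)
        loss += sum(abs(ga[y][x] - gb[y][x])
                    for y in range(len(a)) for x in range(len(a[0])))
    return loss
```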
- …