10,124 research outputs found
Hybrid image representation methods for automatic image annotation: a survey
In most automatic image annotation systems, images are represented with low level features using either global
methods or local methods. In global methods, the entire image is used as a unit. Local methods divide images into blocks where fixed-size sub-image blocks are adopted as sub-units; or into regions by using segmented regions as sub-units in images. In contrast to typical automatic image annotation methods that use either global or local features exclusively, several recent methods have considered incorporating the two kinds of information, and believe that the combination of the two levels of features is
beneficial in annotating images. In this paper, we provide a
survey on automatic image annotation techniques according to
one aspect: feature extraction, and, in order to complement
existing surveys in literature, we focus on the emerging image annotation methods: hybrid methods that combine both global and local features for image representation
Object Referring in Visual Scene with Spoken Language
Object referring has important applications, especially for human-machine
interaction. While having received great attention, the task is mainly attacked
with written language (text) as input rather than spoken language (speech),
which is more natural. This paper investigates Object Referring with Spoken
Language (ORSpoken) by presenting two datasets and one novel approach. Objects
are annotated with their locations in images, text descriptions and speech
descriptions. This makes the datasets ideal for multi-modality learning. The
approach is developed by carefully taking down ORSpoken problem into three
sub-problems and introducing task-specific vision-language interactions at the
corresponding levels. Experiments show that our method outperforms competing
methods consistently and significantly. The approach is also evaluated in the
presence of audio noise, showing the efficacy of the proposed vision-language
interaction methods in counteracting background noise.Comment: 10 pages, Submitted to WACV 201
Multi-branch Convolutional Neural Network for Multiple Sclerosis Lesion Segmentation
In this paper, we present an automated approach for segmenting multiple
sclerosis (MS) lesions from multi-modal brain magnetic resonance images. Our
method is based on a deep end-to-end 2D convolutional neural network (CNN) for
slice-based segmentation of 3D volumetric data. The proposed CNN includes a
multi-branch downsampling path, which enables the network to encode information
from multiple modalities separately. Multi-scale feature fusion blocks are
proposed to combine feature maps from different modalities at different stages
of the network. Then, multi-scale feature upsampling blocks are introduced to
upsize combined feature maps to leverage information from lesion shape and
location. We trained and tested the proposed model using orthogonal plane
orientations of each 3D modality to exploit the contextual information in all
directions. The proposed pipeline is evaluated on two different datasets: a
private dataset including 37 MS patients and a publicly available dataset known
as the ISBI 2015 longitudinal MS lesion segmentation challenge dataset,
consisting of 14 MS patients. Considering the ISBI challenge, at the time of
submission, our method was amongst the top performing solutions. On the private
dataset, using the same array of performance metrics as in the ISBI challenge,
the proposed approach shows high improvements in MS lesion segmentation
compared with other publicly available tools.Comment: This paper has been accepted for publication in NeuroImag
Optical tomography: Image improvement using mixed projection of parallel and fan beam modes
Mixed parallel and fan beam projection is a technique used to increase the quality images. This research focuses on enhancing the image quality in optical tomography. Image quality can be deïŹned by measuring the Peak Signal to Noise Ratio (PSNR) and Normalized Mean Square Error (NMSE) parameters. The ïŹndings of this research prove that by combining parallel and fan beam projection, the image quality can be increased by more than 10%in terms of its PSNR value and more than 100% in terms of its NMSE value compared to a single parallel beam
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
The massive amounts of digitized historical documents acquired over the last
decades naturally lend themselves to automatic processing and exploration.
Research work seeking to automatically process facsimiles and extract
information thereby are multiplying with, as a first essential step, document
layout analysis. If the identification and categorization of segments of
interest in document images have seen significant progress over the last years
thanks to deep learning techniques, many challenges remain with, among others,
the use of finer-grained segmentation typologies and the consideration of
complex, heterogeneous documents such as historical newspapers. Besides, most
approaches consider visual features only, ignoring textual signal. In this
context, we introduce a multimodal approach for the semantic segmentation of
historical newspapers that combines visual and textual features. Based on a
series of experiments on diachronic Swiss and Luxembourgish newspapers, we
investigate, among others, the predictive power of visual and textual features
and their capacity to generalize across time and sources. Results show
consistent improvement of multimodal models in comparison to a strong visual
baseline, as well as better robustness to high material variance
- âŠ