Historical Document Image Segmentation with LDA-Initialized Deep Neural Networks
In this paper, we present a novel approach to perform deep neural networks
layer-wise weight initialization using Linear Discriminant Analysis (LDA).
Typically, the weights of a deep neural network are initialized with random
values, by greedy layer-wise pre-training (usually as a Deep Belief Network or
as an auto-encoder), or by re-using the layers of another network (transfer
learning). Consequently, either many training epochs are needed before
meaningful weights are learned, or a rather similar dataset is required to
seed the fine-tuning in transfer learning. In this paper, we describe how to turn an LDA into either a
neural layer or a classification layer. We analyze the initialization technique
on historical documents. First, we show that an LDA-based initialization is
quick and leads to a very stable initialization. Furthermore, for the task of
layout analysis at pixel level, we investigate the effectiveness of LDA-based
initialization and show that it outperforms state-of-the-art random weight
initialization methods.
Comment: 5 pages
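As a rough illustration of the idea, the Fisher discriminant directions computed from a batch of features can serve directly as the weight matrix of a linear layer. The sketch below is a minimal NumPy version of such an initialization; the regularization constant and the toy data are our own illustrative choices, not the paper's setup.

```python
import numpy as np

def lda_layer_weights(X, y, n_components):
    """Fisher-LDA projection usable as the weight matrix of a linear layer.

    X: (n_samples, n_features), y: integer class labels.
    Returns W of shape (n_features, n_components).
    """
    classes = np.unique(y)
    mean_total = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_total)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Solve the generalized eigenproblem Sb v = lambda Sw v (ridge-regularized).
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:n_components]]

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, (50, 8)), rng.normal(3, 1, (50, 8))])
y = np.array([0] * 50 + [1] * 50)
W = lda_layer_weights(X, y, n_components=1)  # init for an 8 -> 1 linear layer
print(W.shape)  # (8, 1)
```

The resulting projection already separates the classes before any gradient step, which is why such an initialization can be stable from the start.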
A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis
Automatic analysis of scanned historical documents comprises a wide range of
image analysis tasks, which are often challenging for machine learning due to a
lack of human-annotated learning samples. With the advent of deep neural
networks, a promising way to cope with the lack of training data is to
pre-train models on images from a different domain and then fine-tune them on
historical documents. In the current research, a typical example of such
cross-domain transfer learning is the use of neural networks that have been
pre-trained on the ImageNet database for object recognition. It remains a
mostly open question whether or not this pre-training helps to analyse
historical documents, which have fundamentally different image properties when
compared with ImageNet. In this paper, we present a comprehensive empirical
survey on the effect of ImageNet pre-training for diverse historical document
analysis tasks, including character recognition, style classification,
manuscript dating, semantic segmentation, and content-based retrieval. While we
obtain mixed results for semantic segmentation at the pixel level, we observe a
clear trend across different network architectures that ImageNet pre-training
has a positive effect on classification as well as content-based retrieval.
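The pre-train-then-fine-tune recipe discussed above typically amounts to freezing (or slowly updating) a transferred backbone and training a fresh task head. The PyTorch sketch below is schematic: the tiny backbone is a stand-in for an actual ImageNet-pretrained network (which would normally be loaded from torchvision), and the five-class head is an arbitrary example.

```python
import torch
import torch.nn as nn

# Stand-in backbone; in practice this would be an ImageNet-pretrained
# network loaded from torchvision rather than randomly initialized layers.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

for p in backbone.parameters():  # freeze the transferred weights
    p.requires_grad = False

head = nn.Linear(16, 5)          # new head for a 5-class document task
model = nn.Sequential(backbone, head)

# Only the head's parameters are updated during fine-tuning.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)

x = torch.randn(2, 3, 64, 64)    # dummy batch of document crops
logits = model(x)
print(logits.shape)  # torch.Size([2, 5])
```

A common variant unfreezes the backbone after a few epochs with a lower learning rate, which is often what "fine-tuning" means in practice.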
DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments
We introduce DeepDIVA: an infrastructure designed to enable quick and
intuitive setup of reproducible experiments with a large range of useful
analysis functionality. Reproducing scientific results can be a frustrating
experience, not only in document image analysis but in machine learning in
general. Using DeepDIVA a researcher can either reproduce a given experiment
with a very limited amount of information or share their own experiments with
others. Moreover, the framework offers a large range of functions, such as
boilerplate code, keeping track of experiments, hyper-parameter optimization,
and visualization of data and results. To demonstrate the effectiveness of this
framework, this paper presents case studies in the area of handwritten document
analysis where researchers benefit from the integrated functionality. DeepDIVA
is implemented in Python and uses the deep learning framework PyTorch. It is
completely open source and accessible as a Web Service through DIVAServices.
Comment: Submitted at the 16th International Conference on Frontiers in
Handwriting Recognition (ICFHR), 6 pages, 6 figures
News Text Classification Based on an Improved Convolutional Neural Network
With the explosive growth of Internet news media and the disorganized state of news texts, this paper puts forward an automatic news classification model based on a Convolutional Neural Network (CNN). In the model, Word2vec is first merged with Latent Dirichlet Allocation (LDA) to generate an effective text feature representation. An attention mechanism is then combined with the proposed model so that higher attention probability values are assigned to key features, enabling more accurate judgments. The results show that the precision rate, the recall rate, and the F1 value of the model reach 96.4%, 95.9%, and 96.2%, respectively, which indicates that the improved CNN, through a unique framework, can extract deep semantic features of the text and provides strong support for establishing an efficient and accurate news text classification model.
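The attention step described above can be pictured as scoring each token's feature vector against a learned query and pooling with the resulting softmax weights. The NumPy sketch below illustrates only this mechanism; the query vector is a random stand-in, and the feature dimensions are arbitrary rather than the paper's Word2vec/LDA configuration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_pool(features, query):
    """Weight per-token features by attention scores and pool them.

    features: (seq_len, dim) token representations (e.g. word vectors
    augmented with topic features). query: (dim,) attention vector.
    """
    scores = features @ query        # one scalar score per token
    alpha = softmax(scores)          # attention probability per token
    return alpha @ features, alpha   # pooled (dim,) vector, weights

rng = np.random.default_rng(1)
feats = rng.normal(size=(6, 10))     # 6 tokens, 10-dim features
q = rng.normal(size=10)              # random stand-in for a learned query
pooled, alpha = attention_pool(feats, q)
print(pooled.shape, round(alpha.sum(), 6))  # (10,) 1.0
```

Tokens with high scores dominate the pooled vector, which is how "key features" receive higher attention probability values.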
A Hybrid COVID-19 Detection Model Using an Improved Marine Predators Algorithm and a Ranking-Based Diversity Reduction Strategy
Many countries are challenged by the medical resources required for COVID-19 detection, which necessitates the development of a low-cost, rapid tool to detect and diagnose the virus effectively for large numbers of tests. Although a chest X-ray scan is a useful candidate tool, the images generated by the scans must be analyzed accurately and quickly if large numbers of tests are to be processed. COVID-19 causes bilateral pulmonary parenchymal ground-glass and consolidative pulmonary opacities, sometimes with a rounded morphology and a peripheral lung distribution. In this work, we aim to rapidly extract from chest X-ray images the small, similar regions that may contain the identifying features of COVID-19. This paper therefore proposes a hybrid COVID-19 detection model based on an improved marine predators algorithm (IMPA) for X-ray image segmentation. A ranking-based diversity reduction (RDR) strategy is used to enhance the performance of the IMPA so that it reaches better solutions in fewer iterations. RDR identifies particles that have failed to find better solutions within a given number of consecutive iterations and moves them towards the best solutions found so far. The performance of the IMPA has been validated on nine chest X-ray images with threshold levels between 10 and 100 and compared with five state-of-the-art algorithms: the equilibrium optimizer (EO), the whale optimization algorithm (WOA), the sine cosine algorithm (SCA), the Harris hawks algorithm (HHA), and the salp swarm algorithm (SSA). The experimental results demonstrate that the proposed hybrid model outperforms all other algorithms across a range of metrics. In addition, the performance of the proposed model converged at all threshold levels on the Structural Similarity Index Measure (SSIM) and Universal Quality Index (UQI) metrics.
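Multilevel thresholding of this kind is usually posed as maximizing a separability criterion over candidate threshold vectors, which is the role a metaheuristic such as the IMPA plays. The sketch below uses an Otsu-style between-class variance objective as an assumed stand-in for the paper's fitness function; a search algorithm would evaluate it for many threshold sets and keep the best.

```python
import numpy as np

def between_class_variance(hist, thresholds):
    """Otsu-style fitness for a set of thresholds on a grayscale histogram.

    Higher is better: thresholds that split the histogram into classes
    with well-separated means score higher. The actual objective used by
    the paper's IMPA may differ; this is an illustrative criterion.
    """
    p = hist / hist.sum()                 # normalize to probabilities
    levels = np.arange(len(hist))
    mu_total = (p * levels).sum()
    bounds = [0, *sorted(thresholds), len(hist)]
    score = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        w = p[lo:hi].sum()                # probability mass of this class
        if w == 0:
            continue
        mu = (p[lo:hi] * levels[lo:hi]).sum() / w
        score += w * (mu - mu_total) ** 2
    return score

# Bimodal toy histogram: a threshold at the mode boundary scores highest.
hist = np.concatenate([np.full(128, 10.0), np.full(128, 40.0)])
print(between_class_variance(hist, [128]) > between_class_variance(hist, [20]))  # True
```

A swarm-style optimizer would treat each particle as a threshold vector, score it with this function, and move the population towards the best-scoring vectors.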
Visual Representation Learning with Limited Supervision
The quality of a computer vision system is proportional to the rigor of the data representation it is built upon. Learning expressive representations of images is therefore central to almost every computer vision application, including image search, object detection and classification, human re-identification, object tracking, pose understanding, image-to-image translation, and embodied agent navigation, to name a few. Deep neural networks are the most common modern approach to representation learning. Their limitation, however, is that deep representation learning methods require extremely large amounts of manually labeled training data. Clearly, annotating vast amounts of images for various environments is infeasible due to cost and time constraints. This requirement for labeled data is a prime restriction on the pace of development of visual recognition systems.
In order to cope with the exponentially growing amounts of visual data generated daily, machine learning algorithms have to at least strive to scale at a similar rate.
The second challenge is that the learned representations have to generalize to novel objects, classes, environments, and tasks in order to accommodate the diversity of the visual world.
Despite the ever-growing number of recent publications tangentially addressing the topic of learning generalizable representations, efficient generalization is yet to be achieved. This dissertation attempts to tackle the problem of learning visual representations that can generalize to novel settings while requiring few labeled examples.
In this research, we study the limitations of the existing supervised representation learning approaches and propose a framework that improves the generalization of learned features by exploiting visual similarities between images which are not captured by provided manual annotations. Furthermore, to mitigate the common requirement of large scale manually annotated datasets, we propose several approaches that can learn expressive representations without human-attributed labels, in a self-supervised fashion, by grouping highly-similar samples into surrogate classes based on progressively learned representations.
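Grouping highly similar samples into surrogate classes can be sketched, in its simplest form, as clustering the current representations and using cluster indices as pseudo-labels. The NumPy example below is a bare-bones k-means version of that idea with hand-picked initial centers; it illustrates the general principle, not the dissertation's actual procedure.

```python
import numpy as np

def surrogate_labels(Z, init_idx, iters=20):
    """Assign pseudo-labels by k-means over representations Z.

    Z: (n_samples, dim) learned representations. init_idx: indices of
    samples used as initial cluster centers. Returns integer labels that
    can serve as surrogate classes for self-supervised training.
    """
    centers = Z[np.asarray(init_idx)].astype(float)
    for _ in range(iters):
        # squared distance of every sample to every center
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(len(centers)):
            if (labels == j).any():
                centers[j] = Z[labels == j].mean(0)
    return labels

rng = np.random.default_rng(2)
Z = np.concatenate([rng.normal(-4, 0.5, (30, 2)),   # surrogate class A
                    rng.normal(4, 0.5, (30, 2))])   # surrogate class B
labels = surrogate_labels(Z, init_idx=[0, 59])
print(labels[0], labels[-1])  # 0 1
```

In the progressive setting described above, the representation and the grouping are refined alternately: better features yield cleaner surrogate classes, which in turn supervise better features.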
The development of computer vision as a science is preconditioned on a machine's ability to seamlessly record and disentangle attributes of pictures that were once thought to be perceivable only by humans. Particular interest has therefore been dedicated to analyzing the means of artistic expression and style, a more complex task than merely breaking an image down into colors and pixels. The ultimate test of this ability is the task of style transfer, which involves altering the style of an image while keeping its content. An effective solution to style transfer requires learning an image representation that allows disentangling an image's style from its content.
Moreover, particular artistic styles come with idiosyncrasies that affect which content details should be preserved and which discarded.
Another pitfall here is that it is impossible to get pixel-wise annotations of style and how the style should be altered.
We address this problem by proposing an unsupervised approach that encodes the image content in the way required by a particular style.
The proposed approach exchanges the style of an input image by first extracting the content representation in a style-aware way and then rendering it in a new style using a style-specific decoder network, achieving compelling results in image and video stylization.
Finally, we combine supervised and self-supervised representation learning techniques for the task of human and animal pose understanding. The proposed method enables transferring a representation learned for recognizing human poses to proximal mammal species without using labeled animal images. This approach is not limited to dense pose estimation and could potentially enable autonomous agents, from robots to self-driving cars, to retrain themselves and adapt to novel environments based on learning from previous experiences.