546 research outputs found
Deep Markov Random Field for Image Modeling
Markov Random Fields (MRFs), a formulation widely used in generative image
modeling, have long been limited by a lack of expressive power. This issue
arises primarily because conventional MRF formulations tend to use
simplistic factors to capture local patterns. In this paper, we move beyond
such limitations, and propose a novel MRF model that uses fully-connected
neurons to express the complex interactions among pixels. Through theoretical
analysis, we reveal an inherent connection between this model and recurrent
neural networks, and thereon derive an approximated feed-forward network that
couples multiple RNNs along opposite directions. This formulation combines the
expressive power of deep neural networks and the cyclic dependency structure of
MRF in a unified model, bringing the modeling capability to a new level. The
feed-forward approximation also allows it to be efficiently learned from data.
Experimental results on a variety of low-level vision tasks show notable
improvement over state-of-the-art methods.
Comment: Accepted at ECCV 201
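The coupling of RNNs along opposite directions that the abstract describes can be illustrated with a minimal sketch (the function names, weight shapes, and toy 1-D setting here are hypothetical, not the paper's actual architecture): two simple tanh RNNs scan a pixel sequence forward and backward, and their hidden states are summed so that each pixel's feature depends on context from both sides.

```python
import numpy as np

def rnn_scan(x, W_in, W_h, reverse=False):
    """Run a simple tanh RNN over a 1-D pixel sequence, one step per pixel."""
    seq = x[::-1] if reverse else x
    h = np.zeros(W_h.shape[0])
    states = []
    for v in seq:
        h = np.tanh(W_in @ np.atleast_1d(v) + W_h @ h)
        states.append(h)
    states = np.array(states)
    return states[::-1] if reverse else states

rng = np.random.default_rng(0)
pixels = rng.standard_normal(16)                       # toy 1-D "image" row
W_in = rng.standard_normal((4, 1))
W_h = rng.standard_normal((4, 4)) * 0.1
# couple two RNNs along opposite directions, as the abstract describes
features = rnn_scan(pixels, W_in, W_h) + rnn_scan(pixels, W_in, W_h, reverse=True)
print(features.shape)  # (16, 4)
```

Summing the two scan directions is only one way to couple them; the point is that the resulting per-pixel feature sees both left and right context, mimicking the cyclic dependencies of an MRF.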
Convolutional Bidirectional Variational Autoencoder for Image Domain Translation of Dotted Arabic Expiration Dates
This paper proposes a Ladder Bottom-up Convolutional Bidirectional
Variational Autoencoder (LCBVAE) architecture for the encoder and decoder,
trained on image translation of dotted Arabic expiration dates by
reconstructing the dotted dates into filled-in expiration dates. We employed
a customized and adapted Convolutional Recurrent Neural Network (CRNN) model
to meet our specific requirements and enhance its performance in our
context, and then trained the custom CRNN model on the filled-in images from
the years 2019 to 2027 to extract the expiration dates and assess the
performance of LCBVAE on expiration date recognition. The (LCBVAE+CRNN)
pipeline can then be integrated into automated sorting systems to extract
expiry dates and sort products accordingly during the manufacturing stage.
Additionally, it can replace the manual entry of expiration dates, which can
be time-consuming and inefficient for merchants. Owing to the lack of
available dotted Arabic expiration date images, we created an Arabic
dot-matrix TrueType Font (TTF) for the generation of synthetic images. We
trained the model on 59,902 unrealistic synthetic date images and tested it
on 3,287 realistic synthetic date images from the years 2019 to 2027,
represented as yyyy/mm/dd. In our study, we demonstrated the significance of
the latent bottleneck layer in improving generalization when its size is
increased up to 1024 in downstream transfer learning tasks such as image
translation. The proposed approach achieved an accuracy of 97% on image
translation using the LCBVAE architecture, which can be generalized to
downstream learning tasks such as image translation and reconstruction.
Comment: 15 pages, 10 figures
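As a small illustration of the synthetic-date generation step described above (the rendering with the dot-matrix TTF is omitted, and the function name is our own), one might sample yyyy/mm/dd strings for the years 2019 to 2027 like this:

```python
import random
import calendar

def synth_dates(years=range(2019, 2028), n=5, seed=0):
    """Sample synthetic expiry dates in the yyyy/mm/dd format used in the paper."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        y = rng.choice(list(years))
        m = rng.randint(1, 12)
        # respect the actual number of days in that month/year
        d = rng.randint(1, calendar.monthrange(y, m)[1])
        out.append(f"{y:04d}/{m:02d}/{d:02d}")
    return out

print(synth_dates(n=3))
```

Each string would then be rendered with the dot-matrix font to produce a synthetic training image.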
Filling the G_ap_s: Multivariate Time Series Imputation by Graph Neural Networks
Dealing with missing values and incomplete time series is a labor-intensive,
tedious, and inevitable task when handling data from real-world
applications. Effective spatio-temporal representations would allow imputation
methods to reconstruct missing temporal data by exploiting information coming
from sensors at different locations. However, standard methods fall short in
capturing the nonlinear time and space dependencies existing within networks of
interconnected sensors and do not take full advantage of the available - and
often strong - relational information. Notably, most state-of-the-art
imputation methods based on deep learning do not explicitly model relational
aspects and, in any case, do not exploit processing frameworks able to
adequately represent structured spatio-temporal data. Conversely, graph neural
networks have recently surged in popularity as both expressive and scalable
tools for processing sequential data with relational inductive biases. In this
work, we present the first assessment of graph neural networks in the context
of multivariate time series imputation. In particular, we introduce a novel
graph neural network architecture, named GRIN, which aims at reconstructing
missing data in the different channels of a multivariate time series by
learning spatio-temporal representations through message passing. Empirical
results show that our model outperforms state-of-the-art methods in the
imputation task on relevant real-world benchmarks with mean absolute error
improvements often higher than 20%.
Comment: Accepted at ICLR 202
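A toy sketch of the message-passing idea behind graph-based imputation (not GRIN's actual architecture; the adjacency, masking convention, and update rule here are purely illustrative): missing channels are filled with the mean of observed neighbouring sensors at the same time step, while observed entries are left untouched.

```python
import numpy as np

def impute_step(X, M, A):
    """One message-passing pass over a sensor graph.

    X: (T, N) values, M: (T, N) observation mask (1 = observed),
    A: (N, N) adjacency. Missing entries get the observation-weighted
    mean of their graph neighbours; observed entries are kept as-is.
    """
    neigh = (M * X) @ A.T                 # aggregate observed neighbour values
    norm = M @ A.T                        # observed mass each node received
    est = np.divide(neigh, norm, out=np.zeros_like(neigh), where=norm > 0)
    return M * X + (1 - M) * est

X = np.array([[1.0, 2.0, 3.0]])
M = np.array([[1.0, 0.0, 1.0]])           # middle sensor missing
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]]) / 2.0        # 3-node line graph, scaled
print(impute_step(X, M, A))               # middle value becomes (1+3)/2 = 2
```

GRIN replaces this fixed averaging with learned spatio-temporal message-passing functions, but the masking pattern (reconstruct only what is missing) is the same.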
Short-term forecasting of wind energy: A comparison of deep learning frameworks
Wind energy has been recognized as the most promising and economical renewable energy source, attracting increasing attention in recent years. However, given the variability and uncertainty of wind energy, accurate forecasting is crucial to enabling high levels of wind energy penetration in electricity markets. In this paper, a comparative framework is proposed in which a suite of long short-term memory (LSTM) recurrent neural network (RNN) models, including standard, bidirectional, stacked, convolutional, and autoencoder architectures, is implemented to address the existing gaps and limitations of reported wind power forecasting methodologies. These networks are implemented through an iterative process of varying hyperparameters to better assess their effect, and the overall performance of each architecture, when tackling one- to three-hour-ahead wind power forecasting. The corresponding validation is carried out on hourly wind power data from the Spanish electricity market, collected between 2014 and 2020. The proposed comparative error analysis shows that, overall, the models tend to exhibit low error variability and better performance when the networks learn from weekly input sequences. The best-performing model for one-hour-ahead forecasting is the stacked LSTM with weekly learning input sequences, with an average MAPE improvement of roughly 6%, 7%, and 49% over the standard, bidirectional, and convolutional LSTM models, respectively. For two- to three-hour-ahead forecasting, the model with the best overall performance is the bidirectional LSTM with weekly learning input sequences, showing an average MAPE improvement of 2 to 23% over the other LSTM architectures implemented.
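The weekly learning input sequences used above correspond to 168-hour sliding windows over the hourly series. A minimal sketch of that windowing step (the function name and array shapes are our own, not from the paper):

```python
import numpy as np

def make_windows(series, window=168, horizon=1):
    """Slice an hourly series into `window`-hour inputs and
    `horizon`-hour-ahead scalar targets for supervised forecasting."""
    X, y = [], []
    for t in range(len(series) - window - horizon + 1):
        X.append(series[t:t + window])
        y.append(series[t + window + horizon - 1])
    return np.array(X), np.array(y)

hours = np.arange(24 * 7 * 3, dtype=float)   # three weeks of hourly data
X, y = make_windows(hours, window=168, horizon=1)
print(X.shape, y.shape)                      # (336, 168) (336,)
```

Each row of `X` would be fed to an LSTM as one weekly sequence, with `y` the wind power one hour after the window ends; setting `horizon=2` or `3` gives the two- and three-hour-ahead variants.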
Complementary Time-Frequency Domain Networks for Dynamic Parallel MR Image Reconstruction
Purpose: To introduce a novel deep learning based approach for fast and
high-quality dynamic multi-coil MR reconstruction by learning a complementary
time-frequency domain network that exploits spatio-temporal correlations
simultaneously from complementary domains.
Theory and Methods: Dynamic parallel MR image reconstruction is formulated as
a multi-variable minimisation problem, where the data is regularised in
combined temporal Fourier and spatial (x-f) domain as well as in
spatio-temporal image (x-t) domain. An iterative algorithm based on variable
splitting technique is derived, which alternates among signal de-aliasing steps
in x-f and x-t spaces, a closed-form point-wise data consistency step and a
weighted coupling step. The iterative model is embedded into a deep recurrent
neural network which learns to recover the image via exploiting spatio-temporal
redundancies in complementary domains.
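The closed-form point-wise data-consistency step mentioned above can be sketched as follows (a simplified single-coil, Cartesian toy version with a hypothetical noise weight `lam`; the paper's multi-coil x-f/x-t formulation is more involved): the estimate's k-space is blended with the acquired samples at the sampled locations, and left untouched elsewhere.

```python
import numpy as np

def data_consistency(x_est, k_acq, mask, lam=1.0):
    """Point-wise data consistency: at sampled k-space locations, blend the
    current estimate with the acquired data (lam -> infinity replaces them
    exactly); unsampled locations keep the estimate's k-space."""
    k_est = np.fft.fft2(x_est)
    k_out = np.where(mask, (k_est + lam * k_acq) / (1 + lam), k_est)
    return np.fft.ifft2(k_out)

img = np.outer(np.hanning(8), np.hanning(8))          # toy image
mask = np.zeros((8, 8), dtype=bool)
mask[::2, :] = True                                    # sample every other k-line
k_acq = np.fft.fft2(img) * mask
recon = data_consistency(np.zeros_like(img), k_acq, mask, lam=1e6)
```

With a large `lam` the sampled k-space lines of `recon` match the acquired data almost exactly, which is the behaviour the alternating algorithm relies on between de-aliasing steps.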
Results: Experiments were performed on two datasets of highly undersampled
multi-coil short-axis cardiac cine MRI scans. Results demonstrate that our
proposed method outperforms the current state-of-the-art approaches both
quantitatively and qualitatively. The proposed model can also generalise well
to data acquired from a different scanner and data with pathologies that were
not seen in the training set.
Conclusion: The work shows the benefit of reconstructing dynamic parallel MRI
in complementary time-frequency domains with deep neural networks. The method
can effectively and robustly reconstruct high-quality images from highly
undersampled dynamic multi-coil data (yielding 15 s and 10 s scan times,
respectively) with a fast reconstruction speed (2.8 s). This could
potentially facilitate fast single-breath-hold clinical 2D cardiac cine
imaging.
Comment: Accepted by Magnetic Resonance in Medicine
ViTs are Everywhere: A Comprehensive Study Showcasing Vision Transformers in Different Domains
The transformer design is the de facto standard for natural language
processing tasks. Its success in natural language processing has lately
piqued the interest of researchers in the domain of computer vision.
Compared to Convolutional Neural Networks (CNNs), Vision Transformers
(ViTs) are becoming more popular and dominant solutions for many vision
problems. Transformer-based models outperform other types of networks, such
as convolutional and recurrent neural networks, on a range of visual
benchmarks. In this work, we evaluate various vision transformer models by
dividing them into distinct tasks and examining their benefits and
drawbacks. ViTs can overcome several potential difficulties of
convolutional neural networks (CNNs). The goal of this survey is to show
the first uses of ViTs in CV. In the first phase, we categorize the CV
applications for which ViTs are appropriate: image classification, object
identification, image segmentation, video transformers, image denoising,
and neural architecture search (NAS). We then analyze the state of the art
in each area and identify the models that are currently available. In
addition, we outline numerous open research challenges as well as
prospective research directions.
Comment: ICCD-2023. arXiv admin note: substantial text overlap with
arXiv:2208.04309 by other authors
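As background for the ViT models surveyed above, the common first step they all share is splitting an image into fixed-size patches that become the transformer's input tokens. A minimal sketch of that patchify step (the patch size and shapes are illustrative; the linear projection and position embeddings that follow are omitted):

```python
import numpy as np

def patchify(img, p=4):
    """Split an H x W x C image into flattened p x p patches -- the first
    step of a Vision Transformer before linear projection."""
    H, W, C = img.shape
    patches = img.reshape(H // p, p, W // p, p, C).swapaxes(1, 2)
    return patches.reshape(-1, p * p * C)

img = np.arange(32 * 32 * 3, dtype=float).reshape(32, 32, 3)
tokens = patchify(img, p=4)
print(tokens.shape)   # (64, 48): 8x8 patches, each 4*4*3 values
```

Each of the 64 rows would then be linearly projected to the model dimension and processed by standard transformer self-attention, with no convolutional inductive bias.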