546 research outputs found
Deep Markov Random Field for Image Modeling
Markov Random Fields (MRFs), a formulation widely used in generative image
modeling, have long been limited by a lack of expressive power. This issue
arises primarily because conventional MRF formulations tend to use
simplistic factors to capture local patterns. In this paper, we move beyond
such limitations, and propose a novel MRF model that uses fully-connected
neurons to express the complex interactions among pixels. Through theoretical
analysis, we reveal an inherent connection between this model and recurrent
neural networks, and thereon derive an approximated feed-forward network that
couples multiple RNNs along opposite directions. This formulation combines the
expressive power of deep neural networks and the cyclic dependency structure of
MRF in a unified model, bringing the modeling capability to a new level. The
feed-forward approximation also allows it to be efficiently learned from data.
Experimental results on a variety of low-level vision tasks show notable
improvement over state-of-the-art methods.
Comment: Accepted at ECCV 201
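The coupling of RNNs along opposite directions that the abstract describes can be illustrated with a minimal sketch (the function names, weight shapes, and toy 1-D setting here are hypothetical, not the paper's actual architecture): two simple tanh RNNs scan a pixel sequence forward and backward, and their hidden states are summed so that each pixel's feature depends on context from both sides.

```python
import numpy as np

def rnn_scan(x, W_in, W_h, reverse=False):
    """Run a simple tanh RNN over a 1-D pixel sequence, one step per pixel."""
    seq = x[::-1] if reverse else x
    h = np.zeros(W_h.shape[0])
    states = []
    for v in seq:
        h = np.tanh(W_in @ np.atleast_1d(v) + W_h @ h)
        states.append(h)
    states = np.array(states)
    return states[::-1] if reverse else states

rng = np.random.default_rng(0)
pixels = rng.standard_normal(16)                       # toy 1-D "image" row
W_in = rng.standard_normal((4, 1))
W_h = rng.standard_normal((4, 4)) * 0.1
# couple two RNNs along opposite directions, as the abstract describes
features = rnn_scan(pixels, W_in, W_h) + rnn_scan(pixels, W_in, W_h, reverse=True)
print(features.shape)  # (16, 4)
```

Summing the two scan directions is only one way to couple them; the point is that the resulting per-pixel feature sees both left and right context, mimicking the cyclic dependencies of an MRF.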
Convolutional Bidirectional Variational Autoencoder for Image Domain Translation of Dotted Arabic Expiration Dates
This paper proposes a Ladder Bottom-up Convolutional Bidirectional
Variational Autoencoder (LCBVAE) architecture for the encoder and decoder,
trained on image translation of dotted Arabic expiration dates by
reconstructing the dotted dates into filled-in expiration dates. We employed
a customized and adapted Convolutional Recurrent Neural Network (CRNN) model
to meet our specific requirements and enhance its performance in our
context, and then trained the custom CRNN model on the filled-in images from
the years 2019 to 2027 to extract the expiration dates and assess the
performance of LCBVAE on expiration date recognition. The (LCBVAE+CRNN)
pipeline can then be integrated into automated sorting systems to extract
expiry dates and sort products accordingly during the manufacturing stage.
Additionally, it can replace the manual entry of expiration dates, which can
be time-consuming and inefficient for merchants. Owing to the lack of
available dotted Arabic expiration date images, we created an Arabic
dot-matrix TrueType Font (TTF) for the generation of synthetic images. We
trained the model on 59,902 unrealistic synthetic date images and tested it
on 3,287 realistic synthetic date images from the years 2019 to 2027,
represented as yyyy/mm/dd. In our study, we demonstrated the significance of
the latent bottleneck layer in improving generalization when its size is
increased up to 1024 in downstream transfer learning tasks such as image
translation. The proposed approach achieved an accuracy of 97% on image
translation using the LCBVAE architecture, which can be generalized to
downstream learning tasks such as image translation and reconstruction.
Comment: 15 pages, 10 figures
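As a small illustration of the synthetic-date generation step described above (the rendering with the dot-matrix TTF is omitted, and the function name is our own), one might sample yyyy/mm/dd strings for the years 2019 to 2027 like this:

```python
import random
import calendar

def synth_dates(years=range(2019, 2028), n=5, seed=0):
    """Sample synthetic expiry dates in the yyyy/mm/dd format used in the paper."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        y = rng.choice(list(years))
        m = rng.randint(1, 12)
        # respect the actual number of days in that month/year
        d = rng.randint(1, calendar.monthrange(y, m)[1])
        out.append(f"{y:04d}/{m:02d}/{d:02d}")
    return out

print(synth_dates(n=3))
```

Each string would then be rendered with the dot-matrix font to produce a synthetic training image.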
Filling the G_ap_s: Multivariate Time Series Imputation by Graph Neural Networks
Dealing with missing values and incomplete time series is a labor-intensive,
tedious, and inevitable task when handling data from real-world
applications. Effective spatio-temporal representations would allow imputation
methods to reconstruct missing temporal data by exploiting information coming
from sensors at different locations. However, standard methods fall short in
capturing the nonlinear time and space dependencies existing within networks of
interconnected sensors and do not take full advantage of the available - and
often strong - relational information. Notably, most state-of-the-art
imputation methods based on deep learning do not explicitly model relational
aspects and, in any case, do not exploit processing frameworks able to
adequately represent structured spatio-temporal data. Conversely, graph neural
networks have recently surged in popularity as both expressive and scalable
tools for processing sequential data with relational inductive biases. In this
work, we present the first assessment of graph neural networks in the context
of multivariate time series imputation. In particular, we introduce a novel
graph neural network architecture, named GRIN, which aims at reconstructing
missing data in the different channels of a multivariate time series by
learning spatio-temporal representations through message passing. Empirical
results show that our model outperforms state-of-the-art methods in the
imputation task on relevant real-world benchmarks with mean absolute error
improvements often higher than 20%.
Comment: Accepted at ICLR 202
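A toy sketch of the message-passing idea behind graph-based imputation (not GRIN's actual architecture; the adjacency, masking convention, and update rule here are purely illustrative): missing channels are filled with the mean of observed neighbouring sensors at the same time step, while observed entries are left untouched.

```python
import numpy as np

def impute_step(X, M, A):
    """One message-passing pass over a sensor graph.

    X: (T, N) values, M: (T, N) observation mask (1 = observed),
    A: (N, N) adjacency. Missing entries get the observation-weighted
    mean of their graph neighbours; observed entries are kept as-is.
    """
    neigh = (M * X) @ A.T                 # aggregate observed neighbour values
    norm = M @ A.T                        # observed mass each node received
    est = np.divide(neigh, norm, out=np.zeros_like(neigh), where=norm > 0)
    return M * X + (1 - M) * est

X = np.array([[1.0, 2.0, 3.0]])
M = np.array([[1.0, 0.0, 1.0]])           # middle sensor missing
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]]) / 2.0        # 3-node line graph, scaled
print(impute_step(X, M, A))               # middle value becomes (1+3)/2 = 2
```

GRIN replaces this fixed averaging with learned spatio-temporal message-passing functions, but the masking pattern (reconstruct only what is missing) is the same.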
Short-term forecasting of wind energy: A comparison of deep learning frameworks
Wind energy has been recognized as the most promising and economical renewable energy source, attracting increasing attention in recent years. However, given the variability and uncertainty of wind energy, accurate forecasting is crucial to enabling high levels of wind energy penetration in electricity markets. In this paper, a comparative framework is proposed in which a suite of long short-term memory (LSTM) recurrent neural network (RNN) models, including standard, bidirectional, stacked, convolutional, and autoencoder architectures, is implemented to address the existing gaps and limitations of reported wind power forecasting methodologies. These networks are implemented through an iterative process of varying hyperparameters to better assess their effect, and the overall performance of each architecture, when tackling one- to three-hour-ahead wind power forecasting. The corresponding validation is carried out on hourly wind power data from the Spanish electricity market, collected between 2014 and 2020. The proposed comparative error analysis shows that, overall, the models tend to exhibit low error variability and better performance when the networks learn from weekly input sequences. The best-performing model for one-hour-ahead forecasting is the stacked LSTM with weekly learning input sequences, with an average MAPE improvement of roughly 6%, 7%, and 49% over the standard, bidirectional, and convolutional LSTM models, respectively. For two- to three-hour-ahead forecasting, the model with the best overall performance is the bidirectional LSTM with weekly learning input sequences, showing an average MAPE improvement of 2 to 23% over the other LSTM architectures implemented.
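The weekly learning input sequences used above correspond to 168-hour sliding windows over the hourly series. A minimal sketch of that windowing step (the function name and array shapes are our own, not from the paper):

```python
import numpy as np

def make_windows(series, window=168, horizon=1):
    """Slice an hourly series into `window`-hour inputs and
    `horizon`-hour-ahead scalar targets for supervised forecasting."""
    X, y = [], []
    for t in range(len(series) - window - horizon + 1):
        X.append(series[t:t + window])
        y.append(series[t + window + horizon - 1])
    return np.array(X), np.array(y)

hours = np.arange(24 * 7 * 3, dtype=float)   # three weeks of hourly data
X, y = make_windows(hours, window=168, horizon=1)
print(X.shape, y.shape)                      # (336, 168) (336,)
```

Each row of `X` would be fed to an LSTM as one weekly sequence, with `y` the wind power one hour after the window ends; setting `horizon=2` or `3` gives the two- and three-hour-ahead variants.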
Complementary Time-Frequency Domain Networks for Dynamic Parallel MR Image Reconstruction
Purpose: To introduce a novel deep learning based approach for fast and
high-quality dynamic multi-coil MR reconstruction by learning a complementary
time-frequency domain network that exploits spatio-temporal correlations
simultaneously from complementary domains.
Theory and Methods: Dynamic parallel MR image reconstruction is formulated as
a multi-variable minimisation problem, where the data is regularised in
combined temporal Fourier and spatial (x-f) domain as well as in
spatio-temporal image (x-t) domain. An iterative algorithm based on variable
splitting technique is derived, which alternates among signal de-aliasing steps
in x-f and x-t spaces, a closed-form point-wise data consistency step and a
weighted coupling step. The iterative model is embedded into a deep recurrent
neural network which learns to recover the image via exploiting spatio-temporal
redundancies in complementary domains.
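The closed-form point-wise data-consistency step mentioned above can be sketched as follows (a simplified single-coil, Cartesian toy version with a hypothetical noise weight `lam`; the paper's multi-coil x-f/x-t formulation is more involved): the estimate's k-space is blended with the acquired samples at the sampled locations, and left untouched elsewhere.

```python
import numpy as np

def data_consistency(x_est, k_acq, mask, lam=1.0):
    """Point-wise data consistency: at sampled k-space locations, blend the
    current estimate with the acquired data (lam -> infinity replaces them
    exactly); unsampled locations keep the estimate's k-space."""
    k_est = np.fft.fft2(x_est)
    k_out = np.where(mask, (k_est + lam * k_acq) / (1 + lam), k_est)
    return np.fft.ifft2(k_out)

img = np.outer(np.hanning(8), np.hanning(8))          # toy image
mask = np.zeros((8, 8), dtype=bool)
mask[::2, :] = True                                    # sample every other k-line
k_acq = np.fft.fft2(img) * mask
recon = data_consistency(np.zeros_like(img), k_acq, mask, lam=1e6)
```

With a large `lam` the sampled k-space lines of `recon` match the acquired data almost exactly, which is the behaviour the alternating algorithm relies on between de-aliasing steps.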
Results: Experiments were performed on two datasets of highly undersampled
multi-coil short-axis cardiac cine MRI scans. Results demonstrate that our
proposed method outperforms the current state-of-the-art approaches both
quantitatively and qualitatively. The proposed model can also generalise well
to data acquired from a different scanner and data with pathologies that were
not seen in the training set.
Conclusion: The work shows the benefit of reconstructing dynamic parallel MRI
in complementary time-frequency domains with deep neural networks. The method
can effectively and robustly reconstruct high-quality images from highly
undersampled dynamic multi-coil data (yielding 15 s and 10 s scan times,
respectively) with a fast reconstruction speed (2.8 s). This could
potentially facilitate fast single-breath-hold clinical 2D cardiac cine
imaging.
Comment: Accepted by Magnetic Resonance in Medicine
ViTs are Everywhere: A Comprehensive Study Showcasing Vision Transformers in Different Domains
The transformer design is the de facto standard for natural language
processing tasks. Its success in natural language processing has lately
piqued the interest of researchers in the domain of computer vision.
Compared to Convolutional Neural Networks (CNNs), Vision Transformers
(ViTs) are becoming more popular and dominant solutions for many vision
problems. Transformer-based models outperform other types of networks, such
as convolutional and recurrent neural networks, on a range of visual
benchmarks. In this work, we evaluate various vision transformer models by
dividing them into distinct tasks and examining their benefits and
drawbacks. ViTs can overcome several potential difficulties of
convolutional neural networks (CNNs). The goal of this survey is to show
the first uses of ViTs in CV. In the first phase, we categorize the CV
applications for which ViTs are appropriate: image classification, object
identification, image segmentation, video transformers, image denoising,
and neural architecture search (NAS). We then analyze the state of the art
in each area and identify the models that are currently available. In
addition, we outline numerous open research challenges as well as
prospective research directions.
Comment: ICCD-2023. arXiv admin note: substantial text overlap with
arXiv:2208.04309 by other authors
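As background for the ViT models surveyed above, the common first step they all share is splitting an image into fixed-size patches that become the transformer's input tokens. A minimal sketch of that patchify step (the patch size and shapes are illustrative; the linear projection and position embeddings that follow are omitted):

```python
import numpy as np

def patchify(img, p=4):
    """Split an H x W x C image into flattened p x p patches -- the first
    step of a Vision Transformer before linear projection."""
    H, W, C = img.shape
    patches = img.reshape(H // p, p, W // p, p, C).swapaxes(1, 2)
    return patches.reshape(-1, p * p * C)

img = np.arange(32 * 32 * 3, dtype=float).reshape(32, 32, 3)
tokens = patchify(img, p=4)
print(tokens.shape)   # (64, 48): 8x8 patches, each 4*4*3 values
```

Each of the 64 rows would then be linearly projected to the model dimension and processed by standard transformer self-attention, with no convolutional inductive bias.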