23 research outputs found
Super-resolution of synthetic aperture radar complex data by deep-learning
One of the greatest limitations of Synthetic Aperture Radar (SAR) imagery is the capability to obtain an arbitrarily high spatial resolution. Indeed, unlike optical sensors, this capability is not limited only by the sensor technology. Instead, improving the SAR spatial resolution requires a large transmitted bandwidth and relatively long synthetic apertures, which for regulatory and practical reasons cannot be met. This issue becomes particularly relevant when dealing with Stripmap mode acquisitions and with relatively low carrier frequency sensors (where relatively large bandwidth signals are more difficult to transmit). To overcome this limitation, in this paper a deep-learning-based framework is proposed to enhance the SAR image spatial resolution while retaining the complex image accuracy. Results on simulated and real SAR data demonstrate the effectiveness of the proposed framework.
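The abstract does not state how the complex-valued SAR data enters the network. A common way to feed complex imagery to a real-valued CNN is to split it into real and imaginary channels and recombine after processing; the sketch below shows only that (hypothetical) input encoding, not the paper's actual architecture.

```python
import numpy as np

def complex_to_channels(img: np.ndarray) -> np.ndarray:
    """Stack real and imaginary parts of a complex SAR image as two channels,
    yielding a (2, H, W) real array a standard CNN can consume."""
    return np.stack([img.real, img.imag], axis=0)

def channels_to_complex(chans: np.ndarray) -> np.ndarray:
    """Inverse mapping: recombine the two channels into a complex image,
    so phase information survives the round trip."""
    return chans[0] + 1j * chans[1]
```

The round trip is lossless, which is what "retaining the complex image accuracy" requires of any such encoding.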
Early Classifying Multimodal Sequences
Often pieces of information are received sequentially over time. When has one
collected enough such pieces to classify? Trading wait time for decision
certainty leads to early classification problems that have recently gained
attention as a means of adapting classification to more dynamic environments.
However, so far results have been limited to unimodal sequences. In this pilot
study, we expand into early classifying multimodal sequences by combining
existing methods. We show our new method yields experimental AUC advantages of
up to 8.7%.
Comment: 7 pages, 5 figures
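The abstract describes the wait-time/certainty trade-off without giving the halting rule. A minimal (hypothetical) illustration of early classification is a confidence threshold: consume per-step class probabilities and stop as soon as the maximum probability clears the threshold.

```python
import numpy as np

def early_classify(prob_seq, threshold=0.9):
    """Scan per-step class-probability vectors; halt at the first step whose
    top probability reaches the threshold. Returns (predicted_class, stop_step).
    Falls back to the final step if the threshold is never reached."""
    for t, probs in enumerate(prob_seq):
        if np.max(probs) >= threshold:
            return int(np.argmax(probs)), t
    return int(np.argmax(prob_seq[-1])), len(prob_seq) - 1
```

Lowering the threshold trades decision certainty for earlier stops, which is exactly the axis the paper's AUC comparison measures.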
Intent recognition in smart living through deep recurrent neural networks
Electroencephalography (EEG) signal based intent recognition has recently
attracted much attention in both academia and industry, as it helps elderly
or motor-disabled people control smart devices to communicate with the outer
world. However, the utilization of EEG signals is challenged by low accuracy
and by arduous, time-consuming feature extraction. This paper proposes a
7-layer deep learning model to classify raw EEG signals with the aim of
recognizing subjects' intents, avoiding the time consumed in pre-processing and
feature extraction. The hyper-parameters are selected by an Orthogonal Array
experiment method for efficiency. Our model is applied to an open EEG dataset
provided by PhysioNet and achieves an accuracy of 0.9553 on intent
recognition. The applicability of our proposed model is further demonstrated by
two use cases of smart living (assisted living with robotics and home
automation).
Comment: 10 pages, 5 figures, 5 tables, 21 conference
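The Orthogonal Array experiment method mentioned above can be sketched with the standard Taguchi L4(2^3) array: four runs cover three two-level factors so that every pair of factors sees each level combination equally often, instead of the eight runs a full grid would need. The factor choices below (learning rate, batch size, dropout) are illustrative assumptions, not the paper's actual search space.

```python
# Taguchi L4(2^3) orthogonal array: 4 runs for 3 two-level factors.
L4 = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]

def oa_configs(levels):
    """Map the L4 array onto concrete hyper-parameter values.
    `levels` is a list of three (low, high) pairs, one per factor."""
    return [tuple(levels[i][row[i]] for i in range(3)) for row in L4]

# Example: learning rate, batch size, dropout — 4 trials instead of 2^3 = 8.
configs = oa_configs([(1e-3, 1e-2), (32, 64), (0.2, 0.5)])
```

Each config would then be trained once and the best-scoring factor levels selected, which is the efficiency gain the abstract refers to.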
Public Transit Arrival Prediction: a Seq2Seq RNN Approach
Arrival/Travel times for public transit exhibit variability on account of
factors like seasonality, dwell times at bus stops, traffic signals, travel
demand fluctuation etc. The developing world in particular is plagued by
additional factors like lack of lane discipline, excess vehicles, diverse modes
of transport and so on. This renders bus arrival time prediction (BATP) a
challenging problem, especially in the developing world. A novel
data-driven model based on recurrent neural networks (RNNs) is proposed for
BATP (in real-time) in the current work. The model intelligently incorporates
both spatial and temporal correlations in a unique (non-linear) fashion
distinct from existing approaches. In particular, we propose a Gated Recurrent
Unit (GRU) based Encoder-Decoder (ED), or Seq2Seq, RNN model (originally
introduced for language translation) for BATP. The geometry of the dynamic real
time BATP problem enables a nice fit with the Encoder-Decoder based RNN
structure. We feed relevant additional synchronized inputs (from previous
trips) at each step of the decoder (a feature classically unexplored in machine
translation applications). Further, motivated by the need to accurately model
congestion influences on travel time prediction, we additionally propose to use
a bidirectional layer at the decoder (something unexplored in other time-series
based ED application contexts). The effectiveness of the proposed algorithms is
demonstrated on real field data collected from challenging traffic conditions.
Our experiments indicate that the proposed method outperforms diverse existing
state-of-the-art data-driven approaches proposed for the same problem.
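The decoder-side idea above — feeding a synchronized exogenous input (e.g. the previous trip's travel time on the same segment) at each decoder step — can be sketched with a minimal NumPy GRU. Everything here is an illustrative assumption (random weights, biases omitted, no bidirectional layer), not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal NumPy GRU cell (biases omitted for brevity)."""
    def __init__(self, in_dim, hid_dim, seed=0):
        rng = np.random.default_rng(seed)
        w, u = (hid_dim, in_dim), (hid_dim, hid_dim)
        self.Wz, self.Uz = rng.normal(0, 0.1, w), rng.normal(0, 0.1, u)
        self.Wr, self.Ur = rng.normal(0, 0.1, w), rng.normal(0, 0.1, u)
        self.Wh, self.Uh = rng.normal(0, 0.1, w), rng.normal(0, 0.1, u)

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)          # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h)          # reset gate
        h_cand = np.tanh(self.Wh @ x + self.Uh @ (r * h))
        return (1 - z) * h + z * h_cand

def decode_with_exogenous(cell, h0, first_input, exo_seq, W_out):
    """Decoder loop: at each step the previous prediction is concatenated
    with the synchronized exogenous input before entering the GRU cell —
    the feature the abstract notes is unexplored in machine translation."""
    h, prev_pred, preds = h0, first_input, []
    for exo in exo_seq:
        x = np.concatenate([prev_pred, exo])
        h = cell.step(x, h)
        prev_pred = W_out @ h
        preds.append(prev_pred)
    return np.array(preds)
```

The encoder (which would produce `h0` from the current trip's observed section times) and the bidirectional decoder layer are omitted to keep the sketch short.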
Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks
Convolutional residual neural networks (ConvResNets), though
overparameterized, can achieve remarkable prediction performance in practice,
which cannot be well explained by conventional wisdom. To bridge this gap, we
study the performance of ConvResNeXts, which cover ConvResNets as a special
case, trained with weight decay from the perspective of nonparametric
classification. Our analysis allows for infinitely many building blocks in
ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these
blocks. Specifically, we consider a smooth target function supported on a
low-dimensional manifold, then prove that ConvResNeXts can adapt to the
function smoothness and low-dimensional structures and efficiently learn the
function without suffering from the curse of dimensionality. Our findings
partially justify the advantage of overparameterized ConvResNeXts over
conventional machine learning models.
Comment: 20 pages, 1 figure
TriPlaneNet: An Encoder for EG3D Inversion
Recent progress in NeRF-based GANs has introduced a number of approaches for
high-resolution and high-fidelity generative modeling of human heads with a
possibility for novel view rendering. At the same time, one must solve an
inverse problem to be able to re-render or modify an existing image or video.
Despite the success of universal optimization-based methods for 2D GAN
inversion, those applied to 3D GANs may fail to extrapolate the result onto the
novel view, whereas optimization-based 3D GAN inversion methods are
time-consuming and can require at least several minutes per image. Fast
encoder-based techniques, such as those developed for StyleGAN, may also be
less appealing due to the lack of identity preservation. Our work introduces a
fast technique that bridges the gap between the two approaches by directly
utilizing the tri-plane representation presented for the EG3D generative model.
In particular, we build upon a feed-forward convolutional encoder for the
latent code and extend it with a fully-convolutional predictor of tri-plane
numerical offsets. The renderings are similar in quality to the ones produced
by optimization-based techniques and outperform the ones by encoder-based
methods. As we empirically prove, this is a consequence of directly operating
in the tri-plane space, not in the GAN parameter space, while making use of an
encoder-based trainable approach. Finally, we demonstrate significantly more
accurate embedding of a face image in 3D than all the baselines, further
strengthened by a probably symmetric prior enabled during training.
Comment: Project page: https://anantarb.github.io/triplanene
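The tri-plane representation the encoder operates on stores features in three axis-aligned 2D planes; a 3D point's feature is the aggregate of its three projections. The sketch below uses nearest-neighbour lookup and summation for brevity (EG3D uses bilinear sampling); the function names and the additive offset correction are illustrative of the idea, not the paper's code.

```python
import numpy as np

def sample_triplane(planes, pts):
    """Sample features for 3D points from three axis-aligned feature planes.
    planes: (3, C, H, W) array for the XY, XZ and YZ planes.
    pts: (N, 3) array with coordinates in [0, 1)."""
    _, C, H, W = planes.shape
    feats = np.zeros((pts.shape[0], C))
    # (plane index, coordinate pair) for the XY, XZ, YZ projections
    proj = [(0, (0, 1)), (1, (0, 2)), (2, (1, 2))]
    for p, (a, b) in proj:
        u = np.clip((pts[:, a] * W).astype(int), 0, W - 1)
        v = np.clip((pts[:, b] * H).astype(int), 0, H - 1)
        feats += planes[p, :, v, u]  # sum contributions from the three planes
    return feats

def corrected_triplanes(base_planes, offsets):
    """TriPlaneNet-style correction: add the predicted per-pixel offsets to
    the tri-planes generated from the encoder's latent code."""
    return base_planes + offsets
```

Because the offsets act directly in tri-plane space rather than in the GAN's latent space, corrections are pinned to spatial locations — the property the abstract credits for view-consistent re-rendering.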