22 research outputs found
Adaptive Detrending for Accelerating the Training of Convolutional Recurrent Neural Networks
Convolutional recurrent neural networks (ConvRNNs) provide robust spatio-temporal information processing capabilities for contextual video recognition, but require extensive computation that slows down training. Inspired by detrending methods, we propose "adaptive detrending" (AD) for temporal normalization in order to accelerate the training of ConvRNNs, especially of the convolutional gated recurrent unit (ConvGRU).
Adaptive detrending to accelerate convolutional gated recurrent unit training for contextual video recognition
Video image recognition has been extensively studied, with rapid progress in recent years. However, most methods focus on short-term rather than long-term (contextual) video recognition. Convolutional recurrent neural networks (ConvRNNs) provide robust spatio-temporal information processing capabilities for contextual video recognition, but require extensive computation that slows down training. Inspired by normalization and detrending methods, in this paper we propose "adaptive detrending" (AD) for temporal normalization in order to accelerate the training of ConvRNNs, especially of the convolutional gated recurrent unit (ConvGRU). For each neuron in a recurrent neural network (RNN), AD identifies the trending change within a sequence and subtracts it, removing the internal covariate shift. In experiments testing contextual video recognition with ConvGRU, results show that (1) ConvGRU clearly outperforms feed-forward neural networks, (2) AD consistently and significantly accelerates training and improves generalization, (3) performance is further improved when AD is coupled with other normalization methods, and, most importantly, (4) the more long-term contextual information is required, the more AD outperforms existing methods.
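To make the detrending step concrete, here is a minimal numpy sketch that subtracts a running per-neuron trend from a sequence of hidden activations; the exponential-moving-average trend estimator and the rate alpha are illustrative assumptions, not the paper's exact formulation.
```python
import numpy as np

def adaptive_detrend(h_seq, alpha=0.9):
    """Remove a slowly varying per-neuron trend from a sequence of activations.

    h_seq : array of shape (T, n_units), one activation sequence per column.
    alpha : smoothing rate of the exponential moving average used here as the
            trend estimate (an illustrative choice; the paper derives its own
            adaptive estimator).
    """
    trend = np.zeros(h_seq.shape[1])
    out = np.empty_like(h_seq)
    for t, h in enumerate(h_seq):
        trend = alpha * trend + (1.0 - alpha) * h  # running trend estimate
        out[t] = h - trend                         # subtract the trend
    return out

# toy check: a drifting sinusoid keeps its oscillation but loses the drift
t = np.arange(200)
seq = (np.sin(0.2 * t) + 0.02 * t)[:, None]
print(adaptive_detrend(seq)[-5:].round(2))
```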
An Improved Time Feedforward Connections Recurrent Neural Networks
Recurrent Neural Networks (RNNs) have been widely applied to temporal problems such as flood forecasting and financial data processing. On the one hand, traditional RNN models amplify the gradient problem because of their strict serial dependency over time, making it difficult to realize a long-term memory function. On the other hand, RNN cells are highly complex, which significantly increases computational complexity and wastes computational resources during model training. In this paper, an improved Time Feedforward Connections Recurrent Neural Networks (TFC-RNNs) model was first proposed to address the gradient problem: a parallel branch was introduced so that the hidden state at time t-2 is transferred directly to time t, bypassing the nonlinear transformation at time t-1, which is effective in improving the long-term dependence of RNNs. Then, a novel cell structure named Single Gate Recurrent Unit (SGRU) was presented; it reduces the number of parameters in the RNN cell and consequently the computational complexity. Applying SGRU within TFC-RNNs as the new TFC-SGRU model addresses both difficulties. Finally, the performance of the proposed TFC-SGRU was verified through several experiments on long-term memory and anti-interference capabilities. Experimental results demonstrated that the TFC-SGRU model can capture useful information across 1,500 time steps and effectively filter out noise, and that its accuracy exceeds that of LSTM and GRU models on language-processing tasks.
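To make the two ideas concrete, the following numpy sketch shows a single-gate cell with a t-2 skip branch. The class name TFCSGRUCell, the weight layout, and the exact way the skip state enters the update are assumptions made for illustration; only the structural ideas (one gate instead of several, and a direct h_{t-2} path) come from the abstract.
```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TFCSGRUCell:
    """Single-gate recurrent cell with a time-feedforward (t-2 -> t) skip path.

    The gate and candidate equations are an illustrative guess at a
    single-gate design; the paper's exact SGRU parameterization may differ.
    """

    def __init__(self, n_in, n_hid, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(n_hid)
        self.Wg = rng.uniform(-s, s, (n_hid, n_in + n_hid))  # single gate
        self.Wc = rng.uniform(-s, s, (n_hid, n_in + n_hid))  # candidate

    def run(self, x_seq):
        n_hid = self.Wg.shape[0]
        h_prev = np.zeros(n_hid)   # h_{t-1}
        h_prev2 = np.zeros(n_hid)  # h_{t-2}
        states = []
        for x in x_seq:
            z = np.concatenate([x, h_prev])
            g = sigmoid(self.Wg @ z)   # one gate replaces GRU's two
            c = np.tanh(self.Wc @ z)   # candidate state
            # parallel branch: h_{t-2} reaches time t directly, skipping the
            # nonlinear transformation at t-1 (averaged here for stability)
            h = g * 0.5 * (h_prev + h_prev2) + (1.0 - g) * c
            states.append(h)
            h_prev2, h_prev = h_prev, h
        return np.stack(states)

cell = TFCSGRUCell(n_in=4, n_hid=8)
out = cell.run(np.random.default_rng(1).normal(size=(10, 4)))
print(out.shape)  # (10, 8)
```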
Learning Representations that Support Extrapolation
Extrapolation -- the ability to make inferences that go beyond the scope of one's experiences -- is a hallmark of human intelligence. By contrast, the generalization exhibited by contemporary neural network algorithms is largely limited to interpolation between data points in their training corpora. In this paper, we consider the challenge of learning representations that support extrapolation. We introduce a novel visual analogy benchmark that allows the graded evaluation of extrapolation as a function of distance from the convex domain defined by the training data. We also introduce a simple technique, temporal context normalization, that encourages representations that emphasize the relations between objects. We find that this technique enables a significant improvement in the ability to extrapolate, considerably outperforming a number of competitive techniques. (ICML 2020)
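A minimal sketch of temporal context normalization, assuming its simplest form: z-scoring each feature over the items of one context (the published method additionally learns per-feature gain and shift parameters, omitted here).
```python
import numpy as np

def temporal_context_norm(z, eps=1e-8):
    """Z-score each feature dimension over the temporal (context) axis.

    z : array of shape (T, d) -- the T embeddings that make up one context.
    Statistics are computed within the sequence itself, so what survives is
    how each item relates to the others rather than their absolute values.
    """
    mu = z.mean(axis=0, keepdims=True)
    sigma = z.std(axis=0, keepdims=True)
    return (z - mu) / (sigma + eps)

# shifting a whole context leaves the normalized representation unchanged,
# which is the property that helps extrapolation beyond the training domain
rng = np.random.default_rng(0)
ctx = rng.normal(size=(6, 16))
assert np.allclose(temporal_context_norm(ctx), temporal_context_norm(ctx + 5.0))
```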
Machine learning to model health with multimodal mobile sensor data
The widespread adoption of smartphones and wearables has led to the accumulation of rich datasets, which could aid the understanding of behavior and health in unprecedented detail. At the same time, machine learning and specifically deep learning have reached impressive performance in a variety of prediction tasks, but their use on time-series data appears challenging. Existing models struggle to learn from this unique type of data due to noise, sparsity, long-tailed distributions of behaviors, lack of labels, and multimodality.
This dissertation addresses these challenges by developing new models that leverage multi-task learning for accurate forecasting, multimodal fusion for improved population subtyping, and self-supervision for learning generalized representations. We apply our proposed methods to challenging real-world tasks of predicting mental health and cardio-respiratory fitness through sensor data.
First, we study the relationship of passive data collected from smartphones (movement and background audio) to momentary mood levels. Our new training pipeline, which combines different sensor data into a low-dimensional embedding and uses clusters of longitudinal user trajectories as outcomes, outperforms traditional approaches based solely on psychology questionnaires. Second, motivated by mood instability as a predictor of poor mental health, we propose encoder-decoder models for time-series forecasting that exploit the bi-modality of mood with multi-task learning.
Next, motivated by the success of general-purpose models in vision and language tasks, we propose a self-supervised neural network ready to use as a feature extractor for wearable data. To this end, we set heart rate responses as the supervisory signal for activity data, leveraging their underlying physiological relationship, and show that the resulting task-agnostic embeddings generalize to structurally different downstream outcomes (e.g., BMI, age, energy expenditure) through transfer learning, outperforming unsupervised autoencoders and biomarkers. Finally, acknowledging fitness as a strong predictor of overall health that can, however, only be measured with expensive instruments (e.g., a VO2max test), we develop models that enable accurate prediction of fine-grained fitness levels with wearables in the present and, more importantly, of its direction and magnitude almost a decade later.
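The heart-rate pretext task can be sketched briefly. In the PyTorch snippet below, the encoder architecture, layer sizes, and names are illustrative assumptions; only the training signal, activity windows regressing matched heart-rate responses, follows the text.
```python
import torch
from torch import nn

class ActivityEncoder(nn.Module):
    """1-D conv encoder over accelerometer windows. Architecture, sizes, and
    names are illustrative; the dissertation's actual model may differ."""
    def __init__(self, n_channels=3, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, emb_dim),
        )

    def forward(self, x):  # x: (batch, channels, time)
        return self.net(x)

encoder = ActivityEncoder()
hr_head = nn.Linear(64, 1)  # pretext head: regress the heart-rate response

x = torch.randn(8, 3, 300)  # 8 windows of 3-axis accelerometry
hr = torch.randn(8, 1)      # matched heart-rate targets (the supervisory signal)
loss = nn.functional.mse_loss(hr_head(encoder(x)), hr)
loss.backward()  # after pretraining, encoder(x) is reused as a task-agnostic
                 # embedding for downstream outcomes (e.g. BMI, age) via transfer
```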
All proposed methods are evaluated on large longitudinal datasets with tens of thousands of participants in the wild. The models developed and the insights drawn in this dissertation provide evidence for a better understanding of high-dimensional behavioral and physiological data, with implications for large-scale health and lifestyle monitoring. This work was supported by the Department of Computer Science and Technology at the University of Cambridge through the EPSRC DTP Grant (EP/N509620/1), and by the Embiricos Trust Scholarship of Jesus College, Cambridge.
Physics-constrained Hyperspectral Data Exploitation Across Diverse Atmospheric Scenarios
Hyperspectral target detection promises new operational advantages as instrument spectral resolution increases and material discrimination becomes more robust. Resolving surface materials requires fast and accurate accounting of atmospheric effects to increase detection accuracy while minimizing false alarms. This dissertation investigates deep learning methods, constrained by the processes governing radiative transfer, to efficiently perform atmospheric compensation on data collected by long-wave infrared (LWIR) hyperspectral sensors. These compensation methods depend on generative modeling techniques and permutation-invariant neural network architectures to predict LWIR spectral radiometric quantities. The compensation algorithms developed in this work were examined from the perspective of target detection performance using collected data. These deep learning-based compensation algorithms achieved detection performance comparable to established methods while accelerating the image processing chain by 8x.
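Permutation invariance of the kind mentioned here can be illustrated with a generic DeepSets-style sum-pooling encoder; everything in this sketch (weights, sizes, the ReLU nonlinearity) is a stand-in, since the abstract does not detail the actual architectures.
```python
import numpy as np

def set_encode(spectra, W_phi, W_rho):
    """Generic DeepSets-style permutation-invariant encoder (a sketch; the
    dissertation's actual architectures are not specified in this abstract).

    spectra : (n_pixels, n_bands) -- an unordered set of pixel spectra.
    """
    phi = np.maximum(spectra @ W_phi, 0.0)  # per-element embedding (ReLU)
    pooled = phi.sum(axis=0)                # sum pooling removes ordering
    return np.maximum(pooled @ W_rho, 0.0)  # set-level representation

rng = np.random.default_rng(0)
W_phi, W_rho = rng.normal(size=(64, 32)), rng.normal(size=(32, 16))
pixels = rng.normal(size=(100, 64))  # 100 pixels, 64 LWIR bands

# any reordering of the pixel set yields the same encoding
assert np.allclose(set_encode(pixels, W_phi, W_rho),
                   set_encode(pixels[::-1], W_phi, W_rho))
```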
Intelligent Biosignal Analysis Methods
This book describes recent efforts to improve intelligent systems for automatic biosignal analysis. It focuses on machine learning and deep learning methods used to classify different organism states and disorders based on biomedical signals such as EEG, ECG, HRV, and others.
Improving Demand Forecasting: The Challenge of Forecasting Studies Comparability and a Novel Approach to Hierarchical Time Series Forecasting
Demand forecasts are essential in business. Based on expected customer demand, companies decide, for example, which products to develop, how many factories to build, how much staff to hire, or how much raw material to order. Errors in demand forecasting can have serious consequences, lead to bad decisions, and in the worst case drive a company into bankruptcy.
In many cases, however, anticipating actual future demand is complex. The influencing factors can be manifold, for example macroeconomic developments, the behavior of competitors, or technological change. Even when all influencing factors are known, their relationships and interactions are often hard to quantify.
This dissertation contributes to improving the accuracy of demand forecasts.
In the first part of the work, within a comprehensive survey of the full spectrum of application fields for demand forecasting, a novel approach for systematically comparing demand forecasting studies is introduced and applied to 116 recent studies. Improving the comparability of studies is a substantial contribution to current research: unlike in medical research, for example, there are no major comparative quantitative meta-studies for demand forecasting, because empirical forecasting studies do not use a unified scheme to describe their data, methods, and results. If studies can instead be compared directly through systematic description, other researchers can better analyze how variations in approach affect forecast accuracy, without the costly need to repeat empirical experiments that published studies already describe. This work is the first to introduce such a descriptive framework.
The remainder of the work addresses forecasting methods for intermittent time series, i.e., time series in which a substantial share of demands is zero. Such series violate the continuity assumptions of most forecasting methods, so common methods often achieve insufficient forecast accuracy. Nevertheless, intermittent time series are highly relevant; spare parts in particular typically exhibit this demand pattern. First, three studies in this work show that even the tested state-of-the-art machine learning approaches bring no general improvement on several well-known datasets. As a key research contribution, the work then presents a novel method: the Similarity-based Time Series Forecasting (STSF) approach, which uses an aggregation-disaggregation procedure based on a self-generated hierarchy of statistical properties of the time series. Any available forecasting algorithm can be used in combination with STSF, since the aggregation satisfies the continuity requirement. In experiments on seven publicly available datasets and one proprietary dataset, the work shows that forecast accuracy (measured by the root mean square error, RMSE) improves by a statistically significant 1-5% on average compared with the same algorithm used without STSF. The method therefore yields a substantial improvement in forecast accuracy.
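A minimal sketch of the aggregation-disaggregation idea behind STSF follows; the grouping criterion (share of zero demands), the number of groups, and the proportional disaggregation are simplifying assumptions, not the dissertation's exact procedure.
```python
import numpy as np

def stsf_forecast(series, base_forecaster, n_groups=2):
    """Aggregation-disaggregation in the spirit of STSF. The grouping feature
    and the proportional split-back are simplifying assumptions; the
    dissertation builds a hierarchy of statistical properties instead.

    series : (n_series, T) intermittent demand histories.
    base_forecaster : any function mapping a 1-D history to a one-step forecast.
    """
    zero_share = (series == 0).mean(axis=1)            # statistical property
    groups = np.array_split(np.argsort(zero_share), n_groups)

    pred = np.zeros(len(series))
    for idx in groups:
        agg = series[idx].sum(axis=0)                  # aggregate is smoother,
        agg_hat = base_forecaster(agg)                 # so any algorithm applies
        shares = series[idx].sum(axis=1)
        shares = shares / max(shares.sum(), 1e-12)     # historical proportions
        pred[idx] = agg_hat * shares                   # disaggregate
    return pred

# usage with a trivial moving-average base forecaster
rng = np.random.default_rng(1)
demand = rng.poisson(0.3, size=(12, 50)).astype(float)
print(stsf_forecast(demand, lambda h: h[-8:].mean()).round(3))
```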
In summary, this dissertation contributes substantially to the current state of research through the methods described above. The proposed scheme for standardizing empirical studies accelerates research progress by enabling comparative studies, and the STSF method provides an approach that reliably improves forecast accuracy and can be used flexibly with different kinds of forecasting algorithms. To the best of the knowledge gained from the comprehensive literature review, no comparable approaches have been described to date.