Who Is the Rightful Owner? Young Children’s Ownership Judgments in Different Transfer Contexts
This study aimed to examine whether Chinese preschoolers understand that ownership can be transferred in different contexts. The study participants were 3- to 5-year-old Chinese children (n = 96) and adults (n = 34). With four scenarios that contained different transfer types (giving, stealing, losing, and abandoning), participants were asked four questions about ownership. The results indicated that preschoolers' ability to distinguish legitimate ownership transfers from illegitimate ones improved with age. Three-year-olds understood that ownership cannot be transferred in a stealing context, but an appropriate understanding of ownership was not attained until age 4 in the giving context and age 5 in the losing and abandoning contexts, at which point performance was similar to that of adults. In addition to the first-possessor bias (a tendency to judge the first possessor as the owner) found in previous studies, 3-year-olds also displayed a loan bias (a tendency to believe that everything transferred should be returned). The findings suggest that the developmental trajectories of preschoolers' understanding of ownership transfers varied across contexts, which may relate to children's ability to consider the role of intent in determining ownership and to parents' disciplinary behavior. Both cross-cultural similarities and differences are discussed.
Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
It is well known that stochastic gradient noise (SGN) acts as implicit regularization for deep learning and is essential for both the optimization and the generalization of deep networks. Some works have attempted to
artificially simulate SGN by injecting random noise to improve deep learning.
However, such injected simple random noise turns out not to work as well as SGN, which is anisotropic and parameter-dependent. To simulate SGN
at low computational costs and without changing the learning rate or batch
size, we propose the Positive-Negative Momentum (PNM) approach that is a
powerful alternative to conventional Momentum in classic optimizers. The PNM method maintains two approximately independent momentum terms.
Then, we can control the magnitude of SGN explicitly by adjusting the momentum
difference. We theoretically prove the convergence guarantee and the
generalization advantage of PNM over Stochastic Gradient Descent (SGD). By
incorporating PNM into two conventional optimizers, SGD with Momentum and Adam, we empirically verify through extensive experiments the significant advantage of the PNM-based variants over the corresponding conventional Momentum-based optimizers.
Comment: ICML 2021; 20 pages; 13 figures. Key Words: deep learning theory, optimizer, momentum, generalization, gradient noise.
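To make the positive-negative combination concrete, here is a minimal NumPy sketch of the idea: two momentum buffers are updated on alternating mini-batches, so each accumulates gradient noise from a roughly independent half of the gradients, and the update weights them positively and negatively so that their difference, and hence the effective gradient noise, is set by a single coefficient. The function `pnm_sgd`, its `grad_fn` argument, and the default hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pnm_sgd(grad_fn, w0, lr=0.01, beta=0.9, beta0=1.0, steps=100):
    """Simplified sketch of Positive-Negative Momentum on top of SGD.

    grad_fn(w) is assumed to return a stochastic gradient at w.
    Two momentum buffers are updated on alternating steps; the update
    combines them with weights (1 + beta0) and -beta0, which sum to 1,
    so the mean direction is preserved while the noise is amplified.
    """
    w = np.asarray(w0, dtype=float).copy()
    m = [np.zeros_like(w), np.zeros_like(w)]   # two momentum buffers
    for t in range(steps):
        g = grad_fn(w)                         # stochastic gradient
        cur, prev = t % 2, (t + 1) % 2         # alternate active buffer
        m[cur] = beta * m[cur] + (1 - beta) * g
        # positive-negative combination: beta0 sets the magnitude of the
        # injected noise without touching the learning rate or batch size
        d = (1 + beta0) * m[cur] - beta0 * m[prev]
        w -= lr * d
    return w
```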
Amata: An Annealing Mechanism for Adversarial Training Acceleration
Despite their empirical success in various domains, deep neural networks have been shown to be vulnerable to maliciously perturbed input data that can severely degrade their performance, a phenomenon known as adversarial attacks. To
counter adversarial attacks, adversarial training formulated as a form of
robust optimization has been demonstrated to be effective. However, adversarial training incurs substantial computational overhead compared with standard training. To reduce this cost, we propose an annealing mechanism, Amata. The proposed Amata is provably convergent, is well motivated from the lens of optimal control theory, and can be combined with existing acceleration methods
to further enhance performance. It is demonstrated that on standard datasets,
Amata can achieve similar or better robustness with around 1/3 to 1/2 the
computational time compared with traditional methods. In addition, Amata can be
incorporated into other adversarial training acceleration algorithms (e.g.
YOPO, Free, Fast, and ATTA), which leads to further reduction in computational
time on large-scale problems.
Comment: accepted by AAAI.
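The abstract does not spell out the schedule, so the PyTorch sketch below only illustrates the general annealing idea behind such acceleration methods: start adversarial training with few, cheap inner attack steps and anneal toward more, stronger steps as training progresses. The helper `annealed_pgd_steps`, its bounds `k_min`/`k_max`, and the step-size rule are illustrative assumptions; Amata's actual schedule is derived from optimal control and may differ.

```python
import torch
import torch.nn.functional as F

def annealed_pgd_steps(epoch, total_epochs, k_min=1, k_max=10):
    """Anneal the number of inner PGD steps from k_min up to k_max.

    Early epochs use few, cheap attack steps; later epochs use more,
    stronger ones. This is the schematic idea behind annealing-based
    adversarial training acceleration, not Amata's exact schedule.
    """
    frac = epoch / max(1, total_epochs - 1)
    return round(k_min + frac * (k_max - k_min))

def pgd_attack(model, x, y, eps=8 / 255, n_steps=10):
    """Standard L-infinity PGD, with step size scaled to n_steps so the
    total perturbation budget stays roughly constant as steps anneal."""
    alpha = 1.5 * eps / n_steps
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(n_steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).detach()
```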
Towards Making Deep Transfer Learning Never Hurt
Transfer learning has been frequently used to improve deep neural network training by incorporating the weights of pre-trained networks as the starting point of optimization and as a source of regularization. While deep transfer learning
can usually boost the performance with better accuracy and faster convergence,
transferring weights from inappropriate networks can hurt the training procedure and may lead to even lower accuracy. In this paper, we consider deep transfer learning as minimizing a linear combination of the empirical loss and a regularizer
based on pre-trained weights, where the regularizer can restrict the training procedure from lowering the empirical loss when the two terms have conflicting descent directions (e.g., opposing derivatives). Following this view, we propose a novel strategy making
regularization-based Deep Transfer learning Never Hurt (DTNH) that, for each
iteration of the training procedure, computes the derivatives of the two terms separately, then re-estimates a new descent direction that does not hurt the empirical loss minimization while preserving the regularization effects from
the pre-trained weights. Extensive experiments have been done using common
transfer learning regularizers, such as L2-SP and knowledge distillation, on
top of a wide range of deep transfer learning benchmarks including Caltech, MIT
indoor 67, CIFAR-10 and ImageNet. The empirical results show that the proposed
descent direction estimation strategy DTNH consistently improves the performance of deep transfer learning tasks based on all of the above regularizers, even when transferring pre-trained weights from inappropriate networks. All in all, the DTNH strategy improves on state-of-the-art regularizers in all cases, with 0.1%-7% higher accuracy across all experiments.
Comment: 10 pages.
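The abstract's key mechanism, re-estimating a descent direction that cannot increase the empirical loss, can be sketched as a simple gradient projection. The helper below is an illustrative assumption consistent with the description above, not the paper's exact rule: when the regularizer's gradient conflicts with the loss gradient (negative inner product), the conflicting component is projected out before the two are combined.

```python
import numpy as np

def conflict_free_direction(grad_loss, grad_reg, reg_weight=0.1):
    """Sketch of a DTNH-style conflict-free descent direction.

    The derivatives of the empirical loss and of the pre-trained-weight
    regularizer are computed separately; if the (weighted) regularizer
    gradient opposes the loss gradient, its conflicting component is
    removed, so the combined direction never increases the empirical
    loss to first order.
    """
    g = grad_loss.ravel()
    r = reg_weight * grad_reg.ravel()
    dot = g @ r
    if dot < 0:  # conflict: strip the component of r opposing g
        r = r - (dot / (g @ g + 1e-12)) * g
    return (g + r).reshape(grad_loss.shape)
```

Descending along the returned direction then lowers the empirical loss while retaining as much of the regularization signal as possible.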
Individualized music induces theta-gamma phase-amplitude coupling in patients with disorders of consciousness
Objective: This study aimed to determine whether patients with disorders of consciousness (DoC) could experience neural entrainment to individualized music, exploring the cross-modal influences of music on patients with DoC through phase-amplitude coupling (PAC). Furthermore, the study assessed the efficacy of individualized music, or preferred music (PM), versus relaxing music (RM) in impacting patient outcomes, and examined the role of cross-modal influences in determining these outcomes.
Methods: Thirty-two patients with DoC [17 with vegetative state/unresponsive wakefulness syndrome (VS/UWS) and 15 with minimally conscious state (MCS)], alongside 16 healthy controls (HCs), were recruited for this study. Neural activities in the frontal–parietal network were recorded using scalp electroencephalography (EEG) during baseline (BL), RM, and PM. Cerebral-acoustic coherence (CACoh) was used to investigate participants' ability to track the music, while PAC was used to evaluate the cross-modal influences of music. Three months post-intervention, the outcomes of patients with DoC were followed up using the Coma Recovery Scale-Revised (CRS-R).
Results: HCs and patients with MCS showed higher CACoh compared to VS/UWS patients within the musical pulse frequency (p = 0.016, p = 0.045; p < 0.001, p = 0.048, for RM and PM, respectively, following Bonferroni correction). Only theta-gamma PAC demonstrated a significant interaction effect between groups and music conditions (F(2,44) = 2.685, p = 0.036). For HCs, theta-gamma PAC in the frontal–parietal network was stronger in the PM condition than in the RM (p = 0.016) and BL (p < 0.001) conditions. For patients with MCS, theta-gamma PAC was stronger in PM than in BL (p = 0.040), while no difference was observed among the three conditions in patients with VS/UWS. Additionally, MCS patients who showed improved outcomes after 3 months exhibited evident neural responses to preferred music (p = 0.019). Furthermore, the ratio of theta-gamma coupling changes in PM relative to BL could predict clinical outcomes in MCS patients (r = 0.992, p < 0.001).
Conclusion: Individualized music may serve as a potential therapeutic method for patients with DoC through cross-modal influences, which rely on enhanced theta-gamma PAC within the consciousness-related network.
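For readers unfamiliar with the metric, theta-gamma PAC measures how strongly the amplitude of fast gamma activity is modulated by the phase of slow theta activity. The SciPy sketch below shows one common way to estimate it (a Canolty-style mean-vector-length index); the band edges and the estimator are illustrative assumptions, and the study's exact PAC computation may differ.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def theta_gamma_pac(eeg, fs, theta=(4.0, 8.0), gamma=(30.0, 45.0)):
    """Mean-vector-length estimate of theta-gamma phase-amplitude coupling.

    The theta-band phase and the gamma-band amplitude envelope are
    extracted with band-pass filters plus the Hilbert transform; their
    coupling is the length of the mean composite vector (0 = none,
    larger = stronger modulation of gamma amplitude by theta phase).
    """
    def bandpass(x, lo, hi):
        b, a = butter(4, [lo, hi], btype="band", fs=fs)
        return filtfilt(b, a, x)

    phase = np.angle(hilbert(bandpass(eeg, *theta)))  # theta phase
    amp = np.abs(hilbert(bandpass(eeg, *gamma)))      # gamma envelope
    return np.abs(np.mean(amp * np.exp(1j * phase)))  # coupling strength
```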
Spatial-Temporal Fusion Graph Neural Networks for Traffic Flow Forecasting
Spatial-temporal forecasting of traffic flow is a challenging task because of complicated spatial dependencies and dynamic trends of temporal patterns among different roads. Existing frameworks typically utilize a given spatial adjacency graph and sophisticated mechanisms for modeling spatial and temporal correlations. However, the limited representation of a given spatial graph structure with incomplete adjacency connections may restrict these models from learning effective spatial-temporal dependencies. Furthermore, existing methods fall short when handling complicated spatial-temporal data: they usually employ separate modules for spatial and temporal correlations, or they use only independent components that capture either localized or global heterogeneous dependencies. To overcome these limitations, our paper proposes novel Spatial-Temporal Fusion Graph Neural Networks (STFGNN) for traffic flow forecasting. First, a data-driven method of generating a "temporal graph" is proposed to compensate for correlations that the spatial graph may not reflect. STFGNN can effectively learn hidden spatial-temporal dependencies through a novel fusion operation over various spatial and temporal graphs, treated for different time periods in parallel. Meanwhile, by integrating this fusion graph module and a novel gated convolution module into a unified layer, STFGNN can handle long sequences by learning more spatial-temporal dependencies as layers are stacked. Experimental results on several public traffic datasets demonstrate that our method consistently achieves state-of-the-art performance compared with other baselines.
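As a rough illustration of the data-driven temporal graph, the NumPy sketch below connects each road to the k roads whose historical series are most similar, then fuses the result with the spatial adjacency. Both helpers are illustrative assumptions: correlation stands in for the similarity measure (STFGNN's construction is likewise data-driven but may differ), and the naive edge union is far simpler than STFGNN's actual fusion operation.

```python
import numpy as np

def temporal_graph(series, k=5):
    """Data-driven "temporal graph": link nodes with similar histories.

    series: (num_nodes, num_timesteps) array of historical observations.
    Nodes whose series correlate strongly get connected even if they are
    not spatially adjacent, compensating for correlations the road-network
    adjacency graph may not reflect.
    """
    sim = np.corrcoef(series)            # node-by-node similarity
    np.fill_diagonal(sim, -np.inf)       # exclude self-loops
    adj = np.zeros_like(sim)
    for i in range(sim.shape[0]):        # keep each node's top-k neighbors
        adj[i, np.argsort(sim[i])[-k:]] = 1.0
    return np.maximum(adj, adj.T)        # symmetrize

def fuse_graphs(spatial_adj, temporal_adj):
    """Naive fusion of the two edge sets (union of edges)."""
    return np.maximum(spatial_adj, temporal_adj)
```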