Neural Ideal Point Estimation Network
Understanding politics is challenging because politics is influenced by everything. Even if we limit ourselves to the political context of legislative processes, we need a better understanding of latent factors, such as legislators, bills, their ideal points, and their relations. From the modeling perspective, this is difficult 1) because these observations lie in a high-dimensional space that requires learning low-dimensional representations, and 2) because these observations require complex probabilistic modeling with latent variables to reflect the causal relations. This paper presents NIPEN, a new model that reflects and helps understand this political setting, incorporating the factors mentioned above in the legislative process. We propose two versions of NIPEN: one is a hybrid of a deep learning model and a probabilistic graphical model, and the other is a neural tensor model. Our results indicate that NIPEN successfully learns the manifold of legislative bill texts and utilizes the learned low-dimensional latent variables to improve the prediction of legislators' votes. Additionally, by virtue of being a domain-rich probabilistic model, NIPEN reveals the hidden strength of the legislators' trust network and their various vote-casting characteristics.
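The abstract does not give NIPEN's exact formulation, but the classical ideal-point vote model that it builds on can be sketched in a few lines. Below is a minimal sketch with hypothetical variable names (ideal_points, bill_discrimination, bill_difficulty); NIPEN's neural variants would replace the linear terms with learned networks conditioned on bill text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_legislators, n_bills, dim = 5, 3, 2

# Hypothetical latent factors: legislator ideal points and bill parameters.
ideal_points = rng.normal(size=(n_legislators, dim))    # x_l
bill_discrimination = rng.normal(size=(n_bills, dim))   # beta_b
bill_difficulty = rng.normal(size=n_bills)               # alpha_b

# Classic ideal-point vote model: P(vote_lb = yes) = sigmoid(beta_b . x_l - alpha_b).
vote_prob = sigmoid(ideal_points @ bill_discrimination.T - bill_difficulty)
print(vote_prob.shape)  # (5, 3): probability of a yes vote per legislator-bill pair
```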
Hierarchically Clustered Representation Learning
The joint optimization of representation learning and clustering in the embedding space has experienced a breakthrough in recent years. In spite of this advance, clustering with representation learning has been limited to flat-level categories, which often involves cohesive clustering with a focus on instance relations. To overcome the limitations of flat clustering, we introduce hierarchically clustered representation learning (HCRL), which simultaneously optimizes representation learning and hierarchical clustering in the embedding space. Compared with a few prior works, HCRL is the first to consider the generation of deep embeddings from every component of the hierarchy, not just the leaf components. In addition to obtaining hierarchically clustered embeddings, we can reconstruct data at various abstraction levels, infer the intrinsic hierarchical structure, and learn the level-proportion features. We conducted evaluations on image and text domains, and our quantitative analyses showed competitive likelihoods and the best accuracies compared with the baselines.
Comment: 10 pages, 7 figures, under review as a conference paper
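As a rough illustration of generating embeddings from every component of a hierarchy rather than only the leaves, the sketch below mixes the Gaussian means along a root-to-leaf path with sampled level proportions. The hierarchy, node means, and mixing scheme are assumptions for illustration, not HCRL's actual generative model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-level hierarchy over a 2-D embedding space.
# Every node, internal or leaf, carries its own Gaussian component mean.
node_means = {
    "root": np.array([0.0, 0.0]),
    "A": np.array([-2.0, 0.0]), "B": np.array([2.0, 0.0]),
    "A1": np.array([-3.0, 1.0]), "A2": np.array([-3.0, -1.0]),
    "B1": np.array([3.0, 1.0]),  "B2": np.array([3.0, -1.0]),
}
paths = [["root", "A", "A1"], ["root", "A", "A2"],
         ["root", "B", "B1"], ["root", "B", "B2"]]

def sample_embedding(path, alpha=1.0, sigma=0.3):
    # Level proportions decide how much each abstraction level contributes,
    # so an embedding can be explained by root, internal, or leaf components.
    level_prop = rng.dirichlet(alpha * np.ones(len(path)))
    mean = sum(p * node_means[n] for p, n in zip(level_prop, path))
    return mean + sigma * rng.normal(size=2)

z = sample_embedding(paths[0])
print(z)  # embedding drawn near a mixture of the root, "A", and "A1" means
```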
Hierarchical Context enabled Recurrent Neural Network for Recommendation
A long user history inevitably reflects the transitions of personal interests over time. Analyzing the user history requires a robust sequential model that anticipates the transitions and decays of user interests. The user history is often modeled by various RNN structures, but RNN structures in recommendation systems still suffer from long-term dependency and interest drift. To resolve these challenges, we suggest HCRNN with three hierarchical contexts of the global, the local, and the temporary interests. This structure is designed to retain the global long-term interests of users, to reflect the local sub-sequence interests, and to attend to the temporary interests of each transition. In addition, we propose a hierarchical context-based gate structure to incorporate our \textit{interest drift assumption}. As we suggest a new RNN structure, we support HCRNN with a complementary \textit{bi-channel attention} structure to utilize the hierarchical context. We evaluated the suggested structure on sequential recommendation tasks with CiteULike, MovieLens, and LastFM, and our model showed the best performance in sequential recommendation.
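A minimal sketch of what three hierarchical contexts might look like in a recurrent step is given below; the exact HCRNN equations are not in the abstract, so the update rules, the drift gate, and the constants are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
dim = 8
W_drift = rng.normal(scale=0.1, size=(2 * dim,))  # hypothetical drift-gate weights

def hcrnn_step(x_t, global_ctx, local_ctx):
    """One hypothetical step with three hierarchical contexts.

    global_ctx : slowly moving long-term interest
    local_ctx  : sub-sequence interest, rewritten faster
    x_t        : current item embedding, treated as the temporary interest
    """
    # Interest-drift gate: how strongly the current input departs from the
    # local context decides how much the local context is rewritten.
    drift = sigmoid(W_drift @ np.concatenate([x_t, local_ctx]))
    local_ctx = (1.0 - drift) * local_ctx + drift * x_t
    # The global context integrates the local context with a small step size.
    global_ctx = 0.99 * global_ctx + 0.01 * local_ctx
    temporary_ctx = x_t
    return global_ctx, local_ctx, temporary_ctx

g, l = np.zeros(dim), np.zeros(dim)
for _ in range(20):                      # a toy user history of 20 items
    g, l, tmp = hcrnn_step(rng.normal(size=dim), g, l)
print(np.linalg.norm(g), np.linalg.norm(l))
```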
Bivariate Beta-LSTM
Long Short-Term Memory (LSTM) infers long-term dependencies through a cell state maintained by the input and forget gate structures, which model a gate output as a value in [0,1] through a sigmoid function. However, due to the graduality of the sigmoid function, the sigmoid gate is not flexible in representing multi-modality or skewness. In addition, previous models lack modeling of the correlation between the gates, which would offer a new way to adopt an inductive bias on the relationship between the previous and current inputs. This paper proposes a new gate structure based on the bivariate Beta distribution. The proposed gate structure enables probabilistic modeling of the gates within the LSTM cell, so that modelers can customize the cell state flow with priors and distributions. Moreover, we theoretically show a higher upper bound on the gradient compared to the sigmoid function, and we empirically observe that the bivariate Beta gate structure provides higher gradient values in training. We demonstrate the effectiveness of the bivariate Beta gate structure on sentence classification, image classification, polyphonic music modeling, and image caption generation.
Comment: AAAI 202
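To make the gate idea concrete, the sketch below draws correlated [0,1] input and forget gates from a shared-Gamma (Olkin-Liu style) bivariate Beta construction and uses them in a single cell-state update. The construction and the fixed Beta parameters are assumptions for illustration; in the paper the gate parameters would be produced from the cell inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def bivariate_beta_gates(a, b, c, size):
    """Correlated [0,1] gates via a shared-Gamma construction:
    f = G_a / (G_a + G_c), i = G_b / (G_b + G_c). The shared G_c couples them."""
    g_a = rng.gamma(a, size=size)
    g_b = rng.gamma(b, size=size)
    g_c = rng.gamma(c, size=size)
    forget = g_a / (g_a + g_c)
    input_ = g_b / (g_b + g_c)
    return forget, input_

hidden = 4
c_prev = np.zeros(hidden)
candidate = rng.normal(size=hidden)      # pre-activation of the cell input

# Fixed Beta parameters are used here only to illustrate the gate shape.
f_gate, i_gate = bivariate_beta_gates(a=2.0, b=2.0, c=1.0, size=hidden)
c_new = f_gate * c_prev + i_gate * np.tanh(candidate)
print(f_gate, i_gate)
```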
Adversarial Dropout for Supervised and Semi-supervised Learning
Recently, training with adversarial examples, which are generated by adding a small but worst-case perturbation to input examples, has been shown to improve the generalization performance of neural networks. In contrast to such individually perturbed inputs for enhancing generality, this paper introduces adversarial dropout: a minimal set of dropped units that maximizes the divergence between the output of the network with the dropout applied and the training supervision. The identified adversarial dropout is used to reconfigure the neural network for training, and we demonstrate that training on the reconfigured sub-network improves the generalization performance of supervised and semi-supervised learning tasks on MNIST and CIFAR-10. We analyzed the trained model to explain the performance improvement, and we found that adversarial dropout increases the sparsity of neural networks more than standard dropout does.
Comment: submitted to AAAI-1
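The toy sketch below illustrates the core idea of searching for a small dropout mask that maximizes the divergence from the clean prediction. It uses a brute-force greedy search on a tiny fixed network as a stand-in for the paper's gradient-based search under a boundary constraint; all weights and sizes are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny fixed two-layer network; weights are placeholders for illustration.
W1 = rng.normal(scale=0.5, size=(4, 8))
W2 = rng.normal(scale=0.5, size=(8, 3))

def forward(x, mask):
    h = np.maximum(x @ W1, 0.0) * mask      # mask drops hidden units
    logits = h @ W2
    e = np.exp(logits - logits.max())
    return e / e.sum()

def kl(p, q):
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

x = rng.normal(size=4)
base_mask = np.ones(8)
target = forward(x, base_mask)              # supervision proxy: clean prediction

# Greedy adversarial dropout: drop the few units whose removal most increases
# the divergence from the target, within a small budget.
budget, mask = 2, base_mask.copy()
for _ in range(budget):
    gains = []
    for j in range(8):
        if mask[j] == 0:
            gains.append(-np.inf)
            continue
        trial = mask.copy(); trial[j] = 0.0
        gains.append(kl(target, forward(x, trial)))
    mask[int(np.argmax(gains))] = 0.0
print("adversarially dropped units:", np.where(mask == 0)[0])
```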
Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning
In cooperative multi-agent reinforcement learning (MARL), agents aim to
achieve a common goal, such as defeating enemies or scoring a goal. Existing
MARL algorithms are effective but still require significant learning time and
often get trapped in local optima on complex tasks, subsequently failing to
discover a goal-reaching policy. To address this, we introduce Efficient
episodic Memory Utilization (EMU) for MARL, with two primary objectives: (a)
accelerating reinforcement learning by leveraging semantically coherent memory
from an episodic buffer and (b) selectively promoting desirable transitions to
prevent local convergence. To achieve (a), EMU incorporates a trainable
encoder/decoder structure alongside MARL, creating coherent memory embeddings
that facilitate exploratory memory recall. To achieve (b), EMU introduces a
novel reward structure called episodic incentive based on the desirability of
states. This reward improves the TD target in Q-learning and acts as an
additional incentive for desirable transitions. We provide theoretical support
for the proposed incentive and demonstrate the effectiveness of EMU compared to
conventional episodic control. The proposed method is evaluated in StarCraft II
and Google Research Football, and empirical results indicate further
performance improvement over state-of-the-art methods.
Comment: Accepted at ICLR 202
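A minimal sketch of how an episodic incentive might augment the TD target is shown below. The buffer keying, the desirability flag, and the incentive scale are illustrative assumptions; in EMU the state embedding comes from the trained encoder and the incentive is derived from the desirability of states.

```python
import numpy as np

# Hypothetical episodic buffer: a learned encoder would map states to embeddings;
# here a rounded projection stands in for the semantic embedding.
episodic_buffer = {}   # key -> (best_return, desirable_flag)

def embed(state):
    return tuple(np.round(state, 1))

def episodic_incentive(next_state, scale=0.1):
    """Extra reward for transitions into states that episodic memory marks as
    desirable (they appeared on trajectories that reached the goal)."""
    entry = episodic_buffer.get(embed(next_state))
    return scale if entry is not None and entry[1] else 0.0

def td_target(reward, next_state, q_next_max, gamma=0.99):
    # Standard TD target augmented with the episodic incentive.
    return reward + episodic_incentive(next_state) + gamma * q_next_max

# Store a transition observed on a successful (goal-reaching) episode.
goal_state = np.array([0.52, -0.18])
episodic_buffer[embed(goal_state)] = (1.0, True)
print(td_target(reward=0.0, next_state=goal_state, q_next_max=0.7))
```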
Frequency Domain-based Dataset Distillation
This paper presents FreD, a novel parameterization method for dataset
distillation, which utilizes the frequency domain to distill a small-sized
synthetic dataset from a large-sized original dataset. Unlike conventional
approaches that focus on the spatial domain, FreD employs frequency-based
transforms to optimize the frequency representations of each data instance. By
leveraging the concentration of spatial domain information on specific
frequency components, FreD intelligently selects a subset of frequency
dimensions for optimization, leading to a significant reduction in the required
budget for synthesizing an instance. Through the selection of frequency
dimensions based on the explained variance, FreD demonstrates both theoretical
and empirical evidence of its ability to operate efficiently within a limited
budget, while better preserving the information of the original dataset
compared to conventional parameterization methods. Furthermore, based on the
orthogonal compatibility of FreD with existing methods, we confirm that FreD
consistently improves the performance of existing distillation methods across evaluation scenarios with different benchmark datasets. We release the code at https://github.com/sdh0818/FreD.
Comment: Accepted at NeurIPS 202
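The selection-by-explained-variance idea can be sketched compactly: transform the dataset to a frequency domain, keep the coefficients that vary most across instances, and parameterize each synthetic instance only by those coefficients. The sketch below uses a plain FFT and a toy dataset as stand-ins; the specific transform and budget used in FreD may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "dataset": 32 grayscale images of size 16x16.
images = rng.normal(size=(32, 16, 16))

# Frequency transform per image (a plain FFT stands in for the paper's transform).
freq = np.fft.fft2(images)                       # (32, 16, 16), complex

# Explained-variance-based selection: keep the k frequency bins whose
# coefficients vary most across the dataset.
variance = freq.var(axis=0).real                 # per-frequency variance
k = 32                                           # budget per synthetic instance
flat_idx = np.argsort(variance.ravel())[::-1][:k]
mask = np.zeros(16 * 16, dtype=bool)
mask[flat_idx] = True
mask = mask.reshape(16, 16)

# A synthetic instance is parameterized only by the selected coefficients;
# everything else is zeroed before transforming back to the spatial domain.
coeffs = freq[0] * mask
synthetic = np.fft.ifft2(coeffs).real
print("kept", mask.sum(), "of", mask.size, "frequency dimensions")
```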
Implicit Kernel Attention
\textit{Attention} computes the dependency between representations, and it encourages the model to focus on the important, selectively weighted features. Attention-based models, such as Transformers and graph attention networks (GAT), are widely utilized for sequential data and graph-structured data. This paper suggests a new interpretation and generalized structure of the attention in Transformer and GAT. For the attention in Transformer and GAT, we derive that the attention is a product of two parts: 1) the RBF kernel to measure the similarity of two instances and 2) the exponential of the $L^{2}$ norm to compute the importance of individual instances. From this decomposition, we generalize the attention in three ways. First, we propose implicit kernel attention with an implicit kernel function, instead of manual kernel selection. Second, we generalize the $L^{2}$ norm to the $L^{p}$ norm. Third, we extend our attention to structured multi-head attention. Our generalized attention shows better performance on classification, translation, and regression tasks.
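The decomposition can be checked numerically for standard scaled dot-product attention: the softmax weights equal a normalized product of an RBF kernel between query and key and the exponential of the key norm (the query-norm factor cancels in the normalization). The sketch below verifies this on random data; the implicit kernel and $L^{p}$ generalizations are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
Q = rng.normal(size=(5, d))    # 5 queries
K = rng.normal(size=(7, d))    # 7 keys

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

scale = np.sqrt(d)

# Standard scaled dot-product attention weights.
standard = softmax(Q @ K.T / scale, axis=1)

# Same weights from the decomposition: RBF kernel between query and key times
# the exponential of the key norm; the query-norm factor cancels when
# normalizing over the keys.
sq_dist = ((Q[:, None, :] - K[None, :, :]) ** 2).sum(-1)
rbf = np.exp(-sq_dist / (2.0 * scale))
key_magnitude = np.exp((K ** 2).sum(-1) / (2.0 * scale))
decomposed = rbf * key_magnitude[None, :]
decomposed = decomposed / decomposed.sum(axis=1, keepdims=True)

print(np.allclose(standard, decomposed))   # True
```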
Generalized Gumbel-Softmax Gradient Estimator for Various Discrete Random Variables
Estimating the gradients of stochastic nodes is one of the crucial research questions in the deep generative modeling community, as it enables gradient-descent optimization of neural network parameters. This estimation problem becomes more complex when the stochastic nodes are discrete, because pathwise derivative techniques cannot be applied. Hence, stochastic gradient estimation for discrete distributions requires either a score function method or a continuous relaxation of the discrete random variables. This paper proposes a generalized version of the Gumbel-Softmax estimator with continuous relaxation, and this estimator is able to relax a wider range of discrete distributions beyond the categorical and Bernoulli distributions. In detail, we utilize the truncation of discrete random variables and the Gumbel-Softmax trick with a linear transformation for the relaxed reparameterization. The proposed approach enables the relaxed discrete random variable to be reparameterized and backpropagated through a large-scale stochastic computational graph. Our experiments consist of (1) synthetic data analyses, which show the efficacy of our method, and (2) applications to a VAE and a topic model, which demonstrate the value of the proposed estimator in practice.
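As a concrete example of truncation plus the Gumbel-Softmax trick with a linear transformation, the sketch below relaxes a Poisson variable: its support is truncated and renormalized, a Gumbel-Softmax sample is drawn over the truncated support, and a linear map onto the support values yields a continuous surrogate of the discrete sample. The truncation level, rate, and temperature are arbitrary illustration choices.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)

def truncated_poisson_probs(rate, k_max):
    # Truncate the infinite support to {0, ..., k_max} and renormalize.
    probs = np.array([rate**k * np.exp(-rate) / factorial(k) for k in range(k_max + 1)])
    return probs / probs.sum()

def gumbel_softmax(probs, temperature):
    # Relaxed one-hot sample over the truncated support.
    gumbel = rng.gumbel(size=probs.shape)
    logits = np.log(probs) + gumbel
    z = np.exp((logits - logits.max()) / temperature)
    return z / z.sum()

rate, k_max, temperature = 3.0, 10, 0.5
probs = truncated_poisson_probs(rate, k_max)
relaxed_onehot = gumbel_softmax(probs, temperature)

# Linear transformation: map the relaxed one-hot onto the support values,
# giving a continuous surrogate through which gradients could flow when
# implemented in an autodiff framework.
support = np.arange(k_max + 1)
relaxed_sample = relaxed_onehot @ support
print(relaxed_sample)
```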