Hidden Two-Stream Convolutional Networks for Action Recognition
Analyzing videos of human actions involves understanding the temporal
relationships among video frames. State-of-the-art action recognition
approaches rely on traditional optical flow estimation methods to pre-compute
motion information for CNNs. Such a two-stage approach is computationally
expensive, storage demanding, and not end-to-end trainable. In this paper, we
present a novel CNN architecture that implicitly captures motion information
between adjacent frames. We name our approach hidden two-stream CNNs because it
only takes raw video frames as input and directly predicts action classes
without explicitly computing optical flow. Our end-to-end approach is 10x
faster than its two-stage baseline. Experimental results on four challenging
action recognition datasets (UCF101, HMDB51, THUMOS14, and ActivityNet v1.2)
show that our approach significantly outperforms the previous best real-time
approaches.
Comment: Accepted at ACCV 2018, camera ready. Code available at
https://github.com/bryanyzhu/Hidden-Two-Strea
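To make the idea concrete, below is a minimal PyTorch-style sketch of a hidden two-stream model: a small MotionNet maps stacked raw frames to flow-like feature maps, which a temporal-stream classifier consumes, so the whole pipeline trains end-to-end without precomputed optical flow. Module names and layer sizes here are illustrative assumptions, not the architecture from the released code.

import torch
import torch.nn as nn

class MotionNet(nn.Module):
    # Predicts flow-like maps (one 2-channel field per adjacent frame pair)
    # from a stack of consecutive RGB frames.
    def __init__(self, num_frames=11):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3 * num_frames, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.flow_head = nn.Conv2d(64, 2 * (num_frames - 1), 3, padding=1)

    def forward(self, frames):                    # frames: (B, 3*T, H, W)
        return self.flow_head(self.encoder(frames))

class HiddenTwoStream(nn.Module):
    def __init__(self, num_frames=11, num_classes=101):
        super().__init__()
        self.motion_net = MotionNet(num_frames)   # learned, implicit motion
        self.temporal_stream = nn.Sequential(     # stand-in for a full CNN
            nn.Conv2d(2 * (num_frames - 1), 64, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_classes),
        )

    def forward(self, frames):
        flow = self.motion_net(frames)            # no TV-L1 preprocessing step
        return self.temporal_stream(flow)

logits = HiddenTwoStream()(torch.randn(2, 33, 224, 224))   # -> (2, 101)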
Mutually-Regularized Dual Collaborative Variational Auto-encoder for Recommendation Systems
Recently, user-oriented auto-encoders (UAEs) have been widely used in
recommender systems to learn semantic representations of users based on their
historical ratings. However, since latent item variables are not modeled in
UAEs, it is difficult to utilize widely available item content information
when ratings are sparse. In addition, whenever new items arrive, we must
wait to collect rating data for these items and then retrain the UAE from
scratch, which is inefficient in practice. Aiming to address these two
problems simultaneously, we propose a mutually-regularized dual collaborative
variational auto-encoder (MD-CVAE) for recommendation. First, by replacing
the randomly initialized last-layer weights of the vanilla UAE with stacked
latent item embeddings, MD-CVAE integrates two heterogeneous information
sources, i.e., item content and user ratings, into the same principled
variational framework, where the UAE weights are regularized by item content
so that convergence to poor optima due to data sparsity is avoided. In
addition,
the regularization is mutual in that user ratings can also help the dual item
content module learn more recommendation-oriented item content embeddings.
Finally, we propose a symmetric inference strategy for MD-CVAE where the
first-layer weights of the UAE encoder are tied to the latent item embeddings of the
UAE decoder. Through this strategy, no retraining is required to recommend
newly introduced items. Empirical studies show the effectiveness of MD-CVAE in
both normal and cold-start scenarios. Code is available at
https://github.com/yaochenzhu/MD-CVAE
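As a rough illustration of the weight-tying idea, the sketch below (assumed module names and dimensions, not the authors' released code) feeds item content through a content module to produce latent item embeddings, then uses those embeddings as both the first-layer encoder weights and the last-layer decoder weights of the user auto-encoder, so a newly arrived item needs only a content embedding rather than retraining.

import torch
import torch.nn as nn

class ContentModule(nn.Module):
    # Maps item content features to latent item embeddings.
    def __init__(self, content_dim, embed_dim):
        super().__init__()
        self.net = nn.Linear(content_dim, embed_dim)

    def forward(self, item_content):              # (num_items, content_dim)
        return self.net(item_content)             # (num_items, embed_dim)

class TiedUserAE(nn.Module):
    # Deterministic stand-in for the UAE; MD-CVAE's variational machinery
    # is omitted here to keep the weight tying visible.
    def __init__(self, embed_dim):
        super().__init__()
        self.hidden = nn.Linear(embed_dim, embed_dim)

    def forward(self, ratings, item_emb):
        # Encoder first layer: project ratings through the item embeddings.
        z = torch.tanh(self.hidden(ratings @ item_emb))
        # Decoder last layer: the same item embeddings, transposed. Appending
        # a new item's content embedding to item_emb makes that item
        # scoreable with no retraining.
        return z @ item_emb.T                     # reconstructed ratings

content = ContentModule(content_dim=300, embed_dim=64)
ae = TiedUserAE(embed_dim=64)
item_emb = content(torch.randn(1000, 300))        # 1000 items with content
recon = ae(torch.randn(8, 1000), item_emb)        # -> (8, 1000)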
Deep Causal Reasoning for Recommendations
Traditional recommender systems aim to estimate a user's rating of an item
based on observed ratings from the population. As with all observational
studies, hidden confounders, which are factors that affect both item exposures
and user ratings, lead to a systematic bias in the estimation. Consequently, a
new trend in recommender system research is to mitigate the influence of
confounders from a causal perspective. Observing that confounders in
recommendations are usually shared among items and are therefore multi-cause
confounders, we model the recommendation as a multi-cause multi-outcome (MCMO)
inference problem. Specifically, to remedy confounding bias, we estimate
user-specific latent variables that render the item exposures conditionally
independent Bernoulli trials. The generative distribution is parameterized by
a DNN with a factorized logistic likelihood, and the intractable posteriors
are estimated by
variational inference. Controlling these factors as substitute confounders,
under mild assumptions, can eliminate the bias incurred by multi-cause
confounders. Furthermore, we show that MCMO modeling may lead to high variance
due to scarce observations associated with the high-dimensional causal space.
Fortunately, we theoretically demonstrate that introducing user features as
pre-treatment variables can substantially improve sample efficiency and
alleviate overfitting. Empirical studies on simulated and real-world datasets
show that the proposed deep causal recommender is more robust to
unobserved confounders than state-of-the-art causal recommenders. Code and
datasets are released at https://github.com/yaochenzhu/deep-deconf
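A minimal sketch of the exposure model described above, under assumed names and sizes: a per-user latent variable, inferred variationally, parameterizes a DNN with a factorized Bernoulli (logistic) likelihood over item exposures, so exposures become conditionally independent given the substitute confounder.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ExposureVAE(nn.Module):
    def __init__(self, num_items, latent_dim=32):
        super().__init__()
        self.encoder = nn.Linear(num_items, 2 * latent_dim)  # -> (mu, log_var)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, num_items),            # per-item exposure logits
        )

    def forward(self, exposures):                 # (B, num_items), 0/1 floats
        mu, log_var = self.encoder(exposures).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterize
        logits = self.decoder(z)
        # Factorized Bernoulli likelihood: exposures independent given z_u.
        ll = -F.binary_cross_entropy_with_logits(logits, exposures,
                                                 reduction="sum")
        kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum()
        return ll - kl                            # ELBO to maximize

model = ExposureVAE(num_items=1000)
elbo = model((torch.rand(16, 1000) < 0.05).float())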
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Evaluating large language models (LLMs) as general-purpose agents is
essential for understanding their capabilities and facilitating their
integration into practical applications. However, the evaluation process
presents substantial challenges. A primary obstacle is the benchmarking of
agent performance across diverse scenarios within a unified framework,
especially in maintaining partially-observable environments and ensuring
multi-round interactions. Moreover, current evaluation frameworks mostly focus
on the final success rate, revealing few insights into the intermediate
process and failing to provide a deep understanding of model abilities. To address
these challenges, we introduce AgentBoard, a pioneering comprehensive benchmark
and accompanying open-source evaluation framework tailored to the analytical
evaluation of LLM agents. AgentBoard offers a fine-grained progress rate metric
that captures incremental advancements as well as a comprehensive evaluation
toolkit that enables easy, multi-faceted assessment of agents through
interactive visualization. This not only sheds light on the
capabilities and limitations of LLM agents but also brings the
interpretability of their performance to the forefront. Ultimately, AgentBoard
serves as a significant step towards demystifying agent behaviors and
accelerating the development of stronger LLM agents.
Comment: Preprint
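As a toy illustration of a fine-grained progress rate (a simplification, not AgentBoard's actual subgoal format), the snippet below scores each turn by the fraction of annotated subgoals satisfied so far and reports the best value reached, so partial progress earns credit even when the episode ultimately fails.

from typing import List, Set

def progress_rate(per_turn_states: List[Set[str]], subgoals: Set[str]) -> float:
    # Best fraction of subgoals satisfied at any turn of the episode.
    if not subgoals:
        return 0.0
    return max((len(state & subgoals) / len(subgoals)
                for state in per_turn_states), default=0.0)

# An agent that picks up the key and opens the door but never reaches the
# goal earns 2/3 credit, where a binary success rate would report 0.
turns = [{"has_key"}, {"has_key", "door_open"}]
print(progress_rate(turns, {"has_key", "door_open", "at_goal"}))  # ~0.667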