Diversification Quotients: Quantifying Diversification via Risk Measures
To overcome several limitations of existing diversification indices, we
introduce the diversification quotient (DQ). Defined through a parametric
family of risk measures, DQ satisfies three natural properties, namely,
non-negativity, location invariance and scale invariance, which are shown to be
conflicting for any traditional diversification index based on a single risk
measure. We pay special attention to the two most important classes of risk
measures in banking and insurance, the Value-at-Risk (VaR) and the Expected
Shortfall (ES, also called CVaR). DQs based on VaR and ES enjoy many convenient
technical properties, and they are efficient to optimize in portfolio
selection. By analyzing the popular multivariate models of elliptical and
regularly varying distributions, we find that DQ can properly capture tail
heaviness and common shocks which are neglected by traditional diversification
indices. When illustrated with financial data, DQ is intuitive to interpret,
and its performance is competitive when contrasted with other diversification
methods in portfolio optimization.
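To make the two risk measures concrete, here is a minimal Python/numpy sketch of empirical VaR and ES and of a traditional diversification index built on a single risk measure (the kind the abstract contrasts DQ against). It does not reproduce the DQ construction itself, which varies the level of a parametric family of risk measures; the function names, the 5% level, and the Student-t loss model are illustrative assumptions.

    import numpy as np

    def var(losses, alpha):
        # Empirical Value-at-Risk at tail level alpha: the (1 - alpha) sample
        # quantile of the loss distribution (small alpha = deeper in the tail).
        return np.quantile(losses, 1.0 - alpha)

    def es(losses, alpha):
        # Empirical Expected Shortfall (CVaR): average loss at or beyond VaR_alpha.
        return losses[losses >= var(losses, alpha)].mean()

    def single_measure_index(component_losses, risk, alpha):
        # Traditional diversification index based on one risk measure:
        # risk of the pooled portfolio divided by the sum of stand-alone risks.
        pooled = component_losses.sum(axis=1)
        standalone = sum(risk(component_losses[:, i], alpha)
                         for i in range(component_losses.shape[1]))
        return risk(pooled, alpha) / standalone

    # Hypothetical example: three assets with heavy-tailed Student-t losses.
    rng = np.random.default_rng(0)
    L = rng.standard_t(df=3, size=(100_000, 3))
    print("VaR-based index:", single_measure_index(L, var, alpha=0.05))
    print("ES-based index: ", single_measure_index(L, es, alpha=0.05))

Values below 1 under such an index signal a diversification benefit for that particular measure; the abstract's point is that no index of this single-measure form can satisfy non-negativity, location invariance, and scale invariance simultaneously, which is what DQ is designed to fix.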
Empower Sequence Labeling with Task-Aware Neural Language Model
Linguistic sequence labeling is a general modeling approach that encompasses
a variety of problems, such as part-of-speech tagging and named entity
recognition. Recent advances in neural networks (NNs) make it possible to build
reliable models without handcrafted features. However, in many cases, it is
hard to obtain sufficient annotations to train these models. In this study, we
develop a novel neural framework to extract abundant knowledge hidden in raw
texts to empower the sequence labeling task. Besides word-level knowledge
contained in pre-trained word embeddings, character-aware neural language
models are incorporated to extract character-level knowledge. Transfer learning
techniques are further adopted to mediate different components and guide the
language model towards the key knowledge. Compared to previous methods, this
task-specific knowledge allows us to adopt a more concise model and conduct
more efficient training. Different from most transfer learning methods, the
proposed framework does not rely on any additional supervision. It extracts
knowledge from self-contained order information of training sequences.
Extensive experiments on benchmark datasets demonstrate the effectiveness of
leveraging character-level knowledge and the efficiency of co-training. For
example, on the CoNLL03 NER task, model training completes in about 6 hours on
a single GPU, reaching an F1 score of 91.71±0.10 without using any extra
annotation. Comment: AAAI 2018
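For illustration, a simplified PyTorch sketch of the kind of architecture described: character-level BiLSTM representations are concatenated with pre-trained word embeddings, and a character language-model head supplies the self-supervised co-training signal. This is not the authors' exact model (the tag head here is a plain linear layer); all sizes and names below are assumptions.

    import torch
    import torch.nn as nn

    class CharAwareTagger(nn.Module):
        # Sketch only: char-level BiLSTM features are concatenated with
        # pre-trained word embeddings; a word-level BiLSTM produces tag scores,
        # and a char language-model head gives the co-training objective.
        def __init__(self, word_vectors, char_vocab, char_dim=30,
                     char_hidden=50, word_hidden=200, n_tags=9):
            super().__init__()
            self.word_emb = nn.Embedding.from_pretrained(word_vectors, freeze=False)
            self.char_emb = nn.Embedding(char_vocab, char_dim)
            self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                     batch_first=True, bidirectional=True)
            word_dim = word_vectors.size(1)
            self.word_lstm = nn.LSTM(word_dim + 2 * char_hidden, word_hidden,
                                     batch_first=True, bidirectional=True)
            self.tag_head = nn.Linear(2 * word_hidden, n_tags)     # sequence labels
            self.lm_head = nn.Linear(2 * char_hidden, char_vocab)  # char LM co-training

        def forward(self, word_ids, char_ids):
            # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_word_len)
            b, s, c = char_ids.shape
            char_out, _ = self.char_lstm(self.char_emb(char_ids.reshape(b * s, c)))
            char_repr = char_out[:, -1, :].reshape(b, s, -1)   # last step per word
            feats = torch.cat([self.word_emb(word_ids), char_repr], dim=-1)
            hidden, _ = self.word_lstm(feats)
            return self.tag_head(hidden), self.lm_head(char_repr)

In such a setup the labeling loss on the tag head and the language-model loss on the character head are optimized jointly, which is the co-training the abstract refers to; the latter needs no extra annotation.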
ODE-based Recurrent Model-free Reinforcement Learning for POMDPs
Neural ordinary differential equations (ODEs) are widely recognized as a
standard tool for modeling physical mechanisms and can help perform approximate
inference in unknown physical or biological environments. In partially
observable (PO) environments, it is challenging for agents to infer unseen
information from raw observations. By using a recurrent policy with a compact
context, context-based reinforcement learning provides a flexible way to
extract unobservable information from historical transitions. To help the agent
extract more dynamics-related information, we present a novel ODE-based
recurrent model combined with a model-free reinforcement learning (RL) framework
to solve partially observable Markov decision processes (POMDPs). We
experimentally demonstrate the efficacy of our methods across various PO
continuous control and meta-RL tasks. Furthermore, our experiments illustrate
that our method is robust against irregular observations, owing to the ability
of ODEs to model irregularly-sampled time series. Comment: Accepted by NeurIPS 2023
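As a rough, assumption-laden sketch of the idea (not the paper's actual architecture or training loop), here is an ODE-RNN-style context encoder in PyTorch: between observations the hidden state evolves under a learned ODE, integrated here with fixed-step Euler, and each observation corrects the state through a GRU cell. The resulting per-step context would then be consumed by a model-free policy and critic.

    import torch
    import torch.nn as nn

    class ODERecurrentEncoder(nn.Module):
        # Illustrative ODE-RNN-style encoder: the learned vector field ode_func
        # evolves the hidden state across (possibly irregular) gaps between
        # observations; a GRU cell then incorporates each new observation.
        def __init__(self, obs_dim, hidden_dim=64, euler_steps=4):
            super().__init__()
            self.ode_func = nn.Sequential(nn.Linear(hidden_dim, hidden_dim),
                                          nn.Tanh(),
                                          nn.Linear(hidden_dim, hidden_dim))
            self.gru = nn.GRUCell(obs_dim, hidden_dim)
            self.euler_steps = euler_steps
            self.hidden_dim = hidden_dim

        def forward(self, obs_seq, dt_seq):
            # obs_seq: (batch, T, obs_dim); dt_seq: (batch, T) elapsed time since
            # the previous observation, which may be irregular.
            h = obs_seq.new_zeros(obs_seq.size(0), self.hidden_dim)
            contexts = []
            for t in range(obs_seq.size(1)):
                step = (dt_seq[:, t] / self.euler_steps).unsqueeze(-1)
                for _ in range(self.euler_steps):      # Euler integration of dh/dt
                    h = h + step * self.ode_func(h)
                h = self.gru(obs_seq[:, t], h)         # observation update
                contexts.append(h)
            return torch.stack(contexts, dim=1)        # (batch, T, hidden_dim)

Because the integration step uses the actual elapsed time dt, the same encoder can handle irregularly-sampled observations, which matches the robustness the abstract reports.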
Understanding the Difficulty of Training Transformers
Transformers have proved effective in many NLP tasks. However, their training
requires non-trivial effort in carefully designing cutting-edge optimizers and
learning rate schedulers (e.g., conventional SGD fails to train Transformers
effectively). Our objective here is to understand what complicates Transformer
training from both empirical and theoretical
perspectives. Our analysis reveals that unbalanced gradients are not the root
cause of the instability of training. Instead, we identify an amplification
effect that influences training substantially -- for each layer in a
multi-layer Transformer model, heavy dependency on its residual branch makes
training unstable, since it amplifies small parameter perturbations (e.g.,
parameter updates) and results in significant disturbances in the model output.
Yet we observe that a light dependency limits the model's potential and leads to
inferior trained models. Inspired by our analysis, we propose Admin
(Adaptive model initialization) to stabilize training in the early
stage and unleash its full potential in the late
stage. Extensive experiments show that Admin is more stable, converges faster,
and leads to better performance. Implementations are released at:
https://github.com/LiyuanLucasLiu/Transformer-Clinic. Comment: EMNLP 2020
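To illustrate the residual-scaling idea (a hedged sketch, not the released Admin implementation), the skip connection of each layer can carry a learnable per-dimension weight omega; initializing omega appropriately limits how heavily the early-stage output depends on each residual branch, which the analysis above links to instability. The profiling-based choice of the initial value is described in the paper and repository; omega_init below is a placeholder assumption.

    import torch
    import torch.nn as nn

    class ScaledResidualBlock(nn.Module):
        # Sketch of a post-LN Transformer sub-block whose shortcut is rescaled
        # by a learnable per-dimension weight omega before the residual branch
        # (attention or feed-forward) is added and layer-normalized.
        def __init__(self, d_model, sublayer, omega_init=1.0):
            super().__init__()
            self.sublayer = sublayer                        # e.g. attention or FFN
            self.omega = nn.Parameter(torch.full((d_model,), float(omega_init)))
            self.norm = nn.LayerNorm(d_model)

        def forward(self, x):
            # Larger omega makes the output lean on the shortcut rather than on
            # the residual branch, damping the amplification of small updates.
            return self.norm(x * self.omega + self.sublayer(x))

    # Hypothetical usage with a feed-forward sublayer:
    block = ScaledResidualBlock(512, nn.Sequential(nn.Linear(512, 2048),
                                                   nn.ReLU(),
                                                   nn.Linear(2048, 512)),
                                omega_init=1.5)
    out = block(torch.randn(8, 32, 512))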