
    The Time-Varying Risk Return Tradeoff in the Long-Run

    Lundblad (2007, JFE) shows that the risk-return tradeoff is unequivocally positive using a two-century history of equity market data. A further examination of the relation with UK monthly stock returns from 1836 to 2010 produces a rather weak risk-return relation. Based on a new nonlinear ICAPM with multivariate GARCH-M terms that allows for time-varying risk-return tradeoffs and hedging coefficients, I show that the risk-return relation is mostly positive but varies considerably over time. The often-observed negative risk-return relation is also statistically insignificant within the 95% confidence bounds. The hedging coefficients likewise vary significantly across time. This complex nonlinearity appears to be the main culprit behind the weak risk-return relation.
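
    To make the mechanism concrete, here is a minimal sketch of the conditional mean equation that a GARCH-M ICAPM of this kind implies; the symbols below are illustrative assumptions, not the paper's exact specification:

    ```latex
    % Illustrative conditional ICAPM with multivariate GARCH-M terms:
    % the excess market return is compensated for its conditional variance
    % (risk term) and its conditional covariance with a state variable
    % (hedging term), with both coefficients allowed to vary over time.
    \[
    r_{m,t+1} = \gamma_t\,\sigma^2_{m,t} + \lambda_t\,\sigma_{ms,t} + \varepsilon_{m,t+1}
    \]
    % \gamma_t  : time-varying risk-return tradeoff (price of risk)
    % \lambda_t : time-varying hedging coefficient
    % \sigma^2_{m,t}, \sigma_{ms,t} : conditional (co)variances from a
    %                                 multivariate GARCH model
    ```

    A time-varying, occasionally negative \gamma_t is exactly the kind of nonlinearity that a constant-coefficient regression would average into the weak overall relation the abstract describes.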

    New return anomalies and new-Keynesian ICAPM

    I propose a new multi-factor asset pricing model with new-Keynesian factors to explain stock return anomalies from 1972Q1 to 2009Q2. This new model explains the average returns across testing portfolios formed on financial distress, momentum, and standardized unexpected earnings with misspecification-robust statistics. Test portfolios formed on net stock issues and total accruals are also partly explained by the new-Keynesian factors. Two monetary policy factors play an important role in explaining these new anomalies. The credit aspect of these new anomalies suggests an economic rationale for the model through capital market imperfections and the credit channel of the monetary policy transmission mechanism.
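
    For context, the generic linear beta pricing relation that such a multi-factor model tests can be sketched as follows; the beta/price-of-risk decomposition is standard, while the identity of the factors (including the two monetary policy factors) is specific to the paper:

    ```latex
    % Generic linear multi-factor pricing relation (illustrative):
    % the average excess return of test portfolio i is explained by its
    % exposures (betas) to K priced factors and their risk prices.
    \[
    E[r_i] = \sum_{k=1}^{K} \beta_{ik}\,\lambda_k,
    \qquad
    \beta_{ik} = \frac{\operatorname{Cov}(r_i, f_k)}{\operatorname{Var}(f_k)}
    \]
    % f_k : the k-th factor, \lambda_k : its price of risk
    ```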

    Curve Your Attention: Mixed-Curvature Transformers for Graph Representation Learning

    Real-world graphs naturally exhibit hierarchical or cyclical structures that are ill-suited to the typical Euclidean space. While there exist graph neural networks that leverage hyperbolic or spherical spaces to learn representations that embed such structures more accurately, these methods are confined to the message-passing paradigm, leaving the models vulnerable to side effects such as oversmoothing and oversquashing. More recent works have proposed global attention-based graph Transformers that can easily model long-range interactions, but their extensions to non-Euclidean geometry remain unexplored. To bridge this gap, we propose the Fully Product-Stereographic Transformer, a generalization of Transformers that operates entirely on products of constant-curvature spaces. When combined with tokenized graph Transformers, our model can learn the curvature appropriate for the input graph in an end-to-end fashion, without the need for additional tuning over different curvature initializations. We also provide a kernelized approach to non-Euclidean attention, which enables our model to run with time and memory cost linear in the number of nodes and edges while respecting the underlying geometry. Experiments on graph reconstruction and node classification demonstrate the benefits of generalizing Transformers to the non-Euclidean domain.
    Comment: 19 pages, 7 figures
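
    As a rough illustration of the geometry involved, below is a sketch of two basic operations in the κ-stereographic model that such a product-space Transformer builds on. The formulas are the standard κ-stereographic ones; the function names and plain-PyTorch implementation are assumptions, not the paper's code:

    ```python
    import torch

    def mobius_add(x, y, kappa):
        """Mobius addition in the kappa-stereographic model
        (kappa < 0: hyperbolic, kappa > 0: spherical, kappa -> 0: Euclidean)."""
        xy = (x * y).sum(-1, keepdim=True)
        x2 = (x * x).sum(-1, keepdim=True)
        y2 = (y * y).sum(-1, keepdim=True)
        num = (1 - 2 * kappa * xy - kappa * y2) * x + (1 + kappa * x2) * y
        den = 1 - 2 * kappa * xy + kappa ** 2 * x2 * y2
        return num / den  # numerical safeguards omitted for brevity

    def kappa_distance(x, y, kappa):
        """Geodesic distance between x and y under scalar curvature kappa."""
        norm = mobius_add(-x, y, kappa).norm(dim=-1)
        sk = abs(kappa) ** 0.5
        if kappa > 0:  # spherical: arctan-based distance
            return 2.0 / sk * torch.atan(sk * norm)
        return 2.0 / sk * torch.atanh((sk * norm).clamp(max=1 - 1e-7))  # hyperbolic
    ```

    In a product of such spaces, each coordinate block carries its own learnable κ, which is what would let the model fit the curvature to the input graph end-to-end; driving attention by such geodesic distances is one natural (assumed) choice.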

    Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost

    To overcome the quadratic cost of self-attention, recent works have proposed various sparse attention modules, most of which fall into one of two groups: 1) sparse attention with hand-crafted patterns and 2) full attention followed by a sparse variant of softmax such as α-entmax. Unfortunately, the first group lacks adaptability to data while the second still requires quadratic cost in training. In this work, we propose SBM-Transformer, a model that resolves both problems by endowing each attention head with a mixed-membership Stochastic Block Model (SBM). Each attention head then data-adaptively samples a bipartite graph whose adjacency matrix is used as an attention mask for each input. During backpropagation, a straight-through estimator is used to flow gradients past the discrete sampling step and adjust the probabilities of sampled edges based on the predictive loss. The forward and backward costs are thus linear in the number of edges, which each attention head can also choose flexibly based on the input. By assessing the distribution of graphs, we theoretically show that SBM-Transformer is a universal approximator for arbitrary sequence-to-sequence functions in expectation. Empirical evaluations on the LRA and GLUE benchmarks demonstrate that our model outperforms previous efficient variants as well as the original Transformer with full attention. Our implementation can be found at https://github.com/sc782/SBM-Transformer.
    Comment: 19 pages, 8 figures
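
    A minimal sketch of the sampling-plus-straight-through mechanism described above; the exact SBM parameterization (here, a sigmoid of a bilinear form over soft block memberships) is an assumption for illustration:

    ```python
    import torch

    def sbm_attention_mask(q_memb, k_memb, block_affinity):
        """Sample a bipartite attention mask from a mixed-membership SBM.

        q_memb:         (n, B) soft block memberships of the n queries
        k_memb:         (m, B) soft block memberships of the m keys
        block_affinity: (B, B) inter-block connection strengths
        """
        # Edge probability between query i and key j under the SBM.
        probs = torch.sigmoid(q_memb @ block_affinity @ k_memb.T)  # (n, m)
        # Discrete sample of the bipartite graph's adjacency matrix.
        mask = torch.bernoulli(probs)
        # Straight-through estimator: the forward pass uses the hard 0/1 mask,
        # while gradients flow to `probs` (and thus to the SBM parameters).
        return mask + probs - probs.detach()
    ```

    A practical implementation would materialize only the sampled edges (e.g., via sparse tensors) so that cost stays linear in the edge count; the dense `probs` above is purely for readability.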

    Grouping-matrix based Graph Pooling with Adaptive Number of Clusters

    Graph pooling is a crucial operation for encoding hierarchical structures within graphs. Most existing graph pooling approaches formulate the problem as a node clustering task, which effectively captures the graph topology. Conventional methods ask users to specify an appropriate number of clusters as a hyperparameter and then assume that all input graphs share the same number of clusters. In inductive settings where the number of clusters can vary, however, the model should be able to represent this variation in its pooling layers in order to learn suitable clusters. We therefore propose GMPool, a novel differentiable graph pooling architecture that automatically determines the appropriate number of clusters based on the input data. The main intuition involves a grouping matrix, defined as a quadratic form of the pooling operator, whose entries can be read as binary classification probabilities over pairwise combinations of nodes. GMPool obtains the pooling operator by first computing the grouping matrix and then decomposing it. Extensive evaluations on molecular property prediction tasks demonstrate that our method outperforms conventional methods.
    Comment: 10 pages, 3 figures
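
    A rough sketch of the grouping-matrix idea; the pairwise scorer and the thresholded eigendecomposition used to "decompose" the grouping matrix are hypothetical stand-ins for the paper's actual construction:

    ```python
    import torch

    def gmpool_operator(node_feats, pair_scorer, rank_threshold=1e-3):
        """Recover a soft pooling operator S from a grouping matrix G ~ S S^T.

        node_feats:  (n, d) node representations
        pair_scorer: module mapping concatenated node pairs (..., 2d) to a
                     same-cluster probability in [0, 1]
        """
        n = node_feats.size(0)
        # G[i, j] = P(node i and node j belong to the same cluster).
        pairs = torch.cat(
            [node_feats.unsqueeze(1).expand(n, n, -1),
             node_feats.unsqueeze(0).expand(n, n, -1)], dim=-1)
        G = pair_scorer(pairs).squeeze(-1)  # (n, n)
        G = 0.5 * (G + G.T)                 # enforce symmetry
        # Decompose G ~ S S^T; the effective rank plays the role of the
        # cluster count, so it adapts to the input graph.
        evals, evecs = torch.linalg.eigh(G)
        keep = evals > rank_threshold
        S = evecs[:, keep] * evals[keep].sqrt()
        return S  # (n, k) pooling operator with data-dependent k
    ```

    The property mirrored here is that the number of retained components k is read off the input rather than fixed as a hyperparameter.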

    3D Denoisers are Good 2D Teachers: Molecular Pretraining via Denoising and Cross-Modal Distillation

    Pretraining molecular representations from large unlabeled data is essential for molecular property prediction due to the high cost of obtaining ground-truth labels. While various 2D graph-based molecular pretraining approaches exist, these methods struggle to show statistically significant gains in predictive performance. Recent works have thus instead proposed 3D conformer-based pretraining under the task of denoising, which has led to promising results. During downstream finetuning, however, models trained with 3D conformers require accurate atom coordinates of previously unseen molecules, which are computationally expensive to acquire at scale. In light of this limitation, we propose D&D, a self-supervised molecular representation learning framework that pretrains a 2D graph encoder by distilling representations from a 3D denoiser. With denoising followed by cross-modal knowledge distillation, our approach enjoys both the knowledge obtained from denoising and painless application to downstream tasks with no access to accurate conformers. Experiments on real-world molecular property prediction datasets show that the graph encoder trained via D&D can infer 3D information from the 2D graph alone and shows superior performance and label efficiency compared to other baselines.
    Comment: 16 pages, 5 figures
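
    A minimal sketch of the pretraining step implied by the abstract; the frozen teacher, module names, and MSE distillation objective are assumptions rather than the paper's exact recipe:

    ```python
    import torch
    import torch.nn.functional as F

    def dnd_pretraining_step(encoder_2d, denoiser_3d, graph_2d, conformer_3d):
        """Cross-modal distillation in the spirit of D&D.

        denoiser_3d: teacher pretrained by denoising 3D conformer coordinates
        encoder_2d:  student that sees only the 2D molecular graph
        """
        with torch.no_grad():                          # teacher stays frozen
            target = denoiser_3d.encode(conformer_3d)  # 3D-informed embedding
        pred = encoder_2d(graph_2d)                    # 2D-graph embedding
        # Match the student's 2D representation to the teacher's 3D one,
        # so downstream finetuning needs no conformers at all.
        return F.mse_loss(pred, target)
    ```

    After pretraining, only encoder_2d is kept, which is what makes application to downstream tasks painless for molecules without accurate conformers.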