Many diverse phenomena in nature inherently encode both short- and long-term
temporal dependencies, with short-term dependencies arising in particular from
the directed flow of time. In this respect, we present experimental evidence
suggesting that the {\it interrelations} of such events are stronger for
closer time stamps. However, for attention-based models to learn these
short-term regularities, large amounts of data are required, which are often
infeasible to obtain. This is because, while attention-based models are good
at learning piecewise temporal dependencies, they lack structures that encode
inductive biases for time series. To address this, we propose a simple and
efficient method that enables attention layers to better encode the short-term
temporal bias of these data sets by applying learnable, adaptive kernels
directly to the attention matrices. For our experiments, we chose various
prediction tasks on Electronic Health Records (EHR) data sets, since they are
prime examples of data with underlying long- and short-term temporal
dependencies. Our experiments show exceptional classification results compared
to the best-performing models on most of the tasks and data sets.
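The core idea of applying a distance-decaying kernel to the attention matrix can be sketched as follows. This is a hypothetical illustration only: the function names, the exponential kernel form, and the fixed decay parameter \texttt{tau} (learnable in the proposed method) are our assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kernelized_attention(Q, K, V, tau=2.0):
    """Single-head attention whose logits are biased by a temporal kernel.

    tau controls how fast attention decays with time-step distance; in the
    proposed method such parameters would be learned, here it is fixed.
    """
    T, d = Q.shape
    logits = Q @ K.T / np.sqrt(d)                # standard scaled dot-product
    idx = np.arange(T)
    dist = np.abs(idx[:, None] - idx[None, :])   # |i - j| time-step gaps
    kernel = np.exp(-dist / tau)                 # closer steps weighted higher
    weights = softmax(logits + np.log(kernel))   # kernel as a log-space bias
    return weights @ V
```

Because the kernel enters as an additive bias on the logits, it reweights the attention distribution toward nearby time stamps without altering the rest of the architecture.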