Micro-expressions have drawn increasing interest lately due to various
potential applications. The task is, however, difficult as it incorporates many
challenges from the fields of computer vision, machine learning and emotional
sciences. Due to the spontaneous and subtle characteristics of
micro-expressions, the available training and testing data are limited, which
makes evaluation complex. We show that data leakage and fragmented evaluation
protocols are recurring issues in the micro-expression literature. We find that fixing
data leaks can drastically reduce model performance, in some cases even making
the models perform similarly to a random classifier. To address these issues, we review
common pitfalls, propose a new standardized evaluation protocol using facial
action units with over 2000 micro-expression samples, and provide an
open-source library that implements the evaluation protocols in a standardized
manner. Code will be available at \url{https://github.com/tvaranka/meb}.