We introduce the cross-match test, an exact, distribution-free,
high-dimensional hypothesis test, as an intrinsic evaluation metric for word
embeddings. We show that cross-match is an effective means of measuring
distributional similarity between different vector representations and of
evaluating whether differences between vector embedding models are statistically significant.
Additionally, we find that cross-match can be used to provide a quantitative
measure of linguistic similarity for selecting bridge languages for machine
translation. We demonstrate that the results of the hypothesis test align with
our expectations and note that the framework of two-sample hypothesis testing
is not limited to word embeddings and can be extended to all vector
representations.

Comment: Accepted to RepEval 2017: The Second Workshop on Evaluating Vector
Space Representations for NL
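
To make the "exact, distribution-free" property of the cross-match test concrete, the following is a minimal sketch of the null distribution of the cross-match count, assuming the usual formulation of the test: the two samples (sizes n and m, with n + m even) are pooled, the points are paired by a minimum-distance matching, and the statistic is the number of pairs containing one point from each sample. The function names `crossmatch_null_pmf` and `crossmatch_p_value` are hypothetical, and the matching step itself is omitted; this is not the implementation used in the paper.

```python
from fractions import Fraction
from math import comb, factorial


def crossmatch_null_pmf(n, m):
    """Exact null distribution of the cross-match count A1.

    Assumes n + m is even so every pooled point can be paired.
    Returns a dict mapping each feasible count a1 to its probability.
    """
    total = n + m
    if total % 2 != 0:
        raise ValueError("n + m must be even so that every point can be paired")
    pmf = {}
    for a1 in range(min(n, m) + 1):
        # The points of the first sample not in cross pairs must pair
        # among themselves, so n - a1 has to be even.
        if (n - a1) % 2 != 0:
            continue
        a2 = (n - a1) // 2  # pairs with both points from the first sample
        a0 = (m - a1) // 2  # pairs with both points from the second sample
        numerator = 2 ** a1 * factorial(total // 2)
        denominator = comb(total, n) * factorial(a0) * factorial(a1) * factorial(a2)
        pmf[a1] = Fraction(numerator, denominator)
    return pmf


def crossmatch_p_value(a1_observed, n, m):
    """One-sided p-value: probability of at most a1_observed cross-matches."""
    pmf = crossmatch_null_pmf(n, m)
    return sum(p for a1, p in pmf.items() if a1 <= a1_observed)
```

Because the probabilities depend only on n, m, and the observed count, and not on the dimension or distribution of the vectors, the p-value is exact for any embedding space. Few cross-matches (a small p-value) suggest that the two sets of vectors come from different distributions.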