We develop an approach for unsupervised learning of associations between
co-occurring perceptual events using a large graph. We applied this approach to
successfully solve the image captcha of China's railroad system. The approach
is based on the principle of suspicious coincidence. In this particular
problem, a user is presented with a deformed picture of a Chinese phrase and
eight low-resolution images. They must quickly select the relevant images in
order to purchase their train tickets. This problem presents several
challenges: (1) the teaching labels for both the Chinese phrases and the images
were not available for supervised learning, (2) no pre-trained deep
convolutional neural networks are available for recognizing these Chinese
phrases or the presented images, and (3) each captcha must be solved within a
few seconds. We collected 2.6 million captchas, with 2.6 million deformed
Chinese phrases and over 21 million images. From these data, we constructed an
association graph, composed of over 6 million vertices, and linked these
vertices based on co-occurrence information and feature similarity between
pairs of images. We then trained a deep convolutional neural network to learn a
projection of the Chinese phrases onto a 230-dimensional latent space. Using
label propagation, we computed the likelihood of each of the eight images
conditioned on the latent space projection of the deformed phrase for each
captcha. The resulting system solved captchas with 77% accuracy in 2 seconds on
average. Our work, in answering this practical challenge, illustrates the power
of this class of unsupervised association learning techniques, which may be
related to the brain's general strategy for associating language stimuli with
visual objects on the principle of suspicious coincidence.Comment: 8 pages, 7 figures, 14th Conference on Computer and Robot Vision 201