We present a new method to localize a camera within a previously unseen
environment perceived from an egocentric point of view. Although this is,
in general, an ill-posed problem, humans can effortlessly and efficiently
determine their relative location and orientation and navigate in
previously unseen environments, e.g., finding a specific item in a new grocery
store. To enable such a capability, we design a new egocentric representation,
which we call ECO (Egocentric COgnitive map). ECO is biologically inspired
by the cognitive map that enables human navigation, and it encodes the
surrounding visual semantics with respect to both distance and orientation.
ECO possesses
three main properties: (1) reconfigurability: complex semantics and geometry
are captured via the synthesis of atomic visual representations (e.g., image
patches); (2) robustness: the visual semantics are registered in a
geometrically consistent way (e.g., aligned with respect to the gravity
vector, frontalized, and rescaled to a canonical depth, as sketched below),
which enables us to learn meaningful atomic representations; (3)
adaptability: a domain adaptation framework is designed to generalize the
learned representation without manual calibration. As a proof of concept,
we use ECO to localize a camera within
real-world scenes (various grocery stores) and demonstrate performance
improvements over existing semantic localization approaches.
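
The robustness property above amounts to a homography-based
canonicalization of each image patch. The following is a minimal,
illustrative sketch of such a registration step, not the paper's actual
implementation: it assumes OpenCV and NumPy, known camera intrinsics K, and
a per-patch depth, surface normal, and gravity direction expressed in the
camera frame; the function and parameter names (register_patch,
canonical_depth, etc.) are hypothetical.

```python
import numpy as np
import cv2

def rotation_between(a, b):
    """Smallest rotation taking unit vector a onto unit vector b
    (Rodrigues form; the antipodal case a = -b is omitted for brevity)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v, c = np.cross(a, b), float(np.dot(a, b))
    if np.isclose(c, 1.0):  # already aligned
        return np.eye(3)
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

def register_patch(image, K, center_uv, depth, normal, gravity,
                   canonical_depth=2.0, size=64):
    """Warp one patch into a canonical frame: frontalize the patch plane,
    roll so gravity points image-down, and rescale to a canonical depth."""
    # 1) Frontalize: rotate the virtual camera so the optical axis (+z)
    #    faces the patch plane (the normal points back at the camera, -z).
    R_f = rotation_between(np.asarray(normal, float),
                           np.array([0.0, 0.0, -1.0]))
    # 2) Gravity-align: roll about z so the rotated gravity vector projects
    #    onto the image's +y ("down") direction.
    g = R_f @ np.asarray(gravity, float)
    theta = np.arctan2(g[0], g[1])
    R_z, _ = cv2.Rodrigues(np.array([0.0, 0.0, theta]))
    R = R_z @ R_f
    # 3) Rescale: a patch observed at depth d appears d / canonical_depth
    #    times larger once moved to the canonical depth.
    s = depth / canonical_depth
    H = np.diag([s, s, 1.0]) @ K @ R @ np.linalg.inv(K)
    # Translate so the warped patch center lands at the output center.
    p = H @ np.array([center_uv[0], center_uv[1], 1.0])
    p = p / p[2]
    T = np.array([[1.0, 0.0, size / 2 - p[0]],
                  [0.0, 1.0, size / 2 - p[1]],
                  [0.0, 0.0, 1.0]])
    return cv2.warpPerspective(image, T @ H, (size, size))
```

In practice the per-patch depth and normal could come from a depth sensor
or a monocular estimate; the sketch only illustrates the kind of
gravity-aligned, frontalized, depth-normalized warp the abstract describes.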