How to represent one's own body is one of the most interesting issues in cognitive developmental robotics, which aims to understand the cognitive developmental processes that an intelligent robot would require and how to realize them in a physical entity. This paper presents a cognitive model of how a robot acquires its own body representation, that is, a body scheme for the body surface. The internal observer assumption makes it difficult for a robot to associate sensory information from different modalities because of the lack of references between them, which are usually given by the designer in the prenatal stage of the robot. Our model is based on cross-modal map learning among joint, vision, and tactile sensor spaces, associating pairs of sensor values that are activated simultaneously. We show a preliminary experiment, and then discuss how our model can explain the reported phenomena on body scheme, as well as future issues.
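The paper itself gives no code, but the core idea of cross-modal map learning via simultaneous activation can be sketched as a simple Hebbian association between sensor spaces. The sketch below is purely illustrative and not the authors' implementation: the unit counts, one-hot activations, and the assumption that corresponding units across modalities fire for the same body part are all simplifying assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discretized sensor spaces (sizes are illustrative, not from the paper).
N_JOINT, N_VISION, N_TACTILE = 8, 8, 8

# Cross-modal connection weights, learned from co-activation.
w_jv = np.zeros((N_JOINT, N_VISION))    # joint  <-> vision
w_vt = np.zeros((N_VISION, N_TACTILE))  # vision <-> tactile

def hebbian_update(w, pre, post, lr=0.1):
    """Strengthen connections between simultaneously active units (Hebbian rule)."""
    w += lr * np.outer(pre, post)

# Toy world: when body part `part` is touched while seen, the corresponding
# joint, vision, and tactile units are active at the same time.
for _ in range(500):
    part = rng.integers(N_JOINT)
    joint = np.zeros(N_JOINT);   joint[part] = 1.0
    vision = np.zeros(N_VISION); vision[part] = 1.0
    tactile = np.zeros(N_TACTILE); tactile[part] = 1.0
    hebbian_update(w_jv, joint, vision)
    hebbian_update(w_vt, vision, tactile)

# After learning, each joint unit is most strongly linked to the matching
# vision unit, and each vision unit to the matching tactile unit.
print(all(np.argmax(w_jv[i]) == i for i in range(N_JOINT)))   # True
print(all(np.argmax(w_vt[i]) == i for i in range(N_VISION)))  # True
```

In this toy setting the learned weight matrices converge to (scaled) identity mappings, which plays the role of the cross-modal references that, under the internal observer assumption, cannot be supplied by the designer.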