In this paper, we introduce an efficient framework for subtracting the background from both visible and thermal imagery for pedestrian detection in urban scenes. We use a deep neural network (DNN) to train the background subtraction model. To train the DNN, we first generate an initial background map and then use a randomly selected 5% of the video frames, together with the background map and manually segmented ground truth. We then apply cognition-based post-processing to further smooth the foreground detection result. We evaluate our method against our previous work and 11 recent, widely cited methods on three challenging video sequences selected from the publicly available color-thermal benchmark dataset OTCBVS. The results show that the proposed DNN-based approach detects pedestrians with well-preserved shape in most scenes, regardless of illumination changes and occlusion.
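The data preparation step summarized above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the initial background map is approximated here by a per-pixel temporal median, which is one common choice and not necessarily the one used in this work, and frames are plain nested lists of grayscale values. The function names `sample_training_indices` and `median_background` are hypothetical and introduced only for illustration.

```python
import random
from statistics import median


def sample_training_indices(num_frames, fraction=0.05, seed=0):
    """Randomly pick a fraction of frame indices (5% by default) for training.

    The fixed seed is an assumption made only so the sketch is reproducible.
    """
    count = max(1, round(num_frames * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_frames), count))


def median_background(frames):
    """Estimate a background map as the per-pixel temporal median.

    `frames` is a list of equally sized 2D lists of intensities.
    This is an assumed stand-in for the initial background map.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


# Tiny synthetic example: a static background of intensity 10,
# with a bright "pedestrian" pixel appearing in one frame only.
frames = [
    [[10, 10], [10, 10]],
    [[200, 10], [10, 10]],
    [[10, 10], [10, 10]],
]
background = median_background(frames)
train_idx = sample_training_indices(num_frames=200)
```

In this toy input the transient bright pixel does not survive the median, so the estimated background stays at the static intensity. The sampled indices would then select the frames that are paired with the background map and the ground-truth masks as training input.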