Rapid search for and localization of nuclear sources can be important in preventing human harm from illicit material in dirty bombs or from contamination. With a single mobile radiation detector, there are numerous challenges to overcome, such as weak source intensity, multiple sources, background radiation, and the presence of obstructions, i.e., a non-convex environment. In this work, we investigate the sequential decision-making capability of deep reinforcement learning in the nuclear source search context. A novel neural network architecture (RAD-A2C) based on the advantage actor-critic (A2C) framework and a particle filter gated recurrent unit for localization is proposed. Performance is studied in a randomized 20×20 m convex and non-convex simulation environment across a range of signal-to-noise ratios (SNRs) for a single detector and single source. RAD-A2C performance is compared to both an information-driven controller that uses a bootstrap particle filter and to a gradient search (GS) algorithm. We find that the RAD-A2C has comparable performance to the information-driven controller across SNR in a convex environment.
The RAD-A2C far outperforms the GS algorithm in the non-convex environment with greater than 95% median completion rate for up to seven obstructions.