We consider the problem of olfactory searches in a turbulent environment. We
focus on agents that respond solely to odor stimuli, with no access to spatial
perception or prior information about the odor's location. We ask whether
navigation strategies to a target can be learned robustly within a sequential
decision making framework. We develop a reinforcement learning algorithm using
a small set of interpretable olfactory states and train it with realistic
turbulent odor cues. By introducing a temporal memory, we demonstrate that two
salient features of odor traces, discretized into a few olfactory states, are
sufficient to learn navigation in a realistic odor plume. Performance is
dictated by the sparse nature of turbulent plumes. An optimal memory exists
which ignores blanks within the plume and activates a recovery strategy outside
the plume. We obtain the best performance by letting agents learn their
recovery strategy and show that it consists mostly of crosswind casting, similar to
behavior observed in flying insects. The optimal strategy is robust to
substantial changes in the odor plumes, suggesting minor parameter tuning may
be sufficient to adapt to different environments.

Comment: 18 pages, 8 figures
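The decision-making framework described above can be illustrated with a minimal sketch. Everything here is a hypothetical toy model, not the paper's implementation: the plume is replaced by random detections whose probability decays with distance to the source, the olfactory states (`odor`, `recent`, `lost`), the memory length, and the Q-learning hyperparameters are illustrative assumptions. The point is only to show the mechanism: a bounded temporal memory lets the agent ignore blanks within the plume and switch to a distinct "lost" state (where a recovery strategy can be learned) only after the blank outlasts the memory.

```python
import random

random.seed(0)

MEMORY = 5  # steps a past detection is remembered before the agent counts as "lost"
ACTIONS = ["up", "down", "left", "right"]
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def olfactory_state(detected, blank_time):
    # Coarse discretization into few olfactory states: a detection,
    # a blank still within memory, or a blank that outlasted memory.
    if detected:
        return "odor"
    return "recent" if blank_time <= MEMORY else "lost"

def detection(pos):
    # Toy stand-in for a turbulent plume: sparse random detections whose
    # probability decays with distance to a source at the origin.
    d = abs(pos[0]) + abs(pos[1])
    return random.random() < 0.8 / (1 + d)

class QAgent:
    def __init__(self, eps=0.1, alpha=0.1, gamma=0.95):
        self.q = {}  # (olfactory state, action) -> value
        self.eps, self.alpha, self.gamma = eps, alpha, gamma

    def act(self, s):
        # Epsilon-greedy action selection over the discrete olfactory state.
        if random.random() < self.eps:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((s, a), 0.0))

    def update(self, s, a, r, s2):
        # Standard tabular Q-learning update.
        best = max(self.q.get((s2, a2), 0.0) for a2 in ACTIONS)
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + self.alpha * (r + self.gamma * best - old)

def run_episode(agent, start=(6, 6), max_steps=200):
    pos, blank = start, MEMORY + 1
    s = olfactory_state(detection(pos), blank)
    for _ in range(max_steps):
        a = agent.act(s)
        dx, dy = MOVES[a]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos == (0, 0):
            # Terminal reward at the source, no bootstrap.
            old = agent.q.get((s, a), 0.0)
            agent.q[(s, a)] = old + agent.alpha * (1.0 - old)
            return True
        det = detection(pos)
        blank = 0 if det else blank + 1
        s2 = olfactory_state(det, blank)
        agent.update(s, a, -0.01, s2)  # small cost per step encourages short paths
        s = s2
    return False

agent = QAgent()
successes = sum(run_episode(agent) for _ in range(500))
```

Note that the agent's policy is conditioned only on the current olfactory state, never on position: all spatial information enters implicitly through the detection statistics, mirroring the paper's constraint of agents with no spatial perception.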