Searching online for healthcare information has become a common use of the internet. When a patient experiences a symptom but is unsure what action to take, a self-diagnosis tool can help identify the possible conditions and indicate whether the patient needs to seek immediate medical help. However, the accuracy and quality of existing self-diagnosis tools remain disappointing and need further improvement. This thesis focuses on the automatic differential diagnosis task and provides a comprehensive evaluation of reinforcement learning (RL) methods for it. We also present a systematic method for simulating medically correct patient records, which integrates a standard symptom modeling approach called NLICE. In this way, we bridge the gap between the limited availability of patient records and data-driven healthcare methodologies. The project investigates both flat RL and hierarchical RL methods in an automatic differential diagnosis setting and evaluates both on simulated patient records. Because the action space of the differential diagnosis task is inevitably large, flat RL performs relatively poorly in complicated scenarios. Hierarchical RL splits a complex diagnosis task into smaller tasks: it contains two levels of policy learning, and each low-level policy imitates one medical specialty. As a result, hierarchical RL increases the Top-1 success rate from 23.1\% with flat RL to 45.4\%. Beyond the advanced policy learning strategy, this thesis explores the ability of NLICE symptom modeling to distinguish conditions that share similar symptoms. With NLICE, the flat RL and hierarchical RL models improve further and achieve Top-1 success rates of 36.2\% and 71.8\%, respectively. To further address the sparse action space problem in the automatic diagnosis domain, a reward shaping algorithm is implemented in the reward configuration. 
With reward shaping, the average reward gained by hierarchical RL increases from -3.65 to 0.87. Additionally, we model the general demographic background of patients and use this contextual information in a policy transformation strategy, which eliminates the misclassification problem for diseases that are strongly related to sex and age.
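
The two-level decomposition described above can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the specialty names, the symptom lists, the value tables, and the function names are all hypothetical, and a plain greedy lookup stands in for the learned policies. The sketch only shows how a high-level policy that first picks a specialty, followed by a low-level policy that picks a symptom inquiry within that specialty, reduces the number of options considered at each decision compared with a single flat action space.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: each low-level policy covers one medical specialty
# and may only inquire about the symptoms listed for that specialty.
SPECIALTY_SYMPTOMS: Dict[str, List[str]] = {
    "cardiology": ["chest pain", "palpitations", "shortness of breath"],
    "gastroenterology": ["nausea", "abdominal pain", "heartburn"],
    "neurology": ["headache", "dizziness", "numbness"],
}


def greedy(values: Dict[str, float]) -> str:
    """Return the key with the highest estimated value."""
    return max(values, key=values.get)


def select_action(
    master_values: Dict[str, float],
    worker_values: Dict[str, Dict[str, float]],
) -> Tuple[str, str]:
    """Two-level selection: choose a specialty, then a symptom within it."""
    specialty = greedy(master_values)            # high-level decision
    symptom = greedy(worker_values[specialty])   # low-level decision
    return specialty, symptom


def flat_action_count(table: Dict[str, List[str]]) -> int:
    """Number of inquiry actions a flat policy must choose among at once."""
    return sum(len(symptoms) for symptoms in table.values())


def hierarchical_choice_counts(table: Dict[str, List[str]]) -> Tuple[int, int]:
    """Options at the high level, and the largest option set at the low level."""
    return len(table), max(len(symptoms) for symptoms in table.values())


# Hypothetical value estimates, standing in for learned policies.
master_values = {"cardiology": 0.7, "gastroenterology": 0.2, "neurology": 0.1}
worker_values = {
    name: {symptom: 0.0 for symptom in symptoms}
    for name, symptoms in SPECIALTY_SYMPTOMS.items()
}
worker_values["cardiology"]["palpitations"] = 0.9

chosen_specialty, chosen_symptom = select_action(master_values, worker_values)
```

In this toy setting a flat policy chooses among all nine inquiries at every step, whereas the hierarchical selection makes two choices among three options each. The gap widens as the number of specialties and symptoms grows.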