First, a combination of strategies, including dictionary enumeration, search engine mining, and website information crawling, was used for preliminary screening of subdomain names. Then, the improved Markov model algorithm was used to analyze the filtered data and generate new subdomain names, which were added to the result set. Next, the data were checked and verified for authenticity; if they did not meet the criteria, the analysis process was repeated. Finally, a rigorously validated data set was formed. The proposed algorithm significantly improves the efficiency and coverage of subdomain discovery and effectively compensates for the shortcomings of traditional methods.
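
The generate-then-verify loop described above can be illustrated with a minimal sketch. It assumes a plain character-level Markov chain of fixed order, not the improved model itself, whose details are not given here. The names `train`, `generate`, `discover`, and the `is_valid` callback are hypothetical; `is_valid` stands in for whatever authenticity check is used (for example, a DNS lookup).

```python
import random
from collections import defaultdict

START, END = "^", "$"  # padding symbols, assumed not to occur in labels

def train(labels, order=2):
    """Count character transitions over known subdomain labels."""
    counts = defaultdict(lambda: defaultdict(int))
    for label in labels:
        padded = START * order + label + END
        for i in range(order, len(padded)):
            counts[padded[i - order:i]][padded[i]] += 1
    return counts

def generate(counts, order=2, max_len=20, rng=random):
    """Sample one label by walking the transition table."""
    state, out = START * order, []
    while len(out) < max_len:
        nxt = counts.get(state)
        if not nxt:
            break
        chars, weights = zip(*nxt.items())
        ch = rng.choices(chars, weights=weights)[0]
        if ch == END:
            break
        out.append(ch)
        state = (state + ch)[-order:]
    return "".join(out)

def discover(seed_labels, is_valid, rounds=3, samples=200, order=2, rng=random):
    """Retrain on the result set, generate candidates, keep verified ones."""
    found = set(seed_labels)
    for _ in range(rounds):
        counts = train(found, order)
        candidates = {generate(counts, order, rng=rng) for _ in range(samples)}
        # Hypothetical verification step: only candidates passing is_valid are kept.
        new = {c for c in candidates if c and c not in found and is_valid(c)}
        if not new:
            break  # nothing new was verified, so further rounds would not help
        found |= new
    return found
```

Here the seed labels would come from the preliminary screening (dictionary, search engine, and crawl results), and each round feeds the verified names back into training, mirroring the repeated analysis step. The stopping rule (no new verified names) is an assumption of this sketch.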