In this study, we analyze the dynamics of Wordle using data analysis and
machine learning. We first examined how the number of submitted results varies
with the date. Because the game's early surge in popularity biases the series,
we fitted the model only to the later, stable portion of the data, using an
ARIMAX model of order (p, d, q) = (9, 0, 2) with a weekday/weekend indicator as
the exogenous variable. We found no significant relationship between word
attributes and hard-mode results.
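For readers unfamiliar with the notation, the ARIMAX specification mentioned above can be written out as a single equation. This is a sketch under stated assumptions, not the paper's exact formulation: it reads 9, 0, 2 as the autoregressive order, the differencing order, and the moving-average order, and it assumes the exogenous variable is coded as x_t = 1 on weekends and 0 on weekdays. The symbols c, \beta, \phi_i, and \theta_j are introduced here for illustration only.

```latex
% y_t: number of results submitted on day t
% x_t: weekend indicator (assumed coding: 1 = weekend, 0 = weekday)
% \varepsilon_t: white-noise error term
y_t = c + \beta x_t + \sum_{i=1}^{9} \phi_i \, y_{t-i}
      + \varepsilon_t + \sum_{j=1}^{2} \theta_j \, \varepsilon_{t-j}
```

With d = 0 the series is modeled without differencing, which is appropriate only if it is already stationary. Some software instead applies the ARMA structure to the errors of the regression on x_t, which gives a closely related but not identical model.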
To predict word difficulty, we employed a backpropagation (BP) neural network,
using feature engineering to reduce overfitting. We also used K-means
clustering, with five clusters chosen as the optimum, to assign each word a
numerical difficulty level. Our models predict that about 12,884 results will
be submitted on March 1, 2023, and that the word "eerie" will take an average
of 4.8 attempts, placing it in the hardest difficulty cluster.
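The clustering step described above can be illustrated with a minimal, standard-library sketch of Lloyd's K-means algorithm on a single feature. It is not the authors' implementation: the feature (average attempts per word), the quantile-based initialization, the function name `kmeans_1d`, and the ten input values are all assumptions made for illustration, and the values are made-up toy numbers rather than data from the study.

```python
def kmeans_1d(values, k, max_iters=100):
    """Lloyd's K-means on scalar values. Returns (centroids, labels).

    Assumes k >= 2 and len(values) >= k. Initial centroids are taken at
    evenly spaced positions in the sorted data, so the run is deterministic.
    """
    ordered = sorted(values)
    n = len(ordered)
    centroids = [ordered[i * (n - 1) // (k - 1)] for i in range(k)]
    labels = [0] * n
    for _ in range(max_iters):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: each centroid becomes the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members)
                                 if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Hypothetical average-attempt values for ten words (toy data, not from the study).
avg_attempts = [3.2, 3.3, 3.7, 3.8, 4.1, 4.2, 4.5, 4.6, 4.9, 5.0]
centroids, labels = kmeans_1d(avg_attempts, k=5)

# Rank clusters by centroid so that level 0 is easiest and level 4 is hardest.
order = sorted(range(len(centroids)), key=lambda c: centroids[c])
rank = {cluster: level for level, cluster in enumerate(order)}
levels = [rank[lab] for lab in labels]
print(levels)
```

Ranking the clusters by centroid is what turns arbitrary cluster labels into an ordered difficulty scale. A real analysis would more likely cluster on several features at once, such as the full distribution of attempts, using an established library implementation.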
We further examined the percentage of loyal players and their propensity to
undertake daily challenges. Our models underwent sensitivity analyses and
diagnostic checks, including the augmented Dickey-Fuller (ADF) test, ACF and
PACF analysis, and cross-validation, which supported their robustness.
Overall, our study provides a predictive framework for Wordle
gameplay based on date or a given five-letter word. Results have been
summarized and submitted to the Puzzle Editor of the New York Times.

Comment: 25 pages, 28 figures