This work studies algorithm design in a competitive framework,
with the primary goal of learning a stable equilibrium. We consider the dynamic
price competition between two firms operating within an opaque marketplace,
where each firm lacks information about its competitor. The demand follows the
multinomial logit (MNL) choice model, which depends on the consumers' observed
price and their reference price, and consecutive periods in the repeated game
are connected by reference price updates. We use the notion of stationary Nash
equilibrium (SNE), defined as the fixed point of the equilibrium pricing policy
for the single-period game, to simultaneously capture the long-run market
equilibrium and stability. We propose the online projected gradient ascent
algorithm (OPGA), in which the firms adjust prices using the first-order
derivatives of their log-revenues, which can be obtained from the market feedback
mechanism. Despite the absence of typical properties required for the
convergence of online games, such as strong monotonicity and variational
stability, we demonstrate that under diminishing step-sizes, the price and
reference price paths generated by OPGA converge to the unique SNE, thereby
achieving no-regret learning and a stable market. Moreover, with
appropriate step-sizes, we prove that this convergence exhibits a rate of
O(1/t).
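
To make the update rule concrete, the following is a minimal sketch of an OPGA-style iteration, not the authors' implementation. It assumes a specific model that the abstract does not spell out: linear utilities u_i = a_i - b_i p_i + c_i (r_i - p_i), an outside option with utility 0, exponentially smoothed reference prices r_i <- alpha r_i + (1 - alpha) p_i, and step sizes 1/t. The parameter names (a, b, c, alpha), the price bounds, and all function names are hypothetical. Under these assumptions the derivative of firm i's log-revenue is 1/p_i - (b_i + c_i)(1 - d_i), which uses only the firm's own price and own realized demand share.

```python
import math


def mnl_demand(prices, refs, a, b, c):
    """MNL market shares with an outside option of utility 0 (assumed form)."""
    n = len(prices)
    utils = [a[i] - b[i] * prices[i] + c[i] * (refs[i] - prices[i]) for i in range(n)]
    weights = [math.exp(u) for u in utils]
    denom = 1.0 + sum(weights)
    return [w / denom for w in weights]


def log_revenue_grad(price, demand, b_i, c_i):
    """d/dp_i [log p_i + log d_i] under the assumed utility model."""
    return 1.0 / price - (b_i + c_i) * (1.0 - demand)


def project(x, lo, hi):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return min(max(x, lo), hi)


def opga(p0, r0, a, b, c, alpha=0.9, steps=5000, p_min=0.1, p_max=20.0):
    """Run the sketch for `steps` periods; returns final prices and reference prices."""
    prices, refs = list(p0), list(r0)
    n = len(prices)
    for t in range(1, steps + 1):
        eta = 1.0 / t  # diminishing step size (illustrative choice)
        demand = mnl_demand(prices, refs, a, b, c)
        # Each firm takes a projected ascent step using only its own feedback.
        new_prices = [
            project(
                prices[i] + eta * log_revenue_grad(prices[i], demand[i], b[i], c[i]),
                p_min,
                p_max,
            )
            for i in range(n)
        ]
        # Reference prices move toward the prices charged in this period.
        refs = [alpha * refs[i] + (1.0 - alpha) * prices[i] for i in range(n)]
        prices = new_prices
    return prices, refs


if __name__ == "__main__":
    final_prices, final_refs = opga(
        p0=[3.0, 2.0], r0=[3.0, 2.0], a=[1.0, 1.0], b=[1.0, 1.0], c=[0.5, 0.5]
    )
    print(final_prices, final_refs)
```

For the symmetric parameters in the usage example, the point p_i = r_i = 1 satisfies the first-order condition 1/p_i = (b_i + c_i)(1 - d_i) of this assumed model, so it serves as a reference against which the final iterates can be compared.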