Reinforcement learning must often contend with the exponential growth of
states and actions when exploring optimal control in high-dimensional spaces,
a challenge commonly known as the curse of dimensionality. In this work, we
address this issue by learning the inherent structure of action-wise similar
MDPs, so as to appropriately balance performance degradation against
sample/computational complexity. In particular, we partition the action space
into multiple groups based on the similarity of their transition distributions
and reward functions, and build a linear decomposition model to capture the
within-group differences in transition kernels and rewards.
Both our theoretical analysis and our experiments reveal a \emph{surprising
and counter-intuitive result}: while a more refined grouping strategy can
reduce the approximation error caused by treating actions in the same group as
identical, it also leads to increased estimation error when samples or
computational resources are limited. This finding highlights the grouping
strategy as a new degree of freedom that can be optimized to minimize the
overall performance loss.
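Schematically, and under illustrative rates that are not the paper's exact
bounds, the overall loss of a grouping $\mathcal{G}$ over a sample budget $n$
decomposes as
\[
\mathrm{Loss}(\mathcal{G}) \;\lesssim\;
\underbrace{\varepsilon_{\mathrm{approx}}(\mathcal{G})}_{\text{decreases as } |\mathcal{G}| \text{ grows}}
\;+\;
\underbrace{\widetilde{O}\big(\sqrt{|\mathcal{G}|/n}\big)}_{\text{increases with } |\mathcal{G}|},
\]
so the best number of groups depends on $n$: refining the partition past a
certain point trades a small gain in approximation accuracy for a larger
estimation penalty.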
To resolve this tradeoff, we formulate a general optimization problem for
determining the optimal grouping strategy, which strikes a balance between
performance loss and sample/computational complexity. We further propose a
computationally efficient method for selecting a nearly-optimal grouping
strategy, whose computational complexity is independent of the size of the
action space.
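As a purely illustrative sketch of the grouping-selection idea (not the
paper's algorithm: the loss proxies, the use of $k$-means clustering, and the
brute-force sweep over the number of groups $K$ are all our assumptions, and
the sweep deliberately scales with the action-space size that the paper's
efficient method avoids):
\begin{verbatim}
import numpy as np
from sklearn.cluster import KMeans

def select_grouping(action_features, n_samples, c_est=1.0):
    """Pick a number of action groups K by brute force.

    action_features: (A, d) array; each row summarizes an action's
        empirical transition/reward statistics (assumed available
        from a pilot dataset).
    n_samples: sample budget n used in the estimation-error proxy.
    c_est: hypothetical constant in the estimation-error proxy.
    """
    A = action_features.shape[0]
    best = None
    for K in range(1, A + 1):
        km = KMeans(n_clusters=K, n_init=10).fit(action_features)
        # Approximation-error proxy: within-group dispersion,
        # which shrinks as the grouping becomes more refined.
        approx_err = km.inertia_ / A
        # Estimation-error proxy: grows with the number of groups
        # and shrinks with samples (illustrative sqrt(K/n) rate).
        est_err = c_est * np.sqrt(K / n_samples)
        total = approx_err + est_err
        if best is None or total < best[1]:
            best = (K, total, km.labels_)
    return best  # (chosen K, proxy loss, group label per action)

# Example: 100 actions summarized by 8-dim statistics, 10k samples.
# K_star, loss, labels = select_grouping(np.random.rand(100, 8), 10_000)
\end{verbatim}
The sweep makes the bias--variance tradeoff explicit: small $K$ is dominated
by the approximation term, large $K$ by the estimation term.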