To achieve reachability of Boolean Control Networks (BCNs) under state-flipped
control while reducing control costs, a reinforcement-learning-based method is
proposed to obtain flip kernels and the optimal policy that achieves
reachability with minimal flipping actions. The proposed method is model-free and
of low computational complexity. In particular, Q-learning (QL), fast QL, and
small memory QL are proposed to find flip kernels. Fast QL and small memory QL
are two novel algorithms. Specifically, fast QL, which combines QL with
transfer learning and special initial states, is more efficient, and
small memory QL is applicable to large-scale systems. Meanwhile, we present a
novel reward setting under which the optimal policy, i.e., the one that
achieves reachability with minimal flipping actions, is the policy with the
highest return. Then, to obtain the optimal policy, we propose QL for
small-scale systems and fast small memory QL for
large-scale systems. Specifically, building on the small memory QL
described above, fast small memory QL uses a changeable reward setting to
improve learning efficiency while ensuring the optimality of the policy.
To guide parameter selection, we give some system properties for reference.
Finally, two examples, a small-scale system and a large-scale one, are
considered to verify the proposed method.
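
As a rough illustration of the idea summarized above, the following is a minimal sketch of tabular Q-learning on a toy three-node BCN with state-flip actions. It is not the algorithm of this work: the update rule `step_bcn`, the flippable node set, the target state, the convention that flips are applied before the network update, and the reward weights `FLIP_COST` and `STEP_COST` are all assumptions chosen for illustration. The learner only queries the network as a black-box simulator, which is the sense in which the sketch is model-free.

```python
import itertools
import random

N = 3                              # number of nodes in the toy network (assumed)
FLIPPABLE = (0, 1)                 # nodes the controller may flip (assumed)
TARGET = (1, 1, 1)                 # target state to be reached (assumed)
FLIP_COST, STEP_COST = 10.0, 1.0   # illustrative reward weights, not the paper's

def step_bcn(x):
    """Hypothetical Boolean update rule, used only as a black-box simulator."""
    return (x[1], x[0] & x[2], x[2] | x[0])

# An action is the subset of flippable nodes whose values are inverted.
ACTIONS = [a for r in range(len(FLIPPABLE) + 1)
           for a in itertools.combinations(FLIPPABLE, r)]

def apply_action(x, action):
    """Flip the chosen nodes, then apply one network update (assumed order)."""
    flipped = tuple(1 - v if i in action else v for i, v in enumerate(x))
    return step_bcn(flipped)

def train(episodes=5000, alpha=0.5, gamma=1.0, eps=0.2, max_steps=30, seed=0):
    """Tabular epsilon-greedy Q-learning; every flip and every step is penalized."""
    rng = random.Random(seed)
    states = list(itertools.product((0, 1), repeat=N))
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    starts = [s for s in states if s != TARGET]
    for _ in range(episodes):
        x = rng.choice(starts)
        for _ in range(max_steps):
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: q[(x, b)])
            nxt = apply_action(x, a)
            reward = -FLIP_COST * len(a) - STEP_COST
            done = nxt == TARGET
            # The target is terminal, so its continuation value is zero.
            best_next = 0.0 if done else max(q[(nxt, b)] for b in ACTIONS)
            q[(x, a)] += alpha * (reward + gamma * best_next - q[(x, a)])
            if done:
                break
            x = nxt
    return q

def greedy_flips(q, x, max_steps=20):
    """Follow the greedy policy from x; return the flips used, or None on failure."""
    flips = 0
    for _ in range(max_steps):
        if x == TARGET:
            return flips
        a = max(ACTIONS, key=lambda b: q[(x, b)])
        flips += len(a)
        x = apply_action(x, a)
    return flips if x == TARGET else None

if __name__ == "__main__":
    q_table = train()
    for s in itertools.product((0, 1), repeat=N):
        print(s, greedy_flips(q_table, s))
```

In this toy setting the flip penalty is made much larger than the step penalty so that, among policies that reach the target, those using fewer flips obtain higher returns. This choice of weights is a simplification and should not be read as the reward setting proposed in this work.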