This paper considers a single-trajectory system identification problem for
linear systems under general nonlinear and/or time-varying policies with i.i.d.
random excitation noises. The problem is motivated by safe learning-based
control for constrained linear systems, where the safe policies used during
learning are usually nonlinear and time-varying in order to satisfy the
state and input constraints. In this paper, we provide a non-asymptotic error
bound for least-squares estimation when the data trajectory is generated by any
nonlinear and/or time-varying policy, as long as the generated state and
action trajectories are bounded. This significantly generalizes the existing
non-asymptotic guarantees for linear system identification, which usually
consider i.i.d. random inputs or linear policies. Interestingly, our error
bound is consistent with that for linear policies in its dependence on the
trajectory length, system dimensions, and excitation levels.
Lastly, we demonstrate the application of our results to safe learning with
robust model predictive control and provide numerical analysis.
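
To make the setting concrete, the following is a minimal sketch of single-trajectory least-squares estimation under a nonlinear, time-varying policy with i.i.d. excitation noise. It is an illustration only and is not taken from the paper: the scalar system x_{t+1} = a x_t + b u_t + w_t, the saturated feedback policy with an alternating gain, the uniform noise distributions, and the names `simulate` and `least_squares` are all assumptions chosen for simplicity.

```python
import random


def simulate(a, b, T, seed=0, u_max=1.0, eta_max=0.5, w_max=0.1):
    """Roll out one trajectory of x_{t+1} = a*x_t + b*u_t + w_t.

    Assumed (hypothetical) policy: a saturated feedback law with a
    time-varying gain, plus i.i.d. uniform excitation noise eta_t.
    """
    rng = random.Random(seed)
    x = 0.0
    xs, us, x_next_list = [], [], []
    for t in range(T):
        gain = 0.5 + 0.3 * (t % 2)                    # time-varying gain
        u_nom = max(-u_max, min(u_max, -gain * x))    # saturation (nonlinear)
        u = u_nom + rng.uniform(-eta_max, eta_max)    # i.i.d. excitation
        w = rng.uniform(-w_max, w_max)                # bounded disturbance
        x_next = a * x + b * u + w
        xs.append(x)
        us.append(u)
        x_next_list.append(x_next)
        x = x_next
    return xs, us, x_next_list


def least_squares(xs, us, x_next_list):
    """Solve the 2x2 normal equations for (a, b) with regressor (x_t, u_t)."""
    sxx = sum(x * x for x in xs)
    sxu = sum(x * u for x, u in zip(xs, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(xs, x_next_list))
    suy = sum(u * y for u, y in zip(us, x_next_list))
    det = sxx * suu - sxu * sxu
    a_hat = (suu * sxy - sxu * suy) / det
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat


# Example run with assumed true parameters a = 0.8, b = 1.0.
xs, us, x_next_list = simulate(a=0.8, b=1.0, T=5000)
a_hat, b_hat = least_squares(xs, us, x_next_list)
print("estimated (a, b):", a_hat, b_hat)
```

In this sketch the state and input stay bounded because |a| < 1 and both the saturated input and the noise terms are bounded, which mirrors the boundedness condition stated above; the excitation noise is what keeps the regressors informative even though the policy itself is not linear.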