Balanced truncation is a standard technique for model reduction of linear time invariant dynamical systems. The most expensive step is the numerical solution of a pair of Lyapunov matrix equations. We consider the direct computation of the dominant invariant subspace of a symmetric positive semidefinite matrix, which is given implicitly as the solution of a Lyapunov matrix equation. We show how to apply subspace iteration with Ritz acceleration in this setting. An n by n Lyapunov matrix equation is equivalent to a standard linear system with n 2 unknowns. Theoretically, it is possible to apply any Krylov subspace method to this linear system, but this option has not really been explored, because of the O(n2) flops and storage requirement. In this dissertation we show that it is possible to reduce these requirement to O(n) for Lyapunov equations with a low rank inhomogeneous right-hand side. We show how to accomplish the reduction for a variety of methods including GMRES, CG, BCG and CGNR. In each case the key observation is a special relationship between certain Krylov subspaces in [special characters omitted] and [special characters omitted]. It is theoretically possible to precondition a Lyapunov matrix equation which is written as a standard linear system. However, our investigation has revealed that the choice of preconditioners is extremely limited, if we are to keep the storage and flops count at O(n). Above all we have found that while it is certainly possible to reduce the resource requirements to O(n), the constants are too large to be competitive. The fundamental problem is that Krylov subspace methods are not taking advantage of the low rank phenomenon for Lyapunov matrix equations. Currently, the most successful Lyapunov matrix equations solver is the low rank cyclic Smith method. Central to this method is the automatic selection of certain shift parameters, preconditioners and the solution of certain linear systems. This is extremely difficult to accomplish in general, and the problem simplifies considerably in the special case in which the defining matrices can be reordered as narrow banded matrices. Currently, it is the solution of these narrow banded linear systems which is the bottleneck in an efficient parallel implementation of the low rank cyclic Smith method. In the final part of this dissertation we consider the parallel solution of narrow-banded linear systems. We do an error analysis of the truncated SPIKE algorithm which applies to systems which are strictly diagonally dominant by rows. Above all, we establish bounds on the decay rate of the spikes and the truncation error which are tight. We explain why this analysis only carries partially to the general case. Our analysis of the truncated SPIKE algorithm has immediate implications for the overlapping partition method (OPM). Finally we consider the question of reducing the amount of interprocessor communication during the solve phase for a general narrow banded linear system. The final conclusion is that such a system is essentially block diagonal in a sense which can be made very precise