Content distribution over networks is often achieved by using mirror sites
that hold copies of files or portions thereof to avoid congestion and delay
issues arising from excessive demands to a single location. Accordingly, there
are distributed storage solutions that divide the file into pieces and place
copies of the pieces (replication) or coded versions of the pieces (coding) at
multiple source nodes. We consider a network which uses network coding for
multicasting the file. There is a set of source nodes that contains either
subsets or coded versions of the pieces of the file. The cost of a given
storage solution is defined as the sum of the storage cost and the cost of the
flows required to support the multicast. Our interest is in finding the storage
capacities and flows at minimum combined cost. We formulate the corresponding
optimization problems by using the theory of information measures. In
particular, we show that when there are two source nodes, there is no loss in
considering subset sources. For three source nodes, we derive a tight upper
bound on the cost gap between the coded and uncoded cases. We also present
algorithms for determining the content of the source nodes.Comment: IEEE Trans. on Information Theory (to appear), 201