Deep neural networks (DNNs) are very effective for multichannel speech
enhancement with fixed array geometries. However, it is not trivial to use DNNs
for ad-hoc arrays with unknown order and placement of microphones. We propose a
novel triple-path network for ad-hoc array processing in the time domain. The
key idea in the network design is to divide the overall processing into spatial
processing and temporal processing and use self-attention for spatial
processing. Using self-attention for spatial processing makes the network
invariant to the order and the number of microphones. The temporal processing
is done independently for all channels using a recently proposed dual-path
attentive recurrent network. The proposed network is a multiple-input
multiple-output architecture that can simultaneously enhance signals at all
microphones. Experimental results demonstrate the excellent performance of the
proposed approach. We further present an analysis demonstrating the
effectiveness of the proposed network in utilizing multichannel information
even from microphones at far locations.

Comment: Accepted for publication in INTERSPEECH 202
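
The abstract's claim that self-attention over the microphone axis makes the network insensitive to the order and number of microphones can be illustrated with a small sketch. This is not the proposed triple-path network. It is a single-head self-attention layer without positional encoding, written in plain Python, and the names `spatial_self_attention`, `w_q`, `w_k`, and `w_v`, the feature dimension, and the random inputs are all assumptions made for illustration. Under these assumptions, permuting the input channels permutes the outputs in the same way, and the same weights accept any number of channels.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def spatial_self_attention(features, w_q, w_k, w_v):
    """Hypothetical single-head self-attention across microphones.

    `features` holds one feature vector per microphone. No positional
    encoding is used, so the layer has no notion of channel order.
    """
    queries = [matvec(w_q, f) for f in features]
    keys = [matvec(w_k, f) for f in features]
    values = [matvec(w_v, f) for f in features]
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        # Attention weights of this microphone over all microphones.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        dim_out = len(values[0])
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim_out)]
        )
    return outputs


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def close(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for x, y in zip(a, b))


rng = random.Random(0)
dim = 4  # assumed per-microphone feature size
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

# Five microphones with arbitrary (random) features.
mics = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(5)]
out = spatial_self_attention(mics, w_q, w_k, w_v)

# Reordering the microphones reorders the outputs identically.
perm = [3, 0, 4, 1, 2]
out_perm = spatial_self_attention([mics[i] for i in perm], w_q, w_k, w_v)
equivariant = all(close(out_perm[j], out[perm[j]]) for j in range(len(perm)))

# The same weights also work for a different number of microphones.
out_three = spatial_self_attention(mics[:3], w_q, w_k, w_v)
```

In this sketch, `equivariant` checks that each microphone receives the same output regardless of where it appears in the input list, which is the sense in which a multiple-input multiple-output layer of this kind does not depend on channel order.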