The detection of Extreme Mass Ratio Inspirals (EMRIs) is difficult because of
their complex waveforms, extended duration, and low signal-to-noise ratio
(SNR), which make them harder to identify than compact binary
coalescences. While matched filtering-based techniques are known for their
computational demands, existing deep learning-based methods primarily handle
time-domain data and are often constrained by data duration and SNR. In
addition, most existing work ignores time-delay interferometry (TDI) and
applies the long-wavelength approximation in detector response calculations,
thus limiting its ability to handle laser frequency noise. In this study, we
introduce DECODE, an end-to-end model focusing on EMRI signal detection by
sequence modeling in the frequency domain. Centered around a dilated causal
convolutional neural network, trained on synthetic data considering TDI-1.5
detector response, DECODE can efficiently process a year's worth of
multichannel TDI data with an SNR of around 50. We evaluate our model on 1-year
data with accumulated SNR ranging from 50 to 120 and achieve a true positive
rate of 96.3% at a false positive rate of 1%, with an inference time of less
than 0.01 seconds. Visualizations of three showcased EMRI signals illustrate
the model's interpretability and generalization, and DECODE exhibits strong
potential for future space-based gravitational wave data analyses.

Comment: 13 pages, 5 figures, and 2 tables
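
To make the term "dilated causal convolution" concrete, the following is a minimal pure-Python sketch of the operation and of how stacking layers with growing dilation enlarges the receptive field. It is not the DECODE implementation: the function names, the kernel size of 2, and the dilation schedule (1, 2, 4, 8) are illustrative assumptions, since the abstract does not state these details.

```python
# Illustrative sketch only; names, kernel size and dilations are assumptions,
# not details taken from DECODE.

def dilated_causal_conv1d(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before the start of the sequence are treated as zero, so the
    output at index t never depends on inputs at indices greater than t.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero-padding keeps the layer causal
                acc += w * x[idx]
        y.append(acc)
    return y


def stacked_conv(x, weights, dilations):
    """Apply one dilated causal layer per entry in `dilations` (no nonlinearity)."""
    for d in dilations:
        x = dilated_causal_conv1d(x, weights, d)
    return x


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence a single output sample."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    dilations = [1, 2, 4, 8]          # assumed doubling schedule
    impulse = [0.0] * 32
    impulse[5] = 1.0                   # single nonzero input sample
    out = stacked_conv(impulse, [1.0, 1.0], dilations)
    print("receptive field:", receptive_field(2, dilations))
    print("nonzero outputs:", sum(1 for v in out if v != 0.0))
```

With a doubling schedule, the receptive field grows exponentially with depth while the number of weights grows only linearly, which is the usual motivation for using such layers on very long sequences.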