Time domain classification of transient RFI

Abstract

Since the emergence of radio astronomy as a field, it has been afflicted by radio frequency interference (RFI). RFI continues to present a problem despite increasingly sophisticated countermeasures developed over the decades. Due to technological improvements, radio telescopes have become more sensitive (for example, MeerKAT’s L-band receiver). Existing RFI has become more prominent as a result. At the same time, the prevalence of RFI-generating devices has increased as new technologies have been adopted by society. Many approaches have been developed for mitigating RFI, which are typically used in concert. New telescope arrays are often built far from human habitation in radio-quiet reserves. In South Africa, a radio-quiet reserve has been established in which several world class instruments are under construction. Despite the remote location of the reserve, careful attention is paid to the possibility of RFI. For example, some instruments will begin observations while others are still under construction. The infrastructure and equipment related to the construction work may increase the risk of RFI, especially transient RFI. A number of mitigation strategies have been employed, including the use of fixed and mobile RFI monitoring stations. Such stations operate independently of the main telescope arrays and continuously monitor a wide bandwidth in all directions. They are capable of recording spectra and high resolution time domain captures of transient RFI. Once detected, and if identified, an RFI source can be found and dealt with. The ability to identify the sources of detected RFI would be highly beneficial. Continuous wave intentional transmissions (telecommunication signals for example) are easily identified as they are required to adhere to allocated frequency bands. Transient RFI signals, however, are significantly more challenging to identify since they are generally broadband and highly intermittent. 
Transient RFI can be generated as a by-product of the normal operation of devices such as relays, AC machines and fluorescent lights. Such devices may be present near radio telescope arrays as part of the infrastructure or equipment involved in the construction of new instruments. Other than contaminating observation data, transient RFI can also appear to have genuine astronomical origins. In one case, transient signals received from a microwave oven exhibited dispersion, suggesting a distant source. Therefore, the ability to identify transient RFI by source would be enormously valuable. Once identified, such sources may be removed or replaced where possible. Despite this need, there is a paucity of work on classifying transient RFI in the literature. This thesis focusses on the problem of identifying transient RFI by source in time domain data of the type captured by remote monitoring stations. Several novel approaches are explored in this thesis. If used with independent RFI monitoring stations, these approaches may aid in tracking down nearby RFI sources at a radio telescope array. They may also be useful for improving RFI flagging in data from radio telescopes themselves. Distinguishing between transient RFI and natural astronomical signals is likely to be an easier prospect than classifying transient RFI by source. Furthermore, these approaches may be better able to avoid excising genuine astronomical transients that nevertheless share some characteristics with RFI signals. The radio telescopes themselves are significantly more sensitive than RFI monitoring stations, and would thus be able to detect RFI sources more easily. However, terrestrial RFI would likely enter via sidelobes, tempering this advantage somewhat. In this thesis, transient RFI is first characterised, prior to classification by source. Labelled time-domain recordings of a number of transient RFI sources are acquired and statistically examined. 
Second, components analysis techniques are considered for feature selection. Cluster separation is analysed for principal components analysis (PCA) and kernel PCA, the latter proving more suitable. The effect of the supply voltage of certain RFI sources on cluster separation in the principal components domain is also explored. Several naïve classification algorithms are tested, using kernel PCA for feature selection. A more sophisticated dictionary-based approach is developed next. While there are variations in repeated recordings of the same RFI source, the signals tend to adhere to a common overarching structure. Full RFI signals are observed to consist of sequences of individual transients. An algorithm is presented to extract individual transients from full recordings, after which they are labelled using unsupervised clustering methods. This procedure results in a dictionary of archetypal transients, from which any full RFI sequence may be represented. Some approaches in Automated Speech Recognition (ASR) are similar: spoken words are divided into individual labelled phonemes. Representing RFI signals as sequences enables the use of hidden Markov models (HMMs) for identification. HMMs are well suited to sequence identification problems, and are known for their robustness to variation. For example, in ASR, HMMs are able to handle the variations in repeated utterances of the same word. When classifying the recorded RFI signals, good accuracy is achieved, improving on the results obtained using the more naïve methods. Finally, a strategy involving deep learning techniques is explored. Recurrent neural networks and convolutional neural networks (CNNs) have shown great promise in a wide variety of classification tasks. Here, a model is developed that includes a pre-trained CNN layer followed by a bidirectional long short-term memory (BLSTM) layer. 
Special attention is paid to mitigating class imbalance when the model is used with individual transients extracted from full recordings. High classification accuracy is achieved, improving on the dictionary-based approach and the other naïve methods. Recommendations are made for future work on developing these approaches further for practical use with remote monitoring stations. Other possibilities for future research are also discussed, including testing the robustness of the proposed approaches. The approaches may also prove useful for RFI excision in observation data from radio telescopes.
