
    Identification and Classification of Radio Pulsar Signals Using Machine Learning

    Automated single-pulse search approaches are more necessary than ever, as the ever-increasing amount of observed data makes manual inspection impractical. Detecting radio pulsars using single-pulse searches, however, is a challenging problem for machine learning because pulsar signals often vary significantly in brightness, width, and shape and are detected in only a small fraction of the observed data. The research presented in this dissertation focuses on the development of machine learning algorithms and approaches for single-pulse searches in the time domain. Specifically: (1) We developed a two-stage single-pulse search approach, named Single-Pulse Event Group IDentification (SPEGID), which automatically identifies and classifies pulsars in radio pulsar search data. SPEGID first identifies pulse candidates as trial single-pulse event groups, then extracts features from the candidates and trains classifiers using supervised machine learning. SPEGID also addresses the challenges introduced by current data processing techniques and successfully identified bright and dim candidates as well as other types of challenging pulsar candidates. (2) To address the lack of training data in the early stages of pulsar surveys, we explored cross-survey prediction. Our results showed that instance-based and parameter-based transfer learning methods improved the performance of pulsar classification across surveys. (3) We developed a hybrid recommender system aimed at detecting rare pulsar signals that are often missed by supervised learning. The proposed recommender system uses a target rare case to state users' requirements and ranks the candidates using a similarity function calculated as a weighted sum of individual feature similarities. Our hybrid recommender system successfully detects both low signal-to-noise ratio (S/N) pulsars and Fast Radio Bursts (FRBs).
The approaches proposed in this dissertation were used to analyze data from the Green Bank Telescope 350 MHz drift (GBTDrift) pulsar survey and the Arecibo 327 MHz (AO327) drift pulsar survey, and they led to the discovery of eight pulsars that had been overlooked in previous analyses with existing methods.
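The recommender's ranking step, a similarity computed as a weighted sum of per-feature similarities against a target rare case, can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the linear local similarity `1 - |a - b| / range`, the normalisation by total weight, and the feature names (`snr`, `dm`, `width_ms`), weights, and ranges are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]


def feature_similarity(a: float, b: float, value_range: float) -> float:
    """Local similarity in [0, 1]: 1.0 for identical values, falling linearly with distance (assumed form)."""
    if value_range <= 0:
        return 1.0 if a == b else 0.0
    return max(0.0, 1.0 - abs(a - b) / value_range)


def global_similarity(target: Features, candidate: Features,
                      weights: Dict[str, float], ranges: Dict[str, float]) -> float:
    """Weighted sum of per-feature similarities, normalised by the total weight."""
    total_weight = sum(weights.values())
    weighted = sum(
        w * feature_similarity(target[f], candidate[f], ranges[f])
        for f, w in weights.items()
    )
    return weighted / total_weight


def rank_candidates(target: Features, candidates: Dict[str, Features],
                    weights: Dict[str, float], ranges: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (candidate_id, score) pairs sorted from most to least similar to the target case."""
    scored = [(cid, global_similarity(target, feats, weights, ranges))
              for cid, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical example: a faint, narrow target case and two candidates.
target = {"snr": 6.0, "dm": 50.0, "width_ms": 5.0}
weights = {"snr": 2.0, "dm": 1.0, "width_ms": 1.0}
ranges = {"snr": 20.0, "dm": 500.0, "width_ms": 50.0}
candidates = {
    "c1": {"snr": 6.5, "dm": 60.0, "width_ms": 4.0},
    "c2": {"snr": 15.0, "dm": 300.0, "width_ms": 30.0},
}
ranking = rank_candidates(target, candidates, weights, ranges)
print(ranking)
```

Giving S/N a larger weight in this toy setup expresses a user requirement such as "find signals as faint as this one"; in practice the weights would encode whatever the target rare case is meant to capture.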

    RFI mitigation with phase-only adaptive beamforming

    Connected radio interferometers are sometimes used in tied-array mode: signals from the antenna elements are coherently added and the sum signal is fed to a VLBI backend or a pulsar processing machine. Existing radio interferometer facilities usually have no computer-controlled amplitude weighting. For this mode of observation, radio frequency interference (RFI) mitigation with phase-only adaptive beamforming is proposed. Small phase perturbations are introduced into each antenna's signal, and their values are optimized so that the signal from the radio source of interest is preserved while RFI signals are suppressed. An evolutionary programming algorithm is used for this optimization. Computer simulations of both one-dimensional and two-dimensional array configurations show considerable RFI suppression and acceptable changes to the main array beam in the direction of the radio source. Comment: 7 pages, 11 figures
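The idea of optimizing small, bounded phase perturbations with an evolutionary algorithm can be illustrated on a one-dimensional array. This is a sketch under stated assumptions, not the paper's setup: an 8-element uniform linear array with half-wavelength spacing, equal amplitudes, a cost of `RFI gain² + penalty · (1 − source gain)²`, and a simplified evolutionary-programming loop (Gaussian mutation with a fixed step size and (μ+μ) truncation selection, instead of the stochastic tournaments and self-adaptive step sizes of classic EP). All parameter values and function names are hypothetical.

```python
import numpy as np

SPACING = 0.5  # element spacing in wavelengths (assumed)


def array_gain(phases: np.ndarray, theta: float) -> float:
    """Normalised array-factor magnitude of an equal-amplitude uniform linear array."""
    n = np.arange(phases.size)
    terms = np.exp(1j * (2 * np.pi * SPACING * n * np.sin(theta) + phases))
    return float(np.abs(terms.sum()) / phases.size)


def evolve_phase_perturbations(theta_src, theta_rfi, n_elements=8, pop_size=30,
                               generations=300, sigma=0.05, max_perturb=0.5,
                               penalty=10.0, seed=0):
    """Search for bounded phase perturbations that lower RFI gain while keeping source gain."""
    rng = np.random.default_rng(seed)
    # Steering phases that point the main beam at the source.
    base = -2 * np.pi * SPACING * np.arange(n_elements) * np.sin(theta_src)
    g_rfi_before = array_gain(base, theta_rfi)

    def cost(delta):
        # Suppress power toward the RFI while penalising any loss of source gain.
        g_src = array_gain(base + delta, theta_src)
        g_rfi = array_gain(base + delta, theta_rfi)
        return g_rfi ** 2 + penalty * (1.0 - g_src) ** 2

    pop = rng.uniform(-max_perturb, max_perturb, size=(pop_size, n_elements))
    pop[0] = 0.0  # include the unperturbed beam so the best result can never be worse
    for _ in range(generations):
        # Each parent produces one offspring by Gaussian mutation, kept within the bounds.
        children = np.clip(pop + rng.normal(0.0, sigma, size=pop.shape),
                           -max_perturb, max_perturb)
        combined = np.vstack([pop, children])
        costs = np.array([cost(d) for d in combined])
        # (mu + mu) truncation selection: keep the pop_size lowest-cost individuals.
        pop = combined[np.argsort(costs)[:pop_size]]

    best = pop[0]
    return (best, array_gain(base + best, theta_src),
            array_gain(base + best, theta_rfi), g_rfi_before)


# Source at broadside, RFI at 20 degrees (on the first sidelobe of an 8-element array).
delta, g_src, g_rfi, g_rfi_before = evolve_phase_perturbations(0.0, np.deg2rad(20.0))
print(f"source gain {g_src:.3f}, RFI gain {g_rfi_before:.3f} -> {g_rfi:.3f}")
```

Seeding the population with the zero-perturbation vector and using elitist selection means the returned solution is never worse than the unperturbed beam under this cost; the penalty weight trades RFI suppression against distortion of the main beam.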

    Searching for Needles in the Cosmic Haystack

    Searching for pulsar signals in radio astronomy data sets is a difficult task. The data sets are extremely large, approaching the petabyte scale, and are growing larger as instruments become more advanced. Big Data brings with it big challenges. Processing the data to identify candidate pulsar signals is computationally expensive and must exploit parallelism to be scalable. Labeling benchmarks for supervised classification is costly. To compound the problem, pulsar signals are very rare; e.g., only 0.05% of the instances in one data set represent pulsars. Furthermore, there are many different approaches to candidate classification, with no consensus on best practice. This dissertation focuses on identifying and classifying radio pulsar candidates from single-pulse searches. First, to identify and classify Dispersed Pulse Groups (DPGs), we developed a supervised machine learning approach that consists of RAPID (a novel peak identification algorithm), feature extraction, and supervised machine learning classification. We tested six classification algorithms with four imbalance treatments. Results showed that classifiers with imbalance treatments had higher recall values. Overall, classifiers using multiclass RandomForests combined with the Synthetic Minority Oversampling TEchnique (SMOTE) were the most efficient; they identified additional known pulsars not in the benchmark, with fewer false positives than the other classifiers. Second, we developed a parallel single-pulse identification method, D-RAPID, and introduced a novel automated multiclass labeling (ALM) technique that we combined with feature selection to improve execution performance. D-RAPID improved execution performance over RAPID by a factor of five. We also showed that the combination of ALM and feature selection sped up RandomForest execution by 54% on average, with less than a 2% average reduction in classification performance.
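SMOTE, the imbalance treatment named in the abstract, creates synthetic minority-class instances by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below shows that core step with NumPy only; the function name, the toy two-feature "pulsar" data, and the parameter values are hypothetical, and for real work the maintained implementation in the imbalanced-learn package (`imblearn.over_sampling.SMOTE`) is the safer choice.

```python
import numpy as np


def smote(minority: np.ndarray, n_synthetic: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate n_synthetic samples by interpolating between minority points and their k nearest minority neighbours."""
    minority = np.asarray(minority, dtype=float)
    n = minority.shape[0]
    if n < 2:
        raise ValueError("SMOTE needs at least two minority-class samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances within the minority class only.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_synthetic, minority.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n)                   # random minority sample
        nb = neighbours[base, rng.integers(k)]   # one of its k nearest neighbours
        gap = rng.random()                       # position along the connecting segment
        synthetic[i] = minority[base] + gap * (minority[nb] - minority[base])
    return synthetic


# Toy minority class with two hypothetical features (e.g. peak S/N and DM).
rng = np.random.default_rng(1)
pulsars = rng.normal(loc=[8.0, 40.0], scale=[1.0, 5.0], size=(12, 2))
new_samples = smote(pulsars, n_synthetic=30, k=3)
print(new_samples.shape)  # (30, 2)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling enlarges the minority class without leaving the region it already occupies, which is why it tends to raise recall for rare classes.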
Finally, we proposed CoDRIFt, a novel classification algorithm that is distributed for scalability and employs semi-supervised learning to leverage unlabeled data to inform classification. We evaluated and compared CoDRIFt with eleven other classifiers. The results showed that CoDRIFt excelled at classifying candidates in imbalanced benchmarks with a majority of non-pulsar signals (>95%). Furthermore, CoDRIFt models created with very limited sets of labeled data (as few as 22 labeled minority-class instances) achieved high recall (mean = 0.98). Compared with the other algorithms trained on similar sets, CoDRIFt outperformed all of them, with recall 2.9% higher than the next-best classifier and a 35% average improvement over all eleven classifiers. CoDRIFt can be customized for other problem domains with very large, imbalanced data sets, such as fraud detection and cyberattack detection.