29 research outputs found

    An Automated Pipeline for Variability Detection and Classification for the Small Telescopes Installed at the Liverpool Telescope

    The Small Telescopes at the Liverpool Telescope (STILT) is an almost decade old project to install a number of wide field optical instruments to the Liverpool Telescope, named Skycams, to monitor weather conditions and yield useful photometry on bright astronomical sources. The motivation behind this thesis is the development of algorithms and techniques which can automatically exploit the data generated during the first 1200 days of Skycam operation to catalogue variable sources in the La Palma sky. A previously developed pipeline reduces the Skycam images and produces photometric time-series data named light curves of millions of objects. 590,492 of these objects have 100 or more data points of sufficient quality to attempt a variability analysis. The large volume and relatively high noise of this data necessitated the use of Machine Learning and sophisticated optimisation techniques to successfully extract this information. The Skycam instruments have no control over the orientation and pointing of the Liverpool Telescope and therefore resample areas of the sky highly irregularly. The term used for this resampling in astronomy is ‘cadence’. The unusually irregular Skycam cadence places increased strain on the algorithms designed for the detection of periodicity in light curves. This thesis details the development of a period estimation method based on a novel implementation of a genetic algorithm combined with a generational clustering method. Named GRAPE (Genetic Routine for Astronomical Period Estimation), this algorithm deconstructs the space of possible periods for a light curve into regions in which the genetic population clusters. These regions are then fine-tuned using a k-means clustering algorithm to return a set of independent period candidates which are then analysed using a Vuong closeness test to discriminate between aliased and true periods. 
This thesis demonstrates the capability of GRAPE on a set of synthetic light curves built using traditional regular cadence sampling and Skycam style cadence for four different shapes of periodic light curve. The performance of GRAPE on these light curves is compared to a more traditional periodogram which returns a set of peaks and is then analysed using Vuong closeness tests. GRAPE obtains similar performance compared to the periodogram on all the light curve shapes but with less computational complexity allowing for more efficient light curve analysis. Automated classification of variable light curves has been explored over the last decade. Multiple features have been engineered to identify patterns in the light curves of different classes of variable star. Within the last few years deep learning has come to prominence as a method of automatically generating informative representations of the data for the solution of a desired problem, such as a classification task. A set of models using Random Forests, Support Vector Machines and Neural Networks were trained using a set of variable Skycam light curves of five classes. Using 16 features engineered from previous methods an Area under the Curve (AUC) of 0.8495 was obtained. Replacing these features with inputs from the pixel intensities from a 100 by 20 pixel image representation, produced an AUC of 0.6348, which improved to 0.7952 when provided with additional context to the dimensionality of the image. Despite the inferior performance, the importance of the different pixels produced relations in the trained models demonstrating that they had produced features based on well-understood patterns in the different classes of light curve. Using features produced by Richards et al. and Kim & Bailer-Jones et al., a set of features to train machine learning classification models was constructed. 
In addition to this set of features, a semi-supervised set of novel features was designed to describe the shape of light curves phased around the GRAPE candidate period. This thesis investigates the performance of the PolyFit algorithm of Prsa et al., a technique to fit four piecewise polynomials with discontinuous knots capable of connecting across the phase boundary at phases of zero and one. This method was designed to fit eclipsing binary phased light curves, but it was also described as fully capable on other variable star types. The optimisation method used by PolyFit is replaced by a novel genetic algorithm optimisation routine to fit the model to Skycam data with substantial improvement in performance. The PolyFit model is applied to the candidate period and twice this period for every classified light curve. This interpolation produces novel features which describe similar statistics to the previously developed methods but which appear significantly more resilient to the Skycam noise and are often preferred by the trained models. In addition, Principal Component Analysis (PCA) is used to investigate a set of 6897 variable light curves and discover that the first ten principal components are sufficient to describe 95% of the variance of the fitted models. This trained PCA model is retained and used to generate twenty novel shape features. Whilst these features are not dominant in their importance to the learned models, they have above average importance and help distinguish some objects in the light curve classification task. The second principal component in particular is an important feature in the discrimination of short period pulsating and eclipsing variables as it appears to be an automatically learned robust skewness measure. The method described in this thesis produces 112 features of the Skycam light curves, 38 variability indices which are quickly obtainable and 74 which require the computation of a candidate period using GRAPE. 
A number of machine learning classifiers are investigated to produce high-performance models for the detection and classification of variable light curves from the Skycam dataset. A Random Forest classifier uses a training set of 859 light curves of 12 object classes to produce a classifier with a multi-class F1 score of 0.533. It would be computationally infeasible to produce all the features for every Skycam light curve, therefore an automated pipeline has been developed which combines a Skycam trend removal pipeline, GRAPE and our machine learned classifiers. It initialises with a set of Skycam light curves from objects cross-matched from the American Association of Variable Star Observers (AAVSO) Variable Star Index (VSI), one of the most comprehensive catalogues of variable stars available. The learned models classify the full 112 features generated for these cross-matched light curves and confident matches are selected to produce a training set for a binary variability detection model. This model utilises only the 38 variability indices to identify variable light curves rapidly without the use of GRAPE. This variability model, trained using a random forest classifier, obtains an F1 score of 0.702. Applying this model to the 590,492 Skycam light curves yields 103,790 variable candidates of which 51,129 candidates have been classified and are available for further analysis

    Confirmation of Monoperiodicity Above 20 Seconds for Two Blue Large-Amplitude Pulsators

    Blue Large-Amplitude Pulsators (BLAPs) are a new class of pulsating variable star. They are located close to the hot subdwarf branch in the Hertzsprung-Russell diagram and have spectral classes of late O or early B. Stellar evolution models indicate that these stars are likely radially pulsating, driven by iron group opacity in their interiors. A number of variable stars with a similar driving mechanism exist near the hot subdwarf branch with multi-periodic oscillations caused by either pressure (p) or gravity (g) modes. No multi-periodic signals were detected in the OGLE discovery light curves since it would be difficult to detect short period signals associated with higher-order p modes with the OGLE cadence. Using the RISE instrument on the Liverpool Telescope, we produced high cadence light curves of two BLAPs, OGLE-BLAP-009 ($m_{\mathrm{v}}=15.65$ mag) and OGLE-BLAP-014 ($m_{\mathrm{v}}=16.79$ mag) using a 720 nm longpass filter. Frequency analysis of these light curves identifies a primary oscillation with a period of $31.935\pm0.0098$ mins and an amplitude from a Fourier series fit of 0.236 mag for BLAP-009. The analysis of BLAP-014 identifies a period of $33.625\pm0.0214$ mins and an amplitude of 0.225 mag. Analysis of the residual light curves reveals no additional short period variability down to an amplitude of $15.20\pm0.26$ mmag for BLAP-009 and $58.60\pm3.44$ mmag for BLAP-014 for minimum periods of 20 s and 60 s respectively. These results further confirm that the BLAPs are monoperiodic

    The Classification of Periodic Light Curves from non-survey optimized observational data through Automated Extraction of Phase-based Visual Features

    We implement two hidden-layer feedforward networks to classify 3011 variable star light curves. These light curves are generated from a reduction of non-survey optimized observational images gathered by wide-field cameras mounted on the Liverpool Telescope. We extract 16 features found to be highly informative in previous studies but achieve only 19.82% accuracy on a 30% test set, 5.56% above a random model. Noise and sampling defects present in these light curves poison these features primarily by reducing our Periodogram period match rate to fewer than 5%. We propose using an automated visual feature extraction technique by transforming the phase-folded light curves into image based representations. This eliminates much of the noise and the missing phase data, due to sampling defects, should have a less destructive effect on these shape features as they still remain at least partially present. We produced a set of scaled images with pixels turned either on or off based on a threshold of data points in each pixel defined as at minimum one fifth of those of the most populated pixel for each light curve. Training on the same feedforward network, we achieve 29.13% accuracy, a 13.16% improvement over a random model and we also show this technique scales with an improvement to 33.51% accuracy by increasing the number of hidden layer neurons. We concede that this improvement is not yet sufficient to allow these light curves to be used for automated classification and in conclusion we discuss a new pipeline currently being developed that simultaneously incorporates period estimation and classification. This method is inspired by approximating the manual methods employed by astronomers
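
The image-based representation described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the default 100 by 20 grid (borrowed from the thesis abstract earlier in this listing), and the linear magnitude scaling are assumptions. Only the rule that a pixel is switched on when it holds at least one fifth of the points in the most populated pixel comes from the text.

```python
def fold_phase(times, period):
    """Map observation times to phases in [0, 1) for a trial period."""
    return [(t / period) % 1.0 for t in times]


def binary_image(times, mags, period, width=100, height=20, frac=0.2):
    """Bin a phase-folded light curve into a height x width grid of 0/1 pixels.

    A pixel is on when its point count is at least `frac` times the count
    of the most populated pixel (frac=0.2 is the one-fifth rule).
    Grid size and scaling are illustrative assumptions.
    """
    phases = fold_phase(times, period)
    lo, hi = min(mags), max(mags)
    span = (hi - lo) or 1.0  # avoid division by zero for constant curves

    counts = [[0] * width for _ in range(height)]
    for phase, mag in zip(phases, mags):
        col = min(int(phase * width), width - 1)
        # Row 0 holds the numerically smallest magnitudes.
        row = min(int((mag - lo) / span * height), height - 1)
        counts[row][col] += 1

    threshold = frac * max(max(row) for row in counts)
    return [[1 if c > 0 and c >= threshold else 0 for c in row]
            for row in counts]
```

Because the threshold is relative to the busiest pixel, isolated outliers in sparsely sampled phase bins are dropped, while the main locus of the folded curve survives even when some phases are missing.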

    GRAPE: Genetic Routine for Astronomical Period Estimation

    Period estimation is an important task in the classification of many variable astrophysical objects. Here we present GRAPE: A Genetic Routine for Astronomical Period Estimation, a genetic algorithm optimised for the processing of survey data with spurious and aliased artefacts. It uses a Bayesian Generalised Lomb-Scargle (BGLS) fitness function designed for use with the Skycam survey conducted at the Liverpool Telescope. We construct a set of simulated light curves using both regular survey cadence and the unique Skycam variable cadence with four types of signal: sinusoidal, sawtooth, symmetric eclipsing binary and eccentric eclipsing binary. We apply GRAPE and a frequency spectrum BGLS periodogram to the light curves and show that the performance of GRAPE is superior to the frequency spectrum for any signal well modelled by the fitness function. This is due to treating the parameter space as a continuous variable. We also show that the Skycam sampling is sufficient to correctly estimate the period of over 90% of the sinusoidal shape light curves relative to the more standard regular cadence. We note that GRAPE has a computational overhead which makes it slower on light curves with low numbers of observations and faster with higher numbers of observations, and discuss the potential optimisations used to speed up the runtime. Finally, we analyse the period dependence and baseline importance of the performance of both methods and propose improvements which will extend this method to the detection of quasi-periodic signals
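
To illustrate the idea of evolving a population of candidate periods over a continuous parameter space, the sketch below runs a small genetic algorithm on a light curve. It is not GRAPE itself: the BGLS fitness function is replaced by a simpler stand-in (the fraction of variance explained by a least-squares sinusoid at the trial period), and the function names, population size, elitism fraction, blend crossover, and mutation width are all assumptions chosen for illustration.

```python
import math
import random


def sinusoid_power(times, mags, period):
    """Stand-in fitness: fraction of variance explained by a least-squares
    sinusoid at `period` (NOT the BGLS function used by GRAPE)."""
    mean = sum(mags) / len(mags)
    y = [m - mean for m in mags]
    s = [math.sin(2 * math.pi * t / period) for t in times]
    c = [math.cos(2 * math.pi * t / period) for t in times]
    ss = sum(v * v for v in s)
    cc = sum(v * v for v in c)
    sc = sum(a * b for a, b in zip(s, c))
    ys = sum(a * b for a, b in zip(y, s))
    yc = sum(a * b for a, b in zip(y, c))
    yy = sum(v * v for v in y)
    det = ss * cc - sc * sc
    if abs(det) < 1e-12 or yy == 0.0:
        return 0.0
    # Solve the 2x2 normal equations for the sine and cosine amplitudes.
    a = (ys * cc - yc * sc) / det
    b = (yc * ss - ys * sc) / det
    return (a * ys + b * yc) / yy


def ga_period_search(times, mags, p_min, p_max,
                     pop_size=150, generations=20, seed=0):
    """Evolve real-valued period candidates; all settings are illustrative."""
    rng = random.Random(seed)
    sigma = 0.01 * (p_max - p_min)      # mutation width (assumed)
    n_elite = max(1, pop_size // 10)    # survivors copied unchanged (assumed)
    pop = [rng.uniform(p_min, p_max) for _ in range(pop_size)]

    for _ in range(generations):
        scored = sorted(((sinusoid_power(times, mags, p), p) for p in pop),
                        reverse=True)
        children = [p for _, p in scored[:n_elite]]
        while len(children) < pop_size:
            # Tournament selection: the fitter of three random candidates.
            pa = max(rng.sample(scored, 3))[1]
            pb = max(rng.sample(scored, 3))[1]
            w = rng.random()
            child = w * pa + (1.0 - w) * pb   # blend crossover
            child += rng.gauss(0.0, sigma)    # Gaussian mutation
            children.append(min(max(child, p_min), p_max))
        pop = children

    return max(pop, key=lambda p: sinusoid_power(times, mags, p))
```

Because candidates are real numbers rather than points on a fixed frequency grid, the population can settle arbitrarily close to a peak in the fitness function, which is the property the abstract credits for the method's performance. Elitism keeps the best candidate of each generation, so the best fitness never decreases.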

    Classifying Periodic Astrophysical Phenomena from non-survey optimized variable-cadence observational data

    Modern time-domain astronomy is capable of collecting a staggeringly large amount of data on millions of objects in real time. Therefore, the production of methods and systems for the automated classification of time-domain astronomical objects is of great importance. The Liverpool Telescope has a number of wide-field image gathering instruments mounted upon its structure, the Small Telescopes Installed at the Liverpool Telescope. These instruments have been in operation since March 2009 gathering data of large areas of sky around the current field of view of the main telescope generating a large dataset containing millions of light sources. The instruments are inexpensive to run as they do not require a separate telescope to operate but this style of surveying the sky introduces structured artifacts into our data due to the variable cadence at which sky fields are resampled. These artifacts can make light sources appear variable and must be addressed in any processing method. The data from large sky surveys can lead to the discovery of interesting new variable objects. Efficient software and analysis tools are required to rapidly determine which potentially variable objects are worthy of further telescope time. Machine learning offers a solution to the quick detection of variability by characterising the detected signals relative to previously seen exemplars. In this paper, we introduce a processing system designed for use with the Liverpool Telescope identifying potentially interesting objects through the application of a novel representation learning approach to data collected automatically from the wide-field instruments. Our method automatically produces a set of classification features by applying Principal Component Analysis on a set of variable light curves using a piecewise polynomial fitted via a genetic algorithm applied to the epoch-folded data. 
The epoch-folding requires the selection of a candidate period for variable light curves identified using a genetic algorithm period estimation method specifically developed for this dataset. A Random Forest classifier is then used to classify the learned features to determine if a light curve is generated by an object of interest. This system allows for the telescope to automatically identify new targets through passive observations which do not affect day-to-day operations as the unique artifacts resulting from such a survey method are incorporated into the methods. We demonstrate the power of this feature extraction method compared to feature engineering performed by previous studies by training classification models on 859 light curves of 12 known variable star classes from our dataset. We show that our new features produce a model with a superior mean cross-validation F1 score of 0.4729 with a standard deviation of 0.0931 compared with the engineered features at 0.3902 with a standard deviation of 0.0619. We show that the features extracted from the representation learning are given relatively high importance in the final classification model. Additionally, we compare engineered features computed on the interpolated polynomial fits and show that they produce more reliable distributions than those fit to the raw light curve when the period estimation is correct

    A Dynamic, Modular Intelligent-Agent framework for Astronomical Light Curve Analysis and Classification

    Modern time-domain astronomy is capable of collecting a staggeringly large amount of data on millions of objects in real time. This makes it almost impossible for objects to be identified manually. Therefore the production of methods and systems for the automated classification of time-domain astronomical objects is of great importance. The Liverpool Telescope has a number of wide-field image gathering instruments mounted upon its structure. These instruments have been in operation since March 2009 gathering data of multi-degree sized areas of sky around the current field of view of the main telescope. Utilizing a Structured Query Language database established by a pre-processing operation upon the resultant images, which has identified millions of candidate variable stars with multiple time-varying magnitude observations, we applied a method designed to extract time-translation invariant features from the time-series light curves of each object for future input into a classification system. These efforts were met with limited success due to noise and uneven sampling within the time-series data. Additionally, finely surveying these light curves is a processing intensive task. Fortunately, these algorithms are capable of multi-threaded implementations based on available resources. Therefore we propose a new system designed to utilize multiple intelligent agents that distribute the data analysis across multiple machines whilst simultaneously a powerful intelligence service operates to constrain the light curves and eliminate false signals due to noise and local alias periods. This system will be highly scalable, capable of operating on a wide range of hardware whilst maintaining the production of accurate features based on the fitting of harmonic models to the light curves within the initial Structured Query Language database

    SQL Injection Attack Classification through the Feature Extraction of SQL Query strings using a Gap-Weighted String Subsequence Kernel

    SQL Injection Attacks are one of the most common methods behind data security breaches. Previous research has attempted to produce viable detection solutions in order to filter SQL Injection Attacks from regular queries. Unfortunately it has proven to be a challenging problem with many solutions suffering from disadvantages such as being unable to process in real time as a preventative solution, a lack of adaptability to differing types of attack and the requirement for access to difficult-to-obtain information about the source application. This paper presents a novel solution of classifying SQL queries purely on the features of the initial query string. A Gap-Weighted String Subsequence Kernel algorithm is implemented to identify subsequences of shared characters between query strings for the output of a similarity metric. Finally a Support Vector Machine is trained on the similarity metrics between known query strings which are then used to classify unknown test queries. By gathering all feature data from the query strings, additional information from the source application is not required. The probabilistic nature of the learned models allows the solution to adapt to new threats whilst in operation. The proposed solution is evaluated using a number of test datasets derived from the Amnesia testbed datasets. The demonstration software achieved 97.07% accuracy for Select type queries and 92.48% accuracy for Insert type queries. This limited success rate is due to unsanitised quotation marks within legitimate inputs confusing the feature extraction. Using a test dataset that denies legitimate queries the use of unsanitised quotation marks, the Select and Insert query accuracy rose
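
A gap-weighted string subsequence kernel of the kind named in this abstract can be sketched with the standard recursive definition from Lodhi et al. The paper's own implementation, subsequence length, and decay factor are not given in this listing, so `n`, `lam`, the function names, and the example queries are assumptions for illustration only.

```python
from functools import lru_cache
import math


def ssk(s, t, n, lam):
    """Gap-weighted subsequence kernel K_n(s, t) with decay factor `lam`.

    Counts length-n subsequences common to s and t, weighting each
    occurrence by lam raised to the span it covers in each string.
    """

    @lru_cache(maxsize=None)
    def k_prime(i, a, b):
        # Auxiliary kernel K'_i on the prefixes s[:a] and t[:b].
        if i == 0:
            return 1.0
        if min(a, b) < i:
            return 0.0
        x = s[a - 1]
        total = lam * k_prime(i, a - 1, b)
        for j in range(1, b + 1):
            if t[j - 1] == x:
                total += k_prime(i - 1, a - 1, j - 1) * lam ** (b - j + 2)
        return total

    @lru_cache(maxsize=None)
    def k(a, b):
        # Kernel K_n on the prefixes s[:a] and t[:b].
        if min(a, b) < n:
            return 0.0
        x = s[a - 1]
        total = k(a - 1, b)
        for j in range(1, b + 1):
            if t[j - 1] == x:
                total += k_prime(n - 1, a - 1, j - 1) * lam ** 2
        return total

    return k(len(s), len(t))


def normalised_ssk(s, t, n, lam):
    """Similarity scaled so that a string compared with itself scores 1."""
    denom = math.sqrt(ssk(s, s, n, lam) * ssk(t, t, n, lam))
    return ssk(s, t, n, lam) / denom if denom > 0.0 else 0.0


if __name__ == "__main__":
    # Hypothetical query strings, not taken from the paper's datasets.
    benign = "SELECT * FROM users WHERE id = 5"
    attack = "SELECT * FROM users WHERE id = 5 OR 1 = 1"
    print(normalised_ssk(benign, attack, n=3, lam=0.5))
```

A matrix of such pairwise similarities between known queries is the kind of input a kernel-based classifier such as a Support Vector Machine can be trained on. The recursion depth here grows with the length of `s`, so very long query strings would need an iterative formulation.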

    An update on the development of ASPIRED

    We are reporting the updates in version 0.2.0 of the Automated SpectroPhotometric REDuction (ASPIRED) pipeline, designed for common use on different instruments. The default settings support many typical long-slit spectrometer configurations, whilst it also offers a flexible set of functions for users to refine and tailor-make their automated pipelines to an instrument's individual characteristics. Such automation provides near real-time data reduction to allow adaptive observing strategies, which is particularly important in Time Domain Astronomy. Over the course of the last year, significant improvement was made in the internal data handling as well as data I/O, accuracy and repeatability in the wavelength calibration

    Requirements and Limitations of Thermal Drones for Effective Search and Rescue in Marine and Coastal Areas

    Search and rescue (SAR) is a vital line of defense against unnecessary loss of life. However, in a potentially hazardous environment, it is important to balance the risks associated with SAR action. Drones have the potential to help with the efficiency, success rate and safety of SAR operations as they can cover large or hard to access areas quickly. The addition of thermal cameras to the drones provides the potential for automated and reliable detection of people in need of rescue. We performed a pilot study with a thermal-equipped drone for SAR applications in Morecambe Bay. In a variety of realistic SAR scenarios, we found that we could detect humans who would be in need of rescue, both by the naked eye and by a simple automated method. We explore the current advantages and limitations of thermal drone systems, and outline the future path to a useful system for deployment in real-life SAR

    Occurrence of L-iduronic acid and putative D-glucuronyl C5-epimerases in prokaryotes

    Glycosaminoglycans (GAGs) are polysaccharides that are typically present in a wide diversity of animal tissue. The most common GAGs are well-characterized and pharmaceutical applications exist for many of these compounds, e.g. heparin and hyaluronan. In addition, bacterial glycosaminoglycan-like structures also exist. Some of these bacterial GAGs have been characterized, but until now no bacterial GAG has been found that possesses the modifications that are characteristic of many of the animal GAGs, such as sulfation and C5-epimerization. Nevertheless, the latter conversion may also occur in bacterial and archaeal GAGs, as some prokaryotic polysaccharides have been demonstrated to contain L-iduronic acid. However, experimental evidence for the enzymatic synthesis of L-iduronic acid in prokaryotes is as yet lacking. We therefore performed an in silico screen for D-glucuronyl C5-epimerases in prokaryotes. Multiple candidate C5-epimerases were found, suggesting that many more microorganisms are likely to exist possessing an L-iduronic acid residue as a constituent of their cell wall polysaccharides