
    LSTM Networks for Detection and Classification of Anomalies in Raw Sensor Data

    In order to ensure the validity of sensor data, it must be thoroughly analyzed for various types of anomalies. Traditional machine learning methods for anomaly detection in sensor data rely on domain-specific feature engineering. A typical approach is to use domain knowledge to analyze sensor data and manually create statistics-based features, which are then used to train machine learning models to detect and classify the anomalies. Although this methodology is used in practice, it has a significant drawback: feature extraction is usually labor-intensive and requires considerable effort from domain experts. An alternative approach is to use deep learning algorithms. Research has shown that modern deep neural networks are very effective at automatically extracting abstract features from raw data in classification tasks. Long short-term memory networks, or LSTMs for short, are a special kind of recurrent neural network capable of learning long-term dependencies. These networks have proved especially effective in classifying raw time-series data in various domains. This dissertation systematically investigates the effectiveness of the LSTM model for anomaly detection and classification in raw time-series sensor data. As a proof of concept, this work used time-series data from sensors that measure blood glucose levels. A large number of time-series sequences were created from a real medical diabetes dataset. Anomalous series were constructed with six methods, each interspersing the pattern of a common anomaly type into the data. An LSTM network model was trained with k-fold cross-validation on both anomalous and valid series to classify raw time-series sequences into one of seven classes: non-anomalous, or one of the six anomaly types.
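
The abstract does not name the six anomaly types or describe how they were injected. Purely as an illustration of the idea, the sketch below injects three commonly discussed sensor anomalies (a single-point spike, a stuck-at/flat-line segment, and a gradual drift) into a clean series. The anomaly types, function names, parameters, and readings are all hypothetical choices, not the dissertation's methods.

```python
from typing import List

# Hypothetical anomaly injectors (NOT the dissertation's six methods).
# Each returns a new list and leaves the input series unchanged.


def inject_spike(series: List[float], idx: int, magnitude: float) -> List[float]:
    """Add a single-point outlier of size `magnitude` at position `idx`."""
    out = list(series)
    out[idx] += magnitude
    return out


def inject_stuck(series: List[float], start: int, length: int) -> List[float]:
    """Hold the reading at `start` constant for `length` samples (flat-lined sensor)."""
    out = list(series)
    value = out[start]
    for i in range(start, min(start + length, len(out))):
        out[i] = value
    return out


def inject_drift(series: List[float], start: int, slope: float) -> List[float]:
    """Add an offset that grows by `slope` per sample from `start` onward (calibration drift)."""
    out = list(series)
    for i in range(start, len(out)):
        out[i] += slope * (i - start)
    return out


if __name__ == "__main__":
    glucose = [110.0, 112.0, 115.0, 118.0, 121.0, 119.0, 116.0, 113.0]  # made-up mg/dL readings
    labelled = [
        (glucose, "non-anomalous"),
        (inject_spike(glucose, 3, 80.0), "spike"),
        (inject_stuck(glucose, 2, 4), "stuck"),
        (inject_drift(glucose, 4, 5.0), "drift"),
    ]
    for series, label in labelled:
        print(label, series)
```

Each labelled series can then serve as one raw input sequence for a classifier, which is the setting the LSTM is trained on.
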
As a control, the detection and classification accuracy of the LSTM was compared to that of four traditional machine learning classifiers: support vector machines, random forests, naive Bayes, and shallow neural networks. The performance of all the classifiers was evaluated on nine metrics: precision, recall, and F1-score, each computed with micro, macro, and weighted averaging. While the traditional models were trained on feature vectors derived from the raw data using knowledge of common sources of anomalies, the LSTM was trained directly on the raw time-series data. Experimental results indicate that the LSTM performed comparably to the best traditional classifiers, scoring 99% on all nine metrics. The model requires no labor-intensive feature engineering, and the fine-tuning of its architecture and hyper-parameters can be fully automated. This study therefore finds LSTM networks to be an effective solution for anomaly detection and classification in sensor data.
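
The nine metrics are precision, recall, and F1 under three averaging schemes. The plain-Python sketch below shows how the schemes differ for single-label multiclass predictions. It follows the common convention (used, for example, by scikit-learn) in which macro F1 is the unweighted mean of per-class F1 scores; the function name and the class labels in the example are made up for illustration.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F1)


def _div(a: float, b: float) -> float:
    """Division that returns 0.0 when the denominator is zero."""
    return a / b if b else 0.0


def _f1(p: float, r: float) -> float:
    return _div(2 * p * r, p + r)


def prf_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, Scores]:
    """Micro-, macro-, and weighted-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed true class t
    support = Counter(y_true)

    per_class = {}
    for c in labels:
        p = _div(tp[c], tp[c] + fp[c])
        r = _div(tp[c], tp[c] + fn[c])
        per_class[c] = (p, r, _f1(p, r))

    # Micro: pool the counts over all classes, then divide.
    TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
    mp, mr = _div(TP, TP + FP), _div(TP, TP + FN)
    # Macro: unweighted mean of per-class scores.
    macro = tuple(sum(per_class[c][k] for c in labels) / len(labels) for k in range(3))
    # Weighted: per-class scores weighted by each class's true support.
    n = len(y_true)
    weighted = tuple(sum(per_class[c][k] * support[c] for c in labels) / n for k in range(3))
    return {"micro": (mp, mr, _f1(mp, mr)), "macro": macro, "weighted": weighted}


if __name__ == "__main__":
    y_true = ["ok", "ok", "spike", "drift"]
    y_pred = ["ok", "spike", "spike", "drift"]
    for scheme, (p, r, f) in prf_scores(y_true, y_pred).items():
        print(f"{scheme:>8}: P={p:.3f} R={r:.3f} F1={f:.3f}")
```

In single-label multiclass classification every error counts once as a false positive and once as a false negative, so micro-averaged precision, recall, and F1 all equal overall accuracy, and so does weighted recall.
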

    An Analysis of the Effectiveness of Applying a Machine Learning Approach for Classification of Technical Documents in Knowledge Discovery Systems

    An important component of knowledge management (KM) is the organization of documents for quick and easy access. One effective way of organizing these documents is to group them into a fixed set of specific knowledge categories. For large-scale technical teams, the number of categories can reach thousands or even tens of thousands, which makes such cataloging especially useful. Text classification (TC) is a multi-step process that involves data pre-processing, transformation, dimensionality reduction, application of classification techniques, classifier evaluation, and classifier validation. TC remains a prominent research topic, yet in practice it still depends on human work rather than on machine learning (ML). Applying ML to TC in this setting is a relatively new area of research that remains at an early stage. The goal of this study is to develop and evaluate a prototype model that uses ML algorithms to classify technical documentation in a KM system for technical teams of financial institutions involved in software development projects. This research contributes to the field of KM by determining whether an ML approach constitutes a feasible solution for TC in knowledge discovery.
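
The abstract does not say which ML algorithms the prototype uses. As one minimal illustration of the pre-processing, transformation, and classification steps listed above, here is a multinomial naive Bayes classifier with add-alpha (Laplace) smoothing over bag-of-words counts. The tokenizer, category names, and training documents are hypothetical and far smaller than a real KM corpus; evaluation and validation are left out.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Pre-processing: lowercase and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over bag-of-words counts with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, docs: List[str], labels: List[str]) -> "NaiveBayesTextClassifier":
        self.n_docs = len(docs)
        self.class_counts = Counter(labels)
        # Transformation: per-category token counts (bag of words).
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, doc: str) -> str:
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # drop unseen words
        v = len(self.vocab)
        best_label, best_score = "", -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # log P(category) + sum over tokens of log P(token | category)
            score = math.log(n / self.n_docs)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / (total + self.alpha * v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    docs = [
        "database index query tuning",
        "sql query join performance",
        "deploy kubernetes cluster pipeline",
        "ci pipeline build deploy",
    ]
    labels = ["database", "database", "devops", "devops"]
    clf = NaiveBayesTextClassifier().fit(docs, labels)
    print(clf.predict("slow sql query"))         # expected category: database
    print(clf.predict("build pipeline failed"))  # expected category: devops
```

Naive Bayes is only one plausible baseline; with thousands of categories, the dimensionality-reduction and validation steps named in the abstract matter far more than in this toy example.
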

    Fe II Diagnostic Tools for Quasars

    The enrichment of Fe relative to alpha-elements such as O and Mg represents a potential means to determine the age of quasars and probe the galaxy formation epoch. To explore how Fe II emission in quasars is linked to physical conditions and abundance, we have constructed an 830-level Fe II model atom and investigated, through photoionization calculations, how Fe II emission strengths depend on non-abundance factors. We have split Fe II emission into three major wavelength bands, Fe II(UV), Fe II(Opt1), and Fe II(Opt2), and explored how the Fe II(UV)/Mg II, Fe II(UV)/Fe II(Opt1), and Fe II(UV)/Fe II(Opt2) emission ratios depend upon hydrogen density and ionizing flux in the broad-line regions (BLRs) of quasars. Our calculations show that: 1) similar Fe II(UV)/Mg II ratios can exist over a wide range of physical conditions; 2) the Fe II(UV)/Fe II(Opt1) and Fe II(UV)/Fe II(Opt2) ratios serve to constrain ionizing luminosity and hydrogen density; and 3) flux measurements of Fe II bands and knowledge of ionizing flux provide tools to derive distances to BLRs in quasars. To derive all BLR physical parameters with uncertainties, comparisons of our model with observations of a large quasar sample at low redshift (z < 1) are desirable. The STIS and NICMOS spectrographs aboard the Hubble Space Telescope (HST) offer the best means to provide such observations. Comment: ApJ accepted
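
Result 3) rests on the inverse-square law. A minimal worked form, assuming the model's ionizing flux is expressed as a hydrogen-ionizing photon flux $\Phi_{\mathrm{H}}$ (a common parametrization in photoionization codes) and that the central source isotropically emits $Q_{\mathrm{H}}$ ionizing photons per second above the 13.6 eV threshold $\nu_0$:

```latex
Q_{\mathrm{H}} = \int_{\nu_0}^{\infty} \frac{L_\nu}{h\nu}\, d\nu ,
\qquad
\Phi_{\mathrm{H}} = \frac{Q_{\mathrm{H}}}{4\pi r^{2}}
\quad\Longrightarrow\quad
r_{\mathrm{BLR}} = \left( \frac{Q_{\mathrm{H}}}{4\pi\,\Phi_{\mathrm{H}}} \right)^{1/2}

% Illustrative numbers only (not taken from the paper):
% Q_H = 10^{55} s^{-1}, Phi_H = 10^{20} cm^{-2} s^{-1}
r_{\mathrm{BLR}} = \left( \frac{10^{55}}{4\pi \times 10^{20}} \right)^{1/2}\ \mathrm{cm}
\approx 8.9\times10^{16}\ \mathrm{cm} \approx 34\ \text{light-days}
```

If the Fe II band ratios constrain $\Phi_{\mathrm{H}}$ and the continuum gives $Q_{\mathrm{H}}$, the distance follows directly.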

    Effects of X-ray irradiation and disk flaring on the [NeII] 12.8 micron emission from young stellar objects

    The [Ne II] fine-structure emission line at 12.8 micron has been detected in the spectra of several young stellar objects (YSOs). This line is thought to be produced by X-ray irradiation of warm protoplanetary disk atmospheres; however, the observed correlation between [Ne II] luminosities and measured X-ray luminosities shows a large scatter. This spread limits the utility of the line as a probe of the gaseous phase of disks, and several authors have suggested contamination by outflows as a probable cause of the scatter. In this work we explore the possibility that the large variations in the observed [Ne II] luminosity may instead be caused by different star-disk parameters. In particular, we study the effects that the hardness of the irradiating source and the structure (flaring) of the disk have on the luminosity and spectral profile of the [Ne II] 12.8 micron line. We find that varying these parameters can indeed cause up to an order of magnitude variation in the emission luminosities, which may explain the observed scatter, although our models predict somewhat smaller luminosities than those recently reported by other authors who observed the line with the Spitzer Space Telescope. Our models also show that the hardness of the spectrum has only a limited (undetectable) effect on the line profiles, while changes in the flaring power of the disk significantly affect the size of the [Ne II] emission region and, as a consequence, its line profile. In particular, we suggest that broad line profiles centred on the stellar radial velocity may be indicative of flat disks seen at large inclination angles. Comment: 9 pages, 8 figures. Accepted for publication in MNRAS

    Discovering Sparse Representations of Lie Groups with Machine Learning

    Recent work has used deep learning to derive symmetry transformations, which preserve conserved quantities, and to obtain the corresponding algebras of generators. In this letter, we extend this technique to derive sparse representations of arbitrary Lie algebras. We show that our method reproduces the canonical (sparse) representations of the generators of the Lorentz group, as well as the U(n) and SU(n) families of Lie groups. This approach is completely general and can be used to find the infinitesimal generators for any Lie group. Comment: 14 pages, 6 figures
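
To make "the algebra of generators" concrete, the sketch below checks that three sparse 3×3 rotation generators (the so(3) rotation subalgebra of the Lorentz algebra, each with only two nonzero entries) close under the matrix commutator with structure constants $[L_i, L_j] = \epsilon_{ijk} L_k$. This is a hypothetical consistency check one might run on a set of learned generators, written in plain Python; it is not the authors' method.

```python
from itertools import permutations
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def commutator(a: Matrix, b: Matrix) -> Matrix:
    """[A, B] = AB - BA."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


def levi_civita(i: int, j: int, k: int) -> int:
    """Totally antisymmetric symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def closes_as_so3(gens: List[Matrix], tol: float = 1e-12) -> bool:
    """True if [L_i, L_j] = sum_k eps_ijk L_k for every ordered pair i != j."""
    for i, j in permutations(range(3), 2):
        lhs = commutator(gens[i], gens[j])
        rhs = [[sum(levi_civita(i, j, k) * gens[k][r][c] for k in range(3))
                for c in range(3)] for r in range(3)]
        if any(abs(lhs[r][c] - rhs[r][c]) > tol for r in range(3) for c in range(3)):
            return False
    return True


def nonzero_entries(m: Matrix) -> int:
    """Sparsity measure: number of nonzero matrix elements."""
    return sum(1 for row in m for x in row if x != 0)


# Canonical real rotation generators, (L_i)_{jk} = -eps_{ijk}.
L_X = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
L_Y = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
L_Z = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

if __name__ == "__main__":
    print(closes_as_so3([L_X, L_Y, L_Z]))              # True
    print([nonzero_entries(m) for m in (L_X, L_Y, L_Z)])  # [2, 2, 2]
```

A learned, non-sparse basis could satisfy the same commutation relations up to a change of basis; sparsity is what singles out the canonical form.
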

    Identifying the Group-Theoretic Structure of Machine-Learned Symmetries

    Deep learning has recently been used successfully to derive symmetry transformations that preserve important physical quantities. Because they are completely agnostic, these techniques defer the identification of the discovered symmetries to a later stage. In this letter we propose methods for examining and identifying the group-theoretic structure of such machine-learned symmetries. We design loss functions that probe the subalgebra structure either during the deep learning stage of symmetry discovery or in a subsequent post-processing stage. We illustrate the new methods with examples from the U(n) Lie group family, obtaining the respective subalgebra decompositions. As an application to particle physics, we demonstrate the identification of the residual symmetries after the spontaneous breaking of non-Abelian gauge symmetries such as SU(3) and SU(5), which are commonly used in model building. Comment: 10 pages, 8 figures, 2 tables

    The energy budget for X-ray to infrared reprocessing in Compton-thin and Compton-thick active galaxies

    Heavily obscured active galactic nuclei (AGNs) contribute significantly to the cosmic X-ray background (CXRB). However, the AGNs found in deep X-ray surveys are often too weak to allow direct measurement of the column density of obscuring matter. One method adopted in recent years to identify heavily obscured, Compton-thick AGNs under such circumstances is to use the observed mid-infrared to X-ray luminosity ratio as a proxy for the column density. This is based on the supposition that the amount of energy lost by the illuminating X-ray continuum to the obscuring matter and reprocessed into infrared emission is directly related to the column density, and that the proxy is not sensitive to other physical parameters of the system (aside from contamination by dust emission from, for example, star-forming regions). Using Monte Carlo simulations, we find that the energy losses experienced by the illuminating X-ray continuum in the obscuring matter are far more sensitive to the shape of the X-ray continuum and to the covering factor of the X-ray reprocessor than they are to the column density of the material. Specifically, we find that the infrared to X-ray luminosity ratio of a Compton-thin source can be just as large as that of a Compton-thick source, even without any contamination from dust. Since the intrinsic X-ray continuum and the covering factor of the reprocessor are poorly constrained by deep X-ray survey data, we conclude that the mid-infrared to X-ray luminosity ratio is not a reliable proxy for the column density of obscuring matter in AGNs, even when there is no contribution to the mid-infrared luminosity other than X-ray reprocessing. This conclusion is independent of the geometry of the obscuring matter. Comment: Accepted for publication in MNRAS. 12 pages, 7 figures
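
For reference, the standard threshold separating Compton-thin from Compton-thick absorbers (not stated in the abstract) is where the Thomson optical depth of the obscuring column reaches unity. This simple form counts one electron per hydrogen atom:

```latex
\tau_{\mathrm{T}} = N_{\mathrm{H}}\,\sigma_{\mathrm{T}} \gtrsim 1
\quad\Longrightarrow\quad
N_{\mathrm{H}} \gtrsim \sigma_{\mathrm{T}}^{-1}
= \left(6.652\times10^{-25}\ \mathrm{cm^{2}}\right)^{-1}
\approx 1.5\times10^{24}\ \mathrm{cm^{-2}}
```

The abstract's point is that the mid-infrared to X-ray luminosity ratio does not track which side of this $N_{\mathrm{H}}$ threshold a source lies on as closely as the proxy method assumes.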

    Medical ethnobotany of herbal practitioners in the Turkestan Range, southwestern Kyrgyzstan

    This study recorded and analyzed traditional knowledge of medicinal plants in the Turkestan Range in southwestern Kyrgyzstan, where ethnobotanical knowledge has been largely under-documented to date. Data were collected through participant observation and through both semi-structured and in-depth interviews with 10 herbal specialists. A total of 50 medicinal plant taxa were documented, belonging to 46 genera and 27 botanical families. In folk medicine they are applied in 75 different formulations, which are used to treat 63 human and three animal ailments. Quantitative ethnobotanical indices were calculated to analyze the informants' traditional knowledge and to determine the cultural importance of particular medicinal plants. Ziziphora pamiroalaica, Peganum harmala, and Inula orientalis obtained the highest use value (UV). The best-represented and most culturally important families were Lamiaceae, Asteraceae, and Apiaceae. Gastrointestinal disorders were the most prevalent ailment category. Most medicinal plants were gathered from nearby environments; however, species with a higher cultural value occurred at distant rather than nearby collection sites. The findings of this study confirm the gap in the documentation of traditional knowledge in Kyrgyzstan and indicate that further studies on the traditional use of wild plant resources could bring important insights into ecosystem diversity, with implications for human ecology and biocultural diversity conservation in Central Asia.
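
The use value (UV) index is commonly computed as $UV_s = \sum_i U_{is} / N$, where $U_{is}$ is the number of uses of species $s$ mentioned by informant $i$ and $N$ is the total number of informants (here $N = 10$). The abstract does not give the exact formula, so the sketch below assumes this common form. The function name, species keys, and counts are hypothetical placeholders, not the study's data.

```python
from typing import Dict


def use_values(reports: Dict[str, Dict[str, int]], n_informants: int) -> Dict[str, float]:
    """UV_s = (sum over informants of the uses reported for species s) / N.

    `reports` maps species -> {informant id -> number of uses mentioned}.
    Informants who did not mention a species contribute zero uses but still
    count toward N.
    """
    return {species: sum(by_informant.values()) / n_informants
            for species, by_informant in reports.items()}


if __name__ == "__main__":
    reports = {  # hypothetical interview tallies
        "species_a": {"inf01": 3, "inf02": 2, "inf05": 4},
        "species_b": {"inf03": 1},
    }
    uv = use_values(reports, n_informants=10)
    for species, value in sorted(uv.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{species}: UV = {value:.2f}")
```

Because $N$ is fixed across species, UV rewards plants that many informants cite for many uses, which is why it serves as a measure of cultural importance.
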

    Theoretical spectra of photoevaporating protoplanetary discs: An atlas of atomic and low-ionisation emission lines

    We present a calculation of the atomic and low-ionisation emission line spectra of photoevaporating protoplanetary discs. Line luminosities and profiles are obtained from detailed photoionisation calculations of the disc and wind structures surrounding young active solar-type stars. The disc and wind density and velocity fields were obtained from the recently developed radiation-hydrodynamic models of Owen et al., which include stellar X-ray and EUV irradiation of protoplanetary discs at various stages of clearing, from primordial sources to inner hole sources of various hole sizes. Our models compare favourably with currently available observations, lending support to an X-ray-driven photoevaporation model for disc dispersal. In particular, we find that X-rays drive a warm, predominantly neutral flow in which the [O I] 6300 Å line can be produced by collisional excitation with neutral hydrogen. Our models can, for the first time, provide a very good match to both the luminosities and the profiles of the low-velocity component of the [O I] 6300 Å line and other forbidden lines observed by Hartigan et al. in a large sample of T Tauri stars. We find that the [O I] 6300 Å and [Ne II] 12.8 μm lines are predominantly produced in the X-ray-driven wind and thus appear blueshifted by a few km/s for some of the systems when observed at non-edge-on inclinations. We note, however, that blueshifts are only produced under certain conditions: X-ray luminosity, spectral shape, and inner hole size all affect the location of the emitting region and the physical conditions in the wind. We therefore caution that while a blueshifted line is a tell-tale sign of an outflow, the lack of a blueshift should not necessarily be interpreted as a lack of outflow. Comment: 18 pages, 7 figures, accepted to be published in MNRAS - changes in the revised version: reference list update
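
To put "a few km/s" in perspective, the non-relativistic Doppler relation gives the wavelength shift of the [O I] 6300 Å line and the resolving power at which that shift equals one resolution element. Here 5 km/s is an illustrative value, not one quoted in the abstract, and a negative $v_r$ denotes approach (blueshift):

```latex
\frac{\Delta\lambda}{\lambda_0} = \frac{v_r}{c}
\quad\Longrightarrow\quad
\Delta\lambda = 6300\ \text{\AA} \times
\frac{-5\ \mathrm{km\,s^{-1}}}{2.998\times10^{5}\ \mathrm{km\,s^{-1}}}
\approx -0.105\ \text{\AA},
\qquad
R = \frac{\lambda_0}{|\Delta\lambda|} = \frac{c}{|v_r|} \approx 6\times10^{4}
```

Shifts this small call for high-resolution spectroscopy, which is one reason a missing blueshift is weak evidence against an outflow.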

    Genetic analyses of Seoul hantavirus genome recovered from rats (Rattus norvegicus) in the Netherlands unveils diverse routes of spread into Europe

    Seoul virus (SEOV) is the etiologic agent of hemorrhagic fever with renal syndrome. It is carried by brown rats (Rattus norvegicus), a commensal rodent that closely cohabits with humans in urban environments. SEOV has a worldwide distribution; in Europe, it has been found in rats in the UK, France, Sweden, and Belgium, and human cases of SEOV infection have been reported in Germany, the UK, France, and Belgium. In a search for hantaviruses in brown rats from the Netherlands, we found both serological and genetic evidence for the presence of SEOV in the local wild rat population. To further decipher its relationship with other SEOV variants globally, the complete genome of SEOV in the Netherlands was recovered. SEOV sequences obtained from three positive rats (captured at nearby trapping locations at the same time) were found to be highly similar. Phylogenetic analyses demonstrated that two lineages of SEOV circulate in Europe. Strains from the Netherlands and the UK, together with the Baxter strain from the US, constitute one of these lineages, while the second includes strains from Europe and Asia. Our results support the hypothesis of diverse routes of SEOV spread into Europe. These findings, combined with other indications of the expansion of the European range of SEOV, suggest an increased public health risk from this virus and highlight the need for increased surveillance. Peer reviewed