1,622 research outputs found

    A New Approach to Time Domain Classification of Broadband Noise in Gravitational Wave Data

    Broadband noise in gravitational wave (GW) detectors, also known as triggers, can often reduce the efficiency with which astrophysical search pipelines detect sources. It is important to understand their instrumental or environmental origin so that they can be eliminated or accounted for in the data. Since the number of triggers is large, data mining approaches such as clustering and classification are useful tools for this task. Classification of triggers based on a handful of discrete properties has been done in the past. The waveform, or 'shape', of the triggers carries rich information that has so far been explored only to a limited extent. This paper presents a new way to classify triggers that derives information from both the trigger waveforms and their discrete physical properties. It uses a sequential combination of the Longest Common Sub-Sequence (LCSS), and LCSS coupled with Fast Time Series Evaluation (FTSE), for waveform classification, and the multidimensional hierarchical classification (MHC) analysis for grouping based on physical properties. A generalized k-means algorithm is used with the LCSS (and LCSS+FTSE) to cluster the triggers, with a validity measure determining the correct number of clusters in the absence of any prior knowledge. The results are demonstrated by simulations and by application to a segment of real LIGO data from the sixth science run. Comment: 16 pages, 16 figures
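
As a rough illustration of the LCSS similarity mentioned in the abstract above, the sketch below computes a longest-common-subsequence score for two real-valued series using plain dynamic programming. It is not the paper's implementation: the function names, the amplitude tolerance `eps`, the optional index window `delta`, and the normalisation by the shorter series are all assumptions chosen for illustration, and the FTSE speed-up is not included.

```python
def lcss_length(a, b, eps=0.1, delta=None):
    """Length of the longest common subsequence of two real-valued series.

    Two samples match when their values differ by at most eps and,
    if delta is given, their positions differ by at most delta.
    (eps and delta are illustrative parameters, not taken from the paper.)
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCSS length of a[:i] and b[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            in_window = delta is None or abs(i - j) <= delta
            if in_window and abs(a[i - 1] - b[j - 1]) <= eps:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcss_similarity(a, b, eps=0.1, delta=None):
    """LCSS length normalised by the shorter series, giving a score in [0, 1]."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, eps, delta) / min(len(a), len(b))


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 1.0, 0.0]
    y = [0.05, 1.02, 5.0, 0.98, 0.0]
    print(lcss_similarity(x, y))
```

A distance such as `1 - lcss_similarity(a, b)` could then feed a clustering step, although how the paper combines LCSS with its generalized k-means is not specified in the abstract.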

    An Oversampling Mechanism for Multimajority Datasets using SMOTE and Darwinian Particle Swarm Optimisation

    Data skewness continues to be one of the leading factors that adversely affect the performance of machine learning algorithms. One approach to reducing this negative effect is to pre-process the dataset with data-level resampling strategies. Resampling strategies take two forms: oversampling and undersampling. This article proposes an oversampling strategy for tackling multiclass imbalanced datasets. The proposed approach optimises the state-of-the-art oversampling technique SMOTE with the Darwinian Particle Swarm Optimization technique. The resulting method, DOSMOTE, generates optimised synthetic samples for balancing the datasets. This strategy will be more effective on multimajority datasets. An experimental study is performed on peculiar multimajority datasets to measure the effectiveness of the proposed approach. The proposed method produces promising results when compared with the conventional oversampling strategies.
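
To make the oversampling idea concrete, here is a minimal sketch of the basic SMOTE interpolation step: a synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is plain SMOTE only; the Darwinian Particle Swarm Optimisation stage of DOSMOTE is not implemented, and the function name `smote_samples` and its parameters are hypothetical.

```python
import math
import random


def smote_samples(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (the minority class).
    k: number of nearest minority neighbours to choose from.
    (Names and defaults are illustrative assumptions.)
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Rank the other minority samples by Euclidean distance to the base
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(x + gap * (y - x) for x, y in zip(base, neighbour))
        )
    return synthetic


if __name__ == "__main__":
    corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote_samples(corners, 5, k=2):
        print(point)
```

Because every synthetic point lies on a segment between two existing minority samples, it always stays inside their bounding box.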

    Hydrometeor classification from dual-polarized weather radar: extending fuzzy logic from S-band to C-band data

    A model-based fuzzy classification method for C-band polarimetric radar data, named Fuzzy Radar Algorithm for Hydrometeor Classification at C-band (FRAHCC), is presented. Membership functions are designed to best fit simulation data at C-band, and they are derived for ten different hydrometeor classes by means of a scattering model based on the T-matrix numerical method. The fuzzy logic classification technique uses a reduced set of polarimetric observables, i.e. copolar reflectivity and differential reflectivity, and it is finally applied to data from radar sites located in Gattatico and S. Pietro Capofiume in northern Italy. The final purpose is to show qualitative accuracy improvements with respect to the use of a set of ten bidimensional membership functions (MBFs), previously adopted and well suited to S-band data but not to C-band data.

    Genetic predisposition to mosaic Y chromosome loss in blood.

    Mosaic loss of chromosome Y (LOY) in circulating white blood cells is the most common form of clonal mosaicism [1-5], yet our knowledge of the causes and consequences of this is limited. Here, using a computational approach, we estimate that 20% of the male population represented in the UK Biobank study (n = 205,011) has detectable LOY. We identify 156 autosomal genetic determinants of LOY, which we replicate in 757,114 men of European and Japanese ancestry. These loci highlight genes that are involved in cell-cycle regulation and cancer susceptibility, as well as somatic drivers of tumour growth and targets of cancer therapy. We demonstrate that genetic susceptibility to LOY is associated with non-haematological effects on health in both men and women, which supports the hypothesis that clonal haematopoiesis is a biomarker of genomic instability in other tissues. Single-cell RNA sequencing identifies dysregulated expression of autosomal genes in leukocytes with LOY and provides insights into why clonal expansion of these cells may occur. Collectively, these data highlight the value of studying clonal mosaicism to uncover fundamental mechanisms that underlie cancer and other ageing-related diseases. This research has been conducted using the UK Biobank Resource under application 9905 and 19808. This work was supported by the Medical Research Council [Unit Programme number MC_UU_12015/2]. Full study-specific and individual acknowledgements can be found in the supplementary information.

    Markovian analysis of the sequential behavior of the spontaneous spinal cord dorsum potentials induced by acute nociceptive stimulation in the anesthetized cat

    In a previous study we developed a machine learning procedure for the automatic identification and classification of spontaneous cord dorsum potentials (CDPs). That study further supported the proposal that, in the anesthetized cat, the spontaneous CDPs recorded from different lumbar spinal segments are generated by a distributed network of dorsal horn neurons with structured (non-random) patterns of functional connectivity, and that these configurations can change to other non-random and stable configurations after the nociceptive stimulation produced by the intradermic injection of capsaicin. Here we present a study showing that the sequence of identified forms of the spontaneous CDPs follows a Markov chain of at least order one. That is, the system has memory in the sense that the spontaneous activation of dorsal horn neuronal ensembles producing the CDPs is not independent of the most recent activity. We used this Markovian property to build a procedure that identifies portions of signals as belonging to a specific functional state of connectivity among the neuronal networks involved in the generation of the CDPs. We tested this procedure during acute nociceptive stimulation produced by the intradermic injection of capsaicin in intact as well as spinalized preparations. Altogether, our results indicate that CDP sequences cannot be generated by a renewal stochastic process. Moreover, it is possible to describe some functional features of activity in the cord dorsum by modeling the CDP sequences as generated by a Markov order one stochastic process. Finally, these Markov models make it possible to determine the functional state that produced a CDP sequence. The proposed identification procedures appear to be useful for the analysis of the sequential behavior of the ongoing CDPs recorded from different spinal segments in response to a variety of experimental procedures, including the changes produced by acute nociceptive stimulation. They are envisaged as a useful tool to examine alterations of the patterns of functional connectivity between dorsal horn neurons under normal and different pathological conditions, an issue of potential clinical concern.
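
The idea of modelling a label sequence as a first-order Markov chain, and then asking which state's model best explains a new sequence, can be sketched as follows. This is only an illustration under stated assumptions: the class labels "A" and "B", the function names, and the small probability `floor` used for unseen transitions are hypothetical and not taken from the study.

```python
import math
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Estimate first-order transition probabilities from a label sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


def log_likelihood(sequence, probs, floor=1e-6):
    """Log-likelihood of the transitions in sequence under a fitted model.

    Transitions never seen during fitting get the small probability
    `floor` (an illustrative smoothing choice).
    """
    return sum(
        math.log(probs.get(a, {}).get(b, floor))
        for a, b in zip(sequence, sequence[1:])
    )


if __name__ == "__main__":
    # Hypothetical label sequences recorded in two functional states
    model_1 = transition_probabilities("ABABABAB")
    model_2 = transition_probabilities("AAAABBBB")
    segment = "ABAB"
    scores = {
        "state 1": log_likelihood(segment, model_1),
        "state 2": log_likelihood(segment, model_2),
    }
    print(max(scores, key=scores.get))
```

Assigning a segment to the model with the highest log-likelihood is one simple way to use such models for state identification; the study's actual procedure may differ.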

    PCA as a practical indicator of OPLS-DA model reliability

    Background: Principal Component Analysis (PCA) and Orthogonal Projections to Latent Structures Discriminant Analysis (OPLS-DA) are powerful statistical modeling tools that provide insights into separations between experimental groups based on high-dimensional spectral measurements from NMR, MS or other analytical instrumentation. However, when used without validation, these tools may lead investigators to statistically unreliable conclusions. This danger is especially real for Partial Least Squares (PLS) and OPLS, which aggressively force separations between experimental groups. As a result, OPLS-DA is often used as an alternative method when PCA fails to expose group separation, but this practice is highly dangerous. Without rigorous validation, OPLS-DA can easily yield statistically unreliable group separation. Methods: A Monte Carlo analysis of PCA group separations and OPLS-DA cross-validation metrics was performed on NMR datasets with statistically significant separations in scores-space. A linearly increasing amount of Gaussian noise was added to each data matrix, followed by the construction and validation of PCA and OPLS-DA models. Results: With increasing added noise, the PCA scores-space distance between groups rapidly decreased and the OPLS-DA cross-validation statistics simultaneously deteriorated. A decrease in correlation between the estimated loadings (added noise) and the true (original) loadings was also observed. While the validity of the OPLS-DA model diminished with increasing added noise, the group separation in scores-space remained basically unaffected.
