
    Handling of Missing Values in Static and Dynamic Data Sets

    This thesis contributes by, first, conducting a comparative study of traditional and modern classification methods, highlighting the differences in their performance. Second, an algorithm to enhance the prediction of values used for data imputation with nonlinear models is presented. Third, a novel model selection algorithm to enhance prediction performance in the presence of missing data is presented. It includes an overview of nonlinear model selection with complete data, and provides summary descriptions of the Box-Tidwell and fractional polynomial methods for model selection. In particular, it focuses on the fractional polynomial method for nonlinear modelling in cases of missing data. An analysis example is presented to illustrate the performance of this method.
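
    As a hedged illustration of the fractional polynomial idea mentioned above (a minimal sketch under my own assumptions, not the thesis' algorithm), the snippet below fits each candidate power from the conventional first-degree set by least squares on the complete cases and keeps the one with the smallest residual sum of squares. The function names and the toy data are invented for illustration.

    import numpy as np

    # Conventional first-degree fractional-polynomial powers (p = 0 is taken as log x).
    FP_POWERS = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]

    def fp_transform(x, p):
        """Apply a single fractional-polynomial power to a positive predictor."""
        return np.log(x) if p == 0 else x ** p

    def select_fp1_power(x, y):
        """Pick the first-degree FP power minimising the residual sum of squares
        on complete cases (rows where both x and y are observed)."""
        mask = ~(np.isnan(x) | np.isnan(y))
        x_obs, y_obs = x[mask], y[mask]
        best_p, best_rss = None, np.inf
        for p in FP_POWERS:
            X = np.column_stack([np.ones_like(x_obs), fp_transform(x_obs, p)])
            beta, rss, *_ = np.linalg.lstsq(X, y_obs, rcond=None)
            rss = rss[0] if rss.size else np.sum((y_obs - X @ beta) ** 2)
            if rss < best_rss:
                best_p, best_rss = p, rss
        return best_p, best_rss

    # Toy usage: a quadratic relationship with some responses missing.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.5, 5.0, 200)
    y = 1.0 + 0.8 * x**2 + rng.normal(0, 0.5, 200)
    y[rng.choice(200, 30, replace=False)] = np.nan   # inject missing responses
    print(select_fp1_power(x, y))                    # expected to select p = 2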

    DropIn: Making Reservoir Computing Neural Networks Robust to Missing Inputs by Dropout

    The paper presents a novel, principled approach to train recurrent neural networks from the Reservoir Computing family that are robust to missing parts of the input features at prediction time. By building on the ensembling properties of Dropout regularization, we propose a methodology, named DropIn, which efficiently trains a neural model as a committee machine of subnetworks, each capable of predicting with a subset of the original input features. We discuss the application of the DropIn methodology in the context of Reservoir Computing models, targeting applications characterized by input sources that are unreliable or prone to disconnection, such as pervasive wireless sensor networks and ambient intelligence. We provide an experimental assessment using real-world data from such application domains, showing how the DropIn methodology maintains predictive performance comparable to that of a model without missing features, even when 20%-50% of the inputs are not available.
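
    A minimal sketch of the general idea (my own assumption about how such a scheme could look, not the authors' code): an echo state network whose training data have random subsets of input features zeroed out, so the learned readout tolerates missing inputs at prediction time. All class names and hyperparameters below are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)

    class DropInESN:
        """Echo state network trained with random input dropout (DropIn-style)."""

        def __init__(self, n_in, n_res=200, spectral_radius=0.9, drop_prob=0.3, ridge=1e-6):
            self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
            W = rng.normal(0, 1, (n_res, n_res))
            W *= spectral_radius / max(abs(np.linalg.eigvals(W)))  # echo-state scaling
            self.W = W
            self.drop_prob = drop_prob
            self.ridge = ridge

        def _states(self, U):
            """Run the reservoir over an input sequence U of shape (T, n_in)."""
            x = np.zeros(self.W.shape[0])
            states = []
            for u in U:
                x = np.tanh(self.W_in @ u + self.W @ x)
                states.append(x)
            return np.array(states)

        def fit(self, U, y):
            """Ridge-regression readout; each time step sees a random feature mask."""
            mask = (rng.random(U.shape) > self.drop_prob).astype(float)
            X = self._states(U * mask)
            A = X.T @ X + self.ridge * np.eye(X.shape[1])
            self.w_out = np.linalg.solve(A, X.T @ y)

        def predict(self, U):
            return self._states(U) @ self.w_out

    # Toy usage: predict a target from 5 input channels, then lose 2 channels at test time.
    T, n_in = 1000, 5
    U = rng.normal(0, 1, (T, n_in))
    y = np.sin(np.cumsum(U[:, 0]) * 0.1) + 0.1 * U[:, 1]
    esn = DropInESN(n_in)
    esn.fit(U, y)
    U_test = U.copy()
    U_test[:, [2, 3]] = 0.0           # simulate disconnected sensors
    print(np.corrcoef(esn.predict(U_test), y)[0, 1])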

    Reject Inference, Augmentation and Sample Selection


    Detecting early signs of depressive and manic episodes in patients with bipolar disorder using the signature-based model

    Recurrent major mood episodes and subsyndromal mood instability cause substantial disability in patients with bipolar disorder. Early identification of mood episodes enabling timely mood stabilisation is an important clinical goal. Recent technological advances allow the prospective reporting of mood in real time, enabling more accurate, efficient data capture. The complex nature of these data streams, in combination with the challenge of deriving meaning from missing data, poses a significant analytic challenge. The signature method is derived from stochastic analysis and has the ability to capture important properties of complex ordered time-series data. We explore whether the onset of episodes of mania and depression can be identified using self-reported mood data. Comment: 12 pages, 3 tables, 10 figures
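
    As a hedged illustration of the signature method (a minimal sketch, not the paper's pipeline), the snippet below computes depth-2 path signature features directly from increments with numpy; such features can then be fed to any off-the-shelf classifier. The mood-stream toy data and all names are invented.

    import numpy as np

    def signature_depth2(path):
        """Depth-2 signature of a piecewise-linear path of shape (T, d).
        Returns the concatenation of the d level-1 and d*d level-2 terms."""
        inc = np.diff(path, axis=0)                  # step increments, shape (T-1, d)
        lvl1 = inc.sum(axis=0)                       # S^i = total increment
        cum = np.vstack([np.zeros(path.shape[1]), np.cumsum(inc, axis=0)[:-1]])
        # S^{ij} = sum_k (increment of channel i before step k) * dx^j_k + 0.5 * dx^i_k * dx^j_k
        lvl2 = cum.T @ inc + 0.5 * inc.T @ inc
        return np.concatenate([lvl1, lvl2.ravel()])

    # Toy usage: a self-reported mood stream with a time channel and two rating scales,
    # irregularly sampled (weeks with missing reports simply do not appear in the stream).
    rng = np.random.default_rng(2)
    weeks = np.sort(rng.choice(52, 30, replace=False)).astype(float)
    depression = np.clip(np.cumsum(rng.normal(0, 1, 30)), 0, 27)
    mania = np.clip(np.cumsum(rng.normal(0, 1, 30)), 0, 20)
    stream = np.column_stack([weeks, depression, mania])
    features = signature_depth2(stream)
    print(features.shape)   # (3 + 9,) = (12,)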

    Accuracy and responses of genomic selection on key traits in apple breeding

    The application of genomic selection in fruit tree crops is expected to enhance breeding efficiency by increasing prediction accuracy, increasing selection intensity and decreasing generation interval. The objectives of this study were to assess the accuracy of prediction and selection response in commercial apple breeding programmes for key traits. The training population comprised 977 individuals derived from 20 pedigreed full-sib families. Historic phenotypic data were available on 10 traits related to productivity and fruit external appearance, and genotypic data for 7829 SNPs obtained with an Illumina 20K SNP array. From these data, a genome-wide prediction model was built and subsequently used to calculate genomic breeding values of five application full-sib families. The application families had genotypes at 364 SNPs from a dedicated 512 SNP array, and these genotypic data were extended to the high-density level by imputation. These five families were phenotyped for 1 year and their phenotypes were compared to the predicted breeding values. Accuracy of genomic prediction across the 10 traits reached a maximum value of 0.5 and had a median value of 0.19. The accuracies were strongly affected by the phenotypic distribution and heritability of traits. In the largest family, significant selection response was observed for traits with high heritability and symmetric phenotypic distribution. Traits that showed non-significant response often had reduced and skewed phenotypic variation or low heritability. Among the five application families the accuracies were uncorrelated to the degree of relatedness to the training population. The results underline the potential of genomic prediction to accelerate breeding progress in outbred fruit tree crops that still need to overcome long generation intervals and extensive phenotyping costs.
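
    A minimal sketch of the general approach (an assumption on my part, not the study's model): genome-wide prediction via ridge regression on SNP dosages, with accuracy taken as the correlation between predicted breeding values and observed phenotypes in held-out individuals. The sizes and names below are illustrative, not the study's data.

    import numpy as np

    rng = np.random.default_rng(3)

    def ridge_gblup(X_train, y_train, X_new, lam=1.0):
        """Ridge regression of phenotype on centred SNP dosages (RR-BLUP-style).
        Returns genomic estimated breeding values for X_new."""
        mu = X_train.mean(axis=0)
        Xc = X_train - mu
        A = Xc.T @ Xc + lam * np.eye(Xc.shape[1])
        beta = np.linalg.solve(A, Xc.T @ (y_train - y_train.mean()))
        return y_train.mean() + (X_new - mu) @ beta

    # Toy data: simulate 500 training + 100 application individuals at 1000 SNPs
    # with 50 causal loci and roughly 0.5 heritability.
    n_train, n_new, n_snp = 500, 100, 1000
    X = rng.integers(0, 3, (n_train + n_new, n_snp)).astype(float)   # dosages 0/1/2
    effects = np.zeros(n_snp)
    effects[rng.choice(n_snp, 50, replace=False)] = rng.normal(0, 0.3, 50)
    g = X @ effects
    y = g + rng.normal(0, g.std(), n_train + n_new)

    gebv = ridge_gblup(X[:n_train], y[:n_train], X[n_train:], lam=100.0)
    accuracy = np.corrcoef(gebv, y[n_train:])[0, 1]
    print(f"prediction accuracy: {accuracy:.2f}")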

    Using Big Data to Enhance the Bosch Production Line Performance: A Kaggle Challenge

    This paper describes our approach to the Bosch production line performance challenge run by Kaggle.com. Maximizing the production yield is at the heart of the manufacturing industry. At the Bosch assembly line, data is recorded for products as they progress through each stage. Data science methods are applied to this huge data repository, consisting of records of tests and measurements made for each component along the assembly line, to predict internal failures. We found that it is possible to train a model that predicts which parts are most likely to fail. Thus a smarter failure detection system can be built, and the parts tagged as likely to fail can be salvaged to decrease operating costs and increase profit margins. Comment: IEEE Big Data 2016 Conference
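
    A minimal sketch of this kind of failure-prediction setup (my own illustration, not the authors' pipeline): a histogram gradient-boosted classifier trained on sparse station measurements, with a probability threshold chosen to cope with rare failures, scored with Matthews correlation as a common metric for heavily imbalanced labels. The column counts and data are invented.

    import numpy as np
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.metrics import matthews_corrcoef
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(4)

    # Simulate a Bosch-like table: many parts, many station measurements,
    # most of them never taken (missing), and a rare failure label.
    n_parts, n_meas = 20000, 50
    X = rng.normal(0, 1, (n_parts, n_meas))
    X[rng.random(X.shape) < 0.7] = np.nan          # 70% of measurements missing
    risk = np.nan_to_num(X[:, 0]) + 0.5 * np.nan_to_num(X[:, 1])
    y = (risk + rng.normal(0, 1, n_parts) > 3.0).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

    # Histogram gradient boosting handles missing values natively.
    clf = HistGradientBoostingClassifier(max_iter=200, learning_rate=0.1)
    clf.fit(X_tr, y_tr)

    # With rare failures, pick a probability threshold instead of the default 0.5.
    proba = clf.predict_proba(X_te)[:, 1]
    pred = (proba > 0.2).astype(int)
    print("MCC:", matthews_corrcoef(y_te, pred))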

    Invisible Z-Boson Decays at e+e- Colliders

    The measurement of the invisible Z-boson decay width at e+e- colliders can be done "indirectly", by subtracting the Z-boson visible partial widths from the Z-boson total width, or "directly", from the process e+e- -> \gamma \nu \bar{\nu}. Both procedures are sensitive to different types of new physics and provide information about the couplings of the neutrinos to the Z-boson. At present, measurements at LEP and CHARM II are capable of constraining the left-handed Z\nu\nu-coupling, 0.45 <~ g_L <~ 0.5, while the right-handed one is only mildly bounded, |g_R| <= 0.2. We show that measurements at a future e+e- linear collider at different center-of-mass energies, \sqrt{s} = M_Z and \sqrt{s} ~ 170 GeV, would translate into a markedly more precise measurement of the Z\nu\nu-couplings. A statistically significant deviation from Standard Model predictions would point toward different new physics mechanisms, depending on whether the discrepancy appears in the direct or the indirect measurement of the invisible Z-width. We discuss some scenarios which illustrate the ability of different invisible Z-boson decay measurements to constrain new physics beyond the Standard Model.
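
    A short back-of-envelope illustration (my own numbers, not from the paper) of how the quoted couplings map onto the invisible width: at tree level, Gamma(Z -> \nu \bar{\nu}) = G_F M_Z^3 / (3 \sqrt{2} \pi) * (g_L^2 + g_R^2) per neutrino species, which at the Standard Model point (g_L = 0.5, g_R = 0) gives roughly 167 MeV per species, about 500 MeV for three species; radiative corrections are neglected.

    import math

    G_F = 1.1664e-5      # Fermi constant, GeV^-2
    M_Z = 91.19          # Z mass, GeV
    N_NU = 3             # light neutrino species

    def gamma_nunu(g_L, g_R):
        """Tree-level partial width Gamma(Z -> nu nubar) in GeV for one species."""
        return G_F * M_Z**3 / (3 * math.sqrt(2) * math.pi) * (g_L**2 + g_R**2)

    # Standard Model point: g_L = 1/2, g_R = 0.
    print(f"SM: {1e3 * gamma_nunu(0.5, 0.0):.0f} MeV per species, "
          f"{1e3 * N_NU * gamma_nunu(0.5, 0.0):.0f} MeV invisible width")

    # Edges of the coupling ranges quoted in the abstract (0.45 <~ g_L <~ 0.5, |g_R| <= 0.2).
    for g_L, g_R in [(0.45, 0.0), (0.5, 0.2)]:
        print(f"g_L={g_L}, g_R={g_R}: {1e3 * N_NU * gamma_nunu(g_L, g_R):.0f} MeV invisible width")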
