278 research outputs found

    Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

    As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design. Comment: 8 pages, 5 figures, preprint to appear in GECCO 2016, edits not yet made from reviewer comment
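
The Pareto optimization mentioned in the abstract above trades accuracy against pipeline complexity. The following is a minimal sketch of non-dominated selection over those two objectives; it is not TPOT's code, and the function name `pareto_front` and the example pipelines are hypothetical.

```python
def pareto_front(candidates):
    """Keep candidates that no other candidate dominates.

    Each candidate is (label, accuracy, size). One candidate dominates
    another if it is at least as accurate and at least as small, and
    strictly better on one of the two.
    """
    front = []
    for name, acc, size in candidates:
        dominated = any(
            (a >= acc and s <= size) and (a > acc or s < size)
            for _, a, s in candidates
        )
        if not dominated:
            front.append((name, acc, size))
    return front


# Hypothetical pipelines: (label, accuracy, number of operators)
pipelines = [("A", 0.90, 8), ("B", 0.90, 3), ("C", 0.85, 2), ("D", 0.80, 5)]
front = pareto_front(pipelines)
```

Here pipeline A is dropped because B reaches the same accuracy with fewer operators, which illustrates how compact pipelines can be preferred without giving up accuracy.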

    Effect of V and N on the microstructure evolution during continuous casting of steel

    Low Carbon (LC) steel is not expected to be sensitive to hot tearing and/or cracking, while microalloyed steels are known for their high cracking sensitivity during continuous casting. Experience with the Direct Sheet Plant caster at Tata Steel in Ijmuiden (the Netherlands) seems to contradict this statement. It is observed that an LC steel grade has a high risk of cracking, i.e., hot tearing, while a High Strength Low Alloyed (HSLA) steel has a very low cracking occurrence. Another HSLA steel grade, with a similar composition but less N and V, is however very sensitive to hot tearing. An extreme crack results in a breakout. A previous statistical analysis of the breakout occurrence reveals a one and a half times higher possibility of a breakout for the HSLA grade compared to the LC grade. HSLA with extra N and V shows a four times smaller possibility of breakout than LC. This study assigns the unexpected effect of the chemical composition on the hot tearing sensitivity to the role of some alloying elements such as V and N as structure refiners. This research was carried out under project number M41.5.08320 within the framework of the Research Program of the Materials innovation institute M2i (www.m2i.nl)

    Applying configurational theory to build a typology of ethnocentric consumers

    Purpose – Individuals showing high consumer ethnocentrism (CE) prefer domestic over foreign-made products and their preferences may contribute to barriers to international market entry. Therefore, how to identify such consumers is an important question. Shankarmahesh’s (2006) review reveals inconsistencies in the literature with regard to CE and its antecedents. To shed theoretical and empirical light on these inconsistencies, the purpose of this paper is to contribute two new perspectives on CE: first, a typology that classifies ethnocentric consumers by the extent to which they support government-controlled protectionism and consumer-controlled protectionism; and second, a configurational (recipe) perspective on the antecedents. Design/methodology/approach – The study applies fuzzy-set qualitative comparative analysis of survey data from 3,859 consumers. The study contrasts the findings with findings using traditional statistical hypotheses testing via multiple regression analysis. Findings – The results reveal several configurations of antecedents that are sufficient for consistently explaining three distinct types of CE. No single antecedent condition is necessary for high CE to occur. Practical implications – The findings help global business strategists in their market entry decisions and in their targeting and segmentation efforts. Originality/value – The authors show the value of asymmetrical thinking about the relationship between CE and its antecedents. The results expand understanding of CE and challenge conventional net-effects thinking about its antecedents

    Validity and reliability of the Patient-Reported Arthralgia Inventory; validation of a newly-developed survey instrument to measure arthralgia

    BACKGROUND: There is a need for a survey instrument to measure arthralgia (joint pain) that has been psychometrically validated in the context of existing reference instruments. We developed the 16-item Patient-Reported Arthralgia Inventory (PRAI) to measure arthralgia severity in 16 joints, in the context of a longitudinal cohort study to assess aromatase inhibitor-associated arthralgia in breast cancer survivors and arthralgia in postmenopausal women without breast cancer. We sought to evaluate the reliability and validity of the PRAI instrument in these populations, as well as to examine the relationship of patient-reported morning stiffness and arthralgia. METHODS: We administered the PRAI on paper to 294 women (94 initiating aromatase inhibitor therapy and 200 postmenopausal women without breast cancer) at weeks 0, 2, 4, 6, 8, 12, 16, and 52, as well as once to 36 women who had taken but were no longer taking aromatase inhibitor therapy. RESULTS: Cronbach’s alpha was 0.9 for internal consistency of the PRAI. Intraclass correlation coefficients of test-retest reliability were in the range of 0.87–0.96 over repeated PRAI administrations; arthralgia severity was higher in the non-cancer group at baseline than at subsequent assessments. Women with joint comorbidities tended to have higher PRAI scores than those without (estimated difference in mean scores: −0.3, 95% confidence interval [CI] −0.5, −0.2; P<0.001). The PRAI was highly correlated with the Functional Assessment of Cancer Therapy-Endocrine Subscale item “I have pain in my joints” (reference instrument; Spearman r range: 0.76–0.82). Greater arthralgia severity on the PRAI was also related to decreased physical function (r=−0.47, 95% CI −0.55, −0.37; P<0.001), higher pain interference (r=0.65, 95% CI 0.57–0.72; P<0.001), less active performance status (estimated difference in location: −0.6, 95% CI −0.9, −0.4; P<0.001), and increased morning stiffness duration (r=0.62, 95% CI 0.54–0.69; P<0.0001). CONCLUSION: We conclude that the psychometric properties of the PRAI are satisfactory for measuring arthralgia severity
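
The internal-consistency figure reported above is Cronbach's alpha, which for k items is k/(k−1) × (1 − sum of item variances / variance of the total score). A minimal sketch of that textbook formula follows; the function name `cronbach_alpha` and the toy responses are hypothetical and are not the study's data.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Variance of each item (column) across respondents.
    item_var_sum = sum(pvariance(col) for col in zip(*scores))
    # Variance of each respondent's total score.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical toy data: 3 respondents answering 3 perfectly consistent items.
toy = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy)
```

Because the same variance estimator is used in the numerator and the denominator, choosing population or sample variance does not change the result.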

    BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees

    The rising volume of datasets has made training machine learning (ML) models a major computational cost in the enterprise. Given the iterative nature of model and parameter tuning, many analysts use a small sample of their entire data during their initial stage of analysis to make quick decisions (e.g., what features or hyperparameters to use) and use the entire dataset only in later stages (i.e., when they have converged to a specific model). This sampling, however, is performed in an ad-hoc fashion. Most practitioners cannot precisely capture the effect of sampling on the quality of their model, and eventually on their decision-making process during the tuning phase. Moreover, without systematic support for sampling operators, many optimizations and reuse opportunities are lost. In this paper, we introduce BlinkML, a system for fast, quality-guaranteed ML training. BlinkML allows users to make error-computation tradeoffs: instead of training a model on their full data (i.e., full model), BlinkML can quickly train an approximate model with quality guarantees using a sample. The quality guarantees ensure that, with high probability, the approximate model makes the same predictions as the full model. BlinkML currently supports any ML model that relies on maximum likelihood estimation (MLE), which includes Generalized Linear Models (e.g., linear regression, logistic regression, max entropy classifier, Poisson regression) as well as PPCA (Probabilistic Principal Component Analysis). Our experiments show that BlinkML can speed up the training of large-scale ML tasks by 6.26x-629x while guaranteeing the same predictions, with 95% probability, as the full model. Comment: 22 pages, SIGMOD 201

    Surface Oscillations in Overdense Plasmas Irradiated by Ultrashort Laser Pulses

    The generation of electron surface oscillations in overdense plasmas irradiated at normal incidence by an intense laser pulse is investigated. Two-dimensional (2D) particle-in-cell simulations show a transition from a planar, electrostatic oscillation at $2\omega$, with $\omega$ the laser frequency, to a 2D electromagnetic oscillation at frequency $\omega$ and wavevector $k>\omega/c$. A new electron parametric instability, involving the decay of a 1D electrostatic oscillation into two surface waves, is introduced to explain the basic features of the 2D oscillations. This effect leads to the rippling of the plasma surface within a few laser cycles, and is likely to have a strong impact on laser interaction with solid targets. Comment: 9 pages (LaTeX, Revtex4), 4 GIF color figures, accepted for publication in Phys. Rev. Let

    MRI plaque imaging reveals high-risk carotid plaques especially in diabetic patients irrespective of the degree of stenosis

    BACKGROUND: Plaque imaging based on magnetic resonance imaging (MRI) represents a new modality for risk assessment in atherosclerosis. It allows classification of carotid plaques into high-risk and low-risk lesion types (I-VIII). Type 2 diabetes mellitus (DM 2) represents a known risk factor for atherosclerosis, but its specific influence on plaque vulnerability is not fully understood. This study investigates whether MRI-plaque imaging can reveal differences in carotid plaque features of diabetic patients compared to nondiabetics. METHODS: 191 patients with moderate to high-grade carotid artery stenosis were enrolled after written informed consent was obtained. Each patient underwent MRI-plaque imaging using a 1.5-T scanner with phased-array carotid coils. The carotid plaques were classified as lesion types I-VIII according to the MRI-modified AHA criteria. For 36 patients histology data were available. RESULTS: Eleven patients were excluded because of insufficient MR-image quality. DM 2 was diagnosed in 51 patients (28.3%). Concordance between histology and MRI-classification was 91.7% (33/36) and showed a Cohen's kappa value of 0.81 with a 95% CI of 0.98-1.15. MRI-defined high-risk lesion types were overrepresented in diabetic patients (n = 29; 56.8%). Multiple logistic regression analysis revealed association between DM 2 and MRI-defined high-risk lesion types (OR 2.59; 95% CI [1.15-5.81]), independent of the degree of stenosis. CONCLUSION: DM 2 seems to represent a predictor for the development of vulnerable carotid plaques irrespective of the degree of stenosis and other risk factors. MRI-plaque imaging represents a new tool for risk stratification of diabetic patients. See Commentary: http://www.biomedcentral.com/1741-7015/8/78/abstract
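
The abstract above reports raw concordance between histology and MRI classification alongside Cohen's kappa, which corrects observed agreement for the agreement expected by chance. A minimal sketch of the standard kappa calculation from a square agreement table follows; the function name `cohens_kappa` and the 2x2 table are hypothetical, since the study's full cross-tabulation is not given here.

```python
def cohens_kappa(table):
    """Cohen's kappa from a square table: rows are rater 1, columns are rater 2."""
    n = sum(sum(row) for row in table)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / n
    # Chance agreement from the row and column marginals.
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical agreement table (not the study's data).
table = [[20, 5], [10, 15]]
kappa = cohens_kappa(table)

# Raw concordance as reported in the abstract: 33 of 36 cases agree.
concordance_pct = round(33 / 36 * 100, 1)
```

In the hypothetical table the raters agree on 70% of cases, but half of that agreement would be expected by chance, so kappa is noticeably lower than the raw agreement.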