
    Topics in statistical data analysis for high-energy physics

    These lectures concern two topics that are becoming increasingly important in the analysis of High Energy Physics (HEP) data: Bayesian statistics and multivariate methods. In the Bayesian approach we extend the interpretation of probability to cover not only the frequency of repeatable outcomes but also a degree of belief. In this way we are able to associate a probability with a hypothesis and thus to answer directly questions that cannot be addressed easily with traditional frequentist methods. In multivariate analysis we try to exploit as much information as possible from the characteristics that we measure for each event to distinguish between event types. In particular we will look at a method that has gained popularity in HEP in recent years: the boosted decision tree (BDT).
    Comment: 22 pages, lectures given at the 2009 European School of High-Energy Physics, Bautzen, Germany, 14-27 Jun 2009
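    As a rough illustration of the BDT idea mentioned above, the sketch below trains a boosted decision tree to separate synthetic "signal" from "background" events characterized by three measured variables each. The data, features, and hyperparameters are illustrative assumptions, not taken from the lectures.

```python
# Minimal BDT sketch: classify two event types from a few measured variables.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Toy "signal" and "background" events, three measured variables per event.
n = 5000
signal = rng.normal(loc=[1.0, 0.5, 2.0], scale=1.0, size=(n, 3))
background = rng.normal(loc=[0.0, 0.0, 1.0], scale=1.0, size=(n, 3))
X = np.vstack([signal, background])
y = np.concatenate([np.ones(n), np.zeros(n)])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A BDT: an ensemble of shallow decision trees combined by boosting.
bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
bdt.fit(X_train, y_train)

# The per-event score can be used as a discriminant between event types.
scores = bdt.predict_proba(X_test)[:, 1]
print("ROC AUC:", roc_auc_score(y_test, scores))
```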

    Impact of risk attitude on optimal IOR initiation time: A case study solved in a sequential decision-making framework powered by machine learning-based non-linear regression

    The least-squares Monte Carlo (LSM) algorithm is an efficient approximate dynamic programming algorithm for solving sequential decision-making problems, leveraging regression. Previous studies have showcased the LSM workflow with linear regression in a sequential decision problem for optimizing the improved-oil-recovery (IOR) initiation and termination time, using expected monetary value maximization as the decision criterion under risk neutrality. In this work, risk attitude is introduced into the IOR optimization problem to assess its impact on the decisions. Risk behaviours are modelled using utility functions, and the optimal decision strategy is found by maximizing the expected utility. Since the utility functions introduce non-linearity, machine learning non-linear regression techniques are used in the LSM workflow to approximate the expected utilities. Results suggest that risk-averse decision-makers prefer a longer primary recovery lifetime than risk-neutral and risk-seeking decision-makers; this behaviour is attributed to the net present value (NPV) uncertainty related to the capital expenditure (CAPEX) incurred by switching to secondary recovery. Risk-averse decision-makers also prefer a shorter secondary recovery lifetime, which is attributed to the operational expenditure (OPEX) and the marginal cash inflow late in production. The more risk-seeking the decision-maker, the sooner they prefer to switch to secondary recovery and the longer they would run the secondary recovery. The value of information increases as the decision-maker becomes more risk-seeking, as do the differences between production-lifetime decisions that consider future information and those that ignore it. A change in the problem setting to a more marginal and uncertain case shows that risk-averse decision-makers would not run the project. Risk-neutral decision-makers would only run the project if future information were incorporated, which reinforces the importance of sequential decision-making, where value is created from information. Risk-seeking decision-makers would run the project with or without information. The novelties and contributions of the present work include:
    • Modelling, demonstration, and discussion of the impact of different risk attitudes on decisions.
    • Selection and application of the best machine learning method for non-linear regression in the LSM approach.
    • Demonstration of the value of considering future information in solving sequential decision-making problems.
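    A minimal sketch of the LSM idea described above, combining backward induction, a non-linear regressor for the continuation value, and an exponential (risk-averse) utility. The toy price process, cash-flow model, and all parameter values are assumptions for illustration only, not the reservoir model used in the study.

```python
# LSM-style backward induction for a toy "when to start IOR" decision,
# maximizing expected utility rather than expected monetary value.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n_paths, n_steps, dt, r = 2000, 10, 1.0, 0.08
risk_aversion = 0.5

def utility(npv):
    # Exponential utility: concave, so losses are penalized more heavily
    # than equally large gains are rewarded (risk aversion).
    return 1.0 - np.exp(-risk_aversion * npv)

# Toy uncertain state: oil price following a random walk.
price = np.cumsum(rng.normal(0.0, 0.1, size=(n_paths, n_steps)), axis=1) + 1.0

def switch_value(t, p):
    # Toy NPV (per path) of switching to secondary recovery at step t, price p:
    # up-front CAPEX plus discounted IOR cash inflow over the remaining horizon.
    capex, ior_rate = 3.0, 1.5
    horizon = n_steps - t
    return -capex + ior_rate * p * horizon * np.exp(-r * t * dt)

# Backward induction: switch now vs. wait, with the expected utility of waiting
# approximated by a non-linear regression on the current price.
value = utility(switch_value(n_steps - 1, price[:, -1]))   # forced switch at the end
for t in range(n_steps - 2, -1, -1):
    exercise = utility(switch_value(t, price[:, t]))
    reg = RandomForestRegressor(n_estimators=100, random_state=0)
    reg.fit(price[:, t:t + 1], value)                      # E[U(wait) | price_t]
    continuation = reg.predict(price[:, t:t + 1])
    value = np.where(exercise > continuation, exercise, value)

print("Expected utility of the optimal switching strategy:", value.mean())
```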

    Search for VH and Technicolor Production in the qqbb Final State Using the RunII D0 Detector

    A search for dijet resonance production in a four-jet all-hadronic final state from the D0 detector at Fermilab's Tevatron is presented. The data set, acquired at a ppbar center-of-mass energy of sqrt{s} = 1.96 TeV, contains primarily multijet events and represents approximately 1 fb-1 of data. The cross section limits for associated Higgs production and Technicolor processes are determined through a background subtraction method that uses data to estimate the background. This four-jet channel is potentially very powerful, but is extremely challenging due to the large multijet background from QCD processes. Background rejection is performed using b-tagging, pre-selection cuts, a multivariate boosted decision tree discriminant, and the correlated information contained in the M(bb) and M(jj) dijet invariant masses. The search for VH (WH+ZH) processes yields a 95% confidence level observed upper limit of 20.4 pb on the VH cross section for a Higgs boson mass of 115 GeV/c2. Additionally, 95% confidence level observed upper limits of 16.7 pb and 24.6 pb were set for Higgs boson masses of 125 GeV/c2 and 135 GeV/c2, respectively. The same data set was used to place limits on the Technicolor process ρTC → WπTC, where the technirho mass was fixed to 240 GeV/c2. For a technipion mass of 115 GeV/c2 we find a 95% confidence level observed upper limit on the cross section of 49 pb. For technipion masses of 125 GeV/c2 and 140 GeV/c2, the 95% confidence level observed upper limits are 57 pb and 71 pb, respectively.
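    To illustrate what a 95% confidence level observed upper limit on a cross section means, the sketch below works through a much-simplified counting experiment: a Bayesian 95% C.L. limit on the signal yield, converted to a cross section with an assumed efficiency and luminosity. It is not the statistical procedure used in this analysis, and every number in it is a placeholder.

```python
# Simplified counting-experiment upper limit on a signal cross section.
import numpy as np
from scipy.stats import poisson

n_obs = 52     # events observed in the signal region (assumed)
b_est = 47.0   # background from the data-driven subtraction (assumed)
eff = 0.012    # signal efficiency x acceptance after selection (assumed)
lumi = 1000.0  # integrated luminosity in pb^-1 (~1 fb^-1)

# Bayesian upper limit with a flat prior on the signal yield s:
# posterior(s) ∝ Poisson(n_obs | s + b). Find s_up such that 95% of the
# posterior mass lies below it.
s_grid = np.linspace(0.0, 200.0, 20001)
posterior = poisson.pmf(n_obs, s_grid + b_est)
cdf = np.cumsum(posterior)
cdf /= cdf[-1]
s_up = s_grid[np.searchsorted(cdf, 0.95)]

# Translate the limit on the signal yield into a cross-section limit.
xsec_up = s_up / (eff * lumi)
print(f"95% C.L. upper limit: {s_up:.1f} signal events, {xsec_up:.1f} pb")
```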

    Sources of pesticide losses to surface waters and groundwater at field and landscape scales

    Pesticide residues in groundwater and surface waters may harm aquatic ecosystems and result in a deterioration of drinking water quality. EU legislation and policy emphasize risk management and risk reduction for pesticides to ensure long-term, sustainable use of water across Europe. Different tools applicable at scales ranging from the farm to national and EU scales are required to meet the needs of the various managers engaged with the task of protecting water resources. The use of computer-based pesticide fate and transport models at such large scales is challenging, since models are scale-specific and generally developed for the soil pedon or plot scale. Modelling at larger scales is further complicated by the spatial and temporal variability of agro-environmental conditions and the uncertainty in predictions. The objective of this thesis was to identify the soil processes that dominate diffuse pesticide losses at field and landscape scales and to develop methods that can help identify 'high-risk' areas for leaching. The underlying idea was that pesticide pollution of groundwater and surface waters can be mitigated if pesticide application on such areas is reduced. Macropore flow increases the risk of pesticide leaching and was identified as the most important process responsible for the spatial variation of diffuse pesticide losses from a 30 ha field and a 9 km² catchment in the south of Sweden. Point sources caused by careless handling of pesticides when filling or cleaning spraying equipment were also a significant source of contamination at the landscape scale. The research presented in this thesis suggests that the strength of macropore flow due to earthworm burrows and soil aggregation can be predicted from widely available soil survey information such as texture and management practices. Thus, a simple classification of soils according to their susceptibility to macropore flow may facilitate the use of process-based models at the landscape scale. Predictions of a meta-model of the MACRO model suggested that, at the field scale, fine-textured soils are high-risk areas for pesticide leaching. Uncertainty in pesticide degradation and sorption did not significantly affect predictions of the spatial extent of these high-risk areas. Thus, site-specific pesticide application seems to be a promising method for mitigating groundwater contamination at this scale.
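    The thesis's actual soil classification is not reproduced here, but the sketch below illustrates the general idea of assigning a macropore-flow susceptibility class from widely available soil survey attributes. The attributes, thresholds, and class names are hypothetical assumptions chosen only to show the shape of such a rule-based classification.

```python
# Hypothetical rule-based classification of soils by susceptibility to
# macropore (preferential) flow, based on common soil survey attributes.
from dataclasses import dataclass

@dataclass
class SoilProfile:
    clay_fraction: float   # 0-1, from soil survey texture data
    organic_carbon: float  # topsoil organic carbon, %
    tilled: bool           # conventional tillage vs. no-till / grass ley

def macropore_flow_class(soil: SoilProfile) -> str:
    """Return a coarse susceptibility class for macropore flow (illustrative)."""
    # Fine-textured (clayey) soils crack and aggregate strongly -> high susceptibility.
    if soil.clay_fraction > 0.35:
        return "high"
    # Untilled loams can retain continuous earthworm burrows -> moderate.
    if not soil.tilled and soil.clay_fraction > 0.15:
        return "moderate"
    # Coarse sandy soils rarely develop continuous macropores -> low.
    return "low"

print(macropore_flow_class(SoilProfile(clay_fraction=0.42, organic_carbon=2.1, tilled=True)))
```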

    Variable selection and sensitivity analysis using dynamic trees, with an application to computer code performance tuning

    We investigate an application in the automatic tuning of computer codes, an area of research that has come to prominence alongside the recent rise of distributed scientific processing and heterogeneity in high-performance computing environments. Here, the response function is nonlinear and noisy and may not be smooth or stationary. Clearly needed are variable selection, decomposition of influence, and analysis of main and secondary effects for both real-valued and binary inputs and outputs. Our contribution is a novel set of tools for variable selection and sensitivity analysis based on the recently proposed dynamic tree model. We argue that this approach is uniquely well suited to the demands of our motivating example. In illustrations on benchmark data sets, we show that the new techniques are faster and offer richer feature sets than do similar approaches in the static tree and computer experiment literature. We apply the methods in code-tuning optimization, examination of a cold-cache effect, and detection of transformation errors.
    Comment: Published at http://dx.doi.org/10.1214/12-AOAS590 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
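    The dynamic tree model itself has no standard Python implementation (a dynamic tree package, dynaTree, exists for R), so the sketch below illustrates the same goal with a stand-in: ranking input relevance for a noisy, non-smooth response using a static tree ensemble and permutation importance. The synthetic response and all settings are assumptions, not the paper's benchmarks.

```python
# Tree-based variable selection via permutation importance (stand-in for dynamic trees).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(2)
n = 2000
X = rng.uniform(-1.0, 1.0, size=(n, 5))
# Only x0 and x2 matter; x2 enters through a non-smooth (threshold) effect.
y = np.sin(3 * X[:, 0]) + 2.0 * (X[:, 2] > 0.3) + rng.normal(0.0, 0.2, size=n)

model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=20, random_state=0)

# Inputs whose permutation degrades the fit most are ranked as most relevant.
for i, (mean, std) in enumerate(zip(imp.importances_mean, imp.importances_std)):
    print(f"x{i}: importance = {mean:.3f} +/- {std:.3f}")
```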