403 research outputs found

    Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques

    In many recommendation applications such as news recommendation, the items that can be recommended come and go at a very fast pace. This setting is a challenge for recommender systems (RS). Online learning algorithms seem to be the most straightforward solution. The contextual bandit framework was introduced for that very purpose. In general, the evaluation of a RS is a critical issue. Live evaluation is often avoided due to the potential loss of revenue, hence the need for offline evaluation methods. Two options are available. Model-based methods are biased by nature and are thus difficult to trust when used alone. Data-driven methods are therefore what we consider here. Evaluating online learning algorithms with past data is not simple, but some methods exist in the literature. Nonetheless, their accuracy is not satisfactory, mainly due to their mechanism of data rejection, which only allows the exploitation of a small fraction of the data. We precisely address this issue in this paper. After highlighting the limitations of the previous methods, we present a new method based on bootstrapping techniques. This new method comes with two important improvements: it is much more accurate and it provides a measure of quality of its estimation. The latter is a highly desirable property in order to minimize the risks entailed by putting a RS online for the first time. We provide both theoretical and experimental proofs of its superiority compared to state-of-the-art methods, as well as an analysis of the convergence of the measure of quality.
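
The abstract does not name the earlier data-driven methods it criticizes. As an illustration of what "data rejection" can mean, here is a minimal sketch of a replay-style estimator, under the assumption that the log was collected by choosing actions uniformly at random. The names `replay_evaluate` and `make_uniform_log` and the toy reward rule are hypothetical, not taken from the paper.

```python
import random
from typing import Callable, Hashable, Iterable, List, Tuple

# One logged interaction: (context, action chosen by the logging policy, observed reward).
Event = Tuple[Hashable, int, float]


def replay_evaluate(policy: Callable[[Hashable], int],
                    log: Iterable[Event]) -> Tuple[float, int]:
    """Estimate a policy's mean reward from logged data by rejection.

    An event is used only when the evaluated policy picks the same action
    as the one that was logged; every other event is discarded.
    Returns (mean reward over accepted events, number of accepted events).
    """
    total, accepted = 0.0, 0
    for context, logged_action, reward in log:
        if policy(context) == logged_action:
            total += reward
            accepted += 1
    mean = total / accepted if accepted else float("nan")
    return mean, accepted


def make_uniform_log(n_events: int, n_actions: int, seed: int = 0) -> List[Event]:
    """Build a toy log (assumption): uniform random actions, reward 1 when action == context."""
    rng = random.Random(seed)
    log = []
    for _ in range(n_events):
        context = rng.randrange(n_actions)
        action = rng.randrange(n_actions)
        reward = 1.0 if action == context else 0.0
        log.append((context, action, reward))
    return log


log = make_uniform_log(n_events=10_000, n_actions=10)
mean_reward, n_used = replay_evaluate(lambda context: context, log)
print(f"mean reward = {mean_reward}, events used = {n_used} of {len(log)}")
```

With ten actions, only about one logged event in ten is accepted, which is the small usable fraction the abstract refers to.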

    Bias in random forest variable importance measures: Illustrations, sources and a solution

    BACKGROUND: Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types, for example when predictors include both sequence data and continuous variables such as folding energy, or when amino acid sequence data show different numbers of categories. RESULTS: Simulation studies are presented illustrating that, when random forest variable importance measures are used with data of varying types, the results are misleading because suboptimal predictor variables may be artificially preferred in variable selection. The two mechanisms underlying this deficiency are biased variable selection in the individual classification trees used to build the random forest on the one hand, and effects induced by bootstrap sampling with replacement on the other hand. CONCLUSION: We propose to employ an alternative implementation of random forests that provides unbiased variable selection in the individual classification trees. When this method is applied using subsampling without replacement, the resulting variable importance measures can be used reliably for variable selection even in situations where the potential predictor variables vary in their scale of measurement or their number of categories. The usage of both random forest algorithms and their variable importance measures in the R system for statistical computing is illustrated and documented thoroughly in an application re-analyzing data from a study on RNA editing. Therefore, the suggested method can be applied straightforwardly by scientists in bioinformatics research.

    A COMPARISON OF RESAMPLING TECHNIQUES WHEN PARAMETERS ARE ON A BOUNDARY: THE BOOTSTRAP, SUBSAMPLE BOOTSTRAP, AND SUBSAMPLE JACKKNIFE

    This paper compares the finite sample performance of subsample bootstrap and subsample jackknife techniques to the traditional bootstrap method when parameters are constrained to be on some boundary. To assess how these three methods perform in an empirical application, a negative semi-definite translog cost function is estimated using U.S. manufacturing data. Research Methods/Statistical Methods

    Monk business: an example of the dynamics of organizations.

    In this paper we present a dynamic model of an organization. It is shown that the quality of the members of the organization may cycle and that even if the organization promotes excellence, the organization may end up populated by mediocre agents only. Overlapping generations; Quanty organization

    Assessing variability in carbon footprint throughout the food supply chain: a case study of Valencian oranges

    Purpose: This study aims to analyse the variability in the carbon footprint (CF) of organically and conventionally produced Valencian oranges (Spain), including both farming and post-harvest (PH) stages. At the same time, two issues regarding sample representativeness are addressed: how to determine confidence intervals from small samples and how to calculate the aggregated mean CF (and its variability) when the inventory is derived from different sources. Methods: The functional unit was 1 kg of oranges at a European distribution centre. Farming data come from a survey of two samples of organic and conventional farms; PH data come from one PH centre; and data on exportation to the main European markets were obtained from official secondary sources. To assess the variability of the farming subsystem, a bootstrap of the mean CF was performed. The variability of the PH subsystem was assessed through a Monte Carlo simulation and a subsequent subsampling bootstrap. A weighted discrete distribution of the CF of distribution and end-of-life (EoL) was built, which was also bootstrapped. The empirical distribution of the overall CF was obtained by summing all iterations of the three bootstrap procedures of the subsystems. Results and discussion: The CF of the baseline scenarios for conventional and organic production were 0.82 and 0.67 kg CO2 equivalent·kg orange⁻¹, respectively; the difference between their values was due mainly to differences in the farming subsystem. Distribution and EoL was the subsystem contributing the most to the CF (59.3 and 75.7% of the total CF for conventional and organic oranges, respectively), followed by the farming subsystem (34.1 and 19.8% for conventional and organic oranges, respectively). The confidence intervals for the CF of oranges were 0.72–0.92 and 0.61–0.82 kg CO2 equivalent·kg orange⁻¹ for conventional and organic oranges, respectively, and a significant difference was found between them. If organic production were to reach 50% of the total exported production, the CF would be reduced by 5.4–8.4%. Conclusions: The case study and the methods used show that bootstrap techniques can help to test for the existence of significant differences and estimate confidence intervals of the mean CF. Furthermore, these techniques allow several CF sources to be combined so as to estimate the uncertainty in the mean CF estimate. Assessing the variability in the mean CF (or in other environmental impacts) gives a more reliable measure of the mean impact. Financial support was provided by the Spanish Ministerio de Economia y Competitividad through the project Design of a life-cycle indicator for sustainability in agricultural systems (CTM2013-47340-R).
    Ribal, J.; Estruch, V.; Clemente, G.; Loreto Fenollosa, M.; Sanjuan, N. (2019). Assessing variability in carbon footprint throughout the food supply chain: a case study of Valencian oranges. International Journal of Life Cycle Assessment. 24(8):1515-1532. https://doi.org/10.1007/s11367-018-01580-9

    RMCMC: A System for Updating Bayesian Models

    A system to update estimates from a sequence of probability distributions is presented. The aim of the system is to quickly produce estimates with a user-specified bound on the Monte Carlo error. The estimates are based upon weighted samples stored in a database. The stored samples are maintained such that the accuracy of the estimates and quality of the samples is satisfactory. This maintenance involves varying the number of samples in the database and updating their weights. New samples are generated, when required, by a Markov chain Monte Carlo algorithm. The system is demonstrated using a football league model that is used to predict the end of season table. Correctness of the estimates and their accuracy is shown in a simulation using a linear Gaussian model
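
The RMCMC abstract mentions estimates based on weighted samples without giving formulas. As a hedged illustration only, the sketch below shows two standard quantities for weighted samples: the self-normalized weighted mean and the Kish effective sample size. Whether the described system uses exactly these quantities is an assumption, and the function names are hypothetical.

```python
def weighted_mean(values, weights):
    """Self-normalized weighted estimate of the mean."""
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight


def effective_sample_size(weights):
    """Kish effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


samples = [1.0, 2.0, 3.0]
weights = [1.0, 1.0, 2.0]
print(weighted_mean(samples, weights))
print(effective_sample_size(weights))
```

A low effective sample size relative to the number of stored samples is one common signal that the weights have degenerated and fresh samples are needed.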

    Tiny microbes, enormous impacts: what matters in gut microbiome studies?

    Many factors affect the microbiomes of humans, mice, and other mammals, but substantial challenges remain in determining which of these factors are of practical importance. Considering the relative effect sizes of both biological and technical covariates can help improve study design and the quality of biological conclusions. Care must be taken to avoid technical bias that can lead to incorrect biological conclusions. The presentation of quantitative effect sizes in addition to P values will improve our ability to perform meta-analysis and to evaluate potentially relevant biological effects. A better consideration of effect size and statistical power will lead to more robust biological conclusions in microbiome studies

    Evaluation of sampling strategies for age determination of cod (Gadus morhua) sampled at the North Sea International Bottom Trawl Survey

    The North Sea cod stock assessment is based on indices of abundance-at-age from fishery-independent bottom trawl surveys. The age structure of the catch is estimated by sampling fish for otolith collection in a length-stratified manner from trawl hauls. Since age determination of fish is costly and time consuming, only a fraction of fish is sampled for age from a larger sample of the length distribution and an age–length key (ALK) is then used to obtain the age distribution. In this study, we evaluate ALK estimators for calculating the indices of abundance-at-age, with and without the assumption of constant age–length structures over relatively large areas. We show that the ALK estimators give similar point estimates of abundance-at-age and yield similar performance with respect to precision. We also quantify the uncertainty of indices of abundance and examine the effect of reducing the number of fish sampled for age determination on precision. For various subsampling strategies of otolith collection, we show that one fish per 5-cm-length group width per trawl haul is sufficient and the total number of fish subsampled for age from trawl surveys could be reduced by at least half (50%) without appreciable loss in precision.
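
The basic age–length key idea in the cod abstract can be sketched as follows: the aged subsample gives the proportion of each age within a length group, and those proportions are applied to the length distribution of the full catch. This is a minimal sketch of a simple pooled ALK; the estimators compared in the study may differ, and the data and function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_alk(aged_fish):
    """Build an age-length key from (length_group, age) pairs.

    Returns {length_group: {age: proportion of that age within the group}}.
    """
    counts = defaultdict(Counter)
    for length_group, age in aged_fish:
        counts[length_group][age] += 1
    return {
        length_group: {age: n / sum(ages.values()) for age, n in ages.items()}
        for length_group, ages in counts.items()
    }


def numbers_at_age(length_counts, alk):
    """Distribute the catch in each length group over ages using the key.

    Raises KeyError if a length group in the catch has no aged fish.
    """
    result = defaultdict(float)
    for length_group, n_fish in length_counts.items():
        for age, proportion in alk[length_group].items():
            result[age] += n_fish * proportion
    return dict(result)


# Hypothetical aged subsample: (lower bound of 5-cm length group, age in years).
aged = [(20, 1), (20, 1), (20, 2), (25, 2), (25, 2), (25, 3), (25, 3)]
# Hypothetical number of fish measured for length in each group.
catch_by_length = {20: 30, 25: 40}

alk = build_alk(aged)
print(numbers_at_age(catch_by_length, alk))
```

The estimated numbers at age sum to the total number of fish measured, because the proportions within each length group sum to one.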

    Data Analysis and Experimental Design for Accelerated Life Testing with Heterogeneous Group Effects

    In accelerated life tests (ALTs), complete randomization is hardly achievable because of economic and engineering constraints. Typical experimental protocols such as subsampling or random blocks in ALTs result in a grouped structure, which leads to correlated lifetime observations. In this dissertation, a generalized linear mixed model (GLMM) approach is proposed to analyze ALT data and find the optimal ALT design with consideration of heterogeneous group effects. Two types of ALTs are demonstrated for data analysis. First, constant-stress ALT (CSALT) data with a Weibull failure time distribution are modeled by a GLMM. The marginal likelihood of the observations is approximated by the quadrature rule, and the maximum likelihood (ML) estimation method is applied iteratively to estimate the unknown parameters, including the variance component of the random effect. Second, step-stress ALT (SSALT) data with random group effects are analyzed in a similar manner, but with the assumption of an exponentially distributed failure time in each stress step. Two parameter estimation methods, from the frequentist and Bayesian points of view, are applied, and they are compared with other traditional models through a simulation study and a real example of heterogeneous SSALT data. The proposed random effect model shows superiority in terms of reducing bias and variance in the estimation of the life-stress relationship. The GLMM approach is particularly useful for the optimal experimental design of ALTs while taking the random group effects into account. Specifically, the planning of ALTs under a nested design structure with random test chamber effects is studied. A greedy two-phased approach shows that different test chamber assignments to stress conditions substantially affect the estimation of unknown parameters. Then, the D-optimal test plan with two test chambers is constructed by applying the quasi-likelihood approach. Lastly, the optimal ALT planning is expanded to the case of multiple sources of random effects, so that the crossed design structure is also considered along with the nested structure. Dissertation/Thesis; Doctoral Dissertation; Industrial Engineering 201