34 research outputs found

    Cohort aggregation modelling for complex forest stands: Spruce-aspen mixtures in British Columbia

    Mixed-species growth models are needed as a synthesis of ecological knowledge and for guiding forest management. Individual-tree models have been commonly used, but the difficulties of reliably scaling from the individual to the stand level are often underestimated. Emergent properties and statistical issues limit their effectiveness. A more holistic modelling of aggregates at the whole-stand level is a potentially attractive alternative. This work explores methodology for developing biologically consistent dynamic mixture models where the state is described by aggregate stand-level variables for species or age/size cohorts. The methods are demonstrated and tested with a two-cohort model for spruce-aspen mixtures named SAM. The models combine single-species submodels with submodels for resource partitioning among the cohorts. The partitioning allows for differences in competitive strength among species and size classes, and for complementarity effects. Height growth reduction in suppressed cohorts is also modelled. SAM fits the available data well and exhibits behaviors consistent with current ecological knowledge. The general framework can be applied to any number of cohorts and should be useful as a basis for modelling other mixed-species or uneven-aged stands. Comment: Accepted manuscript, to appear in Ecological Modelling.
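The resource-partitioning idea can be illustrated with a deliberately simplified two-cohort sketch; the partitioning rule, competitive weights and growth rates below are illustrative assumptions, not the actual SAM equations:

```python
import numpy as np

# Hypothetical two-cohort growth sketch (NOT the SAM model itself):
# each cohort's growth is scaled by its share of a contested resource,
# with the share set by a simple size-asymmetric competition rule.

def partition(biomass, weights):
    """Split one unit of resource among cohorts in proportion to
    competitive weight * biomass."""
    claims = weights * biomass
    return claims / claims.sum()

def step(biomass, weights, growth_rates, dt=1.0):
    """Advance cohort biomasses one time step under shared resources."""
    shares = partition(biomass, weights)
    return biomass + growth_rates * shares * biomass * dt

b = np.array([10.0, 40.0])   # spruce, aspen biomass (arbitrary units)
w = np.array([1.0, 1.5])     # aspen assumed more competitive early on
r = np.array([0.08, 0.05])   # intrinsic growth rates (assumed)
for _ in range(5):
    b = step(b, w, r)
```

In this toy rule the more competitive cohort captures a larger resource share, so suppression emerges from the partitioning alone; the full framework additionally models height-growth reduction and complementarity.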

    Developing Efficient Strategies For Global Sensitivity Analysis Of Complex Environmental Systems Models

    Complex Environmental Systems Models (CESMs) have been developed and applied as vital tools to tackle the ecological, water, food, and energy crises that humanity faces, and have been used widely to support decision-making about management of the quality and quantity of Earth’s resources. CESMs are often controlled by many interacting and uncertain parameters, and typically integrate data from multiple sources at different spatio-temporal scales, which makes them highly complex. Global Sensitivity Analysis (GSA) techniques have proven promising for deepening our understanding of model complexity and interactions between various parameters, and for providing helpful recommendations for further model development and data acquisition. Aside from the complexity issue, the computationally expensive nature of CESMs precludes effective application of existing GSA techniques in quantifying the global influence of each parameter on the variability of the CESMs’ outputs, because a comprehensive sensitivity analysis often requires performing a very large number of model runs. Therefore, there is a need to break down this barrier through the development of more efficient strategies for sensitivity analysis. The research undertaken in this dissertation focuses on alleviating the computational burden associated with GSA of computationally expensive CESMs by developing efficiency-increasing strategies for robust sensitivity analysis. This is accomplished by: (1) proposing an efficient sequential sampling strategy for robust sampling-based analysis of CESMs; (2) developing an automated parameter grouping strategy for high-dimensional CESMs; (3) introducing a new robustness measure for convergence assessment of GSA methods; and (4) investigating time-saving strategies for handling simulation failures/crashes during the sensitivity analysis of computationally expensive CESMs.
This dissertation provides a set of innovative numerical techniques that can be used in conjunction with any GSA algorithm and be integrated into model building and systems analysis procedures in any field where models are used. A range of analytical test functions and environmental models of varying complexity and dimensionality are utilized across this research to test the performance of the proposed methods. These methods, which are embedded in the VARS–TOOL software package, can also provide information useful for diagnostic testing, parameter identifiability analysis, model simplification, model calibration, and experimental design. They can further be applied to address a range of decision-making problems, such as characterizing the main causes of risk in the context of probabilistic risk assessment and exploring the CESMs’ sensitivity to a wide range of plausible future changes (e.g., hydrometeorological conditions) in the context of scenario analysis.
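To illustrate what a sampling-based GSA computes, and why it needs so many model runs, here is a minimal Monte Carlo estimate of first-order Sobol indices on the Ishigami function, a standard analytical test case of the kind used to benchmark such methods. This is a generic pick-and-freeze estimator, not one of the VARS–TOOL algorithms:

```python
import numpy as np

# First-order Sobol indices via a pick-and-freeze Monte Carlo scheme.
# The Ishigami function is a classic GSA benchmark with known answers:
# analytically S1 ~ 0.31, S2 ~ 0.44, S3 = 0.

def ishigami(x, a=7.0, b=0.1):
    return (np.sin(x[:, 0]) + a * np.sin(x[:, 1]) ** 2
            + b * x[:, 2] ** 4 * np.sin(x[:, 0]))

rng = np.random.default_rng(0)
n, d = 20000, 3
A = rng.uniform(-np.pi, np.pi, (n, d))   # two independent sample blocks
B = rng.uniform(-np.pi, np.pi, (n, d))
fA, fB = ishigami(A), ishigami(B)
var = fA.var()

S = []
for i in range(d):
    ABi = A.copy()
    ABi[:, i] = B[:, i]                  # resample only factor x_i
    S.append(np.mean(fB * (ishigami(ABi) - fA)) / var)
```

Note the cost: estimating d indices takes n*(d+2) model evaluations, which is exactly the barrier the dissertation's efficiency-increasing strategies aim to lower for expensive CESMs.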

    Bayesian semiparametric and flexible models for analyzing biomedical data

    In this thesis I develop novel Bayesian inference approaches for some typical data analysis problems as they arise with biomedical data. The common theme is the use of flexible and semi-parametric Bayesian models and computation-intensive simulation-based implementations. In chapter 2, I propose a new approach for inference with multivariate ordinal data. The application concerns the assessment of toxicities in a phase III clinical trial. The method generalizes the ordinal probit model and is based on flexible mixture models. In chapter 3, I develop a semi-parametric Bayesian approach for bio-panning phage display experiments. The model is a mixed effects model for repeated count measurements of peptides. I develop a non-parametric Bayesian random effects distribution and show how it can be used for the desired inference about organ-specific binding. In chapter 4, I introduce a variation of the product partition model with a non-exchangeable prior structure. The model is applied to estimate the success rates in a phase II clinical trial of patients with sarcoma. Each patient presents one subtype of the disease, and subtypes are grouped by good, intermediate and poor prognosis. The prior model respects the varying prognosis across disease subtypes: two subtypes with equal prognoses are more likely a priori to have similar success rates than two subtypes with different prognoses.
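For context, the ordinal probit model that chapter 2 generalizes assigns category probabilities by slicing a latent normal score at ordered cutpoints; a minimal sketch of the plain probit baseline (without the thesis's flexible mixture extension) looks like this:

```python
import numpy as np
from scipy.stats import norm

# Ordinal probit cell probabilities: a latent score z ~ N(mu, 1) is
# discretized by cutpoints c_0 < c_1 < ... so that
#   P(y = k) = Phi(c_k - mu) - Phi(c_{k-1} - mu),
# with c_{-1} = -inf and c_K = +inf. The mu and cutpoints below are
# illustrative values only.

def ordinal_probit_probs(mu, cutpoints):
    c = np.concatenate(([-np.inf], np.asarray(cutpoints), [np.inf]))
    return norm.cdf(c[1:] - mu) - norm.cdf(c[:-1] - mu)

p = ordinal_probit_probs(mu=0.5, cutpoints=[-1.0, 0.0, 1.0])
# p holds the probabilities of the four ordinal toxicity grades
```

Replacing the single normal latent distribution with a mixture is what buys the flexibility the thesis pursues, while this baseline shows the discretization mechanism itself.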

    Attitudes towards old age and age of retirement across the world: findings from the future of retirement survey

    The 21st century has been described as the first era in human history when the world will no longer be young, bringing drastic changes in many aspects of our lives, including socio-demographics, finances, and attitudes towards old age and retirement. This talk briefly introduces the Global Ageing Survey (GLAS) 2004 and 2005, popularly known as “The Future of Retirement”. These surveys provide a unique data source collected in 21 countries and territories that allows researchers to better understand individual as well as societal changes as we age with regard to savings, retirement and healthcare. In 2004, approximately 10,000 people aged 18+ were surveyed in nine countries and one territory (Brazil, Canada, China, France, Hong Kong, India, Japan, Mexico, the UK and the USA). In 2005, the number was increased to twenty-one by adding Egypt, Germany, Indonesia, Malaysia, Poland, Russia, Saudi Arabia, Singapore, Sweden, Turkey and South Korea. Moreover, an additional 6,320 private-sector employers were surveyed in 2005, some 300 in each country, with a view to elucidating the attitudes of employers to issues relating to older workers. The paper aims to examine attitudes towards old age and retirement across the world and will indicate some policy implications.

    Multi-objective and multi-variate global sensitivity analysis of the soil-crop model XN-CERES in Southwest Germany

    Soil-crop models enjoy ever-greater popularity as tools to assess the impact of environmental changes or management strategies on agricultural production. Soil-crop models are designed to coherently simulate the crop, nitrogen (N) and water dynamics of agricultural fields. However, soil-crop models depend on a vast number of uncertain model inputs, i.e., initial conditions and parameters. To assess the uncertainty in the simulation results (UCSR) and how it can be apportioned among the model inputs of the XN-CERES soil-crop model, an uncertainty and global sensitivity analysis (GSA) was conducted. We applied two different GSA methods, moment-independent and variance-based, in the sense of the Factor Prioritization and Factor Fixing settings. The former identifies the key drivers of uncertainty, i.e., which model input, if fixed to its true value, would lead to the greatest reduction of the UCSR. The latter identifies the model inputs that cannot be fixed at any value within their range without affecting the UCSR. In total, we calculated six sensitivity indices (SIs). The overall objective was to assess the cross-sub-model impact of parameters and the overall determinability of the XN-CERES applied to a deep loess soil profile in Southwest Germany. Therefore, we selected 39 parameters and 16 target variables (TGVs) to be included in the GSA. Furthermore, we assessed a weekly time series of the parameter sensitivities. The sub-models were crop, water, nitrogen and flux. In addition, we compared moment-independent (MI) and variance-based (VB) GSA methods for their suitability for the two settings. The results show that the parameters and TGVs of the four groups cannot be considered independently: each group is impacted by the parameters of the other groups. Crop parameters are most important, followed by the Mualem-van Genuchten (MvG) parameters.
The nitrate (NO3-) content and the matric potential are the two TGVs most affected by the interaction of parameters, especially crop and MvG parameters. However, the model output of these two TGVs is highly skewed and leptokurtic. Therefore, the variance is an unsuitable representation of the UCSR, and the reliability of the variance-based sensitivity indices (SIVB) is curtailed. Nitrogen-group parameters play an overall minor role for the uncertainty of the whole XN-CERES, but nitrification rates can be calibrated on ammonium (NH4+) measurements. Considering the initial conditions shows the high importance of the initial NO3- content: if it could be fixed, the uncertainty of the crop group's TGVs, the matric potential and the N content in the soil could be reduced. Hence, multi-year predictions of yield suffer from uncertainty due to the simulated NO3- content. Temporally resolved parameter sensitivities make visible the strong dependence between the crop's development stage and the other 15 TGVs. Highly resolved measurements of the development stage over time are important to unambiguously estimate the crop parameters and reduce the uncertainty in the vegetative and generative biomass. Furthermore, potential periods of water- and N-limiting situations are assessed, which is helpful for deriving management strategies. In addition, it becomes clear that measurement campaigns should be conducted at the simulation start and during the vegetation period to have enough information to calibrate the XN-CERES. Regarding the performance of the different GSA methods and SIs, we conclude that the sensitivity measure relying on the Kolmogorov-Smirnov metric (betaks) is the most stable: it converges quickly and has no issues with highly skewed and leptokurtic model output distributions. The assessments of the first-effect index and the betaks provide information on the additivity of the model and on parameters that cannot be fixed without impacting the simulation results.
In summary, we could identify only three parameters that have no direct impact on any TGV at any time and are hence not determinable from any measurements of the TGVs considered. Furthermore, we can conclude that the groups' parameters should not be calibrated independently, because they always affect the uncertainty of the selected TGV directly or via interactions. However, no single TGV is suitable for calibrating all parameters. Hence, the calibration of the XN-CERES requires measurements of TGVs from each group, even if the modeler is only interested in one specific TGV, e.g., yield. The GSA should be repeated in a drier climate or with restricted rooting depth. The convergence of the values for the Sobol indices remains an issue: even larger sample sizes, other convergence criteria or graphical inspection could not alleviate it. However, we can conclude that the sub-models of the XN-CERES cannot be considered independently and that the model does what it is designed for: coherently simulating the crop, N and water dynamics with their interactions.
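The Kolmogorov-Smirnov idea behind a measure like betaks can be sketched generically: a parameter is influential if conditioning on a slice of its range shifts the output distribution away from the unconditional one. The toy linear "model" below is an assumption standing in for a soil-crop simulator, not XN-CERES, and the slicing scheme is a simplified illustration rather than the thesis's exact estimator:

```python
import numpy as np
from scipy.stats import ks_2samp

# Toy stand-in for an expensive simulator: y depends strongly on x1,
# barely on x2. Both inputs are sampled uniformly on [0, 1].
rng = np.random.default_rng(1)
n = 5000
x1 = rng.uniform(0, 1, n)       # influential parameter
x2 = rng.uniform(0, 1, n)       # near-inert parameter
y = 3.0 * x1 + 0.01 * x2 + rng.normal(0, 0.1, n)

def ks_sensitivity(x, y, n_slices=10):
    """Mean KS distance between the unconditional output sample and
    the output conditioned on x falling in each decile slice."""
    edges = np.quantile(x, np.linspace(0, 1, n_slices + 1))
    stats = [ks_2samp(y, y[(x >= lo) & (x <= hi)]).statistic
             for lo, hi in zip(edges[:-1], edges[1:])]
    return float(np.mean(stats))
```

Because the KS statistic compares whole distributions rather than variances, a measure of this type stays meaningful for the skewed, leptokurtic outputs (NO3- content, matric potential) where variance-based indices break down.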

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units located in Portugal is established using Stochastic Frontier Analysis. This methodology allows discriminating between measurement error and systematic inefficiencies in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions concerning efficiency improvement are offered for each hotel studied.
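The core of Stochastic Frontier Analysis is a composed error: symmetric noise plus a one-sided inefficiency term, which is what lets the method separate measurement error from systematic inefficiency. Below is a minimal normal/half-normal production-frontier sketch on simulated data; the specification and parameter values are illustrative assumptions, not the paper's hotel model:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

# Normal/half-normal stochastic production frontier (Aigner-Lovell-
# Schmidt form): y = b0 + b1*x + v - u, with noise v ~ N(0, sigma_v^2)
# and inefficiency u ~ |N(0, sigma_u^2)|. Toy data only.
rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 5, n)
u = np.abs(rng.normal(0, 0.3, n))          # one-sided inefficiency
v = rng.normal(0, 0.1, n)                  # symmetric noise
y = 1.0 + 0.8 * x + v - u

def negloglik(theta):
    b0, b1, ls_u, ls_v = theta
    su, sv = np.exp(ls_u), np.exp(ls_v)    # log-parameterized stdevs
    s = np.hypot(su, sv)                   # sigma = sqrt(su^2 + sv^2)
    lam = su / sv
    eps = y - b0 - b1 * x
    # density of the composed error: (2/s) * phi(eps/s) * Phi(-eps*lam/s)
    return -np.sum(np.log(2.0 / s) + norm.logpdf(eps / s)
                   + norm.logcdf(-eps * lam / s))

res = minimize(negloglik, x0=[0.0, 1.0, -1.0, -1.0],
               method="Nelder-Mead", options={"maxiter": 4000})
b0_hat, b1_hat = res.x[:2]
```

Unlike ordinary least squares, the skewness of the composed error lets the likelihood attribute part of each residual to inefficiency, which is the basis for ranking units by estimated efficiency.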

    Exploring the value of big data analysis of Twitter tweets and share prices

    Over the past decade, the use of social media (SM) such as Facebook, Twitter, Pinterest and Tumblr has dramatically increased. Using SM, millions of users are creating large amounts of data every day. According to some estimates, ninety per cent of the content on the Internet is now user-generated. SM can be seen as a distributed content creation and sharing platform based on Web 2.0 technologies. SM sites make it very easy for their users to publish text, pictures, links, messages or videos without needing to be able to program. Users post reviews on products and services they bought, write about their interests and intentions, or give their opinions and views on political subjects. SM has also been a key factor in mass movements such as the Arab Spring and the Occupy Wall Street protests, and is used for humanitarian aid and disaster relief (HADR). There is a growing interest in SM analysis from organisations for detecting new trends, getting user opinions on their products and services, or finding out about their online reputation. Companies such as Amazon or eBay use SM data for their recommendation engines and to generate more business. TV stations buy data about opinions on their TV programs from Facebook to find out how popular a certain TV show is. Companies such as Topsy, Gnip, DataSift and Zoomph have built their entire business models around SM analysis. The purpose of this thesis is to explore the economic value of Twitter tweets. The economic value is determined by trying to predict the share price of a company. If the share price of a company can be predicted using SM data, it should be possible to deduce a monetary value. There is limited research on determining the economic value of SM data for “nowcasting”, predicting the present, and for forecasting.
This study aims to determine the monetary value of Twitter by correlating the daily frequencies of positive and negative Tweets about the Apple company and some of its most popular products with the development of the Apple Inc. share price. If the number of positive tweets about Apple increases and the share price follows this development, the tweets have predictive information about the share price. A literature review has found that there is a growing interest in analysing SM data from different industries. A lot of research is conducted studying SM from various perspectives. Many studies try to determine the impact of online marketing campaigns or try to quantify the value of social capital. Others, in the area of behavioural economics, focus on the influence of SM on decision-making. There are studies trying to predict financial indicators such as the Dow Jones Industrial Average (DJIA). However, the literature review has indicated that there is no study correlating sentiment polarity on products and companies in tweets with the share price of the company. The theoretical framework used in this study is based on Computational Social Science (CSS) and Big Data. Supporting theories of CSS are Social Media Mining (SMM) and sentiment analysis. Supporting theories of Big Data are Data Mining (DM) and Predictive Analysis (PA). Machine learning (ML) techniques have been adopted to analyse and classify the tweets. In the first stage of the study, a body of tweets was collected and pre-processed, and then analysed for their sentiment polarity towards Apple Inc., the iPad and the iPhone. Several datasets were created using different pre-processing and analysis methods. The tweet frequencies were then represented as time series. The time series were analysed against the share price time series using the Granger causality test to determine if one time series has predictive information about the share price time series over the same period of time. 
For this study, several Predictive Analytics (PA) techniques on tweets were evaluated to predict the Apple share price. To collect and analyse the data, a framework was developed based on the LingPipe (LingPipe 2015) Natural Language Processing (NLP) tool kit for sentiment analysis, using R, the functional language and environment for statistical computing, for correlation analysis. Twitter provides an API (Application Programming Interface) to access and collect its data programmatically. Whereas no clear correlation could be determined, at least one dataset was shown to have some predictive information on the development of the Apple share price. The other datasets did not show any predictive capability. There are many data analysis and PA techniques. The techniques applied in this study did not indicate a direct correlation. However, some results suggest that this is due to noise or asymmetric distributions in the datasets. The study contributes to the literature by providing a quantitative analysis of SM data, in this case tweets about Apple and its most popular products, the iPad and iPhone. It shows how SM data can be used for PA. It contributes to the literature on Big Data and SMM by showing how SM data can be collected, analysed and classified, and by exploring whether the share price of a company can be determined from sentiment time series. It may ultimately lead to better decision-making, for instance for investments or share buybacks.
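The Granger-causality step can be sketched as a restricted-versus-unrestricted lag regression compared with an F-test; the series below are synthetic stand-ins for the tweet-frequency and share-price series, not the study's actual datasets:

```python
import numpy as np
from scipy.stats import f as f_dist

# Hand-rolled Granger causality F-test: does adding lags of x to an
# autoregression of y significantly reduce the residual sum of squares?

def granger_f_test(y, x, lags=2):
    """Test whether lags of x help predict y beyond y's own lags."""
    n = len(y)
    Y = y[lags:]
    # restricted design: intercept + lags of y only
    Z_r = np.column_stack([np.ones(n - lags)] +
                          [y[lags - k: n - k] for k in range(1, lags + 1)])
    # unrestricted design: also include lags of x
    Z_u = np.column_stack([Z_r] +
                          [x[lags - k: n - k] for k in range(1, lags + 1)])
    rss = lambda Z: np.sum((Y - Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]) ** 2)
    rss_r, rss_u = rss(Z_r), rss(Z_u)
    df1, df2 = lags, (n - lags) - Z_u.shape[1]
    F = ((rss_r - rss_u) / df1) / (rss_u / df2)
    return F, f_dist.sf(F, df1, df2)   # F statistic and p-value

# toy series where x leads y by one step, so x Granger-causes y
rng = np.random.default_rng(3)
x = rng.normal(size=300)
y = np.concatenate(([0.0], 0.9 * x[:-1])) + rng.normal(0, 0.1, 300)
F, p = granger_f_test(y, x)
```

A small p-value here means the lagged series carries predictive information about the target series over the same period, which is exactly the criterion the study applies to the tweet-frequency and share-price time series.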

    Proceedings of the 36th International Workshop Statistical Modelling July 18-22, 2022 - Trieste, Italy

    The 36th International Workshop on Statistical Modelling (IWSM) is the first held in person after a two-year hiatus due to the COVID-19 pandemic. This edition was quite lively, with 60 oral presentations and 53 posters covering a vast variety of topics. As usual, the extended abstracts of the papers are collected in the IWSM proceedings; unlike previous workshops, this year the proceedings are not printed on paper but are available online only. The workshop proudly maintains its almost unique feature of scheduling a single plenary session for the whole week. This choice has always contributed to the stimulating atmosphere of the conference which, combined with its informal character, encourages the exchange of ideas and cross-fertilization among different areas. As a distinguished tradition of the workshop, student participation has been strongly encouraged, and this IWSM edition is particularly successful in this respect, as testified by the large number of students included in the program.