    Modelling football match scoring outcomes using multilevel models

    Multilevel modelling recognizes the existence of hierarchical structures in the data by allowing for random effects at each level of the hierarchy, thus assessing the variation in the dependent variable at several hierarchical levels simultaneously. Multilevel modelling has become an increasingly popular technique for analysing nested data, a popularity largely credited to the computational advances of the last two decades. In many sports, including football, game fixtures are nested within seasons, which are in turn nested within country leagues, giving rise to a multilevel structure in the data. Many gaming companies engage in sports data analysis in a bid to understand the dynamics and patterns of the game; this helps them develop fantasy sport games that enhance gamer engagement and increase company revenue. This paper presents a comprehensive description of two- and three-level models, which are applied to a real football data set obtained from a free online football betting portal. The aim is to examine the relationship between the number of goals scored during a football match and several game-related predictors. These multilevel models, which assume a Poisson distribution and a logarithmic link function, are implemented using GLLAMM (Generalized Linear Latent and Mixed Models), a program that runs within Stata.
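
    Below is a minimal sketch of the kind of two-level Poisson model described above, written in Python with statsmodels rather than GLLAMM/Stata; the column names (goals, home, season) and the CSV file are illustrative assumptions, not the paper's actual data set.

        import pandas as pd
        from statsmodels.genmod.bayes_mixed_glm import PoissonBayesMixedGLM

        # Hypothetical data: one row per team per match, with matches nested in seasons.
        matches = pd.read_csv("matches.csv")  # assumed columns: goals, home, season

        # Two-level model: log E[goals] = b0 + b1*home + u_season, where u_season is
        # a random intercept for the season containing the match (level 2).
        vc = {"season": "0 + C(season)"}
        model = PoissonBayesMixedGLM.from_formula("goals ~ home", vc, matches)
        result = model.fit_vb()  # variational Bayes fit of the mixed Poisson model
        print(result.summary())

    A three-level version would add a further random intercept for the league within which each season is nested.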

    Modeling survival times using frailty models

    Traditional survival models, including the Kaplan-Meier, Nelson-Aalen and Cox regression models, assume a homogeneous population; they are therefore inappropriate in the presence of heterogeneity. The introduction of frailty models four decades ago addressed this limitation. Fundamentally, frailty models apply the same principles of survival theory, but they incorporate a multiplicative frailty term in the hazard to cater for any underlying unobserved heterogeneity. These frailty models are used to relate the survival durations of censored data to a number of pre-operative, operative and post-operative patient-related variables in order to identify risk factors. The study focuses mainly on fitting shared and unshared frailty models to account for unobserved frailty within the data while simultaneously identifying the risk factors that best predict the hazard of death.
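
    The following short Python simulation illustrates the multiplicative frailty term described above: a shared gamma frailty acting on an exponential baseline hazard. The group structure, parameter values and censoring time are invented for illustration and are not taken from the study.

        import numpy as np

        rng = np.random.default_rng(0)
        n_groups, group_size = 50, 10
        beta = 0.5          # effect of a single binary covariate x
        theta = 0.8         # frailty variance (degree of unobserved heterogeneity)

        # Shared gamma frailty: one multiplicative random effect z per group,
        # with mean 1 and variance theta.
        z = rng.gamma(shape=1 / theta, scale=theta, size=n_groups)

        # Hazard h_ij(t) = z_i * h0 * exp(beta * x_ij) with constant baseline h0,
        # so survival times are exponential with that rate.
        x = rng.integers(0, 2, size=(n_groups, group_size))
        h0 = 0.1
        rate = h0 * z[:, None] * np.exp(beta * x)
        t = rng.exponential(1 / rate)

        # Administrative censoring at 20 time units, as in a fixed follow-up window.
        observed = (t <= 20.0).astype(int)
        t = np.minimum(t, 20.0)
        print(t.shape, observed.mean())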

    Quantifying and Explaining Causal Effects of World Bank Aid Projects

    In recent years, machine learning methods such as deep learning have enabled us to predict with good precision from large training data. For many problems, however, we care more about causality than prediction. For example, instead of knowing that smoking is statistically associated with lung cancer, we are more interested in knowing that smoking is the cause of lung cancer. With causality, we can understand how the world progresses and how an outcome can be changed by influencing its cause. This thesis explores how to quantify the causal effects of a treatment on an observable outcome in the presence of heterogeneity. We focus on investigating the causal impacts that World Bank projects have on environmental changes. This high-dimensional World Bank data set includes covariates from various sources and of different types: time series data, such as Normalized Difference Vegetation Index (NDVI) values, temperature and precipitation; spatial data, such as longitude and latitude; and many other features, such as distance to roads and rivers. We estimate the heterogeneous causal effect of World Bank projects on the change in NDVI values. Building on the causal tree and causal forest methods proposed by Athey, we describe the challenges we met and the lessons we learned when applying these two methods to an actual World Bank data set, and we report our observations of the heterogeneous causal effect of the World Bank projects on environmental change. As we do not have ground truth for the World Bank data set, we validate the results using synthetic data in simulation studies; the synthetic data are sampled from distributions fitted to the World Bank data set. We compare the results across various causal inference methods and observe that feature scaling is very important for generating meaningful data and results. In addition, we investigate the performance of the causal forest under various parameters such as leaf size, number of confounders, and data size. The causal forest is a black-box model whose results are not easily interpreted. By taking advantage of the tree structure, we select the neighbors of the project to be explained, assign them weights according to dynamic distance metrics, and fit a linear regression model on these neighbors; the fitted linear model is then used to interpret the result. In summary, World Bank projects have small impacts on environmental change, and the result of an individual project can be interpreted using a linear regression model learned from nearby projects.
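
    As an illustration of the neighbor-weighted linear surrogate described at the end of the abstract, the sketch below fits a weighted linear regression to the estimated effects of nearby projects. The Euclidean neighborhood and inverse-distance weights are simplifying stand-ins for the tree-based neighborhood and dynamic distance metrics used in the thesis.

        import numpy as np
        from sklearn.linear_model import LinearRegression

        def explain_with_neighbors(X, tau_hat, x0, k=50):
            """Fit a weighted linear surrogate around one project x0.

            X       : (n, p) project covariates
            tau_hat : (n,) treatment effects estimated by a causal forest
            x0      : (p,) covariates of the project to be explained
            """
            # Select the k nearest projects in covariate space (a stand-in for
            # the tree-structure-based neighborhood used in the thesis).
            d = np.linalg.norm(X - x0, axis=1)
            idx = np.argsort(d)[:k]

            # Closer neighbors receive larger weights (a simple kernel choice).
            w = 1.0 / (1.0 + d[idx])

            # The surrogate's coefficients show how each covariate drives the
            # estimated effect in the neighborhood of x0.
            surrogate = LinearRegression().fit(X[idx], tau_hat[idx], sample_weight=w)
            return surrogate.coef_, surrogate.intercept_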

    Standard error estimation for EM applications related to Latent class models

    The EM algorithm is a popular method for computing maximum likelihood estimates. It tends to be numerically stable, reduces execution time compared to other estimation procedures and is easy to implement in latent class models. However, the EM algorithm fails to provide a consistent estimator of the standard errors of maximum likelihood estimates in incomplete-data applications. Correct standard errors can be obtained by numerical differentiation. The technique requires computation of a complete-data gradient vector and Hessian matrix, but not those associated with the incomplete-data likelihood. Obtaining first and second derivatives numerically is computationally very intensive, and execution time may become very expensive when fitting latent class models using a Newton-type algorithm. When the execution time is too high, one is motivated to use the EM solution to initialize the Newton-Raphson algorithm. We also investigate the effect on execution time when a final Newton-Raphson step follows the EM algorithm after convergence. In this paper we compare the standard errors provided by the EM and Newton-Raphson algorithms for two models and analyze how the bias in the EM-based standard errors is affected by the number of parameters in the fitted model.
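
    The toy example below shows the idea of obtaining standard errors for an EM estimate by numerical differentiation: a single mixing proportion is estimated by EM for a two-component Gaussian mixture with known component distributions, and its standard error is then computed from a finite-difference approximation of the observed information. The paper applies the same principle to latent class models with many parameters; this one-parameter sketch is only illustrative.

        import numpy as np
        from scipy.stats import norm

        rng = np.random.default_rng(1)
        # Simulated data: mixture of N(0,1) and N(3,1) with true weight 0.4.
        z = rng.random(500) < 0.4
        y = np.where(z, rng.normal(3.0, 1.0, 500), rng.normal(0.0, 1.0, 500))

        def loglik(p):
            # Observed-data (incomplete-data) log-likelihood as a function of the weight p.
            return np.sum(np.log(p * norm.pdf(y, 3, 1) + (1 - p) * norm.pdf(y, 0, 1)))

        # EM: the E-step gives posterior class probabilities, the M-step averages them.
        p = 0.5
        for _ in range(200):
            resp = p * norm.pdf(y, 3, 1)
            resp = resp / (resp + (1 - p) * norm.pdf(y, 0, 1))
            p = resp.mean()

        # Standard error from a central-difference estimate of the observed
        # information (minus the second derivative of the log-likelihood).
        h = 1e-5
        info = -(loglik(p + h) - 2 * loglik(p) + loglik(p - h)) / h**2
        print(f"EM estimate {p:.3f}, numerical SE {1 / np.sqrt(info):.3f}")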

    Investigating the factors which affect the performance of the EM algorithm in Latent class models

    Latent class models have been used extensively in market segmentation to divide a total market into groups of consumers who have relatively similar product needs and preferences. The advantage of these models over traditional clustering techniques lies in simultaneous estimation and segmentation, which is carried out using the EM algorithm. The identification of consumer segments allows target-marketing strategies to be developed. The data comprise the rating responses of 262 respondents to 24 laptop profiles described by four attributes: brand, price, random access memory (RAM) and screen size. Using RStudio, two latent class models were fitted, with the number of clusters varied from two to three. The parameter estimates obtained from these two latent class models were used to simulate a number of data sets for each cluster solution in order to conduct a Monte Carlo study investigating the factors that affect segment membership and parameter recovery, as well as the computational effort.
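
    A sketch of the simulation step is given below: rating data are generated from assumed latent class parameters, the kind of data generation needed for a Monte Carlo study of membership and parameter recovery. The class sizes and category probabilities are placeholders, not the estimates reported in the paper.

        import numpy as np

        rng = np.random.default_rng(2)
        n_classes, n_items, n_cats = 2, 24, 5   # classes, laptop profiles, 1-5 ratings

        # Placeholder parameters: class sizes and, for each class and item,
        # probabilities over the five rating categories.
        class_probs = np.array([0.6, 0.4])
        item_probs = rng.dirichlet(np.ones(n_cats), size=(n_classes, n_items))

        def simulate(n_respondents=262):
            # Draw each respondent's latent class, then draw each rating from
            # that class's category probabilities for the item.
            cls = rng.choice(n_classes, size=n_respondents, p=class_probs)
            ratings = np.empty((n_respondents, n_items), dtype=int)
            for i, c in enumerate(cls):
                for j in range(n_items):
                    ratings[i, j] = rng.choice(n_cats, p=item_probs[c, j]) + 1
            return cls, ratings

        true_class, data = simulate()
        print(data.shape)  # (262, 24): respondents x laptop profiles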

    Using Item response models to investigate attitudes towards divorce

    Item Response Theory (IRT) is a form of latent structure analysis used to analyze binary or ordinal response data. IRT models evaluate the relationships between the latent trait of interest and the items measuring that trait. Several IRT models are fitted to assess the factors that lead to divorce in the Maltese Islands. The 1-PL Rasch and 2-PL Birnbaum logistic models are used for dichotomous responses, whereas the 1-PL rating scale and 1-PL partial-credit models are used for polytomous responses. All the models are fitted using the generalized linear latent and mixed modeling (GLLAMM) framework. The gllamm directive estimates parameters by maximum likelihood using adaptive quadrature (Rabe-Hesketh, Skrondal, and Pickles 2002; 2005). In the 1-PL Rasch model, the probability that a person agrees with a divorce-related item is modeled as a function of subject ability and item difficulty parameters. The major weakness of this model is that all items share the same discrimination parameter. In the 2-PL Birnbaum model, an item-specific weight is added so that the slope of the item response function varies between items. The 1-PL rating scale model specifies that the items share the same rating scale structure, while the 1-PL partial-credit model specifies a distinct rating scale structure for each item.
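
    The item response functions referred to above can be written in a few lines; the sketch below uses Python with illustrative parameter values, whereas the models in the paper are fitted with gllamm in Stata.

        import numpy as np

        def p_agree(theta, b, a=1.0):
            """Probability of agreeing with an item under the 2-PL model.

            theta : person ability (latent trait)
            b     : item difficulty
            a     : item discrimination (a = 1 for every item gives the 1-PL Rasch model)
            """
            return 1.0 / (1.0 + np.exp(-a * (theta - b)))

        # Two items with the same difficulty but different discrimination: in the
        # 2-PL model the slope of the item response function varies between items.
        for a in (0.5, 2.0):
            print(a, p_agree(theta=1.0, b=0.0, a=a))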

    Agenda: The Economic Impacts of Sea-Level Rise in Hampton Roads: An Appraisal of the Projects Underway

    Agenda for the workshop The Economic Impacts of Sea-Level Rise in Hampton Roads: An Appraisal of the Projects Underway on May 18, 2016 at the Virginia Modeling and Simulation Center, 1030 University Blvd, Suffolk, VA 2343

    Proceedings, MSVSCC 2018

    Proceedings of the 12th Annual Modeling, Simulation & Visualization Student Capstone Conference held on April 19, 2018 at VMASC in Suffolk, Virginia. 155 pp

    Design project planning, monitoring and re-planning through process simulation

    Effective management of design schedules is a major concern in industry, since timely project delivery can have a significant influence on a company’s profitability. Based on insights gained through a case study of planning practice in aero-engine component design, this paper examines how task network simulation models can be deployed in a new way to support design process planning. Our method shows how simulation can be used to reconcile a description of design activities and information flows with project targets such as milestone delivery dates. It also shows how monitoring and re-planning can be supported using the non-ideal metrics that, as the case study revealed, are used to monitor processes in practice. The approach is presented as a theoretical contribution which requires further work to implement and evaluate in practice.
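
    A minimal sketch of how a task network simulation can be used to check milestone delivery dates is given below, assuming a small invented dependency network with triangular duration estimates; the tasks, durations and milestone are illustrative and are not taken from the case study.

        import numpy as np

        rng = np.random.default_rng(3)

        # Hypothetical task network: each task lists its predecessors and a
        # triangular duration estimate (optimistic, most likely, pessimistic) in days.
        tasks = {
            "concept":  ([],                     (5, 8, 14)),
            "analysis": (["concept"],            (10, 15, 25)),
            "detail":   (["concept"],            (12, 18, 30)),
            "review":   (["analysis", "detail"], (3, 5, 9)),
        }

        def finish_times():
            # Earliest finish of each task: the latest predecessor finish plus a
            # sampled duration, propagated through the network in listed order.
            done = {}
            for name, (preds, (lo, mode, hi)) in tasks.items():
                start = max((done[p] for p in preds), default=0.0)
                done[name] = start + rng.triangular(lo, mode, hi)
            return done

        # Monte Carlo estimate of the probability of meeting a 45-day milestone.
        finishes = np.array([finish_times()["review"] for _ in range(10_000)])
        print(f"P(review finished within 45 days) = {(finishes <= 45).mean():.2f}")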