    An Introduction to Fitting and Evaluating Mixed-effects Models in R

    Mixed-effects modeling is a multidimensional statistical analysis capable of modeling complex relationships between predictor and outcome variables while accounting for random variance in various dimensions of the data. Although this technique is gaining popularity in applied linguistics research, learning how to model, and how to do so in R, can be intimidating. This guide provides an introduction to fitting mixed-effects models in R (Version 3.5.3) using RStudio. It includes a written introduction describing the modeling process, a video tutorial that focuses on getting started in RStudio, a sample data set, and an R script containing code to analyze the data. By the end of this introduction, researchers should have developed a basic understanding of the modeling process and should be able to (1) read data into R and inspect its structure, (2) create a series of plots to visualize trends and/or primary variables, and (3) fit and evaluate models.

    Longitudinal analysis of three-dimensional facial shape data

    Shape data encompass all the information that is left to describe a shape following removal of location, rotation and scale effects. Much work has been done in the analysis of two-dimensional shapes depicted by anatomical landmarks placed at points of importance. Less has been carried out in the area of three-dimensional shapes, particularly in terms of growth or change over time. This thesis considers the analysis of such longitudinal three-dimensional shape data. In doing so, two well established but normally unrelated areas of Statistics are brought together: those of longitudinal data analysis (specifically, linear mixed effects models) and shape analysis. A recently proposed method of analysing longitudinal high-dimensional data is presented in a novel application within the area of shape analysis, illustrated by a study comparing the facial shapes of cleft-lip and palate children with controls as they grow from three months to two years of age. Both anatomical landmarks and facial curves are considered.
    Chapter 1 broadly introduces the areas of shape analysis, linear mixed effects models and dimension reduction. Standard methods for measuring shapes are introduced, along with the difficulties inherent in analysing the resulting data. A broad overview of the methods of aligning individual shapes to remove the unwanted effects of location, rotation and scale is given, along with related geometrical issues in terms of the high-dimensional space in which a set of shapes resides. A general introduction to linear mixed effects models compares and contrasts them with simple linear models, explaining the reasons behind using them and presenting the different specifications of the conditional and marginal models. The area of dimension reduction is touched upon, specifically introducing B-splines and principal components analysis, with reference to the analysis of curves consisting of many points at small increments to one another. The data from the cleft-lip and palate study are introduced, along with a discussion of the primary interest of the analysis and the issue of missing data.
    Chapter 2 presents the statistical definition of a shape and introduces the area of statistical shape analysis in detail, specifically presenting the technicalities of shape space and distances, and methods such as Procrustes alignment of a set of shapes to remove unwanted effects. The concept of tangent coordinates is introduced as a projection of shape data into a Euclidean space, to enable the use of multivariate methods, and an outline given of thin-plate splines and deformations for the analysis of surfaces. Recent literature in the area of shape analysis is presented.
    Further recent literature addressing the modelling of growth in shapes is presented in Chapter 3, which goes on to discuss the use of linear mixed models on univariate and multivariate longitudinal data. The difficulties of applying mixed models to multivariate data are discussed and a recently proposed alternative method introduced, which involves fitting mixed models to the responses on pairs of outcomes rather than the full set. A description of the R function written as part of this thesis to fit such pairwise models follows, and this is applied to simulated triangles and quadrilaterals as an illustration.
    The initial application of the pairwise method to the cleft-lip and palate landmark data is presented in Chapter 4. The landmarks are described and the models are fitted to the tangent coordinate responses with different covariance structures for the random effects. The problems that arise and the deficiencies of the fitted models are extensively discussed.
    Chapter 5 goes on to address the issues raised in Chapter 4. A method of aligning the individual shapes based upon a subset of landmarks is suggested, along with a model that assumes independence of coordinates between dimensions but correlation within, and the benefits of these approaches compared. A simulation study is carried out to investigate the reasons behind and effects of random effects correlations that are estimated as being close to one, concluding that the problem lies in small variances that are poorly estimated, but that this is unlikely to be of severe detriment to the fixed effects estimates. A method of taking the principal components of the tangent coordinates is suggested, where the model responses are the principal components scores, and this proves to be the most appropriate way of applying the pairwise models in terms of model fit and computational efficiency.
    In Chapter 6, recent literature on the topic of curve analysis is presented, along with the way the facial curves are measured and the need for dimension reduction. Two methods are presented to this end: B-splines and principal components analysis, with the former suffering similar problems to the landmark analyses in terms of poorly estimated random effects variances, and the latter proving more successful. The application of the pairwise models to the principal components scores of the tangent coordinates provides a detailed analysis of the cleft-lip and palate data.
    Issues surrounding model comparison are addressed in Chapter 7, with several hypothesis tests presented and applied to simulated data. Drawbacks with some of the tests when applied to high dimensional or longitudinal data result in poor performance, but a method suggested by Faraway (1997) and a modification of the likelihood ratio test, both using bootstrapping, show similarly successful results. These are subsequently used to test for any differences in the time trends for the cleft and control groups post-surgery and find that there are significant differences.
    Condensed forms of this thesis have been presented at invited seminars and international conferences, and may be found in published form in Barry & Bowman (2006), Barry & Bowman (2007) and Barry & Bowman (2008).
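
The removal of location, scale and rotation described in this abstract can be illustrated with a minimal sketch for two-dimensional landmark configurations. It is not the thesis's R function and does not cover the three-dimensional case; the function names (`preshape`, `align`, `procrustes_distance`) are hypothetical, landmarks are stored as complex numbers for convenience, and reflections are not handled.

```python
import math

def preshape(points):
    """Remove location and scale: centre the landmarks and scale to unit centroid size."""
    z = [complex(x, y) for x, y in points]
    centroid = sum(z) / len(z)
    z = [p - centroid for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / size for p in z]

def align(shape, reference):
    """Rotate `shape` onto `reference` (both pre-shapes) with the least-squares rotation."""
    # The optimal rotation is the unit complex number in the direction of sum(r * conj(s)).
    inner = sum(r * s.conjugate() for s, r in zip(shape, reference))
    rotation = inner / abs(inner)  # assumes the two configurations are not orthogonal
    return [rotation * s for s in shape]

def procrustes_distance(a, b):
    """Distance between two configurations after removing location, scale and rotation."""
    pa, pb = preshape(a), preshape(b)
    fitted = align(pa, pb)
    return math.sqrt(sum(abs(f - r) ** 2 for f, r in zip(fitted, pb)))

# A triangle and a copy that has been rotated by 90 degrees, doubled in size and shifted.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
moved = [(3.0, 4.0), (3.0, 6.0), (1.0, 4.0)]
# A genuinely different triangle.
other = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]

same_shape = procrustes_distance(triangle, moved)
different_shape = procrustes_distance(triangle, other)
```

Here `same_shape` is zero up to rounding error, because the two triangles differ only by the effects that alignment removes, while `different_shape` is strictly positive.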

    Advocating better habitat use and selection models in bird ecology

    Studies on habitat use and habitat selection represent a basic aspect of bird ecology, due to their importance in natural history, distribution, response to environmental changes, management and conservation. Basically, a statistical model that identifies environmental variables linked to a species' presence is searched for. In this sense, there is a wide array of analytical methods that identify important explanatory variables within a model, with higher explanatory and predictive power than classical regression approaches. However, some of these powerful models are not widespread in ornithological studies, partly because of their complex theory and, in some cases, difficulties in their implementation and interpretation. Here, I describe generalized linear models and five other statistical models for the analysis of bird habitat use and selection that outperform classical approaches: generalized additive models, mixed effects models, occupancy models, binomial N-mixture models and decision trees (classification and regression trees, bagging, random forests and boosting). Each of these models has its benefits and drawbacks, but major advantages include dealing with non-normal distributions (presence-absence and abundance data typically found in habitat use and selection studies), heterogeneous variances, non-linear and complex relationships among variables, lack of statistical independence and imperfect detection. To aid ornithologists in making use of the methods described, a readable description of each method is provided, as well as a flowchart along with some recommendations to help them decide the most appropriate analysis. The use of these models in ornithological studies is encouraged, given their huge potential as statistical tools in bird ecology.
    Affiliation: Palacio, Facundo Xavier. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Naturales y Museo. División Zoología de Vertebrados. Sección Ornitología; Argentina.

    A review of R-packages for random-intercept probit regression in small clusters

    Generalized Linear Mixed Models (GLMMs) are widely used to model clustered categorical outcomes. To tackle the intractable integration over the random effects distributions, several approximation approaches have been developed for likelihood-based inference. As these seldom yield satisfactory results when analyzing binary outcomes from small clusters, estimation within the Structural Equation Modeling (SEM) framework is proposed as an alternative. We compare the performance of R-packages for random-intercept probit regression relying on: the Laplace approximation, adaptive Gaussian quadrature (AGQ), Penalized Quasi-Likelihood (PQL), an MCMC-implementation, and integrated nested Laplace approximation within the GLMM-framework, and a robust diagonally weighted least squares estimation within the SEM-framework. In terms of bias for the fixed and random effect estimators, SEM usually performs best for cluster size two, while AGQ prevails in terms of precision (mainly because of SEM's robust standard errors). As the cluster size increases, however, AGQ becomes the best choice for both bias and precision.
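
The integral over the random-effects distribution mentioned in this abstract can be made concrete with a minimal sketch for one cluster of a random-intercept probit model. It uses a brute-force trapezoid rule on a fixed grid, which is an assumption chosen for readability; it is not adaptive Gaussian quadrature, the Laplace approximation, or the method of any R package named above. The function name `cluster_likelihood` and its parameters are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def cluster_likelihood(y, eta, sigma, n_grid=2001, z_max=8.0):
    """Marginal likelihood of one cluster under a random-intercept probit model.

    y     : binary outcomes (0/1) for the cluster members
    eta   : fixed-effect linear predictors, one per member
    sigma : standard deviation of the random intercept b = sigma * z, z ~ N(0, 1)

    The conditional likelihood given z is integrated against the N(0, 1)
    density with a trapezoid rule on [-z_max, z_max].
    """
    step = 2.0 * z_max / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        z = -z_max + k * step
        conditional = 1.0
        for y_j, eta_j in zip(y, eta):
            p = normal_cdf(eta_j + sigma * z)
            conditional *= p if y_j == 1 else 1.0 - p
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * conditional * normal_pdf(z) * step
    return total

# Cluster of size two, both outcomes equal to 1, zero linear predictors, sigma = 1.
both_positive = cluster_likelihood([1, 1], [0.0, 0.0], 1.0)
```

For this particular case a closed form exists: the two latent responses have correlation 0.5, so the probability that both are positive is 1/4 + arcsin(0.5)/(2π) = 1/3, which gives a simple check on the numerical integral.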

    Fast stable direct fitting and smoothness selection for Generalized Additive Models

    Existing computationally efficient methods for penalized likelihood GAM fitting employ iterative smoothness selection on working linear models (or working mixed models). Such schemes fail to converge for a non-negligible proportion of models, with failure being particularly frequent in the presence of concurvity. If smoothness selection is performed by optimizing 'whole model' criteria these problems disappear, but until now attempts to do this have employed finite difference based optimization schemes which are computationally inefficient, and can suffer from false convergence. This paper develops the first computationally efficient method for direct GAM smoothness selection. It is highly stable, but by careful structuring achieves a computational efficiency that leads, in simulations, to lower mean computation times than the schemes based on working-model smoothness selection. The method also offers a reliable way of fitting generalized additive mixed models.

    Committee Machines for Hourly Water Demand Forecasting in Water Supply Systems

    Prediction models have become essential for the improvement of decision-making processes in public management and, particularly, for water supply utilities. Accurate estimation often needs to solve multimeasurement, mixed-mode, and space-time problems, typical of many engineering applications. As a result, accurate estimation of real world variables is still one of the major problems in mathematical approximation. Several individual techniques have shown very good estimation abilities. However, none of them are free from drawbacks. This paper faces the challenge of creating accurate water demand predictive models at urban scale by using so-called committee machines, which are ensemble frameworks of single machine learning models. The proposal is able to combine models of varied nature. Specifically, this paper analyzes combinations of such techniques as multilayer perceptrons, support vector machines, extreme learning machines, random forests, adaptive neural fuzzy inference systems, and the group method for data handling. Analyses are checked on two water demand datasets from Franca (Brazil). As an ensemble tool, the combined response of a committee machine outperforms any single constituent model.
    Ambrosio, JK.; Brentan, BM.; Herrera Fernández, AM.; Luvizotto, E.; Ribeiro, L.; Izquierdo Sebastián, J. (2019). Committee Machines for Hourly Water Demand Forecasting in Water Supply Systems. Mathematical Problems in Engineering. 2019:1-11. https://doi.org/10.1155/2019/9765468
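
The idea of a committee machine as an ensemble of single models can be sketched in a few lines. This is only an illustration under stated assumptions: the three member "models" are hypothetical toy functions of the hour of day, not the multilayer perceptrons, support vector machines or other learners used in the paper, and the combination rule shown (a weighted average) is one simple choice, not necessarily the one the authors used.

```python
def committee_predict(members, x, weights=None):
    """Combine the predictions of several models with a weighted average.

    members : list of callables, each mapping an input x to a numeric prediction
    weights : optional list of non-negative weights summing to 1;
              equal weights are used when omitted
    """
    predictions = [member(x) for member in members]
    if weights is None:
        weights = [1.0 / len(predictions)] * len(predictions)
    return sum(w * p for w, p in zip(weights, predictions))

# Hypothetical stand-ins for trained forecasters (demand as a function of the hour).
members = [
    lambda hour: 10.0 * hour,
    lambda hour: 10.0 * hour + 6.0,
    lambda hour: 10.0 * hour - 3.0,
]

equal_weight_forecast = committee_predict(members, 2.0)
weighted_forecast = committee_predict(members, 2.0, weights=[0.5, 0.25, 0.25])
```

With equal weights, errors of opposite sign in the members partly cancel; unequal weights let a committee lean on members that have proved more reliable on held-out data.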

    Computational Methods for D-optimal Design in Nonlinear Mixed Effects Models

    This thesis aims to make a first step towards a foundation for a different, more practical approach to employing the principles of optimal experimental design in nonlinear mixed effects models. As an alternative to approaches which aim to mathematically account for parameter uncertainty and misspecification, it is proposed that the “space of possible parameter guesses” is investigated more directly, by visualising the resulting optimal designs and their relative performance. To provide some justification for the computational choices made in the packages, the thesis provides a comparison of two linearisation-based approaches to approximating the Fisher information matrix (First Order and Integrated First Order), a necessary step in computing D-optimality objective functions. This comparison is performed by utilising an approximation (Monte Carlo / Adaptive Gaussian Quadrature) which is not based on linearisation and which theoretically allows arbitrarily low error but at a high computational cost. It is concluded that the computationally cheaper First Order approximation is likely to be superior in all cases. A number of models taken from the applied and theoretical literature are introduced. Through these examples, it is shown how one can use the R-packages developed for this thesis (doptim and randon) to check robustness of proposed designs against parameter misspecification, in terms of information lost. A gentle introduction to using the packages is also provided, and it is demonstrated how to find D-, Ds- and DA-optimal designs for nonlinear mixed effects models and, because the objective functions are made available to the user, how custom objective functions such as compound objective functions can also be generated and optimised.
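
A D-optimality objective is the log-determinant of the Fisher information matrix, and the simplest setting in which to see it is a straight-line model with no random effects. The sketch below makes that simplifying assumption, so the information matrix is proportional to X'X and no First Order or other linearisation is needed; it does not reproduce the nonlinear mixed effects setting of the thesis or the interface of the doptim and randon packages. The function name `log_det_information` is hypothetical.

```python
import math

def log_det_information(design):
    """D-optimality criterion for the model y = a + b * x with independent errors.

    For this model the information matrix is proportional to X'X, where each
    row of X is (1, x). Its determinant is n * sum(x^2) - (sum(x))^2.
    """
    n = len(design)
    sum_x = sum(design)
    sum_xx = sum(x * x for x in design)
    det = n * sum_xx - sum_x * sum_x
    if det <= 0.0:
        return float("-inf")  # singular design: the slope cannot be estimated
    return math.log(det)

# Two candidate designs with four observations on the interval [0, 1].
end_points = [0.0, 0.0, 1.0, 1.0]
evenly_spaced = [0.0, 1.0 / 3.0, 2.0 / 3.0, 1.0]

criterion_end_points = log_det_information(end_points)
criterion_evenly_spaced = log_det_information(evenly_spaced)
```

The end-point design gives the larger criterion value, matching the classical result for straight-line regression that the D-optimal design places half of the observations at each end of the interval.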

    Toward future 'mixed reality' learning spaces for STEAM education

    Digital technology is becoming more integrated and part of modern society. As this begins to happen, technologies including augmented reality, virtual reality, 3D printing and user-supplied mobile devices (collectively referred to as mixed reality) are often being touted as likely to become more a part of the classroom and learning environment. In the discipline areas of STEAM education, experts are expected to be at the forefront of technology and how it might fit into their classroom. This is especially important because increasingly, educators are finding themselves surrounded by new learners that expect to be engaged with participatory, interactive, sensory-rich, experimental activities with greater opportunities for student input and creativity. This paper will explore learner and academic perspectives on mixed reality case studies in 3D spatial design (multimedia and architecture), paramedic science and information technology, through the use of existing data as well as additional one-on-one interviews around the use of mixed reality in the classroom. Results show that mixed reality can provide engagement, critical thinking and problem solving benefits for students in line with this new generation of learners, but also demonstrate that more work needs to be done to refine mixed reality solutions for the classroom.