2,502 research outputs found
Clicks and Cliques. Exploring the Soul of the Community
This paper analyzes 26 communities across the United States to understand what
attaches people to their community and how this attachment differs among
communities. How different are attached people from unattached ones? What
attaches people to their community? How different are the communities? What
are the key drivers behind emotional attachment? To address these questions,
graphical, supervised, and unsupervised learning tools were used, and
information from the Census Bureau and the Knight Foundation was combined.
Using the same pre-processed variables as Knight (2010) would most likely
drive the results towards the same conclusions as the Knight Foundation's, so
this paper does not use those variables.
Bayesian analysis of high-dimensional count data
This thesis describes my research work in past years in the Statistics Department of Iowa State University. Several key statistical features are common to the whole thesis. First, all the statistical methods are developed from a Bayesian perspective. Second, both main parts address high-dimensional problems: in the first part because a large amount of information is available for a few individuals, and in the second because the model space is very large, which raises computational intractability issues. Finally, the response variable in all data used here is a positive count: in the first part it is associated with gene expression, while in the second part it represents a number of automobile crashes.
Student performance predictive models using LMS data in Primary Schools
Plan Ceibal is a public policy implemented in Uruguay as part of the global initiative One Laptop per Child (OLPC, 2005). Its basic feature is providing every student and teacher in primary school with a laptop or tablet and internet access. Different data sets were combined: student and teacher activities registered in the Learning Management System (LMS) and students' performance in national standardized tests. The data were used to compute student engagement indexes, combining motivation, creativity, velocity and performance. Statistical models were used to determine key drivers of LMS use, which is relevant for defining evidence-based educational policies. Models for LMS use were fitted at several regional levels. Additionally, statistical learning methods were fitted to predict students' performance in national standardized tests, using as predictors different usage indexes constructed from the LMS platform. A major challenge was how to handle the sub-grouped data structure in machine learning algorithms, which are usually developed for independent observations. Initial results suggest that school district is the main driver of technology usage in the classroom. ANI
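One common way to respect a grouped data structure when evaluating a predictive model is to keep every group entirely inside a single cross-validation fold. The sketch below illustrates that idea in plain Python. It is not taken from the paper: the function name `group_k_fold` and the school labels are hypothetical, and the authors may have handled the grouping differently.

```python
from collections import defaultdict

def group_k_fold(groups, k):
    """Yield (train_idx, test_idx) pairs so that no group is split across folds."""
    # Collect the row indices that belong to each group (e.g. each school).
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    # Assign the largest groups first, always to the currently smallest fold.
    folds = [[] for _ in range(k)]
    for g in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[g])
    # Each fold serves once as the test set; the remaining rows form the training set.
    for f in range(k):
        test = sorted(folds[f])
        held_out = set(test)
        train = [i for i in range(len(groups)) if i not in held_out]
        yield train, test

# Hypothetical school identifiers for eight students.
schools = ["A", "A", "B", "B", "B", "C", "D", "D"]
splits = list(group_k_fold(schools, k=2))
```

Because students from the same school never appear on both sides of a split, the estimated prediction error is not flattered by within-school similarity.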
Priorcovmatrix: exploring, visualizing, and estimating covariance matrices
Covariance matrix estimation arises in multivariate problems such as the multivariate normal distribution or generalized mixed regression models in which the random effects are modeled jointly. Bayesian inference on a covariance matrix requires specifying a probability distribution for that matrix. Distributions whose domain is the set of covariance matrices have not received much attention in terms of characterizing their properties.
This work presents the priorcovmatrix package, which allows fitting, simulating, and visualizing some multivariate distributions used to model covariance matrices. The inverse Wishart, the scaled inverse Wishart, and other distributions are part of the library. Sociedad Argentina de Informática e Investigación Operativ
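For reference, the two named distributions can be written as follows. This is the standard textbook parameterization, stated here as an assumption; the package may use different symbols or conventions. Here $\Sigma$ is a $d \times d$ positive definite matrix, $\nu > d - 1$ is the degrees of freedom, and $\Lambda$ is a positive definite scale matrix.

```latex
% Inverse Wishart prior on a d x d covariance matrix
\Sigma \sim \mathrm{IW}(\nu, \Lambda): \qquad
p(\Sigma) \propto |\Sigma|^{-(\nu + d + 1)/2}
  \exp\!\left\{ -\tfrac{1}{2} \operatorname{tr}\!\left( \Lambda \Sigma^{-1} \right) \right\}

% Scaled inverse Wishart: an inverse Wishart matrix rescaled by positive factors
\Sigma = \Delta \, Q \, \Delta, \qquad
Q \sim \mathrm{IW}(\nu, \Lambda), \qquad
\Delta = \operatorname{diag}(\delta_1, \dots, \delta_d), \qquad
\log \delta_i \sim \mathrm{N}(b_i, \xi_i^2)
```

In the scaled version the factors $\delta_i$ give the prior on each standard deviation its own location and spread, rather than tying all of them to the single parameter $\nu$.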
Fully Bayesian analysis of allele-specific RNA-seq data
Diploid organisms have two copies of each gene, called alleles, that can be separately transcribed. The RNA abundance associated with any particular allele is known as allele-specific expression (ASE). When two alleles have polymorphisms in transcribed regions, ASE can be studied using RNA-seq read count data. ASE has characteristics that differ from regular RNA-seq expression: ASE cannot be assessed for every gene, measures of ASE can be biased towards one of the alleles (the reference allele), and ASE provides two measures of expression for a single gene in each biological sample, which leads to additional complications for single-gene models. We present statistical methods for modeling ASE and detecting genes with differential allelic expression. We propose a hierarchical, overdispersed count regression model to deal with ASE counts. The model accommodates gene-specific overdispersion, has an internal measure of the reference allele bias, and uses random effects to model the gene-specific regression parameters. Fully Bayesian inference is obtained using the fbseq package, which implements a parallel strategy to keep computation times reasonable. Simulation and real data analysis suggest the proposed model is a practical and powerful tool for the study of differential ASE.
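As a rough illustration of what a hierarchical, overdispersed count regression can look like, the generic form below uses a negative binomial likelihood with gene-specific overdispersion. This is a sketch for orientation only: the abstract does not state the likelihood, so the negative binomial choice and all symbols ($y_{gn}$, $x_n$, $\beta_g$, $\phi_g$, $\theta$, $\Omega$) are assumptions, and the paper's actual model may differ.

```latex
% Count for gene g in observation n, with gene-specific overdispersion \phi_g
y_{gn} \mid \mu_{gn}, \phi_g \sim \mathrm{NegBin}(\mu_{gn}, \phi_g), \qquad
\mathbb{E}[y_{gn}] = \mu_{gn}, \qquad
\operatorname{Var}[y_{gn}] = \mu_{gn} + \phi_g \, \mu_{gn}^2

% Log-linear regression with gene-specific coefficients
\log \mu_{gn} = x_n^{\top} \beta_g

% Random effects: coefficients share a common distribution across genes
\beta_g \sim \mathrm{N}(\theta, \Omega)
```

Setting $\phi_g = 0$ recovers the Poisson variance, so $\phi_g$ measures how much extra variability each gene shows beyond a Poisson model.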
SpICE: An interpretable method for spatial data
Statistical learning methods are widely used to tackle complex problems
because of their flexibility, good predictive performance, and ability to
capture complex relationships among variables. Additionally, recently developed
automatic workflows have provided a standardized approach to implementing
statistical learning methods across various applications. However, these tools
highlight a main drawback of statistical learning: the lack of interpretability
of its results. In the past few years, a considerable amount of research has
focused on methods for interpreting black box models. Interpretable statistical
learning methods are needed to gain a deeper understanding of the model. In
problems where spatial information is relevant, combining interpretable methods
with spatial data can lead to a better understanding of the problem and a
better interpretation of the results.
This paper focuses on the individual conditional expectation plot (ICE-plot), a
model-agnostic method for interpreting statistical learning models, and
combines it with spatial information. An extension of the ICE-plot is proposed
in which spatial information is used as a restriction to define Spatial ICE
curves (SpICE). Spatial ICE curves are estimated using real data in the context
of an economic problem concerning property valuation in Montevideo, Uruguay.
Understanding the key factors that influence property valuation is essential
for decision-making, and spatial data plays a relevant role in this regard.
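A minimal sketch of ordinary ICE curves may help make the starting point concrete: for each observation, one feature is swept over a grid while all other features stay fixed, and the model prediction is recorded. The code below shows only this standard construction. It does not reproduce the spatial restriction that defines SpICE, and the toy model, feature names, and values are hypothetical.

```python
def ice_curves(predict, rows, feature, grid):
    """Return one curve per row: predictions as `feature` sweeps over `grid`."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # change only the feature of interest
            curve.append(predict(modified))
        curves.append(curve)
    return curves

def partial_dependence(curves):
    """Average the ICE curves pointwise to obtain the partial dependence curve."""
    return [sum(values) / len(values) for values in zip(*curves)]

# Hypothetical fitted model: price grows with area, with a slope that depends on zone.
def toy_price_model(row):
    slope = 2.0 if row["zone"] == "coast" else 1.0
    return 50.0 + slope * row["area"]

properties = [
    {"area": 60.0, "zone": "coast"},
    {"area": 80.0, "zone": "inland"},
]
grid = [50.0, 100.0]
curves = ice_curves(toy_price_model, properties, "area", grid)
pd_curve = partial_dependence(curves)
```

In this toy case the two curves have different slopes, which the averaged partial dependence curve hides. Heterogeneity of this kind is what ICE plots are designed to reveal.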
Introduction to Bayesian statistics with applications to small area estimation using STAN software
This mini-course presents a brief introduction to Bayesian statistics using
the STAN program. It takes an applied approach, covering the basic features
of Bayesian modeling and its implementation in STAN through concrete applications.
Small area estimation problems will be used as working examples. ANII, Fundación Ceiba
Use of Plan Ceibal educational platforms
This work presents the development of indicators to evaluate the use of the educational platforms used by Plan Ceibal, focusing specifically on the CREA platform. It also analyzes the evolution of usage before and during the pandemic years and studies the main factors that explain its variability. The results indicate that CREA use was 5 times more intense in 2021 than in pre-pandemic years. The main factors explaining this variability in platform use are department, socioeconomic context, and the teacher's use of the platform. In particular, the effect of teacher use by socioeconomic context differs across the country's departments. Fundación Ceibal, ANI