8 research outputs found
Mixed linear models for longitudinal data in a factorial experiment with additional treatment
Assays in which crops are studied through multiple measurements taken on the same sampling unit over time, space, depth, etc. are common in agronomic experiments. Measurements collected this way produce what are called longitudinal data. For such data, statistical procedures that can identify patterns of variation and correlation among measurements are of great importance. Because they can include random effects and model covariance structures, mixed linear models are one of the most appropriate tools for this type of analysis.
However, despite all the theoretical and computational development, the use of this methodology in more complex designs involving longitudinal data and additional treatments, such as those used in forage research, still needs to be studied. The present work used the Hasse diagram and the top-down strategy to build mixed linear models for successive forage cuts from a boron fertilization experiment in alfalfa (Medicago sativa L.) carried out in the experimental field of Embrapa Southeast Livestock. First, all study factors were treated as qualitative, and the complexity of the experimental design led us to construct a Hasse diagram. Random effects were included and residual covariance structures were selected based on the likelihood ratio test, computed from parameters estimated by restricted maximum likelihood (REML), and on Akaike's Information Criterion (AIC), the corrected Akaike's Information Criterion (AICc) and the Bayesian Information Criterion (BIC). The fixed effects were tested with the Wald-F test. Because the sources of variation associated with the longitudinal factor were significant, we carried out a regression study. Building the Hasse diagram was essential for understanding and symbolically displaying the relationships among all factors in the study. It allowed the sources of variation and their degrees of freedom to be decomposed, ensuring that all tests were performed correctly. Including a random effect associated with the experimental unit was essential for modeling the behavior of each unit. Furthermore, a heterogeneous variance components structure for the residuals efficiently modeled the heterogeneity of variances across the different alfalfa cuts. Model fit was checked with residual diagnostic plots.
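The Hasse diagram is what made the degrees-of-freedom bookkeeping explicit. As a rough illustration only, the sketch below assumes a hypothetical layout: `a` boron sources × `b` doses plus a single additional treatment, all crossed with `c` cuts. The level counts and the function name `df_decomposition` are illustrative assumptions. Block and residual strata depend on the actual randomization, so they are left out.

```python
def df_decomposition(n_sources: int, n_doses: int, n_cuts: int) -> dict:
    """Degrees of freedom for treatment-related sources of variation.

    Hypothetical layout: an n_sources x n_doses factorial plus ONE
    additional treatment, all crossed with n_cuts successive cuts.
    Block and residual strata are intentionally omitted.
    """
    a, b, c = n_sources, n_doses, n_cuts
    treatment_parts = {
        "Sources": a - 1,
        "Doses": b - 1,
        "Sources x Doses": (a - 1) * (b - 1),
        "Factorial vs Additional": 1,  # single contrast for the extra treatment
    }
    df = dict(treatment_parts)
    # a*b + 1 treatments in total, hence a*b treatment degrees of freedom
    df["Treatments"] = sum(treatment_parts.values())
    df["Cuts"] = c - 1
    # each treatment component interacts with the longitudinal factor
    for name, d in treatment_parts.items():
        df[f"{name} x Cuts"] = d * (c - 1)
    df["Treatments x Cuts"] = df["Treatments"] * (c - 1)
    return df


if __name__ == "__main__":
    # Hypothetical levels: 2 sources, 4 doses, 5 cuts.
    for source, d in df_decomposition(2, 4, 5).items():
        print(f"{source:<34}{d:>4}")
```

The four treatment components always add up to `a*b`, and each one carries `c - 1` interaction degrees of freedom with cuts. A Hasse diagram is designed to make this structure visible.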
The regression study allowed us to evaluate shoot dry matter yield (kg ha-1) over successive alfalfa cuts, comparing fertilization with different boron sources and doses. The highest yields were observed for the source ulexite combined with boron doses of 3, 6 and 9 kg ha-1.
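The selection rule described above compares candidate models by REML log-likelihood. It uses a likelihood ratio test for nested models and AIC, AICc and BIC otherwise. A minimal sketch follows, assuming the usual textbook formulas. How `n` is defined under REML (n, n - p, or the number of subjects) varies between software packages, so here it is left to the caller. The log-likelihood values in the usage block are hypothetical. REML likelihoods are only comparable when the fixed effects are identical.

```python
import math


def info_criteria(loglik: float, k: int, n: int) -> dict:
    """AIC, AICc and BIC from a maximized (restricted) log-likelihood.

    k: number of covariance parameters; n: effective sample size
    (its REML definition is software-dependent, an assumption here).
    """
    aic = -2.0 * loglik + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    bic = -2.0 * loglik + k * math.log(n)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return math.exp(-h) * total
    # odd df: 2*Q(sqrt x) + 2*phi(sqrt x) * sum_r x^(r-1/2) / (1*3*...*(2r-1))
    s = math.sqrt(x)
    pdf = math.exp(-h) / math.sqrt(2.0 * math.pi)
    term, total = s, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(s / math.sqrt(2.0)) + 2.0 * pdf * total


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int):
    """LRT statistic and chi-square p-value for nested covariance models.

    When the null puts a variance at zero (a boundary), this p-value is
    conservative; a 50:50 chi-square mixture is often used instead.
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical REML log-likelihoods: homogeneous vs heterogeneous
    # residual variances across 5 cuts (4 extra parameters).
    stat, p = likelihood_ratio_test(-412.7, -398.2, df=4)
    print(f"LRT = {stat:.2f}, p = {p:.4g}")
```

Lower AIC, AICc or BIC values indicate the preferred structure. The LRT can only be applied when one structure is a special case of the other.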
Statistical methods used in genome wide selection for growth curves in animals
The main contribution of molecular genetics to applied genetic breeding is the direct use of DNA information in genomic selection, which enables high selection efficiency, rapid genetic gains and low cost. Growth curves offer a practical and consistent way to analyze the productive efficiency of meat animals under selection, because they represent the longitudinal trajectory of the animals' weights as a function of time. To this end, growth models (nonlinear models) are first fitted to the weight-age data of each animal under selection, and the parameter estimates are treated as phenotypes.
This procedure allows genetic parameters to be estimated for any point of the growth trajectory. It also makes it possible to understand the genetic architecture of the entire trajectory, since the information from all weighings is condensed into a few biologically interpretable parameters. The parameters estimated from the growth models are then used to predict Genomic Breeding Values (GBV) through statistical methods specific to Genome Wide Selection (GWS). The general objective of this work was to apply statistical methods used in Genome Wide Selection, specifically RR-BLUP/GWS and the Bayesian LASSO, to the study of animal growth curves. The phenotypic variables were the parameter estimates of nonlinear regression models. The specific objectives were:

- to estimate genomic breeding values for each individual evaluated;
- to estimate SNP marker effects and identify those with the largest effects;
- to select, via clustering techniques, groups of individuals that are genetically superior with respect to the growth curve;
- to validate the whole methodology through a simulation study and apply it to real data from an F2 pig population derived from crossing two boars of the naturalized Brazilian breed Piau with 18 females of a commercial line (Landrace × Large White × Pietrain).

The results indicated that the Genome Wide Selection statistical methods were efficient for studying growth curves, for both simulated and real pig weight-age data. GWS showed high accuracy in selecting for the growth curve trajectory and allowed the detection of QTLs (Quantitative Trait Loci) for the curve parameters of the individuals studied. In the absence of genes of major effect, RR-BLUP/GWS and Bayesian LASSO produced similar results. However, Bayesian LASSO was more efficient when the halothane gene, characterized as a major-effect gene, was included as a marker in the analyses.
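The first stage of the two-step approach can be sketched with the standard library alone: fit a growth model to each animal, then keep its parameter estimates as phenotypes. The parameterization W(t) = A / (1 + exp(-k(t - m))), with asymptote A, rate k and inflection age m, is an assumption, since the text only says "nonlinear". The Levenberg-Marquardt fitter is a generic choice and not necessarily the authors' procedure. The function names are hypothetical.

```python
import math


def logistic(t, A, k, m):
    """Logistic growth curve: asymptote A, rate k, inflection age m."""
    z = max(-700.0, min(700.0, -k * (t - m)))  # avoid math.exp overflow
    return A / (1.0 + math.exp(z))


def _solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_logistic(ages, weights, iters=200):
    """Levenberg-Marquardt least-squares fit; ages sorted ascending, >= 2 points."""
    A = max(weights)
    i = min(range(len(ages)), key=lambda j: abs(weights[j] - A / 2))
    lo, hi = max(i - 1, 0), min(i + 1, len(ages) - 1)
    slope = (weights[hi] - weights[lo]) / (ages[hi] - ages[lo])
    # the logistic's slope at its inflection point is A*k/4
    theta = [A, max(4.0 * slope / A, 1e-3), ages[i]]
    lam = 1e-3

    def sse(p):
        return sum((w - logistic(t, *p)) ** 2 for t, w in zip(ages, weights))

    cur = sse(theta)
    for _ in range(iters):
        A, k, m = theta
        J, res = [], []
        for t, w in zip(ages, weights):
            e = math.exp(max(-700.0, min(700.0, -k * (t - m))))
            D = 1.0 + e
            # partial derivatives of the curve w.r.t. A, k and m
            J.append([1.0 / D, A * e * (t - m) / D**2, -A * k * e / D**2])
            res.append(w - A / D)
        JtJ = [[sum(row[a] * row[b] for row in J) for b in range(3)] for a in range(3)]
        Jtr = [sum(row[a] * ra for row, ra in zip(J, res)) for a in range(3)]
        for a in range(3):
            JtJ[a][a] *= 1.0 + lam  # Marquardt damping of the diagonal
        step = _solve(JtJ, Jtr)
        if max(abs(s) for s in step) < 1e-12:
            break
        trial = [p + s for p, s in zip(theta, step)]
        new = sse(trial)
        if new < cur:  # accept: move toward Gauss-Newton
            theta, cur, lam = trial, new, lam / 10.0
        else:          # reject: move toward gradient descent
            lam *= 10.0
    return tuple(theta)


if __name__ == "__main__":
    # Noise-free toy data generated from the model itself (not real pig data)
    ages = list(range(0, 151, 10))
    weights = [logistic(t, 100.0, 0.1, 50.0) for t in ages]
    print(fit_logistic(ages, weights))  # approximately (100.0, 0.1, 50.0)
```

In the two-step scheme, the `(A, k, m)` tuple returned for each animal would become that animal's three phenotypes for the genomic prediction step.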
Genomic growth curves of an outbred pig population
The success of pig production systems, including the evaluation of alternative management and marketing strategies, requires knowledge of body weight over time, commonly referred to as the growth curve. This knowledge allows growth characteristics to be assessed in actual production situations and translated into economic decisions. Differences among animal growth curves partly reflect genetic influences, with multiple genes contributing at different levels to the overall phenotype. Hence, selection strategies that attempt to modify the shape of the growth curve to meet the demands of the pork market are highly relevant. In the current post-genomic era, understanding the genomic basis of pig growth cannot be limited to estimating marker effects using body weight at a specific time as the phenotype; it must also consider changes in body weight over time. According to Pong-Wong and Hadjipavlou (2010) and Ibáñez-Escriche and Blasco (2011), this can be done by estimating marker effects for the parameters of nonlinear regression models that are widely used to describe growth curves. Regardless of the phenotype used, a major challenge in genome-wide selection (GS) is to identify the most powerful statistical methods for predicting phenotypic values from estimated marker effects. Since the seminal GS paper by Meuwissen et al. (2001), several studies have compared the efficiency of simple methods with more sophisticated ones. Examples are RR-BLUP (Random Regression BLUP) (Meuwissen et al., 2001) and the Bayesian LASSO (BL) (de los Campos et al., 2009), respectively. The main difference between these two popular GS methods lies in their prior assumptions. RR-BLUP assumes, a priori, that all loci explain an equal amount of genetic variation, whereas BL allows each locus to explain its own amount of this variation.
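RR-BLUP assumes that every locus explains the same share of genetic variance. In practice, this amounts to ridge regression with a single shrinkage parameter shared by all markers. A minimal sketch follows, assuming genotypes coded 0/1/2 and a known ratio `lam` = σe²/σβ² (in practice this ratio is estimated, for example by REML). The function names are hypothetical. The Bayesian LASSO, which gives each marker its own variance, is not shown.

```python
def _solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def rr_blup(Z, y, lam):
    """Ridge-regression marker effects with one shrinkage ratio for all SNPs.

    Z: genotypes coded 0/1/2, one row per individual.
    y: phenotypes (e.g. one logistic parameter estimate per animal).
    lam: sigma_e^2 / sigma_beta^2, common to every marker.
    Returns (intercept, marker_effects).
    """
    n, p = len(Z), len(Z[0])
    zbar = [sum(row[j] for row in Z) / n for j in range(p)]
    ybar = sum(y) / n
    Zc = [[row[j] - zbar[j] for j in range(p)] for row in Z]  # centered genotypes
    yc = [v - ybar for v in y]
    # (Zc'Zc + lam*I) beta = Zc'yc
    lhs = [[sum(r[i] * r[j] for r in Zc) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * v for r, v in zip(Zc, yc)) for i in range(p)]
    beta = _solve(lhs, rhs)
    mu = ybar - sum(zb * b for zb, b in zip(zbar, beta))
    return mu, beta


def gebv(Z, beta):
    """Genomic breeding values as the sum of marker effects times genotypes."""
    return [sum(z * b for z, b in zip(row, beta)) for row in Z]
```

Increasing `lam` shrinks all marker effects toward zero by the same mechanism. That uniform shrinkage is exactly the equal-variance assumption that the Bayesian LASSO relaxes.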
Although these two methods have already been compared in other studies, no comparison has so far used a major gene as a marker, such as the halothane gene in pigs (Fujii et al., 1991). In addition, these methods have not yet been applied to the analysis of growth curves in conjunction with nonlinear regression models. In this study, we compared the accuracies of RR-BLUP and BL for predicting genetic merit in an empirical application. We used weight-age data from an outbred F2 (Brazilian Piau × commercial) pig population (Silva et al., 2011). In this approach, the phenotypes were defined by parameter estimates obtained with a nonlinear logistic regression model. The halothane gene was treated as a single nucleotide polymorphism (SNP) marker to evaluate the assumptions of the GS methods regarding the genetic variation explained by each locus. Genomic growth curves based on genomic estimated breeding values were constructed, and the SNPs most relevant to the growth parameters were identified.
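The text does not spell out how the genomic growth curves were built from the genomic estimated breeding values. One plausible reading, assumed here, is to add each animal's GEBV for the logistic parameters A, k and m to the population mean of that parameter. The resulting curve is then evaluated over age. The dictionary layout and the function name are hypothetical.

```python
import math


def genomic_growth_curve(ages, pop_means, gebv):
    """Logistic curve implied by one animal's GEBVs for (A, k, m).

    Assumption (hypothetical): each curve parameter equals the population
    mean of that parameter plus the animal's GEBV for it.
    pop_means, gebv: dicts keyed by "A", "k", "m".
    """
    A = pop_means["A"] + gebv["A"]
    k = pop_means["k"] + gebv["k"]
    m = pop_means["m"] + gebv["m"]
    return [A / (1.0 + math.exp(-k * (t - m))) for t in ages]
```

Under this assumption, a GEBV on the asymptote A changes the curve most at late ages. This is one way curves could separate mainly in the final growth phase.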
Genome Wide Selection for growth curves
Peer reviewed. A methodology was proposed for the genetic evaluation of growth curves using SNP (Single Nucleotide Polymorphism) markers. In the first step, a nonlinear growth model (Logistic) was fitted to the weight-age data of each animal. In the second step, the parameter estimates of the Logistic model were used as phenotypes in a regression model (Bayesian LASSO, BL) whose covariates were the SNP genotypes. This approach allows GBV (Genomic Breeding Values) to be estimated for weight at any point of the growth trajectory. It also allows genomic growth curves to be produced, from which groups of individuals with greater growth efficiency were selected. The simulated data set consisted of 2,000 individuals (1,000 in the training and 1,000 in the validation population), each with 453 SNP markers distributed along 5 chromosomes. The results indicated high efficiency of the BL method in predicting GBV in the validation population using information from the training population (correlation coefficients ranging from 0.79 to 0.93). BL was also highly efficient at detecting QTL, since the largest estimated SNP effects were located close to the true QTL positions fixed in the simulation.
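The accuracy figures quoted above are correlation coefficients between predicted and true values in the validation population. A minimal sketch of that validation step follows, assuming marker effects already estimated in the training population and true breeding values known from simulation. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def validation_accuracy(Z_valid, marker_effects, true_values):
    """Correlate GBV predicted from training-set marker effects with true values.

    Z_valid: SNP genotypes (0/1/2) of the validation individuals.
    marker_effects: SNP effects estimated in the training population.
    true_values: simulated true breeding values (known only in simulation).
    """
    gbv = [sum(z * b for z, b in zip(row, marker_effects)) for row in Z_valid]
    return pearson(gbv, true_values)
```

Only genotypes from the validation individuals are used for prediction. This is what makes the correlation an out-of-sample accuracy measure rather than a goodness of fit.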
Genomic growth curves of an outbred pig population
In the current post-genomic era, the genetic basis of pig growth can be understood by assessing SNP marker effects and genomic estimated breeding values (GEBV), using growth curve parameter estimates as phenotypes. Various statistical methods, such as random regression (RR-BLUP) and Bayesian LASSO (BL), have been applied to genomic selection (GS). However, none had yet been used in a growth curve approach. In this work, we compared the accuracies of RR-BLUP and BL using empirical weight-age data from an outbred F2 (Brazilian Piau × commercial) population. The phenotypes were the parameter estimates of a nonlinear logistic regression model. The halothane gene was included as a marker to evaluate the assumptions of the GS methods regarding the genetic variation explained by each locus. BL yielded more accurate predictions for all phenotypes evaluated and was used to estimate SNP effects and GEBV vectors. The GEBV vectors allowed the construction of genomic growth curves, which showed substantial genetic discrimination among animals in the final growth phase. The SNP effect estimates allowed identification of the most relevant markers for each phenotype, whose positions coincided with reported QTL regions for growth traits.
AMAZONIA CAMTRAP: A data set of mammal, bird, and reptile species recorded with camera traps in the Amazon forest
The Amazon forest has the highest biodiversity on Earth. However, information on Amazonian vertebrate diversity is still deficient and scattered across the published, peer-reviewed, and gray literature and in unpublished raw data. Camera traps are an effective non-invasive method of surveying vertebrates, applicable at different temporal and spatial scales. In this study, we organized and standardized camera trap records from different Amazon regions. The result is the most extensive data set of mammal, bird, and reptile species inventories ever assembled for the area. The complete data set comprises 154,123 records of 317 species (185 birds, 119 mammals, and 13 reptiles). These were gathered from surveys in the Amazonian portion of eight countries (Brazil, Bolivia, Colombia, Ecuador, French Guiana, Peru, Suriname, and Venezuela). The most frequently recorded species per taxon were:

- mammals: Cuniculus paca (11,907 records);
- birds: Pauxi tuberosa (3,713 records);
- reptiles: Tupinambis teguixin (716 records).

The information detailed in this data paper opens up opportunities for new ecological studies at different spatial and temporal scales. It allows a more accurate evaluation of the effects of habitat loss, fragmentation, climate change, and other human-mediated defaunation processes in one of the most important and threatened tropical environments in the world. The data set is not copyright restricted; please cite this data paper when using its data in publications. We also request that researchers and educators inform us of how they are using these data.
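Per-taxon tallies like the "most frequently recorded species" figures above come from a simple group-and-count operation. The sketch below is hedged: it assumes each record can be reduced to a (class, species) pair. That schema is hypothetical and is not the data paper's actual column layout. Any records passed in would be the caller's own.

```python
from collections import Counter, defaultdict


def top_species_per_class(records):
    """Most frequently recorded species within each taxonomic class.

    records: iterable of (class_name, species_name) pairs, one per
    camera-trap record (hypothetical schema).
    Returns {class_name: (species_name, n_records)}.
    """
    counts = defaultdict(Counter)
    for class_name, species in records:
        counts[class_name][species] += 1
    return {cls: c.most_common(1)[0] for cls, c in counts.items()}


if __name__ == "__main__":
    # Placeholder toy records, not values from the data set
    toy = (
        [("Mammalia", "species_a")] * 3
        + [("Mammalia", "species_b")]
        + [("Aves", "species_c")] * 2
        + [("Reptilia", "species_d")]
    )
    print(top_species_per_class(toy))
```

`Counter.most_common(1)` returns the single highest count per class. Ties would need an explicit rule if they matter.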