29 research outputs found
Multivariate Statistical Machine Learning Methods for Genomic Prediction
This book is open access under a CC BY 4.0 license. This open access book brings together the latest genome-based prediction models currently being used by statisticians, breeders and data scientists. It provides an accessible way to understand the theory behind each statistical learning tool, the required pre-processing, the basics of model building, how to train statistical learning methods, the basic R scripts needed to implement each statistical learning tool, and the output of each tool. To do so, for each tool the book provides background theory, some elements of the R statistical software for its implementation, the conceptual underpinnings, and at least two illustrative examples with data from real-world genomic selection experiments. Lastly, worked-out examples help readers check their own comprehension. The book will greatly appeal to readers in plant (and animal) breeding, geneticists and statisticians, as it provides in a very accessible way the necessary theory, the appropriate R code, and illustrative examples for a complete understanding of each statistical learning tool. In addition, it weighs the advantages and disadvantages of each tool
A review of deep learning applications for the next generation of cognitive networks
Intelligence capabilities will be the cornerstone in the development of next-generation cognitive networks. These capabilities allow the networks to observe network conditions, learn from them, and then, using the prior knowledge gained, respond to their operating environment to optimize network performance. This study aims to offer an overview of the current state of the art in the use of deep learning in applications for intelligent cognitive networks, which can serve as a reference for future initiatives in this field. To this end, a systematic literature review was carried out in three databases, and eligible articles were selected that focused on using deep learning to solve challenges presented by current cognitive networks. As a result, 14 articles were analyzed. The results showed that applying deep learning algorithms to optimize cognitive data networks has been approached from different perspectives in recent years, and in an experimental way to test its technological feasibility. In addition, its implications for solving fundamental challenges in current wireless networks are discussed
Índice de satisfacción laboral (Job satisfaction index)
This article proposes a job satisfaction index built with the principal components method. The index is formed by directly aggregating the first principal components, each weighted by its standard deviation. The index quantifies and summarizes each worker's job satisfaction in a single value between 0 and 100, which eases interpretation and the corresponding decision-making. Additionally, the methodology was applied to a database with job satisfaction information
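The aggregation step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the principal component scores have already been computed elsewhere, and it assumes min-max rescaling to reach the 0 to 100 range, which the abstract does not specify. The function name `satisfaction_index` is hypothetical.

```python
from statistics import pstdev


def satisfaction_index(component_scores):
    """Combine principal component scores into a 0-100 index.

    component_scores: one row per worker, each row holding that worker's
    scores on the first few principal components (computed elsewhere).
    """
    n_components = len(component_scores[0])
    # Weight of each component: its standard deviation across workers.
    weights = [
        pstdev([row[j] for row in component_scores])
        for j in range(n_components)
    ]
    # Direct aggregation: weighted sum of the component scores per worker.
    raw = [sum(w * s for w, s in zip(weights, row)) for row in component_scores]
    lowest, highest = min(raw), max(raw)
    if highest == lowest:
        # All workers identical: the rescaling is undefined, so return zeros.
        return [0.0 for _ in raw]
    # ASSUMPTION: min-max rescaling maps the raw sums onto 0..100.
    return [100.0 * (value - lowest) / (highest - lowest) for value in raw]


# Hypothetical scores for three workers on two components.
scores = [[-1.0, 0.5], [0.0, -0.5], [1.0, 0.0]]
index = satisfaction_index(scores)
```

With this rescaling, the least satisfied worker in the sample receives 0 and the most satisfied receives 100, so the values are relative to the sample rather than absolute.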
Genome and Environment Based Prediction Models and Methods of Complex Traits Incorporating Genotype × Environment Interaction
Genomic-enabled prediction models are of paramount importance for the successful implementation of genomic selection (GS) based on breeding values. As opposed to animal breeding, plant breeding includes extensive multienvironment and multiyear field trial data. Hence, genomic-enabled prediction models should include genotype × environment (G × E) interaction, which most of the time increases the prediction performance when the responses of lines differ from environment to environment. In this chapter, we describe a historical timeline since 2012 of advances in GS models that take G × E interaction into account. We describe theoretical and practical aspects of those GS models, including the gains in prediction performance when including G × E structures for both complex continuous and categorical scale traits. We then detail and explain the main G × E genomic prediction models for complex traits measured on continuous and noncontinuous (categorical) scales. In relation to G × E interaction models, this review also examines the analysis of information generated from high-throughput phenotype data (phenomics) and the joint analysis of multitrait and multienvironment field trial data, which is also employed in the general assessment of multitrait G × E interaction. The role of nongenomic data in increasing the accuracy and biological reliability of the G × E approach is also outlined. We show the recent advances in large-scale envirotyping (enviromics), and how mechanistic computational modeling can derive the crop growth and development aspects useful for predicting phenotypes and explaining G × E
Sample Size under Inverse Negative Binomial Group Testing for Accuracy in Parameter Estimation
Background: The group testing method has been proposed for the detection and estimation of genetically modified plants (adventitious presence of unwanted transgenic plants, AP). For binary response variables (presence or absence), group testing is efficient when the prevalence is low, so that estimation, detection, and sample size methods have been developed under the binomial model. However, when the event is rare (low prevalence
Methodology/Principal Findings: This research proposes three sample size procedures (two computational and one analytic) for estimating prevalence using group testing under inverse (negative) binomial sampling. These methods provide the required number of positive pools (rm), given a pool size (k), for estimating the proportion of AP plants using the Dorfman model and inverse (negative) binomial sampling. We give real and simulated examples to show how to apply these methods and the proposed sample-size formula. The Monte Carlo method was used to study the coverage and level of assurance achieved by the proposed sample sizes. An R program to create other scenarios is given in Appendix S2.
Conclusions: The three methods ensure precision in the estimated proportion of AP because they guarantee that the width (W) of the confidence interval (CI) will be equal to, or narrower than, the desired width (v), with a probability of c. With the Monte Carlo study we found that the computational Wald procedure (method 2) produces the most precise sample size (with coverage and assurance levels very close to nominal values) and that the sample size based on the Clopper-Pearson CI (method 1) is conservative (overestimates the sample size); the analytic Wald sample size method we developed (method 3) sometimes underestimated the optimum number of pools
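As an illustration of the sampling scheme this abstract refers to, the following Python sketch simulates inverse (negative) binomial group testing and computes the standard maximum likelihood estimate of prevalence from the result. It does not reproduce the paper's three sample size procedures. It assumes a perfect test (no false positives or negatives) and independent plants. The function names and the example values of `p`, `k` and `r` are hypothetical.

```python
import random


def pools_until_r_positives(p, k, r, rng):
    """Simulate testing pools of size k until r positive pools are seen.

    Returns the total number of pools tested. Assumes a perfect test, so a
    pool is positive exactly when it contains at least one positive plant.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    pool_positive_prob = 1.0 - (1.0 - p) ** k
    total = 0
    positives = 0
    while positives < r:
        total += 1
        if rng.random() < pool_positive_prob:
            positives += 1
    return total


def estimate_prevalence(r, total_pools, k):
    """MLE of individual prevalence from r positives in total_pools pools."""
    pool_rate = r / total_pools  # MLE of the pool-level positive probability
    # Invert pool_rate = 1 - (1 - p)^k to recover the individual prevalence.
    return 1.0 - (1.0 - pool_rate) ** (1.0 / k)


# Hypothetical scenario: true prevalence 1%, pools of 10, stop at 5 positives.
rng = random.Random(42)
total = pools_until_r_positives(p=0.01, k=10, r=5, rng=rng)
p_hat = estimate_prevalence(r=5, total_pools=total, k=10)
```

Under inverse sampling the number of positive pools is fixed in advance and the total number of pools is random, which is why the sample size question in the abstract is framed in terms of the required number of positive pools.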
Genomic selection in plant breeding: Key factors shaping two decades of progress
54 pp. Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has significantly advanced in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size. We explored its benefits and the associated diminishing returns beyond an optimum size. This was done while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are the other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP (theoretically reaching one when using Pearson's correlation as a metric) is an active research area, and accuracy is as yet far from optimal for various traits. We hypothesize that with ultra-high sizes of genotypic and phenotypic datasets, effective training population optimization methods and support from other omics approaches (transcriptomics, metabolomics and proteomics) coupled with deep-learning algorithms could overcome the boundaries of current limitations to achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.
This research was supported by SLU Grogrund (#SLU-LTV.2020.1.1.1-654) and an Einar and Inga Nilsson Foundation grant. J.I.y.S. was supported by grant PID2021-123718OB-I00 funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe,” CEX2020-000999-S. R.R.V. was supported by Novo Nordisk Fonden (0074727) and SLU’s Centre for Biological Control. In addition, J.I.y.S. and J.F.-G. were supported by the Beatriz Galindo Program BEAGAL 18/00115. Peer reviewed
Modelos matemáticos para enfermedades infecciosas (Mathematical models for infectious diseases)
Objective. To describe the importance of mathematical models in understanding the transmission dynamics of infectious diseases and in designing effective control measures. Material and methods. International publications on the topic were reviewed through digital media; about 60 articles were identified, although only 27 of them were reviewed because of their close relationship to the topic. Results. This work gives a synoptic account of the background, importance, and classification of mathematical models of infectious diseases. In addition, it describes in detail some common disease transmission models and other, more recent ones used in modeling infectious disorders. Conclusions. The use of mathematical models has grown significantly in recent years, and they are of great help in devising effective measures for the control and eradication of infectious diseases
Maximum a posteriori Threshold Genomic Prediction Model for Ordinal Traits
Due to the ever-increasing data collected in genomic breeding programs, there is a need for genomic prediction models that can deal better with big data. For this reason, here we propose a Maximum a posteriori Threshold Genomic Prediction (MAPT) model for ordinal traits that is more efficient than the conventional Bayesian Threshold Genomic Prediction model for ordinal traits. The MAPT performs the predictions of the Threshold Genomic Prediction model by using the maximum a posteriori estimation of the parameters, that is, the values of the parameters that maximize the joint posterior density. We compared the prediction performance of the proposed MAPT to the conventional Bayesian Threshold Genomic Prediction model, the multinomial Ridge regression and support vector machine on 8 real data sets. We found that the proposed MAPT was competitive with regard to the multinomial and support vector machine models in terms of prediction performance, and slightly better than the conventional Bayesian Threshold Genomic Prediction model. With regard to the implementation time, we found that in general the MAPT and the support vector machine were the best, while the slowest was the multinomial Ridge regression model. However, it is important to point out that the successful implementation of the proposed MAPT model depends on the informative priors used to avoid underestimation of variance components