    Discrete Algorithms for Analysis of Genotype Data

    The accessibility of high-throughput genotyping technology makes genome-wide association studies of common complex diseases possible. For common diseases, it is necessary to search for and analyze multiple independent causes resulting from interactions of multiple genes scattered over the entire genome. Optimization formulations are introduced for searching for disease-associated risk/resistance factors and for predicting disease susceptibility in a given case-control study. Several discrete methods for disease association search, exploiting a greedy strategy and topological properties of case-control studies, have been developed. New disease susceptibility prediction methods based on these search methods have been validated on datasets from case-control studies of several common diseases. In our experiments, the proposed algorithms compare favorably with existing association search and susceptibility prediction methods.

    Single nucleotide polymorphism genes and mitochondrial DNA haplogroups as biomarkers for early prediction of knee osteoarthritis structural progressors: use of supervised machine learning classifiers

    Background. Knee osteoarthritis is the most prevalent chronic, debilitating musculoskeletal disease. Current treatments are only symptomatic, and to improve on this, we need a robust prediction model that stratifies patients at an early stage according to their risk of joint structure disease progression. Some genetic factors, including single nucleotide polymorphism (SNP) genes and mitochondrial (mt)DNA haplogroups/clusters, have been linked to this disease. For the first time, we aim to determine, using machine learning, whether some SNP genes and mtDNA haplogroups/clusters, alone or combined, could predict early knee osteoarthritis structural progressors. Methods. Participants (901) were first classified by their probability of being structural progressors. Genotyping included the SNP genes TP63, FTO, GNL3, DUS4L, GDF5, SUPT3H, MCF2L, and TGFA; mtDNA haplogroups H, J, T, Uk, and others; and clusters HV, TJ, KU, and C-others. These were considered for prediction together with the major risk factors of osteoarthritis, namely age and body mass index (BMI). Seven supervised machine learning methodologies were evaluated. The support vector machine was used to generate gender-based models. The best input combination was assessed using sensitivity and synergy analyses. Validation was performed using tenfold cross-validation and an external cohort (TASOAC). Results. From 277 models, two were defined. Both used age and BMI; the first added the SNP genes TP63, DUS4L, GDF5, and FTO and reached an accuracy of 85.0%, while the second combined mtDNA haplogroups with the SNP genes FTO and SUPT3H and reached 82.5% accuracy. The highest impact was associated with haplogroup H, the presence of CT alleles for rs8044769 at FTO, and the absence of AA for rs10948172 at SUPT3H. Validation accuracy with cross-validation (about 95%) and with the external cohort (90.5% and 85.7%, respectively) was excellent for both models. Conclusions. This study introduces a novel source of decision support in precision medicine in which, for the first time, two models were developed, consisting of (i) age, BMI, TP63, DUS4L, GDF5, and FTO and (ii) age, BMI, mtDNA haplogroup, FTO, and SUPT3H, the latter being the optimum one because it has one fewer variable. Such a framework is translational and would benefit patients at risk of structurally progressive knee osteoarthritis.
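
The tenfold cross-validation mentioned in the abstract above can be illustrated with a minimal, library-free sketch. It is not the study's pipeline: the helper names (`k_fold_indices`, `cross_validated_accuracy`) and the toy threshold classifier in the usage note are hypothetical, and the study's support vector machine is not reproduced here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(fit, predict, X, y, k=10, seed=0):
    """Average held-out accuracy of a classifier given as fit/predict callables."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)
```

With 901 samples and k=10, this split gives one fold of 91 samples and nine folds of 90; any model exposing `fit(X, y)` and `predict(model, x)` callables, for example a rule such as "age above 50", can be plugged in.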

    Modeling plant diseases under climate change: evolutionary perspectives

    Infectious plant diseases are a major threat to global agricultural productivity, economic development, and ecological integrity. There is widespread concern that the social and natural disasters caused by infectious plant diseases may escalate with climate change, and computer modeling offers a unique opportunity to address this concern. Here, we analyze the intrinsic problems associated with current modeling strategies and highlight the need to integrate evolutionary principles into polytrophic, eco-evolutionary frameworks to improve predictions. In particular, we discuss how evolutionary shifts in functional trade-offs, relative adaptability between plants and pathogens, ecosystems, and climate preferences induced by climate change may feed back into future plant disease epidemics, and how technological advances can facilitate the generation and integration of this knowledge for better modeling predictions.

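    Determining the potential influence of soil on the differentiation of productivity and on the classification of areas susceptible to banana wilt in Venezuela (original title: Determinación de la influencia potencial del suelo en la diferenciación de la productividad y en la clasificación de áreas susceptibles a la marchitez del banano en Venezuela)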

    Banana, the edible fruit of Musaceae, is a staple food for more than 400 million people worldwide because of its nutritional and energy attributes. This makes Musaceae a crop of worldwide relevance, particularly in tropical regions, and highlights the role of improved Musaceae cropping systems in current worldwide efforts towards a new agricultural revolution based on sustainable intensification. To achieve this, better food production practices are needed, based on scientific and technical research capable of considering the complexity and variability within the agri-food sector. The research presented in this PhD thesis seeks to explain the causes of two issues considered highly relevant for banana production, both affecting productivity and sustainability, and always addressed under the conditions of Venezuela, one of the world's largest producing countries: (1) the impact of phytosanitary risks related to Fusarium wilt and the influence of the soil on the incidence of Banana Wilt (BW) caused by a fungal-bacterial complex; (2) an observed trend towards loss of productivity and decline of soil quality in some commercial farms of Aragua and Trujillo states in Venezuela. The first issue, related to banana plant health, has been covered in two consecutive studies. First, Chapter I presents a systematic review of the effect of agro-environmental factors on the impact of Fusarium wilt of bananas, caused by Fusarium oxysporum f. sp. cubense (Foc) tropical race 4 (TR4), and the implications of this disease for the Venezuelan production system. This chapter synthesizes reliable information on the biotic and abiotic factors related to Foc TR4 occurrence, together with a risk analysis and climate suitability maps for Foc TR4 in Venezuela. 
This chapter can serve as a basic summary of the available knowledge for plant health technicians and professionals, as well as for other stakeholders concerned with disease management. The research on plant health issues in banana is completed by the study presented in Chapter II. This chapter analyzes the relationship between soil properties and the incidence of Banana Wilt (BW), a disease of unknown etiology attributed to a fungal-bacterial complex, in a case study of a commercial banana farm in the state of Aragua in Venezuela, where the disease has reduced the planted area by more than 35.0% in recent years. Applying the Random Forest algorithm made it possible to classify BW incidence in lacustrine soils of Venezuela with good precision from the physical and chemical soil properties, making it an effective tool for decision-making in the field. In addition, soil information from banana areas of Venezuela allowed banana lots with high and low BW incidence to be identified, again with the Random Forest algorithm. The model showed that the incidence level (low or high) of BW could be distinguished through its relationship with Zn, Fe, K, Ca, Mn and clay content in the soil. These results can help improve our understanding of the basic mechanisms and progression of BW incidence and identify soil variables that can play a determinant role in predicting the risk and evolution of BW in banana farms on tropical lacustrine soils. The second issue, the relationship between banana productivity and soil properties, has also been covered in two studies. Chapter III contains the research on developing an empirical correlation model to predict productivity from soil characteristics. 
Five soil properties were found to have clear agronomic and environmental importance: Mg, resistance to penetration, total microbial respiration, soil bulk density, and free-living omnivorous nematodes. This model could be used at the field level to reliably identify areas of high and low banana productivity in the studied areas of Venezuela. Finally, Chapter IV presents a study that can broaden the usefulness of soil information derived from soil profile descriptions. It validated the hypothesis that areas of different productivity can be delimited within banana farms in the two main banana-producing areas of Venezuela (Aragua and Trujillo states) using soil morphological properties (e.g., soil structure). For this, we developed a categorical regression prediction model calibrated with soil morphological properties such as biological activity, texture, dry consistency, reaction to HCl and structure type. In the future, if further studies validate this approach under other environmental conditions, banana productivity could be improved using information that may already be available or can be acquired at moderate cost from standard soil profile descriptions. This PhD thesis has combined a systematic bibliographic review with crop and soil information from a systematic survey of different farm types in Venezuela and with soil profile descriptions. Using that information, it has validated the hypothesis that, by identifying the abiotic properties of the soil, both the predisposition of the banana plant to BW and the potential productivity of the crop can be predicted. 
This approach makes it possible to differentiate zones with different levels of productivity and BW risk and, as an immediate consequence, to avoid areas of high risk or low productivity or to adapt agronomic practices to enhance the productivity and sustainability of banana cropping systems in Venezuela.

    FEPI-MB: identifying SNPs-disease association using a Markov Blanket-based approach

    Background: The interactions among genetic factors related to diseases are called epistasis. With the availability of genotype data from genome-wide association studies, it is now possible to computationally unravel epistasis related to susceptibility to common complex human diseases such as asthma, diabetes, and hypertension. However, detecting epistatic interactions is difficult because of the large number of genetic factors and the enormous number of possible combinations of them. Most computational methods for detecting epistatic interactions are predictor-based and cannot find true causal factors. Moreover, they are both time-consuming and sample-consuming. Results: We propose a new and fast Markov Blanket-based method, FEPI-MB (Fast EPistatic Interactions detection using Markov Blanket), for detecting epistatic interactions. The Markov Blanket is a minimal set of variables that can completely shield the target variable from all other variables. Learning Markov blankets can be used to detect epistatic interactions through a heuristic search for a minimal set of SNPs that may cause the disease. Experimental results on both simulated data sets and a real data set demonstrate that FEPI-MB significantly outperforms other existing methods and is capable of finding SNPs that have a strong association with common diseases. Conclusions: The FEPI-MB algorithm outperforms other computational methods for detecting epistatic interactions in terms of both power and sample efficiency. Moreover, compared to other Markov Blanket learning methods, FEPI-MB is more time-efficient and achieves better performance.
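
The "shielding" property in the FEPI-MB abstract above can be written as a conditional independence statement. The notation below is the standard textbook formulation and is an assumption here, not notation taken from the paper: $T$ is the target (disease status), $V$ the set of all SNP variables, and $MB(T) \subseteq V$ the Markov blanket.

```latex
% T is conditionally independent of every remaining variable given its blanket:
T \;\perp\!\!\!\perp\; V \setminus MB(T) \;\big|\; MB(T)
% equivalently, for any subset Z of the remaining variables:
P\bigl(T \mid MB(T),\, Z\bigr) = P\bigl(T \mid MB(T)\bigr),
\qquad Z \subseteq V \setminus MB(T)
```

Minimality means that no proper subset of $MB(T)$ satisfies the same condition, which is why a search for the blanket amounts to a search for a smallest set of SNPs that carries all the information about the disease.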

    The importance of disease incidence rate on performance of GBLUP, threshold BayesA and machine learning methods in original and imputed data set

    Aim of study: To predict genomic accuracy of binary traits considering different rates of disease incidence. Area of study: Simulation. Material and methods: Two machine learning algorithms, Boosting and Random Forest (RF), as well as threshold BayesA (TBA) and genomic BLUP (GBLUP), were employed. The predictive ability of the methods was evaluated for different genomic architectures using imputed genotypes (i.e., 2.5K, 12.5K and 25K panels) and their original 50K genotypes. We evaluated the three strategies with different rates of disease incidence (16%, 50% and 84% threshold points) and their effects on genomic prediction accuracy. Main results: Genotype imputation performed poorly in estimating the predictive ability of the GBLUP, RF, Boosting and TBA methods when using the low-density single nucleotide polymorphism (SNP) chip in low linkage disequilibrium (LD) scenarios. The highest predictive ability, when the rate of disease incidence in the training set was 16%, belonged to the GBLUP, RF, Boosting and TBA methods. Across different genomic architectures, the Boosting method performed better than the TBA, GBLUP and RF methods for all scenarios and proportions of the marker sets imputed. Regarding the changes, RF resulted in a further reduction compared to Boosting, TBA and GBLUP, especially when the applied data set contained 2.5K panels of the imputed genotypes. Research highlights: Generally, considering the high sensitivity of the methods to imputation errors, the application of imputed genotypes with the RF method should be carefully evaluated.
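
One plausible reading of the 16%, 50% and 84% "threshold points" above is a liability-threshold model, in which an individual is affected when an underlying standard-normal liability exceeds a cutoff. Under that assumption (the abstract does not spell out the simulation design), the three incidence rates correspond to cutoffs of roughly +1, 0 and -1 standard deviations. The function names below are hypothetical.

```python
import random
from statistics import NormalDist


def liability_threshold(incidence):
    """Cutoff on a standard-normal liability such that P(liability > cutoff) = incidence."""
    return NormalDist().inv_cdf(1.0 - incidence)


def simulate_binary_trait(n, incidence, seed=0):
    """Draw n standard-normal liabilities and dichotomize them at the cutoff (1 = affected)."""
    rng = random.Random(seed)
    cutoff = liability_threshold(incidence)
    return [1 if rng.gauss(0.0, 1.0) > cutoff else 0 for _ in range(n)]


if __name__ == "__main__":
    for rate in (0.16, 0.50, 0.84):
        print(f"incidence {rate:.2f} -> cutoff {liability_threshold(rate):+.3f} SD")
```

The design choice here is only to show how an incidence rate maps to a cutoff on the liability scale; the genomic architectures, imputation panels and prediction methods in the study are outside the scope of the sketch.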

    CLINICAL AND BIOLOGICALLY-BASED APPROACHES FOR CLASSIFYING AND PREDICTING EARLY OUTCOMES OF CHRONIC CHILDHOOD ARTHRITIS

    Background: Juvenile idiopathic arthritis (JIA) comprises a heterogeneous group of conditions that share chronic arthritis as a common characteristic. Current classification criteria for chronic childhood arthritis have limitations. Despite new treatment strategies and medications, some patients continue to have persistently active and disabling disease as adults. Few predictors of poor outcomes have been identified. Objectives: This thesis comprises two complementary studies. The objective of the first study was to identify discrete clusters comprising clinical features and inflammatory biomarkers in children with JIA and to compare them with the current JIA categories proposed by the International League of Associations for Rheumatology. The second study aimed to identify predictors of short-term arthritis activity based on clinical and biomarker profiles in JIA patients. Methods: For both studies we used data collected in a Canadian nationwide, prospective, longitudinal cohort study titled Biologically-Based Outcome Predictors in JIA. Clustering and classification algorithms were applied to the data to accomplish both study objectives. Results: This research identified three clusters of patients at visit 1 (enrolment) and five clusters at visit 2 (6 months). The clusters revealed in this analysis exposed different and more homogeneous subgroups compared to the seven conventional JIA categories. In the second study, the presence or absence of active joints, physician global assessments, and Wallace criteria were chosen as outcome variables 18 months post-enrolment. Among 112 variables, 17 were selected as the best predictors of 18-month outcomes. The panel predicted presence or absence of active arthritis, physician global assessment, and Wallace criteria of inactive disease 18 months after diagnosis with 79%, 82%, and 71% accuracy and 0.83, 0.86, and 0.82 area under the curve (AUC), respectively. The accuracy and AUC values were higher than when only clinical features were used for prediction. Conclusion: The results of this study suggest that certain groups of patients within different JIA categories are more aligned pathobiologically than their separate clinical categorizations suggest. Further, the research found that a small number of clinical and inflammatory variables at diagnosis can predict short-term arthritis activity in JIA more accurately than clinical characteristics alone.
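
The two metrics reported above, accuracy and area under the ROC curve (AUC), measure different things: accuracy depends on a chosen decision threshold, while AUC summarizes how well the scores rank positives above negatives. A minimal sketch follows, using the pairwise-ranking definition of AUC; the function names and the toy labels and scores in the usage note are illustrative assumptions, not data from the thesis.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with labels `[1, 1, 0, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.3, 0.6, 0.2]`, eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9, while thresholding the scores at 0.5 misclassifies one sample and gives an accuracy of 5/6.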

    Transcriptome Prediction Performance Across Machine Learning Models and Diverse Ancestries

    Transcriptome prediction methods such as PrediXcan and FUSION have become popular in complex trait mapping. Most transcriptome prediction models have been trained in European populations using methods that make parametric linear assumptions, such as the elastic net (EN). To potentially further optimize imputation performance of gene expression across global populations, we built transcriptome prediction models using both linear and non-linear machine learning (ML) algorithms and evaluated their performance in comparison to EN. We trained models using genotype and blood monocyte transcriptome data from the Multi-Ethnic Study of Atherosclerosis (MESA), comprising individuals of African, Hispanic, and European ancestries, and tested them using genotype and whole-blood transcriptome data from the Modeling the Epidemiology Transition Study (METS), comprising individuals of African ancestries. We show that prediction performance is highest when the training and testing populations share similar ancestries, regardless of the prediction algorithm used. While EN generally outperformed random forest (RF), support vector regression (SVR), and K nearest neighbor (KNN), we found that RF outperformed EN for some genes, particularly between disparate ancestries, suggesting potential robustness and reduced variability of RF imputation performance across global populations. When applied to a high-density lipoprotein (HDL) phenotype, we show that including RF prediction models in PrediXcan revealed potential gene associations missed by EN models. Therefore, by integrating other ML modeling into PrediXcan and diversifying our training populations to include more global ancestries, we may uncover new genes associated with complex traits.
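
For reference, the elastic net named above is usually written as a penalized least-squares problem. The form below uses the common glmnet-style parametrization, which is an assumption here since the abstract does not state the exact settings: $y_i$ is the expression of one gene in individual $i$, $x_i$ the vector of nearby SNP genotypes, $\lambda \ge 0$ the overall penalty strength, and $\alpha \in [0, 1]$ the mixing weight between the lasso and ridge penalties.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  + \lambda\Bigl[\alpha\,\lVert\beta\rVert_1
  + \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Bigr]
```

Setting $\alpha = 1$ recovers the lasso and $\alpha = 0$ recovers ridge regression. The model stays linear in the genotypes for any $\alpha$, which is the "parametric linear assumption" that the non-linear methods in the abstract (RF, SVR, KNN) relax.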

    Artificial Neural Networks in Agriculture

    Modern agriculture needs to combine high production efficiency with high product quality. This applies to both crop and livestock production. To meet these requirements, advanced methods of data analysis are used more and more frequently, including those derived from artificial intelligence. Artificial neural networks (ANNs) are one of the most popular tools of this kind. They are widely used in solving various classification and prediction tasks and, for some time now, also in the broadly defined field of agriculture. They can form part of precision farming and decision support systems. Artificial neural networks can replace classical methods of modelling many problems and are one of the main alternatives to classical mathematical models. The spectrum of applications of artificial neural networks is very wide. For a long time now, researchers from all over the world have been using these tools to support agricultural production, making it more efficient and providing the highest-quality products possible.

    CAD Tools for DNA Micro-Array Design, Manufacture and Application

    Motivation: As the human genome project progresses and some microbial and eukaryotic genomes are recognized, numerous biotechnological processes have recently attracted an increasing number of biologists, bioengineers and computer scientists. Biotechnological processes profoundly involve the production and analysis of high-throughput experimental data. Numerous sequence libraries of DNA and protein structures for a large number of micro-organisms, and a variety of other databases related to biology and chemistry, are available. For example, microarray technology, a novel biotechnology, promises to monitor the whole genome at once, so that researchers can study the whole genome at the global level and get a better picture of the expression of millions of genes simultaneously. Today, it is widely used in many fields: disease diagnosis, gene classification, gene regulatory networks, and drug discovery. For example, designing an organism-specific microarray and analyzing experimental data require combining heterogeneous computational tools that usually differ in data format, such as GeneMark for ORF extraction, Promide for DNA probe selection, Chip for probe placement on the microarray chip, BLAST to compare sequences, MEGA for phylogenetic analysis, and ClustalX for multiple alignments. Solution: Surprisingly enough, despite the huge research efforts invested in DNA array applications, very few works are devoted to computer-aided optimization of DNA array design and manufacturing. Current design practices are dominated by ad hoc heuristics incorporated in proprietary tools with unknown suboptimality. This will soon become a bottleneck for the new generation of high-density arrays, such as the ones currently being designed at Perlegen [109]. The goal of the already accomplished research was to develop highly scalable tools, with predictable runtime and quality, for cost-effective, computer-aided design and manufacturing of DNA probe arrays. We illustrate the utility of our approach with a concrete example that combines the design tools of microarray technology for Herpes B virus DNA data.