    Gene Function Classification Using Bayesian Models with Hierarchy-Based Priors

    We investigate the application of hierarchical classification schemes to the annotation of gene function based on several characteristics of protein sequences, including phylogenic descriptors, sequence-based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their performance in terms of predictive accuracy. These models are the ordinary multinomial logit (MNL) model, a hierarchical model based on a set of nested MNL models, and an MNL model with a prior that introduces correlations between the parameters for classes that are nearby in the hierarchy. We also provide a new scheme for combining different sources of information. We use these models to predict the functional class of Open Reading Frames (ORFs) from the E. coli genome. The results from all three models show substantial improvement over previous methods, which were based on the C5 algorithm. The MNL model using a prior based on the hierarchy outperforms both the non-hierarchical MNL model and the nested MNL model. In contrast to previous attempts at combining these sources of information, our approach results in a higher accuracy rate than models that use each data source alone. Together, these results show that gene function can be predicted with higher accuracy than previously achieved, using Bayesian models that incorporate suitable prior information.
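
    The hierarchy-based prior can be illustrated with a minimal Python sketch. It assumes a common construction (not necessarily the exact parameterization used in this work) in which each leaf class's coefficient vector is the sum of independent Gaussian node parameters along its root-to-leaf path, so classes that share ancestors receive correlated coefficients. The class paths, dimension, and prior scale are hypothetical. Class probabilities then come from an ordinary MNL (softmax) over linear scores.

```python
import math
import random

# Hypothetical two-level hierarchy: each leaf class maps to the list of
# nodes on its path from the root (names are illustrative only).
CLASS_PATHS = {
    "metabolism/energy": ["metabolism", "metabolism/energy"],
    "metabolism/transport": ["metabolism", "metabolism/transport"],
    "information/translation": ["information", "information/translation"],
}


def sample_node_params(nodes, dim, sigma, rng):
    """Draw an independent N(0, sigma^2) parameter vector for each node."""
    return {n: [rng.gauss(0.0, sigma) for _ in range(dim)] for n in nodes}


def class_coefficients(class_paths, node_params, dim):
    """Sum node parameters along each root-to-leaf path.

    Leaves that share ancestors share summands, so their coefficient
    vectors are positively correlated a priori (assumed prior form).
    """
    beta = {}
    for cls, path in class_paths.items():
        vec = [0.0] * dim
        for node in path:
            vec = [v + p for v, p in zip(vec, node_params[node])]
        beta[cls] = vec
    return beta


def mnl_probabilities(x, beta):
    """Multinomial logit: softmax over the linear scores x . beta_j."""
    scores = {c: sum(xi * bi for xi, bi in zip(x, b)) for c, b in beta.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    nodes = sorted({n for path in CLASS_PATHS.values() for n in path})
    params = sample_node_params(nodes, dim=4, sigma=1.0, rng=rng)
    beta = class_coefficients(CLASS_PATHS, params, dim=4)
    print(mnl_probabilities([0.5, -1.0, 0.2, 0.0], beta))
```

    Giving every class a path consisting of its own leaf node only removes the shared terms and recovers independent coefficients, i.e. the ordinary non-hierarchical MNL prior.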

    Incorporating field wind data to improve crop evapotranspiration parameterization in heterogeneous regions

    Accurate parameterization of reference evapotranspiration (ET0) is necessary for optimizing irrigation scheduling and avoiding the costs of over-irrigation (water expense, loss of water productivity, energy costs, and pollution) or under-irrigation (crop stress and suboptimal yields or quality). ET0 is often estimated using the FAO-56 method with meteorological data gathered over a reference surface, usually short grass. However, the density of suitable ET0 stations is often low relative to the microclimatic variability of many arid and semi-arid regions, which can make ET0 inaccurate for irrigation scheduling. In this study, we investigated multiple ET0 products from six meteorological stations, a satellite ET0 product, and an integration (merger) of two stations' data in Southern California, USA. We evaluated ET0 against lysimetric ET observations from two lysimeter systems (weighing and volumetric) and two crops (wine grapes and Jerusalem artichoke) by calculating crop ET (ETc) from each ET0 product using crop coefficients for the lysimetric crops. ETc calculated with ET0 products that incorporated field-specific wind speed agreed more closely with lysimetric ET: using on-field anemometer data instead of wind data from the nearest station reduced RMSE by 36% for grape and 45% for Jerusalem artichoke. The results indicate the potential importance of on-site meteorological sensors for ET0 parameterization, particularly where microclimates are highly variable and/or irrigation water is expensive or scarce.
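
    The evaluation chain described above (ETc from ET0 via crop coefficients, then RMSE against lysimetric ET) can be sketched in Python. This assumes the FAO-56 single crop-coefficient form ETc = Kc × ET0 applied to daily series; the function names are hypothetical, and no study data are included.

```python
import math
from typing import List, Sequence


def crop_et(et0: Sequence[float], kc: Sequence[float]) -> List[float]:
    """Single crop-coefficient approach: ETc = Kc * ET0 for each day (mm/day)."""
    if len(et0) != len(kc):
        raise ValueError("et0 and kc must have the same length")
    return [k * e for k, e in zip(kc, et0)]


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root-mean-square error between predicted ETc and observed (lysimetric) ET."""
    if len(pred) != len(obs) or not pred:
        raise ValueError("need equal-length, non-empty series")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def percent_reduction(rmse_baseline: float, rmse_new: float) -> float:
    """Relative RMSE reduction (%) of a new ET0 product versus a baseline product."""
    return 100.0 * (rmse_baseline - rmse_new) / rmse_baseline
```

    Comparing two ET0 products then amounts to computing rmse(crop_et(et0_station, kc), lysimeter_et) and rmse(crop_et(et0_field, kc), lysimeter_et) and passing both values to percent_reduction, which yields the kind of relative reduction reported above.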

    Stroke genetics informs drug discovery and risk prediction across ancestries

    Previous genome-wide association studies (GWASs) of stroke, the second leading cause of death worldwide, were conducted predominantly in populations of European ancestry(1,2). Here, in cross-ancestry GWAS meta-analyses of 110,182 patients who have had a stroke (five ancestries, 33% non-European) and 1,503,898 control individuals, we identify association signals for stroke and its subtypes at 89 (61 new) independent loci: 60 in primary inverse-variance-weighted analyses and 29 in secondary meta-regression and multitrait analyses. On the basis of internal cross-ancestry validation and an independent follow-up in 89,084 additional cases of stroke (30% non-European) and 1,013,843 control individuals, 87% of the primary stroke risk loci and 60% of the secondary stroke risk loci were replicated (P < 0.05). Effect sizes were highly correlated across ancestries. Cross-ancestry fine-mapping, in silico mutagenesis analysis(3), and transcriptome-wide and proteome-wide association analyses revealed putative causal genes (such as SH3PXD2A and FURIN) and variants (such as at GRK5 and NOS3). Using a three-pronged approach(4), we provide genetic evidence for putative drug effects, highlighting F11, KLKB1, PROC, GP1BA, LAMC2 and VCAM1 as possible targets, with drugs already under investigation for stroke for F11 and PROC. A polygenic score integrating cross-ancestry and ancestry-specific stroke GWASs with vascular-risk factor GWASs (integrative polygenic scores) strongly predicted ischaemic stroke in populations of European, East Asian and African ancestry(5). Stroke genetic risk scores were predictive of ischaemic stroke independent of clinical risk factors in 52,600 clinical-trial participants with cardiometabolic disease. Our results provide insights to inform biology, reveal potential drug targets and derive genetic risk prediction tools across ancestries.
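
    The primary analyses above are inverse-variance weighted. A minimal Python sketch of a fixed-effect inverse-variance-weighted meta-analysis for a single variant follows. It assumes per-study effect estimates (for example log odds ratios) with standard errors and a two-sided normal test; the function name is hypothetical, and this is a generic illustration rather than the pipeline used in the study.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of one variant.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   their standard errors
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need equal-length, non-empty inputs")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / se_i^2
    w_sum = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / w_sum
    pooled_se = math.sqrt(1.0 / w_sum)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled, pooled_se, p
```

    Studies with smaller standard errors receive proportionally more weight; with two studies of equal precision, the pooled estimate is simply their mean.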

    Amine-synthesizing enzyme N-substituted formamide deformylase: Screening, purification, characterization, and gene cloning

    In isonitrile metabolism, N-substituted formamide is produced through the hydration of an isonitrile by isonitrile hydratase. This compound was further degraded by a microorganism, strain F164, which was isolated from soil through an acclimatization culture. The N-substituted formamide-degrading microorganism was identified as Arthrobacter pascens. The microbial degradation was found to proceed through an enzymatic reaction in which the N-substituted formamide is hydrolyzed to the corresponding amine and formate. The enzyme, designated N-substituted formamide deformylase (NfdA), was purified and characterized. The native enzyme had a molecular mass of ≈61 kDa and consisted of two identical subunits. It stoichiometrically catalyzed the hydrolysis of N-benzylformamide (an N-substituted formamide) to benzylamine and formate. Of all the N-substituted formamides tested, N-benzylformamide was the most suitable substrate for the enzyme. However, no amides were accepted as substrates. The gene (nfdA) encoding this enzyme was also cloned. Among known proteins, the deduced amino acid sequence of nfdA showed the highest overall sequence identity (28%) with regulatory proteins. Only the N-terminal region of NfdA (residues 58–72) showed significant sequence identity (27–73%) to the corresponding region of each member of the amidohydrolase superfamily; outside this limited region, there was no similarity in the overall sequence.