59 research outputs found
Gene Function Classification Using Bayesian Models with Hierarchy-Based Priors
We investigate the application of hierarchical classification schemes to the
annotation of gene function based on several characteristics of protein
sequences, including phylogenetic descriptors, sequence-based attributes, and
predicted secondary structure. We discuss three Bayesian models and compare
their performance in terms of predictive accuracy. These models are the
ordinary multinomial logit (MNL) model, a hierarchical model based on a set of
nested MNL models, and an MNL model with a prior that introduces correlations
between the parameters for classes that are nearby in the hierarchy. We also
provide a new scheme for combining different sources of information. We use
these models to predict the functional class of Open Reading Frames (ORFs) from
the E. coli genome. The results from all three models show substantial
improvement over previous methods, which were based on the C5 algorithm. The
MNL model using a prior based on the hierarchy outperforms both the
non-hierarchical MNL model and the nested MNL model. In contrast to previous
attempts at combining these sources of information, our approach results in a
higher accuracy rate when compared to models that use each data source alone.
Together, these results show that gene function can be predicted with higher
accuracy than previously achieved, using Bayesian models that incorporate
suitable prior information.
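As a rough illustration of how a prior can correlate the parameters of classes that are nearby in a hierarchy, the sketch below assumes one simple construction: every branch of the tree carries an independent zero-mean parameter, and a class coefficient is the sum of the parameters along its path from the root. The abstract does not spell out the construction, so this is an assumption, and the tree, the class names, and the `branch_var` parameter are hypothetical.

```python
# Hypothetical two-level hierarchy: each class is identified by the
# branches on its path from the root (names are illustrative only).
PATHS = {
    "A1": ("A", "A1"),
    "A2": ("A", "A2"),
    "B1": ("B", "B1"),
}


def prior_covariance(class_a, class_b, branch_var=1.0):
    """Prior covariance between one coefficient of two classes.

    Assumption: each branch has an independent zero-mean parameter with
    variance branch_var, and a class coefficient is the sum of the
    parameters on its path. The covariance is then the number of shared
    branches times branch_var.
    """
    shared = set(PATHS[class_a]) & set(PATHS[class_b])
    return len(shared) * branch_var


def prior_correlation(class_a, class_b, branch_var=1.0):
    """Prior correlation implied by the shared-branch construction."""
    cov = prior_covariance(class_a, class_b, branch_var)
    var_a = prior_covariance(class_a, class_a, branch_var)
    var_b = prior_covariance(class_b, class_b, branch_var)
    return cov / (var_a * var_b) ** 0.5
```

Under these assumptions, sibling classes such as "A1" and "A2" share one of their two branches and are positively correlated a priori, while classes under different top-level branches are uncorrelated.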
Incorporating field wind data to improve crop evapotranspiration parameterization in heterogeneous regions
Accurate parameterization of reference evapotranspiration (ET0) is necessary for optimizing irrigation scheduling and avoiding costs associated with over-irrigation (water expense, loss of water productivity, energy costs, and pollution) or with under-irrigation (crop stress and suboptimal yields or quality). ET0 is often estimated using the FAO-56 method with meteorological data gathered over a reference surface, usually short grass. However, the density of suitable ET0 stations is often low relative to the microclimatic variability of many arid and semi-arid regions, leading to a potentially inaccurate ET0 for irrigation scheduling. In this study, we investigated multiple ET0 products from six meteorological stations, a satellite ET0 product, and integration (merger) of two stations’ data in Southern California, USA. We evaluated ET0 against lysimetric ET observations from two lysimeter systems (weighing and volumetric) and two crops (wine grapes and Jerusalem artichoke) by calculating crop ET (ETc) using crop coefficients for the lysimetric crops with the different ET0. ETc calculated with ET0 products that incorporated field-specific wind speed had closer agreement with lysimetric ET, with RMSE reduced by 36% and 45% for grape and Jerusalem artichoke, respectively, with on-field anemometer data compared to wind data from the nearest station. The results indicate the potential importance of on-site meteorological sensors for ET0 parameterization, particularly where microclimates are highly variable and/or irrigation water is expensive or scarce.
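The evaluation described above can be sketched in a few lines, assuming the single-crop-coefficient form ETc = Kc × ET0 and a plain root-mean-square error against lysimeter readings. The function names are hypothetical, and no values from the study are reproduced here.

```python
import math


def crop_et(et0_series, kc):
    """Crop evapotranspiration, assuming ETc = Kc * ET0 for each time step."""
    return [kc * et0 for et0 in et0_series]


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length series."""
    if len(predicted) != len(observed):
        raise ValueError("series must have the same length")
    if not predicted:
        raise ValueError("series must not be empty")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

Comparing `rmse(crop_et(et0_station, kc), lysimeter_et)` with the same quantity computed from an ET0 series that uses on-field wind speed would give the kind of relative RMSE reduction the abstract reports. The series names here are placeholders.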
Stroke genetics informs drug discovery and risk prediction across ancestries
Previous genome-wide association studies (GWASs) of stroke - the second leading cause of death worldwide - were conducted predominantly in populations of European ancestry(1,2). Here, in cross-ancestry GWAS meta-analyses of 110,182 patients who have had a stroke (five ancestries, 33% non-European) and 1,503,898 control individuals, we identify association signals for stroke and its subtypes at 89 (61 new) independent loci: 60 in primary inverse-variance-weighted analyses and 29 in secondary meta-regression and multitrait analyses. On the basis of internal cross-ancestry validation and an independent follow-up in 89,084 additional cases of stroke (30% non-European) and 1,013,843 control individuals, 87% of the primary stroke risk loci and 60% of the secondary stroke risk loci were replicated (P < 0.05). Effect sizes were highly correlated across ancestries. Cross-ancestry fine-mapping, in silico mutagenesis analysis(3), and transcriptome-wide and proteome-wide association analyses revealed putative causal genes (such as SH3PXD2A and FURIN) and variants (such as at GRK5 and NOS3). Using a three-pronged approach(4), we provide genetic evidence for putative drug effects, highlighting F11, KLKB1, PROC, GP1BA, LAMC2 and VCAM1 as possible targets, with drugs already under investigation for stroke for F11 and PROC. A polygenic score integrating cross-ancestry and ancestry-specific stroke GWASs with vascular-risk factor GWASs (integrative polygenic scores) strongly predicted ischaemic stroke in populations of European, East Asian and African ancestry(5). Stroke genetic risk scores were predictive of ischaemic stroke independent of clinical risk factors in 52,600 clinical-trial participants with cardiometabolic disease. Our results provide insights to inform biology, reveal potential drug targets and derive genetic risk prediction tools across ancestries.
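For readers unfamiliar with inverse-variance weighting, the following is a minimal sketch of a textbook fixed-effect inverse-variance-weighted combination of per-study effect estimates for a single variant. It is not the study's pipeline; the function name and inputs are hypothetical.

```python
import math


def ivw_meta(betas, standard_errors):
    """Fixed-effect inverse-variance-weighted estimate and its standard error.

    Each study gets weight 1 / se**2; the pooled effect is the weighted
    mean of the betas and the pooled variance is 1 / sum(weights).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need equal-length, non-empty inputs")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se
```

Studies with smaller standard errors dominate the pooled estimate, which is why large cohorts carry most of the weight in such an analysis.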
Amine-synthesizing enzyme N-substituted formamide deformylase: Screening, purification, characterization, and gene cloning
N-substituted formamide was produced through the hydration of an isonitrile by isonitrile hydratase in the isonitrile metabolism. The former compound was further degraded by a microorganism, strain F164, which was isolated from soil through an acclimatization culture. The N-substituted formamide-degrading microorganism was identified as Arthrobacter pascens. The microbial degradation was found to proceed through an enzymatic reaction, the N-substituted formamide being hydrolyzed to yield the corresponding amine and formate. The enzyme, designated as N-substituted formamide deformylase (NfdA), was purified and characterized. The native enzyme had a molecular mass of ≈61 kDa and consisted of two identical subunits. It stoichiometrically catalyzed the hydrolysis of N-benzylformamide (an N-substituted formamide) to benzylamine and formate. Of all the N-substituted formamides tested, N-benzylformamide was the most suitable substrate for the enzyme. However, no amides were accepted as substrates. The gene (nfdA) encoding this enzyme was also cloned. Among known proteins, the deduced amino acid sequence of nfdA exhibited the highest overall sequence identity (28%) with those of regulatory proteins. Only the N-terminal region (residues 58–72) of NfdA showed significant sequence identity (27–73%) to that of each member of the amidohydrolase superfamily; outside this limited region, there was no similarity in the overall sequence.
- …