532 research outputs found

    Improved genome annotation through untargeted detection of pathway-specific metabolites

    Background: Mass spectrometry-based metabolomics analyses have the potential to complement sequence-based methods of genome annotation, but only if raw mass spectral data can be linked to specific metabolic pathways. In untargeted metabolomics, the measured mass of a detected compound is used to define the location of the compound in chemical space, but uncertainties in mass measurements lead to "degeneracies" in chemical space since multiple chemical formulae correspond to the same measured mass. We compare two methods to eliminate these degeneracies. One method relies on natural isotopic abundances, and the other relies on the use of stable-isotope labeling (SIL) to directly determine C and N atom counts. Both depend on combinatorial explorations of the "chemical space" comprised of all possible chemical formulae comprised of biologically relevant chemical elements. Results: Of 1532 metabolic pathways curated in the MetaCyc database, 412 contain a metabolite having a chemical formula unique to that metabolic pathway. Thus, chemical formulae alone can suffice to infer the presence of some metabolic pathways. Of 248,928 unique chemical formulae selected from the PubChem database, more than 95% had at least one degeneracy on the basis of accurate mass information alone. Consideration of natural isotopic abundance reduced degeneracy to 64%, but mainly for formulae less than 500 Da in molecular weight, and only if the error in the relative isotopic peak intensity was less than 10%. Knowledge of exact C and N atom counts as determined by SIL enabled reduced degeneracy, allowing for determination of unique chemical formula for 55% of the PubChem formulae. Conclusions: To facilitate the assignment of chemical formulae to unknown mass-spectral features, profiling can be performed on cultures uniformly labeled with stable isotopes of nitrogen (¹⁵N) or carbon (¹³C). This makes it possible to accurately count the number of carbon and nitrogen atoms in each molecule, providing a robust means for reducing the degeneracy of chemical space and thus obtaining unique chemical formulae for features measured in untargeted metabolomics having a mass greater than 500 Da, with relative errors in measured isotopic peak intensity greater than 10%, and without the use of a chemical formula generator dependent on heuristic filtering. These chemical formulae can serve as indicators for the presence of particular metabolic pathways.
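The degeneracy described in this abstract can be illustrated with a minimal sketch. It is not the authors' formula generator: the element set is restricted to C, H, N and O, the 5 ppm tolerance is an assumed value, no chemical plausibility filter is applied, and the function names `candidate_formulae` and `filter_by_sil` are hypothetical.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.9949146196}


def mono_mass(c, h, n, o):
    """Monoisotopic mass of the formula C_c H_h N_n O_o."""
    return c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]


def candidate_formulae(mass, ppm=5.0):
    """List every (C, H, N, O) count whose mass lies within `ppm` of `mass`.

    Assumption: brute-force enumeration with no valence or heuristic filter,
    so chemically implausible formulae are included.
    """
    tol = mass * ppm * 1e-6
    upper = mass + tol
    hits = []
    for c in range(int(upper // MONO["C"]) + 1):
        for n in range(int((upper - c * MONO["C"]) // MONO["N"]) + 1):
            heavy_cn = c * MONO["C"] + n * MONO["N"]
            for o in range(int((upper - heavy_cn) // MONO["O"]) + 1):
                # The tolerance is far smaller than one H mass, so only the
                # nearest integer H count can possibly match.
                rest = mass - heavy_cn - o * MONO["O"]
                h = max(0, round(rest / MONO["H"]))
                if abs(mono_mass(c, h, n, o) - mass) <= tol:
                    hits.append((c, h, n, o))
    return hits


def filter_by_sil(hits, c_count, n_count):
    """Keep only candidates matching C and N counts, as SIL would provide."""
    return [f for f in hits if f[0] == c_count and f[2] == n_count]


glucose_mass = mono_mass(6, 12, 0, 6)      # C6H12O6
hits = candidate_formulae(glucose_mass)    # several formulae share this mass
unique = filter_by_sil(hits, c_count=6, n_count=0)
```

In this sketch the accurate mass alone leaves more than one candidate, while fixing the C and N counts leaves only C6H12O6, which mirrors the role the abstract assigns to SIL.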

    Tales from the EMR: Does a 21st-Century Data Warehouse Facilitate Clinical Research for Pancreatic Cancer?

    Background: The importance of an electronic medical record has been highlighted for both clinical care and research. In the current era, data warehouses and repositories have been established to serve the dual function of patient care and investigation. Purpose: The aim of this study was to compare a newly developed institutional clinical data warehouse, linked with the hospital information system (HIS), to a prospectively-maintained departmental database. Methods: A novel HIS-linked institutional clinical data warehouse was queried for 9 primary and secondary ICD-9-CM discharge diagnosis codes for pancreatic cancer. The database captured inpatient and outpatient clinical and billing information from a pool of over 2 million patients evaluated at an academic medical institution and its affiliates since 1995. A cohort was identified; following Institutional Review Board approval, demographic and clinical data was obtained. This data was compared to a manually-entered and prospectively-maintained surgical oncology database of the same institution, tracking 394 patients since 1999. Duplicated patients, and those unique to either dataset, were flagged. Patients with diagnosis dates prior to 1999 were excluded to allow comparison over the same time period. For validation purposes, a 10% random sample of remaining patients unique to each dataset underwent manual review of medical records including clinic notes, admission/discharge notes, diagnostic imaging, and pathology reports. Results: 1107 patients were identified from the HIS-linked dataset with pancreatic neoplasm-associated diagnosis codes dating from 1999 to 2009. Of these, 254 (22.9%) were captured in both datasets, while 853 (77.1%) were only in the HIS-linked dataset. 
Manual review of the 10% subset of the HIS-only group demonstrated that 55.6% of patients had no identifiable pancreatic pathology, suggesting miscoding, while 31.7% had diagnoses consistent with pancreatic neoplasm, and 12.7% had pseudocyst or pancreatitis. Of the 394 patients tracked by surgical oncology, 254 (64.5%) were captured in both datasets, while 140 (35.5%) had not been captured in the HIS-linked dataset. Manual review of the 10% subset of the non-captured patients demonstrated 93.3% with pancreatic neoplasm and 6.7% with pseudocyst or pancreatitis. Lastly, a review of the 10% subset of the 254-patient overlap demonstrated that 87.5% of patients had pancreatic neoplasm, 8.3% had pseudocyst or pancreatitis, and 4.2% had no pancreatic pathology. Conclusions: While technological advances provide a powerful means to automate institutional-level cohort identification and data collection, a high degree of misclassification may be present if queries are based solely on ICD-9-CM discharge codes. For that reason, careful validation and data cleaning are critical steps prior to research use. These results also suggest cautious interpretation of national-level administrative data utilizing ICD-9-CM diagnosis codes. Our findings suggest that the current state-of-the-art data warehouses continue to require clinical correlation and validation through traditional retrospective mechanisms.
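The overlap percentages reported above follow from simple set arithmetic on patient identifiers. The sketch below uses synthetic integer IDs (an assumption, chosen only so that the set sizes match the reported counts of 1107, 394 and 254); the function name `overlap_summary` is hypothetical and nothing here reflects the study's actual record linkage.

```python
def overlap_summary(warehouse_ids, registry_ids):
    """Count shared and unique patients and express them as percentages."""
    both = warehouse_ids & registry_ids
    only_warehouse = warehouse_ids - registry_ids
    only_registry = registry_ids - warehouse_ids

    def pct(part, whole):
        return round(100 * len(part) / len(whole), 1)

    return {
        "both": len(both),
        "warehouse_only": len(only_warehouse),
        "registry_only": len(only_registry),
        "pct_warehouse_in_both": pct(both, warehouse_ids),
        "pct_warehouse_only": pct(only_warehouse, warehouse_ids),
        "pct_registry_in_both": pct(both, registry_ids),
        "pct_registry_only": pct(only_registry, registry_ids),
    }


# Synthetic IDs: 1107 warehouse patients, 394 registry patients, 254 shared.
warehouse = set(range(1107))
registry = set(range(853, 853 + 394))
summary = overlap_summary(warehouse, registry)
```

With these set sizes the summary reproduces the reported 22.9% and 77.1% for the HIS-linked dataset and 64.5% and 35.5% for the surgical oncology database.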

    A Semi-Quantitative, Synteny-Based Method to Improve Functional Predictions for Hypothetical and Poorly Annotated Bacterial and Archaeal Genes

    During microbial evolution, genome rearrangement increases with increasing sequence divergence. If the relationship between synteny and sequence divergence can be modeled, gene clusters in genomes of distantly related organisms exhibiting anomalous synteny can be identified and used to infer functional conservation. We applied the phylogenetic pairwise comparison method to establish and model a strong correlation between synteny and sequence divergence in all 634 available Archaeal and Bacterial genomes from the NCBI database and four newly assembled genomes of uncultivated Archaea from an acid mine drainage (AMD) community. In parallel, we established and modeled the trend between synteny and functional relatedness in the 118 genomes available in the STRING database. By combining these models, we developed a gene functional annotation method that weights evolutionary distance to estimate the probability of functional associations of syntenous proteins between genome pairs. The method was applied to the hypothetical proteins and poorly annotated genes in newly assembled acid mine drainage Archaeal genomes to add or improve gene annotations. This is the first method to assign possible functions to poorly annotated genes through quantification of the probability of gene functional relationships based on synteny at a significant evolutionary distance, and has the potential for broad application

    Comparative genomics in acid mine drainage biofilm communities reveals metabolic and structural differentiation of co-occurring archaea

    Background: Metal sulfide mineral dissolution during bioleaching and acid mine drainage (AMD) formation creates an environment that is inhospitable to most life. Despite dominance by a small number of bacteria, AMD microbial biofilm communities contain a notable variety of coexisting and closely related Euryarchaea, most of which have defied cultivation efforts. For this reason, we used metagenomics to analyze variation in gene content that may contribute to niche differentiation among co-occurring AMD archaea. Our analyses targeted members of the Thermoplasmatales and related archaea. These results greatly expand genomic information available for this archaeal order. Results: We reconstructed near-complete genomes for uncultivated, relatively low abundance organisms A-, E-, and Gplasma, members of the Thermoplasmatales order, and for a novel organism, Iplasma. Genomic analyses of these organisms, as well as Ferroplasma type I and II, reveal that all are facultative aerobic heterotrophs with the ability to use many of the same carbon substrates, including methanol. Most of the genomes share genes for toxic metal resistance and surface-layer production. Only Aplasma and Eplasma have a full suite of flagellar genes, whereas all but the Ferroplasma spp. have genes for pili production. Cryogenic-electron microscopy (cryo-EM) and tomography (cryo-ET) strengthen these metagenomics-based ultrastructural predictions. Notably, only Aplasma, Gplasma and the Ferroplasma spp. have predicted iron oxidation genes, and Eplasma and Iplasma lack most genes for cobalamin, valine, (iso)leucine and histidine synthesis. Conclusion: The Thermoplasmatales AMD archaea share a large number of metabolic capabilities. All of the uncultivated organisms studied here (A-, E-, G-, and Iplasma) are metabolically very similar to characterized Ferroplasma spp., differentiating themselves mainly in their genetic capabilities for biosynthesis, motility, and possibly iron oxidation. 
These results indicate that subtle but important genomic differences, coupled with unknown differences in gene expression, distinguish these organisms enough to allow for co-existence. Overall this study reveals shared features of organisms from the Thermoplasmatales lineage and provides new insights into the functioning of AMD communities. United States. Dept. of Energy. Genomics:GTL (Grant DE-FG02-05ER64134); National Science Foundation (U.S.). Graduate Research Fellowshi

    Is pancreatic cancer palliatable? A national study

    Background: Pancreatic cancer is frequently diagnosed at advanced stages where potentially curative resection is no longer possible. Palliative procedures can be performed; however, results on a national level are unknown. This study examines pancreatic cancer patients who underwent potentially palliative procedures including gastric bypass, biliary bypass surgery, celiac block, biliary stent, gastrostomy or jejunostomy, and examines post-intervention complications and 30-day mortality. Methods: SEER-Medicare 1991-2005 was used to identify patients with stage 3-4 pancreatic cancer. Complication rates were calculated including post-op infection, myocardial infarction, aspiration pneumonia, DVT/PE, pulmonary compromise, gastric bleed, acute renal failure, and reoperation. Kaplan-Meier survival analysis was performed. Finally, Cox proportional hazards modeling was used to control for the effects of age, sex, race, stage, and resection. Results: Of 22,314 pancreatic cancer patients, 858 (3.9%) were stage 3 and 11,149 (50.0%) were stage 4. Post-procedure median survival for all patients was approximately two months, with the longest survival for biliary bypass patients (3.2 mo, 95% CI 2.9-3.7) and the shortest for jejunostomy (1.3 mo, 1.2-1.5) and gastrostomy (1.5 mo, 1.4-1.8). Post-procedure 30-day mortality was highest for gastrostomy patients at 41.5%, followed by jejunostomy (39.1%), celiac plexus block (30.0%), gastric bypass (23.8%), biliary bypass (17.8%), and biliary stent (21.2%). The rate of complications averaged 40%, with the highest rates for gastrostomy (47.4%) and gastric bypass (45.3%) and the lowest for celiac plexus block (29.3%). Stage 4 disease was an independent predictor of death for patients undergoing five out of six procedures. Conclusion: We found that morbidity and mortality of palliative procedures in unresectable pancreatic cancer are high, especially in stage 4 patients. 
Further studies need to be conducted to identify patients who will have sufficient expected post-procedure survival to benefit from these palliative interventions.
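The Kaplan-Meier analysis named in the Methods can be sketched in a few lines of plain Python. This is an illustrative product-limit estimator only; the follow-up times below are synthetic (an assumption, in months) and are not SEER-Medicare data, and the function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Synthetic example: six patients, two of them censored.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
median = median_survival(curve)
```

Censored patients leave the risk set without lowering the survival estimate, which is why this estimator suits cohorts with incomplete follow-up.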

    Drift dynamics in microbial communities and the effective community size

    The structure and diversity of all open microbial communities are shaped by individual births, deaths, speciation and immigration events; the precise timings of these events are unknowable and unpredictable. This randomness is manifest as ecological drift in the population dynamics, the importance of which has been a source of debate for decades. There are theoretical reasons to suppose that drift would be imperceptible in large microbial communities, but this is at odds with circumstantial evidence that effects can be seen even in huge, complex communities. To resolve this dichotomy we need to observe dynamics in simple systems where key parameters, like migration, birth and death rates can be directly measured. We monitored the dynamics in the abundance of two genetically modified strains of Escherichia coli, with tuneable growth characteristics, that were mixed and continually fed into 10 identical chemostats. We demonstrated that the effects of demographic (non‐environmental) stochasticity are very apparent in the dynamics. However, they do not conform to the most parsimonious and commonly applied mathematical models, where each stochastic event is independent. For these simple models to reproduce the observed dynamics we need to invoke an “effective community size”, which is smaller than the census community size
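The kind of simple model in which each stochastic event is independent can be sketched as a two-strain neutral simulation with immigration. This is an illustration under stated assumptions, not the model fitted in the study: one death and one replacement per step, an assumed immigration probability `m`, an assumed source fraction `p_source`, and a hypothetical function name `simulate_drift`. The parameter `n_total` plays the role of the community size, so running it with a value below the census size is one way to picture an effective community size.

```python
import random


def simulate_drift(n_total, steps, m=0.1, p_source=0.5, seed=0):
    """Track the count of strain A in a community of fixed size n_total.

    Each step one resident dies and is replaced, either by an immigrant
    (probability m, strain A with probability p_source) or by the offspring
    of a randomly chosen resident. Simplification: the parent is drawn from
    the composition before the death is applied.
    """
    rng = random.Random(seed)
    n_a = round(n_total * p_source)
    trajectory = [n_a]
    for _ in range(steps):
        frac_a = n_a / n_total
        dies_a = rng.random() < frac_a
        if rng.random() < m:
            born_a = rng.random() < p_source   # immigrant from the feed
        else:
            born_a = rng.random() < frac_a     # local birth
        n_a += int(born_a) - int(dies_a)
        trajectory.append(n_a)
    return trajectory


# Smaller communities are expected to fluctuate more in relative abundance.
small = simulate_drift(n_total=100, steps=1000, seed=1)
large = simulate_drift(n_total=10000, steps=1000, seed=1)
```

Comparing the relative spread of `small` and `large` across many seeds would show how the assumed community size sets the strength of drift in this sketch.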

    A Large, Uniform Sample of X-ray Emitting AGN from the ROSAT All-Sky and Sloan Digital Sky Surveys: the Data Release 5 Sample

    We describe further results of a program aimed to yield ~10^4 fully characterized optical identifications of ROSAT X-ray sources. Our program employs X-ray data from the ROSAT All-Sky Survey (RASS), and both optical imaging and spectroscopic data from the Sloan Digital Sky Survey (SDSS). RASS/SDSS data from 5740 deg^2 of sky spectroscopically covered in SDSS Data Release 5 (DR5) provide an expanded catalog of 7000 confirmed quasars and other AGN that are probable RASS identifications. Again in our expanded catalog, the identifications as X-ray sources are statistically secure, with only a few percent of the SDSS AGN likely to be randomly superposed on unrelated RASS X-ray sources. Most identifications continue to be quasars and Seyfert 1s with 15<m<21 and 0.01<z<4; but the total sample size has grown to include very substantial numbers of even quite rare AGN, e.g., now including several hundreds of candidate X-ray emitting BL Lacs and narrow-line Seyfert 1 galaxies. In addition to exploring rare subpopulations, such a large total sample may be useful when considering correlations between the X-ray and the optical, and may also serve as a resource list from which to select the "best" object (e.g., X-ray brightest AGN of a certain subclass, at a preferred redshift or luminosity) for follow-on X-ray spectral or alternate detailed studies. Comment: Accepted for publication in AJ; 32 pages, including 11 figures, and 6 example table

    Buildings behaving badly: A behavioral experiment on how different motivational frames influence residential energy label adoption in the Netherlands

    Heating buildings accounts for approximately 36% of Europe’s energy demand, and several EU member states have adopted mandatory energy labels to improve energy efficiency by promoting home weatherization investments. This paper focuses on the perception of the energy label for residential buildings in the Netherlands and the role of different frames (egoistic, biospheric, social-norm and neutral) in motivating adoption of energy labels for housing. We used a behavioral email experiment and an online survey to investigate these motivational factors. We find that biospheric frames are weaker than the other three motivational frames in terms of engaging interest in the energy label, but that the biospheric frame results in higher willingness to pay (WTP) for the energy label. We also find that age (rather than income) correlates with higher willingness to pay for home energy labels.