Meningococcus genome informatics platform: a system for analyzing multilocus sequence typing data
The Meningococcus Genome Informatics Platform (MGIP) is a suite of computational tools for the analysis of multilocus sequence typing (MLST) data, available at http://mgip.biology.gatech.edu. MLST is used to generate allelic profiles to characterize strains of Neisseria meningitidis, a major cause of bacterial meningitis worldwide. Neisseria meningitidis strains are characterized with MLST as specific sequence types (ST) and clonal complexes (CC) based on the DNA sequences at defined loci. These data are vital to molecular epidemiology studies of N. meningitidis, including outbreak investigations and population biology. MGIP analyzes DNA sequence trace files, returns individual allele calls and characterizes the STs and CCs. MGIP represents a substantial advance over existing software in several respects: (i) ease of use: MGIP is user friendly, intuitive and thoroughly documented; (ii) flexibility: because MGIP is a website, it is compatible with any computer with an internet connection, can be used from any geographic location, and there is no installation; (iii) speed: MGIP takes just over one minute to process a set of 96 trace files; and (iv) expandability: MGIP has the potential to expand to more loci than those used in MLST and even to other bacterial species.
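The step from allele calls to an ST and CC can be pictured as a table lookup on the allelic profile. The sketch below is only an illustration of that idea, not MGIP's implementation. The seven locus names are the standard N. meningitidis MLST loci (background knowledge, not stated in the abstract). The profile table, ST numbers, CC labels and the function name `assign_sequence_type` are invented for the example.

```python
from typing import Dict, Optional, Tuple

# Standard seven-locus N. meningitidis MLST scheme (background, not from the abstract).
LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")

Profile = Tuple[int, ...]

# Toy lookup table: every allele number, ST and CC label here is invented.
PROFILE_TABLE: Dict[Profile, Tuple[int, str]] = {
    (1, 1, 1, 1, 1, 1, 1): (9001, "CC-A"),
    (1, 1, 2, 1, 1, 1, 1): (9002, "CC-A"),
    (5, 4, 3, 2, 1, 6, 7): (9003, "CC-B"),
}


def assign_sequence_type(allele_calls: Dict[str, int]) -> Optional[Tuple[int, str]]:
    """Return (ST, CC) for a complete allelic profile, or None if it is not in the table."""
    # Order the allele numbers by locus so the profile matches the table keys.
    profile = tuple(allele_calls[locus] for locus in LOCI)
    return PROFILE_TABLE.get(profile)


calls = dict(zip(LOCI, (1, 1, 2, 1, 1, 1, 1)))
print(assign_sequence_type(calls))
```

A profile that is absent from the table returns `None`. A real system would treat such a profile as a candidate new ST rather than an error.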
A computational genomics pipeline for prokaryotic sequencing projects
Motivation: New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data
Comparative Genomic Analysis of Clinical Strains of Campylobacter jejuni from South Africa
BACKGROUND: Campylobacter jejuni is a common cause of acute gastroenteritis and is also associated with the post-infectious neuropathies, Guillain-Barré and Miller Fisher syndromes. In the Cape Town area of South Africa, C. jejuni strains with Penner heat-stable (HS) serotype HS:41 have been observed to be overrepresented among cases of Guillain-Barré syndrome. The present study examined the genetic content of a collection of 32 South African C. jejuni strains with different serotypes, including 13 HS:41 strains, that were recovered from patients with enteritis, Guillain-Barré or Miller Fisher syndromes. The sequence-based typing methods, multilocus sequence typing and DNA microarrays, were employed to potentially identify distinguishing features within the genomes of these C. jejuni strains with various disease outcomes. METHODOLOGY/PRINCIPAL FINDINGS: Comparative genomic analyses demonstrated that the HS:41 South African strains were clearly distinct from the other South African strains. Further DNA microarray analysis demonstrated that the HS:41 strains from South African patients with the Guillain-Barré syndrome or enteritis were highly similar in gene content. Interestingly, the South African HS:41 strains were distinct in gene content when compared to HS:41 strains from other geographical locations due to the presence of genomic islands, referred to as Campylobacter jejuni integrated elements (CJIEs). Only the integrated element CJIE1, a Campylobacter Mu-like prophage, was present in the South African HS:41 strains whereas this element was absent in two closely-related HS:41 strains from Mexico. A more distantly-related HS:41 strain from Canada possessed both integrated elements CJIE1 and CJIE2. CONCLUSION/SIGNIFICANCE: These findings demonstrate that CJIEs may contribute to the differentiation of closely-related C. jejuni strains. In addition, the presence of bacteriophage-related genes in CJIE1 may contribute to the genomic diversity of C. jejuni strains. 
This comparative genomic analysis of C. jejuni provides fundamental information that potentially could lead to improved methods for analyzing the epidemiology of disease outbreaks
Finishing the euchromatic sequence of the human genome
The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ~99% of the euchromatic genome and is accurate to an error rate of ~1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
Summation of series
A number of pages left blank for additional series. Bibliography: p. xi. Mode of access: Internet.
Rangeland CEAP: An Assessment of Natural Resources Conservation Service Practices
On The Ground
• The Conservation Effects Assessment Project (CEAP) is a multi-agency effort to quantify the environmental effects of conservation practices and programs and develop the science base for managing the agricultural landscape for environmental quality.
• The rangeland CEAP review evaluated the scientific literature on seven core NRCS conservation practices: prescribed grazing, prescribed burning, brush management, range planting, riparian herbaceous cover, upland wildlife habitat management, and herbaceous weed control.
• The scientific literature "broadly supports" the reviewed rangeland conservation practices standards; however, there is a disjunct in integrating science and field-based knowledge so that managers and conservationists can fully understand the individualistic dynamic aspects of rangeland conservation practices.
• The CEAP synthesis establishes a precedent for partnerships among scientists, land managers, conservation specialists, and policymakers to provide NRCS with useful, current, science-based information for rangeland conservation practices.
The Rangelands archives are made available by the Society for Range Management and the University of Arizona Libraries. Contact [email protected] for further information. Migrated from OJS platform March 202
Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data
Abstract: Crohn Disease (CD) is a complex genetic disorder for which more than 140 genes have been identified using genome-wide association studies (GWAS). However, the genetic architecture of the trait remains largely unknown. The recent development of machine learning (ML) approaches prompted us to apply them to classify healthy and diseased people according to their genomic information. The Immunochip dataset containing 18,227 CD patients and 34,050 healthy controls enrolled and genotyped by the international Inflammatory Bowel Disease genetic consortium (IIBDGC) has been re-analyzed using a set of ML methods: penalized logistic regression (LR), gradient boosted trees (GBT) and artificial neural networks (NN). The main score used to compare the methods was the Area Under the ROC Curve (AUC) statistic. An assessment of the impact of quality control (QC), imputation and coding methods on LR results showed that QC methods and imputation of missing genotypes may artificially increase the scores. In contrast, neither the patient/control ratio nor marker preselection or coding strategies significantly affected the results. LR methods, including Lasso, Ridge and ElasticNet, provided similar results with a maximum AUC of 0.80. GBT methods like XGBoost, LightGBM and CatBoost, together with dense NN with one or more hidden layers, provided similar AUC values, suggesting limited epistatic effects in the genetic architecture of the trait. ML methods detected nearly all the genetic variants previously identified by GWAS among the best predictors, plus additional predictors with lower effects. The robustness and complementarity of the different methods are also studied. Compared to LR, non-linear models such as GBT or NN may provide robust complementary approaches to identify and classify genetic markers.
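For readers unfamiliar with the AUC score used above, it can be read as the probability that a randomly chosen patient receives a higher predicted score than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is a generic illustration and not the study's code: the function name `roc_auc`, the labels and the scores are all made up for the example.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Invented toy data: 1 = patient, 0 = healthy control.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75: three of the four case/control pairs are ordered correctly
```

The pairwise loop is quadratic in sample size, so for a cohort of the size described above a rank-based formulation or a library routine would be the practical choice.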