Low-pass sequencing for microbial comparative genomics
BACKGROUND: We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single-pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. RESULTS: As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. CONCLUSION: Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genomes of H. lacusprofundi and H. marismortui suggests that genome structure and dynamic genome reorganization might be similar to that previously observed in the IS-element rich genome of H. sp. NRC-1. Identification of multiple TBP and TFB homologs in these four halophiles is consistent with the hypothesis that different types of complex transcriptional regulation may occur through multiple TBP-TFB combinations in response to rapidly changing environmental conditions. Low-pass shotgun sequence analyses of genomes permit extensive and diverse analyses, and should be generally useful for comparative microbial genomics.
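As a rough illustration of the GC-content comparison mentioned in the abstract above, the following sketch pools a set of single-pass reads and reports the fraction of G and C bases. The function name `gc_content` and the example reads are hypothetical and are not taken from the study; skipping ambiguous base calls is an assumption about how such reads might reasonably be handled.

```python
def gc_content(reads):
    """Return the fraction of G/C among unambiguous bases (A, C, G, T).

    `reads` is an iterable of DNA strings; case is ignored.
    Ambiguous calls such as N are skipped (an assumption of this sketch).
    """
    gc = 0
    total = 0
    for read in reads:
        for base in read.upper():
            if base in "GC":
                gc += 1
                total += 1
            elif base in "AT":
                total += 1
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return gc / total


# Hypothetical reads, for illustration only.
example_reads = ["GCGCATGC", "ccgNNgta", "GGGCCCAT"]
print(f"GC content: {gc_content(example_reads):.1%}")
```

With roughly a thousand reads per genome, an estimate of this kind would be a sample-based approximation of the genome-wide value rather than an exact figure.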
Method for Independent Estimation of the False Localization Rate for Phosphoproteomics
Phosphoproteomic methods are commonly employed to identify and quantify phosphorylation sites on proteins. In recent years, various tools have been developed, incorporating scores or statistics related to whether a given phosphosite has been correctly identified or to estimate the global false localization rate (FLR) within a given data set for all sites reported. These scores have generally been calibrated using synthetic datasets, and their statistical reliability on real datasets is largely unknown, potentially leading to studies reporting incorrectly localized phosphosites, due to inadequate statistical control. In this work, we develop the concept of scoring modifications on a decoy amino acid, that is, one that cannot be modified, to allow for independent estimation of global FLR. We test a variety of amino acids, on both synthetic and real data sets, demonstrating that the selection can make a substantial difference to the estimated global FLR. We conclude that while several different amino acids might be appropriate, the most reliable FLR results were achieved using alanine and leucine as decoys. We propose the use of a decoy amino acid to control false reporting in the literature and in public databases that re-distribute the data. Data are available via ProteomeXchange with identifier PXD028840
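The abstract above does not state the estimator itself, so the following is only a minimal sketch of how a decoy amino acid could be used, by analogy with target-decoy false discovery rate estimation. It assumes that sites are ranked by localization score, that each site localized to the decoy residue signals a false localization, and that decoy counts are scaled by a hypothetical `target_to_decoy_ratio` (for example, the ratio of S/T/Y residues to decoy residues in the identified peptides). All names and numbers are illustrative.

```python
def decoy_flr(sites, target_to_decoy_ratio):
    """Estimate a running global FLR down a ranked list of sites.

    sites: list of (score, is_decoy) tuples, where is_decoy is True when
           the modification was localized to the decoy amino acid.
    target_to_decoy_ratio: assumed scaling factor that converts a decoy
           count into an expected number of false target localizations.
    Returns a list of (score, is_decoy, flr) in descending score order.
    """
    ranked = sorted(sites, key=lambda s: s[0], reverse=True)
    results = []
    decoys = 0
    targets = 0
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        estimated_false = decoys * target_to_decoy_ratio
        flr = min(1.0, estimated_false / targets) if targets else 1.0
        results.append((score, is_decoy, flr))
    return results


# Hypothetical scored sites, for illustration only.
example_sites = [(0.99, False), (0.95, False), (0.90, False),
                 (0.80, True), (0.70, False), (0.60, True)]
for score, is_decoy, flr in decoy_flr(example_sites, 1.5):
    print(f"{score:.2f}  decoy={is_decoy}  FLR={flr:.3f}")
```

A score threshold could then be chosen as the lowest score at which the running FLR stays below a desired level, such as 0.05.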
Chromosomes 4 and 8 implicated in a genome wide SNP linkage scan of 762 prostate cancer families collected by the ICPCG
BACKGROUND In spite of intensive efforts, understanding of the genetic aspects of familial prostate cancer (PC) remains largely incomplete. In a previous microsatellite-based linkage scan of 1,233 PC families, we identified suggestive evidence for linkage (i.e., LOD ≥ 1.86) at 5q12, 15q11, 17q21, 22q12, and two loci on 8p, with additional regions implicated in subsets of families defined by age at diagnosis, disease aggressiveness, or number of affected members. METHODS In an attempt to replicate these findings and increase linkage resolution, we used the Illumina 6000 SNP linkage panel to perform a genome-wide linkage scan of an independent set of 762 multiplex PC families, collected by 11 International Consortium for Prostate Cancer Genetics (ICPCG) groups. RESULTS Of the regions identified previously, modest evidence of replication was observed only on the short arm of chromosome 8, where HLOD scores of 1.63 and 3.60 were observed in the complete set of families and families with young average age at diagnosis, respectively. The most significant linkage signals found in the complete set of families were observed across a broad, 37 cM interval on 4q13–25, with LOD scores ranging from 2.02 to 2.62, increasing to 4.50 in families with older average age at diagnosis. In families with multiple cases presenting with more aggressive disease, LOD scores over 3.0 were observed at 8q24 in the vicinity of previously identified common PC risk variants, as well as MYC, an important gene in PC biology. CONCLUSIONS These results will be useful in prioritizing future susceptibility gene discovery efforts in this common cancer. Prostate 72:410–426, 2012. © 2011 Wiley Periodicals, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/90245/1/21443_ftp.pd
Analysis of Xq27-28 linkage in the international consortium for prostate cancer genetics (ICPCG) families.
BACKGROUND: Genetic variants are likely to contribute to a portion of prostate cancer risk. Full elucidation of the genetic etiology of prostate cancer is difficult because of incomplete penetrance and genetic and phenotypic heterogeneity. Current evidence suggests that genetic linkage to prostate cancer has been found on several chromosomes including the X; however, identification of causative genes has been elusive. METHODS: Parametric and non-parametric linkage analyses were performed using 26 microsatellite markers in each of 11 groups of multiple-case prostate cancer families from the International Consortium for Prostate Cancer Genetics (ICPCG). Meta-analyses of the resultant family-specific linkage statistics across the entire 1,323 families and in several predefined subsets were then performed. RESULTS: Meta-analyses of linkage statistics resulted in a maximum parametric heterogeneity lod score (HLOD) of 1.28, and an allele-sharing lod score (LOD) of 2.0 in favor of linkage to Xq27-q28 at 138 cM. In subset analyses, families with average age at onset less than 65 years exhibited a maximum HLOD of 1.8 (at 138 cM) versus a maximum regional HLOD of only 0.32 in families with average age at onset of 65 years or older. Surprisingly, the subset of families with only 2-3 affected men and some evidence of male-to-male transmission of prostate cancer gave the strongest evidence of linkage to the region (HLOD = 3.24, 134 cM). For this subset, the HLOD was slightly increased (HLOD = 3.47 at 134 cM) when families used in the original published report of linkage to Xq27-28 were excluded. CONCLUSIONS: Although there was not strong support for linkage to the Xq27-28 region in the complete set of families, the subset of families with earlier age at onset exhibited more evidence of linkage than families with later onset of disease. A subset of families with 2-3 affected individuals and with some evidence of male to male disease transmission showed stronger linkage signals. 
Our results suggest that the genetic basis for prostate cancer in our families is much more complex than a single susceptibility locus on the X chromosome, and that future explorations of the Xq27-28 region should focus on the subset of families identified here with the strongest evidence of linkage to this region. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
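The heterogeneity LOD (HLOD) reported in the abstract above is conventionally defined under the admixture model, in which only a proportion α of families is linked: HLOD = max over α of Σ log10(α·10^LOD_i + 1 − α). The sketch below evaluates that formula by a simple grid search at a single map position. The per-family LOD values are hypothetical, and the abstract does not say that the authors' software works this way.

```python
import math


def hlod(family_lods, steps=1000):
    """Return (HLOD, alpha) for per-family LOD scores at one map position.

    Uses the admixture model: each family is linked with probability alpha.
    alpha is searched on a regular grid of steps + 1 values in [0, 1].
    """
    best_hlod, best_alpha = 0.0, 0.0  # alpha = 0 always gives HLOD = 0
    for k in range(steps + 1):
        alpha = k / steps
        total = sum(math.log10(alpha * 10 ** lod + 1 - alpha)
                    for lod in family_lods)
        if total > best_hlod:
            best_hlod, best_alpha = total, alpha
    return best_hlod, best_alpha


# Hypothetical family-specific LOD scores, for illustration only.
example_lods = [1.2, 0.8, -0.9, -1.1]
score, alpha = hlod(example_lods)
print(f"HLOD = {score:.2f} at alpha = {alpha:.3f}")
```

Because α = 0 yields zero and α = 1 yields the plain sum of family LODs, the HLOD is never lower than either, which is why it can reveal linkage in a subset of families even when the summed LOD is near zero.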
Chromosomes 4 and 8 implicated in a genome wide SNP linkage scan of 762 prostate cancer families collected by the ICPCG
In spite of intensive efforts, understanding of the genetic aspects of familial prostate cancer remains largely incomplete. In a previous microsatellite-based linkage scan of 1233 prostate cancer (PC) families, we identified suggestive evidence for linkage (i.e. LOD ≥ 1.86) at 5q12, 15q11, 17q21, 22q12, and two loci on 8p, with additional regions implicated in subsets of families defined by age at diagnosis, disease aggressiveness, or number of affected members.
Genome-wide linkage analysis of 1,233 prostate cancer pedigrees from the International Consortium for prostate cancer Genetics using novel sumLINK and sumLOD analyses
Prostate cancer is generally believed to have a strong inherited component, but the search for susceptibility genes has been hindered by the effects of genetic heterogeneity. The recently developed sumLINK and sumLOD statistics are powerful tools for linkage analysis in the presence of heterogeneity
Behavior of the atomic oxygen 5577 Ångström emission intensity at mid-latitudes: a climatological view
Thesis (Ph. D.)--University of Washington, 2000. A global mid-latitude study of the atomic oxygen green line emission intensity at 5577 Å has been undertaken with the goal of developing a climatological understanding of the emission behavior and its usefulness as a tracer for the atmosphere near 97 km. Long-term observations have been analysed at nine stations covering periods of ~8–12 years, for a total of over 90 years of measurements. The results of this investigation show that the emission typically exhibits a maximum near the summer solstice and again near the fall equinox, before falling to a low winter-time level that persists into the middle of spring. Importantly, the oft-reported maximum at the spring equinox is not a statistically significant feature on the climatological time scale. This finding has implications for our understanding of the dominant processes operating in the region. Specifically, the role of seasonally varying vertical diffusion caused by breaking gravity waves must be readdressed in light of the absence of a strong maximum at the spring equinox. This work also addresses the relationship between the green line emission intensity and geomagnetic and solar activity. Results show that failing to exclude observations taken under high geomagnetic activity conditions leads to increased springtime emission levels and may be one explanation for this feature as reported by others. The influence of solar activity on the green line emission over the long term is shown to exhibit a hysteresis effect within a given solar cycle, confirming that there is not a simple linear relationship between the two processes. Finally, a critical examination is made of how long a data series must be to fully achieve a climatological understanding of this emission and how this understanding may reasonably be used to advance our understanding of the upper middle atmosphere region. After ~10 years, features with periods less than one year become stable (or achieve climatology), but the data examined here show unresolved power at periods approaching the series length, which needs longer data coverage to be fully characterized.
A method for independent estimation of false localisation rate for phosphoproteomics
Phosphoproteomics methods are commonly employed in labs to identify and quantify the sites of phosphorylation on proteins. In recent years, various software tools have been developed, incorporating scores or statistics related to whether a given phosphosite has been correctly identified, or to estimate the global false localisation rate (FLR) within a given data set for all sites reported. These scores have generally been calibrated using synthetic data sets, and their statistical reliability on real data sets is largely unknown. As a result, there is a considerable problem in the field with the reporting of incorrectly localised phosphosites, due to inadequate statistical control. In this work, we develop the concept of using scoring and ranking modifications on a decoy amino acid, i.e. one that cannot be modified, to allow for independent estimation of global FLR. We test a variety of different amino acids to act as the decoy, on both synthetic and real data sets, demonstrating that the amino acid selection can make a substantial difference to the estimated global FLR. We conclude that while several different amino acids might be appropriate, the most reliable FLR results were achieved using alanine and leucine as decoys, although we have a preference for alanine due to the risk of potential confusion between leucine and isoleucine. We propose that the phosphoproteomics field should adopt the use of a decoy amino acid, so that there is better control of false reporting in the literature, and in public databases that re-distribute the data. Data are available via ProteomeXchange with identifier PXD028840.
The offset correlation, a novel quality measure for planning geochemical surveys of the soil by kriging
This paper presents a quality measure to plan geostatistical soil surveys when measures based on the kriging variance are not applicable. The criterion is the consistency of estimates made from two non-coincident instantiations of a proposed sample design. We consider square sample grids, in which one instantiation is offset from the second by half the grid spacing along the rows and along the columns. If a sample grid is coarse relative to the important scales of variation in the target property then the consistency of predictions from two instantiations is expected to be small, and can be increased by reducing the grid spacing. The measure of consistency is the correlation between estimates from the two instantiations of the sample grid, averaged over a grid cell. We call this the offset correlation; it can be calculated from the variogram. This quality measure is illustrated for some hypothetical examples, considering both ordinary kriging and factorial kriging of the variable of interest. The factorial kriging case is considered since, when planning a small-scale synoptic geochemical survey, we may wish only to map components of the variation of the target variable at certain spatial scales. The quality measure is then computed for ordinary and factorial kriging with variograms estimated from data on nickel, chromium and cobalt content of soil in the north-east of England. Our results show how the offset correlation responds to sample density and the form of the variogram, and how larger correlations can be achieved for factorial kriging than ordinary kriging at a given density. The results for data on soil metals showed that an offset correlation of 0.8 could not be achieved (ordinary kriging) by sampling at 5-km intervals, the density at which all of England and Wales is sampled. 
However, if the objective were to map by factorial kriging the coarser-scale components of variation, driven primarily by parent material, then for two of the metals (Co and Cr) the 5-km grid was adequate, and the sample effort of the survey from which the data were taken (0.44 samples km⁻²) was excessive.