
    INVESTIGATION OF SOME POSSIBLE ORIGINS OF PROTEIN FAMILIES

    Nuttinee Teerakulkittipong, Ph.D., 2013. Directed by: Professor John Moult, Institute for Bioscience and Biotechnology Research, Department of Cell Biology and Molecular Genetics.
    Abstract: The prevailing view of the evolutionary history of proteins has been that all protein domains are descendants of distinct evolutionary lines, and that these lines are all relatively ancient families. The primary basis for that view was that known protein structures could be grouped by similarity of topology into a small number of folds. However, two lines of evidence challenge that view of protein evolution. First, analysis of sequence relationships within and between sets of complete genomes has established that a large proportion of protein sequence families are narrowly distributed in phylogenetic space and so appear to be relatively recent in origin. Second, analysis of the relationship between known protein structures shows that there are many more than 1,000 distinct folds, which appears to imply many more evolutionary lines. There are four hypotheses for the discrepancy between the traditional view and the observed structural and sequence distributions within protein families. Specifically, apparently young protein families may arise from (1) previously non-coding DNA or frame-shifted existing coding sequence, (2) recombination of structural fragments between proteins or recombination with non-coding DNA, (3) older families where the rapid rate of sequence change makes relatives hard to detect, and (4) lateral gene transfer (LGT) from other organisms. In the investigation of these hypotheses, phylogenetic analysis provides a means of estimating the relative age of protein families and of detecting lateral gene transfer effects.
Phylogeny-based investigation of prokaryotic species divergence has generally been performed using a small number of families, resulting in significant bias that affects age analysis. Therefore, we decided to use information from many protein families for constructing a species tree, utilizing a new procedure for combining these diverse sources. The resulting tree for 66 prokaryotic species incorporates information from 1,379 protein families. The families were selected on the basis of consistent family evolutionary rates obtained using three different methods. Noise-resistant methods were used to combat the effects of lateral gene transfer and some inevitable errors in protein sequence alignment and identification of orthologous families. Most topological features of the tree are robust as assessed by bootstrap testing, and previous distortions of inter-kingdom distances and poor determination of short branch lengths have been corrected. The tree is used to obtain estimates of the age of all protein families, which is key to the investigation of all four hypotheses. Proteins affected by LGT events were detected using a previously developed method and removed before the age calculation. We used the estimated family ages obtained from the phylogenetic analysis to examine five properties of proteins as a function of the age of the corresponding families. The goal here is to ascertain whether the age dependence of these properties supports hypotheses (1) and (2) for the origin of apparently young families, that is, that these are truly new open reading frames. The five properties are the mRNA expression level, relative evolutionary rate, predicted percentage of structural disorder, number of protein interaction partners, and codon composition bias.
The results are consistent with the new open reading frame model. Expression is found to increase substantially as a function of family age, suggesting that young proteins are not yet adapted sufficiently to tolerate high concentration conditions. The rate of amino acid change is faster for young proteins, consistent with overall positive selection for improved structural and functional properties. The fraction of predicted disorder is highest in the youngest proteins, consistent with immature structural properties. The number of known protein-protein interactions increases steadily with age, with low levels for young proteins, suggesting an ongoing process of increasing functional complexity. Analysis of these four factors is reported in Chapter 3. Results for the final factor, codon compositional bias, are reported in Chapter 4. Here we found that the codon composition of young proteins is markedly different from that of old proteins and similar to that of proteins constructed with random codon assignment. Thus the results are consistent with a model in which many young proteins have newly formed open reading frames and, during subsequent evolution, the codon composition is gradually optimized to fit the specific genomic conditions of the organism concerned. Overall, results for all five properties lend statistical support to the new open reading frame hypotheses. Further investigation is needed, however. In particular, examination of the structural properties of young proteins, such as super-secondary structure composition and the distribution of use of rare and common structural fragments, should be useful.
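
The codon composition comparison described in the abstract above can be illustrated with a minimal sketch. The abstract does not say which statistic the thesis uses, so everything here is an assumption made for illustration: the function names `codon_frequencies` and `distance_from_uniform` are hypothetical, "random codon assignment" is approximated by uniform usage of the 61 sense codons of the standard genetic code, and Euclidean distance stands in for whatever measure the thesis actually applies.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code: 64 codons minus the three stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = sorted(
    "".join(c) for c in product("ACGT", repeat=3)
    if "".join(c) not in STOP_CODONS
)
SENSE_SET = set(SENSE_CODONS)


def codon_frequencies(cds):
    """Relative frequency of each sense codon in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in SENSE_SET)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sense codons found in sequence")
    return {c: counts[c] / total for c in SENSE_CODONS}


def distance_from_uniform(freqs):
    """Euclidean distance between observed usage and uniform usage.

    Uniform usage is an assumed stand-in for 'random codon assignment';
    smaller values mean the sequence looks more like the random model.
    """
    uniform = 1 / len(SENSE_CODONS)
    return math.sqrt(sum((f - uniform) ** 2 for f in freqs.values()))
```

Under these assumptions, a sequence that uses every sense codon equally often scores zero, and a sequence that repeats a single codon scores close to one. A real analysis would more likely condition on amino acid composition (synonymous codon usage) and on the genome's background base composition.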

    Identification of β-Lactamase in Antibiotic-Resistant Bacillus cereus Spores

    β-Lactamase type I is reported for the first time to occur in the sporulated form in a penicillin-resistant Bacillus species. The enzyme was readily characterized from the B. cereus 5/B line (ATCC 13061) by mass spectrometry and two-dimensional gel electrophoresis.

    A comprehensive Thai pharmacogenomics database (TPGxD‐1): Phenotype prediction and variants identification in 942 whole‐genome sequencing data

    Computational methods analyze genomic data to identify genetic variants linked to drug responses, thereby guiding personalized medicine. This study analyzed 942 whole‐genome sequences from the Electricity Generating Authority of Thailand (EGAT) cohort to establish a population‐specific pharmacogenomic database (TPGxD‐1) for the Thai population. Variant calling followed the GATK best-practices workflow as implemented in Sentieon (version 201808.08). We then annotated variant call format (VCF) files using Golden Helix VarSeq 2.5.0 and employed Stargazer v2.0.2 for star allele analysis. The analysis of 63 very important pharmacogenes (VIPGx) revealed 85,566 variants, including 13,532 novel discoveries. Notably, we identified 464 known PGx variants and 275 clinically relevant novel variants. The phenotypic prediction of 15 VIPGx demonstrated a varied metabolic profile for the Thai population. Genes such as CYP2C9 (9%), CYP3A5 (45.2%), CYP2B6 (9.4%), NUDT15 (15%), CYP2D6 (47%), and CYP2C19 (43%) showed a high proportion of intermediate metabolizers, while CYP3A5 (41%) and CYP2C19 (9.9%) showed a higher proportion of poor metabolizers. CYP1A2 (52.7%) and CYP2B6 (7.6%) were found to have a higher proportion of ultra‐metabolizers. The functional prediction of the remaining 10 VIPGx genes revealed a high frequency of decreased-function alleles in SULT1A1 (12%), NAT2 (84%), and G6PD (12%). SLCO1B1 showed 20% poor-function alleles, while PTGIS (42%), SLCO1B1 (4%), and TPMT (5.96%) showed increased-function alleles. This study discovered new variants and alleles in the 63 VIPGx genes among the Thai population, offering insights into advancing clinical pharmacogenomics (PGx). However, further validation is needed using other computational and genotyping methods.